Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact...

27
PLEASE SCROLL DOWN FOR ARTICLE This article was downloaded by: [German National Licence 2007] On: 16 July 2010 Access details: Access Details: [subscription number 912873516] Publisher Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37- 41 Mortimer Street, London W1T 3JH, UK Cybernetics and Systems Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713722751 A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well- Separated Clusters', Cybernetics and Systems, 3: 3, 32 — 57 To link to this Article: DOI: 10.1080/01969727308546046 URL: http://dx.doi.org/10.1080/01969727308546046 Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Transcript of Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact...

Page 1: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

PLEASE SCROLL DOWN FOR ARTICLE

This article was downloaded by: [German National Licence 2007]On: 16 July 2010Access details: Access Details: [subscription number 912873516]Publisher Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Cybernetics and SystemsPublication details, including instructions for authors and subscription information:http://www.informaworld.com/smpp/title~content=t713722751

A Fuzzy Relative of the ISODATA Process and Its Use in DetectingCompact Well-Separated ClustersJ. C. Dunn

To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters', Cybernetics and Systems, 3: 3, 32 — 57To link to this Article: DOI: 10.1080/01969727308546046URL: http://dx.doi.org/10.1080/01969727308546046

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article may be used for research, teaching and private study purposes. Any substantial orsystematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply ordistribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contentswill be complete or accurate or up to date. The accuracy of any instructions, formulae and drug dosesshould be independently verified with primary sources. The publisher shall not be liable for any loss,actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directlyor indirectly in connection with or arising out of the use of this material.

Page 2: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

Journal of Cybernetics. 1974, 3, 3, pp. 32-57

A Fuzzy Relative of the ISODATA Processand Its Use in Detecting Compact

Well-Separated Clusters

J. C. DunnDepartment of Theoretical and Applied Mechanics.

Cornell University

Abstract

Two fuzzy versions of the fc-means optimal, least squared error partitioning problem are formulated forfinite subsets A' of a general inner product space. In both cases, the extrcmizing solutions are shown to befixed points of a certain operator T on the class of fuzzy A-partitions of X, and simple iteration of Tprovides an algorithm which has the descent property relative to the least squared error criterion function.In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of A' and theassociated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, inthe second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new;when X consists of k compact well separated (CWS) clusters, Xt, this algorithm generates a limitingpartition with membership functions which closely approximate the characteristic functions of the clustersXj. However, when X is not the union of k CWS clusters, the limiting partition is truly fuzzy in the sensethat the values of its component membership functions differ substantially from 0 or 1 over certain regionsof .V. Thus, unlike ISODATA, the "fuzzy" algorithm signals the presence or absence of CWS clusters in A'.Furthermore, the fuzzy algorithm seems significantly less prone to the "cluster-splitting" tendency ofISODATA and may also be less easily diverted to uninteresting locally optimal partitions. Finally, for datasets .V consisting of dense CWS clusters embedded in a diffuse background of strays, the structure of X isaccurately reflected in the limiting partition generated by the fuzzy algorithm. Mathematical arguments andnumerical results are offered in support of the foregoing assertions.

1. Introduction

Computer implemented partitioning algorithms have proved to be useful in many areas ofapplied science, beginning with Sneath's work on bacteriological taxonomy [ 1 ] and extend-ing over a wide range of unrelated fields, including psychology, sociology, geology,medicine, experimental particle physics, operations research, and the technology ofautomatic reading machines. Extensive bibliographies, good general overviews of partition-ing techniques, and applications can be found in [2]-[6].

The function of a partitioning algorithm is to detect natural subgroupings (clusters)within a large finite data set X of multidimensional vectors (patterns), relative to some givenquantitative measure of pairwise distance (or similarity) between the elements of X.However, since the specification of a distance measure does not by itself impose a uniqueinterpretation on the ambiguous phrase "natural grouping," it can happen that differentalgorithms working within the same metrical framework on X will nevertheless partition A'

32

Copyright ©1974 by Scripta Publishing Company.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 3: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 33

in different ways; in this sense, the organization which an algorithm "sees" in X isdependent upon the structure of the data and the structure of the algorithm. Our particularconcern in this paper is with algorithms that are sensitive to the existence of clusters in Xwhich are "compact and well separated" (CWS) relative to a given metric on X (in a sense tobe made precise in Sect. 2). Our objective is to devise an algorithm which signals thepresence or absence of CWS clusters in X and, in the former case, identifies thecharacteristic functions of these clusters.

The graph theoretic techniques employed by Sneath [11, Johnson [7], Zahn [41, Wishart[8]. and others, afford one possible avenue of approach to the detection of CWS clusters.However, here we follow a different line, which combines Zadeh's fuzzy set concept [9]with the criterion function approach to clustering. Ruspini appears to have been the first tosimgest this general scheme and to propose specific fuzzy criterion functions [10] andassociated algorithms [11] applicable to a very broad class of distance functions on X. Morerecently, Gitman and Levine [12] have also applied the theory of fuzzy sets to clusteringproblems. In the present investigation, we restrict ourselves to the case where A' is a finitesubset of a general inner product space K, and where the distance function on X is the innerproduct-induced metric. In this setting, we consider two of many possible differentiableextensions of the A-means squared error criterion function1 from the class of all "hard" (i.e.ordinary set theoretic) A-partitions of X to the class of all fuzzy A-partitions of X (Sect. 4),and derive corresponding properties of extremal points for these criteria (Sect. 4 and 5): ineach case, the extrema are necessarily fixed points of certain operators T on the class offuzzy A-partitions of X.

The first criterion function is of interest principally because simple iteration or thecorresponding operators T is essentially the well-known ISODATA clustering algorithm[13]. Hence our development amounts to a formal derivation of this algorithm. However, inthis case, the range of T consists for the most part of hard partitions irrespective of whetherCWS clusters are present in X. Consequently, the first criterion function and the associatedISODATA process are unsuitable for our purposes. When CWS clusters are present,experience indicates that the ISODATA process converges rapidly to a partition consistingof these clusters. However, when CWS clusters are not present, the process still converges tosome hard partition defined by "unequivocal" characteristic functions, and there is no wayto tell from a simple inspection of this limiting partition that its component subsets are inTact not CWS.2

The situation is different for the second criterion function, where the range of thecorresponding operators T consist essentually of fuzzy partitions. When CWS clusters arepresent in A', it appears that there is always an extremizing fuzzy partition whosemembership functions closely approximate the characteristic functions of the clusters.Furthermore, numerical experiments indicate that simple iteration of T produces a sequenceof fuzzy partitions which converge rapidly to the extremizing partition from virtually allstarting guesses. On the other hand, when CWS clusters are not present, the iterates of Tconverge (more slowly) to some limiting partition which is truly fuzzy in the sense that thevalues of its component membership functions depart significantly from the hard limits 0and 1. The results of several specific numerical experiments are offered in support of these

1 Neither extension belongs to Ruspini's scheme.2 This objection can be levelled at any algorithm whose output consists exclusively of hard parlitions. In such cases it is

often necessary to make very extensive secondary computations in order to determine whether or not the sought-alterstructural property actually is present in the limiting partition.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 4: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

34 J.C. DUNN

contentions in Sect. 6. Of special interest are the results obtained for two planar CWSclusters embedded in a diffuse low density background of "strays," forming a connectingbridge and a halo (Fig. 3). The behavior of the algorithm in this case suggests that it may bequite effective when A* is a sample drawn from a mixture of unimodal probabilitydistributions. However, a further pursuit of this question and the related parameterestimation problem is not attempted here.

2. Compact Well-Separated Clusters

In this section, we introduce a parameter which provides a simple quantitative index ofseparation among the subsets of a partition of the data set X. This parameter has explicittheoretical significance for the A-means squared error criterion function approach toclustering, and is also meaningful for the fuzzy algorithm developed in Sect. 5.

For present purposes, we take A" to be a non-empty finite subset of an arbitrary realvector space V, and let d denote an arbitrary metric on V. We define set diameters and setdistances in the usual way relative to d, namely:

diam A = sup d(x,y)x,y £ A

dist (A,B) = inf d(x,y) . (\)X € Ay e B

Finally f (A) will denote the class of all partitions P = {JV,,.. . , Xk} of X into it disjointnon-empty subsets Xt, i.e.,

X; ? <£ (empty sot)

X,. fl Xj = <*> if ) (2)

k

U X: = X .1=1 '

Definition 1. The subsets Xj of a partition P in 9 (A) are said to be compact separated(CS) clusters relative to d if and only if they have the following property: for all p,q,r, withq^fcr, any pair of points .v.y in Xp are closer together (as measured by d) than any pair ofpoints n,v, with u in Xq and r in Xr.

For each fixed k, the existence or non-existence of a Ar-partition consisting of CS clustersis an intrinsic property of the pair [X.d] • Furthermore, this property is readily quantifiedas follows. For each P in ? (A) let,

min min dist(X , X'r)1< q<Zk 1< r<k

a(k,P) = rJtl . (3)max diam(X )

and let „(/,) = max a(k,P) . (4)

Then it is easily shown that X can be partitioned into A CS clusters relative to d if and onlyif <*(A) > 1. The CS property is sometimes regarded as an indispensable feature of anyintuitively acceptible concept of "cluster" based on metrics. Nevertheless, other apparentlyquite different definitions are possible and indeed, some of these appear to have greater

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 5: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 35

operational significance for the prototypical cluster-detecting apparatus of the human visualsystem [4].

For example, we might agree that, clusters should be recognized by the followingfundamental "connectivity" property [7]: the subsets X,- of P are connected clustersrelative to d if and only if for every x.y in Xp there exists a chain £ = {.v = £, £/ =

y } of elements in X connecting .v and y, such that the maximum edge length. , m a X </(£,•_,,

£,), of £ is less than the maximal edge length of any chain connecting // to r, with n in Xq, rin Xr and r^hq. On the face of it, this concept of cluster is quite different from the CSdefinition; interestingly enough, however, it is not difficult to prove that while subsets Xt

which satisfy this connectivity criterion relative to d are in general not CS relative to d, theyare always CS relative to a certain (ultra) metric cl induced by d, namely:

d(x,y) = min< max(

where the min operation is taken over all chains £ of arbitrary length / connecting .v and y.In short, the intuitive notion of cohesiveness based upon connectivity relative to (/ has analternative equivalent expression in terms of the CS concept relative to the induced metricd. This example argues for the fundamental nature of Definition 1, while at the same timepointing up the fact that we have been talking about structural properties of thepair j X,d} and not X alone.

We now limit ourselves to the class of metrics d induced by a norm on V, i.e., metrics ofthe form d (x.y) = II x —y II where II • II is any positive definite real function on V satisfyingII au II = I a I II u II for a real and u in V, as well as the triangle inequality II .v + .v II < II x II +IIy II [ 15] .3 In this narrower setting we introduce a more demanding separation index basedupon the distance between X{ and the convex hull CoXj of Xj in V, i.e., the smallest convexsubset of V containing Xj. The related concept of cluster is significant for our purposesbecause it is directly linked to the fundamental CS definition and to the A-means leastsquared error approximation problems considered later.

Definition 2. The subsets Xj of a partition P in ? (A.) are compact well-separated (CWS)clusters relative to d if and only if they have the following property: for all p.q.r, with q =£ r,any pair .v,.v with x in Xp and y in Co/Y;, are closer together as measured by d than any pairH.i1, with H in Xq and r in CoXr.

Once again, the existence or non-existence of a A-partition consisting of CWS clusters isan intrinsic property of {A'.c/} . Furthermore, when d is norm-induced, this property isreadily quantified as follows. For each P in 9 (A) let

min min dist(X ,CoXr)1< q<ik l<r<k

/3(fe,P) = • — (5)max diam(X )

3llltrametrics are excluded from this class (cf. 114]); in particular, the mini-max chain length metric d described aboveis not induced by any norm. Consequently, the methods described in the following sections are generally not applicable tothe detection of "connected" clusters.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 6: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

/«(/.:) = max fi(k,P) . (6)P

36 J.C. DUNN

and let

It is not difficult to prove that X can be partitioned into k CWS clusters relative to d if andonly if

/J(fi> > 1 . (7)

The proof depends on the fact that norm-induced distances between any .v in A',- and any yin CoXj cannot exceed diam (A",-): details are omitted in the interest of brevity.

In view of Eqs. (3) and (5), and the fact that A",- C CoXj, it follows that the subsets X, ofFare CWS clusters only if they are CS clusters; this justifies the terminology of Definitions 1and 2. Moreover, since the centroid Jv,- of A",- lies in CoXj, it readily follows that the A','s in Pare CWS clusters only if

d(x,xt) < dix,xj) (8)

for all x in A",- and for all /, /, with / =£/. Finally, as a consequence of Eq. (8) it follows thatthe Xfs in P are CWS clusters only if P is a fixed point of the ISODATA process; this resultestablishes a link between CS and CWS clusters and the A-means least squared errorapproximation problems which we now consider.

3. The A-Means Least Square Approximation Problem andISODATA

The presence of CWS clusters in X relative to some given metric d (or perhaps, anymember of some general class of metrics d) will be assumed an "interesting" structuralproperty. We then want algorithms which are capable of indicating the presence or absenceof CWS clusters relative to d, and in the former case, of identifying the characteristicfunctions of the clusters. Since the class 9 (A) is finite when X is finite we can, in principal,use brute force exhaustion to find the "most" interesting partition in 9 (A), namely, thepartition P' which solves

/3(/c,P') = fiih) = max 0 ( , P ) . (9)P

Once P' and )3 (A) are known, the problem is effectively solved. However, even formoderately large sets A", the number of elements in P (A) can be huge (being equal to

1 V / * \— 2/1 )(-l) 'in for 1 < A' < n; (cf. [2]), consequently, exhaustion is not feasible ink\ «"=iWgeneral. In fact, the mere calculation of P (k.P) is a decidedly nontrivial task for all but thesimplest problems. It follows that the separation parameter |3(A) and the correspondingmaximally separated partition P' which solves (9) are effectively inaccessible by astraightforward approach at this level of generality. On the other hand, an indirect heuristicapproach based upon criterion functions and associated approximation problems mayprovide useful algorithms, especially when d is not simply an arbitrary metric but has somegeometrical significance in V 3 X. Consequently, in the balance of this paper we make thefurther restriction that d is induced by a norm II* II on V which in turn is induced by an innerproduct < • I* > on V, i.e.,

1/9

d l x , y ) & \ \ x - y \ \ A ( x - y \ x - > ) J (10)

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 7: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 37

where < • I- > is any symmetric positive-definite bi-linear function from the Cartesian productV X V into R' [15]. This abstract formulation includes as a special case,

V = Rn

<x|y> = x%- (11)

d(x,y) = [(x - y)'M(x- y)f2

where M is an arbitrary symmetric positive definite n X ;; matrix, and the superscript /signifies the transpose operation. From the standpoint of clustering problems, threesub-cases are worth mentioning here, namely:

a) M=I = identity matrix,b) M'1 = d i a g j a i , . . . , o,,J, a,- = sample variance of the ith component of vectors x e X,c) M'1 = sample covariance matrix for vectors x e X.

In case a, the corresponding metric d in (11) is simply the prototypical Euclidean metricwhich is invariant under the sub-group of orthogonal transformations on R" 3 X; in case bthe metric in (11) is a weighted Euclidean metric invariant under the sub-group of scaletransformations on R" (i.e., non-singular linear transformations whose matrices are diagonal,relative to the natural basis, in R"); in case c the metric in (11) is a weighted Euclideanmetric invariant under the full linear group on R". For a further discussion of invarianceconsiderations for clustering problems see [2L [16], and [17].4

When d is inner product-induced and when .3 (k) is sufficiently large, it seems clear thatthe maximally separated partition P' solving (7) should also solve the following generalizedproblem.

k-Means Least Square Approximation (LSA) Problem

Let 2 denote the linear hull of X in V, i.e., 2 is the (finite dimensional) linear subspacespanned by the elements of X. Let r = { r, . . . , rk} denote an ordered A:-tuple of vectors in£, i.e., v is a general element in the fc-fold Cartesian product 2k of £ with itself. For eachP = {Xi,..., Xk} e 9 (A.) and each v e 2k put

k k k

J(P,v) = X I d(x,Vi)2 = I I | | x - i > , . | | 2 = I I ( x - « . | x - ( > , - > . ( 1 2 )

i=l xeXj i = 1 x e Xj i = 1 x e X

Find P' e ? (A) and v'e£k such that

J(P\v') = min min J(P,v) . (13)

Since d is inner product-induced, we always have v- = Jv,- = the unweighted mean, orcentroid of Xj. Furthermore, when (1 (A) is sufficiently large, computational experiencesuggests that the optimal partition P"m (13) can probably be found, and very efficiently, bythe following ISODATA Process [131:

step 1) Choose P e 9 (k);step 2) Compute the centroidsx,- of X( e P\step 3) Construct a new partition P according to the rule:

* The clustering algorithm proposed in [17] is essentially the ISODATA process applied to the metric in (11),corresponding to case c above.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 8: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

38 J.C. DUNN

x e X(. «-+ d{x,xt) = min d(x,x~j);

1step 4) If P = P, stop. Otherwise put P = P and go to step 2.

We note that step 3 is ambiguous if d (x.xj) does not have a proper minimum over 1 < / < A.Furthermore, it is possible (although not likely) that one or more of the sets X,• may beempty at the conclusion of step 3, in which event difficulties will arise after the loop backto step 2. For these reasons, working versions of ISODATA must incorporate "tie-breaking"rules which assign "centroids" to empty subsets A",- in step 2, and resolve ambiguities in step3 (a common procedure is to take /" to be the smallest index satisfyingd(x,x) = min <f(x,x)). Regardless of which tie-breaker rule is used, it can be shown that

the sequence of partitions /*(") and the corresponding sequence of centroids {*(")} ={*,<") , *,,(")} has the descent property relatively to J, i.e., /(/<" + ' ) , *<" + ')) <

J(P("\x("))t for all /; (see Theorem 2).A partition P e 9 (A) is culled a fixed point of ISODATA if and only if the derived

partition P in step 3 is identical to P. In the next section, we explore the relationshipbetween extremal points for the A-mcans LSA problem and the fixed points of ISODATA.Here we observe that every partition P consisting of CWS clusters is necessarily a fixed pointof ISODATA, in view of Eq. (8). As we have noted elsewhere, these considerations motivateDefinition 2 and provide explicit links between the A-means LSA problem, the ISODATAprocess, and the general notion of separated clusters in X.

Unfortunately, it is not difficult to produce simple examples in Rl and R2 whereISODATA has many attracting fixed points which are not solutions of (9) and/or of (13)(see Sect. 6). Experience also indicates that the domains of attraction of these spuriousfixed points can become quite large as |3 (A) decreases, leading to the so called"cluster-splitting" tendency of ISODATA [2]. In short, as 0 (A) decreases, the desiredrelationship between the maximally separated solution of (9) and the attracting fixed pointsof ISODATA tends to disintegrate. Furthermore, as noted in the introduction, one cannottell whether this is actually happening from an inspection of the hard partitions generatedby ISODATA. Experience indicates that ISODATA converges to some hard partition,irrespective of the value of /HA). However, since /3 (A) is effectively not computable fornon-trivial X. we have no way of knowing the true relationship between this limitingpartition and the maximally separated solution of (9). For this reason, ISODATA is aneffective technique for identifying CWS clusters in X only if.one knows in advance that suchclusters are actually present; without such a priori knowledge, inferences drawn fromISODATA partitions can be very dangerous.

To avoid this difficulty, we shall base our approach to the detection of CWS clusters uponthe following mathematical device. We embed the class 9 (A) of hard A-partitions in the largerclass ff (A) of fuzzy A-partitions. Among the class of- all possible extensions of the LSAcriterion function J from ? (A) to 5\ (A), we then seek one such extension which meets thefollowing conditions:

a) As P (A) increases beyond 1 the membership functions of the fuzzy partition whichminimizes ./ over ff (A) should closely approximate the characteristic functions of thesolution of (9).

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 9: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 39

b) As (3 (A) decreases below 1 the extremizing partitions for J should becomeincreasingly fuzzy in the sense that the values of their membership functions departsignificantly from 0 or 1, on certain subsets of A'.

c) There should exist a simple and efficient algorithm for computing the extremizingfuzzy partitions for J.

In the following sections we consider two of the infinitely many possible fuzzyextensions of J. The first and most obvious extension leads directly back to a formalderivation of the ISODATA process, and consequently does not advance our purposes.However, the second extension apparently does satisfy conditions a through c above.

4. Fuzzy Embedding I

In carrying out the embedding described in the previous section, we note that by virtue ofcondition (2) 9 (k) is isomorphically represented by the class of all functions (/(•) ={ « , ( • ) , . . . , " * ( • ) ] :X->Rk such that

1) for all /, 1 < / < k, there is some x e X such that H,(.Y) =£ 0,2) for all /", I < / < A, and for all .v e X, »,(.v) = 0 or 1, (14)

*3) for all x- e X, J uAx) = 1.

i = i

The isomophic correspondence is obtained by simply identifying »,(•) with the character-istic function of the /th subset X,- of a given partition P e 9 (k). From now on, we willtherefore say that any function «(•): A"-*Rk satisfying (14) is a (hard) A-partition of X andwe denote the class of all such functions by the same symbol 9 (k) used previously with aset theoretic connotation, i.e.,

9{k) = Iu(-):X-.R* | u(-) satisfies (14)1. (15)

The larger class of all fuzzy A-partitions of A" is now defined as follows:

9f(k) = |u(.):X-.ty c Rk\

Uf = \u e Rk | 0 £ u, < 1, 1 £ ( i k; £ u; =' I < = l '< = l

The component functions »,(•) of a fuzzy partition ;/(•) are called the membershipfunctions of the partition (see [9] for the origins of fuzzy set theory).

Evidently, 9f(k) D 9 (A). In fact, since condition 1) of (14) is not carried out in (16),9j-(k) includes degenerate Ar-partitions containing one or more empty subsets Xj\ for tech-nical reasons, this is a convenience in the development of Theorems I and 2 to follow. If9 (A') denotes P (A) + all degenerate hard A'-partitions (condition 1 omitted), then fPy (k) issimply the convex hull of 9 (A) in the linear space of functions (/(•): X -* Rk. We nowobtain our first extension of the criterion function J from 9 (A) to 9y (A).

Let «,(•) denote the characteristic functions of A",- e P. Then in view of Eq. (12), we have*

J(P,u) = J1(u(-),f) = £ X «,•

1=1 xc X

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 10: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

40 J.C. DUNN

If u(') is now permitted to range over all of 9f(k) D$ (k), formula (17) defines a continuousand differentiable extension / , of J from 9 (k) X £k tofPy-(A) X £*. Corresponding to thisextension we have the following problem.

Relaxed k-Means Least Square Approximation Problem I

Find M'('.)e!Py(A.-)and v'eS.k such that

Jj(u'(-), v') = min min J j ( u ( - ) , f ) . (18)u ( ) c < P / ( f t ) v e £ k

Tlie optimal solutions of this problem are characterized in the following theorem.Theorem 1. If u'(-) e £Py (k) and v' e 2k solve (18), then the following conditions must

hold:

a. For each fixed .v e A', let

f = min ||x - u'||

/c = II i i <, k\ \\x - v';\\ > £| = complement of / in 1 i / i fe.

Then / e /c =» uj(x) = 0

and XujOt) = 1.i f /

In particular, if II .v — vy II has a proper minimum over I < /' < k, then / consists of a singleinteger i and we must have

JO i y i

(1 1=1

b. For all /, 1 < i < A:, there is some x e X such that //,' (.v) ¥= 0, i.e., the optimalA-partition //'(•) e fj- (A) always consists of A non-empty fuzzy sets.

2 «;<*>•*,. x f X

y. =

Proof.

a) If »'(•) and r'satisfy (18) then in particular,

J^u'* • ) , ( / )< Jjd/C •),!;') (19)

for all »(•) e 9f (A). In view of (16) and (17), condition (19) holds if and only if

(20)2

for all .v e A", or equivalently.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 11: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 41

X«,'<x>ll*- Wll2 = min j > , . | | x - i>-||2 (21)i=l w e U/ ' = 1

for all .v e X. With reference to (16) we have

1 = 1 ' ' \icl y ' ielc '

where the strict inequality holds if and only if u1,- > 0 for some / e Ic. It follows from (22)

that «(•) satisfies (20) if and only if»/(.v) = 0. f o r / e / c , and2 u'jlx) = 1.i el

b) Suppose that

u\(x) = 0 a l lx£ X . (23)

Since there are A vectors i1,' and since 1 ̂ A < N = number of elements in X, there is at leastone element of A", say x, such that

| | x - D,'|| > 0 a l l / / / . (24)

Put v" = (v'{,. . . , i'£), with

" ' • | f W ! • - ! • ( 2 5 )

We have r" e £* and, in view of (17), (23), and (25)

Thus(«'(•), >'") is also an optimal solution of (18) and consequently must satisfy condition aof this theorem, since IIx - vj I! has a proper minimum over I < / < A at / = /, because of(24) and (25), we must have uj (.v) = 1, which contradicts (23).

c) If (»'(•), >') satisfies (18), then, in particular,

Jjd/'t •),!>') < J^u't-Ku) all v e £* (26)

i.e., r ' must minimize the quadratic function

i = l xe X i = l « X

on the linear space £*•. Furthermore, since u\ (x) > 0, g is positive semidefinite.Consequently, r ' minimizes g if and only if g is stationary at r \ i.e. if and only if thedirectional derivative Dg(v\ w) vanishes for all u> e2k. By definition, we have

Dg(v',w) £ —g(v' + hw)dh

= X X u](x) — (x - v[ - hw^x - v\ - hwt)

k— 9 V V ' f - w - ' i ' \

i=l xc X

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 12: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

42 J.C. DUNN

k

1=1 JiX

Thus )•'satisfies (26) if and only if

< £ u'Ax)(x - vp\Wf) = 0 for all w e £* . (27)

But (27) holds for arbitrary H' e£k if and only if

V «;(x)(x - v';) = 0 1 < i < ft

i.e. r'satisfies (26) if and only if

\x " x / xfX

Finally, with reference to part /; of this theorem, we therefore obtain

Corollary 1. Suppose that (»'(•), v') is an optimal solution of the relaxed A-means LSAproblem (18) and also satisfies condition (a): for all .v e X, II x — vf II has a proper minimumover 1 < / < k. Then u'(') is a hard partition (i.e.,«'(-) e fftfc) C f((k), v] = centroid of thesubset X\ C X with characteristic function u\ (•). and (;/'(•), v') is an optimal solution ofthe original A'-mcans LSA problem (13) over? (A) X £*.

Corollary 2. Tlie following condition defines a non-empty class J , of operators T:9f(k)-*?f(k).

If «(•) denotes T(n(-)) = image of »(•) under T, then for some >' e 2k satisfying condition

1 - ' - k

»(•) must satisfy

SAx) = 0 llx - u,|| ^ min |U - o.ll

A fuzzy partition /('(•) is part of an optimal solution ((/'(•), r ') of (18) only if »'(•) is a fixedpoint of some T e CS\. i.e.,

«'(•) = T(u'( •)) for some T e J j . (28)

In addition, »'(•) must also satisfy

«;<•) t 0 l < / < f c . (29)

. L

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 13: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 43

Corollary 1 suggests that the characteristic functions of an optimal hard partition for theoriginal A-means LSA problem are likely to differ from the membership functions of anoptimal partition for the relaxed problem only on relatively "small" subsets of X, and inmany cases, the two solutions will coincide everywhere on X. Furthermore, condition (j3) ofCorollary 2 suggests that we may be able to approximate optimal solutions of the relaxedproblem by iterating some operator in 'St, i.e. by implementing the recursion

u<"!+1)(.) = T(u)(m)(-) 0 < m < ~ T e 3 \ (30)

for some T in J , and for some initial guess »°(-) in 9f (k). Indeed, we can now see that theISODATA process described in Sec. 3 is essentially equivalent to (30). In general, the class3\ contains infinitely many operators T. However, all of these operators coincide on thesubclass of fuzzy partitions »(•) satisfying (29) and condition (a) of Corollary 1. In thissubclass, T(u(')) is always a hard partition uniquely prescribed by condition (/3) of Corollary2. Outside this subclass, Theorem 1 does not suggest a unique value for 7X»(*)),S and thetie-breaking rules employed in working versions of ISODATA (cf. Sect. 3) are merely ad hocconventions for keeping T(u(-)) in the class ? (A) of hard partitions with k non-emptysubsets, and for resolving ambiguities in condition ((3). However, regardless of whichtie-breaking rule is employed, the following result shows that ISODATA always has thedescent property relative to . / , .

Theorem 2. For arbitrary fixed T e St let {//'"(*)} denote the sequence of partitions in!?/ (k) generated by (30), and let {)•('")} denote the corresponding sequence in £*•associated with T and »'"(•) via condition (|3). Then for all in > 0,

J1(u<m+1>(-),i>(m + 1)) < J1(u(m>(-),y(m)) .

Proof. As an immediate consequence of the reasoning employed in parts a and c in theproof of Theorem I, we have

J^u*"1*1^-),^) = min J1(u(-),i'm)

and J1(u(m + 1)(.),t;<'" + 1))= min J1(u

(m + 1)( •), u).v e Zk

T h u s Jl(um(-),vm)~>. J j ( u ( m + 1 ) ( - ) , D m ) > J 1 ( u ( m + 1 ) ( - ) , t > ( ' " t l ) ) . Q E D

In conclusion, we observe that the conditions of Theorem 1 are necessary, but in generalnot sufficient for (global) optimality. Thus, the operators in 3", may have many non-optimalfixed points and some of these may have substantial zones of attraction for the iterates of(30). This is indeed the case. As noted elsewhere, the ISODATA versions of (30) apparentlyalways converge to some hard partition. However, this limiting partition may vary with thestarting guess H°(-) , and need not be globally optimal. Behavior of this sort is apparentlyexacerbated by the absence of CWS clusters in X.

5. Fuzzy Embedding II

We now consider a second extension of the criterion function J which apparently doessatisfy the condition set forth at the end of Sect. 3.

s In analogy with a situation which arises for "singular" extremals in optimal control theory.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 14: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

44 J.C. DUNN

Let //,-(• )denote the characteristic functions of Xt e P; then in view of Eq. (12) we have

J(P,v) = JJLu(.),v) k X X «,-2<*)ll*- vi\i -1 i t X

= X 2 ui2{x) {x - vi I x - "<• > •i - 1 x eX

(31)

If u(') is once again permitted to range over all of 9{(k) D 9(k), formula (31) defines anothercontinuous and differentiable extension J2 of J from 9(k) x £*to 9f(k) x £*. Correspondingto this extension we have the following problem.

Relaxed k-Means Least Square Approximation Problem II

Find u'(-) e 9f(k) and V e £k such that

J2(u'( •), vl = min min J2(«(-), i>) . (32)u(-) e <?/(k) v € £*

The optimal solutions of this problem are characterized in the following theorem.Theorem 3. If u'(-) e 9f(k) and u' e 2k solve (32), then the following conditions must

hold:

a. For each fixed .v e X, let

/ = I U I S /J|I>; = x\

la = {I &. i < k\v'; ft x\ = complement of / in 1 < i < k.

case 1) Suppose / = 0 = empty set. Then

v i |x - »;II2

uUx) = 1 < i < k

case 2) Suppose / =£ 0. Then

i £ r => «;(.x) = o

and y ,, , _ |

In particular, if / consists of a single integer i, then

i 1 i = iu'Ax) =

0 »> i

6. For all /, I < / < A:, there is some x e X such that ;/,' (.v) ^ 0, i.e., the optimalA-partition » ' (•)«?/ (k) always consists of A- non-empty fuzzy sets.

y (u-Ax))2x... ***

X (u\(x c X

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 15: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 45

Proof.

a) If ((('(•), >'') satisfies (32), then in particular,

J2(u'(-),i>') < J2iu(-),v') (33)

for all u(-)e?f (k). In view of (16) and (17), ;/'(•) satisfies (33) if and only if:

1=1 ' i = i '

for all x e X, or equivalently,

k k

X(uJ(x))2| |x - i>;||2 = min Zwi2Wx - v'i\\2 < 3 4 )

1 = 1 W € Vf i - l

for all x e X. k ,case 1 (/ = 0): Let Uf denote the set)u e Rk X "; = l [ 3 ^> obtained by relaxing the

inequality constraints in the definition of Uf(]iq. (16)). Consider the corresponding relaxedversion of (34), namely

V 2 11 'ii2 /-)p\

min_ 2J wi \\X ~ vi\\ • (35)w eVj i = i

This quadratic minimization problem always has a solution. Furthermore, since noinequality constraints are present, the classical Lagrange multiplier rule can be applied hereand yields the following results: w' e Uf is an optimal solution of (35) only if for some real

k k

X, the augmented function X w2\\x - "ill2 + ^ X wi IS stationary at w'. This condition,

together with the constraint X «',- = 1 and the fact that / = £, insure that the solution of

(35) is unique and is given by

l / l l * - »JII2

/ • = 1

Evidently 0 < w- < 1 for 1 < / < k. Consequently, the optimal solution of (35) lies in Uj- CUf after all, and is therefore also the unique solution of (34). It follows that »'(•) satisfies(33) if and only if H'(.V) = w' for all .v e X.

Case 2 (/ =£ 0): If / =£ <t> then II A- - v\ IP = 0 for some / and it readily follows that

min X " V M l * - ^ l l 2 = 0 -W £Uf 1 = 1

Suppose uj (.v) ^ 0 for some / e Ic. Then {uj(.Y)? II x - ry IP > 0 and it follows that

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 16: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

46 J.C.DUNN

k *

£ (u'(x))2||x - i>;||2 > 0 = min £ W?WX ~ v'i\\2

which contradicts (34). Thus »'(•) satisfies (33) only if:

u;<x) = 0 i e Jc

(36)and . £ uj(x) = 1 .

• el

Conversely if ti'(x) satisfies (36) then

* k

£(uj(x) | 2 | |x - y|| |2 = 0 = min £ «>,2l|x - v't\\2 .

i - 1 iv e Uf i = l

b) Suppose thatu\(x) = 0 allx e X . (37)

Since there are A' vectors, v\, and since 1 < A- < JV = number of elements in X, there is atleast one element of X, say .v, such that

v\jix if I. (38)

Put r" = { v'{ , . . . , \'k } with

v- = !"' ' ' ^ ' . . (39)/ x i = I

We have v" e £ * , and in view of (31), (37), and (39)

Thus (»'(•), r") is also an optimal solution of (32) and consequently must satisfy conditiona of Theorem 3. With reference to (38) and (39), we have x = v'{ <=»/ = /, hence »/' (.v) mustequal 1, which contradicts (37).

c) Proof is obtained by replacing J{ with J2 and uj (.v) with (tij (.v))2 in part c of theproof of Theorem I. QKD

Corollary I. If (»'(•), r') is an optimal solution of (32), then r/ is a weighted mean of A'and consequently falls in'Co(X) for 1 < / < k. Furthermore, for at least n — k elements.v eX,

0 < u\(x) < 1 1 < i < k . (40)

Corollary 2. The following condition defines a non-empty class 'S2 of operators T:'?f(k) -> lSf(k). Let G(-) = TOKO) = image of //(•) under T. For some v e Lk satisfying

(41)( Y, (ui(x))2)vi =\x € X I

2j («,-(x)) x I < i < k,x eX

Condition (7): (?(•) must satisfy the following constraint: for each fixed .v e X, let

/ = I I < i < L k \ v t = x \

1C = 11 < i <. k\V; ? X\.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 17: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 47

I f / = 0 then

uAx) = -. .

thenUj. (x) = 0 i e lc

and I B,-(x) = 1 .id

A fuzzy partition »'(•) is part of an optimal solution (»'(•), >'') of (32) only if »'(•) is a fixedpoint of some operator T e 3~2 - i-e->

u'( •) = T(u'( •)) for some T e 3"2 . (42)

In addition, »'(•) must also satisfy

u'Ax) ?0 1 < i < k. (43)

As in the previous sections, Corollary 2 of Theorem 3 suggests an iterative approach tothe relaxed problem (32); specifically, it suggests that for some T e 'J2 we implement

u(m + l)(.) = T(i/m>(.)) 0 < m < ~ T e'J2 . (44)

Once again, the class 'S2 contains infinitely many operators. In general however, all of theseoperators coincide on the subclass of fuzzy partitions »(•) satisfying (43) and condition(5): For each .v e X, I contains at most one element; i.e., 7*(M(«)) is uniquely prescribed byconditions (43). (7), and (5). Furthermore, in contrast to the development of the previoussection, we also have the following useful result.6

Theorem 4. Let 5^ (̂ ") denote the subclass of non-degenerate fuzzy A-partitions satisfying

uA-) t 0 1 < 1 < k. (45)

Then for all T e 'J2. T maps ?f (A.) i n t o ^ (A).Proof. Since there are A: vectors »',- and A' elements x e X (k <S A'), either (i) there is at least

one .v e X such that / = 0, or (ii) A: = A' and for each x e X, I contains precisely one element.In either case, condition (7) insures that ;?(•) = T(u(')) e f / (A). QED

In view of this result, the first step of (44) always places it' )(•) injPy (A.) where condition(41) uniquely prescribes the vectors r,-. Thus, the only significant ambiguity in (7) occurswhen (<(•) fails to satisfy (5); in such cases (which appear to be very rare) a supplementarytic-breaking rule is required to make (44) a completely well-posed algorithm. For example if/ contains c elements, we might put »,(.v) = 1 /c for / e / ; or. we might put H,-(.V) = I, where iis the smallest integer in / . The first order analysis of Theorem 3 makes no distinctionbetween such rules. However, regardless of which rule is selected, the following result showsthat the algorithm (44) always has the descent property relative to J2.

Theorem 5. For arbitrary fixed Te'J2, let {(/('")(•)} denote the sequence of partitionsin ?y (A.) generated by (44) and let { r( '")} denote the corresponding sequence in S.k

associated with T and ;<"'(•) via condition ((S). Then for all m > 0,

'This result is generally false fur operators 7"c 3 , .The more restrictive result, 7":<Py(*)-»<jy(£) also fails on 3 , .

L

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 18: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

48 J.C. DUNN

Proof. As an immediate consequence of the reasoning employed in parts a and c in theproof of Theorem 3, we have

do\U \' / , v ) — min JnitiK • ) , f /

a n d J 2 ( « < m + ' ' ( • ) , i>(m>) = min J 2 ( u ( m + 1 ' ( • ) , v)v c £*

t h u s j ( / / m ^ ( . ) i /m^) > / (i/m + ^( •) i / m ' ) > J ( z / m + ^ ( « ) i^m + ^ ) Q E D^ ' 2 2

Several important questions remain. First, we would like to know whether the process(44) actually does converge to a fixed point of T, i.e., to a partition ;/(•) satisfying thenecessary conditions of Theorem 3. In view of Corollary 1 we can see that such a limitingpartition is typically fuzzy, however, when 0(A) > 1 we would like to know whether themembership functions of the limiting partition closely approximate the characteristicfunctions of the subsets X\ in the maximally separated partition P' solving (9). Finally,when P(k) decreases below 1 we would like to know whether the limiting partition(s)produced by (44) is always genuinely fuzzy in the sense that H,-(.Y) differs from 0 or I over asubstantial subset of X. At present, these questions have not been decided rigorously.However, the numerical experiments presented in the next section suggest that the process(44) does behave the way we would like it to behave.

6. Numerical Results

The following version of (44) was programmed for the IBM 360 digital computer:

Step I. Choose a partition »(•) in ?y (A') = class of non-degenerate fuzzy partitions.Step 2. Compute the weighted mean vectors

v, = — 1 < i < k . (46)

I 2 < >Step 3. Construct a new partition »(•) ej?/ (£) according to the following rule: for each

.v e A", Let / = 11 < / < A- I r,- = .\-| . If / is empty put

- vAx - u,->1 < i < k .

If/ is not empty, let i = smallest integer in / and put

i = i

f 0 i ? i '

Step 4. Compute new weighted mean vectors »",- corresponding to ;/(•) via (46) andcompute the corresponding maximum norm defect

S = max max \vt • - C(- |

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 19: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 49

• 50

1 6

• 75

Fig. 1. Three Clusters with ccntroids on the vertices of an isosceles triangle with alti-tude ft. Clusters are C\VS for h > 8A, where A = bilateral spacing within each

lattice.

where d = dimension of £ (A") = linear hull of X and vt j = /th component of the vector r,-relative to some specified basis for£ (X).

Step 5. If 5 is less than some specified threshold e, stop. Otherwise put v = v and go toStep 3.

We tried this algorithm on a variety of planar point sets X C R2 using the ordinaryEuclidean metric as the measure of distance, i.e..

= (X 1,1 , ) 2 (.X 1.2 ,rwhere .v,;- = /th entry of the 2-tuple .v, e R2. In all cases considered, we were able to achieve6 < e* = 10~5 by performing a sufficiently large number N* of iterations. As might beexpected, the values of A'* required varied with the structure of X and, to a lesser extent,with the choice of initial partition; these values are quoted for each of the cases reportedbelow, along with descriptions of the limiting fuzzy partition. For purposes of comparison,hard ISODATA partitions were also generated for each example, via the algorithm describedin Sect. 3.

X2

X ' . . . . . . . . .

1 . . » . . . . . . . . .

1 • • . . » -

- 7

11 • •

1019 82

FIG. 2. One large and one small cluster. Clusters areCWS for s > 8 -JIA, where A = bilateral spacing with-

in each lattice.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 20: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

50 J.C. DUNN

75

. . 73I t i . . . .

•30 33

37• • • • 31 32

It . • • •

1 * 3 2 Uo

1*5 71

68 70

69

FIG. 3. Two CWS Clusters embedded in a background of strays forming a bridge and a halo.

The point sets considered are shown in Figs. 1, 2, and 3. In Fig. I, A" consists of threeidentical square lattices A",, X2. X3 each containing 25 points with uniform bilateralinternal spacing A = 1, with centroids at the vertices of an isosceles triangle with altitude/;and base 2h, and with elements .v,-, 1 < / < 75, labeled as indicated. For /; > 5, thecomponent lattices X, are "natural" clusters relative to the connectivity criterion discussedin Sect. 2. Furthermore we have a(3) = 0(3) = /i/4 — 1 for li > 5. In particular{ Xi,X2,X2 } consists of CWS clusters for /; > 8. The initial partitions employed at the startof all calculations were always hard partitions of the following kind: P = {«,(•), u2(-),th(-)} . with

1 1 < i < p

0 p < <• <; 75

1 p < i < qu2(x,-) = -i (47)

(0 1 < i < p or q < i < 75

1 q < i < 75

0 1 < i < q

where 1 < p < q < 75.These partitions are completely characterized by the integer pair (p,q). Runs were made

with the fuzzy algorithm and with ISODATA for a variety of different initial partitions (47)and for different values of// (i.e. 0(3)). Tables 1-3 give the first two decimal places of thelimiting fuzzy partition produced by the fuzzy algorithm for// = 6, 8, and 12 (0(3) = 'A, 1,2)respectively, starting from the initial partition (47) with (p,q) = (30.55); Table 4 gives thecorresponding absolute deviations of the components of the limiting weighted mean vectorr,- in (46) from the corresponding components of the centroid vector for A",-. The numbers ofiterations required to reach the 8-place versions of the limiting partitions displayed in Tables1-3 were, respectively, A'* = 14, 10, and 8. However, the partitions obtained after only A' =5, 4, and 3 iterations, respectively, differ from the corresponding limiting partitions only by

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 21: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 51

TABLE I. Limiting Fuzzy Partition for the data set in Fig. 1, with h = 6 and k = 3a

1

12345678910111213141516171819202122232425

<<•(*,>

.88

.93

.95

.94

.91

.90

.96

.98

.97

.93

.90

.981.00.98.94.86.94.97.96.91.75.85.89.89.85

«*,(x,)

.09

.05

.04

.04

.06

.07

.03

.01

.02

.04

.07

.02

.00

.01

.04

.11

.04

.02

.03

.05

.19

.11

.07

.07

.09

26272829303132333435363738394041424344454647484950

.08

.07

.07

.10

.19

.05

.03

.02

.04

.10

.04

.01

.00

.02

.07

.04

.02

.01

.03

.07

.05

.04

.04

.05

.08

.86

.89

.90

.84

.74

.91

.96

.97

.94

.83

.93

.981.00.97.86.91.96.97.94.83.86.89.90.85.74

i

51525354555657585960616263646566676869

1 707172737475

.06

.04

.04

.04

.07

.04

.02

.01

.02

.04

.03

.01

.00

.01

.03

.03

.01

.01

.01

.03

.04

.02

.02 •

.02

.04

.19

.11

.07

.07

.09

.11

.04

.02

.03

.05

.07

.02

.00

.01

.04

.07

.03

.01

.02

.04

.09

.05

.04

.04

.06

aIn Tables 1-3, u,( •) is not shown, but may be obtained from the relation

£ !/.(*)= 1.

at most one or two units in the second place. Thus, e* = 10~5 is a conservatively smallthreshold for the present example.

Runs were also made starting from initial partitions (47) with (p.ci) = (51,61) and(60,70): for each fixed value of /;, the iterates generated by the fuzzy algorithm led alwaysto the same limiting partition (up to 6 places) obtained for (p.q) = (30,55), with little or nochange in A'*. For instance, with /; = 6, Ar* = 15 iterations were required to reach thepartition displayed in Table 1 from (p,q) = (51,61). As anticipated, the limiting fuzzypartitions become increasingly fuzzy, and A'* becomes increasingly large, as (3(3) decreases.

The behavior exhibited by ISODATA for this example is interesting. As one mightexpect, ISODATA does very well from the initial partition (47) with (p,q) = (30,55); for allthree values of //. just one iteration of ISODATA maps (p.q) = (30,55) into (p,q) = (25,50).However, in marked contrast to the fuzzy algorithm, ISODATA produces some rathersurprising and dramatic splittings of the natural clusters A',- when different starting partitionsare employed. For example, a simple calculation shows that the partitions (47) with (p.q) =(50,60) and (50.65) are fixed points of the ISODATA algorithm. Furthermore, it turns outthat many nearby partitions are either also fixed points or else get mapped quickly into suchfixed points. Thus, for /; = 8. (p,q) = (60,70) maps into (50,65) after four iterations, and(p.q) = (51,61) maps into (50,61) after 3 iterations. Therefore, for this example at least, itappears that the fuzzy algorithm is far superior to ISODATA in the sense that its

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 22: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

52 J.C. DUNN

TABLE 2.

(' 1/

123456789

10111213141516171819202122232425

Limiting

.(*,•>

.92

.95

.97

.96

.94

.94

.98

.99

.98

.96

.95

.991.00.99.96.92.97.99.98.95.87.92.94.94.91

fuzzy par

U, (Xj)

.05

.03

.02

.03

.04

.04

.02

.01

.01

.03

.04

.01

.00

.01

.02

.06

.02

.01

.02

.03

.10

.06

.04

.04

.05

'.ition

i

26272829303132333435363738394041424344454647484950

for the data

U, (Xj)

.05

.04

.04

.05

.10

.03

.02

.01

.02

.05

.02

.01

.00

.01

.04

.03

.01

.01

.02

.04

.04

.03

.02

.03

.05

set-in Fig.

«,(*,-)

.91

.93

.94

.92

.85

.94

.97

.98

.96

.91

.95

.991.00.98.92.94.97.98.96.91.91.93.94.92.85

1, with

i

51525354555657585960616263646566676869707172737475

h = 8, and k

u, (xf) u

.04

.02

.02

.02

.04

.02

.01

.00

.01

.02

.02

.00

.00

.00

.02

.02

.01

.00

.01

.02

.02

.02

.01

.02

.02

= 3

,(*;)

.10

.06

.04

.04

.05

.06

.02

.01

.02

.03

.04

.01

.00

.01

.02

.04

.02

.01

.01

.03

.05

.03.02.03.04

TABLE

I

123456789

10111213141516171819202122232425

3. Limiting

«.(*,->

.96

.98

.98

.98

.97

.97

.991.00.99.98.98.99

1.001.00.98.97.99.99.9998.95.97.98.97.96

fuzzy

«,(*,)

.03

.02

.01

.01

.02

.02

.01

.00

.01

.01

.02

.00

.00

.00

.01

.02

.01

.00

.01

.02

.04

.02

.02

.02

.03

partition

i

26272829303132333435363738394041424344454647484950

for the data

",(*,>

.03

.02

.02

.02

.04

.02

.01

.00

.01

.02

.01

.00

.00

.00

.02

.01

.01

.00

.01

.02

.02

.01

.01

.02

.03

set in Fig.

" : (Xj)

.95

.97

.97

.96

.94

.97

.99

.99

.98

.96

.98

.991.00.99.97.97.99.99.98.96.95.97.97.96.94

1, with It

i u

51525354555657585960616263646566676869707172737475

= 12 and k

.02

.01

.01

.01

.02

.01

.00

.00

.00

.01

.01

.00

.00

.00

.01

.01

.00

.00

.00

.01

.01

.01

.01

.01

.01

= 3

,<*,)

.04

.02

.02

.02

.03

.02

.01

.00

.01

.02

.02

.00

.00

.00

.01

.02

.01

.00

.01

.01

.03

.02

.01

.01

.02

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 23: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 53

TABLE

h

68

12

4. Absolute

'"l.l " * ! , . '

.032712

.014845

.004565

deviations of the limiting fuzzy mean vector components fromcomponents for the data set in Fig. 1

'•'i.i " * • ,

.022391

.010167

.003089

1 I " , , , - * * . . '

.000111

.000110

.000020

.035244

.018010

.005900

1 " , , . - * 3 , ,

.032750

.014840

.004570

centroid vector

1 Iv, , - * , , !

.022422

.010191

.003107

convergence to the "correct" limiting partition is not disrupted by a multitude of spuriousattracting fixed points corresponding to uninteresting local minimum of/. What seems to behappening here is that hard partitions which provide local minima of the extended payoff 7,on the continuum 9f(3) are no longer local minima for the extended payoff J2; as aconsequence, these partitions are not stationary points of J2, and therefore are not fixedpoints of the fuzzy algorithm. Furthermore, if spurious fuzzy local minima of J2 do existfor this data set, their zones of influence were too small to defied the iterates of the fuzzyalgorithm in any of the experiments peformed.

The data set of X in Fig. 2 consists of two square lattices A1, and X2 containing 9 and 81elements respectively, with uniform bilateral internal spacing A = 1, and labels as indicated.X\ and X2 are natural clusters relative to the connectivity criterion when the distance sbetween Xt and X2 exceeds 1. Furthermore, we have diam X2 = 8\/2~, and<5(2) =/3(2) =sl&\/2; hence, {x , , X2 } consists of CWS clusters when s > 8\/2 = 11.3. The startingpartitions used for this example were always of the form

j 1 1 < I < p

\o p < i < 90

1 p < / < 90

0 1 < i < p

(48)

TABLE 5. Limiting

i

I23456789

101112131415

u,(x)

.79

.79

.79

.81

.82

.81

.84

.85

.84

.74

.81

.89

.96

.99

.96

i

161718192021222324252627282930

fuzzy partition

11, (X)

.89

.81. .74

.66

.72

.79

.86

.89

.86

.79

.72

.66

.55

.57

.60

i

313233343536373839404142434445

for the data

11, (X)

.63

.65

.63

.60

.57

.55

.43

.41

.38

.34

.32

.34

.38

.41

.43

/

464748495051525354555657585960

set in Fig

u, (x)

.33

.27

.20

.13

.09

.13

.20

.27

.33

.26

.19

.11

.04

.01

.04

• 2,

I

616263646566676869707172737475

with s =

II, (X)

.11

.19

.26

.22

.15

.08

.03

.01

.03

.08

.15

.22

.20

.15

.09

5 and A:

/

767778798081828384858687888990

= 2a

II, (X)

.06

.04

.04

.09

.15

.20

.20

.16

.12

.09

.08

.09

.12

.16

.20

aIn Tables 5-8, the value of u2 (x) may be obtained from the relation

£ //,(*):

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 24: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

f 54 J.C. DUNN

TABLE 6. Limiting fuzzy partition for the data set in Fig. 2, with s = 7 and k = 2

123456789101112131415

.95

.95

.95

.97

.97

.97

.98

.99

.98

.41

.39

.38

.36

.36

.36

i

161718192021222324252627282930

u.Oc/)

.38

.39

.41

.30

.27

.23

.21

.20

.21

.23

.27

.30

.22

.17

.13

I

313233343536373839404142434445

.10

.09

.10

.13

.17

.22

.16

.11

.07

.04

.03

.04

.07

.11

.16

i

464748495051525354555657585960

.12

.08

.04

.01

.00

.01

.04

.08

.12

.11

.07

.03

.01

.00

.01

/

616263646566676869707172737475

»i (*/)

.03

.07

.11

.10

.07

.04

.02

.02

.02

.04

.07

.10

.11

.08

.06

'•

767778798081828384858687888990

«, U,)

.04

.04

.04

.06

.08

.11

.12

.10

.08

.07

.06

.07

.08

.10 •

.12

with p = 54. Runs were made with the fuzzy algorithm and ISODATA for values of sranging from 5 to 16 (0(2) ranging from « .44 to'1.41). Tables 5-8 give the first twodecimal places of the limiting fuzzy partitions produced by the fuzzy algorithm for.v = 5, 7,9 and 16 (0(2) = .44, .62, .79 and 1.4); the numbers of iterations required to obtain thesepartitions were, respectively. A'* = 28, 73, 24, and 12, although in each case substantiallyfewer iterations would suffice to reach partitions essentially like the limiting partitions.Once again, the limiting fuzzy partitions become increasingly fuzzy, and A'* tends toincrease in general (.? = 5 is the obvious exception) as (1(2) decreases.

Again, the behavior of ISODATA is interesting. When s > 3, a simple calculation revealsthat the "natural" partition (48) corresponding to /) = 9 is indeed a fixed point ofISODATA, however the partitions corresponding to p = 18, 27, 36, and 45 are also fixedpoints when 8 > s > 4, 9 > s > 3, 8 > s > 1, and 5 > s > 1 respectively. Furthermorecalculations for s = 9, 8. 7, 6, and 5 show that ISODATA quickly maps the initial partitioncorresponding to p = 54 into/; = 27, 36, 36, 36, and 45 respectively. For5 = 7, 8, and 9, the

TABLE 7. Limiting fuzzy partition for the data set in Fig. 2, with s = 9 and Jt = 2

i

123456789101112131415

.98

.98

.98

.991.00.99.991.00.99.26.23.20.19.18.19

161718192021222324252627282930

UlU,)

.20

.23

.26

.18

.15

.12

.10

.09

.10

.12

.15

.18

.13

.10

.07

313233343536373839404142434445

.05

.04

.05

.07

.10

.13

.10

.06

.04

.02

.01

.02

.04

.06

.10

'

464748495051525354555657585960

.08

.05

.02

.01

.00

.01

.02

.05

.08

.07

.04

.02

.01

.00

.01

i

616263646566676869707172737475

.02

.04

.07

.07

.05

.03

.02

.01

.02

.03

.05

.07

.08

.06

.04

I

767778798081828384858687888990

.03

.03

.03

.04

.06

.08

.09

.07

.06

.05

.05

.05

.06

.07

.09

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 25: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 55

TABLE

1 U

123456789

101112131415

8. Limiting

• <*,)

.99

.00

.99

.00

.00

.00

.00

.00

.00

.10

.08

.07

.06

.06

.06

/

161718192021222324252627282930

fuzzy partition

«,(*,)

.07

.08

.10

.07

.05

.04

.03

.03

.03

.04

.05

.07

.05

.04

.02

I

313233343536373839404142434445

for the

tl, (*,)

.02

.01

.02

.02

.04

.05

.04

.03

.01

.01

.00

.01

.01

.03

.04

data

I

464748495051525354555657585960

set in Fig. 2,

",(*,)

.04

.02

.01

.00

.00

.00

.01

.02

.04

.03

.02

.01

.00

.00

.00

i

616263646566676869707172737475

with s =

»,(*,)

.01

.02

.03

.04

.02

.02

.01

.01

.01

.02

.02

.04

.04

.03

.02

16 and k

' " i

767778798081828384858687888990

= 2

(x,)

.02

.02

.02

.02

.03

.04

.05

.04

.03

.03

.03

.03

.03

.04

.05

TABLE 9. Limiting fuzzy partition for the dataset in Fig. 3, with k = 2a

123456789

10111213141516171819202122232425

»,w.96.96.96.96.98.98.98.96.95.97.99.99.99.97.95.95.98.99

1.00.99.98.95.97.99.99

I

26272829303132333435363738394041424344454647484950

.99

.97

.96.97.96.91.82.70.55.39.25.14.07.06.07.05.03.02.03.05.06.03.01.00.01

51525354555657585960616263646566676869707172737475

'.(xj

.03

.06

.05

.02

.01

.00

.01

.02

.05

.03

.01

.01

.01

.03

.03

.02

.03

.16

.17

.08

.07

.15

.10

.15

.23

a«j(0 may be obtained from the relation

S »,(*) = 1.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 26: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

56 J.C. DUNN

fuzzy algorithm generates substantially better approximations to the "natural" partition(48) with /; = 9; for 5 = 1 6 , ISODATA converges to the natural partition, and the fuzzyalgorithm also generates a very good approximation to this partition.

Finally, the data set X in Fig. 3 contains two octagonal lattices A", and X2, eachcomprised of 30 elements with uniform bilateral internal spacing A = 1. The remainingelements of X form a six element bridge joining Xt and X2 (points 31-36) and a halo ofeight elements about X2 (pointy68-75).7 This data set differs from the previous examplesin a significant way: Although 0(2) « 1 for A', we have X = Y V Z where jf(2) > I for Y •and where the "average density" of points in Z is much less than in Y (e.g., take Y = unionof AT, and X2, and Z as the remaining bridge and halo elements.) In this sense X differsnegligibly from a set consisting of two CWS clusters. Although we have not attempted aprecise quantitative characterization of this kind of structure, it is clearly important inapplications, especially where X is a sample drawn from a mixture of unimodal probabilitydistributions. Zahn [4] and Wishart [8] describe pruning techniques for dealing withprecisely this problem of extracting dense nuclear clusters from a noisy background. Thegeneral idea behind their methods is to first isolate and remove the "negligible" subset of ATand then submit the remainder set to an algorithm capable to identifying clusters relative tothe metric of interest. In the present investigation, we were curious to see how the fuzzyalgorithm would behave on the unaltered set X of Fig. 3, for A: = 2. Accordingly,calculations were made for initial partitions of the form

1 1 < i < p

° P<i<~lb (49)1 p < / < 750 1 < i < p

In all cases considered, the fuzzy algorithm converged rapidly to the same limiting partition,which is given to 2 decimal places in Table 9 (the number of iterations required being A'* =10). Since /3(2) « 1 for X, we expect fuzziness in this partition. However, it can be seenthat divided membership is most pronounced on the bridge and halo elements. For .v in thelattice X{, u,(-) is essentially equal to the characteristic function of AT,; similarly, for.v inX2, u2(-) closely approximates the characteristic function of X2. Thus, the limitingpartition generated by the fuzzy algorithm provides insight into the structure of X. Incontrast. ISODATA converges rapidly 0V* « 3) to the limiting hard partition (49) with/) =34 for all initial partitions considered; while this limiting partition does separate the nuclearclusters Xx and X2, it gives no clue to the structure of X.

References

[ 11 P. II. A. Sneath, "The application of computers to taxonomy,"/. Gen. Microbiology, vol. 17, p. 201-226, 1957.| 2 | R.O. Duda and P. K. Hart, Pattern Classification and Scene Analysis. New York: Academic Press, 1972.| 3 | G. Nagy, "State or the art in pattern recognition." lEEEProc., vol.56, no. 5, pp. 836-862, 1968.[4] C. T. Zahn, "Graph theoretical methods for detecting and describing gestalt clusters," IEEE Trans, on Comp., vol.

C-20, no. 1, pp. 68-86, 1971.[ 5 | R. G. Casey and G. Nagy, "An autonomous reading machine,"IEEE Trans, on Comp., vol. C-17, no. 5, pp. 492-503,

1968.[61 R. G. Casey and G. Nagy, "Recognition or printed Chinese characters," IEEE Trans, on Comp.. vol. C-15, pp.

91-101,1966.

7 Figure 3 is drawn to scale.

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010

Page 27: Cybernetics and Systems A Fuzzy Relative of the ISODATA … · 2015-10-27 · Compact Well-Separated Clusters J. C. Dunn To cite this Article Dunn, J. C.(1973) 'A Fuzzy Relative of

A FUZZY RELATIVE OF THE ISODATA PROCESS 57

| 7 | S. C. Johnson, "Hierarchical clustering schemes."Psychometrika, vol. 32, pp. 241-254, 1967.[8J D. Wishart, "Mode analysis: A generalization of nearest neighbor which reduces chaining effects," in Numerical

Taxonomy, A. J. Cole, Ed. New York: Academic Press, 1969.[9] L. Zadeh, "Fuzzy sets," Inform, and Control, vol. 8, no. 3, pp. 338-353, 1965.

[10] E. H. Ruspini, "A new approach to clustering," Inform, and Control vol. 15, pp. 22-32, 1969.[11 | E. 11. Ruspini, "Numerical methods for fuzzy clustering,"//)/ Science, vol. 2, p. 319, 1970.|12 | I. Gitman and M. D. Levine, "An algorithm for detecting unimodal fuzzy sets and its application as a clustering

technique," IEEE Trans, on Comp.. vol. C-19, no. 7, pp. 583-593, 1970.[13) G. II. Ball and D. J. Hall, "ISODATA, an iterative method of multivariate analysis and pattern classification,"

Behavioral Science, vol. 12, pp. 153-155, 1967.114] J. Dieudonne, Foundations of Modern Analysis. New York: Academic Press, 1960.[ 15 j C. W. Curtis, linear Algebra. Boston: Allyn &. Bacon, 1963.[ 16] II. P. Friedman and J. Rubin, "On some invariant criteria for grouping data,"/iMitv. Slat. Assoc. J., vol. 62, no. 320,

pp. 1159-1178, 1967.[17] K. I'ukunaga and \V. L. G. Koontz, "A criterion and an algorithm for groupinu data." IEEE Trans, on Comp., vol.

C-19, no. 10, pp. 917-923, 1970.

Received September 17, 1973

Downloaded By: [German National Licence 2007] At: 09:12 16 July 2010