
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-19, NO. 7, JULY 1970

An Algorithm for Detecting Unimodal

Fuzzy Sets and Its Application as a

Clustering Technique

ISRAEL GITMAN AND MARTIN D. LEVINE, MEMBER, IEEE

Abstract-An algorithm is presented which partitions a given sample from a multimodal fuzzy set into unimodal fuzzy sets. It is proven that if certain assumptions are satisfied, then the algorithm will derive the optimal partition in the sense of maximum separation.

The algorithm is applied to the problem of clustering data, defined in multidimensional space, into homogeneous groups. An artificially generated data set is used for experimental purposes and the results and errors are discussed in detail. Methods for extending the algorithm to the clustering of very large sets of points are also described.

The advantages of the method (as a clustering technique) are that it does not require large blocks of high-speed memory, the amount of computing time is relatively small, and the shape of the distribution of points in a group can be quite general.

Index Terms-Clustering algorithms, multimodal data sets, pattern recognition, symmetric fuzzy sets, unimodal fuzzy sets.

I. INTRODUCTION

THE PRIMARY objective of clustering techniques is to partition a given data set into so-called homogeneous clusters (groups, categories). The term homogeneous is used in the sense that all points in the same group are similar (according to some measure) to each other and are not similar to points in other groups. The clusters generated by the partition are used to exhibit the data set and to investigate the existence of families, as is done in numerical taxonomy, or alternatively, as categories for classifying future data points, as in pattern recognition. The role of cluster analysis in pattern recognition is discussed in detail in two excellent survey papers [1], [16].

The basic practical problems that clustering techniques must address themselves to involve the following:

1) the availability of fast computer memory,
2) computational time,
3) the generality of the distributions of the detected categories.

Clustering algorithms that satisfactorily overcome all of these problems are not yet available. In general, techniques that can handle a relatively large data set (say 1000 points) are only capable of detecting very simple distributions of points [Fig. 1(a)]; on the other hand, techniques that perform an extensive search in the feature space (the vector space in which the points are represented) are only able to handle a small data set.

Some authors [6], [9], [19], [20] have formulated the clustering problem in terms of a minimization of a functional based on a distance measure applied to an underlying model for the data. The clustering methods used to derive this optimal partition perform an extensive search and are therefore only applicable to small data sets (less than 200 points). In addition, there is no guarantee that the convergence is to the true minimum. Other methods [3], [12], [15] use the so-called pairwise similarity matrix or sample covariance matrix [15]. These are memory-limited since, for example, one-half million memory locations are required just to store the matrix elements when clustering a data set of 1000 points. Also, the methods in [12], [15] will generally not give satisfactory results in detecting categories for an input space of the type shown in Fig. 1(b).

Manuscript received September 22, 1969; revised December 15, 1969. The research reported here was sponsored by the National Research Council of Canada under Grant A4156.

I. Gitman was with the Department of Electrical Engineering, McGill University, Montreal, Canada. He is now with the Research and Development Laboratories, Northern Electric Co. Ltd., Ottawa, Ontario, Canada.

M. D. Levine is with the Department of Electrical Engineering, McGill University, Montreal, Canada.

Fig. 1. Distribution of points in a two-dimensional feature space. The curves represent the closure of the sets which exhibit a high concentration of sample points.

It is rather difficult to make a fruitful comparison among the many clustering techniques that have been reported in the literature, and this is not the aim of the paper. The difficulty may be attributed to the fact that many of the algorithms are heuristic in nature and, furthermore, have not been tested on the same standard data sets. In general it seems that most of the algorithms are not capable of detecting categories which exhibit complicated distributions in the feature space [Fig. 1(c)] and that a great many are not applicable to large data sets (greater than 2000 points).

This paper discusses an algorithm which partitions the given data set into "unimodal fuzzy sets." The notion of a unimodal fuzzy set has been chosen to represent the partition of a data set for two reasons. First, it is capable of detecting all the locations in the vector space where there exist highly concentrated clusters of points, since these will appear as modes according to some measure of "cohesiveness." Second, the notion is general enough to represent clusters which exhibit quite general distributions of points.


The generated partition is optimal in the sense that the program detects all of the existing unimodal fuzzy sets and realizes the maximum separation [21] among them. The algorithm attempts to solve problems 1), 2), and 3) mentioned above; that is, it is economical in memory space and computational time requirements and also detects groups which are fairly generally distributed in the feature space [Fig. 1(c)]. The algorithm is a systematic procedure (as opposed to an iterative technique) which always terminates, and the computation time is reasonable.

An important distinction between this procedure and the methods reported in the literature¹ is that the latter use a distance measure (or certain average distances) as the only means of clustering. We have introduced another "dimension," the dimension of the order of "importance" of every point, as an aid in the clustering process. This is accomplished by associating with every point in the set a grade of membership or characteristic value [21]. Thus the order of the points according to their grade of membership, as well as their order according to distance, are used in the algorithm. The latter partitions a sample from a multimodal fuzzy set into unimodal fuzzy sets.

In Section II the concept of a fuzzy set is extended in order to define both symmetric and unimodal fuzzy sets. The basic algorithm consists of the two procedures, F and S, which are described in detail in Sections III and IV, respectively. Section V deals with the application of the algorithm to the clustering of data and the various practical implications. Section VI discusses the experimental results. Possible extensions of the algorithm to handle very large data sets (say greater than 30 000 points) are presented in Section VII. The conclusions are given in Section VIII.

II. DEFINITIONS

The notion of a fuzzy set was introduced by Zadeh [21]. Let X be a space of points with elements x ∈ X. Then:

"a fuzzy set A in X is characterized by a membership (characteristic) function f_A(x) which associates with each point in X a real number in the interval [0, 1], with the value f_A(x) at x representing the 'grade of membership' of x in A."

In the rest of this section we shall introduce some notation and certain definitions required for the description of the algorithm.

Let B be a fuzzy set in X with the membership (characteristic) function f, and let μ be the point at which the maximal grade of membership is attained, that is,

f(μ) = sup_{x ∈ B} [f(x)].

We may define two sets in B as follows:

¹ Rogers and Tanimoto [17] introduced a certain order among the points by associating with the point i a value which is the number of points at a constant finite distance from i. When the number of attributes is large, this so-called "prime mode" will be the centroid of the data set, rather than a "mode" of a cluster. A measure of inhomogeneity is used to detect clusters one at a time.

Γ_xi = {x | f(x) ≥ f(x_i)}²

and

Γ_xi^d = {x | d(μ, x) ≤ d(μ, x_i)}

where x_i is some point in B and d is a metric.

Definition: A fuzzy set B is symmetric if and only if, for every point x_i in B, Γ_xi = Γ_xi^d.

Clearly, if B is symmetric, then for every two points x_i and x_k in B,

d(x_i, μ) ≤ d(x_k, μ) ⇔ f(x_i) ≥ f(x_k).
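On a finite sample, the symmetry condition above can be checked directly: being closer to the mode must imply a grade of membership at least as high. The sketch below illustrates this on a one-dimensional sample; the helper name `is_symmetric` and the triangular sample are our own, not from the paper.

```python
def is_symmetric(sample, dist=lambda a, b: abs(a - b)):
    """Check the symmetry condition on a discrete sample.

    `sample` is a list of (x, f) pairs and `dist` the metric d.
    The set is symmetric when no point strictly closer to the mode
    has a strictly lower grade of membership than a farther point.
    """
    mu, _ = max(sample, key=lambda p: p[1])  # point of maximal grade
    for xi, fi in sample:
        for xk, fk in sample:
            if dist(xi, mu) < dist(xk, mu) and fi < fk:
                return False
    return True

# A sample from a triangular membership function centred at 0:
sym = [(x, 1.0 - abs(x) / 3.0) for x in (-2, -1, 0, 1, 2)]
# The same points with perturbed grades, breaking the ordering:
asym = [(-2, 0.9), (-1, 0.4), (0, 1.0), (1, 0.6), (2, 0.1)]

print(is_symmetric(sym))   # True
print(is_symmetric(asym))  # False
```

The nested loop is quadratic in the sample size; it is only meant to make the definition concrete.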

As an example of a symmetric fuzzy set, consider the set B defined as "all the very tall buildings." B is a symmetric fuzzy set, since the taller the building, the higher the grade of membership it will have in B. Any symmetric (in the ordinary sense) function, or a truncated symmetric function, can represent a characteristic function of a symmetric fuzzy set.

Definition: A fuzzy set B is unimodal if and only if the set Γ_xi is connected for all x_i in B (see Fig. 2).

In order to consider the problem of clustering data points it will be necessary to define discrete fuzzy sets.

A sample point from B will be a point x ∈ X with its associated characteristic value, f(x). Further, we will denote a sample of N points from B by S = {(x_i, f_i)}^N, where x_i is a point in X and f_i its corresponding grade of membership. S can be considered as a discrete fuzzy set which includes only those points x_i given by the sample. We shall require a large sample S; in particular, S is large in comparison to the dimension of the space X and to the number of local maxima in f.

{(S_i, μ_i)}^m denotes a partition of S into m subsets, where S_i is a discrete fuzzy subset and μ_i the point in S_i at which the maximal grade of membership is attained. We refer to μ_i as the mode of S_i. A mode will be called a local maximum if it is a local maximum of f and will then be denoted by v_i. It will be assumed that every local maximal grade of membership is unique [21].

The notion of an interior point in a discrete fuzzy set

is defined as follows.

Definition: Let S be a sample from a fuzzy set B and S_i a proper subset of S. For some point x_k in S_i we associate a point x_l in (S − S_i) such that

d(x_l, x_k) = min_{x_j ∈ (S − S_i)} [d(x_k, x_j)].

The point x_k is defined to be an interior point in S_i if and only if the set Γ = {x | d(x_l, x) < d(x_l, x_k)} includes at least one sample point in S_i (see Fig. 3).

Note that when the sample is of infinite size, in the sense that every point in X is also in S, this definition reduces to that of an interior point in ordinary sets.

Given a sample, the algorithm to be described in the next two sections is composed of two parts: procedure F, which

² This is equivalent to the set Γ_α in [21], where α = f(x_i).


Fig. 2. An example of a unimodal fuzzy set in a two-dimensional space. The curves indicate lines of equal grade of membership. (a) A unimodal fuzzy set, where for every point x_i in the set, the set Γ_xi is not disjoint. (b) A multimodal fuzzy set, since there exists a point x_r for which Γ_xr is disjoint.

Fig. 3. The points in S_i are denoted by ×. The point x_k1 is on the boundary of S_i since Γ_1 includes no sample points in S_i. The point x_k2 is an interior point in S_i, since Γ_2 includes points in S_i.

detects all the local maxima of f, and procedure S, which uses these local maxima and partitions the given sample into unimodal fuzzy sets.

III. PROCEDURE F

Given a sample S = {(x_i, f_i)}^N from a multimodal fuzzy set, subject to certain conditions on f and S (see Theorem 1), procedure F detects all the local maxima of f. It is divided into two parts: in the first part, the sample is partitioned into symmetric subsets, and in the second, a search for the local maxima in the generated subsets is performed.

In order to make the steps in the procedure clear, some preliminary explanations are given below. An example which demonstrates the procedure is presented later.

The number of groups (subsets) into which the sample is partitioned is not known beforehand. The procedure is initialized by the construction of two sequences: a sequence A in which the points are ordered according to their grade of membership, and a sequence A_1 in which they are ordered according to their distance to the mode of A (the first point in A). The order of the points in the sequence A is the order in which the points are considered for assignment into groups. This process will initiate new groups when certain conditions are satisfied. Whenever a group, say n, is initiated, a sequence of points A_n is formed of all the points in S which might be considered for assignment into group n. The first point in A_n is its mode, and the points are ordered according to their distance to this mode. Not all the points in A_n will necessarily be assigned into group n at the termination of the procedure. At every stage of assignment, every group i displays a point from its sequence A_i, which is its candidate point, to be accepted into the group. The point of A to be assigned is compared (for identity) with each of the candidate points in turn and is either assigned into one of the existing groups (if it is identical to the corresponding candidate point) or initiates a new group. If, for example, the point is assigned into group j, then the candidate of this group is replaced by its next point in the sequence A_j. Thus a point is assigned to the group in which its order according to the grade of membership corresponds to its order according to the distance to its mode.
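The two initial orderings can be illustrated in a few lines. The following sketch (the helper name `build_sequences` and the sample are ours, not the paper's) builds A by descending grade of membership and A_1 by distance to the mode of A; for a symmetric sample the two orderings coincide.

```python
def build_sequences(sample, dist=lambda a, b: abs(a - b)):
    """Build the sequences A and A_1 for a sample of (x, f) pairs.

    A: points ordered by descending grade of membership.
    A_1: the same points ordered by distance to the mode of A
    (ties are broken by input order, since Python's sort is stable).
    """
    A = sorted(sample, key=lambda p: -p[1])
    mode = A[0][0]
    A1 = sorted(sample, key=lambda p: dist(p[0], mode))
    return [p[0] for p in A], [p[0] for p in A1]

# A small symmetric 1-D sample with its grades of membership:
sample = [(0, 1.0), (1, 0.8), (2, 0.6), (-1, 0.7), (-2, 0.5)]
A, A1 = build_sequences(sample)
print(A)    # [0, 1, -1, 2, -2]  (descending grade)
print(A1)   # [0, 1, -1, 2, -2]  (distance to the mode 0)
```

Because this sample is symmetric, each point's rank by grade matches its rank by distance to the mode, which is exactly the property part 1 of procedure F exploits.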

Part 1 of Procedure F

Let S = {(x_i, f_i)}^N be a sample from a fuzzy set (assume, for simplicity, that f_i ≠ f_j for i ≠ j).³

1) Initially it is required to generate the following two sequences.

a) A = (y_1, y_2, ..., y_N) is a descending sequence of the points in the sample ordered according to their grade of membership; that is, f_j ≥ f_t for j < t, where f_j and f_t are the grades of membership of y_j and y_t, respectively.

b) A_1 = (y_1^1, y_2^1, y_3^1, ..., y_N^1), where y_1^1 ≡ y_1,⁴ is the sequence of the points ordered according to their distance to y_1^1; that is, d(y_1^1, y_j^1) ≤ d(y_1^1, y_t^1) for j < t.⁵

We will also refer to A_1 as the sequence of ordered "candidate" points to be assigned into group 1. Thus y_2^1 is the first candidate, and if it is assigned into group 1, then y_3^1 becomes the next candidate, and so on. We can therefore state that the current candidate point for group 1 is the nearest point to its mode y_1^1 (≡ y_1 ≡ μ_1), except for points that have already been assigned to group 1. This will hold true for any sequence A_i; that is, y_1^i ≡ μ_i is the mode for group i, and y_c^i is its candidate point.

2) If y_i ≡ y_i^1 for i = 2, 3, ..., r − 1, and y_r ≢ y_r^1, then y_i, i = 1, 2, ..., r − 1, are assigned into group 1 and a new group is initiated with y_1^2 ≡ μ_2 ≡ y_r as its mode. That is, the sequence A_2 = (y_1^2, y_2^2, y_3^2, ...) is generated. The latter includes, from among the points that have not yet been assigned, those points which are closer to y_r than the shortest distance from y_r to the points that have already been assigned; this is shown for one dimension in Fig. 4. The points in A_2 are now ordered according to their distance to y_r; that is, d(y_r, y_j^2) ≤ d(y_r, y_t^2) for j < t.

3) Suppose that G groups have been initiated. Thus there exist G sequences A_i, i = 1, ..., G, each of which displays a candidate point y_c^i. Suppose that y_q in the sequence A is the point currently being considered for assignment (all the points y_i in A for i < q have already been assigned); then the following holds.

³ The case in which there are equal grades of membership will be discussed in Section V.

⁴ We shall use the symbol "≡" to mean "is identical to."

⁵ The sets A and A_1 are sequences of the same N points; however, the ordering principle is different. Thus the point y_k^i is some point in A, and its label indicates that it is also in location k in the sequence A_i.

Fig. 4. At the stage where the point x_i (≡ μ_i) initiates a new group, all the sample points that have already been assigned are in the domain Γ_p. Thus the nearest point in Γ_p to x_i is at a distance R_i, which defines the domain Γ_i of all the points which are at a shorter distance to x_i than R_i. The sample points in Γ_i will be ordered as candidate points to be assigned into the group in which x_i is the mode.

a) If y_q ≡ y_c^i and y_q ≢ y_c^j, j = 1, ..., G, j ≠ i, then y_q is assigned into group i.

b) If y_q ≡ y_c^i for some i ∈ I, where I is a set of integers representing those groups whose candidate points are identical to y_q, then y_q is assigned into that group to which its nearest neighbor with a higher grade of membership has been assigned.

c) If y_q ≢ y_c^i for i = 1, ..., G, then a new group is initiated with y_q as its mode.

Part 1 of procedure F is terminated when the sequence A isexhausted.
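The assignment loop of part 1 can be sketched compactly. The code below is our own reading of rules a)-c) on a one-dimensional sample (the function names, the tie-breaking details, and the sample are assumptions, not the paper's); each group keeps its sequence A_i and a cursor to its current candidate, and a candidate that has already been assigned elsewhere simply never matches again, freezing the group as in note 3) of the example.

```python
def procedure_f_part1(sample, dist=lambda a, b: abs(a - b)):
    """Sketch of part 1 of procedure F for a sample of (x, f) pairs.

    Returns (groups, modes): the sets S_i (as lists of points, in
    order of assignment) and their modes.
    """
    A = [p[0] for p in sorted(sample, key=lambda p: -p[1])]
    assigned = {}            # point -> index of the group it joined
    seqs, cursor, groups, modes = [], [], [], []

    def initiate_group(x):
        # Sequence A_n: unassigned points closer to x than the
        # shortest distance from x to any already-assigned point.
        if assigned:
            R = min(dist(x, a) for a in assigned)
            pool = [p for p in A if p not in assigned and p != x
                    and dist(x, p) < R]
        else:                # group 1 may draw on the whole sample
            pool = [p for p in A if p != x]
        pool.sort(key=lambda p: dist(x, p))
        seqs.append(pool)
        cursor.append(0)
        groups.append([x])
        modes.append(x)
        assigned[x] = len(groups) - 1

    for x in A:
        if not groups:
            initiate_group(x)
            continue
        # Groups whose current candidate point is identical to x.
        matches = [i for i in range(len(groups))
                   if cursor[i] < len(seqs[i]) and seqs[i][cursor[i]] == x]
        if not matches:
            initiate_group(x)                 # rule c)
        else:
            if len(matches) == 1:             # rule a)
                g = matches[0]
            else:                             # rule b): nearest assigned
                nearest = min(assigned, key=lambda a: dist(a, x))
                g = assigned[nearest]
            groups[g].append(x)
            assigned[x] = g
            cursor[g] += 1                    # advance the candidate
    return groups, modes

# A bimodal 1-D sample: grades of membership peak at x = 0 and x = 10.
sample = [(0, 1.0), (1, 0.8), (2, 0.6), (3, 0.4),
          (10, 0.9), (11, 0.7), (12, 0.5)]
groups, modes = procedure_f_part1(sample)
print(groups)   # [[0, 1, 2, 3], [10, 11, 12]]
print(modes)    # [0, 10]
```

On this sample the two symmetric subsets around the peaks are recovered; whether each mode is a genuine local maximum is then decided by part 2.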

Theorem 1⁶: Let f be a characteristic function of a fuzzy set with K local maxima so that:

1) if v_k is a local maximum of f, then there exists a finite ε > 0 such that the set {x | d(v_k, x) ≤ ε} is a symmetric fuzzy set.

Let S = {(x_i, f_i)}^N be a large sample from f, such that:

2) for every x_i in the domain of f, the set {x | d(x_i, x) ≤ ε/2} includes at least one point in S, and

3) {(v_k, f(v_k))}^K ⊂ S.

Let {(S_i, μ_i)}^m denote the partition generated by part 1 of procedure F, where S_i denotes the discrete fuzzy set, μ_i its maximal grade of membership (mode), and m the number of groups. Then μ_i is an interior point in S_i if and only if it is a local maximum of f.

Theorem 1 states the sufficient condition under which the procedure will detect all the local maxima of f. The main restriction is the requirement that every local maximum of f shall have a small symmetric subset in its neighborhood (condition 1). It is not necessary for the sample to be of infinite size; it will be sufficient if it is large in the neighborhood of a local maximum. Condition 2 indirectly relates the dimension of the space to the size of the sample set.

Using the result of Theorem 1, part 2 of the procedure is employed to check all the modes μ_i in order to detect which

⁶ The proofs of the theorems are given in Appendix I.

Fig. 5. The characteristic function f and the 30-point sample for the example are shown. The dotted lines indicate the partition (the sets S_i) resulting from the application of part 1 of procedure F. We can observe that x_15 and x_25 are the only interior modes in the partition and thus will be recognized as the local maxima points (v_i) of f.

are interior points. This is done according to the definition given in the previous section.

Part 2

Let {(S_i, μ_i)}^m be the partition generated by part 1 of procedure F. For every mode μ_i and set S_i, a point x_pi and a distance R_i can be found as follows:

R_i = d(μ_i, x_pi) = min_{x_k ∈ (S − S_i)} [d(μ_i, x_k)].

R_i is the minimum distance from the mode to a point in S outside the set S_i. We say that μ_i is a local maximum if the set

Γ_Ri = {x | d(x_pi, x) < R_i}

includes points in S_i. Otherwise we decide that μ_i is not a local maximum because it is a boundary point of S_i.

To summarize this section, if f and S satisfy the conditions

stated in Theorem 1, then the procedure presented detects all the local maxima of f.
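The part 2 test is short enough to state in code. The sketch below (our naming, our sample) finds the nearest outside point x_p, takes R_i as its distance to the mode, and checks whether the domain Γ_Ri contains points of S_i.

```python
def is_local_max(S, Si, mode, dist=lambda a, b: abs(a - b)):
    """Part 2 test: is the mode of Si a local maximum of f?

    R is the minimum distance from the mode to a point outside S_i,
    attained at x_p; the mode is a local maximum iff the domain
    {x | d(x_p, x) < R} contains points of S_i.
    """
    outside = [x for x in S if x not in Si]
    xp = min(outside, key=lambda x: dist(mode, x))
    R = dist(mode, xp)
    return any(dist(xp, x) < R for x in Si)

# Points 0..8 on a line; suppose part 1 produced S_1 = [0, 1, 2, 3]
# with mode 0, and the singleton S_2 = [4] with mode 4.
S = list(range(9))
print(is_local_max(S, [0, 1, 2, 3], 0))   # True: 1, 2, 3 fall inside the domain
print(is_local_max(S, [4], 4))            # False: 4 is a boundary point
```

The singleton case mirrors the set S_2 = (x_12) of the worked example below: a mode with no interior neighbors is rejected as a boundary point.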

Example: The following example demonstrates the various procedures associated with the algorithm. A sample of 30 points was taken from a one-dimensional characteristic function. The latter, as well as the sample points with their associated sample numbers, are shown in Fig. 5. The sequences A and A_1 are given by

A = (y_1, y_2, ..., y_30) = (x_15, x_14, x_16, x_13, x_12, x_11, x_17, x_25, x_24, x_10, x_26, x_9, x_27, x_23, x_18, x_8, x_28, x_7, x_6, x_5, x_4, x_3, x_19, x_29, x_2, x_30, x_1, x_20, x_22, x_21).

A_1 = (y_1^1, y_2^1, ..., y_30^1) = (x_15, x_14, x_16, x_13, x_17, x_12, x_18, x_11, x_10, x_9, x_19, x_20, x_8, x_7, x_21, x_6, x_22, x_5, x_4, x_23, x_3, x_24, x_2, x_1, x_25, x_26, x_27, x_28, x_29, x_30).

Observing these sequences, we can see that the first four points in A and A_1 are pairwise identical, and thus they are assigned to group 1. Thereafter, the candidate point for group 1 is y_5^1 = x_17, whereas the point to be assigned is x_12. Thus the latter will initiate a new group and a new sequence A_2 will be generated.


After the first four groups are initiated, the sequences A_i and the resulting partition up to this point are as follows:

A = (y_1, y_2, ..., y_30) = (x_15, x_14, x_16, x_13, x_12, x_11, x_17, x_25, x_24, x_10, x_26, ..., x_21)

A_1 = (y_1^1, y_2^1, ..., y_30^1) = (x_15, x_14, x_16, x_13, x_17, x_12, ...)
S_1 = (x_15, x_14, x_16, x_13, x_17)

A_2 = (y_1^2) = (x_12)
S_2 = (x_12)

A_3 = (y_1^3, y_2^3, y_3^3) = (x_11, x_10, x_9)
S_3 = (x_11)

A_4 = (y_1^4, y_2^4, ..., y_13^4) = (x_25, x_24, x_26, x_23, x_27, x_28, x_22, x_29, x_21, x_30, x_20, x_19, x_18)
S_4 = (x_25, x_24)

In relation to the procedure described above, we note the following.

1) The sequence A_2 includes only one point (its mode), since the nearest point to x_12 in S has already been assigned. Therefore there are no sample points in S to generate a symmetric fuzzy set whose mode is x_12.

2) At the stage shown, x_10 in the sequence A is to be assigned. The candidate points for the four groups that have already been initiated are x_12, no candidate, x_10, and x_26, respectively. Thus x_10 will be assigned to group 3, since it is identical with the latter's candidate point.

3) No more points will be assigned into group 1, since its candidate x_12 has already been assigned to another group, and thus cannot be replaced as a candidate for group 1.

The resulting symmetric fuzzy sets generated by the application of part 1 of procedure F are shown in Fig. 5. Part 2 of the procedure is now applied to test each of the 13 modes to detect which of these are interior points. In Fig. 5 we can see that only the modes x_15 and x_25 are interior points in their corresponding sets, and therefore only two local maxima are discovered. Based on this partial result, the example will be continued at the end of the next section in order to demonstrate procedure S.

IV. PROCEDURE S

Procedure S partitions a sample from a fuzzy set into unimodal fuzzy sets, provided the local maxima of f are known. Thus this procedure uses the information obtained from the application of procedure F; that is, the number, location, and characteristic values of the local maxima of f. The rule for assigning the points differs from the known classification rules appearing in the pattern recognition literature. Rather than an arbitrary order, which is the usual case, the points are finally assigned in the order in which they appear in the sequence A.

Specifically, let S = {(x_i, f_i)}^N be a sample from a fuzzy set, and {(v_i, f(v_i))}^K ⊂ S be the sample of the K local maxima v_i of f. Assume that f(x_i) ≠ f(x_j) for i ≠ j, and f(v_i) > f(v_j) for i < j. Let A be the sequence of the points ordered according to their grade of membership, and suppose that the K local maxima of f are in locations p_i, i = 1, ..., K, in A. We can infer the following proposition.

Proposition: The point x_j in location j in the sequence A, p_M < j < p_(M+1), M < K, can only be assigned into one of the groups i ∈ I_M = {1, 2, ..., M}.

If f(x_pr), r = M + 1, M + 2, ..., K, is the local maximum of group r, then only points with a lower grade of membership can be assigned into group r. Since all the points that precede location p_r in A have higher grades of membership, none of them can be assigned into group r, r = M + 1, M + 2, ..., K.

This proposition implies that all the points in A which are found in the locations p_1 ≤ j < p_2 will automatically be assigned into group 1; the points in locations p_2 < j < p_3 will be divided between group 1 and group 2, and so on.

Procedure S uses the following rule: assign the point x_j in location j in the sequence A into the group in which its nearest neighbor with a higher grade of membership (all the points preceding x_j in A) has been assigned. This rule applies to all the points with the exception of the local maxima, which initiate new groups. Note that the rule is different from the "nearest neighbor classification rule" [5] because of the particular order in which the points are introduced.
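The assignment rule above can be sketched in a few lines. The code below is our own illustration (the function name and the bimodal sample are assumptions, not the paper's): points are taken in descending order of grade, a known local maximum opens a new group, and every other point joins the group of its nearest already-assigned neighbor, all of which have a higher grade of membership.

```python
def procedure_s(sample, maxima, dist=lambda a, b: abs(a - b)):
    """Sketch of procedure S.

    `sample` is a list of (x, f) pairs and `maxima` the set of known
    local maxima points of f.  Returns the groups (unimodal fuzzy
    sets) as lists of points, in order of assignment.
    """
    A = [p[0] for p in sorted(sample, key=lambda p: -p[1])]
    group_of, groups = {}, []
    for x in A:
        if x in maxima:
            group_of[x] = len(groups)   # a local maximum opens a group
            groups.append([x])
        else:
            # Nearest neighbor among the already-assigned points,
            # all of which precede x in A (higher grade).
            nearest = min(group_of, key=lambda a: dist(a, x))
            g = group_of[nearest]
            group_of[x] = g
            groups[g].append(x)
    return groups

# A bimodal 1-D sample with known local maxima at 0 and 10:
sample = [(0, 1.0), (1, 0.8), (2, 0.6), (3, 0.4),
          (10, 0.9), (11, 0.7), (12, 0.5)]
print(procedure_s(sample, {0, 10}))   # [[0, 1, 2, 3], [10, 11, 12]]
```

The first point of A is the global maximum and hence always in `maxima`, so every later point has at least one assigned neighbor to compare against.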

Theorem 2: Let f be a piecewise continuous characteristic function of a fuzzy set. Let S = {(x_i, f_i)}^N be an infinite sample from f, such that

1) for every x_i in the domain of f and for an α ≥ 0, the set Γ = {x | d(x_i, x) ≤ α/2} includes at least one sample point in S.

If α → 0, then procedure S partitions the given sample into unimodal fuzzy sets.

Theorem 3: Let S be a sample from a fuzzy set with a characteristic function f. Let f and S be constrained as in Theorem 2.

If α → 0, then every final set is a union of the sets S_i generated in part 1 of procedure F.

If X = E¹, a more powerful result than Theorem 2 can be stated; for simplicity we will state it for the case of two local maxima.

Theorem 4: Let f be a piecewise continuous characteristic function of a fuzzy set and d the distance between its two local maxima.⁸

Let S be a sample from f, such that

1) for every point x_i in the domain of f and for a finite α > 0, α ≪ d, the set Γ = {x | d(x_i, x) ≤ α} includes at least one point in S, and

2) the local maxima (v_1, f(v_1)), (v_2, f(v_2)) are in S.


Let H = x_0⁷ be the optimal hyperplane (point) separating f into unimodal fuzzy sets, and Γ_α = {x | d(x_0, x) < α/2}.

If S does not include any points in Γ_α, then procedure S derives the optimal partition of S for any finite α, α ≪ d.

Theorem 2 states the sufficient (but not necessary) conditions under which procedure S derives the optimal partition into unimodal fuzzy sets. Note that when α = 0, the sample S is identically equal to the domain of f. On the other hand, given a characteristic function f, we can always find a finite α for which the result holds. Observing procedure S, we may see that the sample must be large, particularly in the neighborhood of the separating hypersurface (see Theorem 4).

Utilizing the result of Theorem 3, we can modify procedure S to assign the subsets S_i generated in part 1 of procedure F, rather than individual points of S. That is, we can first assign μ_i (the mode of S_i), and then automatically all the points in S_i to this same group. In fact, no further computation is necessary since, if μ_i is a mode, it will initiate a new set (group) in part 1 of procedure F. In the latter, when evaluating the distances to the points that have already been assigned, we can record its nearest point (with a higher grade of membership). Hence procedure S reduces to an automatic classification of the points.

Theorem 4 implies that if α is finite, but α ≪ d, then only points in S within a distance α of H can be misclassified.

the sequence A, where now it is assumed that the localmaxima are known.

A = (x15, x14, x16, x13, x12, x11, x17, x25, x24, x10, x26, x9, x27, x23, x18, x8, x28, x7, x6, x5, x4, x3, x19, x29, x2, x30, x1, x20, x22, x21).

All the points up to x25 are automatically assigned to the group in which x15 is the local maximum (see proposition). The other points are assigned either to the first group or to the second (where x25 is the local maximum) according to the classification of the nearest point to the point to be assigned.

In particular, if x23 is the point to be currently assigned, then the partial sets (S1 and S2) are given by

S1 = (x15, x14, x16, x13, x12, x11, x17, x10, x9),
S2 = (x25, x24, x26, x27).

Since the nearest point to x23 (among the ones that have already been assigned) is x24, the former will be assigned into S2. This stage of the process and the final partition are shown in Fig. 6.
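The assignment rule in this example is easy to state in code. Below is a minimal sketch of procedure S (illustrative only, not the authors' program; all names are ours): points arrive in order of decreasing grade of membership, each known local maximum initiates a group, and every other point takes the label of the nearest point assigned before it.

```python
# Minimal sketch of procedure S (illustrative, not the authors' program).
# Points are processed in order of decreasing grade of membership; each
# known local maximum initiates a group, and every other point receives
# the label of its nearest already-assigned point.
import math

def procedure_s(points, order, maxima):
    """points: dict id -> coordinate tuple; order: ids sorted by
    decreasing grade of membership; maxima: set of ids of local maxima."""
    label = {}
    assigned = []
    for pid in order:
        if pid in maxima:
            label[pid] = pid  # a local maximum initiates a new group
        else:
            nearest = min(assigned,
                          key=lambda q: math.dist(points[pid], points[q]))
            label[pid] = label[nearest]
        assigned.append(pid)
    return label
```

On a small one-dimensional sample with a mode at each end, every point ends up in the group that its chain of nearest assigned neighbors leads back to, just as x23 above is pulled into the group of x25 through x24.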

V. THE APPLICATION OF THE ALGORITHM TO CLUSTERING DATA

The problem we have treated so far, which can be stated as "the partition of a fuzzy set into unimodal fuzzy sets," is well defined. This is not, however, the case in the clustering

7 If X = E1, then a unimodal fuzzy set is also a convex fuzzy set [21], and the hypersurface becomes a hyperplane.

8 Among other changes in the statement of the theorem in the case when the number of local maxima is greater than two, we must replace the distance d by the minimum distance between any two local maxima of f.


Fig. 6. This figure demonstrates procedure S. At a certain stage in the procedure, all the points in the sequence A with higher grade of membership than f1 have been assigned, and x23 is the next point to be assigned. In this case, all the points in the domains F1 and F2 have already been assigned. The distance of x23 to all the points in F1 and F2 is evaluated and this point will eventually be assigned into the group in which x24 is a member. The dotted line indicates the final partition for this example.

problem [16] where a set of points {xi} is given that must be clustered into categories. In order to directly employ the algorithm it is necessary to associate with every point xi a grade of membership fi. In other words, a certain order of importance is introduced to facilitate the discrimination among the points, not only on the basis of their location (in the vector space) but also according to their "importance." There are many possible ways to discriminate among the points. One possibility is to use a clustering model to associate with every point a membership value according to its "contribution" to the desired partition. By a clustering model, we mean functionals which describe the properties of the desired categories and the structure of the final partition [19].

In our experiments, we have used a threshold value T and associated with every point xi an integer number ni, which is the number of sample points in the set Fi = {x | d(xi, x) < T}. It is obvious that the resulting partition is dependent on T, although for any T a unique partition into unimodal fuzzy sets is derived. A previous knowledge about the data to be partitioned is not essential in order to choose T. The latter must be determined in such a way that there is "sufficient" discrimination among the points. For example, in the extreme, if a very large threshold is chosen, then every point will have the same number ni (ni = N) and no discrimination is achieved; on the other hand, this is also true for a very small T, but in this case ni = 1. The threshold essentially controls the resolution of the characteristic function f. It is quite within the realm of possibility to automate this procedure but this was not done for the experiments reported in Section VI.
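The neighbor count just described can be written down directly. A short sketch (our notation, not the original program), in which each point's grade is the number of sample points, itself included, lying within distance T:

```python
# Sketch of the grade-of-membership computation described above: the grade
# f_i of x_i is the number of sample points (x_i itself included) lying
# within distance T of x_i.
import math

def grades(points, T):
    return [sum(1 for y in points if math.dist(x, y) < T) for x in points]
```

With a very large T every grade equals N, and with a very small T every grade is 1, matching the two extremes discussed above.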

It is also necessary to consider the practical situation where many points have the same grade of membership, since this was explicitly excluded in the previous theoretical developments. This problem was solved by allowing for a permutation of the points in the sequence A when they have the same grade of membership. More specifically, consider part 1 of the procedure F in which the symmetric fuzzy sets are derived. Suppose that G groups have already been initiated and yq is the point in the sequence A to be assigned next. If yq ≠ yic for i = 1, ···, G, and if f(yq+1) = f(yq), then


the identity between yq+1 and yic, i = 1, ···, G, is checked. Thus yq will initiate a new group only if none of the points yq+1, yq+2, ···, with the same grade of membership as yq, is identical to yic, i = 1, ···, G.

Another consideration is the case in which the maximal grade of membership in a set Si is attained by a number of points. To solve this problem, we have modified part 2 of procedure F (in which a search for the local maxima is performed) in the following way. Let Sii be the subset of points in Si which have the same (maximal) grade of membership as pi; then every point in Sii is examined as the mode of Si. If at least one of these points is on the boundary of Si, then pi is not considered as a local maximum.

VI. EXPERIMENTS

Clustering techniques can be compared only if they are applied to the same data set or to data sets in which the categories are well known beforehand. Such experiments can therefore be performed either on artificially generated data, or on data sets such as, for example, handprinted characters [4].

In order to be able to reach some significant conclusions concerning the performance of the algorithm we have applied it to artificially generated data sets. The latter consist of points in a ten-dimensional vector space belonging to sets described by multimodal spherical and ellipsoidal distributions. The samples from each category of the former were generated by adding zero-mean Gaussian noise to a prototype vector. The ellipsoidal data sets were determined by subjecting the vectors of the spherical data sets to certain linear transformations, stretching, and rotation. This data set is a part of the version that was used in [7], [18] for pattern recognition experiments and is described in [8]. We have taken the first ten prototype vectors and generated a data set of 1000 points, 100 points for each prototype vector.9

Two series of experiments were performed. In the first series, the algorithm was applied to six data sets: the spherical sets with σ = 15, 20, and 25, and the ellipsoidal data sets derived from these. A summary of the results is given in Tables I and II. In the second series, two additional runs with the ellipsoidal data set (derived from the spherical set with σ = 15), obtained with different initial conditions for the random number generator, were performed. The same threshold T as in the first series was used, thus facilitating a comparison of the results of three runs for different initial conditions of the random number generator. The results are shown in Tables III and IV.

The optimal partitions for these data sets are unknown but will be characterized by the rates of error associated with the optimal solution of the supervised pattern recognition problem. This classification is achieved by assuming a knowledge of the functional form and the parameters of the parent populations and using an optimal classifier (Bayes sense). Although these solutions are known theoretically, the computation for the ellipsoidal data sets is difficult because the hyperellipsoids which indicate the hypersurfaces of equal probability density have different shapes and orientations (see [7]). The reference partition that we have used is the partition into the original ten categories of 100 points each. It is appreciated that this partition cannot be achieved by any clustering technique because of overlapping among the categories, in particular for the case of σ = 25. Two types of errors have been used to grade the partitions.

9 The prototype vectors which have been used for the data sets are listed in Appendix II.

TABLE I*

Group            Spherical                          Ellipsoidal
Number   σ=15    σ=20        σ=25        σ=15     σ=20     σ=25

 1       100     201(100,1)  202(99,3)   101(1)   101(1)   271(89,83,2)
 2       100     100         100         100      100      101(1)
 3       100     100         100         100       99(1)   100
 4       100     100          99         100       97(1)    93(1)
 5       100      99          99(2)      100       93       88
 6        99      97          97          97       80       87
 7        85      96          94          95       78       62(2)
 8        81      94(2)       88          71       65       62(8)
 9        79(1)   81          78(1)       71       62       45
10        66      19          20          69       60       38
11        21       8          12          29       38       13
12        21       3           8          29       26       13(1)
13        19       2           3          22       24       10
14        15                               7       18        9
15        13                               4       12        5
16         1                               3       10        3
17                                         2       10
18                                                  5
19                                                  4
20                                                  4†

* n(n1, n2, n3) indicates that there is a total of n points in the corresponding group, of which n1, n2, and n3 are from different categories.

† In this case 5 additional groups of 4, 2, 2, 2, 2 points, respectively, were generated.

TABLE II*

Data          σ     T2     f(v1)   Em (percent)   Et (percent)   CPU (minutes)

Spherical     15    4000    92       0.1            9.1            3.40
              20    2500    17      10.3           11.6            5.29
              25    4000    18      10.5           12.8            5.10

Ellipsoidal   15    3500    65       0.1            9.7            3.33
              20    4000    31       0.3           16.8            5.59
              25    4500    15      18.7           23.9            5.04

* f(v1) indicates the maximal grade of membership in the corresponding test. CPU is the number of minutes required to cluster the data on an IBM 360/75 and includes the time needed to generate the data set.

1) Em, the mixing error, defines the error caused by some of the points of category i being assigned to category j, i ≠ j; it is therefore a result of the possible overlapping among the categories or the linking of several categories.

2) Et, the total error, consists of Em plus the error produced by the generation of small clusters not in the original set of ten. These small clusters are the result of the fact that a finite sample from a Gaussian distribution can be made up of several modes.


TABLE III

Group      Ellipsoidal σ=15, T2=3500
Number     Run 1     Run 2     Run 3

 1         101(1)    100       158(58)
 2         100       100       100
 3         100       100       100
 4         100       100       100
 5         100       100       100
 6          97        96       100
 7          95        95       100
 8          71        92       100
 9          71        57        63
10          69        48        42
11          29        38        16
12          29        20        11
13          22        11        10
14           7         8
15           4         7
16           3         5
17           2         5
18                     5
19                     5
20                     4
21                     4

TABLE IV

                        f(v1)   Em (percent)   Et (percent)   CPU (minutes)

Ellipsoidal             65        0.1            9.7            3.33
σ=15, T2=3500           70        0             11.2            3.29
                        70        5.8            9.5            3.33

From Table I, we can see that nine to ten major categories as well as a number of small clusters were generated in each test. These clearly indicate that the samples of some of the categories are in fact multimodal. The experiments show that there is a small amount of overlapping among some of the categories. The major mixing error can be attributed to the fact that the algorithm did not detect a local maximum in the neighborhood of the prototype vector for some of the categories. This can be seen in Table I by the entries in the first row, columns 2, 3, and 6, where 2, 2, and 3 categories, respectively, have been linked together. The reason for this seems to be that the sample was not large enough. This is supported by the low values of f(v1) in Table II, where for the above tests the entries are 17, 18, and 15, respectively. We believe that a better choice for T could have eliminated this mixing for the spherical data set with σ = 20, although it is doubtful that this could be achieved for the data sets with σ = 25, given the sample size. On the other hand, it is reasonable to assume that the problem could be eliminated using larger data sets.

A total of 25 experiments (3 to 5 per data set) have been performed and the best results are included in the tables. The threshold T was varied coarsely over a wide range and no fine adjustments were made in order to improve the results. The minimum value of T is constrained by the resolution, while the maximum is constrained by the possible linkage of several categories; that is, if T is very large, then a point which is not in a cluster at all, but in a space among several categories, might have the largest grade of membership. However, even in this case, the point will usually not become a local maximum, since the condition for having a symmetric fuzzy set in its neighborhood will not be satisfied. As a guide, a small T is preferred when no previous knowledge of the data set is available.

The required computing time lay between three and six minutes on an IBM 360/75 computer and this depended on the discrimination in the values of f. If it is such that many points have the same grade of membership, then procedure F requires more computer time (see Section V). The value f(v1) in Table II gives some indication as to the discrimination achieved; comparing the entries in this column with the corresponding ones in the CPU column gives some support to the above statement. This factor could be eliminated by, for example, using an additional measure for discriminating among the points which have the same grade of membership, or possibly by using an underlying model to evaluate the grade of membership and so yield a continuous variation in f. The computer program used the process of assigning a point at a time in procedure S. All of the computing time in the latter, which includes N(N−1) computations of distance and the search for the minimum distance for every point, could be saved by applying Theorem 3. It is estimated that this would result in an approximate 25 percent reduction in computational time.

From the results of the second series of experiments (see Tables III and IV) we can see that the partitions generated with data sets obtained for different initial conditions of the random number generator are similar. In one of these experiments, a local maximum in the neighborhood of one of the categories was not detected, thus linking 58 points of this category with another. The difference in the error rates is within 1.7 percent.

Generally speaking, the results are quite encouraging. In the two series of experiments, 5 out of 80 local maxima in the neighborhood of the prototype vectors were not detected. This problem could be eliminated if the size of the sample were increased. In particular, the fact that the error rates for the ellipsoidal data are comparable with those for the spherical sets indicates that the shape of the distribution of the points was not a major factor in causing the error. This supports our claim that the algorithm is capable of detecting categories with general distributions of points.

VII. THE EXTENSION OF THE ALGORITHM TO VERY LARGE DATA SETS

The computers available now are generally not capable of clustering very large data sets (say, greater than 30 000 points in a many-dimensional space) because of both memory space and computing time limitations. We propose two ways in which such sets could be treated to derive partitions which are very similar (if not identical) to the ones discussed in the previous sections. These have not yet been tested experimentally.


Threshold Filtering

In this process we reduce the sample size before applying procedures F and S. A small threshold T1 is employed for filtering purposes while a large value T2 (equivalent to T in the previous section) is used to evaluate the final grade of membership.

The first point, say x1, is introduced. Then all the other points are introduced sequentially and the distance from x1 to every point is measured. If d(x1, xi) < T1, then the grade of membership of x1 is increased by 1; the corresponding point xi is assigned finally into the group into which x1 will later be assigned. Thus xi is not considered further in the application of procedures F and S. On the other hand, if d(x1, xi) ≥ T1, then xi will again be introduced, until every point has been assigned. When this process of filtering is terminated, there remains a smaller set of points x1, x2, ···, xN with the temporary grades of membership n1, n2, ···, nN, where

n1 + n2 + ··· + nN = the number of points in the original data set.

Now the usual discrimination procedure is employed; for example, to evaluate f(xi), if

S ∩ {x | d(xi, x) < T2} = {xi, xj, ···, xm},

then set

f(xi) = ni + nj + ··· + nm.

If N is of such a size that it can be handled by the available computer, then the algorithm can be employed; if not, a further filtering stage can be imposed in the same manner. Although threshold filtering has been used before, it has a particular significance here. This is because the points which are filtered out contribute to the partition of the entire set, since they are represented in the grade of membership of the points which are included for clustering.
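The two-threshold scheme can be sketched as follows (a simplified batch rendering of the sequential description above; function names and data layout are ours). Each retained point absorbs its neighbors within T1 and carries their count as its temporary grade ni; final grades are then evaluated over the representatives with T2.

```python
# Sketch of the proposed two-threshold filtering (a simplified rendering
# of the sequential description in the text; names are ours). Each
# retained point absorbs its neighbors within T1 and carries their count
# as its temporary grade n_i; final grades are evaluated with T2.
import math

def threshold_filter(points, T1):
    remaining = list(points)
    reps, counts = [], []
    while remaining:
        x = remaining.pop(0)  # next point not yet absorbed
        near = [y for y in remaining if math.dist(x, y) < T1]
        remaining = [y for y in remaining if math.dist(x, y) >= T1]
        reps.append(x)
        counts.append(1 + len(near))  # x itself plus the absorbed points
    return reps, counts

def final_grades(reps, counts, T2):
    # f(x_i) = sum of n_j over the representatives within T2 of x_i
    return [sum(c for r, c in zip(reps, counts) if math.dist(x, r) < T2)
            for x in reps]
```

Since the counts ni sum to the original sample size, the filtered-out points still contribute to the final grades, which is the property emphasized above.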

Truncating the Sequence A

It can be observed that the major memory space limitations are governed by the requirements of part 1 of procedure F. By truncating the sequence A, part 1 of procedure F can be applied sequentially to the truncated parts. Once the sample has been partitioned into symmetric fuzzy subsets, then part 2 of procedure F and procedure S can be applied to the entire set.

First the sequence A is generated in the usual way. Then it is truncated at several points according to the desired sample size, and the truncated parts can then be introduced sequentially in order to generate the symmetric fuzzy subsets. An example of the truncation process when X = E1 is given in Fig. 7. Here the sequence A is truncated at a point xI where f(xI) = fI and at xII where f(xII) = fII. This operation results in the partition (division) of the domain of f into three disjoint domains, where each of the latter may be a union of several disjoint subdomains. Every subset which is produced includes sample points in only one of the above domains, and the grade of membership of the points is as in the original function f. The entire domain and the three domains resulting from the truncation are shown in Fig. 7(a), (b), (c), and (d), respectively. It can be seen that a local maximum may sometimes not be detected if the truncation is done immediately after a local maximum point.

Fig. 7. The truncation process. (a) The characteristic function f, truncated at fI and fII. (b), (c), (d) The resultant characteristic functions fb, fc, and fd.

VIII. CONCLUSIONS AND REMARKS

An algorithm is presented which partitions a given sample from a multimodal fuzzy set into unimodal fuzzy sets. It is proven that if certain conditions are satisfied, then the algorithm derives the optimal partition in the sense treated. This partition is also optimal in the sense of maximum separation [21]. The use of this algorithm as a clustering technique was also demonstrated. Experiments on artificially generated data sets were performed and both the results and errors were discussed in detail.

The algorithm can also be applied effectively in supervised pattern recognition, in particular when the categories are multimodal and this information is not known. Such experiments have been reported in [7], [10], [18]. We can use this algorithm to first partition every category independently into unimodal fuzzy sets. In this case we associate with every point xi the distance membership function

fi = min_{xj ∈ (S − Ci)} [d(xi, xj)]

where Ci is the set of points in the category in which xi is a member.
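This distance membership function has a direct sketch (illustrative names; assumes the whole labeled sample fits in memory):

```python
# Sketch of the distance membership function suggested for the supervised
# case: f_i is the distance from x_i to the nearest point outside its own
# category C_i. Names are illustrative.
import math

def distance_membership(points, labels):
    return [min(math.dist(x, y)
                for y, c in zip(points, labels) if c != labels[i])
            for i, x in enumerate(points)]
```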


It is suggested that the clustering algorithm reported in this paper possesses three advantages over the ones discussed in the literature.

1) It does not require a great amount of fast core memory and therefore can be applied to large data sets. The storage requirement is (20N + CN + S)10 bytes, where N is the number of points to be partitioned, 20N and CN are required for the fixed portion of the program and the variable length data sequences (A, Ai), respectively, and S is the number of storage locations required for the given set of data points. Obviously, S depends on the particular resolution of the magnitude of the components of the data vectors.

2) The amount of computing time is relatively small.

3) The shape of the distribution of the points in a group (category) can be quite general because of the distributions that the unimodal fuzzy sets include. This can be an advantage, especially in practical problems in which the categories are not distributed in "round" clusters.

APPENDIX I

Proof of Theorem 1

Lemma: The sets Si are disjoint symmetric fuzzy sets.

Proof: Let Aii define the subsequence of A of the points that have been assigned into group i, arranged in the order that they stand in A. Let Ai be the sequence of candidate points to be assigned into group i. Bearing in mind procedure F, any two points xp and xq can be assigned to the same set Si if and only if their order in Aii corresponds to their order in Ai. Suppose their order does not correspond, that is,

Aii = (···, xp, ···, xq, ···)

and

Ai = (···, xq, ···, xp, ···).

Then xp in Aii must be assigned first. But xq precedes xp as a candidate to be assigned into group i; thus xq will prevent xp from being assigned into Si, since it is not replaced as a candidate point unless it is assigned to Si. Thus if Ni is the number of points that have been assigned into group i, then for every n ≤ Ni:

f(xn) ≤ f(xj) and d(pi, xn) ≥ d(pi, xj) for j = 1, 2, ···, n−1, xj ∈ Si,

and

f(xn) ≥ f(xr) and d(pi, xn) ≤ d(pi, xr) for r = n+1, ···, Ni, xr ∈ Si,

which proves that Si is symmetric.

Disjointness is demonstrated by the same argument. Suppose xq is an interior point in Si and has been assigned into group j, j ≠ i. Then there exists a subset Sii ⊂ Si which satisfies the condition d(pi, xr) > d(pi, xq) for xr ∈ Sii. Thus xq precedes all the points in Sii in the sequence Ai. Since xq is assigned to group j, j ≠ i, it will not be replaced as the candidate point in group i, and thus will block all the points in Sii from being assigned into group i.

10 C depends on the number of categories that the given data set represents; in our experiments, C = 5 was found to be sufficient. Note that this estimate of the total memory space is correct for N up to 32 767.

Proof of Theorem 1

The lemma implies that if pi is not a local maximum then it must be on the boundary of Si. It remains to be shown that if it is a local maximum of f, then it is an interior point.

If pi is a local maximum, then assumption 1 of Theorem 1 implies that the subset Si is

Si = {x | d(pi, x) < η}, where η > ε/2. (1)

Now let xt be the sample point such that

Rt = d(pi, xt) = min_{xk ∈ (S − Si)} [d(pi, xk)].

Assumption 1 implies that Rt ≥ ε.

To show that the set F = {x | d(xt, x) < Rt} includes at least one sample point in Si, we may consider the line segment joining xt and pi, and the point xin on this line such that d(xin, pi) = ε/2. Defining the set F′ = {x | d(xin, x) < ε/2}, assumption 2 assures that F′ includes at least one sample point, and (1) shows that this point is in Si.

Proof of Theorem 2

Without loss of generality, let us assume that f has only two local maxima. Let H be the optimal hypersurface separating f into the two unimodal fuzzy sets, and S1 and S2 the optimal partition of S.

Suppose that (n−1) points have already been assigned correctly, thus generating the sets S1(n−1) ⊂ S1 and S2(n−1) ⊂ S2, and that xn ∈ S1 is the point to be assigned next. It is sufficient to show that there exists a sample point x0 ∈ S1(n−1) such that

d(x0, xn) < min_{xp ∈ S2(n−1)} [d(xn, xp)].

Let

Γ = {x | d(xn, x) = α/2},  f(u) = sup_{x ∈ Γ} [f(x)],

and Γu = {x | d(u, x) < α/2}. Clearly, f(u) ≥ f(xn), since xn is not a local maximum. In the limit when α → 0, f(xn) ≤ f(x) for every x ∈ Γu. Let xv be the point such that

d(xn, xv) = min_{xj ∈ (Γu ∩ S)} [d(xn, xj)].

Assumption 1 implies that (Γu ∩ S) is not empty; thus d(xn, xv) < α.

We have shown that there is a sample point xv ∈ S such that

f(xv) ≥ f(xn) and d(xn, xv) < α.

Letting α → 0 establishes the proof, since

min_{xp ∈ S2(n−1)} [d(xn, xp)] < min_{xi ∈ S1(n−1)} [d(xn, xi)]

if and only if d(xn, H) < α.


Proof of Theorem 3

In this proof we make use of the lemma to Theorem 1.

Note that in the proof of this lemma, none of the constraints of Theorem 1 were applied; thus the sets Si generated by part 1 of procedure F are always symmetric and disjoint fuzzy sets.

Let us assume that f has only two maxima and let H be the optimal hypersurface separating f into the two unimodal fuzzy sets. It is sufficient to show that if Si is a set generated by the above procedure, then it is on one side (either inside or outside) of H. Then the application of Theorem 2 will complete the proof.

Suppose that Si includes points on both sides of H, say the sets Si1 and Si2 (Si1 ∪ Si2 = Si), and suppose that pi (the mode of Si) is in Si1. Let x2 be the point such that

d(pi, x2) = min_{xj ∈ Si2} [d(pi, xj)].

Let L be the line segment joining x2 and pi, and xk be the point at which L intersects H (suppose that there is one point of intersection; if not, let xk be the point of intersection with the lowest value of f). Define the following sets:

Fk = {x | d(xk, x) < α}

and

Sk = S ∩ Fk.

Condition 1 (see Theorem 2) implies that Sk is not empty. Now if α → 0 and xr is any point in Sk, then

f(xr) < f(x2)

and

d(pi, xr) < d(pi, x2),

which implies that Si is not symmetric. This contradicts the above assumption. An application of Theorem 2 completes the proof since, if S1 and S2 is the optimal partition, then S1 is on one side of H and S2 is on its other side.

Proof of Theorem 4

Let S1 and S2 denote the optimal partition of S. Suppose that (n−1) points have already been assigned correctly, thus generating S1(n−1) ⊂ S1 and S2(n−1) ⊂ S2, and suppose xn ∈ S1 is the next point to be assigned.

Let xu be the point such that

d(xn, xu) = min_{xj ∈ (S1(n−1) ∪ S2(n−1))} [d(xn, xj)].

Condition 1 implies that d(xn, xu) < α. Now xu and xn must belong to S1, since if there are no sample points in F0, then d(xn, xp) ≥ α for every xp in S2.

APPENDIX II

The following are the ten prototype vectors, given by their integer components in the ten-dimensional space, which were used to generate the data sets for the experiments discussed in Section VI. The order of the vectors bears no relation to the group numbers in Tables I and III. The vectors are the first ten of the eighty prototype vectors given in [8].

v1    v2    v3    v4    v5    v6    v7    v8    v9    v10

-77   -57   -79     1   -27    47    29     3    99    43
-67   -57   -13    59    89    13   -59    -3    51   -35
-13    19    69   -55   -19   -37    69    35    25    11
 27   -65   -11    25    -5   -43    65   -43    27     5
-63    83    65   -27    47    45   -25    51    21   -65
 51    53    33   -33   -75   -71   -17   -23    29   -73
-87    67    11   -47   -93   -87    41    21     3   -97
-73   -69    67    33   -49   -21   -65     5    23    15
 53    73    53    31   -71   -37   -37    87    59   -41
 49    57    -1    67   -71   -91   -65   -17    43   -85

REFERENCES

[1] G. H. Ball, "Data analysis in the social sciences: What about the details?" 1965 Fall Joint Computer Conf., AFIPS Proc., vol. 27, pt. 1. Washington, D. C.: Spartan, 1965, pp. 533-559.

[2] G. H. Ball and D. J. Hall, "ISODATA, a novel method of data analysis and pattern classification," Stanford Research Institute, Menlo Park, Calif., April 1965.

[3] R. E. Bonner, "On some clustering techniques," IBM J. Res. and Develop., vol. 8, pp. 22-32, January 1964.

[4] R. G. Casey and G. Nagy, "An autonomous reading machine," IEEE Trans. Computers, vol. C-17, pp. 492-503, May 1968; also IBM Corp., Yorktown Heights, N. Y., Research Rept. RC-1768, February 1967.

[5] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Trans. Information Theory, vol. IT-13, pp. 21-27, January 1967.

[6] A. A. Dorofeyuk, "Teaching algorithms for a pattern recognition machine without a teacher based on the method of potential functions," Automation and Remote Control, vol. 27, pp. 1728-1737, December 1966.

[7] R. O. Duda and H. Fossum, "Pattern classification by iteratively determined linear and piecewise linear discriminant functions," IEEE Trans. Electronic Computers, vol. EC-15, pp. 220-232, April 1966.

[8] ——, "Computer-generated data for pattern recognition experiments," available from C. A. Rosen, Stanford Research Institute, Menlo Park, Calif., 1966.

[9] W. D. Fisher, "On grouping for maximum homogeneity," Amer. Stat. Assoc. J., vol. 53, pp. 789-798, 1958.

[10] O. Firschein and M. Fischler, "Automatic subclass determination for pattern-recognition applications," IEEE Trans. Electronic Computers (Correspondence), vol. EC-12, pp. 137-141, April 1963.

[11] E. W. Forgy, "Detecting natural clusters of individuals," presented at the 1964 Western Psych. Assoc. Meeting, Santa Monica, Calif., September 1964.

[12] J. A. Gengerelli, "A method for detecting subgroups in a population and specifying their membership," J. Psych., vol. 55, pp. 457-468, 1963.

[13] T. Kaminuma, T. Takekawa, and S. Watanabe, "Reduction of clustering problem to pattern recognition," Pattern Recognition, vol. 1, pp. 195-205, 1969.

[14] J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. 5th Berkeley Symp. on Math. Statist. and Prob. Berkeley, Calif.: University of California Press, 1967, pp. 281-297.

[15] R. L. Mattson and J. E. Dammann, "A technique for determining and coding subclasses in pattern recognition problems," IBM J. Res. and Develop., vol. 9, pp. 294-302, July 1965.

[16] G. Nagy, "State of the art in pattern recognition," Proc. IEEE, vol. 56, pp. 836-862, May 1968.

[17] D. J. Rogers and T. T. Tanimoto, "A computer program for classifying plants," Science, vol. 132, pp. 115-118, October 1960.

[18] C. A. Rosen and D. J. Hall, "A pattern recognition experiment with near-optimum results," IEEE Trans. Electronic Computers (Correspondence), vol. EC-15, pp. 666-667, August 1966.

[19] J. Rubin, "Optimal classification into groups: An approach for solving the taxonomy problem," IBM Rept. 320-2915, December 1966.

[20] J. H. Ward, "Hierarchical grouping to optimize an objective function," Amer. Stat. Assoc. J., vol. 58, pp. 236-244, 1963.

[21] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, pp. 338-353, 1965.
