Journal of Geology and Mining Research Vol. 3(1), pp. 1-6, January 2011 Available online http://www.academicjournals.org/jgmr ISSN 2006–9766 ©2011 Academic Journals Full Length Research Paper

Fuzzy c-means cluster analysis, a robust multivariate technique in stream sediment geochemical exploration, a case study in Eastern part of Iran, Birjand

Mohammad Shiva (1), Ahmad Aryafar (1)* and Soheil Zaremotlagh (2)

(1) Department of Mining, Faculty of Engineering, Birjand University, P. O. Box: 97175-376 Birjand, Iran.

(2) Department of Mining, Faculty of Engineering, Sistan and Baluchestan University, Zahedan, Iran.

Accepted 29 December, 2010

Fuzzy c-means cluster analysis (FCMC) was applied to the results of a geochemical exploration project in which 175 stream sediment samples were analyzed for 20 elements using X-ray Fluorescence Spectrometry (XRFS). The FCMC technique made it possible to separate the dominant associations of lithologies and to specify the background concentration for all lithotypes. The FCMC was found to be a powerful tool for eliminating the syngenetic component from geochemical data. Using the residual components of the geochemical signals, the anomalous locations were detected across the study area.

Key words: Fuzzy logic, geochemical exploration, stream sediment.

INTRODUCTION

The complexity of stream sediment compositions is a major problem in delineating the source of elemental enrichment, particularly where the catchment area includes various lithological units (Aryafar, 2004). Attempts to address this problem have been made by researchers such as Singh et al. (1997), Selinus and Esbensen (1995), Ranitisch (2000) and Ziaii et al. (2009), using samples unaffected by anthropogenic processes, statistical techniques, and the separation of anthropogenic from natural anomalies, respectively.

In this work, which is part of a national project conducted by the Geological Survey of Iran, the FCMC was used to produce the data matrix of background concentrations and, subsequently, the residual matrix of the data. The aim of this paper is to show the usefulness of fuzzy clustering, as a robust technique, for quantifying lithological background concentrations and detecting anomalous locations in stream sediment geochemical exploration in the Khusf 1:50000 sheet. The geographical map of the study area is shown in Figure 1. The distribution of samples throughout the study area is given in Figure 2.

*Corresponding author. E-mail: [email protected].

Principles of the FCMC Procedure

Number of clusters

The complexity and variation of the lithological units dictate the number of clusters that should be used. Since the study area contains many distinct lithologies, particularly in its northern and western parts, a predetermined number of 8 clusters was assigned to the data set, which comprises 175 samples and 20 elements (variables). K-means clustering was initially run with 8 hard clusters, as a prototype, to give a general view of the number of samples allocated to each cluster. This is an empirical approach; before 8 clusters were selected, 7, 9 and 10 clusters were also tested. Table 1 gives the results of the k-means clustering.

It is worth noting that the choice of the number of clusters may be tested using the partition coefficient (F) and the classification entropy (H), obtained from the following relations (Bezdek et al., 1984).

$$H = -\frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{c}\mu_{ki}\,\log(\mu_{ki}), \qquad 0 \le H \le \log(c) \qquad (1)$$

Figure 1. The geographical map of the study area.

Figure 2. Location map of samples in the study area (sample points numbered 1 to 175; axes: Easting, Northing).


Table 1. Distribution of samples in the 8 clusters.

Cluster    Number of samples assigned
1          5
2          3
3          22
4          2
5          17
6          47
7          69
8          10
Valid      175
Missing    0

$$F = \frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{c}(\mu_{ki})^{2}, \qquad \frac{1}{c} \le F \le 1 \qquad (2)$$

where $\mu_{ki}$ is the membership value of sample k in cluster i, N is the total number of samples and c is the number of clusters. Among several empirically chosen numbers of clusters, the initial best fit is the one that gives the highest value of F and the lowest value of H, although geological knowledge of the area may change this selection. Three criteria therefore help to allocate the best-fitting number of clusters to a specific data set: (a) calculation of the partition coefficient and the entropy; (b) consideration of the variety of lithological units in the area; and (c) use of k-means clustering to observe the distribution of the samples among the clusters.

The principles of the algorithm used in this study are described by Bezdek et al. (1984) and Kramar (1995). The data set is partitioned by the fuzzy c-means clustering method, in which the variables (elements) are allocated to the predetermined number of clusters. The initial operation yields an 8 × 20 matrix comprising 8 rows (clusters) and 20 columns (elements), as shown in Table 2 (the matrix is shown transposed to fit the page).
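As an illustration of criterion (a), the partition coefficient F of relation (2) and the classification entropy H of relation (1) can be computed from any membership matrix. The following is a minimal sketch, not the authors' implementation: the function names and the two small membership matrices are hypothetical, and H is written with the negative sign that makes it non-negative, in line with the stated bounds 0 ≤ H ≤ log(c).

```python
import math

def partition_coefficient(u):
    """F = (1/N) * sum_k sum_i mu_ki^2, where u[k][i] is the
    membership of sample k in cluster i."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n

def classification_entropy(u):
    """H = -(1/N) * sum_k sum_i mu_ki * log(mu_ki).
    Terms with mu_ki = 0 are skipped (they contribute 0 in the limit)."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0.0) / n

# Hypothetical memberships for 3 samples and c = 2 clusters (illustrative only).
hard = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]    # crisp assignment
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]   # maximally fuzzy assignment

print(partition_coefficient(hard), classification_entropy(hard))
print(partition_coefficient(fuzzy), classification_entropy(fuzzy))
```

The crisp matrix sits at the favourable end of both ranges (F = 1, H = 0), while the maximally fuzzy matrix sits at the other end (F = 1/c, H = log(c)), which is why a high F and a low H are read as a better-defined partition.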

The calculation of the cluster centers, Cij, is extracted from the following relation:

$$C_{ij} = \frac{\sum_{k=1}^{N}(\mu_{ik})^{q}\,X_{kj}}{\sum_{k=1}^{N}(\mu_{ik})^{q}} \qquad (3)$$

where $\mu_{ik}$ is the membership value of sample k in cluster i, q is the degree of fuzziness, and $X_{kj}$ is the value of variable j for sample k. Using these center values as prototypes, the membership values, µ, of the samples in the clusters are calculated. This operation generates a 175 × 8 matrix in this study. The membership values, µ, are obtained from the following relation:

$$\mu_{ik} = \frac{(d_{ik})^{-2/(q-1)}}{\sum_{j=1}^{c}(d_{jk})^{-2/(q-1)}} \qquad (4)$$

where $d_{ik}$ is the distance of sample k to the center of cluster i, and q is the degree of fuzziness. Based on these membership values, new cluster centers are calculated. This iterative calculation continues until the clusters remain stable. Because the allocation of the elements to the clusters is based on minimum distance, the degree of fuzziness, q, in the above relation must minimize the assignment variance. Thus, the clusters become stable when the minimum variance is obtained.

The degree of fuzziness, q, may vary from 1 to ∞: for a fuzzy coefficient of q = 1, hard clusters are obtained, and as q approaches ∞, all samples contribute equally to all clusters (Vriend et al., 1988). Hence, the most adequate fuzzy coefficient is the value that minimizes the variances in the clusters. Vriend et al. (1988) found this value empirically to lie between 1.3 and 3. In this study, a fuzzy coefficient of q = 1.5 gave the best results.

In this study, the multiplication of the two matrices, [175 × 8] and [8 × 20], gives a [175 × 20] matrix, which represents the background concentration of every element at each sample location. That is, for each sample and element, the new value is calculated as a weighted mean of the cluster centers, with the membership values as the weights. This calculated matrix differs somewhat from the measured data matrix. Subtracting the calculated matrix from the measured matrix gives the residual matrix.
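The alternation between relations (3) and (4) can be sketched in a few lines. The paper reports using the Matlab FCM function; the pure-Python version below is only an illustrative stand-in under stated assumptions: the function names, the stopping rule (largest change in any membership below a tolerance), the handling of a sample that coincides with a center, and the four-sample toy data set are all hypothetical and not taken from the study.

```python
import math

def update_centers(x, u, q):
    """Relation (3): C_ij = sum_k(mu_ik^q * X_kj) / sum_k(mu_ik^q).
    x[k][j] is variable j of sample k; u[k][i] is the membership of sample k in cluster i."""
    n, c, p = len(x), len(u[0]), len(x[0])
    centers = []
    for i in range(c):
        w = [u[k][i] ** q for k in range(n)]
        total = sum(w)
        centers.append([sum(w[k] * x[k][j] for k in range(n)) / total for j in range(p)])
    return centers

def update_memberships(x, centers, q):
    """Relation (4): mu_ik = d_ik^(-2/(q-1)) / sum_j d_jk^(-2/(q-1))."""
    expo = -2.0 / (q - 1.0)
    u = []
    for xk in x:
        d = [math.dist(xk, ci) for ci in centers]
        if min(d) == 0.0:
            # Assumption: a sample lying exactly on a center belongs fully to it.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            u.append([h / sum(hits) for h in hits])
        else:
            w = [di ** expo for di in d]
            u.append([wi / sum(w) for wi in w])
    return u

def fcm(x, u0, q=1.5, tol=1e-6, max_iter=200):
    """Iterate relations (3) and (4) until the memberships stop changing."""
    u = u0
    for _ in range(max_iter):
        centers = update_centers(x, u, q)
        new_u = update_memberships(x, centers, q)
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return u, centers

# Toy data: two well-separated pairs of samples and a rough initial guess.
x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
u0 = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.3, 0.7]]
u, centers = fcm(x, u0, q=1.5)
print(centers)
```

On this toy input the first two samples end up almost entirely in the first cluster and the last two in the second, with the centers close to the midpoint of each pair. With q = 1.5 the exponent in relation (4) is -4, so memberships fall off steeply with distance, which is the near-hard behaviour expected for small q.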

The residual proportion of the measured concentration is interpreted as the result of contamination or epigenetic mineralization. The residual matrix is calculated from the following equation:

$$\delta_{kj} = X_{kj} - \sum_{i=1}^{c}\mu_{ik}\,C_{ij} \qquad (5)$$

where $X_{kj}$ are the measured values of the variables (elements) and $\sum \mu \cdot C$ are the calculated values of the variables (elements). Since the effect of the syngenetic component has been removed by this calculation, the epigenetic component, which may be interpreted as real anomalies, can be mapped across the study area. The target areas for Cr, Zn, As and Cu are indicated in Figure 3.
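The background and residual calculation of equation (5) amounts to one matrix product and one subtraction. The sketch below is a simplified illustration under stated assumptions: it uses only two clusters and two elements instead of the study's 8 and 20, it borrows the Cr and Zn center values of clusters 7 and 8 from Table 2, and the sample's memberships and measured values are hypothetical.

```python
def background(u, centers):
    """Background matrix: B_kj = sum_i mu_ik * C_ij, i.e. the
    membership-weighted mean of the cluster centers for each sample."""
    return [[sum(uk[i] * centers[i][j] for i in range(len(centers)))
             for j in range(len(centers[0]))]
            for uk in u]

def residual(x, u, centers):
    """Equation (5): delta_kj = X_kj - sum_i mu_ik * C_ij."""
    b = background(u, centers)
    return [[xkj - bkj for xkj, bkj in zip(xk, bk)] for xk, bk in zip(x, b)]

# Cluster centers for (Cr, Zn) in ppm: clusters 7 and 8 of Table 2.
centers = [[13.5, 58.0],
           [60.3, 43.0]]
# Hypothetical sample: 75 % membership in the first cluster, 25 % in the second,
# with hypothetical measured Cr and Zn values in ppm.
u = [[0.75, 0.25]]
x = [[40.0, 60.0]]

delta = residual(x, u, centers)
print(background(u, centers))  # about [[25.2, 54.25]]
print(delta)                   # about [[14.8, 5.75]]
```

For this made-up sample, the background Cr is 0.75 × 13.5 + 0.25 × 60.3 = 25.2 ppm, so a measured 40 ppm leaves a positive residual of about 14.8 ppm. Positive residuals of this kind are what the paper maps as candidate anomalies.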

The fuzzy c-means clustering algorithm is illustrated in Figure 4.

Table 2. Element concentrations of the cluster centers. Minor elements are expressed in ppm.

Element      1      2      3      4      5      6      7      8
Zn           57     54.3   73.4   44.8   60.3   58.7   58     43
Pb           39     45.1   49.1   49.3   53.3   87.5   45.3   52.4
Ag           0.19   0.14   0.18   0.09   0.19   0.18   0.18   0.05
Cr           8      26.4   24.3   37.6   12.5   18.3   13.5   60.3
Ni           62.9   62.9   90.2   54.1   74.2   67.3   68.4   53.5
Bi           0.3    0.3    0.45   0.25   0.39   0.62   0.34   0.23
Sc           26.7   32.7   30.6   36.4   27.8   29.7   28.4   42.9
Cu           43.9   41.5   58.8   34.6   48     45     45.2   33.1
As           5.53   6.06   8.66   6.08   7.71   13.4   6.56   6.65
Sb           6.6    6.81   6.74   6.93   6.58   6.71   6.67   7.09
Cd           0.63   0.74   1      0.79   0.93   1.54   0.77   0.9
Co           8.71   15.1   14.3   17.3   10.1   12.2   10.8   23.5
Sn           3.28   4.29   3.39   5.03   3.36   3.62   3.51   5.97
Ba           719    712    678    716    698    581    709    712
V            145    196    181    232    154    170    159    295
Sr           173    181    213    186    208    276    188    196
Hg           0.02   0.02   0.03   0.02   0.02   0.03   0.02   0.02
Fe2O3 %      3.37   3.18   4.7    2.56   3.7    3.46   3.47   2.46
MnO %        0.09   0.09   0.1    0.09   0.09   0.09   0.09   0.1
TiO2 %       2.63   4.15   2.72   5.44   2.63   3.32   2.92   7.03

Figure 3. The residual values (anomalies) for the elements Cr, Zn, As, Cu (panels a to d).

Figure 4. Fuzzy c-means Clustering Algorithm. Flowchart steps: Data Matrix → Matlab Software (Fuzzy, FCM Function) → Matrix of Membership Values, [a], and Matrix of Cluster Centers, [b] → Multiplication of Matrices, [a], [b] → Matrix of Background Values, [c] → Subtraction, Measured Matrix and Matrix [c] → Matrix of Residual.

DISCUSSION AND CONCLUSIONS

The high cluster memberships, mapped as indicated in Figure 2 together with their relevant elements, coincide with the anomaly locations in the residual maps for a given element. The anomalous location of Cr is indicated in the western part of the study area, while the high tenors of Cr are clearly seen in Figure 3c. The remaining anomaly points seen in the cluster map are related to the other enriched elements of that cluster. The good agreement of the high cluster memberships with the known geological units and the known mineralization (where present) is an advantage of this technique.

Fuzzy c-means clustering provided an effective way to specify the background concentrations in stream sediment geochemical exploration. With this technique, the fluctuations of element contents in stream sediments, which normally lead to misinterpretation, are avoided.

ACKNOWLEDGMENTS

The authors would like to express their appreciation to the authorities of the Geological Survey of Iran, who provided the facilities for mutual relations between GSI and the University of Birjand. The financial support of the Geological Survey of Iran is gratefully acknowledged.

REFERENCES

Aryafar A (2004). The analysis of the geochemical data in order to recognize the promising area in Khusf 1:50000 sheets, MSc Thesis, the University of Shahrood, Shahrood, Iran.

Bezdek JC, Ehrlich RR, Full W (1984). FCM: the fuzzy c-means clustering algorithm. Comput. Geosci., 10: 191-203.

Kramar U (1995). Application of limited fuzzy clusters to anomaly recognition in complex geological environments. J. Geochem. Explor., 55: 81-92.

Ranitisch G (2000). Application of fuzzy clusters to quantify lithological background concentrations in stream-sediment geochemistry. J. Geochem. Explor., 71: 73-82.

Selinus OS, Esbensen K (1995). Separating anthropogenic from natural anomalies in environmental geochemistry. J. Geochem. Explor., 55: 55-66.

Singh M, Ansari AA, Muller G, Singh IB (1997). Heavy metal in freshly deposited sediments of the Gomati River (a tributary to the Ganga River): effects of human activities. Environ. Geol., 29: 246-252.

Vriend SP, Van Gaans PFM, Middelburg J, De Nijis A (1988). The application of fuzzy c-means cluster analysis and nonlinear mapping to geochemical datasets: examples from Portugal. Appl. Geochem., 3: 213-224.

Ziaii M, Pouyan AA, Ziaii M (2009). Neuro-fuzzy modeling in mining geochemistry: identification of geochemical anomalies. J. Geochem. Explor., 100: 25-36.