Journal of Geology and Mining Research Vol. 3(1), pp. 1-6, January 2011 Available online http://www.academicjournals.org/jgmr ISSN 2006–9766 ©2011 Academic Journals Full Length Research Paper
Fuzzy c-means cluster analysis, a robust multivariate technique in stream sediment geochemical exploration,
a case study in Eastern part of Iran, Birjand
Mohammad Shiva1, Ahmad Aryafar1* and Soheil Zaremotlagh2
1Department of Mining, Faculty of Engineering, Birjand University, P. O. Box: 97175-376 Birjand, Iran.
2Department of Mining, Faculty of Engineering, Sistan and Baluchestan University, Zahedan, Iran.
Accepted 29 December, 2010
Fuzzy c-means cluster analysis (FCMC) was applied to the results of a geochemical exploration project in which 175 stream sediment samples were analyzed for 20 elements using X-ray fluorescence spectrometry (XRFS). The FCMC technique made it possible to separate the dominant lithological associations and to specify the background concentration for each lithotype. FCMC was found to be a powerful tool for eliminating the syngenetic component from geochemical data. Using the residual components of the geochemical signals, anomalous locations were detected across the study area.
Key words: Fuzzy logic, geochemical exploration, stream sediment.
INTRODUCTION
The complexity of stream sediment composition is a major problem in delineating the source of elemental enrichment, particularly where the catchment area includes various lithological units (Aryafar, 2004). Researchers such as Singh et al. (1997), Selinus and Esbensen (1995), Ranitisch (2000) and Ziaii et al. (2009) have addressed this problem using samples unaffected by anthropogenic processes, statistical techniques, and the separation of anthropogenic from natural anomalies.
In this work, which is part of a national project conducted by the Geological Survey of Iran, FCMC was used to produce the data matrix of background concentrations and, subsequently, the residual data matrix. The aim of this paper is to demonstrate the usefulness of fuzzy clustering as a robust technique for quantifying lithological background concentrations and detecting anomalous locations in stream sediment geochemical exploration in the Khusf 1:50000 sheet. The geographical map of the study area is shown in Figure 1, and the distribution of samples throughout the study area is given in Figure 2.
*Corresponding author. E-mail: [email protected].
Principles of the FCMC Procedure
Number of clusters
The complexity and variation of the lithological units dictate the number of clusters that should be used. Because the study area contains many distinct lithologies, particularly in its northern and western parts, 8 predetermined clusters were assigned to the data set of 175 samples and 20 elements (variables). K-means clustering with 8 hard clusters was used first, as a prototype, to obtain a general view of the number of samples allocated to each cluster. This is an empirical approach; before 8 clusters were selected, 7, 9 and 10 clusters were also tested. Table 1 shows the results of the k-means clustering.
It is worth noting that the number of clusters may also be tested using the partition coefficient (F) and the classification entropy (H), given by the following relations (Bezdek et al., 1984).
$$H = -\frac{1}{N} \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_{ki} \log(\mu_{ki}), \qquad 0 \le H \le \log(c) \tag{1}$$
Figure 1. The geographical map of the study area.
Figure 2. Location map of samples in the study area (175 numbered sample sites; horizontal axis: Easting, 665000 to 685000; vertical axis: Northing, 3630000 to 3650000).
Table 1. Distribution of samples in the 8 clusters.
Cluster                      1    2    3    4    5    6    7    8
Number of samples assigned   5    3    22   2    17   47   69   10
Valid: 175; Missing: 0
$$F = \frac{1}{N} \sum_{k=1}^{N} \sum_{i=1}^{c} (\mu_{ki})^{2}, \qquad \frac{1}{c} \le F \le 1 \tag{2}$$
where $\mu_{ki}$ is the membership value of sample k in cluster i, N is the total number of samples and c is the number of clusters. Among several empirically chosen numbers of clusters, the initial best fit is the number that gives the highest value of F and the lowest value of H, although geological knowledge of the area may change this selection. Three criteria therefore help to select the best number of clusters for a specific data set: (a) calculation of the partition coefficient and the entropy, (b) consideration of the variety of lithological units in the area, and (c) use of k-means clustering to observe the distribution of the samples among the clusters. The principles of the algorithm used in this study are described by Bezdek et al. (1984) and Kramar (1995). The data set is partitioned by the fuzzy c-means clustering method, in which the samples, each described by its variables (elements), are allocated to the predetermined number of clusters. The matrix resulting from the initial operation is an 8 × 20 matrix of cluster centers comprising 8 rows (clusters) and 20 columns (elements), as shown in Table 2 (the matrix is shown transposed to fit the page).
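As a minimal sketch of relations (1) and (2), the following Python functions compute the partition coefficient F and the classification entropy H from a membership matrix stored as a list of rows (one row per sample, one column per cluster). The function names and the two small membership matrices are hypothetical illustrations and are not taken from the study; the natural logarithm is assumed.

```python
import math


def partition_coefficient(memberships):
    """F = sum over samples and clusters of mu^2, divided by N (relation 2)."""
    n_samples = len(memberships)
    return sum(mu * mu for row in memberships for mu in row) / n_samples


def classification_entropy(memberships):
    """H = -(sum over samples and clusters of mu*log(mu)) / N (relation 1)."""
    n_samples = len(memberships)
    total = 0.0
    for row in memberships:
        for mu in row:
            if mu > 0.0:  # mu*log(mu) tends to 0 as mu tends to 0
                total += mu * math.log(mu)
    return -total / n_samples


# Hypothetical memberships for 2 samples and c = 3 clusters.
hard_partition = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
uniform_partition = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]

for name, u in (("hard", hard_partition), ("uniform", uniform_partition)):
    print(name, partition_coefficient(u), classification_entropy(u))
```

For the hard partition F reaches its upper bound of 1 and H its lower bound of 0; for the uniform partition F falls to 1/c and H rises to log(c). This is why a high F together with a low H points to a well-separated choice of cluster number.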
The calculation of the cluster centers, Cij, is extracted from the following relation:
$$C_{ij} = \frac{\sum_{k=1}^{n} (\mu_{ik})^{q} \, X_{kj}}{\sum_{k=1}^{n} (\mu_{ik})^{q}} \tag{3}$$
where $\mu_{ik}$ is the membership value of sample k in cluster i, q is the degree of fuzziness, and $X_{kj}$ is the value of variable j for sample k. Using these center values as prototypes, the membership values, µ, of the samples in the clusters are calculated. This operation generates a 175 × 8 matrix in this study. The membership values, µ, are obtained from the following relation (here the summation index j runs over the c clusters):
$$\mu_{ik} = \frac{(d_{ik}^{2})^{-1/(q-1)}}{\sum_{j=1}^{c} (d_{jk}^{2})^{-1/(q-1)}} \tag{4}$$
where $d_{ik}$ is the distance of sample k from the center of cluster i, and q is the degree of fuzziness. Based on these membership values, new cluster centers are calculated. The procedure is iterated until the clusters remain stable. Because the allocation of the samples to the clusters is based on minimum distance, the degree of fuzziness, q, in the above relation must minimize the assignment variance; the clusters are stable when the minimum variance is obtained. The degree of fuzziness q may vary from 1 to ∞: for q = 1, hard clusters are obtained, and as q approaches ∞, all samples contribute equally to all clusters (Vriend et al., 1988). Hence the most suitable fuzzy coefficient is the value that minimizes the variances within the clusters. Vriend et al. (1988) found empirically that this value lies between 1.3 and 3. In this study, a fuzzy coefficient of q = 1.5 gave the best results.
In this study, multiplying the two matrices, [175 × 8] and [8 × 20], gives a [175 × 20] matrix, which represents the background concentration of every element at each sample location. In other words, for each sample and element a new value is calculated as a weighted mean of the cluster centers, with the membership values as weights. This calculated matrix differs from the matrix of measured data. Subtracting the calculated matrix from the measured matrix gives the residual matrix.
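The alternation between relations (3) and (4) can be sketched in plain Python as follows. This is an illustrative sketch only and not the Matlab routine used in the study: the function names, the stopping tolerance, the iteration cap, the use of squared Euclidean distance and the six-sample toy data set are all assumptions made for the example.

```python
def update_centers(data, memberships, q):
    """Relation (3): centers are means of the samples weighted by mu^q.

    Assumes every cluster keeps some non-zero membership.
    """
    n_clusters = len(memberships[0])
    n_vars = len(data[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** q for row in memberships]
        total = sum(weights)
        centers.append([
            sum(w * sample[j] for w, sample in zip(weights, data)) / total
            for j in range(n_vars)
        ])
    return centers


def update_memberships(data, centers, q):
    """Relation (4): mu_ik is proportional to (d_ik^2)^(-1/(q-1))."""
    memberships = []
    for sample in data:
        # Squared Euclidean distance to every center (assumed metric).
        d2 = [sum((x - c) ** 2 for x, c in zip(sample, center))
              for center in centers]
        if min(d2) == 0.0:
            # Sample lies exactly on a center: share full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            memberships.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-1.0 / (q - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([v / total for v in inv])
    return memberships


def fuzzy_c_means(data, initial_centers, q=1.5, tol=1e-6, max_iter=200):
    """Iterate relations (4) and (3) until the centers stop moving."""
    if q <= 1.0:
        raise ValueError("q must be greater than 1")
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        memberships = update_memberships(data, centers, q)
        new_centers = update_centers(data, memberships, q)
        shift = max(abs(a - b) for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return update_memberships(data, centers, q), centers


# Hypothetical toy data: two well-separated groups of three samples each.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
memberships, centers = fuzzy_c_means(samples, [[0.0, 0.0], [10.0, 10.0]], q=1.5)
```

With real data the returned membership matrix would have one row per sample and one column per cluster (175 × 8 in this study), and the center matrix one row per cluster and one column per element (8 × 20). Each membership row sums to 1, so a sample that lies between two lithologies is shared between their clusters rather than forced into one.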
The residual proportion of the measured concentration is interpreted as the result of contamination or epigenetic mineralization. The residual matrix is calculated from the following equation:
$$\delta_{kj} = X_{kj} - \sum_{i=1}^{c} \mu_{ik} \, C_{ij} \tag{5}$$
where $X_{kj}$ are the measured values of the variables (elements) and $\sum \mu \cdot C$ are the calculated values of the variables (elements). Because the syngenetic component is removed by this calculation, the epigenetic component, which may be interpreted as real anomalies, can be mapped across the study area. The target areas for Cr, Zn, As and Cu are shown in Figure 3.
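The background and residual calculation of relation (5) amounts to one matrix product and one subtraction. The sketch below shows this in plain Python under stated assumptions: the function names are hypothetical, and the membership, center and measured values are invented numbers for two samples, two clusters and two elements, not data from the study.

```python
def background_matrix(memberships, centers):
    """B_kj = sum_i mu_ik * C_ij: an [N x c] matrix times a [c x m] matrix."""
    n_vars = len(centers[0])
    return [
        [sum(mu * center[j] for mu, center in zip(row, centers))
         for j in range(n_vars)]
        for row in memberships
    ]


def residual_matrix(measured, memberships, centers):
    """Relation (5): delta_kj = X_kj - sum_i mu_ik * C_ij."""
    background = background_matrix(memberships, centers)
    return [
        [x - b for x, b in zip(x_row, b_row)]
        for x_row, b_row in zip(measured, background)
    ]


# Invented example: rows are samples, columns are clusters or elements.
example_memberships = [[0.8, 0.2], [0.1, 0.9]]
example_centers = [[10.0, 50.0], [20.0, 100.0]]
example_measured = [[15.0, 60.0], [19.0, 140.0]]

example_background = background_matrix(example_memberships, example_centers)
example_residuals = residual_matrix(
    example_measured, example_memberships, example_centers
)
print(example_background)
print(example_residuals)
```

In this invented case the second sample exceeds its background for the second element by about 45 units while its first element matches the background, so only the second element would be flagged as a candidate anomaly at that location.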
The fuzzy c-means clustering algorithm is illustrated in Figure 4.
Table 2. The element concentrations of the cluster centers. Minor elements are expressed in ppm.
Element      Cluster 1   2       3       4       5       6       7       8
Zn           57          54.3    73.4    44.8    60.3    58.7    58      43
Pb           39          45.1    49.1    49.3    53.3    87.5    45.3    52.4
Ag           0.19        0.14    0.18    0.09    0.19    0.18    0.18    0.05
Cr           8           26.4    24.3    37.6    12.5    18.3    13.5    60.3
Ni           62.9        62.9    90.2    54.1    74.2    67.3    68.4    53.5
Bi           0.3         0.3     0.45    0.25    0.39    0.62    0.34    0.23
Sc           26.7        32.7    30.6    36.4    27.8    29.7    28.4    42.9
Cu           43.9        41.5    58.8    34.6    48      45      45.2    33.1
As           5.53        6.06    8.66    6.08    7.71    13.4    6.56    6.65
Sb           6.6         6.81    6.74    6.93    6.58    6.71    6.67    7.09
Cd           0.63        0.74    1       0.79    0.93    1.54    0.77    0.9
Co           8.71        15.1    14.3    17.3    10.1    12.2    10.8    23.5
Sn           3.28        4.29    3.39    5.03    3.36    3.62    3.51    5.97
Ba           719         712     678     716     698     581     709     712
V            145         196     181     232     154     170     159     295
Sr           173         181     213     186     208     276     188     196
Hg           0.02        0.02    0.03    0.02    0.02    0.03    0.02    0.02
Fe2O3 %      3.37        3.18    4.7     2.56    3.7     3.46    3.47    2.46
MnO %        0.09        0.09    0.1     0.09    0.09    0.09    0.09    0.1
TiO2 %       2.63        4.15    2.72    5.44    2.63    3.32    2.92    7.03
Figure 3. The residual values (anomalies) for the elements (a) Cr, (b) Zn, (c) As and (d) Cu (scale ranges: Cr, -20 to 25; Zn, -20 to 55; As, -7 to 13; Cu, -15 to 40).
Data Matrix
-> Matlab Software (Fuzzy FCM Function)
-> Matrix of Membership Values, [a], and Matrix of Cluster Centers, [b]
-> Multiplication of Matrices [a] and [b]
-> Matrix of Background Values, [c]
-> Subtraction, Measured Matrix and Matrix [c]
-> Matrix of Residual
Figure 4. Fuzzy c-means clustering algorithm.
DISCUSSION AND CONCLUSIONS
The high cluster memberships, mapped as indicated in Figure 2 together with their relevant elements, coincide with the anomaly locations in the residual map of the corresponding element. For example, the anomalous location of Cr is indicated in the western part of the study area, where the high Cr values are clearly seen in Figure 3a. The remaining anomaly points seen in the cluster map are related to the other enriched elements of that cluster. The good agreement of high cluster membership values with the known geological units and with known mineralization (where present) is an advantage of this technique. Fuzzy c-means clustering provided an effective way to specify background concentrations in stream sediment geochemical exploration. With this technique, the misinterpretations that normally arise from fluctuations in the element contents of stream sediments are avoided.
ACKNOWLEDGMENTS
The author would like to express his appreciation to the authorities of the Geological Survey of Iran, who provided
the facilities of mutual relations between GSI and the University of Birjand. The financial support of the Geological Survey of Iran is gratefully acknowledged.
REFERENCES
Aryafar A (2004). The analysis of the geochemical data in order to recognize the promising area in Khusf 1:50000 sheets. MSc Thesis, the University of Shahrood, Shahrood, Iran.
Bezdek JC, Ehrlich RR, Full W (1984). FCM: the fuzzy c-means clustering algorithm. Comput. Geosci., 10: 191-203.
Kramar U (1995). Application of limited fuzzy clusters to anomaly recognition in complex geological environments. J. Geochem. Explor., 55: 81-92.
Ranitisch G (2000). Application of fuzzy clusters to quantify lithological background concentrations in stream-sediment geochemistry. J. Geochem. Explor., 71: 73-82.
Selinus OS, Esbensen K (1995). Separating anthropogenic from natural anomalies in environmental geochemistry. J. Geochem. Explor., 55: 55-66.
Singh M, Ansari AA, Muller G, Singh IB (1997). Heavy metal in freshly deposited sediments of the Gomati River (a tributary to the Ganga River): effects of human activities. Environ. Geol., 29: 246-252.
Vriend SP, Van Gaans PFM, Middelburg J, De Nijis A (1988). The application of fuzzy c-means cluster analysis and nonlinear mapping to geochemical datasets: examples from Portugal. Appl. Geochem., 3: 213-224.
Ziaii M, Pouyan AA, Ziaii M (2009). Neuro-fuzzy modeling in mining geochemistry: identification of geochemical anomalies. J. Geochem. Explor., 100: 25-36.