
Mean shift-based clustering of remotely sensed data1

LIOR FRIEDMAN†, NATHAN S. NETANYAHU†,3 and MAXIM SHOSHANI‡

† Dept. of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel
‡ Dept. of Civil and Environmental Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel
3 Center for Automation Research, University of Maryland at College Park, MD 20742, USA

The mean shift algorithm (MSA) is a statistical approach to the clustering problem. The method is a variant of density estimation. We present in this paper the approach and its use for clustering of remotely sensed images. We explain how the approach can be applied to this type of data and show experimental results obtained from real data sets, which indicate that the MSA technique has fairly good accuracy and high reliability. The adaptation of the procedure to a parallel environment is also presented and discussed.

1. Introduction

Unsupervised clustering plays a most significant role in numerous applications of image processing and remote sensing. For example, unsupervised clustering is often used to ultimately classify an area of interest into land cover categories. The approach is especially useful when reliable training data are either scarce or expensive, and when relatively little is known about the data. Thus, unsupervised clustering serves as a fundamental building block in the pursuit of unsupervised classification (Anil et al. 2000).

An approach based on the principle of mean shift (MS) (Backer 1995) has been pursued in recent years for image segmentation and clustering (Bezdek 1981, Castleman 1996, Cihlar et al. 2000). The principle comprises, essentially, a variant of statistical density estimation. Typically, an MS procedure (which operates in feature space) examines - for each data point - the "center of mass" of its local neighborhood and then shifts the point in the general direction of that center of mass.  This process is repeated iteratively until every point converges to its cluster center.
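The "center of mass" iteration just described can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation; the `radius` parameter and the flat (equal-weight) neighborhood rule are our assumptions:

```python
import numpy as np

def mean_shift(points, radius, max_iter=100, tol=1e-6):
    """Shift every point toward the centroid ("center of mass") of its
    neighborhood in feature space, repeating until the shifts become
    negligible.  Flat kernel: all neighbors within `radius` count equally."""
    shifted = points.astype(float).copy()
    for _ in range(max_iter):
        moved = 0.0
        for i, p in enumerate(shifted):
            # neighborhood of p among the ORIGINAL data points
            neighbors = points[np.linalg.norm(points - p, axis=1) <= radius]
            new_p = neighbors.mean(axis=0)
            moved = max(moved, np.linalg.norm(new_p - p))
            shifted[i] = new_p
        if moved < tol:       # every point has converged to a cluster center
            break
    return shifted

# two well-separated blobs collapse onto their two cluster centers
data = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])
centers = mean_shift(data, radius=1.0)
```

Each row of `centers` is the cluster center its input point converged to, so points of the same cluster end up with (numerically) identical rows.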

Given the special characteristics of MS-based clustering (see below) and its enhanced performance on colored images (Castleman 1996), we believe that it is of interest to the remote sensing community to investigate the method's applicability to large, multi-spectral data sets as well. In general, an MS-based approach has the following characteristics:

1 A preliminary version of this paper appeared in Proceedings of the IEEE International Conference on Geosciences and Remote Sensing (Friedman et al. 2003).


1. It does not assume a specific type of data distribution, unlike many standard clustering approaches which assume, e.g., a Gaussian distribution. Instead, it takes the most general approach, whereby the immediate region of each point is examined and a corresponding cluster is estimated.

2. Unlike ISOCLUS (or k-means), it does not require a pre-specified number of clusters.

3. In general, it is fully deterministic (assuming that its specific implementation does not depend on selecting initially a subset of data points).  This makes it easy to understand and analyze.

4. It is highly parallelizable, which should prove especially valuable and efficient in remote sensing applications.  (Parallelization can be realized, e.g., by applying the MS procedure in parallel to disjoint feature subsets and combining the individual clusters obtained.)

Many standard clustering approaches make certain assumptions as to the data distribution. The most common is that of a Gaussian distribution. The mean shift procedure does not make any such assumptions. Instead, it takes the most general approach to the problem, by examining for each point its neighboring region, and estimating which cluster the point belongs to according to its local density.

The mean shift algorithm is a deterministic process. Although some variants of the process depend on an initial choice of points from the starting data set, the main approach by which we treat the entire data set is deterministic and is, therefore, easier to understand and analyze.

In addition, the algorithm is highly parallel, an attribute that is crucial in remote sensing applications, whose aim is to analyze satellite-acquired images of very large regions. Typically these images are multispectral and cover a large area, resulting in very large data sets.

Recent hardware developments have significantly reduced the cost of parallel systems. The penetration of local area networks into the office environment also makes such systems more pervasive. Overall, the growing availability of CPU power for distributed processing makes it possible nowadays to process larger data sets more accurately, which, as mentioned, is critical in remote sensing applications.

We have studied the mean shift algorithm in the context of clustering. The main goal of this paper is to demonstrate the applicability of the mean shift approach to clustering of remotely sensed data. Furthermore, we demonstrate the use of parallelism to reduce the running time of the procedure, thereby demonstrating the benefits of local area networks in the computation of demanding image processing applications.

The rest of the paper is organized as follows: Section 2 contains background on the mean shift procedure. In Section 3 we present the steps taken to demonstrate the applicability of mean shift in remote sensing, and the steps taken for a parallel implementation. In Section 4 we show the results achieved by the work described, and in Section 5 we present our conclusions.

2. Background of the mean shift algorithm

The idea of the mean shift algorithm was first suggested by Fukunaga et al. (1975) for use in cluster analysis (Silverman 1986, Castleman 1996). It is categorized within the statistical approaches for clustering or, more specifically, as a density estimation approach. Simply put, the idea behind the algorithm is to estimate, in the proximity of each point, the average density, and to shift the point in that general direction.


Mean shift is a simple iterative procedure, where each data point is “shifted” toward the average of the data points in its neighborhood. The output of this algorithm is a mapping for each point in the original data set S into its appropriate cluster center. This information is denoted by the vector of means.

The following diagram, taken from Comaniciu et al. (1999), depicts the trajectory of a single point during the mean shift process.

Figure 1: Trajectory of a point in a mean shift procedure.

Cheng (1995) generalized the original definition in three aspects:

1. First, he allows the use of any kernel instead of the flat kernel. A kernel function is defined by him as follows: Let X be the n-dimensional Euclidean space, R^n. Denote the i-th component of x ∈ X by x_i. The norm of x is the nonnegative number ||x|| such that ||x||^2 = Σ_i x_i^2. The inner product of x and y in X is <x, y> = Σ_i x_i y_i. A function K: X → R is said to be a kernel if there exists a profile k: [0, ∞) → R, such that K(x) = k(||x||^2) and

k is non-negative,

k is non-increasing, i.e., k(a) ≥ k(b) if a < b, and

k is piecewise continuous and ∫_0^∞ k(r) dr < ∞.
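As a rough illustration (ours, not Cheng's), the profile/kernel relationship K(x) = k(||x||^2) can be coded directly; the flat and Gaussian profiles below are standard examples satisfying the three conditions:

```python
import numpy as np

# A profile k maps squared norms to weights; the induced kernel is
# K(x) = k(||x||^2).  Both profiles below are non-negative, non-increasing,
# and integrable on [0, inf), as required.
def flat_profile(r, lam=1.0):
    return np.where(r <= lam, 1.0, 0.0)

def gaussian_profile(r, beta=1.0):
    return np.exp(-beta * r)

def kernel(profile, x):
    """Evaluate the kernel induced by `profile` at the point x."""
    x = np.asarray(x, dtype=float)
    return profile(np.dot(x, x))
```

For example, `kernel(flat_profile, x)` is 1 inside the unit ball and 0 outside it.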

2. The second generalization is weighting the points in the computation of the sample mean. This weight function can be constant throughout the procedure or can change between iterations.

3. Finally, he allowed the initial set picked to be any subset of the original data.

This leads to the following generalization of the mean shift procedure: Let S ⊂ X be a finite set (the "data"), let K be a kernel, and let w: S → (0, ∞) be a weight function. The sample mean at x ∈ X with kernel K is defined as:

m(x) = Σ_{s ∈ S} K(s - x) w(s) s / Σ_{s ∈ S} K(s - x) w(s)        (1)

Let T ⊂ X be a finite set (the "cluster centers"). The evolution of T in the form of iterations T ← m(T), with m(T) = {m(t) : t ∈ T}, is called the generalized mean shift algorithm.
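Equation (1) translates almost verbatim into code. The sketch below is illustrative (the helper name `sample_mean` and the inline flat profile are our choices):

```python
import numpy as np

def sample_mean(x, S, profile, w=None):
    """Sample mean m(x) of Eq. (1): a kernel- and weight-weighted average of
    the data S around x, with K(s - x) = profile(||s - x||^2).  The weight
    function w is uniform if omitted."""
    S = np.asarray(S, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.ones(len(S)) if w is None else np.asarray(w, dtype=float)
    sq_dists = np.sum((S - x) ** 2, axis=1)
    coeff = profile(sq_dists) * w            # K(s - x) w(s) for every s
    total = coeff.sum()
    if total == 0.0:                         # empty neighborhood: stay put
        return x
    return (coeff[:, None] * S).sum(axis=0) / total

flat = lambda r: (r <= 1.0).astype(float)    # flat kernel, unit radius
m = sample_mean([0.0, 0.0], [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]], flat)
```

Here only the two points near the origin fall inside the kernel support, so `m` is their plain average.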

The original mean shift version proposed by Fukunaga et al. (1975) initially assigns T = S; it also uses the computed means at the end of each iteration to replace S. This procedure is known as the "blurring" process. However, as Cheng (1995) showed, T can be assigned any subset of the data while S remains intact throughout the procedure. It will be shown later that this convention has great advantages in constructing a parallel implementation of the mean shift. The pseudocode of the generalized version of the mean shift algorithm is provided below in Figure 2.

Although in the mean shift procedure each point is shifted towards the maximum average density of its neighborhood, the convergence of the process is not guaranteed. Indeed, each iteration only guarantees convergence in infinitesimal steps. Cheng et al. (1992) first proved the convergence of the process for the discrete case. This was also shown by Comaniciu and Meer (1999). Cheng (1995) also addressed issues such as the rate of convergence, the number of clusters produced by the process, and the effect of the kernel size on the convergence rate.

Comaniciu et al. (2001) further studied the effect of the kernel function and the window size on the process. In Comaniciu et al. (1999) they suggested the use of the Epanechnikov kernel, which minimizes the mean integrated squared error (MISE), in filtering and segmentation of an image. This kernel is defined by

K_E(x) = (1/2) c_d^{-1} (d + 2)(1 - ||x||^2)   if ||x||^2 < 1, and 0 otherwise,

where c_d is the volume of the unit d-dimensional sphere. It was used by them in applying the procedure to a so-called joint spatial-range domain for segmentation. In Comaniciu et al. (2001) the advantages of using an adaptive window size were introduced. It was shown that for some applications it is beneficial to allow the kernel computation to adapt for each evaluated point according to its local neighborhood.
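For illustration, the Epanechnikov kernel in the standard form above can be coded directly; the normalization uses the unit-ball volume c_d, and the function name is ours:

```python
import math
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel (sketch): (1/2) c_d^{-1} (d+2)(1 - ||x||^2)
    inside the unit ball, 0 outside it."""
    x = np.asarray(x, dtype=float)
    d = x.size
    sq = np.dot(x, x)
    if sq >= 1.0:
        return 0.0
    c_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)   # unit d-ball volume
    return 0.5 * (d + 2) * (1.0 - sq) / c_d
```

In one dimension this recovers the familiar parabola 0.75 (1 - x^2) on [-1, 1].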


As can be seen from Cheng's generalization and other related work (Meer 2005), before any attempt to implement a mean shift procedure one must consider the following parameters:

1. the specific algorithmic variant,

2. kernel function to be used, and

3. window size.

Input: initial dataset T ⊆ S, kernel function K, weight function w

do {    // ------- a single shift iteration (computes m(T))
    for each t ∈ T {
        num ← 0    // will hold the new sample mean of t
        den ← 0    // will hold the weighted sum of the kernel
        for each s ∈ S {
            num ← num + K(s - t) w(s) s
            den ← den + K(s - t) w(s)
        }
        m(t) ← num / den
    }
    T ← m(T)
    S ← T    // optional (done in the blurring variant)
} while T changed    // until convergence

Figure 2: Pseudocode of the generalized mean shift algorithm.
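The pseudocode of Figure 2 can be turned into a small executable sketch. The function below is our illustrative rendering (names, default tolerance, and iteration cap are assumptions); it covers both the plain and the blurring variants:

```python
import numpy as np

def generalized_mean_shift(S, T, profile, w=None, blurring=False,
                           tol=1e-8, max_iter=500):
    """Evolve T by T <- m(T) against the data S, where the kernel is
    K(s - t) = profile(||s - t||^2).  In the blurring variant, S is
    replaced by the new means after each iteration."""
    S = np.asarray(S, dtype=float)
    T = np.asarray(T, dtype=float).copy()
    w = np.ones(len(S)) if w is None else np.asarray(w, dtype=float)
    for _ in range(max_iter):
        new_T = np.empty_like(T)
        for i, t in enumerate(T):
            coeff = profile(np.sum((S - t) ** 2, axis=1)) * w
            total = coeff.sum()
            new_T[i] = (coeff[:, None] * S).sum(axis=0) / total if total else t
        shift = np.max(np.linalg.norm(new_T - T, axis=1))
        T = new_T
        if blurring:
            S = T.copy()          # optional step of the blurring variant
        if shift < tol:           # until convergence
            break
    return T

flat = lambda r: (r <= 1.0).astype(float)
data = np.array([[0.0], [0.2], [4.0], [4.2]])
modes = generalized_mean_shift(data, data, flat)
```

With T initialized to S and a flat kernel, each pair of nearby points converges to its common midpoint.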

The exact effect of these parameters on the results will be discussed in the following sections. It suffices to say that different combinations of these parameters will result in different clustering outcomes.

One of the downsides of the mean shift procedure is its computational cost. In the naive approach, the new sample means are computed for all the points in the subset at each iteration. In order to compute it for a single point, we go over all the points in the original data set and apply to them the kernel function. Applying the kernel function is proportional to the dimensionality of the data, resulting in an overall complexity of O(N m n d), where:

n is the number of points in the entire dataset (S),

m the number of points chosen for the subset T,


d the dimension of the feature space, and

N the number of iterations until convergence is achieved.

Several approaches for reducing the complexity of the algorithm were explored. Comaniciu and Meer (1998) combined the mean shift procedure with the k-nearest neighbor technique in order to reduce the running time by using a smaller number of points (m). They chose a relatively small, evenly distributed subset T and applied to it the mean shift procedure. The rest of the points in the data set were associated with the resulting clusters using a k-nearest neighbor approach.

A different approach was taken by Elgammal et al. (2003), who showed how the use of the fast Gauss transform (FGT) can speed up the calculation of the Gaussian kernel function. The FGT is an important variant of the fast multipole method, which relies on the fact that the computation of the Gaussian is required only up to a certain degree of accuracy. This approach was further generalized to higher dimensions by Yang et al. (2003), who showed that although a direct extension of the FGT to higher dimensions is exponential, an improved version of this approach can still achieve linear computational complexity.

Another method for improving the computation of the sample mean was introduced by Georgescu et al. (2003). They showed how the use of locality sensitive hashing (LSH) can reduce the time it takes to compute adaptive mean shift: by improving the time it takes to perform neighborhood queries (when evaluating the kernel function), a substantial speedup can be achieved.

Yang et al. (2003) also showed how the use of a quasi-Newton method, specifically the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) method, can improve the convergence rate of the mean shift procedure, thereby obtaining a faster procedure.

Another way to address the computational needs of the mean shift procedure is by adapting it to run in parallel. As Cheng (1995) pointed out, mean shift is at its core a parallel process. In this research we explored the applicability of such an approach to the mean shift procedure. We demonstrated that a parallel implementation of the mean shift procedure can achieve the expected linear speedup.

3. Our methodology

3.1 Application to remote sensing

We have applied the mean shift algorithm to remotely sensed data, and obtained some early promising results. In addition to a mere qualitative visual assessment of the results (which is common in performance evaluation of image segmentation), we have measured quantitatively the accuracy of our implemented version of MS against ground truth (GT) of several remotely sensed images. On average, our current results indicate an overall accuracy of roughly 70%. (Running k-means on the same data sets did not yield higher accuracy.)

In applying MS to remotely sensed data, we have tested the method with several kernel types and sizes. Although a more complex kernel may provide slightly higher accuracy, when the procedure's simplicity, running time, etc., are taken into overall consideration, the flat kernel appears to be "optimal" (at least according to our practical experience so far). We have also tested several MS variants, in addition to the common blurring procedure. (See Comaniciu and Meer (1998), for further details.) The alternative variants did not seem to improve the overall accuracy. They do have some advantages, however, as far as running time and ease of parallel implementation are concerned. As mentioned, the kernel


bandwidth is the only parameter that needs to be decided upon before the procedure is invoked. For the specific task we have considered (i.e., clustering of NDVI data), the optimal kernel size (i.e., the one for which the highest overall accuracy was attained) remained essentially the same for all tested images. Note that for a very small kernel the accuracy will approach 100% (each point will be in its own cluster); as the kernel size increases, the overall accuracy (and the number of clusters found) will decrease. When the number of clusters found approaches the real number of clusters, the overall accuracy will start rising again until an optimum is reached. Afterwards, increasing the kernel size results in reducing the overall accuracy until a single cluster is found. (See Figure 5, for curves obtained on two real data sets.)

We have also observed that the accuracy of MS rose with d, i.e., the number of bands (dimensions) used. As more bands were used, the overall accuracy was higher. (Of course, the running time grows considerably with the dimension.)

3.2 Computationally efficient algorithm(s) for mean shift

Efficiency is an important issue for every algorithm. In dealing with remote sensing applications this is all the more significant, as we usually deal with very large images (i.e., very many data points). Standard images may contain up to several million data points, and unlike standard image processing applications, which usually deal with up to three bands (RGB), remotely sensed images are typically multispectral and may contain a large number of spectral bands. For example, Landsat images contain seven different bands, and new satellites are currently capable of providing hyperspectral images that contain several hundred spectral bands. Occasionally it also makes sense to examine multitemporal images, i.e., a stack of images of the same area that were taken at different times and composed into a single image. This leads, of course, to an even larger number of image bands that need to be processed. Therefore, it is not uncommon that processing even a relatively small image could take several hours or even days.

Given this critical issue, we aimed at deriving relatively fast methods for the computation of the mean shift algorithm. Basically, the mean shift runs in O(n^2) time per iteration, where n is the number of data points to be processed. If we take into account that the number of bands is typically large, the multiplicative constants could grow significantly and the running time of the procedure can be very high.


Figure 3: Running time (in seconds) of mean shift with and without hashing, as a function of the number of points.

We started out by examining how the different parameters (algorithmic variant, kernel type, kernel window size, etc.) affect the running time. We continued by exploring several optimization methods and their use in our application. It turns out that the mean shift algorithm has a special characteristic that is very useful for optimizing the running time. Being a deterministic process, the mean shift computation performed on a given point does not depend on other computations within the same iteration. Furthermore, since data points tend to coincide, much of the processing can be saved by storing intermediate results on the fly. In order to take advantage of this characteristic we maintained a hash table that maps data points to their computed means in a given iteration. Before a new mean is computed, the algorithm first checks the table to see whether the point was already processed. Only if it was not does its mean get computed (and stored for future use).
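The caching scheme can be sketched as follows. The names are ours, and the toy `counting_mean` merely stands in for the real per-point sample-mean computation so that the saved work is visible:

```python
import numpy as np

def memoized_means(T, mean_fn):
    """Per-iteration cache: coinciding points share one mean computation.
    `mean_fn` computes the sample mean of a single point."""
    cache = {}
    out = []
    for t in T:
        key = tuple(t)              # points that coincide hash to the same key
        if key not in cache:
            cache[key] = mean_fn(np.asarray(t))
        out.append(cache[key])
    return np.asarray(out), len(cache)

calls = []
def counting_mean(t):
    calls.append(1)                 # count how many real computations ran
    return t * 2.0                  # stand-in for the actual sample mean

T = [(1.0, 1.0), (1.0, 1.0), (2.0, 2.0), (1.0, 1.0)]
means, distinct = memoized_means(T, counting_mean)
```

Four points trigger only two real computations, mirroring the repeated-computation savings described above.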

Having implemented a hash table to exploit this observation, we found that in the blurring process, on average, 90% of the mean computations were repeated at some stage. Thus, it was possible to reduce the running time by approximately 70%-80%. Figure 3 shows the running time as a function of the number of points for both implementations, with and without the use of the hash table mechanism. The test was performed on randomly generated data using the blurring variant with a flat kernel.

While exploring the hashing mechanism we came across another interesting aspect. If we treat close points as having the same mean, more computation can be saved. By controlling the "closeness" parameter, one can increase or decrease the processing time of the procedure at the expense of its accuracy. This technique is somewhat related to the bucketing technique that uses fast Gauss transforms (Elgammal et al. 2003). Although it seemed a promising avenue of further research, it was deemed at that stage that the accuracy of the algorithm was more important than the additional speedup that could be gained. Thus, no further study was performed to check the practical impact of this technique.
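The "closeness" idea amounts to quantizing the hash key. A minimal sketch (the grid-snapping rule is our assumption about one way to realize it, not the paper's tested design):

```python
import numpy as np

def quantized_key(point, closeness):
    """Treat points within `closeness` of each other (per coordinate) as
    equal for caching purposes: snap each coordinate to a grid cell."""
    return tuple(np.floor(np.asarray(point) / closeness).astype(int))

# points ~0.2 apart land in the same 0.5-wide cell and would share a cached mean
k1 = quantized_key((1.10, 2.20), 0.5)
k2 = quantized_key((1.30, 2.40), 0.5)
k3 = quantized_key((3.00, 2.20), 0.5)
```

A larger `closeness` merges more points per key (faster, less accurate), which is exactly the speed/accuracy trade-off described above.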


3.3 Parallelized mean shift

As Cheng et al. (1992) noted, mean shift can be executed in parallel. Consider, for example, the blurring process. At each iteration the new sampled mean is calculated for the entire set T using only results from previous iterations and the weight function w. This means that the computations at each iteration can be divided between several processors as long as they possess results for the previous iteration.

Server:
    Input: initial dataset S, kernel function K
    divide S into n sections T_1, ..., T_n
    for i = 1 ... n: send S and T_i to processor P_i
    for i = 1 ... n: receive the converged means M_i from P_i
    combine M_1, ..., M_n into the final result

Processor P_i:
    receive S and T_i from the server
    compute T_i ← m(T_i) until convergence (S is kept fixed)
    send the resulting means M_i to the server

Figure 4: Pseudocode of parallel mean shift using the initial data variant.

Of course, other algorithmic variants can be adapted to a multiprocessor environment. Figure 4 contains pseudocode describing a parallel implementation of the initial data variant. Using this variant removes the need to communicate intermediate results back between iterations, so each processor can carry the entire mean shift computation independently to its end. (In this procedure the dataset S is kept constant throughout, while each processor computes the next phase results with respect to S (Cheng 1995).)

Each of our parallel implementations involves a server, a machine responsible for coordinating the entire process. The process starts when the server loads the data set for the tested image. It then waits until all the clients are connected to it and are ready to start processing. The server then communicates to all the clients the initial data for processing and allocates the points according to the number of clients. When a client receives a set of points, it starts processing and reports the results back to the server. The server then combines the results received to create a single image. If necessary, as in the blurring variant, the process is repeated several times until the entire procedure is completed and the final results are integrated into the output image. There can be various parallel implementations, which vary in many aspects depending on the initial data set, the approximation used, whether intermediate results are communicated back between iterations, and the exact work distribution between the different processors.
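The server/client scheme can be sketched with a thread pool standing in for the networked client machines. This is an illustrative reduction of the protocol (names and parameters are ours), not the implementation used in the experiments; it exploits the key property of the initial data variant, namely that S stays fixed, so workers need no synchronization between iterations:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def shift_to_mode(t, S, radius, tol=1e-8, max_iter=200):
    """Run the full mean shift for one point of T against the fixed data S
    (initial data variant: S never changes, so this is embarrassingly
    parallel across the points of T)."""
    t = np.asarray(t, dtype=float)
    for _ in range(max_iter):
        nbrs = S[np.sum((S - t) ** 2, axis=1) <= radius ** 2]
        new_t = nbrs.mean(axis=0) if len(nbrs) else t
        if np.linalg.norm(new_t - t) < tol:
            return new_t
        t = new_t
    return t

def parallel_mean_shift(S, radius, workers=4):
    """'Server' side of the scheme: split T = S among workers and combine
    the returned means into one result."""
    S = np.asarray(S, dtype=float)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        modes = list(pool.map(lambda t: shift_to_mode(t, S, radius), S))
    return np.asarray(modes)

data = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
modes = parallel_mean_shift(data, radius=1.0)
```

In a real deployment the pool would be replaced by message passing to client machines, but the work division and result combination are the same.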

In summary, the parallel implementation of the mean shift procedure does yield the expected speedup. Although currently there are still some open issues regarding optimal distribution between the processors it is clear that even a relatively simple mechanism can considerably reduce the processing time.


4. Experimental results

The study area that was selected represents a Mediterranean environment in Southern Israel, and contains a relatively large number of crop types. Ground truth of the area is available due to a survey conducted by experts of the Israeli Ministry of Agriculture. Five Landsat-TM images were acquired during the '96-'97 growing season, under clear sky conditions (see Figures 6 and 7). These images were radiometrically calibrated, using the empirical line method (Shoshany and Svoray 2002, Cohen and Shoshani 2002), and geometrically rectified with 0.5-pixel positional accuracy. Image acquisition dates allow for a representation of the different phyto-phenologies of crop types in this environment, i.e., distinguishing between summer and winter crops. Each of the images was converted to a Normalized Difference Vegetation Index (NDVI) layer to form a multi-layer input for subsequent clustering/classification. Following is an exemplary result of applying the mean shift to the given area. The overall accuracy was arrived at by first mapping the clusters found against the clusters provided by GT. Each MS cluster is mapped to the GT cluster with which it has the largest number of overlapping points. The remaining points are considered errors. In some cases, if several MS clusters correspond to the same GT cluster, they are merged.
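The accuracy computation just described can be sketched as follows (function and variable names are ours; the toy label arrays are made up for illustration):

```python
import numpy as np

def overall_accuracy(ms_labels, gt_labels):
    """Map each MS cluster to the GT cluster with the largest overlap and
    count every non-overlapping point as an error."""
    ms_labels = np.asarray(ms_labels)
    gt_labels = np.asarray(gt_labels)
    correct = 0
    for c in np.unique(ms_labels):
        in_c = gt_labels[ms_labels == c]
        # overlap with every GT cluster; the best match absorbs this cluster
        counts = {g: np.sum(in_c == g) for g in np.unique(in_c)}
        correct += max(counts.values())
    return correct / len(ms_labels)

ms = [0, 0, 0, 1, 1, 1, 2, 2]     # hypothetical MS cluster labels
gt = [5, 5, 6, 7, 7, 7, 6, 6]     # hypothetical ground-truth labels
acc = overall_accuracy(ms, gt)
```

Here MS cluster 0 maps to GT cluster 5 (one point mismatched), so 7 of 8 points are counted as correct. Note this simple sketch does not merge multiple MS clusters mapping to the same GT cluster.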

Table 1: Contingency table for Image 2, with flat kernel and bands 2, 3, 4, 5.
Results obtained: Overall accuracy = 70.12%, number of clusters = 8
(2,888 points in ground truth), and optimal kernel distance = 21.

GT/MS          1        2        3        4        5   Reliability
1            430        0        5       24       52        84.15%
2              2      170       18       19       12        77.63%
3              3        0      310       51       47        75.98%
4              7        0        2      650       73        89.66%
5             76        5        0       37      465        91.72%
Unknown        2        0      132      138      158        36.92%
Accuracy  82.69%   97.14%   66.38%   70.73%   57.62%

The first set of tests was aimed at examining the basic behavior of the algorithm. To keep things simple, the blurring variant was used with the flat kernel function. Various tests were then performed on one of the images. These tests included runs with different band combinations and various kernel sizes. The aim was to find the optimal kernel size for a given band combination and whether it remained roughly fixed for the same data type. The experiments were then repeated for different kernels. Specifically, we experimented with the Gaussian kernel, the truncated kernel, and the Epanechnikov kernel. The last set of experiments focused on different algorithmic variants, e.g., the blurring variant, the initial data variant, and the random select variant (Comaniciu and Meer 1999) (in which only a random subset of the data is clustered using the mean shift and the rest is classified according to minimal distance). The optimal parameter values obtained previously were used in this last set of experiments.

Having experimented extensively with the first image, it was deemed unnecessary to repeat all of the above tests on the rest of the images. Only a subset of tests was conducted


to establish the validity of the parameter settings. It appeared that those optimal values remained relatively fixed for the same data type. The remainder of this section provides a detailed presentation of the results obtained.

We arrived at two important conclusions. First, the mean shift procedure reaches roughly an overall accuracy of 70% and an overall reliability of 80%. This performance is satisfactory, since it is comparable to the performance of different approaches on the same data. Second, the more complex kernel functions did not seem to have a drastic effect on the accuracy. Considering the significant processing they require, it might be desirable to use the simpler flat kernel instead.

At this point it was also evident, as was assumed initially, that working with a larger number of bands adds more information, which results in a more accurate classification. For example, using two bands resulted in an overall accuracy of roughly 60%, while using three and four bands resulted in 65% and 70%, respectively. Of course, this justifies the additional processing required for a larger number of bands. Our experiments were not conclusive, however, as to which algorithmic variant to use. Although the blurring process is generally more accurate than the initial data variant, it seems that the difference in accuracy is not significant (although the initial data variant usually tends to recognize more clusters than there are). On the other hand, as will be explained in the following section, the advantages of the initial data variant, specifically in terms of parallelization, might justify its use.

Figure 5: Overall accuracy for two data sets as a function of the kernel size (with flat kernel).

Second, as can be seen in Figure 5, the accuracy is usually high for a small kernel distance. As the kernel size increases, the overall accuracy tends to decrease, as does the number of clusters that the procedure recognizes. This trend continues until the number of detected clusters approaches the real underlying number of clusters. In that region the accuracy has a local minimum, after which the overall accuracy rises again (this shows as a knee shape in the graphs). When the kernel distance is increased further, the accuracy starts to decrease again, as all points converge into a small number of clusters. The local-minimum behavior is a good way of detecting the region in which the optimal kernel distance resides.
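The knee-detection idea can be sketched on a sampled accuracy curve; the detection rule and the curve values below are our illustrative assumptions, not measured data:

```python
def knee_region(kernel_sizes, accuracies):
    """Locate interior local minima of the accuracy-vs-kernel-size curve;
    per the observation above, the optimal kernel distance should lie
    near such a "knee"."""
    knees = []
    for i in range(1, len(accuracies) - 1):
        # a dip: lower than the left neighbor, no higher than the right one
        if accuracies[i] < accuracies[i - 1] and accuracies[i] <= accuracies[i + 1]:
            knees.append(kernel_sizes[i])
    return knees

sizes = [5, 10, 15, 20, 25, 30]
accs  = [0.95, 0.80, 0.62, 0.70, 0.55, 0.40]   # hypothetical sampled curve
```

On this toy curve the dip at kernel size 15 is flagged, after which a finer search around that region would locate the optimum.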

Table 1 shows results of the MS procedure invoked on four bands with a kernel size of 21. GT contained 2,888 points. The overall accuracy and reliability obtained were roughly 70% and 80%, respectively. Figure 8 shows the resulting clustered image.


Figure 6: Grey scale representation of NDVI data taken in November 1996.

Figure 7: Grey scale representation of NDVI data taken in February 1996.


Figure 8: Resulting clustered image of test image 2 when performing mean shift on bands 2, 3, 4, 5, flat kernel with size of 21. Overall accuracy = 70.12% (see Table 1).

5. Conclusions

The first goal of this research was to demonstrate that mean shift can be used successfully to cluster remotely sensed images. Although the results due to mean shift are not significantly better than those due to other common techniques, it is evident that mean shift is comparable. The accuracy for the tested images was roughly 70%, with relatively high reliability. The same accuracy was produced by the ISODATA algorithm when run on the same images, and it is also a common ballpark figure for various other approaches. Therefore, we can safely conclude that the mean shift procedure performs acceptably, as far as clustering of remotely sensed data is concerned.

However, the usefulness of the mean shift algorithm does not rely solely on the accuracy it achieves; its other advantages make it an important technique for image clustering. The first advantage of mean shift is its simplicity. It has very few free parameters that need to be set by the user: once the exact variant and the kernel type are chosen, only the kernel distance needs to be determined. Other approaches usually require a larger number of free parameters. This advantage is made even more important by the ability to pinpoint the optimal values. The accuracy exhibited a characteristic behavior as a function of kernel size, and we speculate that this distinct behavior might be used in an automated process to focus on a relatively small range in which the optimal kernel distance is expected to reside. It is also important to note that, once found, the optimal kernel distance remained relatively fixed for all the test images. To conclude, it seems reasonable to assume, at least for remotely sensed images, that the kernel distance can be easily determined for any environment type, after which it can be used for any data set of the same type.

Another, even more important advantage of mean shift concerns its processing time. Although mean shift is not the most efficient procedure, we have shown that a parallel implementation achieves a linear speedup in running time. This is, of course, a significant advantage, especially in today's computing environment, where local area networks containing many relatively weak processors are common. Other approaches do not offer such an ability, and when processing large data sets there is no option but to wait for the full process to run on a single-processor machine.
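The data-parallel structure of the initial-data variant is what makes this speedup possible: since the reference data is fixed, each chunk of points can be shifted independently. The sketch below is illustrative only; it uses a thread pool as a stand-in for the pool of networked machines, and the names and chunking strategy are ours.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def shift_chunk(chunk, data, h, max_iters=100, tol=1e-4):
    """Run flat-kernel mean shift (initial-data variant) on one chunk
    of points; the reference data is fixed, so chunks are independent."""
    pts = np.asarray(chunk, dtype=float).copy()
    for _ in range(max_iters):
        new = np.array([data[np.linalg.norm(data - p, axis=1) <= h].mean(axis=0)
                        for p in pts])
        if np.abs(new - pts).max() < tol:
            return new
        pts = new
    return pts

def parallel_mean_shift(data, h, workers=4):
    """Partition the points among the workers; each worker shifts its
    chunk against the full original data set."""
    data = np.asarray(data, dtype=float)
    chunks = np.array_split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        shifted = pool.map(lambda c: shift_chunk(c, data, h), chunks)
    return np.vstack(list(shifted))
```

Because no communication between workers is needed during the iterations, the wall-clock time on a genuine multi-machine deployment scales roughly linearly with the number of machines.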

Another important point to note is the tradeoff between the number of clusters detected by the procedure and the overall accuracy. As the number of detected clusters increases, so does the overall accuracy. This is a direct byproduct of our computation: the overall accuracy is computed by assigning each detected cluster to the ground-truth cluster with which it shares the greatest number of points. For a small kernel distance, each point forms its own cluster, so the accuracy is 100%. On the other hand, for a sufficiently large kernel distance, all the points converge into a single cluster, and the accuracy is proportional to the size of the biggest real cluster (i.e., the ground-truth cluster containing the most points). These extreme cases demonstrate two things. First, controlling the kernel distance determines, in principle, the number of clusters found in the image; in general, as the kernel distance increases, more points converge into the same clusters, so fewer clusters are reported by the procedure. Second, the computation of the overall accuracy should be considered together with the number of clusters found; only when this number is correlated with the actual ground truth can the results be considered correct. Fortunately, as we found in our experiments, the overall accuracy has a local minimum when the detected number of clusters approaches the actual number of clusters in the image, and this minimum can be used to determine the optimal kernel size.
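This accuracy computation can be stated compactly. The sketch below is illustrative (the label encoding and function name are ours): each detected cluster is credited with the points it shares with its best-matching ground-truth cluster.

```python
from collections import Counter

def overall_accuracy(detected, ground_truth):
    """detected[i] and ground_truth[i] are the cluster labels of point i.
    Each detected cluster is mapped to the ground-truth cluster with
    which it shares the most points; those points count as correct."""
    correct = 0
    for c in set(detected):
        gt_labels = [g for d, g in zip(detected, ground_truth) if d == c]
        correct += Counter(gt_labels).most_common(1)[0][1]
    return correct / len(detected)
```

The two extreme cases follow immediately: one cluster per point scores 100%, while a single all-encompassing cluster scores the fraction of points in the largest ground-truth cluster.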

As to open issues regarding mean shift, we propose two research directions that might yield useful results. Although a linear speedup was demonstrated by the parallel implementation, the current application could still benefit from a more sophisticated distribution of the processing. We showed that a hash table saves considerable running time on a single processor, but it is less efficient in our parallel implementation. The hash-table advantages can nevertheless be exploited in a parallel environment: if the points are distributed such that "close" points are allocated to the same machine, there will be far fewer hash misses, i.e., the benefit of the hash mechanism is expected to become apparent. A relatively simple approach might be to divide the points, before distributing the jobs to the different machines, into a number of sets, each containing points that are relatively near each other and should therefore converge to the same cluster.
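A minimal sketch of such a proximity-aware partition is given below. It is hypothetical (a production scheme would more likely use a proper spatial data structure): the points are simply sorted along the coordinate with the greatest spread and the sorted order is cut into contiguous chunks.

```python
import numpy as np

def proximity_partition(points, n_parts):
    """Split points into n_parts spatially coherent chunks: sort along
    the coordinate with the greatest spread and cut the sorted order
    into contiguous pieces, so that nearby points are allocated to the
    same machine and its hash lookups hit more often."""
    points = np.asarray(points, dtype=float)
    spread = points.max(axis=0) - points.min(axis=0)
    order = np.argsort(points[:, np.argmax(spread)])
    return [points[idx] for idx in np.array_split(order, n_parts)]
```

Each resulting chunk can then be handed to one machine, so that the points most likely to share means (and hash buckets) are processed together.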

The second research direction is also related to hashing. As mentioned earlier, hashing can be used to tune a tradeoff between running time and algorithmic correctness, i.e., by allowing sufficiently close points (where "close" is a variable threshold) to share the same mean, much computation can be saved. However, the effect on the overall accuracy of the procedure should be studied further; it might turn out that such a tradeoff is not of interest, since it yields a significant reduction in the overall accuracy.

In conclusion, this research demonstrated the use of the mean shift procedure for clustering remotely sensed images. We showed that the overall performance of this technique is comparable to that of other common approaches. We showed how the technique can be applied in remote sensing and how to adapt its various parameters to this type of data. We also demonstrated that a parallel implementation results in a linear speedup and discussed several issues concerning such an implementation. The main conclusion that can be drawn is that, in view of the significant advantages that mean shift offers, it is worthwhile to incorporate it in other applications as well.
