    International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10

    ISSN: 1837-7823

    A Study of Grid-based Algorithms for Spatial Data Clustering

    Ch. N. Santhosh Kumar1, Dr. K. Nageswara Rao2, Dr. A. Govardhan3, V. Sitha Ramulu4, K. Sudheer Reddy5

    1Research Scholar, Dept. of CSE, JNTU Hyderabad, India.
    2Professor & Head, Computer Science & Engineering, PVPSIT Vijayawada, A.P., India.
    3Professor in CSE & Director of Evaluation, JNTU Hyderabad, A.P., India.
    4Research Scholar, Dept. of CSE, Acharya Nagarjuna University, A.P., India.
    5Research Scholar, Dept. of CSE, Acharya Nagarjuna University, Guntur, A.P., India.

    Abstract

    Spatial data mining is the process of discovering interesting, previously unknown, and potentially useful patterns in large spatial datasets. Extracting such patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data because of the complexity of spatial data types, spatial relationships, and spatial autocorrelation. Clustering is one of the important methods in spatial data mining and is the subject of active research in many fields, such as statistics, pattern recognition, and machine learning. Clustering is the process of dividing data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but it achieves simplification: the data are modeled by their clusters. Methods for spatial data clustering include partitioning methods, grid-based methods, and density-based methods, among others. In this paper we present an overview of grid-based algorithms for spatial data clustering.

    Keywords: Clustering, Grid, Spatial data.

    I. INTRODUCTION

    Spatial data mining [1] methods can be applied to extract interesting and regular knowledge from large spatial databases. In particular, they can be used for understanding spatial data, discovering relationships between spatial and non-spatial data, constructing spatial knowledge bases, optimizing queries, reorganizing data in spatial databases, and capturing general characteristics in a simple and concise manner. Spatial data mining has wide applications in Geographic Information Systems (GIS), remote sensing, image database exploration, medical imaging, robot navigation, and other areas where spatial data are used. Knowledge discovered from spatial data can take various forms, such as characteristic and discriminant rules, extraction and description of prominent structures or clusters, and spatial associations.

    Clustering [2] is one of the important methods in spatial data mining and is the subject of active research in many fields, such as statistics, pattern recognition, and machine learning. Clustering is the process of dividing data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but it achieves simplification: the data are modeled by their clusters. Methods for spatial data clustering include partitioning methods, grid-based methods, and density-based methods [3], among others.

    In this paper we present an overview of grid-based methods. A grid-based method [4] quantizes the object space into a finite number of square cells, forming a grid structure. All clustering operations are performed on the grid structure (i.e., the quantized space). The main advantage of this approach is its speed: the processing time is usually independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space.
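
    The quantization step can be illustrated with a minimal Python sketch. The function name quantize and its parameters (lower, upper, cells_per_dim) are assumptions made for this illustration and do not come from any of the surveyed algorithms. Building the grid still takes one pass over the data; after that, only the per-cell counts are needed.

```python
from collections import Counter


def quantize(points, lower, upper, cells_per_dim):
    """Map each point to a grid cell and count the points per cell.

    points: iterable of coordinate tuples
    lower, upper: bounding box of the object space, one value per dimension
    cells_per_dim: number of equal-width cells along every dimension
    """
    counts = Counter()
    for p in points:
        cell = []
        for x, lo, hi in zip(p, lower, upper):
            k = int((x - lo) / (hi - lo) * cells_per_dim)
            # Clamp so that points on the upper boundary fall into the last cell.
            cell.append(min(max(k, 0), cells_per_dim - 1))
        counts[tuple(cell)] += 1
    return counts


if __name__ == "__main__":
    sample = [(0.10, 0.10), (0.15, 0.20), (0.90, 0.90)]
    print(quantize(sample, (0.0, 0.0), (1.0, 1.0), 4))
```

    Any later clustering step in this sketch would work on the returned cell counts, whose size is bounded by the number of cells rather than by the number of objects.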

    II. STING METHOD

    The STatistical INformation Grid-based method (STING) [5] is a hierarchical, grid-based approach to spatial data mining. The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects. The spatial area is divided into rectangular cells that are organized in a hierarchical structure. Let the root of the hierarchy be at level 1, its children at level 2, and so on. A cell at level i corresponds to the union of the areas of its children at level i + 1. With the hierarchical structure of grid cells at hand, we can use a top-down approach to answer spatial data mining queries. For each query, we begin by examining cells on a high-level layer. Note that it is not necessary to start with the root; we may begin from an intermediate layer. For each cell of the starting layer (for example, the root), we calculate the likelihood that the cell is relevant to the query at some confidence level, using the parameters of the cell. This likelihood can be defined as the proportion of objects in the cell that satisfy the query conditions. After we obtain the confidence interval, we label the cell as relevant or not relevant at the specified confidence level. When we finish examining the current layer, we proceed to the next lower level and repeat the same process for the children of the relevant cells.

    The algorithm for the STING method is as follows:

    1. Determine a layer to begin with.
    2. For each cell of this layer, calculate the confidence interval (or estimated range) of the probability that this cell is relevant to the query.
    3. From the interval calculated above, label the cell as relevant or not relevant.
    4. If this layer is the bottom layer, go to Step 6; otherwise, go to Step 5.
    5. Go down the hierarchy structure by one level. Go to Step 2 for those cells that form the relevant cells of the higher-level layer.
    6. If the specification of the query is met, go to Step 8; otherwise, go to Step 7.
    7. Retrieve the data that fall into the relevant cells and do further processing. Return the results that meet the requirement of the query. Go to Step 9.
    8. Find the regions of relevant cells. Return those regions that meet the requirement of the query. Go to Step 9.
    9. Stop.
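
    The top-down traversal in Steps 1 to 5 can be sketched in Python as follows. This is a simplified illustration, not the STING implementation: each cell stores only an object count, the data are assumed to lie in the unit square, and the confidence-interval test is replaced by a plain count threshold (a parent can contain a qualifying child only if the parent itself holds at least min_count objects). The names build_levels, sting_query, and min_count are hypothetical.

```python
from collections import Counter


def build_levels(points, depth):
    """Return a list of per-level cell counts; level k has 2**k x 2**k cells.

    The bottom level is counted from the points; every parent count is the
    sum of its four children, so upper levels never touch the raw data.
    """
    side = 2 ** depth
    bottom = Counter()
    for x, y in points:
        bottom[(min(int(x * side), side - 1), min(int(y * side), side - 1))] += 1
    levels = [bottom]
    for _ in range(depth):
        parent = Counter()
        for (i, j), n in levels[0].items():
            parent[(i // 2, j // 2)] += n
        levels.insert(0, parent)
    return levels


def sting_query(levels, min_count, start_level=0):
    """Return the bottom-level cells that hold at least min_count objects."""
    side = 2 ** start_level
    candidates = [(i, j) for i in range(side) for j in range(side)]
    for k in range(start_level, len(levels)):
        # Simplified relevance test (assumption): a count threshold
        # stands in for the confidence interval of Steps 2 and 3.
        relevant = [c for c in candidates if levels[k][c] >= min_count]
        if k == len(levels) - 1:
            return relevant
        # Step 5: descend only into the children of relevant cells.
        candidates = [(2 * i + di, 2 * j + dj)
                      for i, j in relevant for di in (0, 1) for dj in (0, 1)]
    return []
```

    Because irrelevant cells are discarded at each layer, their descendants are never examined, which is the source of the method's efficiency.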

    III. OPTIGRID METHOD

    Optimal Grid (OPTIGRID) is based on constructing an optimal grid partitioning of the data. The optimal grid partitioning is determined by calculating the best partitioning hyperplanes for each dimension using certain projections of the data.

    The OPTIGRID algorithm, given below, works recursively. In each step it partitions the current data set into a number of subsets, if possible. The subsets that contain at least one cluster are treated recursively. The partitioning is done using a multi-dimensional grid defined by at most q cutting planes. Each cutting plane is orthogonal to at least one projection. The point density at a cutting plane is bounded by the density of the orthogonal projection of the cutting plane in the projected space. The q cutting planes are chosen to have minimal point density. The recursion stops for a subset if no good cutting plane can be found any more.

    OPTIGRID(data set D, q, min_cut_score)

    1. Determine a set of contracting projections P = {p0, p1, ..., pk}.
    2. Calculate all projections of the data set D: p0(D), ..., pk(D).
    3. Initialize a list of cutting planes BEST_CUT ← ∅, CUT ← ∅.
    4. For i = 0 to k do
        (a) CUT ← best_local_cuts(pi(D)).
        (b) CUT_SCORE ← score best_local_cuts(pi(D)).
        (c) Insert all cutting planes with a score >= min_cut_score into BEST_CUT.
    5. If BEST_CUT = ∅ then return D as a cluster.
    6. Determine the q cutting planes with the highest score from BEST_CUT and delete the rest.
    7. Construct a multi-dimensional grid G defined by the cutting planes in BEST_CUT and insert all data points x ∈ D into G.
    8. Determine clusters, i.e., determine the highly populated grid cells in G and add them to the set of clusters C.
    9. Refine(C).
    10. For each cluster Ci ∈ C do
        OPTIGRID(Ci, q, min_cut_score).
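
    The recursive structure of the procedure can be sketched in Python under strong simplifying assumptions: the projections are the coordinate axes, the density of each projection is estimated with a fixed-width histogram, a cutting plane is placed in the sparsest histogram bin that lies between two denser bins, and the Refine step is omitted. The helper names (histogram, best_cut, optigrid) and the parameters bins, noise_level, and min_size are illustrative choices, not part of the published algorithm.

```python
def histogram(values, bins):
    """Equal-width histogram; returns (counts, lowest value, bin width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts, lo, width


def best_cut(values, bins=10, noise_level=1):
    """Return (score, position) of the best cut on one axis, or None."""
    counts, lo, width = histogram(values, bins)
    best = None
    for b in range(1, bins - 1):
        valley = counts[b]
        peak = min(max(counts[:b]), max(counts[b + 1:]))
        # A cut must separate two regions denser than the cut bin and the noise.
        if peak > max(valley, noise_level):
            score = peak - valley
            if best is None or score > best[0]:
                best = (score, lo + (b + 0.5) * width)
    return best


def optigrid(data, q=1, min_cut_score=1, min_size=2):
    """Recursively split a list of points; returns a list of clusters."""
    cuts = []
    for d in range(len(data[0])):
        found = best_cut([p[d] for p in data])
        if found is not None and found[0] >= min_cut_score:
            cuts.append((found[0], d, found[1]))
    if not cuts:
        return [data]  # no good cutting plane: the data form one cluster
    cuts = sorted(cuts, reverse=True)[:q]  # keep the q best-scoring planes
    cells = {}
    for p in data:
        # A grid cell is identified by the side of each plane the point lies on.
        key = tuple(p[d] > pos for _, d, pos in cuts)
        cells.setdefault(key, []).append(p)
    clusters = []
    for members in cells.values():
        if len(members) >= min_size:  # only populated cells are kept
            clusters.extend(optigrid(members, q, min_cut_score, min_size))
    return clusters
```

    Every accepted cut has points on both of its sides, so each recursive call receives strictly fewer points and the recursion terminates.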

    IV. COSINE BASED GRID METHOD

    The Cosine Cluster method [6] is based on the cosine transform. It detects clusters of arbitrary shape and is insensitive to outliers (noise) and to the order of the input data. Using the multi-resolution property of the cosine transform, clusters of arbitrary shape can be identified effectively at different degrees of accuracy. Given a set of spatial objects Oi, 1

    applications in a range of disciplines, such as computational fluid dynamics, astrophysics, meteorological simulations, structural dynamics, magnetics, and thermal dynamics. In essence, the algorithm places very high resolution grids precisely where the computation requires them. Its adaptability allows the simulation of multiscale resolutions that are out of reach of methods using a global uniform fine grid. The motivation for bringing the AMR concept into clustering comes from the observation that a grid-based clustering algorithm employing a single uniform mesh can require a very fine mesh on a highly irregular or concentrated data distribution. A fine mesh results in high computation cost and, in some cases, the number of mesh cells can even exceed the number of data objects.

    The AMR clustering algorithm connects the grid-based and density-based approaches through the AMR technique and hence preserves the advantages of both types of algorithms. The algorithm consists of two stages: constructing the AMR tree and data clustering. The AMR tree construction is a top-down process starting from the root node, which covers the entire problem volume. The data clustering stage is a bottom-up process that starts at a given tree level (depth) and works toward the root. The algorithm for constructing the AMR tree is as follows:

    AMR(grid, level)

    1. create a new subgrid containing this cell only
    2. for each particle contained in this grid
    3.     find the mesh cell id in which the particle is located
    4.     accumulate the particle to the density of cell id
    5. for each mesh cell
    6.     if the density of this cell is greater than the threshold
    7.         mark this cell to be refined
    8.         connect this cell to its neighbor that is also marked
    9.         if no such neighbor exists
    10.            create a new subgrid containing this cell only
    11. for each subgrid
    12.     call AMR(subgrid, level+1)
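
    A minimal Python sketch of the tree construction in Steps 2 to 12 is given below for two-dimensional data. It rests on several assumptions that are not in the listing: each grid uses a fixed mesh of mesh x mesh cells, marked cells are connected through their four direct neighbors, a subgrid is taken to be the bounding box of a connected group of marked cells, and refinement stops at max_level. The function name amr_tree and the dictionary layout of a tree node are hypothetical.

```python
def amr_tree(points, lower, upper, level=0, mesh=4, threshold=2, max_level=3):
    """Build a refinement tree (nested dicts) for 2-D points in [lower, upper]."""
    node = {"level": level, "lower": lower, "upper": upper,
            "count": len(points), "children": []}
    if level == max_level or not points:
        return node
    size = [(hi - lo) / mesh for lo, hi in zip(lower, upper)]

    def cell_of(p):
        # Mesh cell index of a point, clamped to the grid.
        return tuple(min(max(int((x - lo) / s), 0), mesh - 1)
                     for x, lo, s in zip(p, lower, size))

    # Steps 2-4: accumulate every point into the density of its mesh cell.
    density = {}
    for p in points:
        cell = cell_of(p)
        density[cell] = density.get(cell, 0) + 1
    # Steps 5-7: mark the cells whose density exceeds the threshold.
    marked = {c for c, n in density.items() if n > threshold}
    # Steps 8-10: group neighboring marked cells; an isolated cell forms its own group.
    while marked:
        stack = [marked.pop()]
        group = []
        while stack:
            i, j = stack.pop()
            group.append((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in marked:
                    marked.remove(nb)
                    stack.append(nb)
        lo_cell = (min(c[0] for c in group), min(c[1] for c in group))
        hi_cell = (max(c[0] for c in group), max(c[1] for c in group))
        sub_lower = tuple(lo + k * s for lo, k, s in zip(lower, lo_cell, size))
        sub_upper = tuple(lo + (k + 1) * s for lo, k, s in zip(lower, hi_cell, size))
        inside = [p for p in points
                  if all(a <= c <= b for a, c, b in zip(lo_cell, cell_of(p), hi_cell))]
        # Steps 11-12: refine each subgrid with a finer mesh one level down.
        node["children"].append(
            amr_tree(inside, sub_lower, sub_upper, level + 1, mesh, threshold, max_level))
    return node
```

    Because a subgrid is approximated by a bounding box here, it may also cover a few unmarked cells when the group of marked cells is not rectangular.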

    VII. CONCLUSIONS

    Spatial data mining is the application of data mining techniques to spatial data. It is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. Clustering is an important concept in spatial data mining: cluster analysis divides data into meaningful or useful clusters. Methods for spatial clustering include partitioning methods, grid-based methods, and density-based methods. A grid-based method quantizes the object space into a finite number of square cells, forming a grid structure, and all clustering operations are performed on this grid structure (i.e., the quantized space). The Statistical Information Grid-based method (STING) is a hierarchical, grid-based approach to spatial data mining. The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects. Optimal Grid (OPTIGRID) is based on constructing an optimal grid partitioning of the data, determined by calculating the best partitioning hyperplanes for each dimension using certain projections of the data. The cosine based grid method applies the cosine transform to the feature space of the spatial data, which helps in detecting clusters of arbitrary shape at different scales. The deflected grid-based algorithm deflects the original grid structure in each dimension of the data space after the clusters generated from the original structure have been obtained. The AMR method partitions the problem domain into regions that are represented by grids in a hierarchical tree. Each grid represents the data in a spatial subdomain, and grids at different levels of the tree employ meshes of different granularity.

    REFERENCES

    [1] M. Hemalatha and N. Naga Saranya. A Recent Survey on Knowledge Discovery in Spatial Data Mining. IJCI International Journal of Computer Science, Vol. 8, Issue 3, No. 2, May 2011.

    [2] R. T. Ng and J. Han. Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB Conference, Santiago, Chile, 1994.

    [3] X. Hu and I. Yoo. Cluster Ensemble and Its Application in Gene Expression Analysis. Proceedings, The Second Conference on Asia-Pacific Bioinformatics, Vol. 29, Dunedin, New Zealand, pp. 297-302, 2004.

    [4] Q. Cao, B. Bouqata, P. D. Mackenzie, D. Messier, and J. Salvo. A Grid-Based Clustering Method for Mining Frequent Trips from Large-Scale, Event-Based Telematics Datasets. The 2009 IEEE International Conference on Systems, Man, and Cybernetics, San Antonio, TX, USA, October 2009, pp. 2996-3001.

    [5] W. Wang, J. Yang, and R. R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. Technical Report No. 970006, Computer Science Department, UCLA, February 1997.

    [6] Safdar Ali Syed Abedi. Exploring Discrete Cosine Transform for Multi-Resolution Analysis. Under the direction of Saeid Belkasim.

    [7] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990.

    [8] M. Berger and P. Colella. Local Adaptive Mesh Refinement for Shock Hydrodynamics. Journal of Computational Physics, 82(1):64-84, May 1989.

    [9] M. Berger and J. Oliger. Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations. Journal of Computational Physics, 53:484-512, Mar. 1984.