International Journal of Advanced Research in Computer Science and Software Engineering

Volume 5, Issue 6, June 2015, ISSN: 2277 128X

Research Paper. Available online at: www.ijarcsse.com

© 2015, IJARCSSE. All Rights Reserved.

Association Rule Mining for Ground Water and Wastelands Using Apriori Algorithm: Case Study of Jodhpur District

Mainaz Faridi*, Department of Computer Science, Banasthali University, India

Seema Verma, Department of Electronics, Banasthali University, India

Saurabh Mukherjee, Department of Computer Science, Banasthali University, India

Abstract— Advances in data collection and storage techniques have made it possible to collect and store terabytes of data on a daily basis. This large volume of data hides meaningful and interesting information that needs to be brought to light, which has made data mining one of the most intensively researched domains of recent years. The primary goal of data mining is to uncover non-trivial, previously unknown and hidden information in large data repositories and data warehouses. Data mining applied to spatial data sets is called Spatial Data Mining or Geographic Data Mining; it can be used to characterize spatial data, interrelate spatial and non-spatial data and reveal hidden spatial patterns. Data mining offers many methods for discovering previously unseen patterns and trends, such as clustering, classification, prediction, regression, outlier detection and association rule mining. In this research paper, the authors mine association rules between ground water and wastelands using spatial data mining techniques. Salt-affected wastelands and wastelands without scrub that show a higher ground water level underneath can be irrigated using this water, thereby increasing the area under cultivation.

Keywords— Spatial Data Mining, Association Rule Mining, Apriori Algorithm, Wastelands, Ground Water.

I. INTRODUCTION

WRIS and BOOSAMPDA are two major projects run by ISRO (Indian Space Research Organization) and NRSC (National Remote Sensing Centre). They provide country-wide information, in the form of maps, on ground water and on land cover across India respectively, and produce a huge amount of data related to ground water and land cover [1]. The tremendous volume of numeric and geospatial data stored in different formats, databases and data repositories creates a need for a wide range of tools and techniques to analyze and query the data, uncover patterns, or even predict phenomena in cases where human intelligence alone is not sufficient to solve complex problems [2]. New technologies and methods are needed to explore these large databases for hidden and implicit knowledge, special patterns, or correlations between spatial and non-spatial attributes [3]. Recent research on knowledge discovery in large spatial databases has laid a foundation for spatial data mining techniques.

A. Spatial data mining

Spatial data mining, i.e. the discovery of interesting, implicit knowledge in spatial databases, provides a means for understanding and using spatial data and knowledge bases. It is also referred to as Geographical Data Mining [4] and Knowledge Discovery in Spatial Databases [5]. The main difference between data mining and spatial data mining is that spatial data mining tasks use not only non-spatial attributes (as is usual when mining non-spatial data) but also spatial attributes. Traditional data mining assumes little or no dependence between the studied variables and lacks the ability to correlate non-spatial attributes with spatial information [6]. Spatial data mining is the process of uncovering useful and interesting patterns hidden in large spatial datasets.

Revealing interesting and potentially useful patterns in large spatial datasets is much more complex than extracting the corresponding patterns from conventional numeric and categorical datasets. The complexity of spatial data types and relationships and the autocorrelation of spatial attributes account for this difficulty [7].

B. Association Rule Mining using Apriori Algorithm

Association Rule Mining (ARM) is one of the most extensively used and studied data mining techniques and has a wide range of application areas. The most common example is market basket analysis, where associations between different consumer products are identified to support business and marketing decisions. Other application domains that provide large data sets suitable for ARM are finance, insurance, banking, fraud detection, medicine, bioinformatics, demographic studies, telecommunication, GIS, remote sensing, e-commerce and retailing. More recently, association rule mining has also been applied to areas such as pharmaceutics, law and justice, aviation management, agriculture and weather forecasting.

Let D be a database of N transactions, and let X and Y be disjoint itemsets, i.e. their intersection is empty (X ∩ Y = ∅). An association rule is written in the form X → Y, where X is the antecedent (left-hand side of the rule) and Y is the consequent (right-hand side). Both the antecedent and the consequent may contain more than one item. The strength and reliability of an association rule are measured by two factors: support and confidence.


Faridi et al., International Journal of Advanced Research in Computer Science and Software Engineering 5(6),

June- 2015, pp. 751-758


Support (prevalence) is the percentage of database transactions that contain both X and Y; it can be viewed as the probability that X and Y occur together, P(X ∪ Y). Writing σ(·) for the support count of an itemset, i.e. the number of transactions that contain it, and N for the total number of transactions in the database, the support s of the rule X → Y is calculated as:

$$s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N} \tag{1}$$

Confidence (predictability) is the percentage of database transactions containing X that also contain Y. In other words, it can be seen as the conditional probability P(Y|X). It is calculated as:

$$c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)} \tag{2}$$
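To make equations (1) and (2) concrete, the following minimal Python sketch computes support and confidence on a small toy database. The transactions, the item names and the helper names `support_count`, `support` and `confidence` are illustrative assumptions and are not taken from the study's data or tools.

```python
from typing import FrozenSet, Iterable, List


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """sigma(itemset): number of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(x: Iterable[str], y: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Equation (1): sigma(X u Y) / N."""
    return support_count(set(x) | set(y), transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Equation (2): sigma(X u Y) / sigma(X)."""
    return support_count(set(x) | set(y), transactions) / support_count(x, transactions)


# Hypothetical market-basket transactions, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = support({"bread"}, {"milk"}, transactions)     # 2 of 4 transactions
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread transactions
```

In this toy database the rule bread → milk has a support of 0.5 and a confidence of 2/3, so it would pass a minsup of 0.15 but fail a minconf of 0.9.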

Support provides statistical significance to the rule: if it is too low, the rule may have occurred merely by chance. Confidence, on the other hand, measures the reliability or predictability of the rule: if it is high, one can infer that Y is likely to be present in transactions containing X. Therefore, to select only rules of high interestingness, thresholds called minsup and minconf are set on the support and confidence values respectively. Generally a low minsup and a high minconf are set so that interesting rules are not missed. Association rules are mined in two phases. In the first phase (frequent itemset generation), all itemsets whose support is not less than minsup are found; such itemsets are called frequent itemsets. In the second phase (rule generation), the rules that satisfy the minconf threshold are extracted from the frequent itemsets [8].

1) Apriori Algorithm: Many algorithms have been proposed for association rule mining, but the most prominent one remains the Apriori algorithm, proposed by Agrawal and Srikant in 1994 [9]. It is still widely studied many years after its introduction. Many improvements and extensions have been proposed for this algorithm, but its applicability to many areas has yet to be exploited.

The Apriori algorithm works on the principle of the downward closure property, also called the anti-monotone property. Generating frequent itemsets by searching all possible itemsets would require the support of every candidate to be counted against the whole database. The anti-monotone property is used to reduce the number of candidate itemsets during frequent itemset generation. It states that if an itemset is frequent then all its subsets are also frequent; equivalently, if an itemset is not frequent then none of its supersets is frequent. Let P be the power set of the set of all items. Reference [8] shows that a measure f is anti-monotone if

∀ X, Y ∈ P: (X ⊆ Y) → f(Y) ≤ f(X).

The Apriori algorithm searches the candidate itemsets breadth-first. It uses frequent itemsets of length k-1 to generate candidate itemsets of length k (join step). It then uses the anti-monotone property to discard every candidate that has an infrequent subset of length k-1 (prune step); the remaining candidates that meet minsup are the frequent itemsets of length k. Association rules are generated from each frequent itemset Y in the form X → Y-X, where X is a non-empty proper subset of Y. Rules whose confidence does not satisfy the minconf threshold are dropped, and only the remaining strong rules are kept.
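The rule generation step described above can be sketched as follows. This is a minimal illustration, assuming the support counts of the frequent itemset and of all its subsets are already available in a dictionary; the function name `generate_rules` and the counts are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def generate_rules(itemset: Iterable[str],
                   counts: Dict[FrozenSet[str], int],
                   minconf: float) -> List[Rule]:
    """Return every rule X -> Y-X from the frequent itemset Y whose confidence >= minconf."""
    y = frozenset(itemset)
    rules: List[Rule] = []
    # X ranges over all non-empty proper subsets of Y.
    for size in range(1, len(y)):
        for subset in combinations(sorted(y), size):
            x = frozenset(subset)
            conf = counts[y] / counts[x]  # sigma(Y) / sigma(X)
            if conf >= minconf:
                rules.append((x, y - x, conf))
    return rules


# Hypothetical support counts, for illustration only.
counts = {
    frozenset({"a"}): 8,
    frozenset({"b"}): 5,
    frozenset({"a", "b"}): 4,
}

rules = generate_rules({"a", "b"}, counts, minconf=0.75)
```

Here the rule a → b has confidence 4/8 and is dropped, while b → a has confidence 4/5 and is kept as a strong rule.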

2) Pseudocode: The pseudocode for the algorithm is as follows:

ALGORITHM. Apriori

Input: D, a database of transactions; minsup, the minimum support count threshold.

Output: ⋃k Lk, the frequent itemsets in D.

    L1 = {frequent 1-itemsets};
    for (k = 2; Lk-1 != ∅; k++) {
        // join Lk-1 with itself, then eliminate any candidate that has
        // a (k-1)-size subset that is not frequent
        Ck = candidates generated from Lk-1;
        for each transaction t in D do {
            increment the count of all candidates in Ck that are contained in t;
        } // end for each
        Lk = candidates in Ck with count >= minsup;
    } // end for
    return ⋃k Lk;
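For readers who prefer executable code, the pseudocode can be rendered in Python roughly as below. This is a simplified sketch and not the Weka implementation used in the study: it favours readability over efficiency, the function name `apriori` and the toy transactions are assumptions, and `min_count` is an absolute support count.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: Iterable[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support count} for every itemset with count >= min_count."""
    db: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts: Dict[FrozenSet[str], int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # One pass over the database to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database, for illustration only.
toy = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
result = apriori(toy, min_count=3)
```

With a minimum count of 3, the three single items and the three pairs are frequent, while {A, B, C} occurs in only two transactions and is rejected.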

II. AIM AND OBJECTIVES

Land and water are undoubtedly the two major natural resources essential for the very existence of life. With the increase in population, the demand for land has risen manyfold. The objective of the study is therefore to find barren lands that have a substantial ground water level, so that these lands can be used for cultivating crops and fodder for animals. The study aims to unearth association rules between ground water and wastelands of Jodhpur District. The outcomes reveal useful patterns that relate ground water and wastelands.

III. RESEARCH METHODOLOGY

A. Study Area

Jodhpur district lies in the arid zone of Rajasthan, between 25°51'08" and 27°37'09" North latitude and 71°48'09" and 73°52'06" East longitude. It covers 11.60% of the total arid area of the state. Jodhpur district, part of Jodhpur Division, covers a geographical area of 2256405 hectares and is divided into 5 sub-divisions: Jodhpur, Shergarh, Pipar City, Osian and Phalodi. The district has 7 tehsils and 9 blocks. It is bounded by Bikaner in the north, Nagaur in the east, Jaisalmer in the west, and Barmer and Pali in the south.




Fig. 1. Map showing study area location

B. Data Collection

The study required information about land use, ground water and soil in the study area in GIS format. The data were collected from the Indian Space Research Organization (ISRO) Jodhpur Center, which provided land use, ground water and soil data for Jodhpur district for the year 2005 in GIS format.

The datasets used in this study and their basic characteristics are briefly described as follows:

1) Land Use Data of Jodhpur District: The land use map of Jodhpur divides the land into Agricultural Land, Built-up, Forest, Waste-land, Water bodies and Wetlands.

2) Ground Water Data of Jodhpur District: Jodhpur District is classified into different regions depending on the level and quality of ground water, viz. Good, Good but saline, Good to Moderate, Moderate, Moderate to Poor, Poor, Poor to Nil, Saline, Settlement, Very Good to Good and Water Body mask.

C. Tools/Software Used

ArcMap 10 is used for creating thematic maps and overlays. Weka 3.6 is used for generating association rules.

D. Methods

The methodology developed for this study is shown below in figure 2. Each block represents a sub-processing step on the way to the final output.

Fig. 2. Overall approach of the study.

1) Pre-processing of Data: The spatial datasets are preprocessed to create a transactional database before association rule mining can be applied. The preprocessing of spatial data may include selection of non-spatial attributes, feature selection, dimension reduction, join, union or intersection operations, data categorization etc. [10]. The study required two different datasets, one for ground water and one for wastelands. The pre-processing of the data was carried out in three steps:




a. A thematic layer with the required attributes is created for the waste land data.

b. A thematic layer with the required attributes is created for the ground water data.

c. An intersection is performed on the waste land and ground water layers, and a new thematic layer is created from the result that shows those areas of Jodhpur district which are either salt-affected wastelands or wastelands without scrub and have good ground water beneath.

The details of the above pre-processing steps are as follows:

a. Thematic Layers for Waste Land

The land use data of Jodhpur district, as provided by the ISRO Center, Jodhpur, classifies land use into the following types: Agricultural Land, Built-up, Forest, Waste-land, Water bodies and Wetlands. Table I shows the land use pattern in order of decreasing area and figure 3 shows the land use map.

Table I: Land Use Pattern

| Land Type | Area (Hectares) |
|---|---|
| Agriculture | 1940925.7 |
| Waste-lands | 675378.7 |
| Built-up | 29594 |
| Water bodies | 20406.8 |
| Forest | 14164.8 |
| Wetlands | 6110.4 |

Fig. 3. Land Use map of Jodhpur District

Out of all the land classes above, the study focuses on waste-lands only. Therefore, to obtain the waste-land distribution pattern, a new thematic layer is prepared showing only waste-lands. Figure 4 and table II show this newly created layer. The layer shows that the waste-lands are further classified into Sandy-desertic Land, Salt Affected Land, Mining/Industrial waste, Land without scrub, Land with scrub, Gullied/Ravenous Land and Barren Rocky/Stony waste land.

Table II: Waste Land Pattern

| Waste-land Type | Area (Hectares) |
|---|---|
| Sandy-desertic Land | 213737.9 |
| Land without scrub | 155328.8 |
| Land with scrub | 154027 |
| Barren Rocky/Stony waste | 141733.4 |
| Mining/Industrial waste | 4017.7 |
| Salt Affected Land | 3716.7 |
| Gullied/Ravenous Land | 2816.9 |

Fig. 4. Waste Land distribution of Jodhpur District

Among all the types of waste-lands, only those that are either salt affected or without scrub are chosen for further study. The reason is that all other types either already contain some vegetation (Land with scrub) or are not suitable for growing any type of vegetation (Sandy-desertic Land, Mining/Industrial waste, Gullied/Ravenous Land, Barren Rocky/Stony waste land). Therefore, a new thematic layer for "Land Without Scrub" and "Salt Affected Waste Land" is created. Figure 5 and table III show this layer.

Table III: Waste Land (Salt affected/Without Scrub)

| Wasteland | Area (Hectares) |
|---|---|
| Land without scrub | 155328.80 |
| Salt Affected Land | 3716.73 |

Fig. 5. Waste Land (Salt affected/Without Scrub) distribution.
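The attribute selection behind the layer of Table III can be illustrated with a short Python sketch over the areas of Table II. In the study this selection was performed on GIS layers in ArcMap; the dictionary below is only a stand-in for the layer's attribute table, and the variable names are assumptions.

```python
# Areas in hectares, copied from Table II (waste-land types of Jodhpur District).
waste_land_area = {
    "Sandy-desertic Land": 213737.9,
    "Land without scrub": 155328.8,
    "Land with scrub": 154027.0,
    "Barren Rocky/Stony waste": 141733.4,
    "Mining/Industrial waste": 4017.7,
    "Salt Affected Land": 3716.7,
    "Gullied/Ravenous Land": 2816.9,
}

# Only these two classes are kept for further study.
SELECTED_TYPES = {"Land without scrub", "Salt Affected Land"}

selected = {name: area for name, area in waste_land_area.items() if name in SELECTED_TYPES}
selected_total = sum(selected.values())                    # hectares kept in the new layer
share = selected_total / sum(waste_land_area.values())     # fraction of all waste-land
```

On these figures the two selected classes cover roughly 159045.5 hectares, a little under a quarter of the total waste-land area.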




Thus, the above process can be summarized as shown in figure 6:

Fig. 6. Thematic layers of Land use data

b. Thematic Layers for Ground Water

The ground water data, as provided by the ISRO Center, Jodhpur, is classified into the following types: Good, Good but saline, Good to Moderate, Moderate, Moderate to Poor, Poor, Poor to Nil, Saline, Settlement, Very Good to Good and Water Body mask. Jodhpur District is divided into regions based on this classification. The distribution of ground water is shown in figure 7 and table IV.

Table IV: Ground water Pattern

| Ground Water | Area (Hectares) |
|---|---|
| Good | 40115.58 |
| Good but Saline | 11168.37 |
| Good to moderate | 27975.06 |
| Moderate | 582198.82 |
| Moderate to Poor | 1460483.16 |
| Poor | 313362.70 |
| Poor to Nil | 98935.60 |
| Saline | 3345.65 |
| Settlement | 31270.99 |
| Very good to good | 266028.01 |
| Water Body Mask | 21649.42 |

Fig. 7. Ground water distribution of Jodhpur.

Out of these classified regions, only those with Good, Good but saline, Good to Moderate and Very Good to Good ground water are selected. As the next step, a new thematic layer for ground water is created containing only the selected classes, as shown in figure 8 and table V.

Table V: Good ground water Pattern

| Ground Water | Area (Hectares) |
|---|---|
| Good | 40115.58 |
| Good but Saline | 11168.37 |
| Good to moderate | 27975.07 |
| Very good to good | 266028.01 |

Fig. 8. Good ground water distribution of Jodhpur District.




Thus, the above process can be summarized as shown in figure 9:

Fig. 9. Thematic layers of Ground water data.

c. Overlays and Intersection of Thematic Layers

As the next step, an overlay map of wastelands (salt affected and without scrub) and good ground water is created. An overlay operation is much more than a simple merging of linework: all the attributes of the features taking part in the overlay are carried through, as shown in figure 10 below, where wastelands (polygons) and good ground water (polygons) are overlaid to create a new polygon layer.

Fig. 10. Overlay Map of Wasteland (Salt affected/ Without Scrub) and Good Ground Water.
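The way an overlay carries attributes from both input layers into the output can be sketched in plain Python. This is a deliberately simplified illustration: the study used ArcMap on arbitrary polygons, whereas the sketch below assumes axis-aligned rectangles, and the `Feature` class, function names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Feature:
    """A rectangular stand-in for a polygon feature with its attribute record."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    attrs: Dict[str, str]


def intersect(a: Feature, b: Feature) -> Optional[Feature]:
    """Return the overlapping rectangle with the attributes of both inputs, or None."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None  # the two features do not overlap
    return Feature(xmin, ymin, xmax, ymax, {**a.attrs, **b.attrs})


def overlay_intersection(layer_a: List[Feature], layer_b: List[Feature]) -> List[Feature]:
    """Intersect every feature of layer_a with every feature of layer_b."""
    pieces = (intersect(a, b) for a in layer_a for b in layer_b)
    return [p for p in pieces if p is not None]


# Hypothetical layers, for illustration only.
waste = [Feature(0, 0, 4, 4, {"WasteLandType": "Land without scrub"})]
water = [
    Feature(2, 2, 6, 6, {"GroundWaterType": "Good"}),
    Feature(10, 10, 12, 12, {"GroundWaterType": "Very good to good"}),
]
result = overlay_intersection(waste, water)
```

Each output feature holds both a WasteLandType and a GroundWaterType value, which is what allows a row of the resulting attribute table to be treated as one transaction for association rule mining.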

Then, using intersection, a new layer is created for those areas of the district where wastelands that are salt affected or without scrub have good ground water beneath. The newly constructed layer is shown in figure 11. Table VI shows the area under the mined pattern.

Table VI: Area under mining pattern.

| Waste Land | Area (Hectares) |
|---|---|
| Land without scrub | 13308.98 |
| Salt Affected Land | 329.96 |
| Total | 13638.94 |

Fig. 11. Intersect Map of Wastelands (Salt affected/Without Scrub) and Good Ground Water.

2) Association Rules Generation: Weka 3.6 is used for generating association rules. The database file obtained from the above map (figure 11) is converted into ARFF format, on which association rules are generated using the Apriori algorithm.

IV. RESULTS AND DISCUSSION

The Apriori algorithm was run in Weka on the ARFF file created after preprocessing the data. Three attributes, viz. Taluk, WasteLandType and GroundWaterType, were chosen from the database file as predicates. Six itemsets of size 1, seven itemsets of size 2 and two itemsets of size 3 were discovered from a total of 285 instances in 17 cycles. The minimum support and minimum confidence were set to 15% (0.15) and 90% (0.9) respectively. Tables VII, VIII and IX show the large itemsets found in the data.
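As an illustration of the ARFF input mentioned above, the sketch below writes a few attribute rows as an ARFF document with nominal attributes. The three rows, the relation name `wasteland_groundwater` and the helper names are hypothetical; they are not the 285 instances used in the study, and the sketch only handles values without quote characters.

```python
from typing import List, Sequence


def quote(value: str) -> str:
    """Quote nominal values that contain spaces, as ARFF requires."""
    return f"'{value}'" if " " in value else value


def to_arff(relation: str, attributes: Sequence[str], rows: List[Sequence[str]]) -> str:
    """Build an ARFF document in which every attribute is nominal."""
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(attributes):
        # The nominal domain of an attribute is the set of values seen in its column.
        values = sorted({row[i] for row in rows})
        domain = ",".join(quote(v) for v in values)
        lines.append(f"@attribute {name} {{{domain}}}")
    lines += ["", "@data"]
    lines += [",".join(quote(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical rows of the intersect layer's attribute table, for illustration only.
rows = [
    ("Bilara", "Land without scrub", "Very good to good"),
    ("Jodhpur", "Land without scrub", "Very good to good"),
    ("Phalodi", "Land without scrub", "Good"),
]
arff_text = to_arff("wasteland_groundwater",
                    ["Taluk", "WasteLandType", "GroundWaterType"], rows)
```

Each data row then plays the role of one transaction whose items are the attribute=value pairs, such as Taluk=Bilara.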




Table VII. Large Itemsets L(1)

| Item 1 | Count |
|---|---|
| Taluk=Bilara | 99 |
| Taluk=Jodhpur | 76 |
| Taluk=Phalodi | 61 |
| WasteLandType=Land without scrub | 280 |
| GroundWaterType=Very good to good | 181 |
| GroundWaterType=Good | 44 |

Table VIII. Large Itemsets L(2)

| Item 1 | Item 2 | Count |
|---|---|---|
| Taluk=Bilara | WasteLandType=Land without scrub | 99 |
| Taluk=Bilara | GroundWaterType=Very good to good | 81 |
| Taluk=Jodhpur | WasteLandType=Land without scrub | 76 |
| Taluk=Jodhpur | GroundWaterType=Very good to good | 75 |
| Taluk=Phalodi | WasteLandType=Land without scrub | 57 |
| WasteLandType=Land without scrub | GroundWaterType=Very good to good | 181 |
| WasteLandType=Land without scrub | GroundWaterType=Good | 44 |

Table IX. Large Itemsets L(3)

| Item 1 | Item 2 | Item 3 | Count |
|---|---|---|---|
| Taluk=Bilara | WasteLandType=Land without scrub | GroundWaterType=Very good to good | 81 |
| Taluk=Jodhpur | WasteLandType=Land without scrub | GroundWaterType=Very good to good | 75 |

The best rules found by the Apriori algorithm are listed in table X below.

Table X. Association Rules Mined for Ground Water and Waste Lands of Jodhpur District.

| S.No. | Body | Implies | Head | Support | Conf % |
|---|---|---|---|---|---|
| 1 | GroundWaterType=Very good to good | ==> | WasteLandType=Land without scrub | 181 | 100 |
| 2 | Taluk=Bilara | ==> | WasteLandType=Land without scrub | 99 | 100 |
| 3 | Taluk=Bilara, GroundWaterType=Very good to good | ==> | WasteLandType=Land without scrub | 81 | 100 |
| 4 | Taluk=Jodhpur | ==> | WasteLandType=Land without scrub | 76 | 100 |
| 5 | Taluk=Jodhpur, GroundWaterType=Very good to good | ==> | WasteLandType=Land without scrub | 75 | 100 |
| 6 | GroundWaterType=Good | ==> | WasteLandType=Land without scrub | 44 | 100 |
| 7 | Taluk=Jodhpur | ==> | GroundWaterType=Very good to good | 76 | 99 |
| 8 | Taluk=Jodhpur, WasteLandType=Land without scrub | ==> | GroundWaterType=Very good to good | 76 | 99 |
| 9 | Taluk=Jodhpur | ==> | WasteLandType=Land without scrub, GroundWaterType=Very good to good | 76 | 99 |
| 10 | Taluk=Phalodi | ==> | WasteLandType=Land without scrub | 61 | 93 |
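The confidence values of the rules can be cross-checked against the itemset counts of Tables VII and VIII using equation (2). The short Python sketch below does this for three of the rules; the counts are copied from those tables, while the variable and function names are assumptions. The check also suggests that the Support column of Table X reports the count of the rule body, since the joint count for rule 10 in Table VIII is 57 rather than 61.

```python
import math

N = 285                    # number of instances
MINSUP, MINCONF = 0.15, 0.90
min_count = math.ceil(MINSUP * N)   # smallest count that reaches 15% support

LWS = "WasteLandType=Land without scrub"
VGG = "GroundWaterType=Very good to good"

# Support counts copied from Tables VII and VIII.
counts = {
    frozenset({"Taluk=Bilara"}): 99,
    frozenset({"Taluk=Jodhpur"}): 76,
    frozenset({"Taluk=Phalodi"}): 61,
    frozenset({"Taluk=Bilara", LWS}): 99,
    frozenset({"Taluk=Jodhpur", VGG}): 75,
    frozenset({"Taluk=Phalodi", LWS}): 57,
}


def confidence(body: set, head: set) -> float:
    """Equation (2): count of body and head together divided by count of the body."""
    body_key = frozenset(body)
    return counts[body_key | frozenset(head)] / counts[body_key]


conf_bilara = confidence({"Taluk=Bilara"}, {LWS})     # rule 2: 99 / 99
conf_jodhpur = confidence({"Taluk=Jodhpur"}, {VGG})   # rule 7: 75 / 76
conf_phalodi = confidence({"Taluk=Phalodi"}, {LWS})   # rule 10: 57 / 61
```

Rounded to whole percentages these give 100, 99 and 93, matching the Conf % column, and all three exceed the 90% minimum confidence. The smallest frequent item, GroundWaterType=Good with 44 instances, just clears the minimum count of 43.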

The results show that 13638.94 hectares of land fall under the mined pattern. An analysis of the results is shown as a graph in figure 12. It shows that Bilara has the largest area (6481.05 hectares) of wastelands matching the mined pattern. The mined area is substantial and can be used for growing vegetation with the water underneath. The same results were obtained by implementing the Weka Apriori algorithm in the authors' own Java code.




Fig. 12. Graph showing distribution of Wastelands in taluks of Jodhpur District.

V. CONCLUSION

The analysis of the patterns shows that the majority of wastelands without scrub that have very good ground water lie in the Bilara region of Jodhpur District. With a good amount of water underneath, these lands can be used to produce firewood and fodder for animals. Plant species such as Acacia jacquemontii, Acacia leucophloea, Acacia senegal, Albizia lebbeck, Azadirachta indica, Anogeissus rotundifolia, Prosopis cineraria, Salvadora oleoides, Tecomella undulata, Tamarix articulata, Leucaena leucocephala, Tephrosia purpurea and Crotalaria medicaginea can be grown. Farmers can be advised to cultivate crops using ground water irrigation: if a piece of land is known to have a good ground water level, it can be irrigated using this water. Even if the water underneath is saline, salt-resistant plant species can be grown. In this way wastelands can be utilized effectively.

VI. FUTURE WORK

A wide variety of research is being carried out in the field of spatial data mining.

As the next level of this research, fuzzy spatial association rules could be determined.

Soil and crop data could also be used along with the ground water and wasteland data.

Spatio-temporal association rules could also be determined as an extension of the current research.

Hence, much research remains to be carried out in these emerging areas, focusing on their applicability to agriculture, data mining and GIS, which will provide means for better utilization of natural resources.

ACKNOWLEDGMENT

The authors would like to thank ISRO, Jodhpur Centre for providing necessary data about the research scenario.

REFERENCES

[1] Mainaz Faridi, Seema Verma, and Saurabh Mukherjee. 2012. Impact of ground water level and its quality on fertility of land using GIS and Agriculture Business Intelligence. In Proceedings of Geomatrix'12, An International Conference on Geospatial Technologies and Applications, IIT Bombay (Feb 2012).

[2] May Yuan, B. Buttenfield, M. Gahegan, and Harvey Miller. 2004. Geospatial data mining and knowledge discovery. Chapter 14 (2004), 365-388.

[3] Krzysztof Koperski and Jiawei Han. 1995. Discovery of spatial association rules in geographic information databases. Advances in Spatial Databases, Springer Berlin Heidelberg, vol. 6, 47-66.

[4] Stan Openshaw. 1999. Geographical data mining: key design issues. In Proceedings of GeoComputation, vol. 99.

[5] Krzysztof Koperski, Jiawei Han, and Nebojsa Stefanovic. 1998. An efficient two-step method for classification of spatial data. In Proceedings of International Symposium on Spatial Data Handling (SDH 1998), Vancouver, BC, Canada, 45-54.

[6] Hong Tang and Simon McDonald. 2002. Integrating GIS and spatial data mining technique for target marketing of university courses. In ISPRS Commission IV, Symposium, Ottawa, Canada (Jul 2002).

[7] D. Rajesh. 2011. Application of Spatial Data Mining for Agriculture. International Journal of Computer Applications 15, 2 (2011), 7-9.

[8] Pang-Ning Tan and Vipin Kumar. 2005. Chapter 6. Association Analysis: Basic Concepts and Algorithms. Introduction to Data Mining. Addison-Wesley. ISBN 0321321367.

[9] Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), vol. 1215, 487-499.

[10] Junming Chen, Guangfa Lin, and Zhihai Yang. 2011. Extracting spatial association rules from the maximum frequent itemsets based on Boolean matrix. In Geoinformatics, 2011 19th International Conference on, IEEE (2011), 1-5.
