Spatial Modelling of Gully Erosion Using GIS and R...

21
applied sciences Article Spatial Modelling of Gully Erosion Using GIS and R Programing: A Comparison among Three Data Mining Algorithms Alireza Arabameri 1 ID , Biswajeet Pradhan 2, * ID , Hamid Reza Pourghasemi 3 ID , Khalil Rezaei 4 and Norman Kerle 5 1 Department of Geomorphology, Tarbiat Modares University, Tehran 36581-17994, Iran; [email protected] 2 Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia 3 Department of Natural Resources and Environmental Engineering, College of Agriculture, Shiraz University, Shiraz 71441-65186, Iran; [email protected] 4 Faculty of Earth Sciences, Kharazmi University, Tehran 14911-15719, Iran; [email protected] 5 Department of Earth Systems Analysis (ESA), Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, 7522 Enschede, The Netherlands; [email protected] * Correspondence: [email protected] or [email protected] Received: 13 July 2018; Accepted: 10 August 2018; Published: 14 August 2018 Abstract: Gully erosion triggers land degradation and restricts the use of land. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods—Random Forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS)—for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified by extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs: Elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI were selected to model GE. The results of variables importance by RF and BRT models indicated that distance from road, elevation, and lithology had the highest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. The results showed that AUC for the three models varies from 0.911 to 0.927, whereas the RF model had a prediction accuracy of 0.927 as per SCAI values, when compared to the other models. The findings will be of help for planning and developing the studied region. Keywords: gully erosion; environmental variables; data mining techniques; SCAI; GIS 1. Introduction Today, reducing natural resources, especially soil and water, is one of the major problems and major threats to human life and is one of the most important environmental problems worldwide that has intensified in recent years, with increasing population and the alternation of human activities [1]. According to the data from United Nations research, the world’s population is growing at a rate of 1.8% per year and it is expected to rise from 8 billion in 2025 to 9.4 billion in 2050 [2]. This increase in world population would demand the need for food, water, forage, and others, which consequently would add huge pressure on land exploitation, non-standard exploitation, and eventually lead to an increase in erosion rates [1,3]. Soil erosion is one of the factors that endangers water and soil [1]. Soil erosion by water, such as GE, is considered as a major cause of land degradation around the world [4,5]. It leads Appl. Sci. 2018, 8, 1369; doi:10.3390/app8081369 www.mdpi.com/journal/applsci

Transcript of Spatial Modelling of Gully Erosion Using GIS and R...

Page 1: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

applied sciences

Article

Spatial Modelling of Gully Erosion Using GIS andR Programing A Comparison among Three DataMining Algorithms

Alireza Arabameri 1 ID Biswajeet Pradhan 2 ID Hamid Reza Pourghasemi 3 ID Khalil Rezaei 4

and Norman Kerle 5

1 Department of Geomorphology Tarbiat Modares University Tehran 36581-17994 Iranalirezaameri91yahoocom

2 Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS) Faculty of Engineeringand IT University of Technology Sydney Ultimo NSW 2007 Australia

3 Department of Natural Resources and Environmental Engineering College of Agriculture Shiraz UniversityShiraz 71441-65186 Iran hm_porghasemiyahoocom

4 Faculty of Earth Sciences Kharazmi University Tehran 14911-15719 Iran khrezaeigmailcom5 Department of Earth Systems Analysis (ESA) Faculty of Geo-Information Science and Earth

Observation (ITC) University of Twente 7522 Enschede The Netherlands kerleitcnl Correspondence biswajeet24gmailcom or BiswajeetPradhanutseduau

Received 13 July 2018 Accepted 10 August 2018 Published 14 August 2018

Abstract Gully erosion triggers land degradation and restricts the use of land This study assessesthe spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) usingWeights-of-Evidence (WoE) Bayes theory and then applies three data mining methodsmdashRandomForest (RF) boosted regression tree (BRT) and multivariate adaptive regression spline (MARS)mdashforgully erosion susceptibility mapping (GESM) in the Shahroud watershed Iran Gully locations wereidentified by extensive field surveys and a total of 172 GE locations were mapped Twelve gully-relatedGEVs Elevation slope degree slope aspect plan curvature convergence index topographic wetnessindex (TWI) lithology land useland cover (LULC) distance from rivers distance from roadsdrainage density and NDVI were selected to model GE The results of variables importance by RFand BRT models indicated that distance from road elevation and lithology had the highest effect onGE occurrence The area under the curve (AUC) and seed cell area index (SCAI) methods were usedto validate the three GE maps The results showed that AUC for the three models varies from 0911 to0927 whereas the RF model had a prediction accuracy of 0927 as per SCAI values when comparedto the other models The findings will be of help for planning and developing the studied region

Keywords gully erosion environmental variables data mining techniques SCAI GIS

1 Introduction

Today reducing natural resources especially soil and water is one of the major problems andmajor threats to human life and is one of the most important environmental problems worldwide thathas intensified in recent years with increasing population and the alternation of human activities [1]According to the data from United Nations research the worldrsquos population is growing at a rate of 18per year and it is expected to rise from 8 billion in 2025 to 94 billion in 2050 [2] This increase in worldpopulation would demand the need for food water forage and others which consequently wouldadd huge pressure on land exploitation non-standard exploitation and eventually lead to an increasein erosion rates [13] Soil erosion is one of the factors that endangers water and soil [1] Soil erosion bywater such as GE is considered as a major cause of land degradation around the world [45] It leads

Appl Sci 2018 8 1369 doi103390app8081369 wwwmdpicomjournalapplsci

Appl Sci 2018 8 1369 2 of 21

to a range of problems such as desertification flooding and sediment deposition in reservoirs [67]the destructive effects on the ecosystem reducing soil fertility and imposes huge economic costs [8]GE is typically defined as a deep channel that has been eroded by concentrated water flow removingsurface soils and materials [910] The amount of moisture and its changes as a result of the dryand wet seasons is a main parameter in creating cracks and grooves in fine-grained clay formationscontaining clay and silt and ultimately developing rilled erosion and gullies [10] The alternation ofwarm and dry seasons makes it possible to create cracks in the formation of fine grains in warmseasons with the drying of the land and the wilting of the vegetation and these cracks at the time of thefirst sudden rainfall concentrate the runoff and therefore cause rill and GE to emerge [11] GE occurswhen the erosion of the water flow or the erodibility of the sediments or the formation of the areais higher than the geomorphological threshold of the area [11] Mapping gully erosion systems isessential for implementing soil conservation measures [6] GEVs that influence gully occurrence arerainfall topography-derived factors such as elevation slope degree slope aspect and plan curvaturelithology [12] soil properties [13] and LULC [14] The distribution of precipitation affects thehydraulic flow and moisture content of the soil and the erosion strength of the flow and soil resistanceto erosion is different before and after erosion [11] Generally the amount and volume of flow arecontrolled by the topographic features of the area including slope aspect and drainage area of the areaDepth and morphology of the cross section of the gullies are controlled by soil erodibility features ofthe geological layers of the area The characteristics of the regionrsquos soil affect the subsurface flow andthe phenomenon of piping erosion and the pipes cause a gully when their ceiling collapses [10]

Susceptibility maps of GE are essential for conservation of natural resources and for evaluatingthe relationship among gully occurrence and relevant GEVs [12] Several models have been applied toassess soil erosion and GE rate in a quantitative and qualitative way such as the Universal Soil LossEquation (USLE) [115] Erosion Potential Method Modified Pacific Southwest Interagency CommitteeModel (MPSIAC) [16] Water Erosion Prediction Project (WEEP) [17] European Soil Erosion Model(EUROSEM) [18] Ephemeral Gully Erosion Model (EGEM) [19] and Chemicals Runoff and Erosionfrom Agricultural Management Systems (CREAMS) [20]

Within the soil conservation research field the distribution of soil erosion is one of the primarysources of information This is also relevant for GE however in the above mentioned methodsspatial distribution of gullies has not been addressed Remote sensing-based methods to identify GEhave been developed [21] including with RF machine learning though they serve more to validatesusceptibility models and to explain the actual erosion presence and distribution In recent yearsscientific research for susceptibility analysis of GE and work on the statistical relationships betweenGEVs and the spatial distribution of gullies have been addressed using various statistical and machinelearning methods including bivariate statistics (BS) [1] weights-of-evidence (WoE) [13] index ofentropy (IofE) [8] logistic regression (LR) [22ndash26] information value (IV) [2425] random forest(RF) [27] bivariate statistical models [2829] maximum entropy (ME) [3031] frequency ratio (FR) [28]analytical hierarchy processes (AHP) [29] artificial neural network (ANN) [1231] support vectormachine (SVM) [31] and boosted regression trees (BRT) [12] For this purpose various GEVs such astopography (eg elevation slope aspect plan curvature profile curvature slope length) lithologyland use soil properties (eg soil texture soil type erosivity soil water content) land use climate(rainfall intensity rainfall period and spatial distribution of rainfall) infrastructures (road bridge)and hydrology (eg TWI SPI drainage density) were used

A comprehensive literature review shows that there are still dimensions that require furtherresearch and that a large number of potentially useful methods have not yet been fully implementedto provide GE susceptibility maps The main objectives of this study are (i) To determine therelationship between gully occurrence and conditioning factors using Weights-of-Evidence Bayestheory (ii) assessing the capability of RF MARS and BRT data miningmachine learning models topredict GE susceptibility and (iii) validation of models using the AUC curve and SCAI methods Study

Appl Sci 2018 8 1369 3 of 21

of the research background showed that using MARS BRT and RF data mining models in GE zonationis very new It will help managers in future planning to prevent human intervention in sensitive areas

2 Materials and Methods

21 Study Area

The Shahroud watershed with an area of about 848 km2 and elevation range from 1084 to 2131 masl is located in the northeastern part of Semnan Province Iran (Figure 1) The study area receivesan average rainfall of less than 250 mm has an arid and semi-arid climate [32] Various types oflithological formations cover this watershed and the landforms are mainly low level pediment fansand valley terrace deposits The dominant land use is rangelands but irrigation farming and barelands are also present

Appl Sci 2018 8 x FOR PEER REVIEW 3 of 21

zonation is very new It will help managers in future planning to prevent human intervention in sensitive areas

2 Materials and Methods

21 Study Area

The Shahroud watershed with an area of about 848 km2 and elevation range from 1084 to 2131 m asl is located in the northeastern part of Semnan Province Iran (Figure 1) The study area receives an average rainfall of less than 250 mm has an arid and semi-arid climate [32] Various types of lithological formations cover this watershed and the landforms are mainly low level pediment fans and valley terrace deposits The dominant land use is rangelands but irrigation farming and bare lands are also present

Figure 1 (a) Location of the Semnan provinces in Iran (b) location of study area and (c) gully erosion locations with the digital elevation model map of the Shahroud watershed

22 Data and Method

Figure 2 shows the methodological approach applied to map GE susceptibility in the Shahroud watershed using BRT MARS and RF models For preparing an accurate and reliable gully inventory map extensive field surveys with a DGPS device were performed in the study area to determine the location of the Gullies [2728] Then among 172 detected gully locations randomly (7030 ratio) 121 gully locations (70) and 51 gully locations (30) in the polygon format were used for training the testing models [28] The locations of training and testing gullies are shown in Figure 1

Figure 1 (a) Location of the Semnan provinces in Iran (b) location of study area and (c) gully erosionlocations with the digital elevation model map of the Shahroud watershed

22 Data and Method

Figure 2 shows the methodological approach applied to map GE susceptibility in the Shahroudwatershed using BRT MARS and RF models For preparing an accurate and reliable gully inventorymap extensive field surveys with a DGPS device were performed in the study area to determinethe location of the Gullies [2728] Then among 172 detected gully locations randomly (7030 ratio)121 gully locations (70) and 51 gully locations (30) in the polygon format were used for training the

Appl Sci 2018 8 1369 4 of 21

testing models [28] The locations of training and testing gullies are shown in Figure 1 Interventionarystudies involving animals or humans and other studies require ethical approval must list the authoritythat provided approval and the corresponding ethical approval code

Appl Sci 2018 8 x FOR PEER REVIEW 4 of 21

Interventionary studies involving animals or humans and other studies require ethical approval must list the authority that provided approval and the corresponding ethical approval code

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basic maps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000 satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34] In this study based on literature review [242631] and local conditions of the study area twelve factors were selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m 1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and pattern of drainage density Slope degree map was classified into six classes [2426] lt5deg 5ndashlt10deg 10ndashlt15deg 15ndashlt20deg 20ndashlt25deg 25ndash30deg and gt30deg (Figure 3b)

Figure 2 Flowchart of research methodology

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plan curvatures define convexity and concavity of slope curvature whereas zero is flat surface The plan curvature map was divided into 3 categories Concave Flat and Convex The TWI indicator is important for identifying prone areas to GE [36] TWI is calculated by Equation (1) TWI = In Stan prop (1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11 (Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and divided into 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect of drainage network and infrastructures on GE the distance from rivers and roads was considered [14] and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers

Figure 2 Flowchart of research methodology

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basicmaps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34]In this study based on literature review [242631] and local conditions of the study area twelve factorswere selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and patternof drainage density Slope degree map was classified into six classes [2426] lt5 5ndashlt10 10ndashlt1515ndashlt20 20ndashlt25 25ndash30 and gt30 (Figure 3b)

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plancurvatures define convexity and concavity of slope curvature whereas zero is flat surface The plancurvature map was divided into 3 categories Concave Flat and Convex The TWI indicator isimportant for identifying prone areas to GE [36] TWI is calculated by Equation (1)

TWI = ln(

Stan prop

)(1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11(Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence

Appl Sci 2018 8 1369 5 of 21

index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and dividedinto 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect ofdrainage network and infrastructures on GE the distance from rivers and roads was considered [14]and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers (Figure 3g)and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density toolin ArcGIS 105 was used for calculating drainage density and then its map was divided into fourcategories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scalewas used to prepare the lithological unit layer The lithological units were classified into ten categoriesbased on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1)The advantage of this method is it is easy to use however this method has certain disadvantages suchas the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age

Murmg Gypsiferous marl MioceneQft2 Low level piedment fan and vally terrace deposits QuaternaryKu Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceouslimestone with intercalations of calcareous shale (DALICHAI FM) Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryJl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-CretaceousE2c Conglomerate and sandstone EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryE1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identifiedin the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map wasalso produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025(Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were usedIf during modeling there is collinearity among the variables the accuracy of the modelrsquos predictiondecreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearityamong parameters [28]

Appl Sci 2018 8 x FOR PEER REVIEW 5 of 21

(Figure 3g) and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density tool in ArcGIS 105 was used for calculating drainage density and then its map was divided into four categories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scale was used to prepare the lithological unit layer The lithological units were classified into ten categories based on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1) The advantage of this method is it is easy to use however this method has certain disadvantages such as the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age Murmg Gypsiferous marl Miocene Qft2 Low level piedment fan and vally terrace deposits Quaternary Ku Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceous limestone with intercalations of calcareous shale (DALICHAI FM)

Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary Jl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-Cretaceous E2c Conglomerate and sandstone Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary E1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identified in the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map was also produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025 (Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were used If during modeling there is collinearity among the variables the accuracy of the modelrsquos prediction decreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearity among parameters [28]

Figure 3 Cont

Appl Sci 2018 8 1369 6 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 6 of 21

Figure 3 Cont

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 2: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 2 of 21

to a range of problems such as desertification flooding and sediment deposition in reservoirs [67]the destructive effects on the ecosystem reducing soil fertility and imposes huge economic costs [8]GE is typically defined as a deep channel that has been eroded by concentrated water flow removingsurface soils and materials [910] The amount of moisture and its changes as a result of the dryand wet seasons is a main parameter in creating cracks and grooves in fine-grained clay formationscontaining clay and silt and ultimately developing rilled erosion and gullies [10] The alternation ofwarm and dry seasons makes it possible to create cracks in the formation of fine grains in warmseasons with the drying of the land and the wilting of the vegetation and these cracks at the time of thefirst sudden rainfall concentrate the runoff and therefore cause rill and GE to emerge [11] GE occurswhen the erosion of the water flow or the erodibility of the sediments or the formation of the areais higher than the geomorphological threshold of the area [11] Mapping gully erosion systems isessential for implementing soil conservation measures [6] GEVs that influence gully occurrence arerainfall topography-derived factors such as elevation slope degree slope aspect and plan curvaturelithology [12] soil properties [13] and LULC [14] The distribution of precipitation affects thehydraulic flow and moisture content of the soil and the erosion strength of the flow and soil resistanceto erosion is different before and after erosion [11] Generally the amount and volume of flow arecontrolled by the topographic features of the area including slope aspect and drainage area of the areaDepth and morphology of the cross section of the gullies are controlled by soil erodibility features ofthe geological layers of the area The characteristics of the regionrsquos soil affect the subsurface flow andthe phenomenon of piping erosion and the pipes cause a gully when their ceiling collapses [10]

Susceptibility maps of GE are essential for conservation of natural resources and for evaluatingthe relationship among gully occurrence and relevant GEVs [12] Several models have been applied toassess soil erosion and GE rate in a quantitative and qualitative way such as the Universal Soil LossEquation (USLE) [115] Erosion Potential Method Modified Pacific Southwest Interagency CommitteeModel (MPSIAC) [16] Water Erosion Prediction Project (WEEP) [17] European Soil Erosion Model(EUROSEM) [18] Ephemeral Gully Erosion Model (EGEM) [19] and Chemicals Runoff and Erosionfrom Agricultural Management Systems (CREAMS) [20]

Within the soil conservation research field the distribution of soil erosion is one of the primarysources of information This is also relevant for GE however in the above mentioned methodsspatial distribution of gullies has not been addressed Remote sensing-based methods to identify GEhave been developed [21] including with RF machine learning though they serve more to validatesusceptibility models and to explain the actual erosion presence and distribution In recent yearsscientific research for susceptibility analysis of GE and work on the statistical relationships betweenGEVs and the spatial distribution of gullies have been addressed using various statistical and machinelearning methods including bivariate statistics (BS) [1] weights-of-evidence (WoE) [13] index ofentropy (IofE) [8] logistic regression (LR) [22ndash26] information value (IV) [2425] random forest(RF) [27] bivariate statistical models [2829] maximum entropy (ME) [3031] frequency ratio (FR) [28]analytical hierarchy processes (AHP) [29] artificial neural network (ANN) [1231] support vectormachine (SVM) [31] and boosted regression trees (BRT) [12] For this purpose various GEVs such astopography (eg elevation slope aspect plan curvature profile curvature slope length) lithologyland use soil properties (eg soil texture soil type erosivity soil water content) land use climate(rainfall intensity rainfall period and spatial distribution of rainfall) infrastructures (road bridge)and hydrology (eg TWI SPI drainage density) were used

A comprehensive literature review shows that there are still dimensions that require furtherresearch and that a large number of potentially useful methods have not yet been fully implementedto provide GE susceptibility maps The main objectives of this study are (i) To determine therelationship between gully occurrence and conditioning factors using Weights-of-Evidence Bayestheory (ii) assessing the capability of RF MARS and BRT data miningmachine learning models topredict GE susceptibility and (iii) validation of models using the AUC curve and SCAI methods Study

Appl Sci 2018 8 1369 3 of 21

of the research background showed that using MARS BRT and RF data mining models in GE zonationis very new It will help managers in future planning to prevent human intervention in sensitive areas

2 Materials and Methods

21 Study Area

The Shahroud watershed with an area of about 848 km2 and elevation range from 1084 to 2131 masl is located in the northeastern part of Semnan Province Iran (Figure 1) The study area receivesan average rainfall of less than 250 mm has an arid and semi-arid climate [32] Various types oflithological formations cover this watershed and the landforms are mainly low level pediment fansand valley terrace deposits The dominant land use is rangelands but irrigation farming and barelands are also present

Appl Sci 2018 8 x FOR PEER REVIEW 3 of 21

zonation is very new It will help managers in future planning to prevent human intervention in sensitive areas

2 Materials and Methods

21 Study Area

The Shahroud watershed with an area of about 848 km2 and elevation range from 1084 to 2131 m asl is located in the northeastern part of Semnan Province Iran (Figure 1) The study area receives an average rainfall of less than 250 mm has an arid and semi-arid climate [32] Various types of lithological formations cover this watershed and the landforms are mainly low level pediment fans and valley terrace deposits The dominant land use is rangelands but irrigation farming and bare lands are also present

Figure 1 (a) Location of the Semnan provinces in Iran (b) location of study area and (c) gully erosion locations with the digital elevation model map of the Shahroud watershed

22 Data and Method

Figure 2 shows the methodological approach applied to map GE susceptibility in the Shahroud watershed using BRT MARS and RF models For preparing an accurate and reliable gully inventory map extensive field surveys with a DGPS device were performed in the study area to determine the location of the Gullies [2728] Then among 172 detected gully locations randomly (7030 ratio) 121 gully locations (70) and 51 gully locations (30) in the polygon format were used for training the testing models [28] The locations of training and testing gullies are shown in Figure 1

Figure 1 (a) Location of the Semnan provinces in Iran (b) location of study area and (c) gully erosionlocations with the digital elevation model map of the Shahroud watershed

22 Data and Method

Figure 2 shows the methodological approach applied to map GE susceptibility in the Shahroudwatershed using BRT MARS and RF models For preparing an accurate and reliable gully inventorymap extensive field surveys with a DGPS device were performed in the study area to determinethe location of the Gullies [2728] Then among 172 detected gully locations randomly (7030 ratio)121 gully locations (70) and 51 gully locations (30) in the polygon format were used for training the

Appl Sci 2018 8 1369 4 of 21

testing models [28] The locations of training and testing gullies are shown in Figure 1 Interventionarystudies involving animals or humans and other studies require ethical approval must list the authoritythat provided approval and the corresponding ethical approval code

Appl Sci 2018 8 x FOR PEER REVIEW 4 of 21

Interventionary studies involving animals or humans and other studies require ethical approval must list the authority that provided approval and the corresponding ethical approval code

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basic maps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000 satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34] In this study based on literature review [242631] and local conditions of the study area twelve factors were selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m 1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and pattern of drainage density Slope degree map was classified into six classes [2426] lt5deg 5ndashlt10deg 10ndashlt15deg 15ndashlt20deg 20ndashlt25deg 25ndash30deg and gt30deg (Figure 3b)

Figure 2 Flowchart of research methodology

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plan curvatures define convexity and concavity of slope curvature whereas zero is flat surface The plan curvature map was divided into 3 categories Concave Flat and Convex The TWI indicator is important for identifying prone areas to GE [36] TWI is calculated by Equation (1) TWI = In Stan prop (1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11 (Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and divided into 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect of drainage network and infrastructures on GE the distance from rivers and roads was considered [14] and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers

Figure 2 Flowchart of research methodology

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basicmaps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34]In this study based on literature review [242631] and local conditions of the study area twelve factorswere selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and patternof drainage density Slope degree map was classified into six classes [2426] lt5 5ndashlt10 10ndashlt1515ndashlt20 20ndashlt25 25ndash30 and gt30 (Figure 3b)

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plancurvatures define convexity and concavity of slope curvature whereas zero is flat surface The plancurvature map was divided into 3 categories Concave Flat and Convex The TWI indicator isimportant for identifying prone areas to GE [36] TWI is calculated by Equation (1)

TWI = ln(

Stan prop

)(1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11(Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence

Appl Sci 2018 8 1369 5 of 21

index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and dividedinto 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect ofdrainage network and infrastructures on GE the distance from rivers and roads was considered [14]and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers (Figure 3g)and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density toolin ArcGIS 105 was used for calculating drainage density and then its map was divided into fourcategories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scalewas used to prepare the lithological unit layer The lithological units were classified into ten categoriesbased on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1)The advantage of this method is it is easy to use however this method has certain disadvantages suchas the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age

Murmg Gypsiferous marl MioceneQft2 Low level piedment fan and vally terrace deposits QuaternaryKu Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceouslimestone with intercalations of calcareous shale (DALICHAI FM) Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryJl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-CretaceousE2c Conglomerate and sandstone EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryE1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identifiedin the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map wasalso produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025(Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were usedIf during modeling there is collinearity among the variables the accuracy of the modelrsquos predictiondecreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearityamong parameters [28]

Appl Sci 2018 8 x FOR PEER REVIEW 5 of 21

(Figure 3g) and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density tool in ArcGIS 105 was used for calculating drainage density and then its map was divided into four categories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scale was used to prepare the lithological unit layer The lithological units were classified into ten categories based on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1) The advantage of this method is it is easy to use however this method has certain disadvantages such as the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age Murmg Gypsiferous marl Miocene Qft2 Low level piedment fan and vally terrace deposits Quaternary Ku Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceous limestone with intercalations of calcareous shale (DALICHAI FM)

Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary Jl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-Cretaceous E2c Conglomerate and sandstone Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary E1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identified in the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map was also produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025 (Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were used If during modeling there is collinearity among the variables the accuracy of the modelrsquos prediction decreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearity among parameters [28]

Figure 3 Cont

Appl Sci 2018 8 1369 6 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 6 of 21

Figure 3 Cont

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 3: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 3 of 21

of the research background showed that using MARS BRT and RF data mining models in GE zonationis very new It will help managers in future planning to prevent human intervention in sensitive areas

2 Materials and Methods

21 Study Area

The Shahroud watershed with an area of about 848 km2 and elevation range from 1084 to 2131 masl is located in the northeastern part of Semnan Province Iran (Figure 1) The study area receivesan average rainfall of less than 250 mm has an arid and semi-arid climate [32] Various types oflithological formations cover this watershed and the landforms are mainly low level pediment fansand valley terrace deposits The dominant land use is rangelands but irrigation farming and barelands are also present

Appl Sci 2018 8 x FOR PEER REVIEW 3 of 21

zonation is very new It will help managers in future planning to prevent human intervention in sensitive areas

2 Materials and Methods

21 Study Area

The Shahroud watershed with an area of about 848 km2 and elevation range from 1084 to 2131 m asl is located in the northeastern part of Semnan Province Iran (Figure 1) The study area receives an average rainfall of less than 250 mm has an arid and semi-arid climate [32] Various types of lithological formations cover this watershed and the landforms are mainly low level pediment fans and valley terrace deposits The dominant land use is rangelands but irrigation farming and bare lands are also present

Figure 1 (a) Location of the Semnan provinces in Iran (b) location of study area and (c) gully erosion locations with the digital elevation model map of the Shahroud watershed

22 Data and Method

Figure 2 shows the methodological approach applied to map GE susceptibility in the Shahroud watershed using BRT MARS and RF models For preparing an accurate and reliable gully inventory map extensive field surveys with a DGPS device were performed in the study area to determine the location of the Gullies [2728] Then among 172 detected gully locations randomly (7030 ratio) 121 gully locations (70) and 51 gully locations (30) in the polygon format were used for training the testing models [28] The locations of training and testing gullies are shown in Figure 1

Figure 1 (a) Location of the Semnan provinces in Iran (b) location of study area and (c) gully erosionlocations with the digital elevation model map of the Shahroud watershed

22 Data and Method

Figure 2 shows the methodological approach applied to map GE susceptibility in the Shahroudwatershed using BRT MARS and RF models For preparing an accurate and reliable gully inventorymap extensive field surveys with a DGPS device were performed in the study area to determinethe location of the Gullies [2728] Then among 172 detected gully locations randomly (7030 ratio)121 gully locations (70) and 51 gully locations (30) in the polygon format were used for training the

Appl Sci 2018 8 1369 4 of 21

testing models [28] The locations of training and testing gullies are shown in Figure 1 Interventionarystudies involving animals or humans and other studies require ethical approval must list the authoritythat provided approval and the corresponding ethical approval code

Appl Sci 2018 8 x FOR PEER REVIEW 4 of 21

Interventionary studies involving animals or humans and other studies require ethical approval must list the authority that provided approval and the corresponding ethical approval code

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basic maps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000 satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34] In this study based on literature review [242631] and local conditions of the study area twelve factors were selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m 1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and pattern of drainage density Slope degree map was classified into six classes [2426] lt5deg 5ndashlt10deg 10ndashlt15deg 15ndashlt20deg 20ndashlt25deg 25ndash30deg and gt30deg (Figure 3b)

Figure 2 Flowchart of research methodology

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plan curvatures define convexity and concavity of slope curvature whereas zero is flat surface The plan curvature map was divided into 3 categories Concave Flat and Convex The TWI indicator is important for identifying prone areas to GE [36] TWI is calculated by Equation (1) TWI = In Stan prop (1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11 (Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and divided into 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect of drainage network and infrastructures on GE the distance from rivers and roads was considered [14] and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers

Figure 2 Flowchart of research methodology

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basicmaps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34]In this study based on literature review [242631] and local conditions of the study area twelve factorswere selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and patternof drainage density Slope degree map was classified into six classes [2426] lt5 5ndashlt10 10ndashlt1515ndashlt20 20ndashlt25 25ndash30 and gt30 (Figure 3b)

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plancurvatures define convexity and concavity of slope curvature whereas zero is flat surface The plancurvature map was divided into 3 categories Concave Flat and Convex The TWI indicator isimportant for identifying prone areas to GE [36] TWI is calculated by Equation (1)

TWI = ln(

Stan prop

)(1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11(Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence

Appl Sci 2018 8 1369 5 of 21

index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and dividedinto 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect ofdrainage network and infrastructures on GE the distance from rivers and roads was considered [14]and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers (Figure 3g)and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density toolin ArcGIS 105 was used for calculating drainage density and then its map was divided into fourcategories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scalewas used to prepare the lithological unit layer The lithological units were classified into ten categoriesbased on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1)The advantage of this method is it is easy to use however this method has certain disadvantages suchas the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age

Murmg Gypsiferous marl MioceneQft2 Low level piedment fan and vally terrace deposits QuaternaryKu Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceouslimestone with intercalations of calcareous shale (DALICHAI FM) Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryJl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-CretaceousE2c Conglomerate and sandstone EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryE1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identifiedin the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map wasalso produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025(Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were usedIf during modeling there is collinearity among the variables the accuracy of the modelrsquos predictiondecreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearityamong parameters [28]

Appl Sci 2018 8 x FOR PEER REVIEW 5 of 21

(Figure 3g) and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density tool in ArcGIS 105 was used for calculating drainage density and then its map was divided into four categories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scale was used to prepare the lithological unit layer The lithological units were classified into ten categories based on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1) The advantage of this method is it is easy to use however this method has certain disadvantages such as the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age Murmg Gypsiferous marl Miocene Qft2 Low level piedment fan and vally terrace deposits Quaternary Ku Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceous limestone with intercalations of calcareous shale (DALICHAI FM)

Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary Jl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-Cretaceous E2c Conglomerate and sandstone Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary E1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identified in the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map was also produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025 (Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were used If during modeling there is collinearity among the variables the accuracy of the modelrsquos prediction decreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearity among parameters [28]

Figure 3 Cont

Appl Sci 2018 8 1369 6 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 6 of 21

Figure 3 Cont

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 4: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 4 of 21

testing models [28] The locations of training and testing gullies are shown in Figure 1 Interventionarystudies involving animals or humans and other studies require ethical approval must list the authoritythat provided approval and the corresponding ethical approval code

Appl Sci 2018 8 x FOR PEER REVIEW 4 of 21

Interventionary studies involving animals or humans and other studies require ethical approval must list the authority that provided approval and the corresponding ethical approval code

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basic maps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000 satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34] In this study based on literature review [242631] and local conditions of the study area twelve factors were selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m 1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and pattern of drainage density Slope degree map was classified into six classes [2426] lt5deg 5ndashlt10deg 10ndashlt15deg 15ndashlt20deg 20ndashlt25deg 25ndash30deg and gt30deg (Figure 3b)

Figure 2 Flowchart of research methodology

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plan curvatures define convexity and concavity of slope curvature whereas zero is flat surface The plan curvature map was divided into 3 categories Concave Flat and Convex The TWI indicator is important for identifying prone areas to GE [36] TWI is calculated by Equation (1) TWI = In Stan prop (1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11 (Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and divided into 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect of drainage network and infrastructures on GE the distance from rivers and roads was considered [14] and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers

Figure 2 Flowchart of research methodology

The tools used in present study are ArcGIS105 ENVI 48 SAGA-GIS 211 and a DGPS The basicmaps used were geological maps [33] at a scale of 1100000 topographic maps at a scale of 150000satellite images acquired by Landsat8 and ASTER GDEM with spatial resolution of 30 m [34]In this study based on literature review [242631] and local conditions of the study area twelve factorswere selected Elevation map was divided into six classes lt1200 m 1200ndashlt1350 m 1350ndashlt1450 m1450ndash1600 m and gt1600 m (Figure 3a) Slope degree affects surface runoff [35] soil erosion and patternof drainage density Slope degree map was classified into six classes [2426] lt5 5ndashlt10 10ndashlt1515ndashlt20 20ndashlt25 25ndash30 and gt30 (Figure 3b)

The aspect map was classified into nine classes (Figure 3c) Positive and negative values of plancurvatures define convexity and concavity of slope curvature whereas zero is flat surface The plancurvature map was divided into 3 categories Concave Flat and Convex The TWI indicator isimportant for identifying prone areas to GE [36] TWI is calculated by Equation (1)

TWI = ln(

Stan prop

)(1)

TWI map of study area is divided into four classes [242637] including lt5 5ndashlt75 75ndash11 and gt11(Figure 3e) The convergence index (CI) gives a measure of how flow in a cell diverges (convergence

Appl Sci 2018 8 1369 5 of 21

index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and dividedinto 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect ofdrainage network and infrastructures on GE the distance from rivers and roads was considered [14]and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers (Figure 3g)and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density toolin ArcGIS 105 was used for calculating drainage density and then its map was divided into fourcategories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scalewas used to prepare the lithological unit layer The lithological units were classified into ten categoriesbased on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1)The advantage of this method is it is easy to use however this method has certain disadvantages suchas the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age

Murmg Gypsiferous marl MioceneQft2 Low level piedment fan and vally terrace deposits QuaternaryKu Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceouslimestone with intercalations of calcareous shale (DALICHAI FM) Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryJl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-CretaceousE2c Conglomerate and sandstone EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryE1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identifiedin the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map wasalso produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025(Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were usedIf during modeling there is collinearity among the variables the accuracy of the modelrsquos predictiondecreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearityamong parameters [28]

Appl Sci 2018 8 x FOR PEER REVIEW 5 of 21

(Figure 3g) and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density tool in ArcGIS 105 was used for calculating drainage density and then its map was divided into four categories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scale was used to prepare the lithological unit layer The lithological units were classified into ten categories based on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1) The advantage of this method is it is easy to use however this method has certain disadvantages such as the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age Murmg Gypsiferous marl Miocene Qft2 Low level piedment fan and vally terrace deposits Quaternary Ku Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceous limestone with intercalations of calcareous shale (DALICHAI FM)

Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary Jl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-Cretaceous E2c Conglomerate and sandstone Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary E1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identified in the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map was also produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025 (Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were used If during modeling there is collinearity among the variables the accuracy of the modelrsquos prediction decreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearity among parameters [28]

Figure 3 Cont

Appl Sci 2018 8 1369 6 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 6 of 21

Figure 3 Cont

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 5: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 5 of 21

index in negative and positive values) [38] The CI map was prepared in SAGA-GIS 211 and dividedinto 3 classes lt0 0ndash10 and gt10 (Figure 3f) In this research for the computation of the effect ofdrainage network and infrastructures on GE the distance from rivers and roads was considered [14]and divided into four classes lt170 m 170ndashlt370 m 370ndash650 m and gt650 m for rivers (Figure 3g)and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density toolin ArcGIS 105 was used for calculating drainage density and then its map was divided into fourcategories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scalewas used to prepare the lithological unit layer The lithological units were classified into ten categoriesbased on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1)The advantage of this method is it is easy to use however this method has certain disadvantages suchas the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age

Murmg Gypsiferous marl MioceneQft2 Low level piedment fan and vally terrace deposits QuaternaryKu Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceouslimestone with intercalations of calcareous shale (DALICHAI FM) Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryJl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-CretaceousE2c Conglomerate and sandstone EocenePlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-QuaternaryE1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identifiedin the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map wasalso produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025(Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were usedIf during modeling there is collinearity among the variables the accuracy of the modelrsquos predictiondecreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearityamong parameters [28]

Appl Sci 2018 8 x FOR PEER REVIEW 5 of 21

(Figure 3g) and lt500 m 500ndashlt1500 m 1500ndash3000 m and gt3000 m for roads (Figure 3h) The line density tool in ArcGIS 105 was used for calculating drainage density and then its map was divided into four categories lt14 14ndashlt24 24ndash37 and gt37 kmkm2 (Figure 3i) A geological map at a 1100000 scale was used to prepare the lithological unit layer The lithological units were classified into ten categories based on their sensitivity to gully occurrence using expert knowledge method (Figure 3j and Table 1) The advantage of this method is it is easy to use however this method has certain disadvantages such as the possibility of a mistake by the expert

Table 1 Lithology of the study area

Code Lithology Geological Age Murmg Gypsiferous marl Miocene Qft2 Low level piedment fan and vally terrace deposits Quaternary Ku Upper cretaceous undifferentiated rocks Cretaceous

Jd Wellmdashbedded to thinmdashbedded greenishmdashgrey argillaceous limestone with intercalations of calcareous shale (DALICHAI FM)

Jurassic

PeEz Reef-type limestone and gypsiferous marl (ZIARAT FM) Paleocene-Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary Jl Light grey thinmdashbedded to massive limestone (LAR FM) Jurassic-Cretaceous E2c Conglomerate and sandstone Eocene PlQc Fluvial conglomerate Piedmont conglomerate and sandstone Pliocene-Quaternary E1c Pale-red polygenic conglomerate and sandstone Paleocene-Eocene

The LULC map was obtained using Landsat 8 images [39ndash41] The main LULC types identified in the study area were range irrigation farming and bare lands (Figure 3k) The NDVI map was also produced using Landsat 8 images and classified into 3 categories lt011 011ndash025 and gt025 (Figure 3l)

For multi-collinearity checking the tolerance (TOL) and variance inflation factor (VIF) were used If during modeling there is collinearity among the variables the accuracy of the modelrsquos prediction decreases Values of TOL and VIF were le01 and ge10 respectively indicating that multi-collinearity among parameters [28]

Figure 3 Cont

Appl Sci 2018 8 1369 6 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 6 of 21

Figure 3 Cont

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 6: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 6 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 6 of 21

Figure 3 Cont

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 7: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 7 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 7 of 21

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature (e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distance from road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effective factor classes through a statistical approach [42ndash51] In this method the spatial relationship between GE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) and negative (Wminus) weights This model computes the weight of each GEVs according to the existence or absence of the gully inventory [52] as follows W = ln p B|LP B|L (2)

W = ln P B|LP B|L (3) C = W +W (4) S(C) = S ( ) + S ( ) (5) S (W ) = 1N B cap L + 1B cap L (6)

S (W ) = 1B cap L + 1B cap L (7)

W = CS(C) (8)

where ln is the natural log function and P is the probability B and B indicate the presence and absence of the gully geo-environmental factor respectively L is the presence of gully and L is the absence of a gully W and W are positive and negative weights with W indicating that a geo-environmental factor is present in the gully inventory S (W ) is the variance of the W and S (W ) is the variance of W C indicates the overall association between GEVs and gully occurrence S (C) is the standard deviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RF algorithm by replacing and continuously changing the factors that affect the target leads to the creation of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of 3 user-defined parameters which include (1) The number of variables used in the

Figure 3 Gully erosion conditioning factors (a) Elevation (b) slope (c) aspect (d) plan curvature(e) TWI (f) convergence index (j) geology (h) distance from road (i) drainage density (g) distancefrom road (k) land useland cover (LULC) and (l) NDVI

23 Gully Erosion Modelling

231 WoE Model

WoE is according to the Bayesian probability framework to predict the significance of effectivefactor classes through a statistical approach [42ndash51] In this method the spatial relationship betweenGE areas and GEVs are identified The WoE model is based on the calculation of positive (W+) andnegative (Wminus) weights This model computes the weight of each GEVs according to the existence orabsence of the gully inventory [52] as follows

W+i = ln

(pB|L

P

B∣∣L

)(2)

Wminusi = ln

(P

B∣∣L

P

B∣∣L

)(3)

C = W+ + Wminus (4)

S(C) =radic

S2(W+) + S2(Wminus) (5)

S2(W+)=

1NBcap L +

1Bcap L (6)

S2(Wminus) = 1Bcap L

+1

Bcap L (7)

W =

(C

S(C)

)(8)

where ln is the natural log function and P is the probability B and B indicate the presence and absenceof the gully geo-environmental factor respectively L is the presence of gully and L is the absence ofa gully W+ and Wminus are positive and negative weights with W+ indicating that a geo-environmentalfactor is present in the gully inventory S2(W+

)is the variance of the W+ and S2(Wminus) is the variance

of Wminus C indicates the overall association between GEVs and gully occurrence S (C) is the standarddeviation of the contrast and W is final weight of each class factor

232 RF Model

RF is a controlled learning method that uses multiple trees in the classification [21] The RFalgorithm by replacing and continuously changing the factors that affect the target leads to the creation

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 8: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 8 of 21

of a large number of decision trees then all trees are combined to make decisions [21] The RF consists of3 user-defined parameters which include (1) The number of variables used in the construction of eachtree which expresses the power of each independent tree (2) number of trees in RF and (3) minimumnumber of nodes [43] RF prediction power increases with the increasing strength of independent treesand reducing the correlation between them [44] This algorithm uses 66 of the data to grow a treecalled Bootstrap and then a predictor variable is introduced randomly during the growing process tosplit a node in the tree construction The remaining 33 of the data is also used to evaluate the fittedtree [45] This process is repeated several times and the average of all predicted values is used as thefinal prediction of the algorithm In this model two factors including the mean decrease accuracy andmean decrease Gini are used to prioritize of each of the effective factors The use of the mean decreaseaccuracy in comparison to mean decrease Gini index is more effective in determining the priorityof effective factors especially in the context of the relationship between environmental factors [46]The RF analyses were carried out in R 331 using the ldquoRandomforestrdquo package [21]

233 BRT Model

BRT is one of several techniques that can help improve the performance of a single model bycombining multiple models [47] BRT uses two algorithms for modeling Boosting and regression [48]Boosting is a way to increase the accuracy of the model and based on this the constructioncombination and averaging of a large number of models are better and more accurate thanan individual model on its own [49] BRT overcomes the greatest weakness of the single decisiontree which is relatively weak in data processing In BRT only the first tree of all the training data isconstructed the next trees are grown on the remaining data from the tree before it trees are not builton all data and only use a number of data [50] The main idea in this method is to combine a set ofweak predictor models (high predictive error) to arrive at strong prediction (low predictive error) [51]Thus in this study BRT was used for GE spatial modeling using GMB (Generalized Boosted Models)and dismo (Species Distribution Modeling) packages in R 3313

234 MARS Model

The MARS model is a form of regression algorithm that was introduced by Friedman in 1991 topredict continuous numerical outputs [52] This technique generates flexible regression models forpredicting the target variable by means of dividing the problem space into intervals of input variablesand processing a basic function in each interval

The base function represents information in relation to one or more independent variables A basefunction is defined in a given interval in which the primary and end points are called knote The knoteis the key concept in this method and represents the point at which the behavior of the functionchanges at that point The base function expresses the relationship between the input variables and thetarget variable and is in the form of Max (0 X minus c) or Max (0 c minus X) in which c is threshold value andX is the impute variable The general form of the MARS model is as follows

f (x) = β0 +P

sumj=1

B

sumb=1

[β jb(+)Max

(0 xi minus Hbj

)+ β jb(minus) Max

(0 Hbj minus xJ

)](9)

where x = input f (x) = output P = predictor variables and B = basis function Max (0 x minus H) and Max(0 H minus x) are basis function and do not have to be present if their coefficients are 0 β0 is constantβ jb is the coefficient of the jth base function (BF) and the H values are called knots The MARSmodel includes three main steps (1) A forward stepwise algorithm to select certain spline basisfunctions (2) a backward stepwise algorithm to delete base functions (BFs) until the best set isfound and (3) a smoothing method which gives the final MARS approximation a certain degree ofcontinuity [52] First the MARS model estimates the value of the target function with a constant valueand then generates the best processing in the forward direction by searching among the variables

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 9: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 9 of 21

The search process continues as long as all possible (BFs) are added to the model At this stage a verycomplex model with a large number of knotes is obtained In the next step through the process ofpruning backward BFs that are less important are identified and deleted by using the generalizedcross-validation (GCV) criterion [27] GCV is a criterion for data fitting and eliminates a large numberof BFs and reduces the probability of overfitting This indicator is obtained by using Equation (10)

GCV =12

N

sumi=1

[yi minus f (xi)]2[

1minus C (B)N

]2(10)

where N is the number of data and C (B) is a complexity penalty that increases with the number of BFin the model and which is defined as

C (B) = (B = 1) + dB (11)

where d is a penalty for each BF included into the model This process continues until a completereview of all the basic functions and at the end of the optimal model is obtained by applying basefunctions [52] MARS model is an adaptive approach since the selection of BFs and node locations isbased on the data and type of purpose After determining the optimal MARS model the analysis ofvariance (ANOVA) method can be used to estimate the participation rate of each of the input variablesand BFs A detailed description of the MARS model can be found in [45] MARS was run with R 331and the ldquoEarthrdquo package [53]

24 Validation of GESMs Using Three Data Mining Models

A single criterion is not enough to select the best model among a large number of modelsand judging about choosing a superior model by one criteria It is not a suitable approach and itraises the chance of mistake in choosing the suitable model [273754] In this study to compare theperformance between data mining models and select the appropriate model AUC and SCAI wereused [283655] For calculating AUC different thresholds were considered from 0 to 1 and for eachthreshold the number of cells detected by the model as gully erosion was compared with observedgully erosion cells and positive and negative ratio indicators was calculated After calculating thesetwo indicators we arranged them in ascending order then they were plotted to calculate AUCThe AUC values range from 05 to 1 If a model cannot estimate the occurrence of an event better thana probable or random viewpoint its AUC is 05 and therefore it will have the least accuracy while ifthe AUC is equal to one the model will have the highest accuracy [5657] The quantitativendashqualitativerelationship between AUC value and prediction accuracy can be classified as follows 05ndash06 poor06ndash07 average 07ndash08 good 08ndash09 very good and 09ndash1 excellent SCAI is the ratio of thepercentage area of each of the zoning classes to the percentage of gullies occurring on each classBased on the SCAI indicator the values of SCAI in very high sensitivity class are lower than very lowsensitivity class

3 Results

31 Multi-Collinearity Analysis

Multicollinearity is a condition of very high inter-correlations or inter-associations among theindependent variables Therefore it is a type of disturbance in the data and if present in the datathe statistical conclusions of the data may not be reliable [27] A TOL value less than 01 or a VIF valuelarger than 10 indicates a high multicollinearity [56] The outcomes of the coherent analysis among the12 GEVs are shown in Table 2 The outcomes showed that the TOL and VIF of all GEVs were ge01 andle5 respectively As a result no multi-collinearity is seen among the GEVs

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 10: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 10 of 21

Table 2 Multi-collinearity of effective factors using tolerance (TOL) and variance inflation factor (VIF)

Conditioning FactorsCollinearity Statistics

Tolerance VIF

Constant Coefficient - -Slope degree 0998 1002

Distance from road 0672 1489Distance from river 0323 3094

Plan curvature 0674 1483Lithology 0945 1058

LU 0864 1158Drainage density 0826 1211

Elevation 0920 1087Convergence index 0666 1503

Aspect 0299 3343TWI 0942 1062

NDVI 0941 1063

32 Spatial Relationship Using WoE Model

The outcomes of WoE model are shown in Table 3 In elevation the results of WoE indicate thatthere is a direct correlation between classes of elevation and GE and with an increase in elevationGE also increases Therefore the class of gt1600 m with WoE 4795 had the greatest impact on gullyoccurrence The result of slope degree indicate that classes 5ndashlt10 with WoE 3496 had a strong relationwith GEIM For slope aspect NEndashfacing slopes with a value of 1946 show high probability of gullyoccurrence In the case of plan curvature among the three classes of concave flat and convexthe concave class had the highest value (7804) and thus a positive correlation with GE This result isin line with [1150] In TWI the class of gt11 has the strongest relationship with GE with the highestvalue (7804) In the case of the convergence index the class of 0ndash10 with values of 1318 has a positiverelation with gully occurrence With respect to distance from river class of gt650 with value of 2586and regarding distance from road the class of gt3000 m with values of 1625 had the greatest effect ongully occurrence For the drainage density factor the class of lt14 kmkm2 showed the highest value(1423) and thus high correlation with gully occurrence According to the lithology factor Gypsiferousmarl with greatest value (5123) is more prone to GE than other lithology units Concerning LULCmost gullies are located in the range land use type and this class with the highest value (2102) has thestrongest relationship with gully occurrence In NDVI results indicated that all gullies are located inthe class of lt011 showing that very low vegetation density renders slopes susceptible to GE

Table 3 Relationship between conditioning factors and gully erosion using weights-of-evidence(WoE) model

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

1

lt1200 144200 21 minus316 005 000 022 minus14411200ndashlt1350 348463 89 minus287 001 000 011 minus26601350ndashlt1450 230735 502 minus037 000 000 005 minus7521450ndash1600 133305 1057 033 000 000 000 000

gt1600 85376 1074 188 000 000 004 4795

2

lt5 705163 896 minus183 000 000 004 minus44905ndashlt10 171923 1259 134 000 000 004 349610ndashlt15 38854 397 138 000 000 005 253615ndashlt20 13936 121 113 001 000 009 121320ndashlt25 6223 50 103 002 000 014 72225ndash30 3396 15 042 007 000 026 162gt30 2584 5 minus041 020 000 045 minus092

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 11: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 11 of 21

Table 3 Cont

Factor Class Number of Pixelsin Domain Pixels of Gullies

Weights-of-Evidence (WoE)

C S2 (w+) S2 (wminus) S W

3

Flat 16770 2 minus322 050 000 071 minus455N 72345 208 minus001 000 000 007 minus019

NE 79383 209 minus011 000 000 007 minus152E 72794 43 minus166 002 000 015 minus1081

SE 91567 54 minus168 002 000 014 minus1224S 114731 246 minus034 000 000 007 minus513

SW 119263 396 015 000 000 005 281W 142533 459 012 000 000 005 235

NW 232693 1126 076 000 000 004 1946

4Concave 54613 1493 299 000 000 004 7804

Flat 574180 749 minus143 000 000 004 minus3333Convex 313286 501 minus080 000 000 005 minus1627

5

lt7 24272 63 000 000 000 002 minus0155ndashlt75 42453 91 minus032 001 000 011 minus30075ndash11 89328 225 minus016 000 000 007 minus229

gt11 786026 2364 021 000 000 006 387

6lt0 75370 26 minus221 004 000 020 minus1121

0ndash10 776920 2534 095 000 000 007 1318gt10 89789 183 minus039 001 000 008 minus508

7

lt170 382383 522 minus107 000 000 005 minus2199170ndashlt370 329586 835 minus021 000 000 004 minus499370ndash650 179671 914 075 000 000 004 1862

gt650 50444 472 131 000 000 005 2586

8

lt500 90285 0 minus010 000 000 002 minus529500ndashlt1500 102453 0 minus012 000 000 002 minus6051500ndash3000 113685 12 minus344 008 000 029 minus1191

gt3000 635661 2731 470 000 008 029 1625

9

lt14 277251 1150 055 000 000 004 142314ndashlt24 353215 1000 minus004 000 000 004 minus11224ndash37 231503 573 minus021 000 000 005 minus449

gt37 80115 20 minus254 005 000 022 minus1132

10

Murmg 144412 1544 197 000 000 004 5123Qft2 617176 417 minus236 000 000 005 minus4447Ku 23972 0 minus003 000 000 002 minus135Jd 18232 0 minus002 000 000 002 minus103

PeEz 1449 0 000 000 000 002 minus008PlQc 71058 600 124 000 000 005 2685

Jl 3274 0 000 000 000 002 minus018E2c 58380 174 003 001 000 008 033E1c 4820 8 minus056 013 000 035 minus160

11Range 708879 2669 248 000 001 012 2102

Farming 193682 33 minus306 003 000 018 minus1747Bare land 39523 41 minus106 002 000 016 minus675

12lt011 863198 2743 009 000 000 002 459

011ndash025 56745 0 minus006 000 000 002 minus326gt025 22140 0 minus002 000 000 002 minus125

1 Elevation 2 Slope degree 3 Slope aspect 4 Plan curvature 5 topographic wetness index (TWI) 6 Convergenceindex 7 Distance from river 8 Distance from road 9 Drainage density 10 Lithology 11 land use (LU) 12 NDVI

33 Applying RF Model

The outcomes of the confusion matrix for RF model are shown in Table 4 The result showsthat the model predicted 2487 non-gully pixels as non-gullies and 256 non-gullies as gully On theother hand the RF model predicted 2677 gullies as gullies and 66 gullies as non-gullies Moreoverthe out-of-bag error (OOB) for RF was 582 This means that the model has a precision of 9418which expresses the excellent accuracy of the model in predicting gully erosion

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 12: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 12 of 21

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error

0 2487 256 009331 66 2677 00240

Prioritization results of RF are shown in Table 5 and Figure 4 The results show that the distancefrom roads (38167 22) elevation (33506 19) and lithology (23421 14) had the highest valuesfollowed by slope degree drainage density distance from river NDVI convergence index slope aspectTWI plan curvature and LULC

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning Factors Weight

Distance from road 38167Elevation 33506Lithology 23421Slope degree 15385Drinage density 12672Distance from river 10684NDVI 10526Convergence index 7397Slope aspect 7241TWI 713Plan curvature 4243LU 2538

Appl Sci 2018 8 x FOR PEER REVIEW 12 of 21

susceptibility class Of the total area of GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class 567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195 (016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Table 4 Confusion matrix from the random forest (RF) model (0 = no gully 1 = gully)

0 1 Class Error 0 2487 256 00933 1 66 2677 00240

Table 5 Relative influence of effective conditioning factors in the RF model

Conditioning factors Weight Distance from road 38167 Elevation 33506 Lithology 23421 Slope degree 15385 Drinage density 12672 Distance from river 10684 NDVI 10526 Convergence index 7397 Slope aspect 7241 TWI 713 Plan curvature 4243 LU 2538

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVs in the study area The results of the model are shown in Figure 6 They indicate that the factors distance from roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroring the outcomes of the RF model followed by slope degree (7) drainage density (67) distance from river (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16) TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared in ArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The

Slope9Road

22

River6

Plan3

Lithology14

LU2

Drainage7

Elevation19

Convergence4

Aspect4

TWI4

NDVI6

Other18

Figure 4 Relative influence of effective conditioning factors in the random forest (RF) model

Finally the GESM by the RF model was prepared in ArcGIS 105 and divided into five classes fromvery low to very high (Figure 5a) using a natural break classification [8] According to the results ofthe entire study area (84787 km2) 52597 km2 (6203) are located in the very low susceptibility class14828 km2 (1749) in the low susceptibility 7942 km2 (937) in the moderate class 5634 km2 (664)in the high class and 3788 km2 (447) are located in the very high susceptibility class Of the total areaof GE (0729 km2) in the study area 086 (001 km2) are located in the very low susceptibility class

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 13: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 13 of 21

567 (004 km2) in the low susceptibility 1480 (011 km2) in the moderate susceptibility 2195(016 km2) in the high susceptibility and 5672 (041 km2) in the very high susceptibility classes

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariateadaptive regression spline (MARS) model

34 Applying BRT Model

The BRT model was used to reveal the spatial correlation between the existing GE and the GEVsin the study area The results of the model are shown in Figure 6 They indicate that the factors distancefrom roads (311) elevation (272) and lithology (11) had the highest importance on GE mirroringthe outcomes of the RF model followed by slope degree (7) drainage density (67) distance fromriver (51) slope aspect (38) convergence index (24) NDVI (22) plan curvature (16)TWI (16) and LULC (03) The gully susceptibility map by the BRT model was also prepared inArcGIS 105 and divided into five classes of very low to very high (Figure 6c) The results of the GEsusceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in thevery low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km23413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006(826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibilityclasses respectively

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 14: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 14 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 13 of 21

results of the GE susceptibility class by the BRT model covered 84787 km2 of the study area an area distribution in the very low low moderate high and very high susceptibility classes are 60537 km2 8838 km2 5201 km2 3413 km2 and 6798 km2 and percentage distribution in the susceptibility classes of are 7140 1042 613 403 and 802 respectively Of the actual GE area of 0729 km2 004 (555) 003 (456) 006 (826) 008 (1134) and 051 km2 (7028) are located in the very low to very high susceptibility classes respectively

Figure 5 Gully erosion susceptibility maps using (a) RF model (b) BRT model and (c) multivariate adaptive regression spline (MARS) model

Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model Figure 6 Relative influence of effective conditioning factors in boosted regression tree (BRT) model

35 Applying MARS Model

The optimal MARS model included 28 terms and the GCV was 0157 MARS model providesthe optimal model only by selecting the necessary parameters In this research nine GEVs includinglithology distance from road distance from river drainage density elevation aspect convergenceindex slope and NDVI were used to construct the optimal model from the 12 GEVs The GESM bythe MARS model was implemented in ArcGIS 105 using Equation (12) According to Equation (12)distance from roads elevation and lithology were the most important variables Values of GESMby MARS model varies from minus98 to 73 At first GESM classified using quantile equal intervalnatural break and geometrical interval classification techniques then by comparatively analyses ofthe distribution of training and validation gullies in high and very high classes the natural breakclassification technique was most accurate As a result GESM by MARS were classified into very low(minus986ndashminus624) low (minus624ndashminus231) moderate (minus23ndash004) high (004ndash038) and very high (038ndash732)gully erosion susceptibility zones by natural break classification technique (Figure 5c) The resultsindicate that 002 km2 (210) of GE in the study area are located in the very low susceptibilityclass with 33901 km2 (3998 of total study area) and 058 km2 (7916) located in the very highsusceptibility class with 10550 km2 (058) (Table 6) In general the results indicate that for all threemodels with increasing susceptibility (from very low to very high) the area of the respective classesdecreased while in contrast the areas of GE increased These results is in line with Youssef et al (2015)

Table 6 Area under the curve (AUC) values of RF MARS and BRT data mining models

Models AUC Standard ErrorAsymptoticSignificant

Asymptotic 95 ConfidenceInterval

Lower Bound Upper Bound

RF 0927 0007 0000 0914 0941MARS 0911 0008 0000 0896 0926

BRT 0919 0007 0000 0905 0933

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 15: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 15 of 21

36 Validation of Models

The results of the validation of the models using the AUC curve and SCAI indicator are shown inFigure 7 and in Tables 6 and 7 The results show that the values of the AUC for the three models varyfrom 0911 to 0927 indicating very good prediction accuracy for all models with RF resulting in thehighest value In addition the SCAI values for the three models RF (6108ndash000) MARS (1045ndash003)BRT (1259ndash001) show that the RF model has higher SCAI values compared to the other models inthe very low low and very high susceptibility classes (Figure 7) In spite of the high efficiency andaccuracy of the RF model for GE sensitivity mapping so far this model has not been used by theresearch community

GESPMARS = 074 + (0659times Lithology1) + (0656times Lithology7)minus 00001timesmax(0 13445minus Distance f rom road) + 00001timesmax(0 Distance f rom roadminus 13445)minus 00002timesmax(0 290797minus Distance f rom River)minus 0087timesmax(0 2377minus Drainage density)minus 0106timesmax(0 Drainage densityminus 2377) + 0001timesmax(0 1793minusElevation)minus 0002timesmax(0 Elevationminus 1793)minus 0605timesLithology7times Aspect4minus 00001timesmax(0 735532minusDistance f rom road)times Lithology1minus 00001timesmax(0 112492minus Distance f rom road)times Lithology7minus000002timesmax(0 Distance f rom roadminus 112492)timesLithology7 + 00001timesmax(0 13445minusDistance f rom road)times Lithology10minus 0005timesmax(0 84853minus Distance f rom River)times Lithology1minus00003timesmax(0 Distance f rom Riverminus 84853)timesLithology1 + 0001times Lithology2timesmax(0 Elevationminus1249)minus 0001times Lithology2timesmax(0 1249minus Elevation)minus0019times Lithology7timesmax(0 0772minus Convergence)minus 2354timesLithology7timesmax(0 NDVI minus 0055)minus 2223times Lithology7timesmax(0 0055minus NDVI)minus 000001timesmax(0 765minus Slope)timesmax(0 Distance f rom roadminus 13445) + 000001timesmax(0 Slopeminus 765)timesmax(0 Distance f rom roadminus 13445)minus00001timesmax(0 8186minus Slope)timesmax(0 1793 minus Elevation)+000004timesmax(0 Slopeminus 819)timesmax(0 1793minus Elevation)minus00000001timesmax(0 Distance f rom roadminus 387778)timesmax(0 90797minus Distance f rom River) + 000000004timesmax(0 886103minus Distance f rom road)timesmax(0 90797minusDistance f rom River) + 00000001timesmax(0 Roadminus886103)timesmax(0 90797minus Distance f rom River)minus 0001timesmax(0 Distance f rom roadminus 13445)timesmax(0 Drainage densityminus 1821)minus 0000001timesmax(0 Distance f rom roadminus 114354)timesmax(0 1793minusElevation)minus 0000001timesmax(0 Distance f rom roadminus132383)timesmax(0 1793minus Elevation)

(12)

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 16: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 16 of 21

Appl Sci 2018 8 x FOR PEER REVIEW 15 of 21

= 074 +(0659 times ℎ 1) +(0656 times ℎ 7) minus 00001times (0 13445 minus ) + 00001times (0 minus 13445) minus 00002times (0 290797 minus ) minus 0087times (0 2377 minus ) minus 0106times (0 minus 2377) + 0001 times (0 1793minus ) minus 0002 times (0 minus 1793) minus 0605times ℎ 7 times 4ndash 00001 times (0 735532minus ) times ℎ 1 minus 00001times (0 112492 minus ) times ℎ 7minus 000002 times (0 minus 112492) times ℎ 7 + 00001 times (0 13445minus ) times ℎ 10 minus 0005times (0 84853 minus ) times ℎ 1minus 00003 times (0 minus 84853) times ℎ 1 + 0001 times ℎ 2 times (0 minus 1249) minus 0001 times ℎ 2 times (0 1249 minus ) minus 0019 times ℎ 7 times (0 0772 minus ) minus 2354times ℎ 7 times (0 minus 0055) minus 2223 times ℎ 7times (0 0055 minus ) minus 000001 times (0 765 minus ) times (0 minus 13445) + 000001times (0 minus 765) times (0 minus 13445) minus 00001 times (0 8186 minus )times (0 1793 minus ) + 000004 times (0 minus 819) times (0 1793 minus ) minus 00000001 times (0 minus 387778) times (0 90797 minus ) + 000000004times (0 886103 minus ) times (0 90797minus ) + 00000001 times (0 minus 886103) times (0 90797 minus ) minus 0001times (0 minus 13445) times (0 minus 1821) minus 0000001times (0 minus 114354) times (0 1793minus ) minus 0000001 times (0 minus 132383) times (0 1793 minus )

(12)

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRT data mining models

000

1000

2000

3000

4000

5000

6000

7000

Very Low Low Moderate High Very High

SCA

I

Susceptibility classes

RF

MARS

BRT

Figure 7 Seed cell area index (SCAI) values for different susceptibility classes in RF MARS and BRTdata mining models

Table 7 Seed cell area index (SCAI) values in RF multivariate adaptive regression spline (MARS) andboosted regression tree (BRT) data mining models

Model SusceptibilityClasses

Total Area of Classes Gully in Classes No GullyArea (km)

SeedCell () SCAI

Area (km) Area (km)

RF

Very Low 52597 6203 001 086 52596 001 6108Low 14828 1749 004 567 14824 024 074

Moderate 7942 937 011 1480 7931 115 008High 5634 664 016 2195 5618 241 003

Very High 3788 447 041 5672 3746 927 000

MARS

Very Low 33901 3998 002 210 33900 004 1045Low 19483 2298 001 148 19482 005 489

Moderate 13117 1547 004 567 13113 027 058High 7735 912 008 1159 7726 093 010

Very High 10550 1244 058 7916 10492 464 003

BRT

Very Low 60537 7140 004 555 60533 006 1259Low 8838 1042 003 456 8834 032 033

Moderate 5201 613 006 826 5195 098 006High 3413 403 008 1134 3405 206 002

Very High 6798 802 051 7028 6746 640 001

4 Discussion

Determining effective parameters in GE and providing a GESM are the first steps in riskmanagement In regards to this prediction of areas susceptible to erosion is associated with uncertaintyvarious models can be used to predict it accurately Over the past decades numerous statistical andempirical models have been developed to predict environmental hazards such as GE by variousresearchers around the world [121428303145] Due to some of the limitations of the aforementionedmodels such as time consuming complexity costly and need a lot of data in recent years datamining methods have been presented Data mining is a process of discovery of relationships patternsand trends that consider the vast amount of information stored in databases with template recognitiontechnology [515859] The most important applications of data mining are categorization estimationforecasting group dependency clustering and descriptions The results of data mining models showthat in RF BRT and MARS mode distance from roads had the highest impact in the occurrenceof gully erosion in the study area This result is in line with [1049] If the engineering measures

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 17: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 17 of 21

are not considered in site selection and construction of roads as anthropogenic structures in naturethey can act as a causative factor in environmental hazards such as landslide and gully erosionThe construction of roads in bare lands with erosion-sensitive formations has led to the expansionof gully erosion in the study area so that the construction of a road without proper culverts causesdisrupted of natural drainage and runoff concentrations thus eroding the bare lands and resultingin the formation of a gully The results of the validation of data mining models showed that the RFmodel more accurately predicted areas that are sensitive to gully erosion These results are consistentwith the results of [36434659] which introduced the RF model as a strong and high-performancemodel One of the most widely used data mining methods is the RF model The advantages of theRF method over other models is that this model can apply several input factors without eliminatingany factors and return a very small set of categories that support high prediction accuracy [6]The classification accuracy of this model is affected by many factors such as the number scale typeand precision of input data Thus in the processing the use of all suitable factors causes the accuracyof the model to increase Compared with other models RF has higher sufficiency to apply a veryhigh number of datasets [6] The RF model has the potential as a tool of spatial model for assessingenvironmental issues and environmental hazards The RF model combines several tree algorithms togenerate a repeated prediction of each phenomenon This method can learn complicated patterns andconsider the nonlinear relationship between explanatory variables and dependent variables It canalso incorporate and combine different types of data in the analysis due to the lack of distributionof assumptions about the data used This model can use and apply thousands of input variableswithout deleting one of them This method is less sensitive to artificial neural networks in case ofnoise data and can better estimate the parameters [60] The greatest advantages of RF model arehigh predictive accuracy the ability to learn nonlinear relationships the ability to determine theimportant variables in prediction its nonparametric nature and in dealing with distorted data itworks better than other algorithms for categorization The main disadvantages of this algorithminclude high memory occupation hard and time-consuming in implementation for large datasetshigh cost of pruning high number of end nodes in case of overlap and the accumulation of layersof errors in the case of the tree growing [1561] stated that the CART BRT and RF models showedbetter accuracy compared to bivariate and multivariate methods Pourghasemi et al concluded thatthe RF and maximum entropy (ME) models have high performance and precision in modeling [31]Mojaddadi et al showed that BRT CART and RF methods are suitable for modelling [55] Chen et alindicates that the MARS and RF models are good estimators for mapping [36] Lai et al indicated thatthe RF model has significant potential for weight determination on landslide modelling [62] Kuhnertet al stated that RF with AUC = 970 is suitable for landslide susceptibility [27] Lee et al stated that theprediction accuracy of RF model is high (908) and that this model had a high capability for landslideprediction [43] They applied RF and boosted-tree models for spatial prediction of flood susceptibilityin Seoul metropolitan city Korea [43] They stated that the RF model has better performance comparedto boosted-tree As a scientific achievement the methodology framework used in this research hasshown that the proper selection of effective variables in gully erosion along with the use of moderndata mining models and Geography Information System (GIS) technique are able to successfullyidentify areas susceptible to gully erosion The susceptibility map prepared using this methodology isa suitable tool for sustainable planning to protect the land against gully erosion processes Thereforethis methodology can be used to assess gully erosion in other similar areas especially in arid andsemi-arid regions

5 Conclusions

GE is one of the main processes causing soil degradation and there is a need to improve methodsto predict susceptible areas and responsible environmental factors to allow early intervention toprevent limit or reverse gully formation The utility of three data mining models RF BRT and MARSto predict GE in the Shahroud watershed Iran was assessed For this purpose twelve causative

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 18: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 18 of 21

factors and 121 gully locations (70) are used for applying the models In addition 51 gully locations(30) are used for validation of models The correlation between GE and conditioning factor classeswas researched with a WoE Bayes theory Distance to roads elevation and lithology were the keyfactors Validation of the models showed that all three models have high accuracy for GE mappingData miningmachine learning methods have a unique ability and accuracy for GESM The resultsalso showed that the southwestern part of the study region has a high susceptibility to GE

Therefore it is recommended that the following suggestions should be made to prevent andreduce soil erosion and its subsequent risks in the Sharoud watershed (1) Control of gullies byrestoration of vegetation adaptable with the natural conditions of the area (2) gully controlling bybuilding dams that could prevent soil erosion by slowing down the flow of water and aggravationof sedimentation (3) awareness of farmers by environmental officials of the region in terms of thetype and principles of proper cultivation and prevention of overgrazing and destruction of vegetation(4) correction of land use based on natural ability and restrictions related to geomorphologic andphysiographic soil characteristics of the area

Author Contributions Conceptualization AA BP and HRP Data curation AA Formal analysis AA andHRP Investigation AA BP and HRP Methodology BP AA HRP and KR Resources BP and AASoftware HRP and AA Supervision BP NK and HRP Validation HRP and AA Writingndashoriginal draftAA Writingndashreview and editing BP AA HRP NK and KR

Funding This research was supported by the UTS under grant number 3217402232335 and 3217402232357

Conflicts of Interest The authors declare no conflict of interest

References

1 Magliulo P Assessing the susceptibility to water-induced soil erosion using a geomorphological bivariatestatistics-based approach Environ Earth Sci 2012 67 1801ndash1820 [CrossRef]

2 UNEP The Emissions Gap Report United Nations Environment Programme (UNEP) Nairobi 2017Available online wwwunenvironmentorgresourcesemissions-gap-report (accessed on 13 January 2018)

3 Haregeweyn N Tsunekawa A Poesen J Tsubo M Meshesha DT Fenta AA Nyssen J Adgo EComprehensive assessment of soil erosion risk for better land use planning in river basins Case study of theUpper Blue Nile River Sci Total Environ 2017 574 95ndash108 [CrossRef] [PubMed]

4 Nampak H Pradhan B Mojaddadi Rizeei H Park H-J Assessment of Land Cover and Land UseChange Impact on Soil Loss in a Tropical Catchment by Using Multi-Temporal SPOT-5 Satellite Images andRUSLE model Land Degrad Dev 2018 [CrossRef]

5 Rizeei HM Saharkhiz MA Pradhan B Ahmad N Soil erosion prediction based on land cover dynamicsat the Semenyih watershed in Malaysia using LTM and USLE models Geocarto Int 2016 31 1158ndash1177[CrossRef]

6 Zhang X Fan J Liu Q Xiong D The contribution of gully erosion to total sediment production in a smallwatershed in Southwest China Phys Geogr 2018 39 246ndash263 [CrossRef]

7 Mojaddadi H Habibnejad M Solaimani K Ahmadi M Hadian-Amri M An Investigation of Efficiencyof Outlet Runoff Assessment J Appl Sci 2009 9 105ndash112

8 Zabihi M Mirchooli F Motevalli A Darvishan AK Pourghasemi HR Zakeri MA Sadighi F Spatialmodelling of gully erosion in Mazandaran Province northern Iran Catena 2018 161 1ndash13 [CrossRef]

9 Kirkby M Bracken L Gully processes and gully dynamics Earth Surf Process Landf J Br GeomorpholRes Group 2009 34 1841ndash1851 [CrossRef]

10 Torri D Poesen J Borselli L Bryan R Rossi M Spatial variation of bed roughness in eroding rillsand gullies Catena 2012 90 76ndash86 [CrossRef]

11 Mccloskey G Wasson R Boggs G Douglas M Timing and causes of gully erosion in the riparian zone ofthe semi-arid tropical Victoria River Australia Management implications Geomorphology 2016 266 96ndash104[CrossRef]

12 Rahmati O Tahmasebipour N Haghizadeh A Pourghasemi HR Feizizadeh B Evaluating the influenceof geo-environmental factors on gully erosion in a semi-arid region of Iran An integrated frameworkSci Total Environ 2017 579 913ndash927 [CrossRef] [PubMed]

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 19: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 19 of 21

13 Dube F Nhapi I Murwira A Gumindoga W Goldin J Mashauri D Potential of weight of evidencemodelling for gully erosion hazard assessment in Mbire DistrictndashZimbabwe Phys Chem Earth Part ABC2014 67 145ndash152 [CrossRef]

14 Zakerinejad R Maerker M An integrated assessment of soil erosion dynamics with special emphasis ongully erosion in the Mazayjan basin southwestern Iran Nat Hazards 2015 79 25ndash50 [CrossRef]

15 Pham TG Degener J Kappas M Integrated universal soil loss equation (USLE) and GeographicalInformation System (GIS) for soil erosion estimation in A Sap basin Central Vietnam Int Soil WaterConserv Res 2018 6 99ndash110 [CrossRef]

16 Pournader M Ahmadi H Feiznia S Karimi H Peirovan HR Spatial prediction of soil erosionsusceptibility An evaluation of the maximum entropy model Earth Sci Inform 2018 11 389ndash401 [CrossRef]

17 Althuwaynee OF Pradhan B Park H-J Lee JH A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logistic regressionfor landslide susceptibility mapping Catena 2014 114 21ndash36 [CrossRef]

18 Morgan R Quinton J Smith R Govers G Poesen J Auerswald K Chisci G Torri D Styczen MThe European Soil Erosion Model (EUROSEM) A dynamic approach for predicting sediment transportfrom fields and small catchments Earth Surf Process Landf J Br Geomorphol Res Group 1998 23 527ndash544[CrossRef]

19 Barber M Mahler R Ephemeral gully erosion from agricultural regions in the Pacific Northwest USAAnn Wars Univ Life Sci-SGGW Land Reclam 2010 42 23ndash29 [CrossRef]

20 Leonard R Knisel W Still D GLEAMS Groundwater loading effects of agricultural management systemsTrans ASAE 1987 30 1403ndash1418 [CrossRef]

21 Liaw A Breiman WM Cutlerrsquos Random Forests for Classification and Regression Available onlinehttpswwwrdocumentationorgpackagesrandomForest (accessed on 1 April 2018)

22 Akguumln A Tuumlrk N Mapping erosion susceptibility by a multivariate statistical method A case study fromthe Ayvalık region NW Turkey Comput Geosci 2011 37 1515ndash1524 [CrossRef]

23 Conoscenti C Angileri S Cappadonia C Rotigliano E Agnesi V Maumlrker M Gully erosion susceptibilityassessment by means of GIS-based logistic regression A case of Sicily (Italy) Geomorphology 2014 204399ndash411 [CrossRef]

24 Conforti M Aucelli PP Robustelli G Scarciglia F Geomorphology and GIS analysis for mapping gullyerosion susceptibility in the Turbolo stream catchment (Northern Calabria Italy) Nat Hazards 2011 56881ndash898 [CrossRef]

25 Lucagrave F Conforti M Robustelli G Comparison of GIS-based gullying susceptibility mapping usingbivariate and multivariate statistics Northern Calabria South Italy Geomorphology 2011 134 297ndash308[CrossRef]

26 Meyer A Martınez-Casasnovas J Prediction of existing gully erosion in vineyard parcels of the NE SpainA logistic modelling approach Soil Tillage Res 1999 50 319ndash331 [CrossRef]

27 Kuhnert PM Henderson AK Bartley R Herr A Incorporating uncertainty in gully erosion calculationsusing the random forests modelling approach Environmetrics 2010 21 493ndash509 [CrossRef]

28 Rahmati O Haghizadeh A Pourghasemi HR Noormohamadi F Gully erosion susceptibility mappingThe role of GIS-based bivariate statistical models and their comparison Nat Hazards 2016 82 1231ndash1258[CrossRef]

29 Svoray T Michailov E Cohen A Rokah L Sturm A Predicting gully initiation Comparing data miningtechniques analytical hierarchy processes and the topographic threshold Earth Surf Proc Land 2012 37607ndash619 [CrossRef]

30 Zakerinejad R Maumlrker M Prediction of Gully erosion susceptibilities using detailed terrain analysis andmaximum entropy modeling A case study in the Mazayejan Plain Southwest Iran Geogr Fis Din Quat2014 37 67ndash76

31 Pourghasemi HR Yousefi S Kornejady A Cerdagrave A Performance assessment of individual and ensembledata-mining techniques for gully erosion modeling Sci Total Environ 2017 609 764ndash775 [CrossRef][PubMed]

32 IR of Iran Meteorological Organization 2012 Available online httpwwwmazandaranmetir (accessedon 12 October 2017)

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 20: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 20 of 21

33 Geological Survey Department of Iran (GSDI) 2012 Available online httpwwwmazandaranmetir(accessed on 12 October 2017)

34 Althuwaynee OF Pradhan B Lee S Application of an evidential belief function model in landslidesusceptibility mapping Comput Geosci 2012 44 120ndash135 [CrossRef]

35 Rizeei HM Pradhan B Saharkhiz MA Surface runoff prediction regarding LULC and climate dynamicsusing coupled LTM optimized ARIMA and GIS-based SCS-CN models in tropical region Arab J Geosci2018 11 53 [CrossRef]

36 Chen W Xie X Wang J Pradhan B Hong H Bui DT Duan Z Ma J A comparative study of logisticmodel tree random forest and classification and regression tree models for spatial prediction of landslidesusceptibility Catena 2017 151 147ndash160 [CrossRef]

37 Tehrany MS Pradhan B Mansor S Ahmad N Flood susceptibility assessment using GIS-based supportvector machine model with different kernel types Catena 2015 125 91ndash101 [CrossRef]

38 Claps P Fiorentino M Oliveto G Informational entropy of fractal river networks J Hydrol 1996 187145ndash156 [CrossRef]

39 Aal-Shamkhi ADS Mojaddadi H Pradhan B Abdullahi S Extraction and modeling of urban sprawldevelopment in Karbala City using VHR satellite imagery In Spatial Modeling and Assessment of Urban FormSpringer Berlin Germany 2017 pp 281ndash296

40 Abdullahi S Pradhan B Mojaddadi H City compactness Assessing the influence of the growth ofresidential land use J Urban Technol 2018 25 21ndash46 [CrossRef]

41 Rizeei HM Shafri HZ Mohamoud MA Pradhan B Kalantar B Oil palm counting and age estimationfrom WorldView-3 imagery and LiDAR data using an integrated OBIA height model and regression analysisJ Sensors 2018 2018 2536327 [CrossRef]

42 Xie Z Chen G Meng X Zhang Y Qiao L Tan L A comparative study of landslide susceptibilitymapping using weight of evidence logistic regression and support vector machine and evaluated bySBAS-InSAR monitoring Zhouqu to Wudu segment in Bailong River Basin China Environ Earth Sci 201776 313 [CrossRef]

43 Lee S Kim J-C Jung H-S Lee MJ Lee S Spatial prediction of flood susceptibility using random-forestand boosted-tree models in Seoul metropolitan city Korea Geomat Nat Hazards Risk 2017 8 1185ndash1203[CrossRef]

44 Cutler DR Edwards Jr TC Beard KH Cutler A Hess KT Gibson J Lawler JJ Random forests forclassification in ecology Ecology 2007 88 2783ndash2792 [CrossRef] [PubMed]

45 Simpson GL Birks HJB Statistical learning in palaeolimnology In Tracking Environmental Change UsingLake Sediments Springer Berlin Germany 2012 pp 249ndash327

46 Nicodemus KK Letter to the editor On the stability and ranking of predictors from random forest variableimportance measures Brief Bioinform 2011 12 369ndash373 [CrossRef] [PubMed]

47 Bui DT Bui Q-T Nguyen Q-P Pradhan B Nampak H Trinh PT A hybrid artificial intelligenceapproach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest firesusceptibility modeling at a tropical area Agr For Meteorol 2017 233 32ndash44

48 Elith J Leathwick JR Hastie T A working guide to boosted regression trees J Anim Ecol 2008 77802ndash813 [CrossRef] [PubMed]

49 Regmi AD Devkota KC Yoshida K Pradhan B Pourghasemi HR Kumamoto T Akgun AApplication of frequency ratio statistical index and weights-of-evidence models and their comparison inlandslide susceptibility mapping in Central Nepal Himalaya Arab J Geosci 2014 7 725ndash742 [CrossRef]

50 Aertsen W Kint V Van Orshoven J Oumlzkan K Muys B Comparison and ranking of different modellingtechniques for prediction of site index in Mediterranean mountain forests Ecol Model 2010 221 1119ndash1130[CrossRef]

51 Krishnaiah V Narsimha G Chandra NS Heart disease prediction system using data mining techniquesand intelligent fuzzy approach A review Heart Dis 2016 136 43ndash51 [CrossRef]

52 Oh H-J Pradhan B Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallowlandslides in a tropical hilly area Comput Geosci 2011 37 1264ndash1276 [CrossRef]

53 Torgo L Data Mining with R Learning with Case Studies Chapman and HallCRC Boca Raton FL USA 2016

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References
Page 21: Spatial Modelling of Gully Erosion Using GIS and R ...research.utwente.nl/files/76581837/Arabameri2018spatial.pdf · Susceptibility maps of GE are essential for conservation of natural

Appl Sci 2018 8 1369 21 of 21

54 Umar Z Pradhan B Ahmad A Jebur MN Tehrany MS Earthquake induced landslide susceptibilitymapping using an integrated ensemble frequency ratio and logistic regression models in West SumateraProvince Indonesia Catena 2014 118 124ndash135 [CrossRef]

55 Mojaddadi H Pradhan B Nampak H Ahmad N Ghazali AHB Ensemble machine-learning-basedgeospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS Geomat NatHazards Risk 2017 8 1080ndash1102 [CrossRef]

56 Pourghasemi HR Beheshtirad M Pradhan B A comparative assessment of prediction capabilities ofmodified analytical hierarchy process (M-AHP) and Mamdani fuzzy logic models using Netcad-GIS forforest fire susceptibility mapping Geomat Nat Hazards Risk 2016 7 861ndash885 [CrossRef]

57 Hong H Naghibi SA Dashtpagerdi MM Pourghasemi HR Chen W A comparative assessment betweenlinear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidencemodels for forest fire susceptibility mapping in China Arab J Geosci 2017 10 167 [CrossRef]

58 Mezaal MR Pradhan B Shafri H Mojaddadi H Yusoff Z Optimized Hierarchical Rule-BasedClassification for Differentiating Shallow and Deep-Seated Landslide Using High-Resolution LiDAR DataIn Global Civil Engineering Conference Springer Berlin Germany 2017

59 Rizeei HM Pradhan B Saharkhiz MA An integrated fluvial and flash pluvial model using 2Dhigh-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS ComplexIntell Syst 2018 1ndash20 [CrossRef]

60 Kantardzic M Data mining Concepts Models Methods and Algorithms John Wiley amp Sons Hoboken NJUSA 2011

61 Pham BT Pradhan B Bui DT Prakash I Dholakia M A comparative study of different machinelearning methods for landslide susceptibility assessment A case study of Uttarakhand area (India) EnvironModel Softw 2016 84 240ndash250 [CrossRef]

62 Lai C Chen X Wang Z Xu C-Y Yang B Rainfall-induced landslide susceptibility assessment usingrandom forest weight at basin scale Hydrol Res 2017 nh2017044 [CrossRef]

copy 2018 by the authors Licensee MDPI Basel Switzerland This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (httpcreativecommonsorglicensesby40)

  • Introduction
  • Materials and Methods
    • Study Area
    • Data and Method
    • Gully Erosion Modelling
      • WoE Model
      • RF Model
      • BRT Model
      • MARS Model
        • Validation of GESMs Using Three Data Mining Models
          • Results
            • Multi-Collinearity Analysis
            • Spatial Relationship Using WoE Model
            • Applying RF Model
            • Applying BRT Model
            • Applying MARS Model
            • Validation of Models
              • Discussion
              • Conclusions
              • References