How to use Logistic Regression in GIS using ArcGIS and R statistics

Logistic Regression in GIS using R environment Omar F. Althuwaynee, PhD Geomatics Engineering


Page 1: How to use Logistic Regression in GIS using ArcGIS and R statistics

Logistic Regression in GIS using R environment

Omar F. Althuwaynee, PhD in Geomatics Engineering

Page 2: How to use Logistic Regression in GIS using ArcGIS and R statistics

Course preparations

Before starting, go through the following videos on data preparation, software, and more information about R:

1. Prediction (susceptibility mapping) in GIS (Part 1): Using Modified Frequency Ratio
2. Using R as GIS tools: here is my own learning experience (Week 1 - Part 1)
3. Using R as GIS tools: here is my own learning experience (Week 1 - Part 2)


Page 3: How to use Logistic Regression in GIS using ArcGIS and R statistics

Course objectives

1. Evaluate and compare the results of applying the multivariate logistic regression method to produce a susceptibility map, using GIS and the R environment.


Page 4: How to use Logistic Regression in GIS using ArcGIS and R statistics

By the end of these sessions, you will be able to:

1. Create dichotomous (0, 1) training and testing data.
2. Effectively set up your project environment and install the required packages in R.
3. Prepare spatial data in the R environment.
4. Learn basic operations with spatial data in R.
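As a rough illustration of the first item, dichotomous data can be built by labelling presence points 1 and absence points 0, then splitting the labelled records into training and testing sets. The course itself works in R; this sketch uses Python only for illustration, and the point IDs, the 70/30 split ratio, and the function name `make_dichotomous_split` are all assumptions, not taken from the course.

```python
import random

def make_dichotomous_split(presence_ids, absence_ids, train_fraction=0.7, seed=42):
    """Label presence as 1 and absence as 0, then split into train/test lists."""
    records = [(pid, 1) for pid in presence_ids] + [(aid, 0) for aid in absence_ids]
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    rng.shuffle(records)
    n_train = int(round(train_fraction * len(records)))
    return records[:n_train], records[n_train:]

# Hypothetical point IDs, for illustration only
presence = [f"slide_{i}" for i in range(10)]
absence = [f"stable_{i}" for i in range(10)]

train, test = make_dichotomous_split(presence, absence)
print(len(train), "training records,", len(test), "testing records")
```

Keeping the seed fixed means the same split can be reproduced when the model is re-run.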


Page 5: How to use Logistic Regression in GIS using ArcGIS and R statistics

By the end of these sessions, you will be able to (continued):

5. Develop cartographic mapping skills in R, such as resampling and clipping raster data.
6. Run statistical analysis using binary logistic regression.
7. Run statistical tests and produce output reports.
8. Run accuracy and validation tests using the AUC of the ROC curve.
9. Produce and export the resulting maps.
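To make the AUC in item 8 more concrete: the area under the ROC curve equals the probability that a randomly chosen positive (1) observation receives a higher predicted score than a randomly chosen negative (0) one. The sketch below computes it that way in plain Python. The labels and scores are made up, and `roc_auc` is a hypothetical helper, not a function from the course material.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative observation")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up validation labels and predicted probabilities
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.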


Page 6: How to use Logistic Regression in GIS using ArcGIS and R statistics

Logistic regression (LR)

• LR is a method for fitting a regression curve, y = f(x).
• LR is part of a larger class of algorithms known as generalized linear models (GLMs).
• Dependent factor y
  – Binomial logistic regression: the dependent factor (y) has 2 values (either 0 or 1), e.g. landslides or pollutants (0 = does not exist, 1 = exists).
  – Multinomial logistic regression: the dependent variable has more than 2 values (e.g. classifying fruit as "Ripe", "Over-ripe", or "Under-ripe").
• Independent factors x
  – A set of predictors x (slope, elevation, land use, etc.).
  – The predictors (x) can be continuous, categorical, or a mix of both.


Page 7: How to use Logistic Regression in GIS using ArcGIS and R statistics

Current Application

• Goal: predict the probability that a landslide will occur (y) at a particular place.

Data:
• Dependent factor Y (landslide training data locations): 75 observations.
• Independent factors X (elevation, slope, NDVI, curvature).
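The sketch below shows, under stated assumptions, what fitting such a model involves: a 0/1 response, one or more predictors, and coefficients estimated by minimizing the log-loss. It is not the course's workflow, which runs in R. It is a hand-rolled gradient descent in plain Python, and the nine data points (a single standardized slope predictor with 0/1 labels) are invented for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Estimate [intercept, coefficients...] by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)                 # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = p - yi                   # gradient of the log-loss w.r.t. the linear predictor
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, x):
    """Predicted probability of occurrence for one observation."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

# Made-up standardized slope values and 0/1 landslide labels (illustration only)
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logistic(X, y)
p_gentle = predict(beta, [-1.5])
p_steep = predict(beta, [1.5])
print(beta, p_gentle, p_steep)
```

With more predictors (elevation, NDVI, curvature), each row of `X` would simply carry more values; standardizing them first helps gradient descent converge.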


Page 8: How to use Logistic Regression in GIS using ArcGIS and R statistics

Equations

The model predicts the probability of occurrence by fitting the data to a logit function:

$$g(y) = \beta_0 + \beta_1(\text{Elevation}) + \beta_2(\text{Slope}) + \beta_3(\text{NDVI}) + \beta_4(\text{Curvature}) \quad (a)$$

where g() is the link function.

We first denote g() by p:

• A probability must always be positive (never negative), so the linear equation is put in exponential form. For any values of the coefficients and the predictors, the exponential is never negative.

$$p = \exp(\beta_0 + \beta_1(\text{Elevation}) + \dots) = e^{\beta_0 + \beta_1(\text{Elevation}) + \dots} \quad (b)$$

• To make the probability less than 1, we must divide p by a number greater than p.

$$p = \frac{\exp(\beta_0 + \beta_1(\text{Elevation}) + \dots)}{\exp(\beta_0 + \beta_1(\text{Elevation}) + \dots) + 1} = \frac{e^{\beta_0 + \beta_1(\text{Elevation}) + \dots}}{e^{\beta_0 + \beta_1(\text{Elevation}) + \dots} + 1} \quad (c)$$
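Steps (a) to (c) can be traced numerically for a single raster cell. The following is a minimal Python sketch, not the course's R code. The coefficient values and the cell's predictor values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to any data)
beta = {"intercept": -1.0, "elevation": 0.002, "slope": 0.05, "ndvi": -1.5, "curvature": 0.3}

def linear_predictor(cell):
    """Right-hand side of equation (a) for one cell."""
    return beta["intercept"] + sum(beta[name] * value for name, value in cell.items())

def probability(cell):
    """Equations (b) and (c): exponentiate, then divide by that value plus 1."""
    e = math.exp(linear_predictor(cell))   # (b): always positive
    return e / (e + 1.0)                   # (c): always below 1

# Hypothetical predictor values for one cell
cell = {"elevation": 250.0, "slope": 30.0, "ndvi": 0.4, "curvature": -0.2}

p = probability(cell)
print(linear_predictor(cell), p)
```

Here the linear predictor is -1 + 0.5 + 1.5 - 0.6 - 0.06 = 0.34, so the probability lands a little above 0.5.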


Page 9: How to use Logistic Regression in GIS using ArcGIS and R statistics

Equations

Using (a), (b) and (c), we can redefine the probability as:

$$p = \frac{e^{y}}{1 + e^{y}} = \frac{1}{1 + e^{-y}} \quad (d)$$

where p is the probability of success and y here stands for the linear predictor on the right-hand side of (a). Equation (d) is the logistic function, the inverse of the logit link.

A typical logistic model plot shows that the probability never goes below 0 or above 1.
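For completeness, the link function g() in (a) can be recovered from (d) in a few steps. This is the standard derivation, written with y as the linear predictor from (a):

```latex
\begin{align*}
p &= \frac{e^{y}}{1 + e^{y}}
  &&\text{(d)} \\
1 - p &= \frac{1}{1 + e^{y}} \\
\frac{p}{1 - p} &= e^{y}
  &&\text{(the odds)} \\
\ln\!\left(\frac{p}{1 - p}\right) &= y
  = \beta_0 + \beta_1(\text{Elevation}) + \beta_2(\text{Slope})
  + \beta_3(\text{NDVI}) + \beta_4(\text{Curvature})
\end{align*}
```

So the link is the log of the odds, and each coefficient is the change in log-odds per unit change in its predictor.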


Page 10: How to use Logistic Regression in GIS using ArcGIS and R statistics

Confusion Matrix

A confusion matrix is a tabular representation of actual vs. predicted values. It helps us measure the accuracy of the model and avoid overfitting.

Resource: https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/
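A small sketch of how such a table can be tallied from 0/1 observations and predicted probabilities. It is written in plain Python for illustration. The 0.5 cut-off, the sample values, and the helper names are assumptions rather than part of the course material.

```python
def confusion_matrix(actual, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives after thresholding the probabilities."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        elif predicted == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

def accuracy(cm):
    """Share of observations classified correctly."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total

# Made-up observed classes and predicted probabilities
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]

cm = confusion_matrix(actual, probs)
print(cm, accuracy(cm))
```

Computing the matrix on held-out testing data as well as on the training data is what lets it flag overfitting: a large drop in accuracy from training to testing suggests the model has memorized the training set.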

Page 11: How to use Logistic Regression in GIS using ArcGIS and R statistics

Logistic regression

Note:
• The analysis depends on the number of observations: more training observations will increase the model's efficiency.
• LR may produce a lower prediction rate, but this value will have a higher confidence level (lower uncertainty compared with bivariate methods).

I will give more details in each video session.

Without further ado, let us begin!


Page 12: How to use Logistic Regression in GIS using ArcGIS and R statistics

References

http://neondataskills.org/R/Raster-Data-In-R/
http://r-sig-geo.2731867.n2.nabble.com/How-I-make-2-rasters-with-equal-extents-td7584918.html
https://geoscripting-wur.github.io/IntroToRaster/
https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/
https://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/
http://www.cookbook-r.com/Statistical_analysis/Logistic_regression/
