How to use Logistic Regression in GIS using ArcGIS and R statistics
Logistic Regression in GIS using R environment
Omar F. Althuwaynee, PhD in Geomatics Engineering
Course preparations
Before starting, go through the following videos on data preparation, software, and more information about R:
1. Prediction (susceptibility mapping) In GIS (Part1) Using Modified Frequency Ratio
2. Using R as GIS tools: here is my own learning experience (Week 1 - Part 1)
3. Using R as GIS tools: here is my own learning experience (Week 1 - Part 2)
Course objectives
1. Evaluate and compare the results of applying the multivariate logistic regression method to produce a susceptibility map, using GIS and the R environment.
By the end of these sessions, you will be able to
1. Create dichotomous (0, 1) training and testing data.
2. Set up your project environment effectively and install the packages needed in R.
3. Prepare spatial data in the R environment.
4. Perform basic operations with spatial data in R.
5. Develop cartographic mapping skills in R, such as resampling and clipping of raster data.
6. Run statistical analysis using binary logistic regression.
7. Run statistical tests and produce output reports.
8. Run accuracy and validation tests using the AUC of the ROC curve.
9. Produce and export the resulting maps.
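The AUC of the ROC named in the objectives can be read as the share of (landslide, non-landslide) pairs in which the landslide location gets the higher predicted probability, with ties counted as one half. The sketch below illustrates that calculation in plain Python rather than R. The function name `auc_from_scores` and the four sample values are hypothetical and are not the course data.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one 0 label and one 1 label")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # pair ranked correctly
            elif pos == neg:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical testing data: 1 = landslide, 0 = no landslide.
example_labels = [0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80]
example_auc = auc_from_scores(example_labels, example_scores)
```

In this toy case three of the four pairs are ordered correctly, so `example_auc` is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.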
Logistic regression (LR)
• LR is a method for fitting a regression curve, y = f(x).
• LR is part of a larger class of algorithms known as Generalized Linear Models (glm).
• Dependent factor y
  – Binomial logistic regression: the dependent factor (y) has 2 values (either 0 or 1), such as landslides or pollutants (0 = does not exist, 1 = exists).
  – Multinomial logistic regression: the dependent variable has more than 2 values, such as classifying fruit as "Ripe", "Over-ripe" or "Under-ripe".
• Independent factors x
  – A set of predictors x (slope, elevation, land use).
  – The predictors (x) can be continuous, categorical or a mix of both.
Current Application
• To predict the probability that a landslide will occur (y) at a particular place, or not.
Data:
• Dependent factor Y (landslide training data locations): 75 observations.
• Independent factors X (elevation, slope, NDVI, curvature).
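The course fits this model in the R environment. To show what the fitting step does, here is a minimal sketch in plain Python that estimates one intercept and four coefficients by gradient descent. Everything in it is an assumption made for illustration: the predictors are synthetic standardized values standing in for elevation, slope, NDVI and curvature, the `true_betas` are invented, and the 75 rows are simulated rather than the landslide inventory.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(betas, x):
    """Probability of occurrence for one row of predictors."""
    return sigmoid(betas[0] + sum(b * v for b, v in zip(betas[1:], x)))


def log_loss(rows, labels, betas):
    """Mean negative log-likelihood of the 0/1 labels."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(betas, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def fit_logistic(rows, labels, steps=500, rate=0.1):
    """Plain gradient descent on the mean log-loss."""
    betas = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(betas)
        for x, y in zip(rows, labels):
            err = predict(betas, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        betas = [b - rate * g / n for b, g in zip(betas, grad)]
    return betas


random.seed(0)
# Synthetic, standardized stand-ins for elevation, slope, NDVI, curvature.
rows = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(75)]
true_betas = [-0.5, 0.8, 1.2, -0.9, 0.3]  # hypothetical, for simulation only
labels = [1 if random.random() < predict(true_betas, x) else 0 for x in rows]

betas = fit_logistic(rows, labels)
start_loss = log_loss(rows, labels, [0.0] * 5)
final_loss = log_loss(rows, labels, betas)
```

With all coefficients at zero every location gets probability 0.5; after fitting, `final_loss` is lower than `start_loss`, and `predict(betas, x)` gives the estimated probability for any new row `x`.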
Equations
The model predicts the probability of occurrence by fitting the data to a logit function:
$$g(y) = \beta_0 + \beta_1(\text{Elevation}) + \beta_2(\text{Slope}) + \beta_3(\text{NDVI}) + \beta_4(\text{Curvature}) \quad (a)$$
where g() is the link function.
To build g(), start from the probability p:
• A probability must always be positive (never negative), so the linear equation is put in exponential form. For any value of the coefficients and predictors, the exponential is never negative.
$$p = \exp(\beta_0 + \beta_1(\text{Elevation}) + \dots) = e^{\beta_0 + \beta_1(\text{Elevation}) + \dots} \quad (b)$$
• To make the probability less than 1, we must divide p by a number greater than p.
$$p = \frac{\exp(\beta_0 + \beta_1(\text{Elevation}) + \dots)}{\exp(\beta_0 + \beta_1(\text{Elevation}) + \dots) + 1} = \frac{e^{\beta_0 + \beta_1(\text{Elevation}) + \dots}}{e^{\beta_0 + \beta_1(\text{Elevation}) + \dots} + 1} \quad (c)$$
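The two steps in (b) and (c) can be checked numerically. The sketch below is only an illustration in plain Python: the function names and the sample values of the linear combination are hypothetical. It shows that the exponential form is always positive but can exceed 1, while dividing by the same quantity plus 1 keeps the result strictly between 0 and 1.

```python
import math


def exponential_form(linear_value):
    """Form (b): always positive, but not bounded above by 1."""
    return math.exp(linear_value)


def bounded_form(linear_value):
    """Form (c): (b) divided by (b) + 1, so the result stays below 1."""
    e = math.exp(linear_value)
    return e / (e + 1.0)


# Hypothetical values of the linear combination b0 + b1*(Elevation) + ...
linear_values = [-10.0, -1.0, 0.0, 1.0, 10.0]
form_b = [exponential_form(v) for v in linear_values]
form_c = [bounded_form(v) for v in linear_values]
```

When the linear combination is 0, form (c) gives exactly 0.5, which is the midpoint of the probability scale.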
Equations
Using (a), (b) and (c), and writing y for the linear combination on the right-hand side of (a), we can redefine the probability as:
$$p = \frac{e^{y}}{1 + e^{y}} \quad (d)$$
$$p = \frac{1}{1 + e^{-y}}$$
where p is the probability of success. Equation (d) is the logistic function, the inverse of the logit link.
In a typical logistic model plot, the probability never goes below 0 or above 1.
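One step is left implicit here: solving (d) for y gives the log-odds, which is how the linear combination in (a) connects back to the probability. A short derivation with the same symbols:

```latex
\begin{aligned}
p &= \frac{e^{y}}{1 + e^{y}} \\
1 - p &= \frac{1}{1 + e^{y}} \\
\frac{p}{1 - p} &= e^{y} \\
\ln\!\left(\frac{p}{1 - p}\right) &= y
  = \beta_0 + \beta_1(\text{Elevation}) + \beta_2(\text{Slope})
  + \beta_3(\text{NDVI}) + \beta_4(\text{Curvature})
\end{aligned}
```

So each coefficient is the change in the log-odds of occurrence for a one-unit change in its predictor, with the other predictors held fixed.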
Confusion Matrix
A confusion matrix is a tabular representation of actual vs. predicted values. It helps us find the accuracy of the model and avoid overfitting.
Resource: https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/
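As a small illustration of the table of actual vs. predicted values, the sketch below counts the four cells for 0/1 data and derives the accuracy from them. It is written in plain Python rather than R. The function names, the six sample probabilities and the 0.5 cut-off are assumptions for this example and are not taken from the course.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(counts):
    """Share of observations on the diagonal of the matrix."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical testing data: 1 = landslide, 0 = no landslide.
actual = [1, 0, 1, 1, 0, 0]
probabilities = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]
threshold = 0.5  # assumed cut-off for turning probabilities into 0/1
predicted = [1 if p >= threshold else 0 for p in probabilities]

counts = confusion_matrix(actual, predicted)
model_accuracy = accuracy(counts)
```

Here four of the six observations fall on the diagonal (2 true positives, 2 true negatives), so the accuracy is 4/6. Comparing this figure between training and testing data is one way to spot overfitting.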
Logistic regression
Note:
• The analysis depends only on the number of observations; more training observations will increase the model efficiency.
• LR may produce a lower prediction rate, but this value will have a higher confidence level (lower uncertainty compared to bivariate methods).
I will give more details in each video session.
Without further ado, let us begin!
References
http://neondataskills.org/R/Raster-Data-In-R/
http://r-sig-geo.2731867.n2.nabble.com/How-I-make-2-rasters-with-equal-extents-td7584918.html
https://geoscripting-wur.github.io/IntroToRaster/
https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/
https://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/
http://www.cookbook-r.com/Statistical_analysis/Logistic_regression/