Extending ArcGIS with R
Technical Workshops
Mark Janikas, Ph.D.
Outline
• Introduction via demonstration
- Analyzing the future of the wolverine in Washington State
• Description of R
- Why/when to use it?
• Integration options
- How to use it?
• Conclusions and Future Directions
Last Year: Point Clustering (Partitioning)
• Cluster a given set of point locations:
- Spatial proximity
- Attribute values (including time)
- Cancer clusters in New Mexico
- Partitioning earthquakes in California
The Problem
• We want to examine what the potential effects of climate change will be on the distribution of animal species
• We have the known current locations of the species
• We have a series of independent variables, including
- Vegetation type (as dummy variables)
- Elevation, slope, and aspect
- Distance from roads and cities
- Etc.
The climate data
• We have two climate change models
- Hadley (from the UK)
- MIROC 3.2 (from Japan)
• Each model has two scenarios
- The moderate, mid-level “A1B” carbon scenario
- The higher, more extreme “A2” carbon scenario
• There are three time periods
- “e”: Early-century, or 2020-2024 averaged
- “m”: Mid-century, or 2050-2054 averaged
- “l”: Late-century, or 2095-2099 averaged
From Ron Nielson’s group at Oregon State University/ US Forest Service
Special Thanks to Kevin Johnston
The model
• Sample points created from the raster data
• Tools created to run logistic regression in R
• Fit model
• Coefficients and diagnostic statistics
• Use coefficients to create a raster surface
Creating the raster surface
• Apply the logistic formula with coefficients
• Select for probability of 0.5 or greater
• Repeat for each model, for each scenario, and for each time period
1 / (1 + exp( -1 * (9.595857 + (-1.28212 * tmp1991) + (-0.003687 * ppt1991) + (0.426121 * veg8_10) + (-0.560821 * veg7_10) + (-2.077026 * veg6_10) + (-2.941375 * veg2_10) + (-0.496024 * veg17_10) + (-1.740473 * veg16_10) + (0.557113 * veg12_10) + (-7.103907 * veg10_10) + (0.016223 * slope) + (-0.000674 * elevation) + (-0.000555 * aspect) + (-0.000062 * disthigh) + (0.000049 * distcity))))
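The expression above can also be evaluated cell by cell outside the raster calculator. What follows is a minimal sketch in Python with NumPy, assuming each predictor raster has already been read into an array of the same shape. The names `logistic_surface` and `presence_mask` and the dictionary layout are illustrative assumptions, not part of the workshop tools; the coefficients are copied from the expression above.

```python
import numpy as np

# Intercept and coefficients copied from the fitted expression above.
INTERCEPT = 9.595857
COEFFICIENTS = {
    "tmp1991": -1.28212,
    "ppt1991": -0.003687,
    "veg8_10": 0.426121,
    "veg7_10": -0.560821,
    "veg6_10": -2.077026,
    "veg2_10": -2.941375,
    "veg17_10": -0.496024,
    "veg16_10": -1.740473,
    "veg12_10": 0.557113,
    "veg10_10": -7.103907,
    "slope": 0.016223,
    "elevation": -0.000674,
    "aspect": -0.000555,
    "disthigh": -0.000062,
    "distcity": 0.000049,
}


def logistic_surface(rasters):
    """Return the per-cell probability from a dict of equally shaped arrays.

    `rasters` maps each predictor name in COEFFICIENTS to a NumPy array
    (hypothetical layout; how the rasters are loaded is left open).
    """
    # Start from the intercept, then add each coefficient * predictor term.
    linear = np.full_like(np.asarray(rasters["tmp1991"], dtype=float), INTERCEPT)
    for name, coefficient in COEFFICIENTS.items():
        linear = linear + coefficient * np.asarray(rasters[name], dtype=float)
    # Logistic transform: 1 / (1 + exp(-linear predictor)).
    return 1.0 / (1.0 + np.exp(-linear))


def presence_mask(probability, threshold=0.5):
    """Select cells whose probability is at or above the threshold."""
    return probability >= threshold
```

Running the two functions once per model, scenario, and time period, with the matching climate rasters swapped in, mirrors the "repeat" step listed above.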
Demo Title: Climate Change
Sampling
Logistic regression
Creating a surface
What is R? Why should I use it?
• R (The R Project for Statistical Computing) is an open-source data analysis package (GNU S)
- Widely used: over 60 CRAN sites across 30+ countries
- It's free: GNU General Public License
- Base is powerful: statistics, linear algebra, visualization, etc.
- It's extendible: 1800+ contributed extensions
- splancs, spatstat, spdep, rgdal, maptools, shapefiles
Integration with ArcGIS
• Two Integration Options With ArcGIS
• Both require Python
• Both have pros and cons
• ESRI UC Plenary 2008- predicting plant species in unknown areas
Integration: R Option
• Decouples R and Python
• Python
- Retrieves and organizes parameters from ArcGIS
- Converts data (interchange): shapefiles, netCDF, img, etc.
- Spawns R given the *.r file with provided parameters
• R
- Does the analysis
• Python
- Post-processing: projecting data, applying symbology
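The "spawn R" step in this workflow can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the `Rscript` executable is on the system path, and the function names `build_r_command` and `run_r_script` are hypothetical. The workshop tools may launch R differently.

```python
import subprocess


def build_r_command(r_script, *params, rscript_exe="Rscript"):
    """Assemble the command line that hands tool parameters to an R script."""
    # --vanilla starts R without restoring or saving a workspace
    # and without reading user or site profile files.
    return [rscript_exe, "--vanilla", r_script] + [str(p) for p in params]


def run_r_script(r_script, *params):
    """Spawn R as a separate process and return whatever it printed."""
    command = build_r_command(r_script, *params)
    # check=True raises CalledProcessError if R exits with an error status.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Inside the *.r file, `commandArgs(trailingOnly = TRUE)` returns the parameters in the order they were passed. Because a new R process starts on every call, each execution pays R's startup cost.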
Integration: R Option (Cont…)
[Diagram: ArcGIS, Python Script, R Script]
Integration: RPy Option
• R and Python closely coupled
• RPy and RPy2
- Python interface to the R programming language
• Python
- Retrieves and organizes parameters from ArcGIS
- RPy module is imported and R commands are executed within the Python script file
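As a rough sketch of the closely coupled approach, the Python code below uses RPy2's `rpy2.robjects` module to fit a logistic regression with R's `glm` without leaving the Python script. It assumes R and RPy2 are installed; the function names, the data frame name `samples`, and the column names are hypothetical.

```python
def build_glm_call(response, predictors, data_name="samples"):
    """Build the R expression for a logistic regression as a string."""
    formula = "{} ~ {}".format(response, " + ".join(predictors))
    return "glm({}, data = {}, family = binomial)".format(formula, data_name)


def fit_logistic(columns, response, predictors):
    """Fit the model in the embedded R session and return its coefficients.

    `columns` maps column names to lists of numbers (hypothetical layout).
    """
    # Imported here so the string helper above works without RPy2 installed.
    import rpy2.robjects as robjects

    # Copy each Python list into R as a numeric vector, then bind the
    # vectors into a data frame named 'samples' in R's global environment.
    r_columns = {
        name: robjects.FloatVector(values) for name, values in columns.items()
    }
    robjects.globalenv["samples"] = robjects.DataFrame(r_columns)

    # Evaluate the glm() call in R and pull the coefficients back to Python.
    model = robjects.r(build_glm_call(response, predictors))
    return list(robjects.r["coef"](model))
```

The R session stays alive inside the Python process between calls, so only the first call pays the startup cost.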
Integration: RPy Option (Cont…)
[Diagram: ArcGIS, Python Script, R Processing]
Which One Should I Use?
• R Option vs. RPy (RPy2) Option
• http://resources.arcgis.com/gallery/file/geoprocessing
• ArcGIS 9.3 vs. ArcGIS 10
Which One Should I Use? (Cont…)
• R Option
- Integration is easy to implement
- Attractive to R programmers
- “Out of Proc”: spawns R on every execute
- Use Copy Features: shapefiles, selection sets, projections and other environment variables
- You must use an R library for handling shapefiles: maptools, shapefiles
- Two files per script tool (*.py and *.r)
- Debugging can be difficult
Which One Should I Use? (Cont…)
• RPy / RPy2 Option
- For more advanced users (Python and R knowledge)
- “In Process”: will be MUCH faster after the first call
- Honors selection sets
- A robust choice of database formats
- Will honor environment settings (GP functions)
- Only a single file associated with your script tool
- RPy (first generation): existing bug with GP Python framework
- RPy2 (“second” generation, 2.0.x): interaction with NumPy arrays incomplete
- RPy2 (2.1.x): no Windows binaries
Conclusions
• R
- Contains “cutting edge” data analysis techniques from a wide body of academic and applied fields
- Extendible
- Open-source
• Can be integrated with ArcGIS using Python
- R versus RPy (RPy2)
- Pros and cons
Future Directions
• RPy2
• Web Portal: RTools
- spdep, spatstat and splancs
- SAR/CAR regression, point pattern analysis
• ArcNews Article
• MATLAB, SPSS, PySAL
• Calling Python from R
- Leveraging geoprocessing within the R environment
- RSPython: http://www.omegahat.org/RSPython/
Links
• R: http://www.r-project.org/index.html
• RPy and RPy2: http://rpy.sourceforge.net/
• Python: http://www.python.org/
• NumPy: http://www.numpy.org/
Related Sessions
• Spatial Pattern Analysis
• Regression Analysis for Spatial Data with ArcGIS
• Geostatistical Analyst
• ArcGIS Spatial Analyst – Statistical Modeling
• Agent-Based Modeling in ArcGIS
• Python Essentials in ArcGIS