Getting Data Science with R and ArcGIS
Transcript of Getting Data Science with R and ArcGIS
Shaun Walbridge
Mark Janikas
Marjean Pobuda
https://github.com/scw/r-devsummit-2016-talk
Handout PDF
High Quality PDF (4MB)
Resources Section
Data Science
A much-hyped phrase, but effectively it is about the application of statistics and machine learning to real-world data, and developing formalized tools instead of one-off analyses. Combines diverse fields to solve problems.
What's a data scientist?
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Wills
Us geographic folks also rely on knowledge from multiple domains. We know that spatial is more than just an x and y column in a table, and how to get value out of this data.
Data Science Languages
Languages commonly used in data science:
R, Python, Matlab, Julia
We're a big Python shop, so why R?
R vs Python for Data Science
R
Why R?
Powerful core data structures and operations: data frames, functional programming
Unparalleled breadth of statistical routines
The de facto language of statisticians
CRAN: 6400 packages for solving problems
Versatile and powerful plotting
We assume basic programming proficiency. See the resources for a deeper dive into R.
R Data Types
Types you're used to seeing: Numeric, Integer, Character, Logical, timestamp
...but others you probably aren't: vector, matrix, data.frame, factor
Vector:
a.vector <- c(4, 3, 8, 7, 1, 5)
Matrix:
A = matrix(
  c(4, 3, 8, 7, 1, 5),   # same data as above
  nrow=2, ncol=3,        # what's the shape of the data?
  byrow=TRUE)            # what order are the values in?
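As a rough sketch of how the two types behave (the variable names below are illustrative, not from the talk), the same six values can be read as a flat vector or by row and column once they are given a shape:

```r
# Same six values as in the slide
values <- c(4, 3, 8, 7, 1, 5)

# Vectors are one-dimensional and indexed from 1
second.value <- values[2]

# byrow=TRUE fills the matrix one row at a time
by.row <- matrix(values, nrow=2, ncol=3, byrow=TRUE)

# The default (byrow=FALSE) fills it one column at a time
by.col <- matrix(values, nrow=2, ncol=3)

by.row[1, ]   # first row is 4 3 8
by.col[1, ]   # first row is 4 8 1

# Arithmetic is vectorized, so no explicit loop is needed
doubled <- values * 2
```

The only difference between by.row and by.col is the fill order, which is why the byrow argument matters when loading values that were written out row by row.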
Data Frames:
Treats tabular (and multi-dimensional) data as a labeled, indexed series of observations. Sounds simple, but is a game changer over typical software which is just doing 2D layout (e.g. Excel).
# Create a data frame out of an existing tabular source
df.from.csv <- read.csv("data/growth.csv", header=TRUE)
# Create a data frame from scratch
quarter <- c(2, 3, 1)
person <- c("Goodchild", "Tobler", "Krige")
met.quota <- c(TRUE, FALSE, TRUE)
df <- data.frame(person, met.quota, quarter)
R> df
     person met.quota quarter
1 Goodchild      TRUE       2
2    Tobler     FALSE       3
3     Krige      TRUE       1
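A minimal sketch of what "labeled, indexed observations" buys you, reusing the data frame from the slide. The derived names (hit.quota, by.quarter, quarter.label) are made up for illustration:

```r
person <- c("Goodchild", "Tobler", "Krige")
met.quota <- c(TRUE, FALSE, TRUE)
quarter <- c(2, 3, 1)
df <- data.frame(person, met.quota, quarter, stringsAsFactors=FALSE)

# Select rows with a logical condition, and columns by name
hit.quota <- df[df$met.quota, c("person", "quarter")]

# Reorder the observations by the values in one column
by.quarter <- df[order(df$quarter), ]

# A factor stores categorical values as a fixed set of labeled levels
df$quarter.label <- factor(df$quarter, levels=c(1, 2, 3),
                           labels=c("Q1", "Q2", "Q3"))

# Inspect the column types
str(df)
```

Rows and columns are addressed by condition and by name rather than by cell position, which is the contrast with a spreadsheet-style 2D layout.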
sp Types
Entity + Attribute model
0D: SpatialPoints
1D: SpatialLines
2D: SpatialPolygons
3D: Solid
4D: Space-time
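To make the entity + attribute idea concrete, here is a small sketch using the sp package. The site names and coordinates are invented for illustration, and no coordinate system is assigned:

```r
library(sp)

# Hypothetical attribute table with x/y columns (values are made up)
sites <- data.frame(
  name = c("A", "B", "C"),
  x = c(-117.19, -117.18, -117.20),
  y = c(34.05, 34.06, 34.07)
)

# Promote the table to a spatial object: the x and y columns become
# the geometry, and the remaining columns stay attached as attributes
coordinates(sites) <- ~ x + y

class(sites)   # a SpatialPointsDataFrame
bbox(sites)    # bounding box of the points
sites@data     # attribute table, now without the x and y columns
```

This is the 0D case from the list above; lines and polygons follow the same pattern of a geometry paired with a data frame of attributes.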
Data Science with R
Hadley Stack
Hadley Wickham: Developer at RStudio, Professor at Rice University
ggplot2, scales, dplyr, devtools, many others
Statistical Formulas
Domain specific language for statistics
Similar properties in other parts of the language
caret for model specification consistency
fit.results <- lm(pollution ~ elevation + rainfall + ppm.nox + urban.density)
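A runnable sketch of the formula interface, using synthetic data generated on the spot. The column names echo the slide, but the values and coefficients are invented purely for illustration:

```r
set.seed(42)
n <- 100

# Synthetic observations, made up for illustration only
obs <- data.frame(
  elevation = runif(n, 0, 2000),
  rainfall = runif(n, 200, 1500)
)
obs$pollution <- 50 - 0.01 * obs$elevation + 0.005 * obs$rainfall +
  rnorm(n, sd=1)

# response ~ predictors; names are looked up in the `data` argument
fit <- lm(pollution ~ elevation + rainfall, data=obs)
coef(fit)

# The same formula notation is reused elsewhere, e.g. grouped summaries
obs$high <- obs$elevation > 1000
means <- aggregate(pollution ~ high, data=obs, FUN=mean)
```

The formula describes the model symbolically, so swapping lm for another modeling function often needs no change to the specification itself.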
Literate Programming
“I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature.” -- Donald Knuth, “Literate Programming”
Packages: RMarkdown, Roxygen2
Jupyter notebooks
Development Environments
Jupyter (née IPython)
R Tools for Visual Studio (brand new)
Best of class tools for interacting with data.
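dplyr Package
Introducing dplyr
library(dplyr)
library(Lahman)   # Batting is assumed to come from the Lahman package
# %>% replaces the older %.% operator, which is no longer part of dplyr
Batting %>%
  group_by(playerID) %>%
  summarise(total = sum(G)) %>%
  arrange(desc(total)) %>%
  head(5)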
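The same chain of steps can be tried on a tiny hand-made table, which makes the result easy to check by eye. This is a sketch assuming dplyr is installed; the player IDs and game counts are invented:

```r
library(dplyr)

# Toy stand-in for a batting table (values are made up)
batting <- data.frame(
  playerID = c("aaa01", "bbb01", "aaa01", "ccc01", "bbb01", "aaa01"),
  G = c(10, 20, 30, 5, 15, 1),
  stringsAsFactors = FALSE
)

top <- batting %>%
  group_by(playerID) %>%            # one group per player
  summarise(total = sum(G)) %>%     # total games per player
  arrange(desc(total)) %>%          # most games first
  head(2)                           # keep the top two

top
```

Each verb takes a table and returns a table, so the pipeline reads top to bottom in the order the work is done.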
R Challenges
Performance issues
Not a general purpose language
Lacks purely UI mode of interaction (e.g. plots must be manually specified)
Programmer only. There is shiny, but R is first and foremost a language that expects fluency from its users
R-ArcGIS Bridge
ArcGIS developers can create custom tools and toolboxes that integrate ArcGIS and R
ArcGIS users can access R code through geoprocessing scripts
R users can access their organization's GIS data, managed in traditional GIS ways
https://r-arcgis.github.io
Store your data in ArcGIS, access it quickly in R, return R objects back to ArcGIS native data types (e.g. geodatabase feature classes).
Knows how to convert spatial data to sp objects.
Package Documentation
ArcGIS vs R Data Types
ArcGIS             R                                  Example Value
Address Locator    Character                          Address Locators\\MGRS
Any                Character
Boolean            Logical
Coordinate System  Character                          "PROJCS[\"WGS_1984_UTM_Zone_19N\"...
Dataset            Character                          "C:\\workspace\\projects\\results.shp"
Date               Character                          "5/6/2015 2:21:12 AM"
Double             Numeric                            22.87918
Extent             Vector (xmin, ymin, xmax, ymax)    c(0, -591.561, 1000, 992)
Field              Character
Folder             Character                          full path, use with e.g. file.info()
Long               Long                               19827398L
String             Character
Text File          Character                          full path
Workspace          Character                          full path
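A short sketch of working with a few of these values once they arrive in R, using the example values from the table. The month/day/year reading of the date string is an assumption, and tempdir() merely stands in for a Folder parameter:

```r
# Extent arrives as a numeric vector in (xmin, ymin, xmax, ymax) order
extent <- c(0, -591.561, 1000, 992)
names(extent) <- c("xmin", "ymin", "xmax", "ymax")
width <- extent[["xmax"]] - extent[["xmin"]]
height <- extent[["ymax"]] - extent[["ymin"]]

# Dates arrive as character strings and need parsing before use
# (assumes month/day/year order and an English AM/PM locale)
stamp <- as.POSIXct("5/6/2015 2:21:12 AM",
                    format="%m/%d/%Y %I:%M:%S %p", tz="UTC")

# Folder parameters are plain paths, usable with base file functions
folder <- tempdir()   # hypothetical stand-in for a Folder parameter
info <- file.info(folder)
info$isdir

# The L suffix marks an integer literal in R
class(19827398L)
```

Most parameter types come through as character strings, so the conversion to a richer R type is left to the script.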
Access ArcGIS from R
Start by loading the library, and initializing the connection to ArcGIS:
# load the ArcGIS-R bridge library
library(arcgisbinding)
# initialize the connection to ArcGIS. Only needed when running directly from R.
arc.check_product()
Opening data has two stages, like data cursors:
Open the data source with arc.open
Select with filtering with arc.select
Similar to using arcpy.da cursors
First, select a data source (can be a feature class, a layer, or a table):
input.fc <- arc.open('data.gdb/features')
Then, filter the data to the set you want to work with (creates an in-memory data frame):
filtered.df <- arc.select(input.fc, fields=c('fid', 'mean'), where_clause="mean < 100")
This creates an ArcGIS data frame -- it looks like a data frame, but retains references back to the geometry data.
Now, if we want to do analysis in R with this spatial data, we need it to be represented as sp objects. arc.data2sp does the conversion for us:
df.as.sp <- arc.data2sp(filtered.df)
arc.sp2data inverts this process, taking sp objects and generating ArcGIS compatible data frames.
Once finished with our work in R, we want to get the data back to ArcGIS. Write our results back to a new feature class with arc.write:
arc.write('data.gdb/new_features', results.df)
Other functions:
WKT to proj.4 conversion: arc.fromP4ToWkt, arc.fromWktToP4
Interacting directly with geometries: arc.shapeinfo, arc.shape2sp
Geoprocessing session specific: arc.progress_pos, arc.progress_label, arc.env (read only)
Building R Script Tools
tool_exec <- function(in_params, out_params) {
  # the first input parameter, as a character vector
  input.features <- in_params[[1]]
  # alternatively, can access by the parameter name:
  input.dataset <- in_params$input_features
  print(input.dataset)
  # ... next, do analysis steps
  # this will be returned as the "Output Graphs" parameter.
  out_params[[1]] <- plot(results.dataset)
  return(out_params)
}
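The calling convention can be tried outside ArcGIS by passing plain lists. This is only a mock: in a real script tool ArcGIS builds in_params and out_params from the tool dialog, and the parameter names threshold and summary are hypothetical:

```r
# Simplified stand-in for a script tool's entry point
tool_exec <- function(in_params, out_params) {
  # parameters can be read by position or by name
  input.features <- in_params[[1]]
  threshold <- in_params$threshold

  message("Reading from: ", input.features)

  # placeholder "analysis": report the threshold that was passed in
  out_params$summary <- paste("threshold was", threshold)
  return(out_params)
}

# Mock call with hand-built parameter lists
in_params <- list(input_features="data.gdb/features", threshold=100)
out_params <- list(summary=NULL)
result <- tool_exec(in_params, out_params)
result$summary
```

Keeping the analysis inside a function that takes and returns lists makes it possible to exercise the logic from a plain R session before wiring it into a toolbox.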
R ArcGIS Bridge Demo
Details of model based clustering analysis in the R Sample Tools
The How and Where
How To Install
Install with the R bridge install
Detailed installation instructions
Where Can I Run This?
Now:
First, install R 3.1 or later
ArcGIS Pro (64-bit) 1.1 or later
ArcGIS 10.3.1 or later:
32-bit R by default in Desktop
64-bit R available via Server and Background Geoprocessing
Upcoming:
Conda for managing R environments
Resources
Other Sessions
Integrating Open-source Statistical Packages with ArcGIS
Python: Developing Geoprocessing Tools
Harnessing the Power of Python in ArcGIS Using the Conda Distribution
Python: Working with Scientific Data
R
Looking for a package to solve a problem? Use the CRAN Task Views.
Tons of good books and resources on R are available. Check out the RSeek engine to find resources for the language, which can be difficult to locate because of the name.
R Packages by Hadley Wickham
Spatial R / Data Science
An Introduction to Statistical Learning (PDF, website): A free and accessible version of the classic in the field, Elements of Statistical Learning.
Getting Started in Data Science
ArcGIS + R
UC Plenary Demo: Statistical Integration with R
Demo of SSN: spatial modeling on stream networks
Cam Plouffe (Esri CA) ran an R ArcGIS Workshop, which covers materials in more depth.
Materials
Courses:
High Performance Scientific Computing
The Data Scientist's Toolbox
Books:
Spatial Statistical Data Analysis for GIS Users, by Konstantin Krivoruchko (GA creator). Too big to print. Tons of useful stuff, covers both R and ArcGIS extensively.
Packages
Clustering demo covers mclust and sp.
Tree-based models, e.g. CART
Time series data, e.g. Little Book of R
R ArcGIS Extensions
R ArcGIS Bridge
Marine Geospatial Ecology Tools (MGET): Combines Python, R, and MATLAB to solve a wide variety of problems
Geospatial Modeling Environment: An R flavored language for spatial analysis
Conferences
useR! Conference: useR 2016 is being held at Stanford, June 27-30
Open Data Science Conference (ODSC): Many happening around the world, some upcoming ones:
ODSC East, May 20-22 in Boston
ODSC West, Nov 4-6 in Santa Clara
Closing
Outreach
Resources and outreach -- connect the dots, want this to be outreach so we can build up more R + ArcGIS people, who aren't as common as our core language folks.
Future of the project, questions
Community
Open source project, different ethos
Contributions are the currency
That said, major uptake in the commercial space: Microsoft R (bought Revolution Analytics); RStudio
Our involvement:
Recently hosted a Space-time Statistics Summit
More soon
Thanks
R team: Dmitry Pavlushko, Steve Kopp, Konstantin Krivoruchko; today's speakers
Geoprocessing Team
Contact Us
Rate This Session
iOS, Android: Feedback from within the app
Windows Phone, or no smartphone? Cuneiform tablets accepted.