Getting Data Science with R and ArcGIS

Getting Data Science with R and ArcGIS Shaun Walbridge Mark Janikas Marjean Pobuda

Transcript of Getting Data Science with R and ArcGIS

Page 1: Getting Data Science with R and ArcGIS

Getting Data Science with R and ArcGIS

Shaun Walbridge

Mark Janikas

Marjean Pobuda

Page 3: Getting Data Science with R and ArcGIS

Data Science

Page 4: Getting Data Science with R and ArcGIS

Data Science

A much-hyped phrase, but effectively is about the application of statistics and machine learning to real-world data, and developing formalized tools instead of one-off analyses. Combines diverse fields to solve problems.

Page 5: Getting Data Science with R and ArcGIS

Data Science

What's a data scientist?

“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Wills

Page 6: Getting Data Science with R and ArcGIS

Data Science

Us geographic folks also rely on knowledge from multiple domains. We know that spatial is more than just an x and y column in a table, and how to get value out of this data.

Page 7: Getting Data Science with R and ArcGIS

Data Science Languages

Languages commonly used in data science:

R, Python, Matlab, Julia

We're a big Python shop, so why R?

R vs Python for Data Science

Page 8: Getting Data Science with R and ArcGIS

R

Page 10: Getting Data Science with R and ArcGIS

Why R?

Powerful core data structures and operations: data frames, functional programming

Unparalleled breadth of statistical routines: the de facto language of statisticians

CRAN: 6400 packages for solving problems

Versatile and powerful plotting

We assume basic proficiency in programming. See resources for a deeper dive into R.

Page 12: Getting Data Science with R and ArcGIS

R Data Types

Data types you're used to seeing...

Numeric - Integer - Character - Logical - timestamp

...but others you probably aren't:

vector - matrix - data.frame - factor
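
Of the less familiar types, factor tends to be the one that surprises newcomers. A minimal sketch using base R only; the land-cover labels are made-up illustration values:

```r
# Hypothetical land-cover labels, one per observation.
landcover <- c("forest", "urban", "forest", "water", "urban", "forest")

# A factor stores each distinct label once (the levels)
# plus an integer code for every observation.
landcover.factor <- factor(landcover)

levels(landcover.factor)       # the distinct categories, sorted
as.integer(landcover.factor)   # the integer code behind each observation
table(landcover.factor)        # counts per category
```

Because the categories are stored explicitly, modeling and plotting functions can treat the column as categorical rather than as free text.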

Page 13: Getting Data Science with R and ArcGIS

R Data Types

Vector:

a.vector <- c(4, 3, 8, 7, 1, 5)

Matrix:

A <- matrix(
  c(4, 3, 8, 7, 1, 5),  # same data as above
  nrow=2, ncol=3,       # what's the shape of the data?
  byrow=TRUE)           # what order are the values in?

Example source
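
To see what byrow changes, the same six values can be laid out both ways. A small sketch in base R; the names by.row and by.col are only for illustration:

```r
values <- c(4, 3, 8, 7, 1, 5)

# Fill across rows first: the first row is 4, 3, 8.
by.row <- matrix(values, nrow=2, ncol=3, byrow=TRUE)

# R's default fills down columns first: the first column is 4, 3.
by.col <- matrix(values, nrow=2, ncol=3)

by.row[1, ]   # first row of the row-filled matrix
by.col[, 1]   # first column of the column-filled matrix
dim(by.row)   # both matrices are 2 rows by 3 columns
```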

Page 14: Getting Data Science with R and ArcGIS

R Data Types

Data Frames:

Treats tabular (and multi-dimensional) data as a labeled, indexed series of observations. Sounds simple, but is a game changer over typical software which is just doing 2D layout (e.g. Excel)

Page 15: Getting Data Science with R and ArcGIS

R Data Types

# Create a data frame out of an existing tabular source
df.from.csv <- read.csv("data/growth.csv", header=TRUE)

# Create a data frame from scratch
quarter <- c(2, 3, 1)
person <- c("Goodchild", "Tobler", "Krige")
met.quota <- c(TRUE, FALSE, TRUE)
df <- data.frame(person, met.quota, quarter)

R> df
     person met.quota quarter
1 Goodchild      TRUE       2
2    Tobler     FALSE       3
3     Krige      TRUE       1
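
The labeled, indexed structure is what keeps questions about the data short to express. A sketch that rebuilds the same data frame and queries it with base R (stringsAsFactors=FALSE is set here so the names stay plain character values on any R version):

```r
quarter <- c(2, 3, 1)
person <- c("Goodchild", "Tobler", "Krige")
met.quota <- c(TRUE, FALSE, TRUE)
df <- data.frame(person, met.quota, quarter, stringsAsFactors=FALSE)

# Columns are addressed by name; rows by a logical condition.
made.it <- df[df$met.quota, "person"]

# Reorder the observations by quarter.
by.quarter <- df[order(df$quarter), ]

made.it
by.quarter$person
```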

Page 16: Getting Data Science with R and ArcGIS

sp Types

0D: SpatialPoints
1D: SpatialLines
2D: SpatialPolygons
3D: Solid
4D: Space-time

Entity + Attribute model

Page 17: Getting Data Science with R and ArcGIS

Data Science with R

Page 18: Getting Data Science with R and ArcGIS

Hadley Stack

Hadley Wickham

Developer at RStudio, Professor at Rice University

ggplot2, scales, dplyr, devtools, many others

Page 19: Getting Data Science with R and ArcGIS

Statistical Formulas

fit.results <- lm(pollution ~ elevation + rainfall + ppm.nox + urban.density)

Domain specific language for statistics

Similar properties in other parts of the language

caret for model specification consistency
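
The pollution variables in the formula above are not shipped with R, so as a runnable stand-in this sketch uses the built-in mtcars data set. The response goes on the left of ~ and the predictors on the right:

```r
# mtcars ships with R; mpg is modeled from weight (wt) and horsepower (hp).
fit <- lm(mpg ~ wt + hp, data=mtcars)

# One coefficient per predictor, plus the intercept.
names(coef(fit))

# The formula is an ordinary R object that can be pulled back out of the fit.
model.formula <- formula(fit)
class(model.formula)
```

Since the formula is itself an object, it can be stored and handed to other modeling functions that accept the formula interface.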

Page 20: Getting Data Science with R and ArcGIS

Literate Programming

“I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature.” -- Donald Knuth, “Literate Programming”

Packages: RMarkdown, Roxygen2

Jupyter notebooks

Page 22: Getting Data Science with R and ArcGIS

Development Environments

Jupyter (née IPython)

R Tools for Visual Studio (brand new)

Best of class tools for interacting with data.

Page 23: Getting Data Science with R and ArcGIS

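dplyr Package

library(dplyr)
library(Lahman)  # provides the Batting data set

# current dplyr chains steps with %>%; the older %.% operator is no longer available
Batting %>%
  group_by(playerID) %>%
  summarise(total = sum(G)) %>%
  arrange(desc(total)) %>%
  head(5)

Introducing dplyr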

Page 24: Getting Data Science with R and ArcGIS

R Challenges

Performance issues

Not a general purpose language

Lacks purely UI mode of interaction (e.g. plots must be manually specified)

Programmer only. There is shiny, but R is first and foremost a language that expects fluency from its users

Page 25: Getting Data Science with R and ArcGIS

R-ArcGIS Bridge

Page 26: Getting Data Science with R and ArcGIS

R-ArcGIS Bridge

ArcGIS developers can create custom tools and toolboxes that integrate ArcGIS and R

ArcGIS users can access R code through geoprocessing scripts

R users can access their organization's GIS data, managed in traditional GIS ways

https://r-arcgis.github.io

Page 27: Getting Data Science with R and ArcGIS

R-ArcGIS Bridge

Store your data in ArcGIS, access it quickly in R, return R objects back to ArcGIS native data types (e.g. geodatabase feature classes).

Knows how to convert spatial data to sp objects.

Package Documentation

Page 28: Getting Data Science with R and ArcGIS

ArcGIS vs R Data Types

ArcGIS              R          Example Value
Address Locator     Character  Address Locators\\MGRS
Any                 Character
Boolean             Logical
Coordinate System   Character  "PROJCS[\"WGS_1984_UTM_Zone_19N\"...
Dataset             Character  "C:\\workspace\\projects\\results.shp"
Date                Character  "5/6/2015 2:21:12 AM"
Double              Numeric    22.87918

Page 29: Getting Data Science with R and ArcGIS

ArcGIS vs R Data Types

ArcGIS      R                                 Example Value
Extent      Vector (xmin, ymin, xmax, ymax)   c(0, -591.561, 1000, 992)
Field       Character
Folder      Character                         full path, use with e.g. file.info()
Long        Long                              19827398L
String      Character
Text File   Character                         full path
Workspace   Character                         full path
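
How the example values above look from the R side can be checked with class(). A sketch in base R only: the extent numbers are the ones from the table, and tempdir() stands in for a real folder parameter. Note that R itself reports the class of an L-suffixed literal as integer.

```r
# An extent is a plain numeric vector: xmin, ymin, xmax, ymax.
extent <- c(0, -591.561, 1000, 992)
width  <- extent[3] - extent[1]
height <- extent[4] - extent[2]

# The L suffix makes an integer literal rather than a double.
class(19827398L)
class(22.87918)

# Folder-like parameters are character paths, usable with file.info().
folder <- tempdir()
file.info(folder)$isdir
```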

Page 30: Getting Data Science with R and ArcGIS

Access ArcGIS from R

Start by loading the library, and initializing connection to ArcGIS:

# load the ArcGIS-R bridge library
library(arcgisbinding)
# initialize the connection to ArcGIS. Only needed when running directly from R.
arc.check_product()

Page 31: Getting Data Science with R and ArcGIS

Access ArcGIS from R

Opening data has two stages, like data cursors:

Open data source with arc.open

Select with filtering with arc.select

Similar to using arcpy.da cursors

Page 32: Getting Data Science with R and ArcGIS

Access ArcGIS from R

First, select a data source (can be a feature class, a layer, or a table):

input.fc <- arc.open('data.gdb/features')

Then, filter the data to the set you want to work with (creates in-memory data frame):

filtered.df <- arc.select(input.fc, fields=c('fid', 'mean'), where_clause="mean < 100")

This creates an ArcGIS data frame -- looks like a data frame, but retains references back to the geometry data.

Page 33: Getting Data Science with R and ArcGIS

Access ArcGIS from R

Now, if we want to do analysis in R with this spatial data, we need it to be represented as sp objects. arc.data2sp does the conversion for us:

df.as.sp <- arc.data2sp(filtered.df)

arc.sp2data inverts this process, taking sp objects and generating ArcGIS compatible data frames.

Page 34: Getting Data Science with R and ArcGIS

Access ArcGIS from R

Finished with our work in R, want to get the data back to ArcGIS. Write our results back to a new feature class, with arc.write:

arc.write('data.gdb/new_features', results.df)

Page 35: Getting Data Science with R and ArcGIS

Access ArcGIS from R

WKT to proj.4 conversion:

arc.fromP4ToWkt, arc.fromWktToP4

Interacting directly with geometries:

arc.shapeinfo, arc.shape2sp

Geoprocessing session specific:

arc.progress_pos, arc.progress_label, arc.env (read only)

Page 36: Getting Data Science with R and ArcGIS

Building R Script Tools

Page 37: Getting Data Science with R and ArcGIS

Building R Script tools

tool_exec <- function(in_params, out_params) {
  # the first input parameter, as a character vector
  input.dataset <- in_params[[1]]

  # alternatively, can access by the parameter name:
  input.dataset <- in_params$input_features
  print(input.dataset)
  # ... next, do analysis steps (these produce results.dataset)

  # this will be returned as the "Output Graphs" parameter.
  out_params[[1]] <- plot(results.dataset)
  return(out_params)
}
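
Inside ArcGIS the bridge supplies in_params and out_params. To try a tool's logic in a plain R session, named lists can stand in for them. This is a sketch under that assumption, not how the bridge itself calls the tool; the parameter names input_features and output_summary and the character-count "analysis" are hypothetical:

```r
tool_exec <- function(in_params, out_params) {
  # inputs can be read by position or by parameter name
  input.dataset <- in_params$input_features

  # stand-in for the analysis step: count the characters in the path
  out_params$output_summary <- nchar(input.dataset)
  return(out_params)
}

# Plain named lists imitating what the geoprocessing tool would pass in.
fake.in  <- list(input_features="data.gdb/features")
fake.out <- list(output_summary=NULL)

result <- tool_exec(fake.in, fake.out)
result$output_summary
```

Keeping the analysis inside a function that only touches its two arguments is what makes this kind of dry run possible before wiring the script into a toolbox.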

Page 38: Getting Data Science with R and ArcGIS

R ArcGIS Bridge Demo

Details of model based clustering analysis in the R Sample Tools

Page 39: Getting Data Science with R and ArcGIS

The How and Where

Page 40: Getting Data Science with R and ArcGIS

How To Install

Install with the R bridge install

Detailed installation instructions

Page 41: Getting Data Science with R and ArcGIS

Where Can I Run This?

Page 42: Getting Data Science with R and ArcGIS

Where Can I Run This?

Now:

First, install R 3.1 or later

ArcGIS Pro (64-bit) 1.1 or later

ArcGIS 10.3.1 or later:
32-bit R by default in Desktop
64-bit R available via Server and Background Geoprocessing

Upcoming:

Conda for managing R environments

Page 43: Getting Data Science with R and ArcGIS

Resources

Page 45: Getting Data Science with R and ArcGIS

R

Looking for a package to solve a problem? Use the CRAN Task Views.

Tons of good books and resources on R available, check out the RSeek engine to find resources for the language which can be difficult to locate because of the name.

R Packages by Hadley Wickham

Page 46: Getting Data Science with R and ArcGIS

Spatial R / Data Science

An Introduction to Statistical Learning (PDF): A free and accessible version of the classic in the field, Elements of Statistical Learning. (website)

Getting Started in Data Science

Page 47: Getting Data Science with R and ArcGIS

ArcGIS + R

UC Plenary Demo: Statistical Integration with R

Demo of SSN: spatial modeling on stream networks

Cam Plouffe (Esri CA) ran an R ArcGIS Workshop, covers materials in more depth.

Page 48: Getting Data Science with R and ArcGIS

Materials

Courses:

High Performance Scientific Computing

The Data Scientist's Toolbox

Books:

Spatial Statistical Data Analysis for GIS Users, by Konstantin Krivoruchko (GA creator). Too big to print. Tons of useful stuff, covers both R and ArcGIS extensively.

Page 49: Getting Data Science with R and ArcGIS

Packages

Clustering demo covers mclust and sp.

Tree-based models, e.g. CART

Time series data, e.g. Little Book of R

Page 50: Getting Data Science with R and ArcGIS

R ArcGIS Extensions

R ArcGIS Bridge

Marine Geospatial Ecology Tools (MGET): Combines Python, R, and MATLAB to solve a wide variety of problems

Geospatial Modeling Environment: An R flavored language for spatial analysis

Page 51: Getting Data Science with R and ArcGIS

Conferences

useR! Conference: useR 2016 is being held at Stanford June 27-30

Open Data Science Conference (ODSC): Many happening around world, some upcoming ones:
ODSC East May 20-22 in Boston
ODSC West Nov 4-6 in Santa Clara

Page 52: Getting Data Science with R and ArcGIS

Closing

Page 53: Getting Data Science with R and ArcGIS

Outreach

Resources and outreach -- connect the dots, want this to be outreach so we can build up more R + ArcGIS people who aren't as common as our core language folks.

Future of the project, questions

Page 54: Getting Data Science with R and ArcGIS

Community

Open source project, different ethos

Contributions are the currency

That said, major uptake in the commercial space: Microsoft R (bought Revolution Analytics); RStudio

Our involvement:

Recently hosted a Space-time Statistics Summit

More soon

Page 55: Getting Data Science with R and ArcGIS

Thanks

R team: Dmitry Pavlushko, Steve Kopp, Konstantin Krivoruchko; today's speakers

Geoprocessing Team

Contact Us

Page 57: Getting Data Science with R and ArcGIS

Rate This Session

iOS, Android: Feedback from within the app

Windows Phone, or no smartphone? Cuneiform tablets accepted.

Page 58: Getting Data Science with R and ArcGIS