
Going the Distance: Google Maps Capabilities in a Friendly SAS Environment

Anton Bekkerman, Ph.D., Montana State University, Bozeman, MT

ABSTRACT

While the GEODIST function allows users to calculate “as the crow flies,” straightline distances, SAS does not directly provide capabilities to calculate road distances between locations. For many users who might only have addresses (rather than geographic coordinates) and who need to determine actual road distances for optimizing routes, minimizing transportation costs, or simply translating postal addresses to geographic coordinates, existing SAS functionality may be insufficient. I demonstrate how Google Maps can be integrated with SAS to perform these functions and output the desired results within the SAS environment. That is, after a SAS user specifies a location or multiple locations (as postal addresses, city names, state names, etc.), the information is passed to Google Maps from within SAS, the underlying Google Maps HTML code with the coordinates and/or directions is retrieved and parsed, and the desired results are recorded to a SAS dataset. The entire process is completed using only a few lines of code within a single DATA step. Moreover, I demonstrate how the process can be easily automated for numerous location entries within a MACRO environment. A comparison of the native SAS straightline and integrated road distance methods indicates that, on average, the straightline method underestimates the true road distance by approximately 25%, and this error becomes larger as the distance between spatially separated locations increases.

INTRODUCTION

When you are asked to get from location A to location B, what is your first reaction? Perhaps it is to pull out your smart phone and use one of the myriad driving directions apps. Or maybe it is to access a web-based option, such as Google Maps, MapQuest, or Bing, among others. Or, perhaps you may even be tempted to pull out the circa 1997 road atlas, which has had its corners chewed off by your dog (or kid), proudly displays travel mug coffee stains, and has been accumulating dust in your car’s trunk, waiting for the “just-in-case” scenario when there is neither a wifi nor cellular phone signal.[1]

Regardless of your preferred method, rarely do you consider calculating distances using the “as the crow flies” method: a straightline connection between two spatially separated points, which accounts for the Earth’s curvature but ignores the constraints associated with traveling on roads. Such constraints are manifest in routes being indirect connections between starting and ending locations due to factors such as geological characteristics (e.g., unbridged bodies of water), construction or road repair projects, or simply no available routes that mimic the “as the crow flies” path. Moreover, it is reasonable to assume that most travel occurs using ground transportation, rather than other methods that may be more characteristic of a straightline distance.[2]

While the SAS software has continued to update and expand its spatial analysis capabilities, tools for easily determining and automating road distances between locations are not directly available. Moreover, the constantly changing road conditions and accessibility to driving routes require a dynamic method for recognizing these changes and providing the most current spatial analysis results. This paper presents a relatively straightforward method for determining road distances by integrating the Google Maps directions tool, which has developed a mechanism for optimizing transportation routes within much of North America and the world. A preliminary example demonstrates the underlying process for calling the Google Maps directions tool directly from SAS and extracting relevant distance information into a SAS dataset. The technique is then generalized to determine distances for any number of starting and ending location combinations.

The presented methodology is then compared to the native distance calculation tool in SAS, the GEODIST function, which calculates the straightline distance between two spatially separated points. The comparison analysis shows that the GEODIST function underestimates road distances by approximately 25%. Such errors can have non-trivial impacts on studies that rely on the precise understanding of distances and travel routes for estimating costs and revenues, optimizing logistics, and improving marketing efforts, among other activities.

[1] Yes. From a recent personal experience, I can attest that such places still exist.

[2] One could argue that travel by rail or air follows straightline routes. However, railroads are often subject to similar constraints as roads, and air travel is subject to layovers in locations that prevent direct routes.


NATIVE SAS DISTANCE CALCULATION TOOLS

The GEODIST function is used to calculate distances between two geographic coordinates using the Haversine formula (SAS Institute, Inc. 2011). The formula determines the shortest, straightline distance between two coordinates, accounting for the approximate curvature of the Earth. The function requires four arguments: the latitude and longitude of the starting location and the latitude and longitude of the destination. While manually obtaining these coordinates from postal addresses or location names is not overly costly when dealing with only a few locations, increasing the number of observations can become expensive or even impractical.[3] Using a known set of coordinates, the GEODIST function can be called within the DATA step as follows:

distance = geodist(latitudeStart,longitudeStart,latitudeEnd,longitudeEnd,'M');

where latitudeStart and longitudeStart represent columns of the latitude and longitude coordinates of the starting locations, latitudeEnd and longitudeEnd represent columns of the latitude and longitude coordinates of the destinations, and distance is the column containing the resulting straightline distances. The option 'M' requests that the distance be output in miles rather than kilometers, which are the default units.
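As a minimal sketch of this call, the DATA step below computes the straightline distance for a single pair of points. The coordinates are the ones listed in the notes to Figure 1; the dataset name crow is only an illustrative choice.

```sas
/* Straightline distance between one pair of coordinates.        */
/* Coordinates are taken from the notes to Figure 1; the dataset */
/* name "crow" is a hypothetical name used for illustration.     */
data crow;
  latitudeStart  = 45.682677;    /* Bozeman, MT   */
  longitudeStart = -111.053288;
  latitudeEnd    = 36.116799;    /* Las Vegas, NV */
  longitudeEnd   = -115.174534;
  /* 'M' requests miles; omitting it returns kilometers */
  distance = geodist(latitudeStart,longitudeStart,latitudeEnd,longitudeEnd,'M');
run;

proc print data=crow noobs; run;
```

The same assignment statement works unchanged when the four coordinate variables are columns in a larger dataset, because the DATA step evaluates it once per observation.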

The straightline route is rarely the same as the driving route between the two locations. Moreover, it is expected that the difference between the two alternatives will be more substantial as the distance between two locations increases. Figure 1 provides a visual comparison of the straightline distance and one that is based on drivable routes between Bozeman, MT and Las Vegas, NV. The figure makes evident the constraints that bind road travel but not necessarily straightline approximations.

INTEGRATING GOOGLE MAPS

As shown in Figure 1, the Google Maps directions tool can be used to obtain a more precise estimate of driving distances. This is the underlying mechanism for generating driving distance data within SAS. The following SAS code demonstrates a basic framework for performing the SAS-Google Maps integration.

%let addr1 = Bozeman,MT;
%let addr2 = Las+Vegas,NV;

filename google url "http://maps.google.com/maps?daddr=&addr2.%nrstr(&saddr)=&addr1";

data dist(drop=html);
  infile google recfm=f lrecl=10000;
  input @ '<div class="altroute-rcol altroute-info"> <span>' @;
  input html $50.;
  if _n_ = 1;
  locStart = "&addr1";
  locEnd = "&addr2";
  roaddistance = input(scan(html,1," "),comma12.);
run;

proc print data=dist noobs; run;

The MACRO variables addr1 and addr2 specify the starting and ending locations, respectively, and are the only user-input variables. The fileref google points to a URL that requests driving directions between the two specified locations from Google Maps. The HTML code underlying the route displayed in Google Maps is then read into SAS and parsed within the DATA step. The third line of the DATA step specifies that SAS begins reading the HTML code immediately after the string <div class="altroute-rcol altroute-info"> <span>. That is, the DATA step skips all text that precedes the location where the road distance value is reported. Lastly, the SCAN function is used to extract the road distance value into the SAS dataset. Table 1 shows the contents of the resulting dist dataset.
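The parsing step can be tried without a network connection. The sketch below applies the same INPUT @'string' search and SCAN extraction to a single in-stream line; that line is a made-up HTML fragment used only for illustration (the actual page returned by Google Maps may be laid out differently), and the dataset name parse_demo is hypothetical.

```sas
/* Offline illustration of the parsing technique.                  */
/* The datalines record is a hypothetical HTML fragment, not real  */
/* Google Maps output.                                             */
data parse_demo(drop=html);
  infile datalines truncover;
  /* Move the pointer to just after the marker string, then read   */
  /* up to 50 characters that follow it                            */
  input @'<div class="altroute-rcol altroute-info"> <span>' html $50.;
  /* The first blank-delimited token is the distance; COMMA12.     */
  /* also handles values such as 1,234                             */
  roaddistance = input(scan(html,1," "),comma12.);
  datalines;
<div class="altroute-rcol altroute-info"> <span>832 mi</span>
;
run;

proc print data=parse_demo noobs; run;
```

Here html holds the text 832 mi</span>, so SCAN returns 832 and roaddistance is stored as the number 832.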

[3] The Appendix presents SAS code that helps automate the process for obtaining geographic coordinates for postal addresses and location names. Users can also use the GEOCODE procedure, but a detailed discussion of this procedure is out of the scope of this paper.


Figure 1: Comparison of Straightline and Driving Routes Between Bozeman, MT and Las Vegas, NV

Source: The map was generated using Google Maps.
Notes: The starting location is Bozeman, MT (45.682677,-111.053288) and the ending location is Las Vegas, NV (36.116799,-115.174534).


Table 1: Contents of the dist Dataset: Road Distance Information

locStart      locEnd          roaddistance
Bozeman,MT    Las+Vegas,NV    832

Of course, the advantages of using this approach are minimal when determining distances for one or a few location pairs, because users can go directly to Google Maps and obtain the same outputs. Substantial improvements in efficiency (and cost-savings) begin to be realized when road distances need to be recorded for a large number of location pairs. For example, consider a courier service that has three warehouses from where packages could be delivered to customers. The courier service may be interested in understanding how to efficiently allocate delivery packages to the warehouses such that the final delivery distances are minimized. This requires that the courier service determines the driving distances from each of the three warehouses to the final destinations. The following data represent randomly generated courier service warehouse sites and customer locations in the Bozeman, MT area.

data courier;
  input warehouse_address & $19. warehouse_city $ & warehouse_state $
        customer_address & $19. customer_city $ & customer_state $ ;
  datalines;
8250 Huffine Lane  Bozeman  MT  2884 Caterpillar Dr.  Bozeman  MT
8250 Huffine Lane  Bozeman  MT  408 S 12th Ave.  Bozeman  MT
8250 Huffine Lane  Bozeman  MT  30 Main Street  Belgrade  MT
6553 N 19th Ave  Bozeman  MT  2884 Caterpillar Dr.  Bozeman  MT
6553 N 19th Ave  Bozeman  MT  408 S 12th Ave.  Bozeman  MT
6553 N 19th Ave  Bozeman  MT  30 Main Street  Belgrade  MT
1340 Kagy Blvd  Bozeman  MT  2884 Caterpillar Dr.  Bozeman  MT
1340 Kagy Blvd  Bozeman  MT  408 S 12th Ave.  Bozeman  MT
1340 Kagy Blvd  Bozeman  MT  30 Main Street  Belgrade  MT
...
;
run;

The following MACRO uses the location pair information in the courier dataset and creates an output dataset containing the driving distance for each pair.

/**********************************************************************/
/* Purpose: Determine road distances for location pairs               */
/* Author:  Anton Bekkerman                                           */
/*                                                                    */
/* User inputs:                                                       */
/*   input     = name of SAS input dataset                            */
/*               (e.g., libname.inputName)                            */
/*   output    = name of SAS output dataset                           */
/*               (if empty, then libname.inputName_dist)              */
/*   startAddr = variable name of starting location address           */
/*               (variable content example: 555 StreetName Dr.)       */
/*   startCity = variable name of starting location city              */
/*               (variable content example: Bozeman)                  */
/*   startSt   = variable name of starting location state             */
/*               (variable content example: MT)                       */
/*   endAddr   = variable name of destination address                 */
/*   endCity   = variable name of destination city                    */
/*   endSt     = variable name of destination state                   */
/**********************************************************************/


%macro road(input,output,startAddr,startCity,startSt,endAddr,endCity,endSt);

  /* Check if input data set exists; otherwise, print a message and abort */
  %if %sysfunc(exist(&input)) ne 1 %then %do;
    data _null_;
      file print;
      put #3 @10 "Data set &input. does not exist";
    run;
    %abort;
  %end;

  /* Check if user specified output dataset name; otherwise, create default. */
  /* %length is used because quotation marks are literal text in macro code, */
  /* so comparing an empty parameter to "" would never be equal.             */
  %if %length(&output) > 0 %then %let outData = &output;
  %else %let outData = &input._dist;

  /* Replace all inter-word spaces with plus signs */
  data tmp;
    set &input;
    addr1 = tranwrd(left(trim(&startAddr))," ","+")||","||
            tranwrd(left(trim(&startCity))," ","+")||","||
            left(trim(&startSt));
    addr2 = tranwrd(left(trim(&endAddr))," ","+")||","||
            tranwrd(left(trim(&endCity))," ","+")||","||
            left(trim(&endSt));
    n = _n_;
  run;

  /* Store the number of location pairs in a macro variable */
  data _null_;
    if 0 then set tmp nobs=n;
    call symputx("nObs",n);
    stop;
  run;

  %do i=1 %to &nObs;

    /* Place starting and ending locations into macro variables */
    data _null_;
      set tmp(where=(n=&i));
      call symput("addr1",trim(left(addr1)));
      call symput("addr2",trim(left(addr2)));
    run;

    /* Determine road distance */
    options noquotelenmax;
    filename google url "http://maps.google.com/maps?daddr=&addr2.%nrstr(&saddr)=&addr1";
    data dist(drop=html);
      infile google recfm=f lrecl=10000;
      input @ '<div class="altroute-rcol altroute-info"> <span>' @;
      input html $50.;
      if _n_ = 1;
      roaddistance = input(scan(html,1," "),comma12.);
    run;
    data dist; merge tmp(where=(n=&i)) dist; run;

    /* Append to output dataset */
    %if &i=1 %then %do;
      data &outData; set dist(drop=n addr:); run;
    %end;
    %else %do;
      proc append base=&outData data=dist(drop=n addr:) force; run;
    %end;
  %end;

  /* Delete the temporary dataset */
  proc datasets library=work noprint;
    delete tmp;
  quit;
%mend;

The MACRO road is used to evaluate road distances for the destinations contained in the courier dataset. Table 2 presents an abbreviated representation of the resulting output data, courier_dist. These data can now be used to evaluate the optimal courier warehouse location (conditional on distance to final destination) to minimize the total costs for delivering packages to their final destinations.
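For reference, a call of the following form would produce such a dataset. This is a sketch that assumes the courier dataset shown earlier is in the WORK library and that the macro has already been compiled; the output name is passed explicitly here rather than relying on the default.

```sas
/* Invoke the road macro on the courier data.                    */
/* Arguments follow the parameter order in the macro definition: */
/* input, output, then the three starting-location variables and */
/* the three destination variables.                              */
%road(courier, courier_dist,
      warehouse_address, warehouse_city, warehouse_state,
      customer_address, customer_city, customer_state);

proc print data=courier_dist noobs; run;
```

Because the macro issues one URL request per observation, run time grows roughly in proportion to the number of location pairs.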

Table 2: Contents of the courier_dist Dataset: Road Distance Information for Multiple Location Pairs

Warehouse                               Destination
Address            City     State      Address              City      State   Road Distance (miles)
8250 Huffine Lane  Bozeman  MT         2884 Caterpillar Dr  Bozeman   MT      6.3
8250 Huffine Lane  Bozeman  MT         408 S 12th Ave.      Bozeman   MT      6.4
8250 Huffine Lane  Bozeman  MT         30 Main Street       Belgrade  MT      8.1
6553 N 19th Ave    Bozeman  MT         2884 Caterpillar Dr  Bozeman   MT      6.2
6553 N 19th Ave    Bozeman  MT         408 S 12th Ave.      Bozeman   MT      4.8
6553 N 19th Ave    Bozeman  MT         30 Main Street       Belgrade  MT      14.1
1340 Kagy Blvd     Bozeman  MT         2884 Caterpillar Dr  Bozeman   MT      3.2
1340 Kagy Blvd     Bozeman  MT         408 S 12th Ave.      Bozeman   MT      1.3
1340 Kagy Blvd     Bozeman  MT         30 Main Street       Belgrade  MT      11.1
...

AN EMPIRICAL COMPARISON OF METHODS

As noted above and shown in Figure 1, there is likely a discrepancy between the straightline and driving directions methods for calculating distances. However, if the discrepancy is only trivial, then using the integrated Google Maps approach may not be cost-effective.

To evaluate whether the dissimilarities are statistically significant and to quantify the potential error, I use the road MACRO and the GEODIST function to determine distances using a large number of location pairs. As an example, the comparison is made using the travel distance from the locations of four- and two-year universities in California, Colorado, Montana, Nevada (excluding those in Las Vegas), Oregon, Washington, and Wyoming to Las Vegas, NV, the 2013 location of the Western Users of SAS Software annual conference. The resulting dataset yielded a total of 272 location pairs.

Figure 2 shows a comparison of these distances across all location pairs, across pairs that are separated by less than or equal to 500 miles, and across locations that are separated by a distance greater than 500 miles. In each case, the straightline approximation underestimates the road distance. More importantly, this difference is statistically significant across all scenarios. This suggests that using straightline distances as approximations to road distances could lead to inaccurate inferences. In the sample used for this example, the average error is approximately 25%; that is, the road distance is underestimated by approximately 25% when using the straightline approach.

The results also indicate that the error is larger when two spatially separated locations are farther apart. This is generally observable in Figure 2, but is much more clearly observed in Figure 3. The latter figure shows that as the distance between the two locations in a pair increases, so does the degree of underestimation due to the use of a straightline distance approximation.
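To make the error measure concrete, the sketch below computes a percent underestimation for the single Bozeman to Las Vegas pair. It assumes the measure is the gap between road and straightline distance expressed as a share of the road distance; the paper does not state the formula explicitly, so this definition and the names compare, straightline, and pct_under are assumptions. The road distance is the value reported in Table 1 and the coordinates are those in the notes to Figure 1.

```sas
/* Percent underestimation for one location pair.                 */
/* ASSUMPTION: error = 100 * (road - straightline) / road.        */
/* Dataset and variable names are hypothetical.                   */
data compare;
  roaddistance = 832;                          /* Table 1 value   */
  straightline = geodist(45.682677,-111.053288,
                         36.116799,-115.174534,'M');
  pct_under = 100*(roaddistance - straightline)/roaddistance;
run;

proc print data=compare noobs; run;
```

Applied to a dataset with one observation per location pair, the same assignment statements yield a per-pair error that could then be averaged or plotted against distance.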


Figure 2: Comparison of Haversine (Straightline) and Road Distances Across 272 Location Pairs

Source: Figure generated by the author.
Notes: Bar heights indicate average distances and bands represent 95% confidence limits.


Figure 3: Percent Underestimation of Road Distance when Using the Straightline Distance Approximation

Source: Figure generated by the author.

CONCLUSION

The capabilities for spatial analysis continue to improve rapidly in SAS, but there remain aspects that require additional external resources. One such deficiency is the ability to calculate road distances between spatially separated locations. While the GEODIST function offers an approximation (which is appropriate to use in some cases), a more precise mechanism is not currently available. This imprecision is relatively straightforward to overcome by using the geocoding and driving direction functions of Google Maps. The Google Maps directions feature becomes even more powerful when it is coupled with SAS, enabling users to easily automate the road distance data collection process. This allows users to determine road distances across large datasets and immediately employ these data for statistical analyses. Being able to obtain a more detailed and precise understanding of distances can substantially improve individuals’ and companies’ abilities to optimize their decisions and strategies, and can have significant economic impacts.

APPENDIX: DETERMINING LATITUDE AND LONGITUDE COORDINATES

The following code uses the SAS-Google Maps integration to geocode an address or location (determine the latitude and longitude coordinates).

%let addr1 = Bozeman,MT;
filename google url "http://maps.google.com/maps?q=&addr1";
data location(keep=lat long);
  infile google recfm=f lrecl=10000;
  input @ 'viewport:{center:{' @;
  input html $50.;
  if _n_ = 1;
  /* Locate the start and end of the latitude and longitude values */
  ystart = index(html,"lat:");
  yend = index(html,",lng");
  xstart = index(html,"lng:");
  xend = index(html,"},span");
  /* The third argument of SUBSTR is a length, not an end position */
  lat = input(substr(html,ystart+4,yend-ystart-4),best32.);
  long = input(substr(html,xstart+4,xend-xstart-4),best32.);
run;

REFERENCES

SAS Institute, Inc. 2011. SAS/STAT 9.3 User's Guide. Cary, NC: SAS Institute Inc.

CONTACT INFORMATION

All SAS code described in this paper can be accessed by visiting the “Tools/Code” tab on the website listed below. Please address comments and questions to:

Anton Bekkerman, Ph.D.
205 Linfield Hall
Montana State University
P.O. Box 172920
Bozeman, MT 59717-2920
Phone: (406) [email protected]
http://www.montana.edu/bekkerman

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
