Introduction to Spatial Statistics Workshop Exercise · Introduction to Spatial Statistics Workshop...


Introduction to Spatial Statistics Workshop Exercise

This exercise will allow you to explore basic uses for the statistical tools in ArcGIS. For further information, consult the MIT Libraries, which own a number of books on spatial statistics. Search the Barton catalog or stop by any of the library circulation desks for help locating resources.

ESRI’s help web site is also useful for ArcGIS-specific information: http://blogs.esri.com/Dev/blogs/geoprocessing/archive/2010/07/13/Spatial-Statistics-Resources.aspx

For general information on the GIS resources MIT offers, please visit our website: http://libraries.mit.edu/gis

Geodata Repository Account:

You will need an MIT Geodata Repository Account before beginning this exercise. If you do not already have an account, you can create one at: http://libraries.mit.edu/gis/data/repository.html

Introduction:

The City of Cambridge recently received money earmarked to build community facilities for the city’s residents. After examining recent census data, the City of Cambridge discovered that its population of residents over 65 years old had increased since the last census, so it decided to build a new senior center. Your job is to suggest some possible locations for this new facility.

Step 1: Adding data to your map from the MIT Geodata Repository

You will add data from Cambridge found in the MIT Geodata Repository.

1. Open ArcMap. (Start > All Programs > ArcGIS > ArcMap)
2. Indicate that you want to open a new blank map.
3. If the MIT Geodata tool is not displaying at the top of your ArcMap document, right click in the toolbar area and click MIT Geodata Search Toolbar.


4. Click Search Metadata and type “Cambridge” (the search is not case sensitive).
5. Select the following layers and click “Add Selected Layer to Map”:

Cambridge, MA (City Border, 2007)
Cambridge, MA (Elderly Facilities, 2005)
Cambridge, MA (Major Roads, 2008)

6. If you want to see more information about each of these layers, click “View Metadata”.
7. Exit the MIT Geodata Repository Search Results dialog box.
8. These data layers are still stored on the MIT server, so you will not be able to alter them. To make them editable, you will need to save them locally. In the Table of Contents (where all 3 layer names are listed), right click each layer name and select Data > Export Data.

9. By default, the layers will be saved with the name “export_output”, which is not very useful to you. Delete the name “export_output” and save the layers with the following names:

border
elderly_facilities
major_roads


10. Click “OK” and then “Yes” when asked if you want to add the layer to the map.
11. Delete the original repository layers from the table of contents by right clicking and selecting “delete.” Hint: Their layer names will start with “sde_”.

Step 2: Finding the Mean, Central Feature, and Standard Distance

Many elderly people live in nursing homes, assisted living facilities, and low-cost apartments. You decide that the residents of these facilities might benefit from a centrally located senior center. The locations of these facilities are displayed in the elderly facilities layer you saved in Step 1 (“elderly_facilities”), and you can use some of ArcMap’s statistical tools to find the center of these facilities.

To calculate the Mean Center of all elderly facilities:

1. Open ArcToolbox, if it is not already open, using the toolbox button.
2. Expand the Spatial Statistics Tools and then Measuring Geographic Distribution. Double click on Mean Center.
3. For your Input Feature Class, select “elderly_facilities”. For your Output Feature Class, save the file to the desktop and call it “elderly_facilities_mean”. Click OK.

The point representing the mean center will be added to your map.

4. Double click on the symbol used to represent the mean center in the table of contents and change the color, size, or shape to make it easier to see.


Next, calculate the Central Feature of the elderly facilities:

1. Still under Measuring Geographic Distribution, double click on Central Feature.
2. Use the elderly facilities layer (“elderly_facilities”) as your Input Feature Class again and save the Output Feature Class to your desktop using the name “elderly_facilities_central”.
3. Click OK.
4. The point of the Central Feature will be added as a layer to your map. Double click on the symbol for the central feature in the table of contents and change the size, color, or shape to make it easier to see.

Q: Are the mean center and central feature located near one another? Why are they different?

A: The mean center is the point located at the average x- and y-coordinates of all the features. The central feature is the feature (in this case an elderly facility) that has the shortest total distance to all other features. While the mean center can be a new point that is not already on the map, the central feature is always one of the existing features (elderly facilities) you are currently evaluating.
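The difference between the two measures can be illustrated with a minimal Python sketch. The facility names and coordinates below are hypothetical (they are not taken from the Cambridge data), and the sketch uses straight-line distance; it is only meant to show the idea, not to reproduce ArcMap's output.

```python
import math

# Hypothetical facility coordinates in feet (not the Cambridge data).
facilities = {
    "A": (0.0, 0.0),
    "B": (4.0, 0.0),
    "C": (0.0, 3.0),
    "D": (10.0, 9.0),
}

def mean_center(points):
    """Average the x- and y-coordinates; the result need not be an existing point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def central_feature(named_points):
    """Return the name of the feature with the smallest total distance to all others."""
    def total_distance(name):
        px, py = named_points[name]
        return sum(math.hypot(px - qx, py - qy) for qx, qy in named_points.values())
    return min(named_points, key=total_distance)

print(mean_center(list(facilities.values())))  # a location that is not one of the facilities
print(central_feature(facilities))             # always one of the existing facilities
```

In this toy data the outlying facility "D" pulls the mean center toward it, while the central feature remains one of the three facilities that sit close together.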

Q: What is the name of the elderly facility that is considered the central feature?

A: Vernon Hall


You know that the elderly will not want to travel far to reach the senior center, so you decide to calculate the standard distance to see the degree to which the elderly facilities are clustered or dispersed around the mean center.

1. Double click on Standard Distance under Measuring Geographic Distribution.
2. Select the elderly facilities layer (“elderly_facilities”) as your Input Feature Class and save the Output Feature Class to the desktop using the name “elderly_facilities_sd”.
3. Click OK.
4. The standard distance circle has been added to your map, with the mean center as the center of the circle.
5. In the table of contents, right click on the standard distance layer (“elderly_facilities_sd”) and open the attribute table.
6. Note the standard distance.

Q: What does the standard distance tell you?

A: The radius of the standard distance circle is equal to one standard deviation, and in a normal spatial distribution most features fall within one standard deviation of the mean center. Does it appear that most of the elderly facilities fall within the circle? If so, the elderly from most facilities would have to travel at most one standard distance (in this case, about 7000 ft) to reach a senior center that was constructed at the mean center.
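As a rough illustration of what the tool computes, the standard distance can be sketched as the square root of the summed variances of the x- and y-coordinates around the mean center. The points below are hypothetical, and the sketch ignores options such as weights or case fields.

```python
import math

def standard_distance(points):
    """Square root of the mean squared x-deviation plus the mean squared y-deviation."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    return math.sqrt(var_x + var_y)

# Hypothetical facility locations in feet (not the Cambridge data).
points = [(0.0, 0.0), (6.0, 0.0), (0.0, 8.0), (6.0, 8.0)]
print(standard_distance(points))
```

A larger value means the facilities are more spread out around the mean center; a smaller value means they are more compact.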


Step 3: Finding the average nearest neighbor

Determining if the elderly facilities are clustered or dispersed may also help you decide where a senior center should be located. To do this, run the Average Nearest Neighbor tool:

1. Open the Analyzing Patterns tool set from ArcToolbox.
2. Click on Average Nearest Neighbor.
3. Select the elderly facilities layer (“elderly_facilities”) as your Input Feature Class. Select Manhattan Distance as the Distance Method.

Q: Why was Manhattan Distance chosen?

A: Manhattan distance is the equivalent of traveling along city blocks (at right angles), rather than in straight lines. Because the elderly would have to travel on roads, rather than in straight lines to and from each point, Manhattan distance may provide more accurate information.

4. Check the box for Generate Report.
5. Click OK.
6. When the tool stops running, click the “Results” tab at the bottom of the toolbox.


7. The results of the test are displayed in the toolbox window; however, the HTML report provides a graphical representation of your results. Click the HTML report link.

Q: How dispersed or clustered are the elderly facilities?

A: If the elderly facilities are significantly clustered, it may make sense to construct a senior center in each cluster or to construct a center in the middle of all the clusters, rather than constructing one that is located between all the individual points (elderly facilities). Since the facilities are somewhat dispersed, there are not defined clusters around which to construct a senior center.
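The average nearest neighbor ratio used in this step can be sketched in a few lines of Python. The sketch below compares the observed mean nearest-neighbor distance (measured with Manhattan distance, as in the exercise) with the distance expected for a random pattern in the same area. The coordinates and the study-area size are hypothetical, and the constants 0.5 and 0.26136 are the ones commonly quoted for this statistic; treat the result as illustrative rather than as a match for ArcMap's report.

```python
import math

def manhattan(p, q):
    """Distance along right-angle 'city block' paths."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Straight-line distance, shown only for comparison."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def average_nearest_neighbor(points, area, distance=manhattan):
    """Return (ratio, z_score); ratio < 1 suggests clustering, > 1 dispersion."""
    n = len(points)
    nearest = [
        min(distance(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)        # mean distance for a random pattern
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical layouts inside a 20 x 20 study area (area = 400).
spread_out = [(5, 5), (15, 5), (5, 15), (15, 15)]
bunched = [(0, 0), (1, 0), (0, 1), (1, 1)]

print(manhattan((0, 0), (3000, 4000)), euclidean((0, 0), (3000, 4000)))
print(average_nearest_neighbor(spread_out, 400.0))
print(average_nearest_neighbor(bunched, 400.0))
```

The first print shows why the choice of distance method matters: the same trip is longer along city blocks than in a straight line.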


Step 4: Adding and examining demographic data

Now that you have looked at the geographic distribution and patterns of elderly facilities, you are ready to examine population statistics. You will examine a variable from the 2000 Census that indicates the number of households in each census tract that include at least one person over the age of 65. This variable has been normalized by dividing the number of households that include at least one person over the age of 65 by the total number of households in the census tract, which gives the percentage of households in each tract that include someone over 65.

1. Add this data layer to the map from the drive specified by the instructor.

2. If this layer is not already turned on, check the box next to “pcthouseholdsover65”. Move this layer up or down and turn other layers on or off to make the map easy to read.
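The normalization described at the start of this step is simple arithmetic, sketched below. The function name and the household counts are hypothetical; the layer provided by the instructor already contains the computed percentage.

```python
def pct_households_over_65(households_with_over_65, total_households):
    """Percentage of a tract's households that include at least one person over 65."""
    if total_households == 0:
        return None  # a tract with no households has no meaningful percentage
    return 100.0 * households_with_over_65 / total_households

# Hypothetical tract: 250 of 1,000 households include someone over 65.
print(pct_households_over_65(250, 1000))
```

Normalizing this way lets tracts with very different numbers of households be compared on the same scale.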

You will first examine the data generally to determine if households with people over 65 are clustered or dispersed in Cambridge, then you will run local statistical tests to see where clusters of elderly exist.

Step 5: Calculating General Clustering

You will first run the Getis-Ord General G statistic to see if areas with a high or low proportion of elderly households are clustered.

1. From the Analyzing Patterns menu, select High/Low Clustering.
2. For the input feature class, enter “Pct households over 65”. For the input field, select “pctover652”.
3. For the Distance Method, select Manhattan.
4. Check the box to Generate Report.
5. Click OK.


ArcMap warns you that it used a distance of 1069 ft as the search threshold. By default, ArcMap uses the minimum distance that ensures all features have at least one neighbor. You can change this distance to make the analysis more appropriate for your project. The larger the value, the more neighbors will be considered during the calculation.

6. Once the test has finished running, click the Results tab on the bottom of the toolbox window.
7. Double click the HTML report.

Q: What is the G score? What is the Z score? What does this tell you?

A: Because the G score is 0 and the Z score is not significant, the proportion of elderly households in each tract does not appear to be more clustered or more dispersed than a random pattern.
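The following minimal Python sketch shows how a search threshold and the General G statistic fit together. It assumes binary weights (1 if two tracts are within the threshold, 0 otherwise) and Manhattan distance between hypothetical tract centroids. It returns only the observed and expected G; the z-score and p-value that ArcMap reports are not computed here.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def default_threshold(centroids):
    """Smallest distance at which every feature has at least one neighbor."""
    return max(
        min(manhattan(p, q) for j, q in enumerate(centroids) if j != i)
        for i, p in enumerate(centroids)
    )

def binary_weights(centroids, threshold):
    """w[i][j] = 1 when features i and j are distinct and within the threshold."""
    return [
        [1 if i != j and manhattan(p, q) <= threshold else 0
         for j, q in enumerate(centroids)]
        for i, p in enumerate(centroids)
    ]

def general_g(values, weights):
    """Return (observed G, expected G) over all ordered pairs with i != j."""
    n = len(values)
    weighted = total = weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                weighted += weights[i][j] * values[i] * values[j]
                total += values[i] * values[j]
                weight_sum += weights[i][j]
    return weighted / total, weight_sum / (n * (n - 1))

# Hypothetical tract centroids (feet) and percentages, not the Cambridge data.
centroids = [(0, 0), (1000, 0), (2000, 0), (3200, 0)]
threshold = default_threshold(centroids)
weights = binary_weights(centroids, threshold)

print(threshold)
print(general_g([10.0, 10.0, 1.0, 1.0], weights))  # high values next to each other
print(general_g([10.0, 1.0, 10.0, 1.0], weights))  # high and low values alternate
```

An observed G above the expected G points toward clustering of high values, and an observed G below it points toward clustering of low values; whether the difference is meaningful depends on the z-score in the report.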


Next you will run a spatial autocorrelation test to see if the general pattern of features is clustered or dispersed (as opposed to clustering specifically of high or low values).

8. Select Spatial Autocorrelation from the Analyzing Patterns menu and input the same information as you did for the General G test.

9. Check the box for “Generate Report”.
10. Once the test has finished running, click on the “Results” tab at the bottom of the Toolbox window.
11. Double click on the HTML report.

Q: What are the results? What does this tell you?

A: While the high and low values do not appear to be clustered, as indicated by the previous test, when all feature locations and values are taken into consideration for the Spatial Autocorrelation test, the proportion of elderly households is clustered at the .05 significance level. This means that there is clustering of similar values, but the test does not tell us which values are clustered.
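The spatial autocorrelation statistic used here (Moran's I) can be sketched as follows, assuming a binary neighbor matrix and hypothetical values for four tracts in a row. The sketch returns the index and its expected value under a random pattern; the z-score and p-value shown in the ArcMap report are not computed.

```python
def morans_i(values, weights):
    """Return (Moran's I, expected I) for a square weights matrix with a zero diagonal."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    s0 = sum(weights[i][j] for i, j in pairs)  # sum of all weights
    index = (n / s0) * cross / sum(d * d for d in dev)
    return index, -1.0 / (n - 1)

# Hypothetical neighbor matrix: four tracts in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([10.0, 10.0, 1.0, 1.0], chain))  # similar values side by side
print(morans_i([10.0, 1.0, 10.0, 1.0], chain))  # dissimilar values side by side
```

A positive index means neighboring tracts tend to have similar values, whether high or low, which is why this test can flag clustering without saying which values are clustered.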


Step 6: Calculating Local Statistics

While the General G and Spatial Autocorrelation tests told you something about the overall pattern of households with residents over 65 in Cambridge, you want to know more about which specific areas have an unusually high or low percentage of elderly households. To find this, you will use local statistical tests.

Local Moran’s I:

1. From the Mapping Clusters tool set, choose Cluster and Outlier Analysis (Anselin Local Moran’s I).

2. Use the same Input Feature Class and Input Field as you did for the two previous tests.
3. For the Output Feature Class, click on the folder icon and save the file to your desktop, using a name that makes sense to you.
4. Click OK.


A map showing the types of significant relationships is added to your map.

Q: Are there any clusters or outliers present?

A: You will notice one area of Cambridge with a significant HH relationship. HH indicates that the relationship is High-High: this tract has a high value and is surrounded by tracts that also have high values. This indicates a statistically significant cluster of tracts with a high percentage of elderly households in this area.

5. Right click on the layer name in the table of contents and select Open Attribute Table. Scroll through the table and notice the fields for Z-score and P-value. If you were interested, you could map these values as well.
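The HH, LL, HL, and LH labels can be illustrated with a simplified Python sketch. It compares each tract's value, and the average value of its neighbors, with the overall mean. The values and neighbor lists are hypothetical, and the sketch skips the significance test, so it labels every tract, whereas ArcMap only labels the statistically significant ones.

```python
def moran_quadrants(values, neighbors):
    """Label each feature by its own value and its neighbors' average, relative to the mean."""
    mean = sum(values) / len(values)
    labels = []
    for i, value in enumerate(values):
        own = value - mean
        lag = sum(values[j] - mean for j in neighbors[i]) / len(neighbors[i])
        labels.append(("H" if own > 0 else "L") + ("H" if lag > 0 else "L"))
    return labels

# Hypothetical percentages for five tracts in a row; neighbors are adjacent tracts.
values = [30.0, 28.0, 5.0, 6.0, 27.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]

print(moran_quadrants(values, neighbors))
```

HH and LL mark a tract that resembles its surroundings (a cluster), while HL and LH mark a tract that differs from its surroundings (an outlier).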

Hot Spot Analysis (Getis-Ord Gi*):

To learn more about where elderly households are clustered, you will calculate a local version of the G score that you calculated earlier.

1. Close the table if you haven’t already.
2. From the Mapping Clusters toolset, select Hot Spot Analysis (Getis-Ord Gi*).
3. Use the same inputs as you did for the Local Moran’s I test. Name the output feature class something unique so that you can easily identify this variable. Click OK.


A graphic display of the Z scores will be added as a layer to your map.

Q: What do you see?

A: You will notice a statistically significant “hot spot” (dark red) and a couple of statistically significant “cold spots.” There are also some sections with moderately high or low Z scores that are not statistically significant.

To be a statistically significant hot spot, a tract needs to have a high value and be surrounded by other tracts with high values. The inverse is true for the cold spots. This is similar to the HH or LL relationships that the Local Moran’s I test detects. However, Local Moran’s I can also detect HL or LH relationships (outliers), whereas Hot Spot Analysis just looks for clusters of similar high or low values. You can now see which tracts contain high or low percentages of households with residents over 65, when compared to neighboring tracts.
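A minimal sketch of the Gi* z-score is shown below, assuming binary weights and a hypothetical row of eight tracts. Each tract's local sum (its own value plus its neighbors' values) is compared with what the overall mean would predict. The neighbor lists and values are made up for illustration, and the sketch assumes no tract has every other tract as a neighbor.

```python
import math

def gi_star(values, neighbors):
    """Return a Gi* z-score per feature; neighbors[i] lists indices near feature i."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        members = set(neighbors[i]) | {i}  # Gi* includes the feature itself
        w_sum = len(members)               # with binary weights, sum of w equals sum of w squared
        local_sum = sum(values[j] for j in members)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical percentages: three high tracts followed by five low tracts in a row.
values = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]

for index, score in enumerate(gi_star(values, neighbors)):
    print(index, round(score, 2))
```

Large positive scores correspond to hot spots and large negative scores to cold spots; scores closer to zero, like those of the low tracts in this toy data, would not be flagged as significant.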


Conclusion

1. Turn on the elderly facilities central feature, mean center, and standard distance layers that you calculated earlier. Move them to the top of the table of contents so that they display above the layers of households with people over 65.

Q: Based on the mean and central feature, where would the residents of elderly facilities benefit from having a senior center?

A: Because the central feature and mean center are near one another, a senior center could be developed at either of these locations and still be central to most residents of elderly facilities. This information is a good start to determining where to construct the senior center. In order to proceed with the senior center plans, you would have to conduct more research, including determining where free space is available to house a senior center, whether one of the existing facilities has space available, how many residents are in each elderly facility, how many residents of the facilities would use the senior center, etc.

2. Now, turn on and off the layers that represent the results from the Local Moran’s I test and the Getis-Ord Gi* test so you can see how they compare to each other and to the mean center and central feature of the elderly facilities.

Q: Based on the results of these local statistical tests, where might you construct a senior center? Does this differ from your previous answer?

A: The mean center and central feature of the elderly facilities seem to be located in an area with a lower concentration of households with residents 65 and older, while the census tract to the left of this area has a high concentration of households with older residents.

Q: What else would you want to research before deciding where to construct the facility?

A: You may want to further research the specific number of people over 65 in each tract rather than just relying on the number of households, the number of residents interested in using the senior center and where they are located, whether the senior center would most benefit those in elderly facilities or those living in private homes, what public transportation is available to various areas of Cambridge, senior facilities that are available in neighboring areas, etc.

While there is still much research to be conducted, these statistics helped you learn more about the elderly population in Cambridge and provided you with information to build on as you continue your research.