
Update

GI* and related step-by-step methodologies (For ArcMap and Excel 2007/10)

July 2013

Below are the general steps for using the GI* methodologies in analysis. Click on the links for step-by- step visual guides.

1) Creating a GI* map for a single dataset

- Creating a grid to sit over the map extent
- Adding data points and joining to the grid (to give number of points per grid square)
- Carrying out GI* analysis (inbuilt function in ArcMap)
- Thematically shading the GI* map at 90-99.9% significance levels

2) Creating a combined map for up to 4 GI* datasets

- Adding further datasets and carrying out GI* process as above for each (also enabling consideration of potential areas of underreporting through multi-layering).
- Joining all the datasets to original blank grid layer to create a single combined layer.
- Exporting the combined table into the provided Excel template where weightings can be applied to grid cells and their average score calculated.
- Importing the dataset of single grid cell values back into ArcMap and then thematically mapping.

3) Proximity Analysis of points of interest to a hotspot layer

- Adding Points of Interest data to the above Excel template.
- Joining Points table to the above imported grid cell data (single or combined dataset) to give all combinations of grid cell/point of interest
- For each grid cell / point combination, assigning a score of 1-4 for both the grid significance level and the distance of the point of interest from it.
- Multiplying the 2 scores and averaging the resultant scores for each Point of Interest across every grid cell, to gain a single score per Point
- Importing into ArcMap the top ranked Points of Interest and mapping them along with the previously produced hotspot layer.

GLA Intelligence 1


Creating a GI* map from a single dataset

1) Create grid

• Open the area shapefile that your data refers to (e.g. a borough)

• Create a grid that sits over your area boundary (a fishnet), completing the fishnet dialog as follows:

- Save fishnet to a location of your choice as ‘Grid’
- Select your boundary area from the dropdown list (co-ordinates will be automatically populated)
- Number of Rows and Columns should be zero
- Ensure ‘create label points’ is ticked and ‘polygon’ selected
- For cell height and width, you need to calculate the shortest side of the grid extent you have created and divide it by 100, using the co-ordinates populated in the dialog.

NB. Clearly grid cell sizes will be different depending on the boundary size you are using – here because of London’s size it has large grid squares, however a Borough may be 100-150m enabling more focussed targeting.
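The cell size rule above (shortest side of the extent divided by 100) can be sketched in Python as follows. This is only an illustration: the function name and the example extent are assumptions, not values taken from the guide's screenshot.

```python
def fishnet_cell_size(x_min, y_min, x_max, y_max, divisions=100):
    """Return the cell height/width: shortest side of the extent / divisions."""
    width = x_max - x_min
    height = y_max - y_min
    return min(width, height) / divisions


# Hypothetical extent in metres (illustrative only, not from the guide).
extent = (503000, 155000, 562000, 201000)
print(fishnet_cell_size(*extent))
```

The same value is entered for both cell height and cell width so that the grid squares are genuinely square.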


2) Adding Eastings and Northings to each Grid square

(only necessary if looking to combine datasets at a later stage)

• Open the Attribute Table for the Grid layer (not the label layer) and using the drop down arrow at the top left (‘Table Options’), select Add Field. Type ‘Eastings’ into the Name box and select ‘Long Integer’ as Type. Click OK.

• Right-click on the header of the newly created column and select Calculate Geometry, then select X Co-ordinate of Centroid from the Property drop down list. Click OK.


• If you repeat this process for the Northings / Y co-ordinates this will give you a list of all grid squares in your grid with their centroid eastings and northings listed. You can now delete the label layer as this was where the co-ordinates were retrieved from.

3) Clipping the grid to fit your shapefile

• Select ‘Select by Location’ from the Selection option on the main toolbar and select everything from your Grid layer that has its grid square centroid within your source area boundary (select your own boundary layer in the dialog). When selected, save this as a new layer (e.g. Grid_boundary) and remove the original grid layer.
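To illustrate what the centroid-within-boundary selection does, here is a minimal Python sketch using a standard ray-casting point-in-polygon test. It is not how ArcMap implements Select by Location; the dictionary keys and function names are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_grid(cells, boundary):
    """Keep only the cells whose centroid lies inside the boundary polygon."""
    return [
        cell for cell in cells
        if point_in_polygon(cell["easting"], cell["northing"], boundary)
    ]


# Hypothetical triangular boundary and two grid cell centroids.
boundary = [(0, 0), (1000, 0), (0, 1000)]
cells = [
    {"fid": 0, "easting": 250, "northing": 250},
    {"fid": 1, "easting": 750, "northing": 750},
]
print(clip_grid(cells, boundary))
```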

4) Add points and join to grid

• In the normal way, add the data points for your area (ensuring the data is geocoded).

• Spatially join the points to the grid layer using the Sum option to aggregate the number of points per grid square, and save wherever relevant as e.g. grid_boundary_data_join.
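The effect of the spatial join (a count of points per grid square) can be sketched outside ArcMap as below. This assumes a regular square grid whose origin is the bottom-left corner of the extent; the function name and sample points are hypothetical.

```python
import math
from collections import Counter


def count_points_per_cell(points, x_min, y_min, cell_size):
    """Count how many (easting, northing) points fall in each (row, col) cell."""
    counts = Counter()
    for x, y in points:
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        counts[(row, col)] += 1
    return counts


# Three hypothetical points on a 450m grid starting at (0, 0).
points = [(10, 10), (20, 30), (460, 10)]
counts = count_points_per_cell(points, x_min=0, y_min=0, cell_size=450)
print(dict(counts))
```

Cells that contain no points are simply absent from the result here, whereas the joined layer in ArcMap keeps every grid square.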


5) Carry out GI* analysis

• Select Hot Spot Analysis from the Toolbox as below:


• Select the ‘grid_boundary_data_join’ layer and the ‘Count’ column and enter somewhere to save the resultant layer (e.g. ‘grid_boundary_data_join_GI’).

This is where you calculate your distance D (the radius from the centroid of a grid square that reaches all neighbouring centroids, including the diagonal ones) using Pythagoras’ theorem:

D = square root of (2 x (cell height)^2)
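As a quick check of the distance formula, the following Python sketch computes D for a given cell height. The 450m cell height is taken from the grid size mentioned later in the guide; the function name is an assumption.

```python
import math


def neighbour_distance(cell_height):
    """Distance from a cell centroid to its diagonal neighbours' centroids."""
    return math.sqrt(2 * cell_height ** 2)


print(round(neighbour_distance(450), 1))  # 636.4
```

In practice the value entered into the tool may need rounding up slightly so that the diagonal neighbours are not excluded by rounding.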

6) Thematically shade at significance levels

• The resultant map has calculated a z-score value for each grid square (see attribute table), which is then comparable with that of any other grid square in your area. This allows you to show how statistically significant a 100m (approx.) area is compared to any other in the map.


• The map obviously requires some amending to the thematic shadings. To calculate where the shadings should be split, the issue of ‘multiple testing’ needs to be first addressed. This involves the overlapping search radii of neighbouring grid squares, and can be corrected by performing what is called a Bonferroni correction (not yet part of ArcMap).

• To calculate the 90% significance threshold (where those grid cells shaded would indicate a 1 in 10 chance that the observation in that grid cell would have just occurred naturally) first calculate 0.1 / number of cells (in this case 0.1 / 7859 = 0.00001272; look at the attribute table of the clipped boundary grid shape for the number of grid cells). Then using an online ‘percentile to z-score calculator’ (e.g. www.measuringusability.com/zcalcp.php - two-sided) enter the above value and note the answer as a positive value e.g. 4.3649.

• To calculate the 95% significance threshold (where those grid cells shaded would indicate a 1 in 20 chance that the observation in that grid cell would have just occurred naturally) first calculate 0.05 / number of cells, then carry out the same steps as above. In this example, the figure would be 4.5141.

• To calculate the 99% significance threshold (where those grid cells shaded would indicate a 1 in 100 chance that the observation in that grid cell would have just occurred naturally) first calculate 0.01 / number of cells, then carry out the same steps as above. In this example, the figure would be 4.8444.

• To calculate the 99.9% significance threshold (where those grid cells shaded would indicate a 1 in 1000 chance that the observation in that grid cell would have just occurred naturally) first calculate 0.001 / number of cells, then carry out the same steps as above. In this example, the figure would be 5.2788.
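The four thresholds above can also be computed without the online calculator. The sketch below uses Python's standard `statistics.NormalDist` to convert the Bonferroni-corrected two-sided probability into a z-score; the function name is an assumption. The results should agree with the figures quoted above to roughly two decimal places, with small differences in later decimals plausibly due to rounding of the value typed into the calculator.

```python
from statistics import NormalDist


def bonferroni_z(alpha, n_cells):
    """Two-sided z-score threshold for alpha after Bonferroni correction."""
    corrected_p = alpha / n_cells
    # Two-sided: half the corrected probability sits in each tail.
    return -NormalDist().inv_cdf(corrected_p / 2)


n_cells = 7859  # number of grid cells in the guide's example
for level, alpha in [("90%", 0.1), ("95%", 0.05), ("99%", 0.01), ("99.9%", 0.001)]:
    print(level, round(bonferroni_z(alpha, n_cells), 4))
```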

• Open the Layer Properties of the GI layer and choose Symbology.Quantities.Graduated Colours. Select 5 classes and a colour range (keep lowest range blank) and then enter your 4 numbers in ascending order in the area shown below:


• Click OK and you have a finished thematic map of statistical significance.
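The five-class shading can be mirrored in a few lines of Python, which may help when checking individual cells in the attribute table. The thresholds are the example figures from this guide; the class labels and the handling of a z-score exactly equal to a threshold are assumptions.

```python
from bisect import bisect_right

# Example thresholds from this guide (7859 cells), in ascending order.
THRESHOLDS = [4.3649, 4.5141, 4.8444, 5.2788]
LABELS = ["not significant", "90%", "95%", "99%", "99.9%"]


def significance_class(z_score):
    """Return the shading class for a z-score (hot spots only)."""
    return LABELS[bisect_right(THRESHOLDS, z_score)]


for z in (3.0, 4.4, 4.6, 5.0, 6.0):
    print(z, significance_class(z))
```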

In explaining the map, a reminder:

• Those 450m grid squares which are significant at 99.9% indicate that there is only a 1 in 1000 chance that the clustering seen in that cell, compared to its surroundings (and compared to similar observations across the whole area), occurred naturally (i.e. something highly unusual has occurred in that grid square).

• The same refers to those cells significant at 99% (1 in 100 chance), 95% (1 in 20 chance) and 90% (1 in 10 chance).

• Any significance level below 90% is deemed not of sufficient significance to report on.

• All cells have a specified value (unlike when using Kernel Density Estimation) which can be used to compare any cells within the specified boundary.


Creating a combined map for up to 4 GI* datasets

1) Adding all datasets to map and combining

• The initial stage of this process involves repeating stages 4 and 5 above in adding each individual dataset (ensuring that you carried out stage 2 in the first place). To combine the datasets, you will only need the z-score values created (i.e. no need for the thematic mapping in stage 6). However, if you do wish to create the thematic maps (example below), doing so won’t affect the combining process, as it doesn’t change the z-score values, and may allow you to identify geospatial differences in your dataset hotspots (e.g. underreporting).

• The first stage in combining the datasets (3 in this example) is to create a single table which has the 3 GI* values from the datasets assigned to each relevant grid square.

• To do this, a copy of the original ‘grid-boundary’ layer must be made. Then join one GI* layer at a time to the grid-boundary layer, as a table on the first ‘FID’ column of both tables. It is also worth noting the order in which you joined the datasets.

• This will result in a single table which contains the details of the grid square, and the relevant GI* scores of each dataset (see below).


Grid square details | Dataset 1 z-scores | Dataset 2 z-scores | Dataset 3 z-scores | etc.

2) Combining multiple datasets to single dataset

• To combine the scores for each grid square, the attribute table needs to be exported into Excel. The easiest way to do this is to click on the down arrow in the top-left corner of the table and select ‘Export’ and then save the output table as a Text file.

• Open Excel and import the file (tab delimited and columns separated by commas). At the same time, open the accompanying Excel file ‘GI_combination_ArcMap’ and insert the number of grid squares and your 90% significance value on the ‘Start’ sheet (very important).

• Copy and paste the whole sheet from the imported text file into the sheet named ‘Import’ in the ‘GI_combination_ArcMap’ workbook.

• Now click the relevant tab at the bottom for the number of datasets you have, and copy and paste down to the number of rows corresponding to the number of grid cells you have (highlight and copy the 2nd row then drag the vertical slider down to the relevant row, and then paste whilst holding down shift).

• This sheet provides a single value for each grid square (final column) which multiplies an average of the GI z-scores (third from last column) by a count of the number of datasets whose z-score for that grid cell is over the 90% threshold value (penultimate column). This ensures that a grid cell with only 1 high z-score out of 3 doesn’t outweigh a cell with 3 medium z-scores, for example.

• The table also gives the ability to weight individual datasets should you wish. The columns highlighted in red refer to the source dataset z scores and are set at an unweighted default level of 1 (row 1). This figure can be changed for each of the datasets by changing this value of 1 to whatever you wish, which will scale up the values below respectively to then be included in the average calculations. See the table below for an example of this.

• Save the ‘GI_combination_ArcMap’ workbook.
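The combination logic described above (average of the weighted z-scores multiplied by the number of datasets over the 90% threshold) can be sketched in Python as follows. This is an interpretation of the description rather than a copy of the Excel template's formulae: in particular, it assumes the threshold count uses the unweighted z-scores, and the function name is hypothetical.

```python
def overall_cell_score(z_scores, threshold_90, weights=None):
    """Average of weighted z-scores x count of datasets over the 90% threshold."""
    if weights is None:
        weights = [1.0] * len(z_scores)  # unweighted default, as in the template
    weighted = [z * w for z, w in zip(z_scores, weights)]
    average = sum(weighted) / len(weighted)
    # Assumption: the count is taken on the original (unweighted) z-scores.
    n_over_threshold = sum(1 for z in z_scores if z > threshold_90)
    return average * n_over_threshold


threshold = 4.3649  # 90% threshold from the guide's example
one_high = overall_cell_score([9.0, 1.0, 1.0], threshold)
three_medium = overall_cell_score([5.0, 5.0, 5.0], threshold)
print(one_high, three_medium)
```

With these illustrative numbers the cell with three medium z-scores scores higher than the cell with a single high z-score, which is the behaviour the template is designed to give.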


[Figure: example Dataset sheet, annotated to show the source data for each grid square pulled in from the Import sheet, the dataset weighting area, and the grid square average calculations.]

3) Mapping combined dataset

• Having saved the workbook, Copy and Paste Special the completed Dataset sheet into a new Excel workbook and save it where relevant.

• Return to your ArcMap workspace and add the sheet from the newly saved Excel workbook (no need to geocode).

• Create a new copy of your clipped Grid-Boundary layer (and call it something appropriate to avoid confusion) and join the Grid_Cell column of your imported combined data table to the FID column of this Grid-Boundary layer.

• Open Layer Properties (Symbology/Quantities/Graduated colours) and select your Field value as Overall_Cell_Score and choose a Classification (Natural Breaks used here).

• Choose a colour scheme of 5 classes ensuring the lowest class is transparent.


• NB. The significance boundaries are redundant here as you are dealing with averaged (and possibly weighted) figures rather than the original z-scores – which should be remembered when making any conclusions.

• This gives an overall combined (weighted if needed) multi-dataset map based on the original GI* datasets.

In explaining the map – those 450m grid squares which are shaded heaviest indicate the strongest level of hotspot, based on the combination of (in this case) unweighted statistical significance scores from 3 datasets. Hotspot intensity decreases through 4 lower hotspot intensities (lowest left blank) using a ‘natural breaks’ classification.


Proximity Analysis of points of interest to a hotspot layer

1) Importing Points of Interest data

• Returning to the ‘GI_combination_ArcMap’ workbook that was used above and still contains your GI data, navigate to the ‘Points_of_Interest’ tab and insert your points data, ensuring that it is formatted as below:

• Select your data (A1:C24 in this case) and name the range as ‘Points’ (write this in the box at the top left and press Enter)


• Do the same for your imported single GI dataset /combined GI dataset from earlier (e.g. A1:L7543) and name it ‘Combined’.

2) Joining Points of Interest and Combined GI data

• Next open Microsoft Query* in order to join the two datasets. This is to join every point of interest to every grid square (i.e. 20 points of interest x 7859 grid squares = 157,180 combinations in this example). *From the Data tab, in the Get External Data group click From Other Sources, and then click From Microsoft Query.

• From the Choose Data Source box, select Excel Files from the Databases tab and click OK.

• Next locate the ‘GI_combination_ArcMap’ workbook in the relevant location and click OK.

• Once the data connection has been established the 2 named ranges that were created will appear. First expand (+) the Combined range and transfer the column headings ‘Easting’, ‘Northing’ and the relevant score column depending on what dataset you wish to measure distance against (i.e. ‘Dataset_1_zscore’, ‘Dataset_2_zscore’, ‘Overall Cell Score’ etc.) across to the right pane, then expand the Points range and transfer all 3 column headings across.

• Click Next, OK the following box and then click File and ‘Return data to Excel’ in the resultant window.

• A prompt will then ask where you want to return the combined data to; select cell A1 in the ‘Grid-Point Join’ tab. NB. This will increase the size of your workbook hugely but you don’t need to save the data here, just what it gives you in future sheets.
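The Microsoft Query step produces every grid cell / point of interest combination (a cross join). A minimal Python sketch of the same idea is shown below; the field names and synthetic data are assumptions, chosen only to reproduce the 20 x 7859 count from the example.

```python
from itertools import product


def cross_join(grid_cells, points):
    """Pair every grid cell with every point of interest."""
    return [{**cell, **point} for cell, point in product(grid_cells, points)]


# Synthetic stand-ins for the 'Combined' and 'Points' named ranges.
grid_cells = [
    {"cell_easting": i, "cell_northing": i, "score": 0.0} for i in range(7859)
]
points = [
    {"point_name": f"P{i}", "point_easting": i, "point_northing": i}
    for i in range(20)
]

joined = cross_join(grid_cells, points)
print(len(joined))  # 157180
```

The row count grows as cells x points, which is why the guide warns that the workbook becomes very large at this stage.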

3) Calculating Significance and Distance Scores

• Select the ‘Grid-Point Summary’ sheet which references (in orange) the relevant data from the Microsoft Query Import. Again, the rows will need to be copied down to the extent of the ‘Grid-Point Join’ sheet [Beware: this may be a lot of rows depending on your grid size and points of interest list].

• The purple columns are inbuilt calculations which 1) calculate the distance between each grid cell centroid-point of interest combination (using Pythagoras’ theorem), and 2) assign scores of between 0 and 4 to both the selected significance score and calculated distance between points. The levels for which these scores are assigned are user-defined and taken from the 2 areas on the ‘Start’ sheet.

- For the significance scores: if using the original Dataset z-scores these thresholds can be the 90-99.9% thresholds, or if using the Combined GI scores the Natural Breaks/Equal Interval ranges seen above are required.

- For the Distance scores: these are user-defined, and set by default so that any cell within 1000m of a Point of Interest gets a score that increases as the distance decreases. These should be changed on the Start tab to suit the user’s needs.

• The sheet informs the ‘Grid-Point Pivot’ sheet which will need refreshing (right-click / refresh when in the pivot table).

• This in turn informs the ‘Results’ sheet which gives the top 25 points of interest with regard to their scores for proximity to cells with high levels of statistical significance of clustering. If there are fewer than 25 points of interest with scores above 0 then only these points of interest are shown. Should you wish to see more than 25, the formulae in the cells on this sheet can be dragged down to the required level; bear in mind, however, that very small scores may be irrelevant.
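The scoring and ranking described in this stage can be sketched in Python as below. It is an illustration of the approach, not the template's formulae: the significance thresholds are the example values from earlier in this guide, while the distance bands (250m steps up to 1000m) and all function names are assumptions standing in for the user-defined values on the ‘Start’ sheet.

```python
import math
from bisect import bisect_right
from collections import defaultdict

SIG_THRESHOLDS = [4.3649, 4.5141, 4.8444, 5.2788]  # example 90-99.9% values
DIST_BANDS = [250, 500, 750, 1000]  # hypothetical bands in metres


def significance_score(z_score):
    """0 below the 90% threshold, rising to 4 at 99.9% and above."""
    return bisect_right(SIG_THRESHOLDS, z_score)


def distance_score(distance):
    """4 for the nearest band, falling to 0 beyond the last band."""
    for score, limit in zip((4, 3, 2, 1), DIST_BANDS):
        if distance <= limit:
            return score
    return 0


def rank_points(cells, points):
    """Average (significance x distance) score per point, highest first."""
    totals = defaultdict(float)
    for name, px, py in points:
        for cx, cy, z in cells:
            d = math.hypot(px - cx, py - cy)  # Pythagoras' theorem
            totals[name] += significance_score(z) * distance_score(d)
    averages = {name: total / len(cells) for name, total in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cells (easting, northing, z-score) and points of interest.
cells = [(0, 0, 6.0), (450, 0, 4.6), (5000, 5000, 0.0)]
points = [("A", 100, 0), ("B", 4000, 4000)]
ranked = rank_points(cells, points)
print(ranked)
```

Because the average is taken across every grid cell, the many cells scoring 0 pull all averages down equally, so only the relative ranking of the points is meaningful.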


4) Mapping top Points of Interest

• The data on the results sheet should be saved to a new workbook (paste special) and then imported into ArcMap in the normal way as a points dataset (using the eastings and northings within it) and saved as a shapefile.

• How this is displayed is then a user preference. One option is to show the GI* map of statistical significance, with all Points of Interest overlaid, overlaid again with the top Points of Interest, such as in the map below.


Commentary can then be made from the raw data as to why Point O is ranked #1 e.g.

Having compared the distances of each point to the centrepoints of all 7859 450m2 cells and their varying significance values, Point O is closer on average to statistically significant cells than any other point.

In total, Point O is within 1000m of 12 cells with significance levels greater than 95%. This means that when taken into consideration alongside all other points, the effect of the large proportion of cells with no significance score (<95%) is lessened, and a higher average score gained.

Point O is within 500m of four cells with significance levels above 95%, between 500-1000m from three cells with significance levels above 99.9% and between 500-1000m from two cells with significance levels of 95% and 99%.

These cells are 4062, 4063, 4171, 4172, 4173, 4174, 4285, 4286, 4287, 4397, 4398 and 4510 and can be seen on the below map:


In terms of significance levels, the above only applies if you are calculating proximity to a single GI* dataset. If you are calculating proximity to a combined GI* dataset, as previously mentioned, significance levels are redundant due to the combining and weighting process. In this case, cells with the highest scores could be described as having the ‘highest level of hotspot based on the combination of statistical significance scores’.

For more information please contact Richard Fairchild, GLA Intelligence Greater London Authority, City Hall, The Queen’s Walk, More London, London SE1 2AA Tel: 020 7983 4723 e-mail: [email protected]

Copyright © Greater London Authority, 2013