Learn ArcGIS - Esri · Fill Gaps in Your Data with Areal Interpolation . Time: 30 minutes ....

Fill Gaps in Your Data with Areal Interpolation

By Witold Frączek and Heather Smith

learn.arcgis.com/

380 New York Street

Redlands, California 92373-8100 USA

Copyright © 2019 Esri

All rights reserved.

Printed in the United States of America.

The information contained in this document is the exclusive property of Esri. This work is protected under United States copyright law and other international copyright treaties and conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, except as expressly permitted in writing by Esri. All requests should be sent to Attention: Contracts and Legal Services Manager, Esri, 380 New York Street, Redlands, CA 92373-8100 USA.

___ Learn ArcGIS ___ Guided lessons based on real-world problems

Time: 30 minutes

Overview

Data is never perfect. Often it is not even complete. Your data may have gaps due to inconsistent data collection methods or technical problems with sensors. Typically, features with missing data are not shown on the final map or are represented with a gray color. But sometimes you may want to fill in these gaps, either to conduct further analysis or to improve your map's appearance.

In ArcGIS, there are two common methods for filling gaps in spatial data. Neither can re-create the true values for your missing data, but they offer more reliable results than simple guesswork. The first method is the Fill Missing Values tool. It estimates values based on neighboring features, offering different methods for selecting and sampling from those neighbors.

In this lesson, you'll use the second method, geostatistics, to map the spatial variation of data across the entire study area. Predicted values are then pulled from the resulting map to fill the gaps in your data. Geostatistics allows you to predict values in locations that have not been measured, using data from measurements elsewhere in your study area. Typically, geostatistics uses point data, but it can also interpolate from data stored in polygons, as you will do in this lesson.

You can read more about interpolation at What is geostatistics?

Note: The Geostatistical Analyst license is required to complete this lesson.

In this lesson, you'll learn to do the following:

• Interpolate the percentage of seniors across Poland from an incomplete polygon layer.

• Convert the interpolated surface into a new polygon layer.

• Combine the known values with the predicted values to make a complete map.

Interpolate the percentage of seniors across Poland

If you know the values for most of the features in your dataset, you can use them to predict continuous values across the entire area. You'll do this to map the spatial distribution of seniors in Poland.

1. Download the FillGaps project package.

2. Locate the downloaded file on your computer. Double-click FillGaps.ppkx to open it in ArcGIS Pro.

Note: If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.

This map depicts powiaty, administrative units similar to counties, in Poland. The polygons are colored to represent the percentage of the population aged 65 years or older. Unfortunately, the data is incomplete. Ten powiaty contain no value for the percentage of seniors.

This spatial data can be found on the Living Atlas. The values for the percentage of seniors were provided by Statistics Poland. (The missing values were artificially removed for the purpose of this lesson.)

Demographic data is often difficult to model with geostatistics because urban areas show dramatically different patterns than rural ones. In this case, however, the spatial variation in the data is relatively smooth, without dramatically distinct breaks. This means that the data might be appropriate for geostatistics.

3. On the ribbon, on the Analysis tab, in the Tools group, click Geostatistical Wizard.

4. In the Geostatistical Wizard window, under Geostatistical methods, choose Areal Interpolation.

Most interpolation methods require point data as the input, but areal interpolation uses polygons. In this lesson, you are using polygons that are nearly complete and fit together like puzzle pieces. You can also use polygons that are widely spaced or overlapping. For example, you may have data representing observations of birds, which is stored in polygons for the ground covered by each observer.

You can read more about this geostatistical method at What is areal interpolation?

Areal interpolation will process values differently if you declare them as representing averages, rates, or events. You are mapping the percentage of a population over a certain age, which is a rate.
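To make the rate type concrete, here is a minimal Python sketch of how a rate is derived from a count field and a population field. The function name and the figures are invented for illustration; they are not taken from the Powiaty_Seniors layer.

```python
def senior_rate(senior_count, total_population):
    """Return the proportion of seniors, or None when it cannot be computed."""
    if senior_count is None or not total_population:
        return None  # missing count, or missing/zero population: leave a gap
    return senior_count / total_population


# Hypothetical powiat with 12,500 seniors out of 80,000 residents.
rate = senior_rate(12_500, 80_000)
print(f"{rate:.4f} as a proportion, {rate * 100:.1f} as a percentage")
```

A rate is a proportion between 0 and 1, which is why the wizard asks for both a count field and a population field rather than a single percentage field.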

5. Under Input Dataset 1, for Type, choose Rate.

6. For Source Dataset, choose Powiaty_Seniors. For Count Field, choose 2017 Senior Population, and for Population Field, choose 2017 Total Population.

7. Click Next.

The next window shows a covariance chart. The blue crosses represent your data without any modeling. The blue line represents the model that will be used to predict the percentage of seniors over the entire area. You want to edit parameters of the model until the model line follows the path of the crosses and 90 percent of the crosses fall within the red confidence intervals. Currently, that is not the case.

Not only does the line not follow the crosses closely, but there are two crosses that lie far away from the path. In many situations you won't be able to achieve an ideal model, but you can try to get as close as possible. A good place to start is by making the lag size smaller. Doing so will reduce the area that is searched when sampling to generate the blue crosses.

8. Change Lag Size to 12,000.

The model changes. However, the crosses are now even farther from the confidence intervals.
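The role of the lag size can be illustrated with a simplified Python sketch. It groups pairs of points into distance bins that are one lag wide and averages the covariance in each bin, which is roughly what each cross on a covariance chart summarizes. This is a point-based simplification under stated assumptions: the wizard works on polygons and its internal calculations differ, and the function name and sample coordinates here are hypothetical.

```python
import math
from collections import defaultdict


def empirical_covariance(points, lag_size, n_lags):
    """Average the product of deviations for point pairs, grouped by distance bin.

    points: list of (x, y, value) tuples.
    Returns {bin_index: (mean_covariance, pair_count)} for bins that contain pairs.
    """
    mean = sum(v for _, _, v in points) / len(points)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, vi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, vj = points[j]
            dist = math.hypot(xi - xj, yi - yj)
            bin_index = int(dist // lag_size)
            if bin_index < n_lags:  # pairs beyond the last lag are ignored
                sums[bin_index] += (vi - mean) * (vj - mean)
                counts[bin_index] += 1
    return {b: (sums[b] / counts[b], counts[b]) for b in sorted(counts)}


# Hypothetical sample locations (x, y, value).
samples = [(0, 0, 1.0), (1, 0, 3.0), (3, 0, 1.0), (10, 0, 3.0)]
print(empirical_covariance(samples, lag_size=2, n_lags=2))
```

With the number of lags held fixed, a smaller lag size shrinks the maximum distance at which pairs are compared (lag size times number of lags), so fewer distant pairs contribute to the chart.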

Next, you'll try to improve the model by changing its shape.

9. Change Model to Stable.

Note: Stable and K-Bessel models often give the best result, but also take more time to process.

Achieving a perfect model can be difficult or even impossible, especially if you are working with demographic data instead of a natural phenomenon. In this scenario, even though only one of the crosses falls within the confidence intervals, the model line follows the crosses relatively closely. This model isn't perfect, but it is a suitable compromise.

10. Click Next.

The next window contains a preview map.

11. Click different parts of this preview map.

The map highlights neighboring polygons that will be used to determine the predicted value for the location you clicked. Polygons colored red will be weighted more heavily in the analysis than those colored green.

12. Click Next.

The Cross validation page opens. Cross-validation assesses the accuracy of a prediction surface. It does so by removing a single polygon from the dataset and using the remaining data to predict a value within the removed polygon.
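The leave-one-out idea behind cross-validation can be sketched in a few lines of Python. Note the substitution: this sketch uses inverse distance weighting on points as a simple stand-in predictor, not the areal interpolation model that the wizard fits. The function names and sample values are hypothetical.

```python
import math


def idw_predict(target_xy, samples, power=2):
    """Inverse-distance-weighted estimate at target_xy from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        dist = math.hypot(target_xy[0] - x, target_xy[1] - y)
        if dist == 0:
            return value  # exact hit on a sample location
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out(samples):
    """Predict each sample from all the others (needs at least two samples).

    Returns a list of (measured, predicted) pairs.
    """
    results = []
    for i, (x, y, measured) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        results.append((measured, idw_predict((x, y), others)))
    return results


# Hypothetical centroids with a proportion of seniors at each one.
samples = [(0, 0, 0.15), (1, 0, 0.17), (2, 0, 0.16)]
for measured, predicted in leave_one_out(samples):
    print(f"measured={measured:.3f} predicted={predicted:.3f}")
```

Each measured value is compared against a prediction made without it, so the comparison shows how well the model predicts locations it has not seen.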

The Predicted scatterplot for this model does not look good. Ideally, the red values should follow the trend of the blue and gray lines. Your chart looks more like a random cloud of points. On the other hand, the values listed on the Summary tab look good. These numbers should all be close to zero except for Root-Mean-Square Standardized, which should be close to 1. The Root-Mean-Square value of 0.02 means that the predicted proportion of senior citizens will differ from the real value by about 2 percentage points on average. This is a reasonable margin of error. These values are more indicative of the quality of your model than the scatterplot.
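As a rough guide to what the Summary statistics measure, the following Python sketch computes four of them from lists of measured values, predicted values, and prediction standard errors. The formulas are the common textbook definitions and are assumed, not confirmed, to correspond to the figures on the Summary tab. The input values are invented.

```python
import math


def cross_validation_summary(measured, predicted, standard_errors):
    """Summarize prediction errors from a cross-validation run."""
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, standard_errors)]
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        "mean_standardized": sum(standardized) / n,
        "root_mean_square_standardized": math.sqrt(
            sum(z * z for z in standardized) / n
        ),
    }


# Hypothetical proportions of seniors for two polygons.
summary = cross_validation_summary(
    measured=[0.10, 0.20],
    predicted=[0.12, 0.18],
    standard_errors=[0.02, 0.02],
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A mean near zero suggests the predictions are not biased high or low, and a standardized root-mean-square near 1 suggests the reported standard errors are about the right size for the actual errors.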

13. Click Finish. In the Method Report window, click OK.

An interpolated layer is added to the map.

14. In the Contents pane, turn off Powiaty_Seniors and turn on Powiaty_Seniors outlines.

The areas with heavy black outlines are the ones with missing data.

Create polygons from the interpolation

The interpolation you created is continuous and ignores the polygon outlines. Geostatistics has smoothed the demographic data to create a gradual surface. While it may not match known data precisely, smooth interpolations like this are often better at predicting unknown values.

Next, you'll convert the continuous interpolation surface into polygons.

1. On the ribbon, on the Map tab, in the Navigate group, click Bookmarks and choose Kluczborski.

The map navigates to Kluczborski powiat.

The Areal Interpolation layer is a geostatistical layer, which means that every location on the map has a slightly different value. Some of the polygons that you need to fill, such as this one, have a wide range of predicted values. You'll convert this predicted surface into a polygon layer with a single predicted value for each powiat.

2. On the ribbon, on the Analysis tab, in the Geoprocessing group, click Tools.

The Geoprocessing pane appears.

3. Search for and open Areal Interpolation Layer To Polygons.

4. For Input areal interpolation geostatistical layer, choose Areal Interpolation.

5. For Input polygon features, choose Powiaty_Seniors.

6. For Output polygon feature class, change the output name to Interpolated_Polygons. (Make sure to include an underscore.)

7. Click Run.

A polygon layer is added to the map.

8. On the ribbon, on the Map tab, in the Navigate group, click the Full Extent button to return to the default view of the map.

9. In the Contents pane, drag Interpolated_Polygons under Powiaty_Seniors outlines and turn off Areal Interpolation.

You now have a predicted percentage of seniors for every polygon, but you already have the real values for most of them. You only want to use the predicted values for the 10 polygons with missing data.

10. Right-click Interpolated_Polygons and choose Attribute Table.

The attribute table appears. It contains all of the data from the Powiaty_Seniors layer and it also has three new fields: Included, Predicted, and Standard Error.

11. Double-click the header for the Percent Seniors column to sort it.

Now, all of the empty records are at the top of the table. Next, you'll replace these <Null> values with the data from the Predicted field.

12. Select all of the rows with missing senior data.

Tip: Click the area to the far left of a row to select it. To select multiple rows, hold the Shift key while clicking or drag the cursor. You can also use the Select By Attributes tool.

13. At the top of the attribute table, click the Calculate button.

The Calculate Field tool opens in the Geoprocessing pane. The field calculation will only be applied to the selected rows.

14. For Field Name, choose Percent Seniors.

15. In the Fields list, scroll down and double-click Predicted.

The PercentSeniors = box populates with !Predicted!. This expression copies the values from the Predicted field into the Percent Seniors field. However, the Predicted values are stored as decimal proportions, while Percent Seniors holds percentages. To convert them, you'll multiply the values by 100.

16. After !Predicted!, type * 100.

17. Click Run.

18. In the attribute table, double-click the 2017 Senior Population header to sort the table again.

The <Null> values in the Percent Seniors column have been replaced. The unselected rows remain unchanged.
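The same fill can be sketched as a script. The first function below captures the logic of the field calculation: keep a known percentage, otherwise convert the predicted proportion to a percentage. The second function is an assumption about how this might be applied with an arcpy update cursor. It requires ArcGIS Pro, and it assumes the underlying field names are PercentSeniors and Predicted, as suggested by the Calculate Field expression. Treat it as a sketch, not a tested replacement for the steps above.

```python
def filled_percent(percent_seniors, predicted):
    """Keep a known percentage; otherwise convert the predicted proportion to percent."""
    if percent_seniors is not None:
        return percent_seniors
    return predicted * 100


def fill_missing_percent(table):
    """Apply filled_percent to every row of a table or feature class.

    Requires ArcGIS Pro. The field names below are assumptions.
    """
    import arcpy  # imported here so filled_percent works without ArcGIS

    fields = ["PercentSeniors", "Predicted"]
    with arcpy.da.UpdateCursor(table, fields) as cursor:
        for percent, predicted in cursor:
            cursor.updateRow([filled_percent(percent, predicted), predicted])


# Plain-Python check of the logic with hypothetical values.
print(filled_percent(17.3, 0.2))    # known value is kept
print(filled_percent(None, 0.165))  # gap is filled from the prediction
```

Because the script tests each row for a missing value, it does not depend on a manual row selection the way the Calculate Field step does.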

19. At the top of the attribute table, click Clear to clear the selection.

20. Close the attribute table.

Symbolize the map

Finally, you'll symbolize the new layer to match the original one. Instead of setting the symbology parameters one by one, you'll import them from the Powiaty_Seniors layer.

1. In the Contents pane, turn off Powiaty_Seniors outlines and click Interpolated_Polygons to select it.

2. On the ribbon, on the Appearance tab, in the Drawing group, click Import.

The Geoprocessing pane updates to show the Apply Symbology From Layer tool.

3. For Symbology Layer, choose Powiaty_Seniors.

4. Click Run.

The symbology of Interpolated_Polygons now matches that of Powiaty_Seniors, your initial layer, but there are no longer any holes in the data.

5. On the Quick Access Toolbar, click the Save button.

Summary

The process of substituting values to replace missing data is called imputation. Often, values are imputed using the average of the remaining dataset. When your data is spatial, you have better options available to you, because you can assume that things that are closer together are more similar than things that are farther apart. In this lesson, you used areal interpolation to create a continuous surface across Poland to model the percentage of the population that is aged 65 or older. You then sampled from that surface to predict values for the polygons that were missing data.
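The difference between imputing with a dataset-wide average and imputing with nearby values can be shown with a small Python sketch. Averaging adjacent features is used here only as a simple stand-in for a spatial method; it is not areal interpolation. The feature names, values, and adjacency list are hypothetical.

```python
def impute_with_global_mean(values):
    """Replace None with the mean of all known values."""
    known = [v for v in values.values() if v is not None]
    mean = sum(known) / len(known)
    return {k: (mean if v is None else v) for k, v in values.items()}


def impute_with_neighbors(values, neighbors):
    """Replace None with the mean of the known values of adjacent features."""
    result = dict(values)
    for key, value in values.items():
        if value is None:
            nearby = [values[n] for n in neighbors[key] if values[n] is not None]
            if nearby:  # leave the gap when no neighbor has a value
                result[key] = sum(nearby) / len(nearby)
    return result


# Hypothetical percentages of seniors; feature B is missing.
values = {"A": 14.0, "B": None, "C": 16.0, "D": 22.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

print(impute_with_global_mean(values)["B"])
print(impute_with_neighbors(values, neighbors)["B"])
```

The global mean is pulled upward by the distant feature D, while the neighbor-based estimate reflects only the features that border the gap.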

Don't forget to tell your map readers that some of the values were imputed. This can be done with labels, a list, or symbology. If your map is included in a report, you can describe the method of imputation.

The Fill Missing Values tool can accomplish the same task. For some datasets, this tool will give better results. For others, geostatistics will be better. It is difficult to know until you have tried both, but if the spatial transition between values is not smooth, Fill Missing Values is recommended.

For an extra challenge, find the Fill Missing Values tool in the Geoprocessing pane and use it to impute the missing values in the Powiaty_Seniors layer.

Hint: You can read about the Fill Missing Values tool in its help topic and in this ArcUser article.

Were you able to get better results with this method or with areal interpolation? To find out, you can compare your results to the real values in the Powiaty_full_dataset layer. In the Catalog pane, on the Maps tab, double-click Full Dataset to open the map containing this layer.
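One simple way to compare the two methods is the mean absolute error over the features that were originally missing. The following Python sketch assumes you have copied the imputed and true percentages into dictionaries keyed by feature name. The names and numbers are placeholders, not values from the lesson data.

```python
def mean_absolute_error(imputed, truth, keys):
    """Average absolute difference between imputed and true values for the given keys."""
    return sum(abs(imputed[k] - truth[k]) for k in keys) / len(keys)


# Placeholder values for two features that were missing data.
truth = {"A": 15.0, "B": 20.0}
areal_interpolation = {"A": 16.0, "B": 18.0}
fill_missing_values = {"A": 15.5, "B": 23.0}

missing = ["A", "B"]
print("Areal interpolation:", mean_absolute_error(areal_interpolation, truth, missing))
print("Fill Missing Values:", mean_absolute_error(fill_missing_values, truth, missing))
```

The method with the lower error matched the real values more closely for this dataset, though with only 10 missing features the difference may not generalize.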