Adaptive Kernel Density in Demographic Analysis Richard Lycan Institute on Aging Portland State...

Adaptive Kernel Density in Demographic Analysis

Richard Lycan

Institute on Aging

Portland State University

Context

• We examined use of adaptive bandwidth density mapping to spatially generalize block group data from the American Community Survey and tabulate to new geographies.

• Used disability rates for the age 65 and older population to test the concepts.

• Disability rate data are published in the tables from the American Community Survey. The survey asks about several types of disability such as hearing or problems getting around in the residence.

• Methods were evaluated in two contexts:

- 1. A study of rural transportation needs in Oregon for

- Census urban area geographies (2015 conference proceedings)

- 2. Development of neighborhood statistics for Portland, Oregon for

- Neighborhood association districts (this paper)

• Earlier study of grid based methods in school enrollment forecasting (2014 Education Conference Proceedings)

Some characteristics of disability rates

• Partly to escape the problems of small sample size we use the data for any disability. Here are the individual level correlations between the six measures.

• Disability rates rise with age. We use rates for the population age 65 plus but the rates for the very old are high.

• Persons with higher educational levels respond less frequently to the disability questions and incur disabilities later in life.

• Disability rates are slightly higher for persons living in urban than rural settings.

Disability rates for block groups, tracts, and PUMAs

• Block groups, census tracts, and PUMAs show disability rates at varying levels of detail.

• At the block group level we see interesting detail, but know much of the variation is due to sampling error.

• Census tract data averages out some of the sampling error but also reduces the amount of geographical detail.

• Going all the way to PUMA geography, areas with at least 60,000 population, provides stable results, but little geographical detail.

Our goal

- We would like to have data for various geographies that are reliable.

- There is a tradeoff between risk from sampling error and geographical specificity.

- Here is a map of disability rates by neighborhood compiled by allocating block group data to neighborhoods using block centroid populations.

Can grid based methods improve on this standard method for allocating data to new geographies?

How density grids are used to calculate disability rates

• The map to the right shows a combination of:- A grid map of disability

rates

- A point map of block centroids with population age 65+

- Neighborhood boundaries

• Spatial Analyst extract values to points is used to add the rate to the point file.

• Disability rates for neighborhoods are summarized using ArcMap or Excel.

Quantifying the error

• The graphs to the right show coefficient of variation (CV) – the ratio of the MOE to the estimate.

• The CV declines from huge to acceptable as the size of the geography increases.

• We include an intermediate geography referred to as a “super tract” in between the tract and PUMA.

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

Block Group

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

Census Tract

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

"Super Tract"

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

PUMA

Coeffi

cien

t of V

aria

tion

Number of Geographies

Supertracts were constructed

to have about 3,000 persons

age 65+

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

Block Group

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

Census Tract

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

"Super Tract"

0 10 20 30 400.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

2.1

2.2

2.3

2.4

PUMA

Coeffi

cien

t of V

aria

tion

Number of Geographies

Sample size by geography

• In the following grid based analyses we use data for geographies about the size of the supertract.

• The median value of the CV declines by about a factor of two as one progresses from small to large geographies.

• The median number of persons age 65+ rises more rapidly- 159 – block group

- 494 – census tract

- 2,923 – super tract

- 14,714 PUMA

• These values give us some rough guidelines about the population desired in our grid based analysis.

The type of grid mapping affects the results

• A fixed distance grid and an adaptive grid are used to calculate disability rates.

• With a fixed density grid all the points within a fixed radius from each grid cell are used to calculate density at that grid location. Here the radius is 4,400 feet.

• Note that the number of age 65+ varies considerably between the two red circles with a 4,400 foot radius and thus the sampling error would vary consid-erably across the map

The adaptive grid reaches out to get a number of 65+

• When only 500 persons age 65+ are required the required number can be found within a few thousand feet. This range roughly equates to the sample in a single census tract.

• As the number required increases to 1,000, 2,000, … the search needs to reach out over a longer radius.

• The 3,000 persons distance roughly equates to the sample in four or five census tracts.

Disability rates using adaptive bandwidth grid

• This animation shows the computation of disability rates for various adaptive bandwidths.

• A bandwidth of 500 means that the grid is based on a search radius sufficient to include 500 persons age 65+

• A bandwidth of 3000 persons is roughly equivalent to a supertract.

• This sequence of slides walks through the calculation of number of disabled by bandwidth.

• As bandwidth increases the range of computed values decreases.

• Note the outlier values in ALLOC for Sunderland and Portland Airport.

Comparing fixed and adaptive methods

• This scatter chart shows the correlation between the estimate of disabled age 65+ from the fixed and adaptive bandwidth models.

• The fixed bandwidth is for 4,400 feet and the adaptive is for 3,000 persons.

• The results of the two models in this case are quite similar. Both reduce the effects of the sampling errors.

• Applied to Census Urban area data the fixed bandwidth did not perform as well.

Software for adaptive kernel density analysis

• Developed my own program to better understand.- Two stages

- 1. Computes distance to grid for particular bandwidth, e.g. includes 1,000 persons

- 2. Uses distances to grid from “1” to calculate a ratio grid between numerator and denominator, e.g. disabled/all persons

- Features- Grid for numerator and denominator use same bandwidth, based on same area

- Weights values by reciprocal of distance to a power, used d2 in examples

- Cuts off search at some distance value, used 8 miles in example

- Written in low level code using True Basic

• Available software- Spatial Analyst does not include. Statistical Analyst includes adaptive tool, but not

applicable to this computation.

- CrimeStat includes an adaptive bandwidth grid model. It does not use common geography for numerator and denominator.

- SaTScan includes an adaptive bandwidth model similar to my program

- Various. Researchers in such fields as remote sensing, epidemiology, criminology, and ecology make use of the adaptive bandwidth grid model using various scientific computing software, such as StatA.

Conclusions

• There is high interest in publishing data from the ACS for local geographies, such as neighborhoods. Allocating block group data makes this possible, but with problems due to sampling error.

• The use of grid based models in the allocation process is one option for compiling ACS data for areas like neighborhoods.- Advantages

- Reduces wild swings due to sampling error.

- Recompilation to different geographies is easy.

- Disadvantages

- The computed values represent a general region, not a specific geography

- Error measures can only be stated in broad terms

- Works best where values vary in a smooth form

• Fixed and adaptive distance grids behave differently- Adaptive distance grid emphasizes sample size, hard to visualize what

area is included in calculations

- For the fixed distance grid it is difficult to decide what radius to use when density of the observed variable varies greatly.

Richard LycanInstitute on AgingCollege of Urban and Public AffairsPortland State University, Oregon

[email protected]

Adaptive Kernel Density in Demographic Analysis Richard Lycan Institute on Aging Portland State...

Documents

Transcript of Adaptive Kernel Density in Demographic Analysis Richard Lycan Institute on Aging Portland State...