Georeferenced address listing - 1
A Novel Methodology for Georeferenced Address Listing
Abstract: This paper presents a novel methodology for address listing. The methodology builds
on publicly available georeferenced databases and geospatial methods, including
spatial sampling, identification of the site domain, reverse geocoding, web mining, and the US
Postal Service (USPS) address database (for address validation). To demonstrate the methodology,
four methods of spatial sampling were employed to draw four sets of sample sites
in the Chicago Metropolitan Statistical Area (MSA). The domain (or area) of each sample site
was determined using local spatial autocorrelation. Within the sampling domain of each site,
a random list of residential addresses was developed with the aid of reverse geocoding, web
mining, and publicly available datasets. The sampled residential addresses were then compared with
the USPS address database. Our analysis suggests that 98.6% (938 of 951) of the addresses were
valid residential addresses. These results have important implications for survey research,
because they show that location-based address lists can be developed in a cost-effective manner. Since
the addresses are georeferenced, we can develop multi-level socio-physical contexts around
respondents at no extra burden on their effort and time.
Keywords: reverse geocoding, spatial sampling, geo-information revolution.
INTRODUCTION: Recent advances in geospatial technologies allow us to georeference a vast
amount of the (non-geographic) data we collect, from bank transactions to
phone calls. For example, we can locate an airplane or predict the weather at a given
location in real time, identify the location from which a phone call originates, and trace a
person’s interactions across geographic space and time using a smartphone. Most of these
data are stored and maintained digitally, and some are publicly available. With the
availability of these georeferenced data, we have entered a new era of geo-information
revolution. This, in turn, has a phenomenal impact on our day-to-day lives, ranging from finding
our neighbors’ names and phone numbers to planning a trip. It is also changing the way we
collect survey data and conduct social science research. This article demonstrates how
publicly available datasets, namely Google Earth, White Pages, and the US Postal Service (USPS)
database, can be utilized for address listing at or around a given location.
Once the locations are identified (with the aid of spatial sampling), the proposed methodology can
be used to develop a list of addresses at or around the identified locations.
A major challenge of conventional sampling and survey methods is that a finite, complete and
up-to-date sampling frame of respondents is rarely available. But geospatial methodologies do
allow us to account for every inch of geographic areas with its socio-physical attributes. This
means spatial sampling can be more effective to sample areas than population. Spatial sampling
has been used extensively for sampling natural resources. But its application to social surveys
has been limited, because of the unavailability of a complete sampling frame of human
population with the locational (or georeferenced) information. Traditionally, a pseudo-sampling
frame of geographic areas is created to implement spatial sampling, because a finite and
complete sampling frame of natural resources is difficult to construct, especially for large
geographic areas. Such a sampling frame is constructed by overlaying a geometric grid over the
area of interest, referred to as the sampling domain. Once the sampling frame is constructed and
characterized, any classical sampling methods (such as random systematic and random stratified)
can be employed to draw a sample of locations or areas. The selection of locations or areas,
however, does not necessarily represent respondents or residential units or households.
This paper demonstrates the application of reverse geocoding coupled with web-mining and
USPS database to construct and validate a list of residential addresses at or around a given
location. Spatial sampling (and its integration with the reverse geocoding) offers several
advantages over non-spatial social survey methodology. First, it ensures better spatial coverage
and representation of population distribution across geographic space, as it takes into account
the spatial distribution of population at a fine geographic scale, such as 90m spatial resolution.
Human population is not distributed homogeneously; its density varies significantly from rural to
urban areas and within urban areas. Generally, population data are aggregated, analyzed, and available at
higher-order geographic units (such as census blocks in the US) due to confidentiality issues, and
their shapes and sizes can vary greatly. Integration of satellite remote sensing with the existing
secondary datasets, such as the US Census, allows us to develop population estimates at a very
fine spatial resolution; LandScan population concentration data at 90m spatial resolution for the
continental US, and 400m and 1km spatial resolution worldwide, are some examples of such
datasets (ORNL, 2008). The availability of these data provides a basis for drawing a spatially
representative sample of human population.
Second, spatial sampling can capture spatial heterogeneity in the distribution of socio-economic
and demographic characteristics, which is critically important for social surveys. The distribution
of human population and its socio-economic characteristics exhibits significant spatial disparity
and segregation (Osborne & Rose, 1999). Spatial analytical methods, such as local spatial
autocorrelation and semivariance, can be used to quantify and characterize geographic space by
socio-economic and demographic attributes, and to construct strata that are homogeneous in terms of
socio-economic characteristics. This, in turn, can ensure socio-economic representation of population
across geographic space. Third, spatial sampling is likely to ensure better population
representation as compared to traditional methods used for social surveys. A finite and complete
sampling frame of population is needed to draw a representative sample, but an up-to-date and
complete sampling frame of population is rarely available. For spatial sampling, however, we
rely on the sampling frame of areas, which can be updated frequently and constructed at any
spatial resolution. Once the sampled areas and/or locations are identified, the reverse geocoding,
presented in this paper, can generate a list of addresses within the selected areas and/or at or
around the sample sites. Fourth, reverse geocoding adds locational reference (or georeference) to
the selected sample. This, in turn, is important to attach multi-level socio-physical environmental
contexts to the selected sample. With the increasing importance of place and space, these
contextual data are becoming important to study social, behavioral, and health outcomes,
because people’s long-term exposure to their immediate place-specific socio-physical and
chemical environment is likely to influence their attitudes, behavior, social and economic
outcomes, and health (Caughy & O'Campo, 2006; Chen, Gong, & Paaswell, 2008; Frumkin,
2003). Integration of survey data with the multi-level socio-physical contextual data from
multiple sources is likely to augment the scope of the survey data and advance interdisciplinary,
multi-level social science research.
The main objectives of this paper are: (a) to develop, test and validate the reverse geocoding to
identify addresses at or around given locations, (b) to extract a list of residential addresses with
the aid of publically available datasets, and (c) to validate the residential list using the USPS
database. The remainder of this article is organized into three sections. The first section describes
the study area, data used, and methods employed. The second section presents the
implementation of reverse geocoding with the spatial sampling design, and the final section
discusses the findings of this research along with their implications and limitations.
METHODS AND MATERIALS
Study Area: The pilot survey was conducted in the urban Chicago MSA. Because the area has a
significant gradient in its socio-physical and chemical environment, it is well suited
for this experiment: it demonstrates how effectively the proposed method produces a representative
enumeration list in an area whose population is very diverse in terms of socio-economic
characteristics and population concentration.
Data: The data for this research come from a variety of sources, namely Google Earth, White
Pages, the US Census, and Oak Ridge National Laboratory (ORNL). The latter two datasets were
utilized to construct a pseudo-sampling frame of human population and characterize it with the
socio-economic characteristics. The Google Earth dataset was utilized for reverse geocoding to
develop a valid list of addresses at/around the selected sites. The White Pages database was
utilized to extract a list of residential units from the valid addresses extracted from Google Earth.
Method: Four different methods of sampling, namely clustered random (adopted for the General
Social Survey, GSS) (Harter, Eckman, English, & O’Muircheartaigh, 2010), optimized clustered random, spatially
random, and optimized spatial, were employed to draw four different sets of sample sites. The
first two methods were implemented in two stages: first, census tracts were drawn, and then
random points were simulated within the selected census tracts. The first set of census tracts was
selected using the GSS design, in which the 1692 urban census tracts were implicitly stratified by
income and race; 18 census tracts were then systematically sampled with probability proportional
to their 2000 population size. The second set of census tracts was selected using an optimization
method to better represent socio-economic characteristics of the target population. In the first
stage, we optimized the selection of census tracts. Specifically, we utilized the census data on
population, income, education and race/ethnicity to construct a composite index, and then
selected 18 census tracts that maximized the semi-variance in the composite index and controlled
for spatial autocorrelation in the selected Census Tracts. Once the Census Tracts were selected,
250 random points were simulated within the selected Tracts inversely proportionate to their
population size.
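The first-stage draw described above (tracts sampled systematically with probability proportional to population size) can be sketched as follows. This is a minimal illustration and not the GSS implementation: the function name and the toy population figures are hypothetical, and the sketch assumes the tracts have already been sorted by the stratifiers (income and race) so that the systematic pass provides the implicit stratification.

```python
import random


def pps_systematic_sample(sizes, n, rng=None):
    """Draw n units by systematic sampling with probability proportional to size.

    sizes: measure of size (e.g. tract population) for each unit, listed in
           the order used for implicit stratification (assumed pre-sorted).
    Returns the indices of the selected units. A unit larger than the
    sampling interval can be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n
    next_hit = rng.random() * interval  # random start in [0, interval)
    selected = []
    cumulative = 0.0
    for idx, size in enumerate(sizes):
        cumulative += size
        # Select this unit once for every hit that falls inside its size range.
        while len(selected) < n and next_hit < cumulative:
            selected.append(idx)
            next_hit += interval
    # Guard against floating-point rounding at the upper boundary.
    while len(selected) < n:
        selected.append(len(sizes) - 1)
    return selected


if __name__ == "__main__":
    rng = random.Random(2010)
    # Hypothetical populations for 1692 tracts; real values would come from the census.
    populations = [rng.randint(1500, 8000) for _ in range(1692)]
    print(pps_systematic_sample(populations, 18, rng))
```

Because the hits are evenly spaced along the cumulative population, the selected tracts are spread across the sort order, which is what makes the stratification implicit.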
For the spatial random sampling, the study domain (Urban Chicago MSA) was partitioned into
400m x 400m pixels, and characterized by income, education and race/ethnicity using the 2000
US Census Data. Then a composite index was constructed for each pixel. These pixels served as
a pseudo sampling frame. Even though the classical sampling method was feasible, we sampled
the pixels randomly with a control for spatial autocorrelation. Specifically, no two pixels lying within
the extent of local spatial autocorrelation of one another were both selected, which avoids redundant
sample sites. In the optimized design, however, the selected set of pixels captured the maximum
semivariance in the composite index and minimized spatial autocorrelation (Fig 1) (Kumar,
2007, 2009).
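The control for spatial autocorrelation in the spatially random design can be illustrated with a simple rejection rule: visit candidate pixels in random order and keep a pixel only if it is far enough from every pixel already kept. The paper uses a local, site-specific autocorrelation extent; the sketch below assumes a single fixed separation distance (`min_dist`) purely for illustration, and all names are hypothetical.

```python
import math
import random


def sample_with_min_separation(pixels, n, min_dist, rng=None):
    """Randomly pick up to n pixel centres that are at least min_dist apart.

    pixels: list of (x, y) pixel centres in planar units (e.g. metres).
    min_dist: assumed constant stand-in for the autocorrelation extent.
    """
    rng = rng or random.Random()
    candidates = list(pixels)
    rng.shuffle(candidates)  # random visiting order
    chosen = []
    for p in candidates:
        # Reject a candidate that lies within min_dist of any chosen pixel.
        if all(math.dist(p, q) >= min_dist for q in chosen):
            chosen.append(p)
            if len(chosen) == n:
                break
    return chosen


if __name__ == "__main__":
    # A toy 20 x 20 frame of 400 m pixels.
    grid = [(x * 400.0, y * 400.0) for x in range(20) for y in range(20)]
    print(sample_with_min_separation(grid, 10, 1200.0, random.Random(7)))
```

If fewer than `n` mutually separated pixels exist, the function returns as many as it found, so callers should check the length of the result.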
IMPLEMENTATION: Unlike traditional social surveys, in a spatial social survey we
sample locations/areas first and then sample respondents at/around the sampled
locations/areas (Fig 2). Spatial sampling integrated with reverse geocoding and web mining
is implemented in the steps illustrated in Fig 3: selection of sample sites
using a spatial sampling method, development of a list of valid addresses at/around the sample sites,
extraction and sampling of residential units at the selected addresses, and
validation of the sampled residential addresses.
Step 1: Selection of sample sites using a spatial sampling design: Since population representation
is an important requirement for social surveys, a sample of locations/areas which represent
population distribution is selected in the first step. A spatially organized sampling frame of
human population, however, is rarely available. We utilize high-resolution LandScan (ORNL,
2008) data to construct a pseudo-sampling frame of population (organized at 400m x 400m spatial
resolution). Since SES data were not available at the spatial resolution of the LandScan
population data, the coarser-resolution (tract-level) 2000 US Census data were used to
impute SES at the spatial resolution of the LandScan population data. Values of several adjacent
census tracts were used to impute the value for a given pixel, even though a pixel may lie completely
within a single census tract. This approach gives higher weight to nearer census tracts and also takes
into account the location of the pixel relative to the other (adjacent) census
tracts. The result is a pseudo-sampling frame of population with socio-economic
characteristics. The four methods of spatial sampling, described above, were employed to draw
four different sets of sample sites using this sampling frame (Fig 1).
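The imputation of tract-level SES to pixels, with nearer tracts weighted more heavily, can be illustrated with inverse-distance weighting. The paper does not name the weighting scheme it used, so the choice of inverse-distance weights, the `power` parameter, and the use of tract centroids are all assumptions made here for the sketch.

```python
import math


def idw_impute(pixel_xy, tract_centroids, tract_values, power=2.0):
    """Impute a pixel value as an inverse-distance-weighted mean of tract values.

    pixel_xy: (x, y) of the pixel centre in planar units.
    tract_centroids: list of (x, y) centroids of nearby census tracts (assumed inputs).
    tract_values: SES value (e.g. composite index) for each tract.
    power: distance exponent; larger values favour the nearest tracts more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (cx, cy), value in zip(tract_centroids, tract_values):
        d = math.hypot(pixel_xy[0] - cx, pixel_xy[1] - cy)
        if d == 0:
            return float(value)  # pixel sits exactly on a centroid
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Two hypothetical tracts: one 1 km away (index 10) and one 3 km away (index 30).
    print(idw_impute((0.0, 0.0), [(1000.0, 0.0), (3000.0, 0.0)], [10.0, 30.0]))
```

With `power=2`, a tract three times farther away receives one ninth of the weight, so the imputed value stays close to that of the nearest tract while still reflecting its neighbours.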
Step 2: Sampling domain for address listing: Since a site selected using a spatial sampling
method may not correspond to a respondent (or household or address), it is important to define the
domain or area within which to extract addresses. There are various ways to demarcate
the site domain, including the address closest to the site, the pixel within which the site is located, or
the jurisdiction or census unit within which the site is located. If only one respondent is to be
selected around each site and non-response is not a concern, the first option may be appropriate.
Since the shape and size of administrative/census units can vary, local spatial autocorrelation can
be employed instead to determine the domain or area within which to select addresses. Local spatial
autocorrelation determines the geographic extent within which values of the
chosen attribute are similar (Kumar, 2007, 2009; Kumar, Liang, Linderman, & Chen,
2010). Because the extent of local spatial autocorrelation can vary geographically, the domain
(or area) a site represents can vary as well (Harter, Eckman, English, &
O'Muircheartaigh, 2010). For our survey in the Chicago MSA, we computed a site-specific extent
using local spatial autocorrelation.
Step 3: Developing a list of addresses at/around the selected sample sites: Utilizing Google
Earth, we construct a valid list of addresses within the sampling domain of each site. The process
of locating an address on the earth’s surface is called geocoding (Rushton et al.,
2006). For spatial social surveys, however, we have the locations and need to extract addresses
at/around these locations. This process is called reverse geocoding (Curtis, Mills, & Leitner,
2006). This involves two important decisions: what geographic extent (or spatial domain) to use
to search for addresses around a sample site and how many addresses to extract.
If the goal is to sample residential addresses, address listing using reverse geocoding alone may not be
sufficient, because in commercial areas all addresses could be commercial. Keeping this
constraint in mind, we integrated reverse geocoding with web mining, which verifies whether
valid residential units are present within the domain of a site. If the required number of
residential units is not present within the domain, the domain can be expanded iteratively until a
valid list of residential units is prepared.
Ideally, all addresses present within the domain of a sample site should be included in the list from
which respondents are drawn. This process, however, can be computationally intensive, especially
in a densely populated area. The number of residential addresses to include in the list must therefore
be chosen with care, depending on the number of residential units to be drawn around each
sample site and on computation time. This number can be set to 10-20 times the number of respondents to be
drawn from the list. In this study, we listed 20 randomly selected valid
residential addresses within the domain of each site, and one of these 20 was chosen randomly.
We utilize the Google Application Programming Interface (API) for reverse geocoding.
This application allows us to pass in a location (as longitude and latitude) and
returns an address at/around the location. For example, passing in a location with longitude ≈
-91.53615 and latitude ≈ 41.661335 returns 35 E. Jefferson St., Iowa City, IA 52245 (Fig 4). In
addition, this application can return ten different levels of information at/around a location,
including Street Name, City, County, State, and Country. Since the location can be a business,
an institution, or a residential building (such as a house, an apartment complex, or a condominium),
it is important to determine whether valid residential addresses are present within the site domain. The
White Pages database allows us to determine whether an address has residential units.
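A reverse geocoding request of the kind described above can be sketched as follows. The paper does not show its code, and the interface available in 2010 differed from today's; this sketch assumes the current Google Geocoding web service (its `latlng` and `key` query parameters and its `status`, `results`, `types`, and `formatted_address` response fields). The helper names and the `YOUR_API_KEY` placeholder are hypothetical, and a real run needs a valid key and must respect the service's terms of use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_reverse_geocode_url(lat, lng, api_key):
    """Build the request URL for one latitude/longitude pair."""
    return GEOCODE_URL + "?" + urlencode({"latlng": f"{lat},{lng}", "key": api_key})


def first_street_address(response):
    """Return the first street-level address in a decoded JSON response, or None."""
    if response.get("status") != "OK":
        return None
    for result in response.get("results", []):
        if "street_address" in result.get("types", []):
            return result["formatted_address"]
    return None


def reverse_geocode(lat, lng, api_key):
    """Query the service (network access required) and extract a street address."""
    with urlopen(build_reverse_geocode_url(lat, lng, api_key)) as resp:
        return first_street_address(json.load(resp))


if __name__ == "__main__":
    # The Iowa City example from the text; prints only the URL, no request is sent.
    print(build_reverse_geocode_url(41.661335, -91.53615, "YOUR_API_KEY"))
```

Keeping URL construction and response parsing separate from the network call makes the parsing logic testable against saved responses, which matters when daily request quotas are tight.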
Since there can be multiple valid addresses within the geographic extent/domain of a sample site,
it is important that these addresses are chosen randomly unless all addresses are extracted from
the extent of a sample site. An ideal way to select addresses randomly is to simulate random
locations within the geographic extent of a sample site and select addresses at/around the
simulated locations. In this study, we simulate random points within the domain of a sample site,
find an address at/around each simulated point, validate these addresses, and then pass them on to
White Pages to extract valid residential units at these addresses. Since we simulate point
locations randomly within the extent of the selected site and then select the residential addresses
randomly for the survey, each residential address has an equal likelihood of selection without
replacement. Our implementation (through a built-in constraint) also ensured that each address
within the extent of each site has only one chance of selection. Iteratively, we develop a list of
twenty residential addresses around each site.
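The iterative listing procedure described above can be sketched as a loop that simulates points, looks up an address for each point, and keeps only addresses not seen before. The sketch assumes a circular site domain in planar coordinates and takes a caller-supplied `lookup` function standing in for the reverse geocoding and White Pages checks; both assumptions, and all names, are illustrative rather than taken from the paper.

```python
import math
import random


def random_point_in_disc(cx, cy, radius, rng):
    """Return a point uniformly distributed over a disc centred at (cx, cy)."""
    # The square root makes the density uniform over area rather than over radius.
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return cx + r * math.cos(theta), cy + r * math.sin(theta)


def build_address_list(site, radius, lookup, target=20, max_tries=1000, rng=None):
    """Collect up to `target` distinct residential addresses around a site.

    lookup(x, y) is a hypothetical callable returning a validated residential
    address string near (x, y), or None if there is none.
    """
    rng = rng or random.Random()
    found = []
    seen = set()  # enforces one chance of selection per address
    for _ in range(max_tries):
        if len(found) == target:
            break
        x, y = random_point_in_disc(site[0], site[1], radius, rng)
        address = lookup(x, y)
        if address is not None and address not in seen:
            seen.add(address)
            found.append(address)
    return found


if __name__ == "__main__":
    # Fake lookup that snaps points to a 50 m grid of made-up addresses.
    fake = lambda x, y: f"{round(x / 50)} {round(y / 50)} Example St"
    print(build_address_list((0.0, 0.0), 500.0, fake, rng=random.Random(42)))
```

The `max_tries` cap keeps the loop from running indefinitely in sparsely populated domains; when the list comes back short, the domain radius can be enlarged and the loop rerun.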
Step 4: Selecting the required sample from the address list: The proposed method can produce a
list of addresses (Ni) at or within the domain of a site. Once the list is prepared, any conventional
method of sampling can be employed to select the desired number of residential units (ni) around
each site. Nonetheless, the decision about ni can be influenced by a number of factors. For our
survey, we inflated ni to account for anticipated non-response and ineligibility due to
non-residential addresses and households that did not speak a supported language (Harter et
al., 2010). Assuming a 10% response rate and a 70% eligibility rate, obtaining one respondent per
site requires ni = 1/(0.70 × 0.10) ≈ 14.3, which rounds up to 15 households around each site.
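The inflation arithmetic above generalizes to any target number of respondents per site. The helper below is a small sketch with a hypothetical name; it rounds up because a fractional household cannot be sampled, and it rounds the intermediate quotient to guard against floating-point noise.

```python
import math


def inflated_sample_size(target_respondents, response_rate, eligibility_rate):
    """Households to sample so that the expected respondents reach the target."""
    if not (0 < response_rate <= 1 and 0 < eligibility_rate <= 1):
        raise ValueError("rates must lie in (0, 1]")
    raw = target_respondents / (response_rate * eligibility_rate)
    # round() removes noise such as 100.00000000000001 before taking the ceiling.
    return math.ceil(round(raw, 9))


if __name__ == "__main__":
    # One respondent per site, 10% response rate, 70% eligibility rate.
    print(inflated_sample_size(1, 0.10, 0.70))
```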
For parameter estimation and for deriving robust inferences from the survey data, it is important that
the probability of sampling an address is known. Let k=1,…,Ni index the residential addresses with
geo-location (xk,yk) within the domain (or area) that the ith site represents. Let j=1,…,Li index the LandScan
pixels that the ith sample site represents (including the pixel in which the sample site is
located). Suppose MOSj is the measure of size of pixel j, that is, the number of addresses
within pixel j. The probability that an address will be selected from the jth pixel can be expressed as
$$P(j \mid i) = \frac{MOS_j}{\sum_{j'=1}^{L_i} MOS_{j'}} \quad \text{for } j = 1, \ldots, L_i$$
Since the number of addresses present within each pixel (Nji) can vary, the overall selection
probability of an address within the jth pixel can be adjusted by the number of valid addresses
present in that pixel, as
$$P(k \mid i) = \frac{MOS_j}{\sum_{j'=1}^{L_i} MOS_{j'}} \times \frac{1}{N_{ji}}$$
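The two probabilities described in this step can be computed as below. The first function follows the text directly (pixel selection proportional to its measure of size). The second assumes that the adjustment for the number of valid addresses is a division by Nji, that is, an equal chance for each valid address within the chosen pixel; that reading, the function names, and the toy numbers are assumptions for illustration.

```python
def pixel_selection_probs(mos):
    """Probability of selecting each pixel, proportional to its measure of size."""
    total = sum(mos)
    return [m / total for m in mos]


def address_selection_prob(mos, j, n_valid_in_pixel):
    """Overall probability of one address in pixel j under the assumed adjustment.

    n_valid_in_pixel: number of valid addresses in pixel j (Nji in the text).
    """
    return pixel_selection_probs(mos)[j] / n_valid_in_pixel


if __name__ == "__main__":
    mos = [10, 30, 60]  # hypothetical address counts for three pixels of one site
    print(pixel_selection_probs(mos))
    print(address_selection_prob(mos, 1, 30))
```

The inverse of the overall selection probability is what would serve as a base design weight when estimating population parameters from the resulting sample.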
Step 5: Validation of the selected sample using the US Postal Service database: To demonstrate the
validity of reverse geocoding, we created a list of twenty addresses around each of 951 sample
sites (chosen using the four different methods of spatial sampling described above), and one
residential unit was drawn randomly from each list. We validated these addresses using
AccuZIP6, a comprehensive mailing list management system (AccuZIP, 2010). AccuZIP6 is
certified by the United States Postal Service (USPS) under its Coding Accuracy Support System
(CASS) and Presort Accuracy Validation and Evaluation (PAVE) programs. Of these 951 addresses, 938
(98.6%) were valid residential addresses. The thirteen addresses that were not validated are shown
in Fig 5. As evident from this figure, these addresses are distributed randomly and no systematic
bias is apparent. Further investigation of these addresses provided insight into the
limitations of reverse geocoding. The thirteen invalid addresses failed to match for four
different reasons: outdated data on dismantled buildings (Fig 6a), wrong nomenclature
for building type (an apartment building listed as a town house) (Fig 6b), a mismatch in the address
format between the USPS database and the White Pages database (Fig 6c), and the duplicate
listing of an address in multiple towns (Fig 6d). Although the number of invalid addresses
retrieved using reverse geocoding is small, it will be important to validate all addresses and
evaluate the reasons for invalid addresses. Doing so serves dual purposes: it provides a sense of the
completeness of addresses in the list, and it helps us understand the reasons for invalid addresses.
DISCUSSION: A finite and complete enumeration listing of the population in question is critically
important for social surveys. Field listing and existing database lists (such as the US Census and
USPS addresses) are widely used and accepted listing methods. Utilizing publicly available
datasets, this paper presents a novel method of constructing a list of addresses. Given the
computational and time constraints, we do not develop the entire list of residential addresses.
Simulating a random location within the extent of sample sites and then identifying the closest
residential address(es) around the simulated point ensured equal likelihood of selection of each
address. The sampling strategy also ensured that each address within the extent of a site has only
one chance of selection. The proposed method utilizes two different datasets, Google Earth
(Google, 2010) and White Pages (WhitePages, 2010), and can develop a list of addresses,
residential and household units, by geographic locations or by census/administrative units.
Although the proposed method can develop a list of residential addresses for any geographic
and/or administrative units and can be used for traditional social surveys, its unique benefits are
realized when utilized with the spatial sampling methods.
The proposed method has several important implications for social surveys. First, it can help
overcome the problems of over- and under-coverage of the population that the traditional
methods of listing suffer from. Over-coverage involves listing of units not in the area of interest;
under-coverage involves missing residential units that are of interest for the study. If residential
units of interest are missing or undesired units are included unintentionally, it can lead to
sampling bias. The field listing method has been accepted as a standard method for survey
research. It involves training an enumerator to identify residential units in the study area. The
enumerator is instructed to count a variety of residential units, including units in multi-family
dwellings, for example by counting mailboxes, doorbells, or utility meters. The field-listed data are usually
validated against census data.
Under-coverage is a well-known issue with the field listing method, which is also
expensive and time consuming. Researchers have therefore recently begun to utilize the database
listing method. For example, the United States Postal Service (USPS) maintains a database of
all the mail delivery points in the country, and commercial firms license this database
(AccuZIP, 2010; Valassis, 2010). The National Opinion Research Center (NORC) utilizes these
data for drawing samples for the GSS. These services can provide a list of mailing addresses (with
names in many cases) by different census units, including the census block. Metadata about the type
of each address are also available. Although a census block is a fine spatial unit, limited location
accuracy in geocoding these data can cause under-coverage, because addresses that cannot be
geocoded are eliminated. In addition, the cost of acquiring these data could be an
important issue.
As discussed earlier, the proposed method of address listing augments the scope of spatial
sampling for social surveys. Spatial sampling can ensure better spatial coverage and population
representation than the traditional methods of sampling. Since the enumeration list is drawn
using the geographic extent, it can help assess the population (by SES and demographic
characteristics) a list of residential addresses represents. The availability of precise locations can
also ensure contextualization of the survey data by socio-physical environments, such as the
number of recreational facilities or fast food stores or schools or pharmacies within the identified
spatial extent of a sample site. This, in turn, can attract researchers’ interest in these data from a
wide variety of disciplines, including public health, economics, ecology and sociology.
Although the reverse geocoding, integrated with White Pages, offers a unique and unprecedented
opportunity to develop a location-based address list, there are several limitations of this
methodology. First, the list is developed based on location or geographic extent or a boundary.
This means, locations or geographic units need to be defined for developing the list. Although
we suggest the use of local spatial autocorrelation (in the chosen attribute) to define this extent
(or site domain), pre-defined census or administrative units can also be used as site domain.
Extracting a complete list of residential addresses within the site domain can be computationally
intensive, especially for densely populated urban areas. To make our application computationally
efficient, we simulate random points within the site domain, reverse geocode these points and
then identify valid residential addresses closest to the random points. Since the address list
comprises addresses around the random points and the final respondents are drawn randomly
from the list, each address has equal likelihood of selection.
The reliability of enumeration units developed using the proposed method is largely dictated by
the quality and completeness of the georeferenced address dataset. If these data are not updated,
missing new addresses and lingering redundant or obsolete addresses could pose the problems
of under-coverage and over-coverage, respectively. Although 98.6% of the sampled addresses were valid
residential addresses, a few were not valid because some buildings (or addresses) had been
dismantled. In addition, there are limits on access to these publicly available
datasets: for example, Google Earth allows extraction of 15,000 addresses/day, and White Pages
allows two addresses/second and 200 addresses/day. Nonetheless, a commercial license to
these datasets may help overcome these limits. The validation results presented in this
study apply to the Chicago MSA. It is likely that the quality and completeness of the
georeferenced data used may vary geographically. Future research will be geared towards
implementing and validating this methodology of enumeration listing nationally.
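The practical effect of the access limits quoted above can be gauged with simple arithmetic. The sketch below assumes one request per listed address (951 sites with 20 addresses each), which is a lower bound, since simulated points that return invalid or duplicate addresses also consume requests. The function name is hypothetical.

```python
def days_needed(requests, daily_quota):
    """Whole days required to issue `requests` lookups under a daily quota."""
    return -(-requests // daily_quota)  # integer ceiling division


if __name__ == "__main__":
    lookups = 951 * 20  # sites x addresses listed per site
    print(days_needed(lookups, 200))     # White Pages quota from the text
    print(days_needed(lookups, 15_000))  # Google Earth quota from the text
```

Under these assumptions the residential check, not the reverse geocoding, is the bottleneck, which is consistent with the suggestion that a commercial license may be needed for larger studies.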
Intensifying competition is increasingly pressuring commercial firms to maintain
complete and reliable datasets and to develop more advanced features, such as API services, to
attract customers and businesses. A few years ago, Google Earth and MapQuest were the only
publicly available datasets of georeferenced addresses. In recent years, however, Microsoft and
Yahoo! have also ventured into this area and are making similar types of datasets available. Since
georeferenced address datasets are still lacking in many other countries,
especially developing countries, the proposed method of listing is currently of
little use there. With modern advances in inexpensive geospatial
technologies, however, it is likely that georeferenced address databases can be developed for these
countries, and the proposed method can then be extended to them as well.
References
AccuZIP. (2010). AccuZIP6 5.0. Retrieved 08/16/2010, from http://www.accuzip.com/
Caughy, M. O., & O'Campo, P. J. (2006). Neighborhood poverty, social capital, and the
cognitive development of African American preschoolers. Am J Community Psychol,
37(1-2), 141-154.
Chen, C., Gong, H. M., & Paaswell, R. (2008). Role of the built environment on mode choice
decisions: additional evidence on the impact of density. Transportation, 35(3), 285-299.
Curtis, A., Mills, J. W., & Leitner, M. (2006). Spatial confidentiality and GIS: re-engineering
mortality locations from published maps about Hurricane Katrina. International Journal of Health
Geographics, 5(44). doi: 10.1186/1476-072X-5-44
Frumkin, H. (2003). Healthy places: Exploring the evidence. American Journal of Public Health,
93(9), 1451-1456.
Google. (2010). Google Map. http://maps.google.com/maps/
Harter, R., Eckman, S., English, N., & O'Muircheartaigh, C. (2010). Applied Sampling for
Large-Scale Multi-Stage Area Probability Designs. In J. Wright & P. Marsden (Eds.),
Handbook of Survey Research (2nd ed., pp. 169-199). Bingley, UK: Emerald Group
Publishing Limited.
Kumar, N. (2007). Spatial Sampling for a Demography and Health Survey. Population Research
and Policy Review, 26(5-6), 581-599. doi: 10.1007/s11113-007-9044-7
Kumar, N. (2009). An optimal spatial sampling design for intra-urban population exposure
assessment. Atmospheric Environment, 43(5), 1153-1155.
Kumar, N., Liang, D., Linderman, M., & Chen, J. (2010). An Optimal Spatial Sampling Design
for Social Surveys. Iowa City.
ORNL. (2008). LandScan™ Global Population Database. Retrieved from http://www.ornl.gov/landscan/
Osborne, T., & Rose, N. (1999). Do the social sciences create phenomena?: the example of
public opinion research. British Journal of Sociology, 50(3), 367-396.
Rushton, G., Armstrong, M. P., Gittler, J., Greene, B. R., Pavlik, C. E., West, M. M., &
Zimmerman, D. L. (2006). Geocoding in cancer research - A review. American Journal
of Preventive Medicine, 30(2), S16-S24. doi: 10.1016/j.amepre.2005.09.011
Valassis. (2010). Valassis. http://www.valassis.com
WhitePages. (2010). WhitePages.com. http://www.whitepages.com/
Fig 1: Sample sites selected using four spatial sampling methods in urban Chicago MSA.
Fig 2: A contrast between social surveys and spatial social surveys.
[Flowchart; box labels: Social Survey; Spatial Social Survey; Defining and characterizing population; Characterizing sampling domain; Sampling frame; Pseudo sampling frame; Sampling method; A sample of locations; Reverse geocoding; Enumeration list of addresses at/around sample locations; A sample of individuals/addresses/households; A sample of addresses/buildings/]
Fig 3: Schematic of integrating reverse geocoding and White Pages.
[Flowchart; box labels: Contextualized spatial sampling pseudo frame; Sampling method; A sample of georeferenced locations; Reverse geocoding application; Location evaluation; Invalid location; Location adjustment; Valid street addresses at/around sample sites; Street address mining application; Publicly available address database (white pages/yellow pages); Address validation and filtering; List of residential addresses at/around sample sites; Final sample of residential addresses around sample sites]
Fig 4: An example of residential address listing around a sample site.
Fig 5: Spatial distribution of invalid addresses in urban Chicago MSA.
Fig 6: Some examples of invalid addresses.
a. Dismantled building: “468 Duane St, Glen Ellyn, IL, 60137” is a valid address, but it represents a dismantled building.
b. Wrong nomenclature: The address above seems to be an apartment building, but it is listed as a house.
c. Address format: The address is “1 N 640 State Route 59, West Chicago, IL, 60185”. In Google Maps the address is “1N640 Rte 59, West Chicago, IL 60185”. The WhitePages address is “1 N640 State Route 59 West Chicago, IL 60185”.
d. Double listing: Some addresses are listed in two towns, e.g. “43 Ickenham Ln, Elgin, IL 60124” and “43 Ickenham Ln, Campton Hills, IL 60124”.