Strategies for subsampling nonrespondents for economic ...

Survey Methodology

Catalogue no. 12-001-X ISSN 1492-0921

by Katherine Jenny Thompson, Stephen Kaputa, and Laura Bechtel

Strategies for subsampling nonrespondents for economic programs

Release date: June 21, 2018

Standard table symbolsThe following symbols are used in Statistics Canada publications:

. not available for any reference period

.. not available for a specific reference period

... not applicable 0 true zero or a value rounded to zero 0s value rounded to 0 (zero) where there is a meaningful distinction between true zero and the value that was rounded p preliminary r revised x suppressed to meet the confidentiality requirements of the Statistics Act E use with caution F too unreliable to be published * significantly different from reference category (p < 0.05)

How to obtain more informationFor information about this product or the wide range of services and data available from Statistics Canada, visit our website, www.statcan.gc.ca. You can also contact us by email at [email protected] telephone, from Monday to Friday, 8:30 a.m. to 4:30 p.m., at the following toll-free numbers:

• Statistical Information Service 1-800-263-1136 • National telecommunications device for the hearing impaired 1-800-363-7629 • Fax line 1-877-287-4369

Depository Services Program

• Inquiries line 1-800-635-7943 • Fax line 1-800-565-7757

Published by authority of the Minister responsible for Statistics Canada

© Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, 2018

All rights reserved. Use of this publication is governed by the Statistics Canada Open Licence Agreement.

An HTML version is also available.

Cette publication est aussi disponible en français.

Note of appreciationCanada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the publicStatistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, Statistics Canada has developed standards of service that its employees observe. To obtain a copy of these service standards, please contact Statistics Canada toll-free at 1-800-263-1136. The service standards are also published on www.statcan.gc.ca under “Contact us” > “Standards of service to the public.”

https://www.statcan.gc.ca

mailto:STATCAN.infostats-infostats.STATCAN%40canada.ca?subject=

https://www.statcan.gc.ca/eng/reference/licence

https://www150.statcan.gc.ca/n1/pub/12-001-x/2018001/article/54929-eng.htm

https://www.statcan.gc.ca

https://www.statcan.gc.ca/eng/about/service/standards

Survey Methodology, June 2018 75 Vol. 44, No. 1, pp. 75-99 Statistics Canada, Catalogue No. 12-001-X

1. Katherine Jenny Thompson, Economic Statistical Methods Division, U.S. Census Bureau, Washington, DC 20233. E-mail:

[email protected]; Stephen Kaputa, Economic Statistical Methods Division, U.S. Census Bureau, Washington, DC 20233. E-mail: [email protected]; Laura Bechtel, Economic Statistical Methods Division, U.S. Census Bureau, Washington, DC 20233. E-mail: [email protected].

Strategies for subsampling nonrespondents for economic programs

Katherine Jenny Thompson, Stephen Kaputa, and Laura Bechtel1

Abstract

The U.S. Census Bureau is investigating nonrespondent subsampling strategies for usage in the 2017 Economic Census. Design constraints include a mandated lower bound on the unit response rate, along with targeted industry-specific response rates. This paper presents research on allocation procedures for subsampling nonrespondents, conditional on the subsampling being systematic. We consider two approaches: (1) equal-probability sampling and (2) optimized allocation with constraints on unit response rates and sample size with the objective of selecting larger samples in industries that have initially lower response rates. We present a simulation study that examines the relative bias and mean squared error for the proposed allocations, assessing each procedure’s sensitivity to the size of the subsample, the response propensities, and the estimation procedure.

Key Words: Quadratic program; Unit response rate; Nonresponse adjustment; Systematic sampling; Optimal allocation;

Two-phase sampling.

1 Introduction

Many federal programs are simultaneously experiencing declining response rates and reductions in

funding. At the same time, these programs are required to maintain predetermined reliability levels and are

encouraged to collect an increased number of data items and to publish more statistics. Of course, as

nonresponse increases, the precision of the survey estimates will decrease from the original design levels

and can be sensitive to nonresponse bias. Consequently, many federal agencies are investigating adaptive

collection design strategies, where the term “collection design” refers to protocol(s) for collecting data.

With business surveys, the collection design may vary by type of unit. These populations are generally

highly skewed; the majority of a tabulated total in a given industry is often provided by a small number of

large businesses. Because publication statistics are generally industry totals or percentage change, missing

data from the largest cases can induce substantive nonresponse bias in the totals, whereas missing data from

the smaller cases (even those with large sampling weights) often have little apparent effect on the tabulated

levels (Thompson and Washington, 2013). Thus, the contact strategies are designed to ensure that the largest

cases provide valid response data. Figure 1.1 illustrates nonresponse follow-up (NRFU) procedures that

differ by a survey-specific unit size classification, where both collection designs have fixed calendar

schedules and a fixed NRFU budget.

For the large unit category, the NRFU procedures become progressively more costly (per unit) with the

exception of the final contact attempt. In contrast, with the smaller units, the NRFU procedures do not

include personal contact and are therefore less expensive.

Selecting a probability subsample of nonrespondents is a strategic feature of many responsive and

adaptive collection designs (Tourangeau, Brick, Lohr and Li, 2016). Of course, this is not a new practice

76 Thompson, Kaputa and Bechtel: Strategies for subsampling nonrespondents for economic programs

Statistics Canada, Catalogue No. 12-001-X

for surveys. Indeed, nonrespondent subsampling has been a survey practice since first discussed in Hansen

and Hurwitz (1946). Actually, the setting of the two-phase sample approach presented in Hansen and

Hurwitz (1946) paper is quite similar to the business survey setting discussed here: an “inexpensive” mailed

questionnaire to all sampled units (c.f. the “21st century design” that mails a letter containing a URL, user

name, and password), followed by “expensive” personal interviews of subsampled nonrespondents (c.f.

personal phone calls or certified reminder letters). Their proposed optimal allocation procedures are not

entirely dissimilar either, with the final allocations being highly dependent on whether the response rates

for each collection mode are known or estimated using auxiliary data rather than the previously collected

responses.

Figure 1.1 Nonresponse follow-up procedures for differing types of business in a fictional survey.

Fitting nonrespondent subsampling into a responsive or adaptive design framework is straightforward.

As originally proposed by Groves and Herringa (2006), responsive designs require a minimum of two

distinct phases of collection, with the second phase often being a probability subsample of nonrespondents

that occurs at the “phase capacity” when the survey estimates are no longer changing, providing evidence

the existing collection protocol is no longer cost-effective. Schouten, Calinescu and Luiten (2013)

characterize responsive designs as a special case of adaptive collection designs. With an adaptive collection

design, the data collection procedures can change (adapt) during the collection period. Paradata and sample

data are used to determine whether to change the current procedures. The overall budget is fixed, but the

implementation of a given strategy depends on (1) the realized sample of respondents at a point in time, (2)

informative data obtained during data collection about the respondents and nonrespondents, and (3)

information known in advance about the survey unit from the sampling frame. Consequently, selecting a

probability sample of nonrespondents for NRFU – instead of attempting to contact all nonrespondents –

falls under the adaptive design umbrella, with paradata (specifically response status) used to determine the

sampling frame and frame data (e.g., the unit’s size and industry classification) used as the basis of the

sample design.

The U.S. Census Bureau is investigating nonrespondent subsampling strategies for the 2017 Economic

Census (EC). Although a single program, the EC employs different sampling designs by sector (Probability

proportional to size for the Construction sector, cut-off sampling for the Manufacturing and Mining sectors

in collections prior to 2017, complete enumeration for the Wholesale Trade sector, and stratified simple

January February March April May Large Units Small Units

Final reminder

letter

Reminder letter

Robot call reminder

Certified mail reminder

Personal phone call

Reminder letter

Strong reminder

letter

Final reminder

letter

Survey Methodology, June 2018 77


random sampling without replacement (SRS-WOR) in the remaining sectors). Moreover, as is typical with

many business programs, it is a multi-purpose collection, with the general statistics items collected from all

surveyed units in a sector: examples include – but are not limited to – receipts/shipments, annual and first

quarter payroll, and total employment. In addition, the EC collects information on product sales, types of

which differ by sector and often industry. Imputation procedures differ by item, as do the estimators.

Consequently, the subsampling design must be robust to sampling and estimator to the largest extent

possible. We consider a systematic sample of nonrespondents sorted by a measure of size, a sampling design

known to be as efficient as stratified simple random sampling without replacement (SRS-WOR) on average

if the list is in random order and more efficient if the list is monotonic increasing or decreasing (Zhang,

2008; Lohr, 2010, Chapter 2, pages 50-51).

Ideally, the nonrespondent subsampling allocation procedure should be informed by properties of the

respondent sample during the collection period. Of course, if the program is designed to collect one or two

key items, then the allocation procedures should (at least attempt to) directly incorporate information on the

survey design and estimation procedure, as well as detailed cost information, as proposed in Hansen and

Hurwitz (1946) long-ago. In this case, one should use an optimal allocation procedure that minimizes costs

subject to (estimable) reliability constraints. See Harter, Mach, Chaplin and Wolken (2007) and Beaumont,

Bocci and Haziza (2014) for examples.

Such optimization is difficult to accomplish in the considered multi-purpose survey setting, especially

when strongly correlated auxiliary variables are not available for all items. However, the OMB Statistical

Standards for federal surveys require “survey (design) to achieve the highest practical rates of response,

commensurate with the importance of survey uses, respondent burden, and data collection costs” and

mandate nonresponse bias analyses for programs that fail to achieve these rates (Federal Register Notice,

2006). For nonrespondent subsampling occurring during the data collection cycle, imposing mandated lower

bounds on the program-level response rate and in specified domains (examples include sampling strata or

other post-strata such as industry code or type of government) is therefore a natural constraint to include in

the allocation procedure.

In this paper, we explore allocation approaches that address such constraints, with an overall objective

of selecting larger systematic subsamples in domains that have lower-than-targeted response rates. We

introduce two optimized allocation procedures, both formulated as quadratic programs and solved with

standard software packages: one that minimizes deviations between domain unit response rates and one that

minimizes deviations between domain subsampling intervals. Our case study compares the statistical

properties of subsamples obtained from each proposed allocation with three different estimators,

considering two ratio estimators commonly used by business surveys along with the simple expansion

(Horvitz-Thompson) estimator. The latter is not necessarily the most precise estimator when highly

correlated auxiliary data are available, but gives an “upper bound” on the variance increase due to

subsampling. The ratio estimators were selected to illustrate that the subsampling variance component can

be reduced by incorporating correlated auxiliary data at the estimation stage.

Note that the presented allocation procedure is designed specifically for business surveys and implicitly

assumes that largest units are excluded from the subsampling. In this case, the overall cost savings may not



be substantial because the majority of a program’s NRFU budget will be likely allocated to obtaining

responses from the designated larger cases. However, the estimate quality can be improved. By equalizing

response rates in considered domains, we hope to reduce the bias of the estimates by obtaining a respondent

set that resembles the parent sample. Moreover, equalizing the subsampling intervals should help avoid

overly increasing the sampling variance due to the second phase of selection, an unpleasant side effect of

the additional stage of sampling that can completely offset any bias reduction obtained via the probability

subsample (Biemer, 2010). And, it may be possible to further reduce both nonresponse bias and subsampling

variance via an improved ratio or regression estimation procedure, if related covariates are available.

Section 2 provides context, briefly introduces the studied estimators, and presents our allocation

procedures. Section 3 presents a simulation study that compares the statistical properties of the considered

estimators for each realized allocation. We conclude in Section 4 with recommendations and suggestions

for future research.

2 Methodology

2.1 Survey design and estimation

The general framework for our research is the two-phase sample design shown in Figure 2.1. The first

stage is a stratified probability sample with a total sample size of n from a finite population (frame) of size

,N performed before data collection begins. The survey is conducted, and units either respond or do not.

During the data collection, response rates are monitored in H domains, where the domains do not

necessarily equal the sampling strata. For example, total response rates might be monitored by three-digit

industry classification, although these industry sampling strata are further broken down by size class.

Furthermore, the domains could be independent of the original sampling strata e.g., race or sex categories

(resembling post-strata). Hereafter, the term “domain” refers to the nonrespondent subsampling strata,

indexed by 1, 2, , .h h H

The second stage of probability sampling occurs at a predetermined point in the data collection cycle

when we select an overall 1 in K subsample of size 1m from the m nonrespondents (a two-phase

sample); this predetermined point can be a fixed calendar date or via a responsive/adaptive design protocol.

The value of K is determined by the program managers, who take into account the overall budget for NRFU

(assumed fixed), mandated performance measures (e.g., response rates, coefficient of variation

requirements), and other operational considerations such as length of collection period and available

resources. Our allocation procedure determines the 1 in hK systematic subsample of size 1hm from the

hm nonrespondents in each domain. Only the sampled 1hm units receive NRFU.

Our objective is to estimate ,Y the population total of characteristic .y This estimate is 1 2ˆ ˆ ˆ

R RY Y Y

where 1ˆ

RY is estimated from the 1hr first-stage sample respondents and 2ˆ

RY is estimated from the 2hr

second-stage sample respondents (see Figure 2.1). Nonresponse adjustments to the 2hr subsampled

(responding) units assume a missing at random response (MAR) mechanism, treated as a Bernoulli sample

(Särndal, Swensson and Wretman, 1992, Chapter 15; Kott, 1994). We consider three different adjustment-

to-sample reweighting estimators of 2ˆ

RY (Kalton and Flores-Cervantes, 2003): the double reweighted

expansion (DE) estimator (Binder, Babyak, Brodeur, Hidiroglou and Wisner, 2000; Shao and Thompson,



2009; Haziza, Thompson and Yung, 2010), a separate ratio (SR) estimator that adjusts for unit nonresponse

using a covariate that is highly correlated with both response propensity and the survey characteristic of

interest (Shao and Thompson, 2009; Haziza et al., 2010), and a combined ratio (CR) estimator (Binder et al.,

2000). Formulae are provided in the Appendix.

Figure 2.1 Nonrespondent subsample from probability sample, selected during data collection (two-phase sample design). Unsampled nonrespondents do not receive NRFU.

These estimators require a minimum of 2 1hr in each domain and a minimum of 2 2hr for variance

estimation. These minimal conditions may not hold for several reasons. During the early stages of NRFU

collection, an insufficient number of the subsampled units might respond in a given domain. Alternatively,

the allocation procedure could determine that no subsampling is required in one or more domains. Lastly,

the allocation procedure could require 100-percent follow-up (all units subsampled) in selected domains;

henceforth, we refer to 100-percent follow-up/no subsampling as “full follow-up”. In these cases, the

estimation procedure ignores the last stage of sampling as if it did not occur and produces estimates for

domain h using the collapsed estimator formulae provided in the Appendix.

2.2 Allocation strategies

When all nonresponding cases are subjected to NRFU, respondent contact strategies focus on improving

overall response rates. Analysts might focus primarily on obtaining responses from soft refusal cases that

they believe have similar characteristics to previous respondents (“quick wins”), although this phenomenon

Respondents r1h Units

Unsampled Nonrespondents

1st Stage Probability Sample nh Units

Respondents r2h Units

Nonrespondents m2h Units

Initial Contact Strategies

NRFU procedures

Nonrespondents mh Units

Systematic Subsample

2nd Stage Probability Sample

m1h Units



is more likely when the survey collection is performed in the field, as with household or agricultural surveys,

and perhaps is less likely for internet or mail collections. With business surveys, the size of the unit is a

factor in the NRFU procedures as discussed in Section 1.

Our objective is to obtain a realized set of respondents that approximates a random subsample of the

originally selected sample via a probability sample of nonrespondents. With a probability sample, the

targeted cases represent a cross-section of the nonrespondent population. By focusing contact efforts on the

subsample, we hope to decrease the effects of nonresponse bias on the estimated totals by obtaining data

from all types of nonresponding units. Moreover, weighting or imputation methods may be more effective

at reducing the nonresponse bias effects with a probability subsample of nonrespondents (Brick, 2013).

Even though they do not receive additional NRFU, the unsampled nonrespondent cases may provide

responses later in the collection cycle. If so, an unbiased estimation procedure would not include the

unsampled late responses in the final estimate assuming that all subsampled units respond, as these units are

represented by the subsampled cases. However, this procedure is extremely distasteful to many survey

managers. Instead, we include their data in the tabulations as if they had responded before subsampling.

This does induce bias in the estimate. In practice, we ensure that this situation occurs infrequently by

subsampling late in the data collection cycle.

With a business survey that keeps track of little or no demographic information, most of the information

on the nonrespondents such as industry and unit size (e.g., total payroll, total receipts) is obtained from the

sampling frame. Sorting the nonrespondents within prespecified domains by unit size and selecting a

systematic sample should yield a subsample that resembles the originally designed sample in terms of unit

size composition. This is especially important for business surveys where responses tend to be obtained

from the larger units (Thompson and Washington, 2013). The choice of subsampling domain is determined

by overall survey objectives such as publication levels or by the adjustment cell design (e.g., weighting cells

or imputation classes), although computations are considerably simplified when the domain of interest is

the original sampling strata. In the EC, the industry is the domain of interest.

We consider two allocation approaches: (1) equal-probability sampling; and (2) optimized allocation

with constraints on unit response rates and sample size in predetermined domains. Equal probability

sampling is easy to implement and should have the lowest sampling variance among considered

nonrespondent subsampling allocation strategies, since the subsampling weight adjustment will be a

constant value in all domains. However, since the same proportion of nonrespondents is sampled in each

domain, the subsample may not be large enough to offset nonresponse bias effects on totals in low-

responding domains. We refer to the allocations obtained by equal probability sampling as Constant ,K

where K refers to the overall sampling interval 1 in .K

Our optimized allocation methods address the above concern by concentrating NRFU efforts in domains

that have low response rates, attempting to select sufficient cases to achieve the performance benchmarks.

This strategy may decrease the nonresponse bias in the totals if the response mechanism is MAR, conditional

on the auxiliary variables used to define the domains; see Wagner (2012). However, it can increase the

variance, as the subsampling intervals will differ and the weights will become more variable. To minimize

the additional sampling variance caused by differing sampling intervals, the domain nonrespondent



subsampling intervals should be close to the overall nonrespondent subsampling interval. To control costs,

the allocation should not select more units for NRFU than budgeted. Recall that the federal survey

environment requires that target response rates be achieved or nearly achieved, which makes all domains

“equally” important from a data collection viewpoint.

To describe the allocation procedures, we introduce additional notation:

Unit response rate: 1 2

URRh

hh

hh

r r

n

Target response rate: 1

TURRhh

h

h

h

h

q m Kr

n

Target domain response rate: 1

TURRh h

h

h h

h

r q m K

n

with 1hr units of the hn originally sampled units responding before subsampling, leaving hm units available

for subsampling in each domain. The unit response rate (URR) is the actual proportion of responding

sampled units (Thompson and Oliver, 2012) and does not include an adjustment for subsampling. The target

response rate TURR used for allocation is the expected maximum obtainable URR for a given overall

subsampling rate ,K with hq representing the conditional probability of ultimately responding to the

census/survey in domain ,h given that the unit did not respond prior to subsampling. In the allocation

procedure, hq can be modeled from historical data if available or can be assumed constant for a new survey

or for sensitivity analyses.

We formulate optimized allocation as a quadratic program and consider two different objective

functions. The first quadratic program minimizes the squared deviation of the target response rate in each

domain TURR h from the overall target unit response rate TURR , subject to linear constraints on the size

of nonrespondent sample. This objective function is analogous to the numerator of the Pearson chi-square

goodness-of-fit test.

The second quadratic program minimizes the squared deviation in domain sampling intervals from the

overall sampling interval K subject to linear constraints on the unit response rates in each domain and on

the number of sampled nonrespondents. Thus, although the optimization procedure allows the sampling

intervals to vary by domain, the program tries to avoid potentially large increases in variance caused by the

deliberately introduced “disproportionate sampling fractions” referred to in Kish (1992). We refer to the

allocations obtained from these quadratic programs as Min URR and Min K respectively.

Both quadratic programs are primarily deterministic. However, recall that at the allocation stage, we

must estimate the number of subsampled units that will eventually respond in each domain. Both quadratic

programs use Constraints (1) through (3) in Table 2.1. Constraint (4) is included in the Min K allocation

to ensure that the optimization solution is not hK K for all domain .h There are two limiting scenarios

(preconditions) that are addressed before the Min K optimization. First, domains whose T T URR URRh

before subsampling must be removed from the optimization problem .hK Second, if the estimated

unit response rate cannot be possibly achieved in a given domain for an assumed ,hq then all units in the



domain are selected for NRFU 1 .hK The Min K optimization is applied to the remaining domains,

requiring that these subsampled domains have expected URRs that meet or exceed the target URRs.

Using sample data containing respondents and nonrespondents, along with different specified values for

,hq we use the SAS PROC NLP (The data analysis for this paper was generated using SAS software.

Copyright, SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered

trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.) to solve the quadratic programs (obtaining

the set of ).hK The realized allocations are not integer values, and the real valued intervals hK were input

to SAS PROC SURVEYSELECT to select stratified systematic subsamples of nonrespondents. As noted

by one reviewer, this yields a solution that is randomly rounded but constrained at the overall required

sample size, and there may be some impact on reliability due to rounding error. Such effects were not studied

in this paper.

Table 2.1 Optimized allocation quadratic programs

Min URR Min K Purpose

Obj

ecti

ve

Fu

nct

ion

2T Tmin URR URRhh

22

1

min minh

hh h

h

mK K K

m

Con

stra

ints

(1) 1h hh h

K m m Selected sample size cannot exceed overall 1-in-K sample size

(2) 1 1h hm m Domain subsample cannot exceed number of nonrespondents in the strata

(3) 1 0hm Non-negativity constraint

(4) Not Applicable

1T

1T

T T

URR 1

URR

URR URR otherwise

h h hh

h

hh

h

h

r q mK

n

rK

n

Ensures that all domains achieve target URR as feasible.

3 Case study

This section presents the results of a simulation study that evaluates the considered allocation procedures

on empirical sample data from the Annual Survey of Manufactures (ASM) from the 2010 and 2011 data

collections. For more information on the ASM, see http://www.census.gov/manufacturing/asm.

The ASM is an establishment survey designed to produce “sample estimates of statistics for all

manufacturing establishments with one or more paid employee(s)” (http://www.census.gov/manufacturing/

asm/); it is a Pareto-PPS sample of approximately 50,000 establishments selected from a universe of

328,500. Approximately 20,000 establishments are included with certainty, and the remaining

establishments are selected with probability proportional to a composite measure of size. Selected units are

in the sample for the four years between censuses. Sampling strata are defined by six-digit industry code

using the North American Industry Classification System.



The ASM estimates totals with a difference estimator (Särndal et al., 1992). To reduce respondent

burden, units below a certain threshold are dropped from the sampling frame entirely. Instead, their data are

imputed using administrative data values for selected items and industry-level regression models for the

remaining items. Similarly, the ASM imputes complete records for unit nonrespondents. See

http://www.census.gov/manufacturing/asm/ for additional information on the ASM methodology.

Because the items collected by the ASM questionnaire are a subset of the EC’s manufacturing sector

items, the ASM is often used to pretest new EC processing or data collection procedures. With the ASM

and the EC, implementing a probability subsample of nonrespondents for NRFU represents a major

procedural change. The ASM NRFU procedures are very similar to the EC procedures. Because a given

company can comprise several establishments, the sets of multi-unit (MU) establishments corresponding to

the company can be designated for phone follow-up as well as other company completeness checks. In

contrast, the NRFU procedures for the single unit (SU) establishments – establishments with one location

and parent company – differ. The largest SU establishments are included with certainty (sampled with

probability = 1) and may receive a personal phone call in selected domains. The sampled SU establishments

(“SU noncertainty establishments”) receive some reminders, but are very unlikely to receive a personal

phone call.

Our simulation study examines one of the fourteen key ASM items and employs the double expansion

estimate and the two ratio estimators described in the Appendix, not the difference estimator used in ASM

production estimates. Consequently, our results should not be extrapolated to the ASM.

3.1 Simulation study design

Our simulation study compares the statistical properties of total shipment estimates obtained from the

three considered nonrespondent subsampling designs over repeated samples, using three different

estimators. Our sampling frame of nonrespondents is derived from the fully imputed 2011 ASM sample and

is limited to the SU noncertainty establishments so that the overall ASM publication reliability requirements

are maintained. The ratio estimators employ the sample-based values of annual payroll as an auxiliary

variable. This variable is highly correlated with total shipments, but is subject to imputation. Note that we

use the complete ASM sample (all MU and SU establishments) for the allocations but present the relative

bias and MSE results for the subsampled domains (SU noncertainty establishments) only.

For the SU noncertainty establishments, the first NRFU attempt – consisting of a reminder letter – is

historically very effective, so nonrespondent subsample selection occurs before the second NRFU attempt.

The second NRFU attempt is generally more expensive (historically a package re-mail, although reminder

letters via certified mail will be used in future collections). Nonrespondent subsampling of SU noncertainty

establishments occurs after the second contact attempt (i.e., after the first NRFU attempt).

To perform the simulation, we removed all MU establishments and SU certainty establishments from

the ASM sample data to create a frame, and then independently repeated the following procedure 5,000

times for each allocation procedure:

1. Using the estimated response propensities provided in Table 3.1, randomly induce nonresponse into

the sample using a MAR response mechanism.



2. Sort the induced nonrespondents by sampling weight.

3. Select a stratified systematic sample using the nonrespondent domain subsampling rates for a given

allocation strategy.

4. Simulate unit response for each round of NRFU. Table 3.1 provides the conditional response

propensities used for each distinct NRFU contact phase. These statistics use paradata from the 2010

and 2011 ASM collections (Fink and Lineback, 2013). Hereafter, we refer to these conditional

probabilities as “nonrespondent conversion rates”. If the unit responded, the mode of response is

randomly assigned using historical frequencies provided by subject matter experts. After assigning

response status/response mode to each unit, compute cumulative collection cost, URR, and

estimates.

5. For each allocation, repeat Step 4 until either ten rounds of follow-up have been conducted or the

total budget has been expended. If funds are exhausted within a round, then NRFU ceases. Given

that the fixed budget assumes that 1 K of the original set of nonrespondents will receive NRFU,

the budget can be exhausted under full follow-up. The total budget is never expended before ten

rounds of NRFU with nonrespondent subsampling, as the cost-per-unit of mailing a reminder letter

is quite low. Our choice of a maximum of ten rounds of NRFU in the simulation was subjective;

the purpose was to demonstrate that subsampling would facilitate additional contact efforts at no

additional cost.

Table 3.1 Nonrespondent conversion rates for noncertainty single unit establishments by NRFU contact round used for simulation

Domain Initial

Response Probability

Nonrespondent Conversion Rates for a given Round of Nonresponse Follow-up

1 2 3 4 5 6 7 8 9 10 1 0.31 0.27 0.15 0.17 0.24 0.12 0.06 0.03 0.03 0.03 0.03 2 0.44 0.32 0.24 0.15 0.36 0.18 0.09 0.05 0.05 0.05 0.05 3 0.39 0.28 0.24 0.18 0.11 0.06 0.03 0.02 0.02 0.02 0.02 4 0.35 0.36 0.17 0.19 0.18 0.09 0.05 0.02 0.02 0.02 0.02 5 0.25 0.19 0.13 0.10 0.17 0.09 0.04 0.02 0.02 0.02 0.02 6 0.27 0.13 0.29 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 7 0.44 0.34 0.23 0.20 0.25 0.13 0.06 0.03 0.03 0.03 0.03 8 0.38 0.45 0.12 0.33 0.25 0.13 0.06 0.03 0.03 0.03 0.03 9 0.39 0.30 0.23 0.13 0.25 0.13 0.06 0.03 0.03 0.03 0.03 10 0.75 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11 0.28 0.23 0.12 0.18 0.15 0.07 0.04 0.02 0.02 0.02 0.02 12 0.36 0.30 0.21 0.15 0.31 0.16 0.08 0.04 0.04 0.04 0.04 13 0.39 0.22 0.19 0.13 0.23 0.12 0.06 0.03 0.03 0.03 0.03 14 0.37 0.36 0.16 0.06 0.45 0.22 0.11 0.06 0.06 0.06 0.06 15 0.41 0.32 0.22 0.19 0.26 0.13 0.06 0.03 0.03 0.03 0.03 16 0.40 0.34 0.22 0.23 0.32 0.16 0.08 0.04 0.04 0.04 0.04 17 0.34 0.26 0.18 0.10 0.21 0.11 0.05 0.03 0.03 0.03 0.03 18 0.40 0.31 0.18 0.10 0.18 0.09 0.04 0.02 0.02 0.02 0.02 19 0.37 0.29 0.20 0.19 0.23 0.11 0.06 0.03 0.03 0.03 0.03 20 0.40 0.28 0.21 0.15 0.18 0.09 0.04 0.02 0.02 0.02 0.02 21 0.36 0.27 0.20 0.14 0.23 0.11 0.06 0.03 0.03 0.03 0.03



The nonrespondent conversion rates in the majority of domains follow the same pattern: a decaying

response probability followed by a slight increase in the fourth round due to a longer collection period.

Domain 10 does not follow this pattern; it contained only four units that all responded before subsampling

began. After the 4th round of NRFU, the nonrespondent conversion rates are reduced by half until they

achieve the minimum allowable value of 0.02. The pattern reflects the findings of Olson and Groves (2012)

(Olson and Groves (2012) postulate that the response propensities change over the collection cycle,

especially as data collection protocols are modified. With the ASM, the reminder letters become more

stringent at each NRFU contact phase. Likewise, the authors demonstrate that response propensities decline

over the collection phase when a stable data collection protocol is used, as reflected in nonrespondent

conversion rates). Mail and phone response propensity estimates were provided by subject matter experts,

as were approximate costs by mode and an overall budget figure.

To evaluate the statistical properties of each allocation method for each estimator, we computed the

relative bias and the mean squared error. The relative bias (RBE) for each estimate of total shipments at

NRFU phase t for a given sampling overall interval ,K allocation method a (Constant ,K Min ,K

Min URR), eventual response probability ,q and estimator e (DE, SR, CR) is

5,000

1RBE 100 * 15, 0

ˆ

00

eKaqtse s

Kaqt

YY Y

where ˆ eKaqtsY is the estimated total and Y is the population total shipments value.

The mean squared error at NRFU phase t for a given sampling interval, allocation method and estimator

is

25,000

1MSE 5,00ˆ 0 .e e

KaqtsKaqt sY Y Y

Since our simulation induces MAR response, the DE estimates should be approximately unbiased over

repeated samples, whereas the two ratio estimates should not be. However, the DE estimates are expected

to have large variance; using ratio estimators with a positively correlated auxiliary variable is expected to

reduce this variance (i.e., increase the precision). Thus, examining the MSE provides insight into the bias-

variance tradeoff.

3.2 Allocation

The simulation study uses data from the 2011 ASM collection. Input parameters for allocation were

estimated from 2010 ASM collection data. Recall that the target URR applies to the entire ASM program

and is not restricted to the subsampling domains - in our case, SU noncertainty establishments.

Consequently, the certainty SU and MU unit counts obtained from the 2010 ASM data are included in the

allocation programs in the 1hr as constants; the remainder of the 1hr represents the estimated count of

responding SU noncertainty establishments after the first round of NRFU is completed. To ensure that each

nonrespondent sampling domain contained sufficient numbers of units to obtain a feasible solution, we used

three-digit industry as NRFU sampling domain instead of the six-digit industry used for the ASM sample



design [Note: the determination subsampling domain was determined collaborative with the ASM program

managers and methodologists].

Both quadratic programs require an estimated probability of eventually responding to follow-up hq to

compute the TURR (overall and by domain). To assess the sensitivity of the allocation procedure, we tested

ten different constant values 0.10, 0.20, ,1 ,hq keeping the value constant across all domains. A

similar approach can be taken when historic paradata are not available. In addition, we estimate the hq

directly from the 2010 ASM data. These estimates vary by 20-percent at three-digit industry level. However,

the median of these is nearly 50-percent. Consequently, the allocation obtained using the estimated (historic-

data) hq values are very similar to those obtained with 0.50.q

Approximately $21,000 was allotted for NRFU of SU noncertainty establishments after subsampling.

With full follow-up, the expected final unit response rate was approximately 79%. Using data from the 2007

EC, Bechtel and Thompson (2013) found that the target industry unit response rates of 70% could only be

achieved in a 1 in 3 subsample if the average unit response rate in the majority of EC industries was

60% or larger before follow-up begins. With the ASM, the response rate prior to subsampling was

approximately 57%. Instead, we select an overall 1 in 2 subsample, which would save approximately

50-percent of the allotted budget after five completed rounds of NRFU at the cost of a decrease expected

response rate (69%). The additional five rounds of NRFU added approximately $4,000 to the total cost

without commensurate increases in response rate (70%). A larger subsample would be preferable in terms

of quality, but is not cost effective.

For allocation, we obtain the TURR , allowing the hq to vary by domain. The maximum URR is always

achieved with the Min URR quadratic program. Table 3.2 presents the target URRs and the allocation

subsampling rates obtained from the Min URR quadratic program. A dash (-) indicates no subsample is

selected for NRFU (a sampling interval of ). If 1,K all units in the domain are selected for NRFU (full

follow-up). A label of q <value> indicates that the eventual probability of respondent is the same

constant value in all domains; values estimated from historical data are labeled as hq Est. Recall that TURR includes all respondent units in the ASM sample, not just the noncertainty single units that are

eligible for subsampling. Consequently, selected domains have achieved their target URRs before

subsampling and are not considered as subsampling candidates in the allocation programs.

As the probability of eventually responding increases, this allocation tends to select smaller subsamples

in increasing numbers of domains. When the probability of an eventual response hq is small (20-percent

or less), then the allocations sensibly tend towards no subsampling or full follow-up, focusing on obtaining

sample from the few domains with the poorest response rates. As the probability of an eventual response

increases, the amount of subsampling tends to increase as well. At 70-percent, almost half of the domains

are allocated at least one sampled unit, thus spreading the allocated sample across several domains instead

of concentrating in a few domains that have exceptionally poor response rates. Note that rates below 20-

percent are (hopefully) unrealistic as are rates greater than 70-percent. Domain 10 has highly variable

sampling rates regardless; because all four units responded before subsampling, the quadratic program

selected any sampling rate because, in effect, it always subsamples zero cases.



Table 3.2 Min URR Allocations (Sampling Intervals) (Program Level 2K )

Min URR

Domain q = 10 q = 20 q = 30 q = 40 q = 50 q = 60 q = 70 q = 80 q = 90 q = 100 qh = Est1 - - - - - - - 81.63 9.23 5.40 - 2 - - - - - - 3.88 2.26 1.71 1.44 - 3 - - 9.32 3.40 2.58 2.19 1.98 1.86 1.77 1.71 2.12 4 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.06 1.13 1.00 5 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 6 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 - 7 - - - - - - - 14.95 9.26 7.10 - 8 - - - - - - - - - - - 9 1.00 1.00 1.22 1.44 1.61 1.76 1.89 2.01 2.12 2.22 1.62 10 1.03 30.26 30.37 30.26 30.46 29.90 30.51 29.04 1.00 10.03 10.04 11 - - - - - - - - - - - 12 - - - - - - - - - - - 13 - - - - 5.00 2.94 2.29 2.01 1.88 1.78 2.91 14 - - - - - - - - - - - 15 7.86 4.45 3.22 2.57 2.42 2.32 2.28 2.30 2.35 2.40 2.38 16 1.00 1.35 1.46 1.46 1.49 1.51 1.53 1.57 1.62 1.66 1.66 17 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 18 - - - - - - - - - 37.95 - 19 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 20 1.00 1.00 1.00 1.16 1.34 1.49 1.63 1.75 1.87 1.97 1.35 21 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

TURR 72.5% 72.9% 73.3% 73.7% 74.1% 74.4% 74.8% 75.2% 75.6% 76.0% 74.3%

Unlike the Min URR quadratic program, the Min K quadratic program did not always obtain a

solution for a given target URR because of the domain-level constraints on the target URRs. When this

occurred, we incrementally lowered the target response rate until a feasible solution could be obtained.

Table 3.3 presents the target URRs and the allocations obtained from the Min K quadratic program.

Both the allocation methods tend to designate the same domains for either no subsampling or for full

follow-up. However, the two methods produce very different allocations for the same hq in the subsampled

domains. The Min K allocations avoid subsampling in domains that have already achieved their maximum

estimated target URR, regardless of the probability of eventually obtaining a response, with 40- to

50-percent of the domains not being subsampled when 0.30 0.50.hq Otherwise, the subsampling tends

to be split between full follow-up (all units selected) or subsampling at an approximately 1 in 2

sampling rate. In short, the Min URR allocations yield domain subsampling intervals that can differ

considerably from the overall interval, as the allocation seeks to equalize the target URR in each domain.

The resultant variability in sampling intervals can lead to large increases in sampling variance. Because the

Min K objective function seeks to equalize sampling intervals, the domain subsampling intervals tend to

be less variable and are generally close to the overall sampling interval.



Table 3.3 Min K Allocations (Sampling Intervals) (Program Level 2K )

Min K (Target 2)K

Domain q = 10 q = 20 q = 30 q = 40 q = 50 q = 60 q = 70 q = 80 q = 90 q = 100 q = Est 1 - - - - - - - - - - - 2 - - - - - - - - 2.00 2.00 - 3 - - - - 1.99 2.00 2.00 2.00 2.00 2.01 1.99 4 1.00 1.00 1.00 1.00 1.00 1.00 1.01 1.10 1.18 1.26 1.00 5 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 6 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 7 - - - - - - - - - 2.06 - 8 - - - - - - - - - - - 9 1.00 1.32 1.44 1.72 1.90 1.99 1.97 1.96 1.96 2.09 1.90 10 - - - - - - - - - - - 11 - - - - - - - - - - - 12 - - - - - - - - - - - 13 - - - - - 1.99 1.99 1.98 1.98 2.04 - 14 - - - - - - - - - - - 15 2.52 2.23 2.36 1.90 1.76 1.97 1.92 1.90 1.90 2.27 1.76 16 2.17 2.08 1.71 1.83 1.90 1.97 1.97 1.96 1.96 2.09 1.90 17 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 18 - - - - - - - - - - - 19 - 2.01 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 20 1.00 1.00 1.11 1.36 1.57 1.75 1.90 1.97 1.97 2.06 1.59 21 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

TURR 71.0% 71.4% 72.3% 72.7% 73.1% 73.4% 73.8% 74.2% 74.6% 75.0% 73.3%

3.3 Results

Our baseline closely mimics the NRFU procedures used in the 2012 ASM NRFU – four phases of full

follow-up 1K but can include an additional incomplete fifth round when the planned budget was not

depleted to retain programming consistency. For other values of ,K NRFU is concluded after ten rounds

regardless of the remaining funds.

Table 3.4 presents the relative bias of the estimates (RBE) and the mean squared error (MSE) results

obtained with full NRFU and the Constant K allocation for each considered estimator. In all cases, the

unbiased double expansion (DE) estimator yields unbiased estimates, whereas the ratio estimators are

slightly biased as expected. With subsampling, the relative bias of the ratio estimators increases, whereas

the DE estimator remains unbiased. Regardless of estimator, the additional stage of subsampling increases

the sampling variance and consequently the MSE; the bias tends to remain unaffected because the

subsampled units are a representative subsample at each round of follow-up.

With equal probability subsampling (Constant ),K a subsample may contain a few sampled cases in

one or more domains. Although the subsampling weighting adjustment is not variable, the nonresponse

adjustment factors can be quite large. The optimal allocations are designed to equalize response rates across

domains, which can lead to occasionally “oversampling” in low-responding domains. Table 3.5 presents the

RBE and the MSE for the Min URR optimal allocations, using three different constant values of



0.30, 0.50, 0.70q q and the domain specific rates estimated from historical data ( hq Estimated). In

all scenarios, the DE estimates are unbiased, the CR estimates are slightly biased, and the SR estimates are

the most biased. This repeats the RBE pattern shown in the Constant K allocation results. Moreover, the

RBE estimates do not appear to be overly sensitive to values of hq used in allocation. Again, even with the

additional rounds of NRFU, the bias of the subsamples’ estimates is larger than that obtained with full

follow-up of nonrespondents. In all cases, the MSE of the estimates obtained from the optimal allocations

are smaller than those obtained with the Constant K allocations.

Regardless of estimator, the bias decreases when eventually probability of responding is low. This seems

a bit counterintuitive but is in fact a direct consequence of the subsampling allocation procedure. When the

probability of obtaining an eventual response is low, the Min URR allocation tends to subsample all or

no units in a domain. With full follow-up, all responding units within the same domain have the same

nonresponse adjustment. With a subsample, only the responding subsampled units’ weights are adjusted for

nonresponse and subsampling, in turn occasionally creating extremely variable weights within domain. As

the probability of an eventual response increases, then the optimal allocation has sample in more domains,

and finer adjustments are possible. With that said, the CR estimators tend to produce the lowest MSEs,

regardless of allocation.

Table 3.4 Summary of relative bias in percent of the estimate and MSE for Constant K allocations in x1012

Constant K Relative Bias of the Estimate

Percent K = 1 (Full) K = 2

Contact DE CR SR DE CR SR 2 0.01% 0.03% 0.10% 0.00% 0.51% 1.43% 3 0.00% 0.03% 0.08% -0.01% 0.29% 0.77% 4 0.00% 0.01% 0.06% -0.02% 0.14% 0.40% 5 0.01% 0.02% 0.04% -0.01% 0.12% 0.32% 6 -0.01% 0.11% 0.29% 7 0.00% 0.11% 0.28% 8 0.00% 0.11% 0.27% 9 0.00% 0.10% 0.25% 10 0.00% 0.10% 0.25%

Constant K Mean Squared Error

x10^12 K = 1 (Full) K = 2 Contact DE CR SR DE CR SR

2 4.96 2.60 5.56 37.53 26.34 70.49 3 3.67 1.96 4.17 19.82 13.80 28.88 4 2.55 1.39 3.03 11.75 8.30 14.87 5 2.48 1.39 2.87 9.94 7.10 12.12 6 9.36 6.75 11.16 7 9.09 6.63 10.63 8 8.80 6.48 10.23 9 8.51 6.32 9.95 10 8.27 6.19 9.74



Table 3.5 Summary of relative bias of the estimate and MSE for Min URR optimal allocations

Min URR RBE (Target 2K )

Percent q = 0.30 q = 0.50 q = 0.70 q = Estimated

Contact DE CR SR DE CR SR DE CR SR DE CR SR 2 -0.01% 0.06% 0.20% 0.01% 0.07% 0.36% 0.01% 0.08% 0.31% -0.01% 0.08% 0.32% 3 0.00% 0.05% 0.16% 0.01% 0.05% 0.26% 0.01% 0.07% 0.23% 0.01% 0.07% 0.23% 4 0.00% 0.05% 0.14% 0.01% 0.05% 0.18% 0.01% 0.06% 0.19% 0.01% 0.06% 0.17% 5 0.00% 0.05% 0.14% 0.01% 0.05% 0.18% 0.01% 0.05% 0.17% 0.00% 0.06% 0.15% 6 0.01% 0.05% 0.13% 0.02% 0.05% 0.17% 0.01% 0.05% 0.17% 0.00% 0.06% 0.15% 7 0.01% 0.05% 0.13% 0.01% 0.05% 0.17% 0.01% 0.05% 0.16% 0.00% 0.06% 0.15% 8 0.01% 0.05% 0.13% 0.01% 0.05% 0.16% 0.01% 0.05% 0.16% 0.00% 0.05% 0.14% 9 0.01% 0.05% 0.13% 0.01% 0.05% 0.16% 0.01% 0.05% 0.16% 0.00% 0.05% 0.14% 10 0.00% 0.05% 0.13% 0.01% 0.05% 0.15% 0.01% 0.05% 0.15% 0.00% 0.05% 0.14%

Min URR MSE (Target 2K )

x10^12 q = 0.30 q = 0.50 q = 0.70 q = Estimated Contact DE CR SR DE CR SR DE CR SR DE CR SR

2 12.55 6.77 16.03 14.35 7.48 17.60 14.31 7.44 17.15 14.39 7.79 17.05 3 8.88 5.13 10.87 9.80 5.43 11.32 9.57 5.42 11.36 9.75 5.47 10.98 4 7.00 4.28 8.15 7.61 4.45 8.28 7.31 4.43 8.44 7.53 4.44 8.05 5 6.61 4.07 7.41 7.02 4.17 7.44 6.80 4.13 7.60 6.94 4.14 7.42 6 6.45 3.97 7.16 6.78 4.09 7.18 6.62 4.03 7.25 6.75 4.05 7.15 7 6.37 3.92 7.05 6.68 4.06 7.08 6.55 3.97 7.07 6.67 4.02 7.03 8 6.34 3.90 6.94 6.57 4.01 6.97 6.45 3.93 6.95 6.57 3.98 6.93 9 6.28 3.87 6.86 6.50 3.98 6.89 6.39 3.90 6.86 6.47 3.94 6.84 10 6.23 3.85 6.78 6.40 3.91 6.76 6.35 3.87 6.75 6.42 3.89 6.73

The Min K allocation procedure is designed to reduce the variability in the subsampled units’

adjustment weights. Table 3.6 presents the relative bias of the estimate and MSE for the Min K optimal

allocation method. The Min K estimators display the same pattern as before. The DE estimates are

unbiased, the CR estimates are nearly unbiased and the SR estimates are slightly biased.

The MSE estimates for the Min K method follow a similar pattern as the Min URR method, as

expected due to the similarities between corresponding Min URR and Min K allocations. These results

appear to be relatively insensitive to assumed eventual probability of response .q The historical-data

estimated conversion rates produce similar results to an assumed q 0.50. In many cases, the Min URR

method produces the least biased estimates. However, bias is only a single component of the MSE, and the

Min URR allocations tend to have smaller expected number of respondents in several strata than their

Min K counterparts. Moreover, the Min K allocations have smaller sampling variances by design,

ultimately yielding estimates with lower MSEs than their Min URR counterparts.

Figures 3.1 and 3.2 plot the RBEs and MSEs obtained at each round of NRFU for the CR estimator (our

“best” estimator) using the hq obtained from historical data for each of the considered optimal allocation

methods along with the benchmark values (labeled as “Full Follow-up”). In Figure 3.1, the benchmark

estimates are the least biased. However, this extremely low bias is in part a consequence of our nonresponse

model, which is uniform within domain and NRFU phase. Neither of the optimal allocation estimates

attained the benchmark estimate levels, but they become very close after seven rounds of NRFU and the



RBEs of the Min URR and Min K CR estimates are less than one tenth of one percent (0.06% and

0.05% respectively). In summary, subsampling with either optimal allocation strategy yielded trivial biases

increases over full follow-up.

Table 3.6 Summary of relative bias of the estimate and MSE for Min K optimal allocations

Min RBEK (Target 2)K

Percent q = 0.30 q = 0.50 q = 0.70 q = Estimated Contact DE CR SR DE CR SR DE CR SR DE CR SR

2 0.03% 0.08% 0.24% 0.03% 0.09% 0.31% 0.00% 0.08% 0.33% 0.01% 0.07% 0.30% 3 0.03% 0.05% 0.20% 0.03% 0.08% 0.22% 0.00% 0.05% 0.22% 0.01% 0.06% 0.21% 4 0.02% 0.04% 0.16% 0.03% 0.07% 0.18% 0.00% 0.05% 0.17% 0.01% 0.05% 0.17% 5 0.02% 0.04% 0.15% 0.03% 0.06% 0.17% 0.01% 0.05% 0.16% 0.01% 0.05% 0.16% 6 0.02% 0.05% 0.14% 0.02% 0.06% 0.16% 0.01% 0.05% 0.15% 0.00% 0.05% 0.15% 7 0.02% 0.05% 0.14% 0.02% 0.05% 0.16% 0.01% 0.05% 0.15% 0.01% 0.05% 0.15% 8 0.02% 0.05% 0.14% 0.02% 0.05% 0.16% 0.01% 0.05% 0.15% 0.01% 0.04% 0.14% 9 0.02% 0.05% 0.14% 0.02% 0.05% 0.15% 0.01% 0.05% 0.14% 0.01% 0.04% 0.14% 10 0.02% 0.05% 0.14% 0.02% 0.05% 0.16% 0.01% 0.05% 0.15% 0.01% 0.04% 0.14%

Min MSEK (Target 2)K

x10^12 q = 0.30 q = 0.50 q = 0.70 q = Estimated Contact DE CR SR DE CR SR DE CR SR DE CR SR

2 12.86 7.19 15.85 13.81 7.42 16.80 15.09 8.34 18.00 13.43 7.19 16.07 3 8.74 5.04 10.26 9.32 5.38 10.82 10.45 5.89 11.38 9.25 5.30 10.69 4 6.92 4.07 7.65 7.26 4.26 7.92 7.84 4.60 8.19 7.22 4.33 7.93 5 6.50 3.85 7.07 6.77 4.05 7.33 7.23 4.28 7.47 6.65 4.06 7.21 6 6.32 3.80 6.80 6.57 3.94 7.02 7.02 4.19 7.28 6.45 3.95 6.91 7 6.23 3.76 6.69 6.49 3.88 6.91 6.90 4.15 7.16 6.31 3.91 6.78 8 6.21 3.73 6.61 6.39 3.84 6.82 6.78 4.10 7.06 6.23 3.87 6.68 9 6.16 3.70 6.54 6.35 3.79 6.71 6.68 4.05 6.93 6.15 3.83 6.57 10 6.10 3.66 6.43 6.24 3.74 6.62 6.60 3.98 6.87 6.11 3.80 6.48

Figure 3.1 Relative bias of the estimates (Historic )hq for the CR estimator.

R

elat

ive

Bia

s (%

)

0.0

2

0

.04

0.0

6

0

.08

Nonresponse Follow-up Round

Min-URR

Min-K

Full Follow-Up



Figure 3.2 plots MSE values by NRFU round using the CR estimator. The targeted nonresponse

sampling strategy used for the Min K allocation appears to reduce the overall error. We believe that this

is due to two factors. First, the Min K allocation procedure samples larger proportions of nonrespondents

in low responding areas than obtained with the Min URR allocations. Second, the quadratic formula for

the Min K allocation includes a constraint on the domain response rates, lowering the overall target

response but reducing the variability in the proportion of respondents by domain. Ultimately, this approach

yields similar response rates across sampling domains, indicative of a representative sample (Wagner, 2012;

Schouten, Cobben and Bethlehem, 2009). Note that the increased MSE is not trivial with nonrespondent

subsampling, even when using an adjustment procedure that benefits from a strong covariate in the ratio

adjustment procedure. This is an acknowledged price paid for nonrespondent subsampling (Biemer, 2010).

However, this additional variance component is measurable. If the measured component is too large, the

program managers can subsample less (use a larger ).K

Figure 3.2 Mean squared error (Historic )hq for the CR estimator.

3.4 Discussion

Given a sophisticated allocation method, a ratio estimator employing a highly correlated auxiliary

variable, and a fairly large subsample, this case study shows that nonrespondent subsampling does not

overly penalize quality to save cost. The additional stage of sampling increased the MSE for the studied

variable, but the level was reduced by the judicious choice of estimator. Of course, we consider only one

variable in our simulation, and this variable may or may not “behave” similarly to other survey items. One

referee suggested the usage of an R-indicator (Schouten et al., 2009) or balance indicator (Särndal and

Lundquist, 2014) to assess the overall representativeness of the respondent sets in a field survey setting.

This might be useful at later stages of data collection (after nonrespondent subsampling and during NRFU),

Mea

n S

quar

ed E

rror

x10

12

Nonresponse Follow-up Round

Min-URR

Min-K

Full Follow-Up



but would not provide any further insight into the degree of bias reduction on any collected item, as we can

do in this simulation setting.

Of the three considered allocation methods, the Constant K method had the worst performance, often

selecting a very small probability subsample when not needed and consequently increasing the sampling

variance without reducing the bias. Of the three considered allocation methods, the Min K allocation was

the most effective in realizing acceptable response rates and achieving reliable estimates; the larger bias

caused by the varying domain sampling intervals is generally offset by the reduced sampling variance.

However, implementation of the Min K allocation can be more challenging than the Min URR.

For both optimal allocation procedures, we tested four different eventual probabilities of response to

assess the sensitivity of the allocation procedures to these inputs. By comparing allocations obtained with a

constant assumed input value to those obtained using the empirical estimates, we found that the realized

allocations could over- or under- sample in selected domain, and the domain response rates could vary more

than expected when the actual (survey) values are quite different from the input values. Consequently, we

recommend using values estimated from historic paradata whenever possible.

If reducing cost is the overall goal, then we note that additional NRFU contact attempts beyond the fifth

contact did not improve the bias or MSE of the subsampled estimates in our case study. Of course, if the

achieved cost reduction for a 1 in 2 subsample with up to ten NRFU contact attempts is acceptable, the

funds allocated to these final contact attempts might be better expended earlier in the collection cycles using

other contact strategies.

4 Conclusion

In general, the NRFU procedures for economic programs conducted by the U.S. Census Bureau follow

a calendar schedule. Budget is tied to the fiscal year, and contact strategies are budgeted accordingly. Since

economic populations are highly skewed and the statistics of interest are totals, a large fraction of the NRFU

budget is allocated to the larger units. The smaller units are believed to be homogeneous – at least in size.

However, it is difficult to validate that belief in the absence of collected respondent data. Given that the

NRFU procedures rely on obtaining response data from the larger units, the response rates from smaller

units tend to be much lower. It is quite likely that the realized respondent set is neither “balanced…which

means (the selected sample has) the same or almost the same characteristics as the whole population” for

selected items (Särndal, 2011) nor “representative… with respect to the sample if the response propensities

i are the same for all units in the population” (Schouten et al., 2009). The emphasis on obtaining responses

from the larger units at the cost of the lower unit response in turn creates a bias in the estimates, as imputed

or adjusted values for smaller units resemble the large unit values (Thompson and Washington, 2013).

By limiting the target domain for nonrespondent subsampling to the smaller units, we can reduce this

unmeasurable bias. Our allocation method increases the potential of obtaining a balanced and representative

sample by targeting the low responding areas that usually would not receive any special treatment. It can be

implemented at any stage of the data collection process and with any sample design, making it quite flexible

although not necessarily optimal for specific sample designs and estimators. It is a “safe” approach for a



multi-purpose survey, presumably designed to obtain reliable estimates for a variety of items. Moreover,

selecting a systematic subsample from a list sorted by a unit measure of size avoids incidence of additional

nonresponse bias incurred by focusing NRFU efforts on high response propensity cases (Tourangeau et al.,

2016; Beaumont et al., 2014). We acknowledge that the increased variability in design weights and

reduction in response rates are less than desirable effects caused by subsampling. However, these effects

can be lessened via the choice of estimator, as demonstrated by our improved results with a ratio estimator.

More sophisticated calibration estimators or other collapsed estimators could likewise be considered at the

estimation stage.

Without probability subsampling, the contention that the realized respondent set of small businesses

remains a probability sample is debatable. Several discussions of the summary report of the AAPOR Task

Force on non-probability sampling (Baker, Brick, Bates, Battaglia, Couper, Dever, Gile and Tourangeau,

2013) specifically question whether “a probability sample with less than full coverage and high nonresponse

should still be considered a probability sample”. That question is certainly relevant in our studied context,

where sampled smaller units truly “opt in” to respond. Selecting a probability subsample of nonrespondents

and instructing survey analysts to limit NRFU contact to these cases may limit this phenomenon. In addition,

with a probability subsample, one can use accepted quality measures such as sampling error or response

rates for evaluation.

All of the results presented for our case study assume that the existing NRFU contact strategies are used

with the subsampled designs. However, subsampling nonrespondents without changing the data collection

procedure may have minimal tangible benefits besides cost reduction. The reverse is also true: for example,

Kirgis and Lepkowski (2013) present improved response data results for targeted small domains obtained

with probability samples and revised contact strategies.

Tourangeau et al. (2016) note that “it is not always clear how to intervene to obtain cases, particularly

cases with low underlying propensities, to respond”. This is especially relevant in the business survey

context. Business surveys can draw on a wealth of cognitive research on data collection strategies for large

companies: see Paxson, Dillman and Tarnai, 1995; Tuttle, Morrison and Willimack, 2010; Willimack and

Nichols, 2010; Snijkers, Haraldsen, Jones and Willimack, 2013. In contrast, the smaller businesses receive

very little personal contact (if any) and there is limited cognitive research on preferable contact strategies to

draw upon. That said, the literature suggests that there are differences in collected data quality between large

and small businesses: see Thompson and Washington (2013), Willimack and Nichols (2010), Bavdaž

(2010), Torres van Grinsven, Bolko and Bavdaž (2014), and Thompson, Oliver and Beck (2015). Additional

cognitive research for small establishments combined with field tests could yield better contact strategies.

Subsampling nonrespondents paired with a new contact strategy for these “hard to reach” establishments

would create a truly adaptive approach for all units, not just the larger ones. To this point, in response to

these presented analyses, the Census Bureau conducted an embedded field experiment to test alternative

NRFU strategies for selected small units in the 2014 ASM (Thompson and Kaputa, 2017). The outcome of

that study was a new NRFU protocol implemented in the 2015 ASM and a second embedded field

experiment that paired our proposed nonrespondent subsampling design with the most effective follow-up

procedures determined from the 2014 test (Kaputa, Thompson and Beck, 2017).



Acknowledgements

Any views expressed are those of the author(s) and not necessarily those of the U.S. Census Bureau. The

authors thank Eric Fink, Xijian Liu, Jared Martin, Edward Watkins III, Hannah Thaw, the Associate Editor,

and two referees for their review of an earlier version of the manuscript, David Haziza for his thoughtful

discussion of the paper, and Barry Schouten for his useful suggestions on the optimization problems.

Appendix

Our objective is to estimate ,Y population total of characteristic ,y from the realized sample of

respondents. Let

hiS 1 if unit i in domain h was in original sample; 0 otherwise.

hi the probability of sampling unit i in domain h into the original sample 1 .hi hiw

hiR 1 if unit i in domain h provided a response before subsampling time t (value for );y 0

otherwise.

hiI 1 if unit i in domain h was selected for NRFU (i.e., was a subsampled nonrespondent); 0

otherwise.

hiJ 1 if unit i in domain h responds, given selected into nonrespondent subsample; 0 otherwise.

hif adjustment factor for nonrespondent subsampling and unit nonresponse after NRFU.

hiy value of characteristic y for unit i in domain ,h available only for respondents.

hix value of characteristic x for unit i in domain ,h available for all sampled units considered for

nonrespondent subsampling (i.e., the nonrespondent subsampling frame). Then Y

1 2 ˆ ˆ1 .hi hi hi hi hi hi hi hi hi hi hi R Rh i h i

w y S R w f y S R I J Y Y

We consider three different adjustment-to-sample reweighting estimators of 2ˆ :RY

Double Expansion (DE): 1DE2

2

ˆ 1h

hi h hi hi hi hi hiRh i h h

mY w K y S R I J

r

Separate Ratio (SR): 1

2

SR2

ˆ 1h

h

hii m

hi h hi hi hi hi hiRh i h hi

i r

xY w K y S R I J

x

Combined Ratio (CR): 1

2

1CR2

12

2

ˆ 1 .h

h

hi h hih i mhi h hi hi hi hi hiR

hh i h hhi h hi

i rh

w K xmY w K y S R I J

mrw K x

r

Note that the DE and CR estimators are variations of the recommended reweighting procedure described

in Brick (2013) and are discussed in Binder et al. (2000) among others. The DE estimator is the InfoS

estimator presented in Särndal and Lundström (2005), studied in Shao and Thompson (2009), among others;



the SR estimator is a variation of the InfoP estimator presented in Särndal and Lundström (2005), treating

the realized sample as the “population”. Sampling weights were not included in the SR so that the adjustment

reduces to the DE adjustment when 1 ;hix i h note that this unweighted response rate adjustment is

recommended in Little and Vartivarian (2005). The CR estimator is presented in Binder et al. (2000), and

is also studied in Shao and Thompson (2009). In our case study, a better choice might have been the quasi-

randomization estimator from Oh and Scheuren (1983), which incorporates sampling weights in the

adjustment factor, thus reducing their variability.

Collapsed estimators are used in three scenarios: (1) All units in the domain receive NRFU (no

subsampling); (2) No units in the domain receive NRFU because response rate targets have been achieved

(no subsampling); and (3) A single subsampled unit responded to NRFU (subsampling). The collapsed

estimators analogues are given as follows:

Collapsed DE: DE,C

1 2

ˆ hhi hi hi hih

i h h h

nY w y S R

r r

Collapsed SR: 1 2

SR ,Cˆ 1h

h h

hii n

hi hi hi hi hi hihi h hi

i r r

xY w y S R I J

x

Collapsed CR:

1 2

CR ,C

1 2

1 2

ˆ .h

h h

hi hih i nhi hi hi hih

hi h h hhi hi

i r rh h

w xnY w y S R

nr rw x

r r

References

Baker, R., Brick, J.M., Bates, N., Battaglia, M., Couper, M., Dever, J., Gile, K. and Tourangeau, R. (2013). Summary report of the AAPOR task force on non-probability sampling – Report and rejoinder. Journal of Survey Statistics and Methodology, 1, 90-137.

Bavdaž, M. (2010). The multidimensional integral business survey response model. Survey Methodology,

36, 1, 81- 93. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2010001/article/11245-eng.pdf.

Beaumont, J.-F., Bocci, C. and Haziza, D. (2014). An adaptive data collection procedure for call

prioritization. Journal of Official Statistics, 30(4), 607-621. Bechtel, L., and Thompson, K.J. (2013). Optimizing unit nonresponse adjustment procedures after

subsampling nonrespondents in the Economic Census. Proceedings of the Federal Committee on Statistical Methods Research Conference, https://nces.ed.gov/FCSM/index.asp.

Biemer, P. (2010). Total survey error: Design, implementation, and evaluation. The Public Opinion

Quarterly, 74(5), 817-848. Binder, D., Babyak, C., Brodeur, M., Hidiroglou, M. and Wisner, J. (2000). Variance estimation for two-

phase stratified sampling. The Canadian Journal of Statistics, 28, 751-764.



Brick, J.M. (2013). Unit nonresponse and weighting adjustments: A critical review. Journal of Official Statistics, 29, 329-353.

Federal Register Notice (2006). OMB Standards and Guidelines for Statistical Surveys, Washington, DC.

Fink, E., and Lineback, J.F. (2013). Using paradata to understand business survey reporting patterns. Proceedings of the Federal Committee on Statistical Methods Research Conference, https://nces.ed.gov/FCSM/index.asp.

Groves, R., and Herringa, S. (2006). Responsive design for household surveys: Tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society Series A, 169(3), 439-57.

Hansen, M.H., and Hurwitz, W.N. (1946). The problem of non-response in sample surveys. Journal of the American Statistical Association, 41, 517-529.

Harter, R.M., Mach, T.L., Chaplin, J.F. and Wolken, J.D. (2007). Determining subsampling rates for nonrespondents. Proceedings of the Third International Conference on Establishment Surveys, American Statistical Association.

Haziza, D., Thompson, K.J. and Yung, W. (2010). The effect of nonresponse adjustments on variance estimation. Survey Methodology, 36, 1, 35-43. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2010001/article/11246-eng.pdf.

Kalton, G., and Flores-Cervantes, I. (2003). Weighting methods. Journal of Official Statistics, 19, 2, 81-97.

Kaputa, S., Thompson, K.J. and Beck, J. (2017). An embedded experiment for targeted nonresponse follow-up in establishment surveys. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kirgis, N., and Lepkowski, J. (2013). Design and management strategies for paradata-driven responsive design: Illustrations for the 2006-2010 National Survey of Family Growth. Improving Surveys with Paradata, (Ed., Frauke Kreuter). Hoboken, NJ: John Wiley & Sons, Inc.

Kish, L. (1992). Weighting for unequal Pi. Journal of Official Statistics, 8(2), 183-200.

Kott, P. (1994). A note on handling nonresponse in sample surveys. Journal of the American Statistical Association, 89, 693-696.

Little, R.J., and Vartivarian, S. (2005). Does weighting for nonresponse increase the variance of survey means? Survey Methodology, 31, 2, 161-168. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9046-eng.pdf.

Lohr, S.L. (2010). Sampling: Design and Analysis. Boston: Brooks/Cole.

Oh, H.L., and Scheuren, F.J. (1983). Weighting adjustment of unit nonresponse. Incomplete Data in Sample Surveys. New York: Academic Press, 20, 143-184.

Olson, K., and Groves, R.M. (2012). An examination of within-person variation in response propensity over the data collection field period. Journal of Official Statistics, 28, 29-51.

Paxson, M.C., Dillman, D.A. and Tarnai, J. (1995). Improving response to business mail surveys. In Business Survey Methods, (Eds., B.G. Cox, D. Binder, B. Nanajamma Chinnappa, M. Colledge and P. Kott). New York: John Wiley & Sons, Inc.



Särndal, C.-E. (2011). The 2010 Morris Hansen lecture: Dealing with survey nonresponse in data collection, in estimation. Journal of Official Statistics, 27, 1-21.

Särndal, C., and Lundquist, P. (2014). Accuracy in estimation with nonresponse: A function of degree of imbalance and degree of explanation. Journal of Survey Statistics and Methodology, 2(4), 361-387.

Särndal, C.-E., and Lundström, S. (2005). Estimation in Surveys with Nonresponse. Hoboken, NJ: John Wiley & Sons, Inc.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer Verlag.

Schouten, B., Calinescu, M. and Luiten, A. (2013). Optimizing quality of response through adaptive survey designs. Survey Methodology, 39, 1, 29-58. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2013001/article/11824-eng.pdf.

Schouten, B., Cobben, F. and Bethlehem, J. (2009). Indicators for the representativeness of survey response. Survey Methodology, 35, 1, 101-113. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2009001/article/10887-eng.pdf.

Shao, J., and Thompson, K.J. (2009). Variance estimation in the presence of nonrespondents and certainty

strata. Survey Methodology, 35, 2, 215-225. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2009002/article/11043-eng.pdf.

Snijkers, G., Haraldsen, G., Jones, J. and Willimack, D.K. (2013). Designing and Conducting Business

Surveys. Hoboken, NJ: John Wiley & Sons, Inc. Thompson, K.J., and Kaputa, S. (2017). Investigating adaptive nonresponse follow-up strategies for small

businesses through embedded experiments. Journal of Official Statistics, 33(3), 1-23. Thompson, K.J., and Oliver, B. (2012). Response rates in business surveys: Going beyond the usual

performance measure. Journal of Official Statistics, 27, 221-237. Thompson, K.J., Oliver, B. and Beck, J. (2015). An analysis of the mixed collection modes for two business

surveys conducted by the US Census Bureau. Public Opinion Quarterly, 79(3), 769-789. Thompson, K.J., and Washington, K.T. (2013). Challenges in the treatment of unit nonresponse for selected

business surveys: A case study. Survey Methods: Insights from the Field. Retrieved from http://surveyinsights.org/?p=2991.

Torres van Grinsven, V., Bolko, I. and Bavdaž, M. (2014). In search of motivation for the business survey

response task. Journal of Official Statistics, 30(4), 579-606. Tourangeau, R., Brick, J.M., Lohr, S. and Li, J. (2016). Adaptive and responsive survey designs: A review

and assessment. Journal of the Royal Statistical Society A, 180, 203-223. Tuttle, A., Morrison, R. and Willimack, D. (2010). From start to pilot: A multi-method approach to the

comprehensive redesign of an economic survey questionnaire. Journal of Official Statistics, 26, 87-103. Wagner, J. (2012). Research synthesis: A comparison of alternative indicators for the risk of nonresponse

bias. Public Opinion Quarterly, 76(3), 555-575.



Willimack, D., and Nichols, E. (2010). A hybrid response process model for business surveys. Journal of Official Statistics, 26, 3-24.

Zhang, L.C. (2008). On some common practices of systematic sampling. Journal of Official Statistics, 24,

557-569.

Strategies for subsampling nonrespondents for economic ...

Documents

Transcript of Strategies for subsampling nonrespondents for economic ...