Nonresponse bias in studies of residential mobility
![Page 1: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/1.jpg)
Nonresponse bias in studies of residential mobility
Elizabeth Washbrook, Paul Clarke and Fiona Steele
University of Bristol
Research Methods Festival, 3 July 2012
![Page 2: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/2.jpg)
The problem of panel nonresponse
• Household survey panel data permits social scientists to analyse a wide range of issues that cannot be addressed with cross-sectional data
• But the value of panel data is potentially undermined by nonresponse (dropout or intermittent missingness)
  – Smaller sample sizes reduce the efficiency of estimates
  – More seriously, selective nonresponse can lead to biased estimates: those who remain in the sample become untypical of the population as a whole
![Page 3: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/3.jpg)
Residential mobility application
• The study of residential mobility/migration is at the core of studies of demography and the life course: how do different groups change their housing or location in response to changing circumstances?
• Nonresponse issues are rarely considered in the substantive literature on mobility, yet there are reasons to think they might be even more of a problem here than in other applications.
• Moving house (the outcome of interest) is often cited as a key reason why people drop out of panel surveys, so movers who remain are not typical of movers as a whole
  – PSID 1968-1989 had a 51% attrition rate. Fitzgerald et al. (JHR 1998) provide data showing at least 20% of attritors were lost following a move
![Page 4: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/4.jpg)
A standard model for mobility
$$y_{it}^* = \beta' X_{it-1} + \mu_i + \varepsilon_{it}, \qquad y_{it} = I(y_{it}^* > 0)$$
$$\mu_i \sim N(0, \sigma_\mu^2), \qquad \varepsilon_{it} \sim N(0, 1)$$
$y_{it}^*$ is the unobserved latent propensity of individual $i$ to move in the interval $[t-1, t)$. Observed mobility status, $y_{it}$, depends on whether this propensity is greater or less than zero. $X_{it-1}$ is a vector of fully observed covariates measured at $t-1$ (prior to any potential move), including an intercept. $\beta$ is the coefficient vector of interest. $\mu_i$ is an individual random effect.
Similar models have been used in numerous studies of residential mobility and migration (e.g. Boheim & Taylor, 2002; Ioannides & Kan, 1996; Ermisch, 1999; Clark & Huang, 2003; Rabe & Taylor, 2010).
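As a rough illustration of the random-effects probit above, the following sketch simulates a small panel. The coefficient values, the single binary covariate, and the function names are all hypothetical choices made for the example; none come from the slides.

```python
import random

def simulate_mobility(n_people=2000, n_waves=8, b0=-1.3, b1=0.5,
                      sigma_mu=0.3, seed=1):
    """Draw (person, wave, x_lag, moved) rows from a random-effects probit."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_people):
        mu_i = rng.gauss(0.0, sigma_mu)        # individual random effect
        for t in range(2, n_waves + 1):        # a move is defined from wave 2 onwards
            x_lag = int(rng.random() < 0.3)    # hypothetical binary covariate at t-1
            # latent propensity to move in [t-1, t)
            y_star = b0 + b1 * x_lag + mu_i + rng.gauss(0.0, 1.0)
            rows.append((i, t, x_lag, int(y_star > 0)))  # observed move indicator
    return rows

def move_rate(rows, x_value):
    """Share of person-years with a move among rows where x_lag == x_value."""
    ys = [y for _, _, x, y in rows if x == x_value]
    return sum(ys) / len(ys)

rows = simulate_mobility()
print(len(rows), round(move_rate(rows, 0), 3), round(move_rate(rows, 1), 3))
```

With a positive coefficient on the covariate, the simulated move rate should be higher in the group where the covariate equals 1.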
![Page 5: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/5.jpg)
Modelling response
Define $R_{it} = 1$ if $y_{it}$ is observed and $R_{it} = 0$ otherwise.
Estimates of $\beta$ based on the sample where $R_{it} = 1$ will be biased unless the data are "missing at random" (MAR), that is, unless $\Pr(R_{it} = 1 \mid y_{it}, X_{it-1}) = \Pr(R_{it} = 1 \mid X_{it-1})$.
A vast literature has explored what happens when MAR doesn't hold, i.e. when response is nonignorable. Broadly there are two approaches.
• Specify and estimate a model for the missing data mechanism simultaneously with the outcome of interest (e.g. Hausman and Wise, 1979; Diggle and Kenward, 1994)
  o Has to rely on untestable assumptions about functional form and/or the validity of exclusion restrictions on the DGP (instrumental variables)
• Assess the sensitivity of estimates to small departures from the MAR assumption (e.g. Copas and Li, 1997)
  o Avoids modelling the unknown missing data mechanism but will not be valid if the nonignorability of response is extreme
![Page 6: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/6.jpg)
The direct dependence (DD) model
If moving directly affects whether an individual continues to participate in the panel then MAR is automatically violated. We can express this via a "direct dependence" (DD) model for the response propensity:
$$R_{it}^* = \delta' X_{it-1} + \gamma y_{it} + \nu_{it}, \qquad R_{it} = I(R_{it}^* > 0), \qquad \nu_{it} \sim N(0, 1)$$
The term $\gamma y_{it}$ captures the idea that the response propensity of an individual with given $X_{it-1}$ and $\nu_{it}$ will differ by the amount $\gamma$ if they move, relative to the case in which they do not move. In the mobility example we expect that $\gamma < 0$. Previous studies implicitly assume that $\gamma = 0$ so that MAR is satisfied.
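To see why a negative direct effect of moving on response distorts what respondents show, here is a minimal simulation sketch. It assumes a 10% true move rate and a response intercept of 1.645 (about 95% response for non-movers); both are illustrative assumptions. The default value of the direct effect is the DD estimate reported later in the slides.

```python
import random

def simulate_dd(n=50000, p_move=0.10, delta0=1.645, gamma=-1.465, seed=2):
    """Return (true move rate, move rate among respondents) under direct dependence."""
    rng = random.Random(seed)
    moves = responses = observed_moves = 0
    for _ in range(n):
        y = int(rng.random() < p_move)                     # mobility outcome
        r_star = delta0 + gamma * y + rng.gauss(0.0, 1.0)  # response propensity depends on y
        r = int(r_star > 0)                                # 1 if the outcome is observed
        moves += y
        responses += r
        observed_moves += y * r
    return moves / n, observed_moves / responses

true_rate, respondent_rate = simulate_dd()
print(round(true_rate, 3), round(respondent_rate, 3))
```

Setting `gamma=0.0` in the same function should bring the two rates back together, which corresponds to the MAR case.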
![Page 7: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/7.jpg)
An alternative response model
A common alternative model of response is the bivariate probit (BP) model. This sets $\gamma$ equal to zero by definition, but allows for correlated errors in the mobility and response equations.
A simple cross-sectional version of this model for a continuous outcome variable is the basis for the well-known two-step Heckman selection estimator.
$$R_{it}^* = \delta' X_{it-1} + \nu_{it}, \qquad R_{it} = I(R_{it}^* > 0), \qquad (\varepsilon_{it}, \nu_{it}) \sim \text{bivnorm}(0, 0, 1, 1, \rho)$$
In the BP model non-ignorability is essentially an omitted variables problem. If all relevant factors can be observed and controlled, $\rho$ could be driven to zero and MAR would be satisfied. This is not so in the DD model. The causal effect of a move on response will lead to biases even with no omitted variables.
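The contrast with the BP mechanism can be sketched in the same way. Here the move indicator does not enter the response equation at all; selection arises only through the error correlation. The index values `a` and `b` are hypothetical, chosen to give roughly a 10% move rate and a high response rate.

```python
import math
import random

def simulate_bp(rho, n=50000, a=-1.28, b=1.5, seed=3):
    """Return the move rate among respondents when the errors have correlation rho."""
    rng = random.Random(seed)
    responses = observed_moves = 0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)                                   # mobility error
        z = rng.gauss(0.0, 1.0)                                   # independent draw
        v = rho * e + math.sqrt(1.0 - rho * rho) * z              # response error, corr(e, v) = rho
        y = int(a + e > 0)                                        # moved
        r = int(b + v > 0)                                        # responded; y does not enter
        responses += r
        observed_moves += y * r
    return observed_moves / responses

print(round(simulate_bp(0.0), 3), round(simulate_bp(-0.419), 3))
```

With zero correlation the respondents' move rate should sit close to the population rate, while a negative correlation pulls it down.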
![Page 8: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/8.jpg)
Maximum likelihood estimation
The sample likelihood contribution for a given individual in a given year is $\mathcal{L}_{it}$. It takes one of three possible forms depending on the observed $y_{it}$ and $R_{it}$:

| Likelihood contribution | Response | Mobility | Interpretation [t-1, t) |
|---|---|---|---|
| $\mathcal{L}^A_{it}$ | $R_{it} = 0$ | Unobserved | Dropout |
| $\mathcal{L}^B_{it}$ | $R_{it} = 1$ | $y_{it} = 0$ | Remained in panel; no move |
| $\mathcal{L}^C_{it}$ | $R_{it} = 1$ | $y_{it} = 1$ | Remained in panel; moved |

$$\mathcal{L}_{it} = (1 - R_{it})\,\mathcal{L}^A_{it} + R_{it}(1 - y_{it})\,\mathcal{L}^B_{it} + R_{it}\,y_{it}\,\mathcal{L}^C_{it}$$
The log likelihood for individual $i$ is then
$$\log \mathcal{L}_i = \log \int_{-\infty}^{\infty} f_\mu(\mu_i) \left\{ \prod_{t=2}^{T_i} \mathcal{L}_{it} \right\} d\mu_i$$
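The integral over the random effect has no closed form in general and is usually evaluated numerically. The sketch below uses a plain trapezoid rule on a grid. That is an assumption made for simplicity, not necessarily the method used by the authors, and the function names are hypothetical. It is checked against the one case with a known answer: a single wave with a probit contribution.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_lik_individual(contribs, sigma_mu, n_points=401, width=8.0):
    """Log of the integral over mu of density(mu) times the product of contribs(mu).

    contribs: list of functions, one per wave, each mapping the random
    effect mu to that wave's likelihood contribution. The trapezoid rule is
    applied on a grid spanning +/- width standard deviations.
    """
    lo = -width * sigma_mu
    step = 2.0 * width * sigma_mu / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        mu = lo + k * step
        value = norm_pdf(mu / sigma_mu) / sigma_mu   # N(0, sigma_mu^2) density
        for contrib in contribs:
            value *= contrib(mu)
        weight = 0.5 if k in (0, n_points - 1) else 1.0
        total += weight * value * step
    return math.log(total)

# One wave, observed move, linear index a: the integral equals
# Phi(a / sqrt(1 + sigma_mu^2)), which gives a check on the quadrature.
a, sigma_mu = -1.0, 0.5
numeric = log_lik_individual([lambda mu: norm_cdf(a + mu)], sigma_mu)
exact = math.log(norm_cdf(a / math.sqrt(1.0 + sigma_mu ** 2)))
print(round(numeric, 6), round(exact, 6))
```

In practice Gauss-Hermite quadrature or adaptive quadrature would be a more common choice; the grid version is used here only because it needs no external libraries.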
![Page 9: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/9.jpg)
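Maximum likelihood estimation
$$\mathcal{L}_{it} = (1 - R_{it})\,\mathcal{L}^A_{it} + R_{it}(1 - y_{it})\,\mathcal{L}^B_{it} + R_{it}\,y_{it}\,\mathcal{L}^C_{it}$$

| Contribution | DD likelihood | BP likelihood |
|---|---|---|
| $\mathcal{L}^A_{it}$ | $\Phi(\beta' X_{it-1} + \mu_i)\,\Phi(-\{\delta' X_{it-1} + \gamma\}) + \Phi(-\{\beta' X_{it-1} + \mu_i\})\,\Phi(-\delta' X_{it-1})$ | $\Phi(-\delta' X_{it-1})$ |
| $\mathcal{L}^B_{it}$ | $\Phi(-\{\beta' X_{it-1} + \mu_i\})\,\Phi(\delta' X_{it-1})$ | $\Phi_2(-\{\beta' X_{it-1} + \mu_i\},\ \delta' X_{it-1},\ -\rho)$ |
| $\mathcal{L}^C_{it}$ | $\Phi(\beta' X_{it-1} + \mu_i)\,\Phi(\delta' X_{it-1} + \gamma)$ | $\Phi_2(\beta' X_{it-1} + \mu_i,\ \delta' X_{it-1},\ \rho)$ |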
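One way to sanity-check the three contributions is to confirm that, for any parameter values, they sum to one under each model, since dropout, stay-without-move and stay-with-move exhaust the possibilities. The sketch below does this with hypothetical index values. The bivariate normal CDF is computed by a simple Simpson's rule written for this example. Note that the no-move BP term is evaluated with the correlation sign reversed, because flipping the sign of the mobility index flips the sign of the correlation; that is the standard bivariate probit result and is what makes the probabilities add up.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=400):
    """Pr(U <= a, V <= b) for standard bivariate normal (U, V) with correlation rho.

    Integrates pdf(u) * Pr(V <= b | U = u) over u with Simpson's rule.
    """
    lo = min(a, 0.0) - 8.0                 # lower limit far enough into the tail
    h = (a - lo) / n
    s = math.sqrt(1.0 - rho * rho)

    def g(u):
        return norm_pdf(u) * norm_cdf((b - rho * u) / s)

    total = g(lo) + g(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return total * h / 3.0

def dd_contribs(m, d, gamma):
    """(A, B, C) under direct dependence; m, d are the mobility and response indices."""
    a = norm_cdf(m) * norm_cdf(-(d + gamma)) + norm_cdf(-m) * norm_cdf(-d)
    b = norm_cdf(-m) * norm_cdf(d)
    c = norm_cdf(m) * norm_cdf(d + gamma)
    return a, b, c

def bp_contribs(m, d, rho):
    """(A, B, C) under the bivariate probit model."""
    a = norm_cdf(-d)
    b = bvn_cdf(-m, d, -rho)               # correlation sign flips with the mobility index
    c = bvn_cdf(m, d, rho)
    return a, b, c

print(sum(dd_contribs(-1.2, 1.6, -1.465)), sum(bp_contribs(-1.2, 1.6, -0.419)))
```

Here `m` stands for the mobility index including the random effect and `d` for the response index; both names are introduced only for this illustration.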
![Page 10: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/10.jpg)
Exclusion restrictions
The models set out rely heavily on the untestable assumption that the error terms are normally distributed. Likelihood-based estimates that rely solely on functional form for identification are well known to be sensitive to failures of the distributional assumptions.
However, the imposition of exclusion restrictions (the inclusion of instrumental variables) can dramatically improve the stability and robustness of the model.
The inclusion of a response instrument (something that predicts response but not the outcome of interest) is common practice in the standard continuous-variable two-equation Heckman selection model.
The statistics literature has also explored the role of an outcome instrument (something that predicts the outcome but not response) and has shown that this weakens the modelling assumptions necessary for identification.
![Page 11: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/11.jpg)
Residential mobility in the BHPS
• The BHPS is a representative sample of 5,500 households in 1991, interviewed annually (18 waves of data on over 10,000 individuals).
• Sample of men aged 20-59, living in England or Wales in year t-1, from Waves 6-18 (1996-2008)
  – Full-time students and retirees excluded
  – Focus on men avoids the "double-counting" problem in which sample individuals move together as a couple
• 4,724 individuals contributing 33,347 person-year observations (mean 7.1)
![Page 12: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/12.jpg)
Residential mobility in the BHPS
• Outcome = 1 if individual moved to a different residence within the same region between t-1 and t (longer-distance moves coded 0)
  – The majority of moves are local (85% in this sample)
  – Motivations for short- and long-distance moves tend to be quite different: long-distance moves are more job-related while short-distance moves are more housing-related
• Outcome observed for 94.5% of observations, among which the mobility rate is 9.6%.
• 38% of sample individuals are known to have moved at least once, 16% more than once.
• 36% drop out of the panel at least once, 6% re-enter at a later wave
![Page 13: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/13.jpg)
Exclusion restrictions
Outcome instrument
  – Log average sale price of properties in region of residence over 12 months prior to t-1, deflated by RPI. From Land Registry data (only available for England and Wales from 1995 onwards).
  – Expect that high house prices will deter mobility, but will have no independent effect on response, conditional on year and region fixed effects.
Response instrument
  – Sample membership status. Original 1991 sample adult (OSM; omitted), 65%; ECHP joiner in 1997, 4%; Celtic booster sample joiner in 1999, 14%; parent of an OSM's child, 9%; original 1991 sample child, 8%. TSMs dropped.
  – Survey-related variables are often used as instruments in this context (e.g. Cappellari and Jenkins 2008). The rationale is that stronger survey attachment will have been fostered among OSMs than among later joiners or those involved only because of family ties.
![Page 14: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/14.jpg)
Results I. Nonignorability and IV parameters

| | DD model Coef | DD model SE | BP model Coef | BP model SE |
|---|---|---|---|---|
| Mobility instrument | | | | |
| Log region house price | -0.257 | 0.144 | -0.285 | 0.146 |
| p-value | p<0.10 | | p<0.10 | |
| Response instrument | | | | |
| OSM adult at Wave 1 | 0 [ref] | | 0 [ref] | |
| ECHP joiner | -1.159 | 0.079 | -0.984 | 0.047 |
| Celtic joiner | -0.194 | 0.064 | -0.163 | 0.054 |
| PSM joiner | -0.069 | 0.055 | -0.075 | 0.046 |
| Child joiner | -0.163 | 0.059 | -0.143 | 0.049 |
| Joint p-value | p<0.01 | | p<0.01 | |
| Random effect SD ($\sigma_\mu$) | 0.116 | | 0.105 | |
| Nonignorability parameter ($\gamma$) | -1.465 | 0.198 | - | |
| Error correlation ($\rho$) | - | | -0.419 | 0.129 |
| Log likelihood | -15138.5 | | -15148.4 | |
| Total obs | 33347 | | 33347 | |
| Uncensored obs | 31511 | | 31511 | |

Value of $\gamma$ implies moving reduces the expected response probability from 0.95 to 0.55.
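The size of this effect can be roughly reproduced from the probit form of the response equation. The sketch below assumes a single individual whose response probability as a non-mover is exactly 0.95 and shifts that person's probit index by the estimated direct effect. Under this assumption the result is about 0.57, close to but not identical to the 0.55 quoted. The slides do not say how the quoted figure was computed; it may, for example, be averaged over the sample.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def response_prob_if_moved(p_if_not_moved, gamma):
    """Shift the probit index implied by p_if_not_moved by gamma and convert back."""
    index = STD_NORMAL.inv_cdf(p_if_not_moved)   # probit index for a non-mover
    return STD_NORMAL.cdf(index + gamma)         # response probability for a mover

p_moved = response_prob_if_moved(0.95, -1.465)   # gamma taken from the DD estimate
print(round(p_moved, 3))
```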
![Page 15: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/15.jpg)
Results II. Covariates of interest

| | MAR (RE probit) Coef | MAR (RE probit) SE | DD model Coef | DD model SE | BP model Coef | BP model SE |
|---|---|---|---|---|---|---|
| Unemployed (ref = employed) | 0.105* | 0.049 | 0.160** | 0.048 | 0.138** | 0.050 |
| Inactive (ref = employed) | 0.066 | 0.053 | 0.100+ | 0.052 | 0.084 | 0.053 |
| Employed partner | 0.020 | 0.034 | 0.022 | 0.033 | 0.022 | 0.033 |
| Educ: O-level (ref = none) | 0.031 | 0.036 | 0.021 | 0.035 | 0.028 | 0.035 |
| Educ: A-level (ref = none) | 0.057+ | 0.035 | 0.022 | 0.035 | 0.042 | 0.035 |
| Educ: degree (ref = none) | 0.075+ | 0.043 | 0.021 | 0.043 | 0.055 | 0.043 |
| Single (ref = married) | 0.170** | 0.049 | 0.237** | 0.048 | 0.196** | 0.049 |
| Cohabiting (ref = married) | 0.138** | 0.035 | 0.154** | 0.034 | 0.143** | 0.034 |
| Private renter (ref = owner) | 0.884** | 0.037 | 0.865** | 0.039 | 0.883** | 0.037 |
| Social renter (ref = owner) | 0.265** | 0.043 | 0.305** | 0.042 | 0.283** | 0.042 |
| Lives with parents (ref = owner) | 0.110* | 0.050 | 0.118* | 0.049 | 0.113* | 0.049 |
| Household income (log) | 0.014 | 0.021 | 0.009 | 0.021 | 0.012 | 0.021 |
| Rooms per person | -0.100** | 0.017 | -0.096** | 0.017 | -0.099** | 0.017 |
| Age 30-39 (ref = 20-29) | -0.178** | 0.032 | -0.188** | 0.031 | -0.189** | 0.032 |
| Age 40-49 (ref = 20-29) | -0.462** | 0.040 | -0.478** | 0.040 | -0.476** | 0.040 |
| Age 50-59 (ref = 20-29) | -0.639** | 0.047 | -0.652** | 0.047 | -0.649** | 0.046 |

** p<.01, * p<.05, + p<.1. Models also control for year, region, children of different ages and log regional house prices.
![Page 16: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/16.jpg)
Results III. Response equation

| | DD model Coef | DD model SE | BP model Coef | BP model SE |
|---|---|---|---|---|
| Unemployed (ref = employed) | -0.235** | 0.056 | -0.265** | 0.047 |
| Inactive (ref = employed) | -0.136* | 0.055 | -0.148** | 0.047 |
| Employed partner | -0.009 | 0.041 | -0.008 | 0.035 |
| Educ: O-level (ref = none) | 0.031 | 0.038 | 0.018 | 0.033 |
| Educ: A-level (ref = none) | 0.143** | 0.039 | 0.107** | 0.033 |
| Educ: degree (ref = none) | 0.216** | 0.050 | 0.171** | 0.042 |
| Single (ref = married) | -0.218** | 0.058 | -0.275** | 0.049 |
| Cohabiting (ref = married) | -0.028 | 0.047 | -0.101** | 0.038 |
| Private renter (ref = owner) | 0.285** | 0.078 | -0.158** | 0.052 |
| Social renter (ref = owner) | 0.015 | 0.052 | -0.105* | 0.041 |
| Lives with parents (ref = owner) | 0.027 | 0.062 | -0.005 | 0.052 |
| Household income (log) | 0.019 | 0.023 | 0.009 | 0.020 |
| Rooms per person | -0.026 | 0.021 | 0.014 | 0.017 |
| Age 30-39 (ref = 20-29) | 0.020 | 0.048 | 0.125** | 0.039 |
| Age 40-49 (ref = 20-29) | -0.029 | 0.066 | 0.202** | 0.045 |
| Age 50-59 (ref = 20-29) | -0.082 | 0.079 | 0.213** | 0.049 |

** p<.01, * p<.05, + p<.1. Models also control for year, region, children of different ages and sample membership status.
![Page 17: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/17.jpg)
Conclusions
• Estimates of some predictors of moving house in the BHPS differ depending on whether or not attrition bias is accounted for in the analysis
  – The positive effect of unemployment is markedly larger than suggested by MAR estimates
  – The positive effect of economic inactivity (p<.1) is insignificant in the MAR estimates
  – Higher qualifications are no longer significantly associated with greater mobility when non-response is accounted for
• The direction of the changes implies that effects are underestimated for covariates negatively associated with response and overestimated for those positively associated with response
![Page 18: Nonresponse bias in studies of residential mobility](https://reader035.fdocuments.us/reader035/viewer/2022062315/56816094550346895dcfbc5c/html5/thumbnails/18.jpg)
Conclusions
• Both the DD and BP models reject ignorability of non-response. Corrections made by the two models are in the same direction, but larger under the DD model. The log likelihood suggests the DD model is a slightly better fit.
• Next steps: simulation studies to explore the effect of including exclusion restrictions of varying strengths when the error distribution is mis-specified
• The potentially causal nature of the relationship between mobility and nonresponse implies that it is particularly important to consider the issue in studies of mobility, and provides an a priori reason for favouring a DD-type response mechanism.
• There are other examples where the DD model may be more appropriate, e.g. studies modelling poor health as the outcome