Missing data: issues, methods and examples
James R. Carpenter
London School of Hygiene & Tropical Medicine
www.missingdata.org.uk
Support for JRC from ESRC, MRC, & German Research Foundation
October 20, 2010
Overview 2
  Acknowledgements 2
  Outline 3
Issues raised by missing data 4
  Motivation 4
  Shortcomings of current practice 5
  Key points and approach 6
  Missing data mechanisms 7
Implications for complete case analysis 8
  Complete case analysis 8
Multiple imputation 9
  Beyond a complete case analysis: assuming MAR 9
  Motivation for MI 10
  Why and when MI? 11
  Why ‘multiple’ imputation? 12
  Summary 13
MI algorithms 14
  Algorithms for MI 14
  Software taxonomy: methods derived from multivariate normal 15
  Chained equations/full conditional specification 16
  Comments 17
Pitfalls 18
  Likely pitfalls [26] 18
  Survival analysis 19
Sensitivity analysis 20
  Exploring departures from MAR 20
  Key idea 21
  Pattern mixture model 22
  Example: asthma data 23
  MNAR ‘pattern mixture’ analysis 24
  Selection Model analysis 25
IPW 26
  Inverse probability weighting 26
  GEE with missing data 27
  Inverse probability weighting estimating equations 28
IPW in practice 29
  Using IPW in practice, 1 29
  Using IPW in practice, 2 30
  Beyond IPW 31
Double Robustness 32
  Double Robustness, 1 32
  Double robustness, 2 33
  Simulation exploring double robustness 34
UK 1958 birth cohort 35
  Case study: UK 1958 Birth Cohort [8] 35
  Model of interest 36
  NCDS data: item non-response for model of interest 37
  Naive multiple imputation 38
  More careful multiple imputation 39
  Code 40
  Inverse probability weighting I 41
  Inverse probability weighting II 42
  Weight calibration 43
  Results: multiple imputation 44
  Results: inverse probability weighting 45
Summary 46
  Discussion 46
  References 47
  References II 48
Overview 2 / 48
Acknowledgements
John Carlin, Lyle Gurrin, Helena Romaniuk, Kate Lee (Melbourne)
Mike Kenward, Harvey Goldstein (LSHTM)
Geert Molenberghs (Limburgs University, Belgium)
James Roger (GlaxoSmithKline Research)
Sara Schroter (BMJ, London)
Jonathan Sterne, Michael Spratt, Rachael Hughes (Bristol)
Stijn Vansteelandt (Ghent University, Belgium)
Ian White (MRC Biostatistics Unit, Cambridge)
Parts of this talk relevant to clinical trials are based on a peer-reviewed book, ‘Missing data in clinical
trials — a practical guide’ (joint with Mike Kenward), commissioned by the UK National Health Service,
available free on-line at www.missingdata.org.uk. Also available from the website are expanded notes
and practicals.
Outline
– Issues raised by non-trivial proportions of missing data: the central role of assumptions
– When is complete case analysis OK?
– Missing data mechanisms: unpacking common jargon
– Multiple imputation
◦ intuition
◦ algorithms
◦ pitfalls
– Sensitivity analysis
– Inverse probability weighting and doubly robust methods
– Example: 1958 UK birth cohort
– Discussion
Issues raised by missing data
Motivation
Missing data are common. However, they are usually inadequately handled in both observational and experimental research.
• For example, Wood et al (2004)[32] reviewed 71 published BMJ, JAMA, Lancet and NEJM papers. In 37 trials
with repeated outcome measures, 46% performed complete case analysis. Only 21% reported sensitivity
analysis.
• Also, Chan et al (2005)[9] estimate that 65% of studies in PubMed journals do not report the handling of
missing data.
When data are missing, valid analysis and inference rests on assumptions that cannot be verified from the data
at hand.
Sensitivity analysis—where we explore the robustness of inference to a range of plausible assumptions—is therefore important.
The assumptions need to be framed in terms that those collecting the data can understand and comment on.
Shortcomings of current practice
Sterne et al (2009)[26] searched four major medical journals (NEJM, Lancet, BMJ, JAMA) from 2002–7 for
articles involving original research in which multiple imputation was used. They reported
‘Although multiple imputation is increasingly regarded as a ‘standard’ method, many aspects of
its implementation can vary and very few of the papers that were identified provided full details of the
approach that was used, the underlying assumptions that were made, or the extent to which the
results could confidently be regarded as more reliable than any other approach to handling the
missing data (such as, in particular, restricting analyses to complete cases).’
See also Klebanoff and Cole (2008)[16].
Key points and approach
Key points to keep in mind are:
• the question (i.e. the hypothesis under investigation)
• the information in the observed data
• the reason for missing data
As statisticians we need to
• consult with investigators to design to minimise missing data/information;
• consult with investigators to postulate plausible missingness mechanism(s), and
• perform a valid analysis under each assumed mechanism in turn, and interpret the results.
There are three broad classes of missingness mechanism: Missing Completely At Random (MCAR), Missing At
Random (MAR) and Missing Not At Random (MNAR). The following picture illustrates these.
Missing data mechanisms
Implications for complete case analysis
Complete case analysis
Suppose we are interested in the regression of Y on multivariate X.
For each individual (unit) let Ri = 1 be the indicator for a complete case.
If, for all i, [Ri|Yi,Xi] = [Ri|Xi], that is, if the probability of being a complete case, given the (possibly partially
observed) Xi, does not depend on Yi, then the complete case analysis is valid, though it may be quite inefficient.
To see this, consider

    [Yi|Xi, Ri = 1] = [Yi, Xi, Ri = 1] / [Xi, Ri = 1]
                    = ([Ri = 1|Xi][Yi, Xi]) / ([Ri = 1|Xi][Xi])   (under the above assumption)
                    = [Yi|Xi].
Note we do not need the same mechanism for each individual, i, but taking this to the limit becomes very
contrived!
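The result above is easy to check by simulation. The following sketch (hypothetical data and parameter values; Python rather than the Stata used later in these notes) makes the probability of being a complete case depend on X only, and the complete case regression of Y on X recovers the true coefficients:

```python
import random

random.seed(1)

n = 20000
# True model: Y = 1.0 + 2.0 * X + noise
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    y = 1.0 + 2.0 * x + random.gauss(0, 1)
    # Probability of being a complete case depends on x only, not on y
    p_complete = 0.9 if x > 0 else 0.4
    if random.random() < p_complete:
        xs.append(x)
        ys.append(y)

# Ordinary least squares fit on the complete cases only
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx
print(round(b0, 2), round(b1, 2))  # close to the true values (1.0, 2.0)
```

If `p_complete` were instead made to depend on y, the same complete case fit would generally be biased.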
Multiple imputation
Beyond a complete case analysis: assuming MAR
It will often be plausible that the data are MAR with a mechanism depending on Y .
1. In this case the complete case analysis is invalid.
2. However, valid inference can be drawn using multiple imputation under the MAR assumption.
Further, suppose we have additional covariates Z :
1. If these have information about the missing values, we can include this using multiple imputation under MAR.
2. If, in addition, they are important for the missingness mechanism to be plausibly MAR, including them will
generally reduce bias in the parameter estimates.
Motivation for MI
Suppose our data set has variables X,Y with some Y values MAR given fully observed X.
Using only subjects with both observed, we can get valid estimates of the regression of Y on X.
However, inference based on observed values of Y alone (eg sample mean, variance) is typically biased.
This suggests the following idea
1. Fit the regression of Y on X.
2. Use this to impute the missing Y.
3. With this completed data set, calculate our statistic of interest (e.g. sample mean, variance, regression of X on Y).
As we can only ever know the distribution of missing data (given observed), it seems natural to repeat steps 2 &
3, and then average the results in some way.
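These steps can be sketched as follows (hypothetical data; this is only the intuition — proper multiple imputation also propagates the uncertainty in the imputation-model parameters):

```python
import random, statistics

random.seed(2)
n, M = 5000, 20

# Fully observed X; Y is MAR given X (Y more likely missing when X > 0)
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = 2 * x + random.gauss(0, 1)
    observed = random.random() < (0.3 if x > 0 else 0.9)
    data.append((x, y if observed else None))

obs = [(x, y) for x, y in data if y is not None]

# Step 1: regression of Y on X among those with Y observed
mx = statistics.fmean(x for x, _ in obs)
my = statistics.fmean(y for _, y in obs)
b1 = sum((x - mx) * (y - my) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
b0 = my - b1 * mx
sd = statistics.stdev([y - (b0 + b1 * x) for x, y in obs])

# Steps 2 & 3, repeated: impute missing Y (with noise), compute the statistic
# of interest (here the mean of Y), and average over the repetitions
means = []
for _ in range(M):
    completed = [y if y is not None else b0 + b1 * x + random.gauss(0, sd)
                 for x, y in data]
    means.append(statistics.fmean(completed))

mi_est = statistics.fmean(means)               # close to the true mean, 0
cc_mean = statistics.fmean(y for _, y in obs)  # mean of observed Y alone: biased low
print(round(mi_est, 2), round(cc_mean, 2))
```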
Why and when MI?
Why?
1. MI is attractive, because once we have imputed the missing data, we can analyse the completed data sets
as we would have done if no data were missing.
2. MI is particularly attractive when we have missing covariates and/or auxiliary data, when other options are
relatively tricky.
When?
1. MI is not often needed if only (multivariate) responses are missing and our model of interest is a regression,
and we are prepared to assume MAR given the covariates in the model — for then we get valid estimates
from the observed data.
2. Thus MI is not as frequently used in trials as elsewhere, since in trials it is usually the outcome data that are missing.*

* Note: a missing baseline can be treated as an outcome in the analysis.
6
Why ‘multiple’ imputation?
One of the main problems with the single stochastic imputation methods is the need to develop appropriate
variance formulae for each different setting.
Multiple imputation attempts to provide a procedure that can get the appropriate measures of precision relatively
simply in (almost) any setting.*
Once we choose the imputation model, it proceeds automatically.
The key is thus appropriate choice of the imputation model. This should
1. be consistent (also known as congenial) with the scientific model of interest, and
2. appropriately incorporate relevant auxiliary variables.
* Note: if anything, inferences are mostly conservative. To explore this issue see [18, 15, 30, 25].
Summary
Assuming data are MAR, multiple imputation proceeds as follows:
1. model the data—including auxiliary variables—so that all the partially observed variables are responses;
2. impute the missing data from this model multiple times, taking full account of the variability (i.e. including the
uncertainty in estimating the parameters of the imputation model), and
3. fit the model of interest to each ‘completed’ data set, and combine the results using Rubin’s rules.
A more formal intuition for MI is given in, for example, Rubin (1987)[22]. See also [26, 14]
For a recent example in relative survival see Nur et al, 2009 [19], and for a discussion of issues in longitudinal
data see Spratt et al, [24].
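Rubin’s rules themselves are simple to implement for a scalar estimate. A sketch (the input estimates and variances below are made-up numbers for illustration only):

```python
def rubins_rules(estimates, variances):
    """Pool M completed-data estimates and their variances (Rubin, 1987)."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled point estimate
    w = sum(variances) / m                                 # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                # total variance
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2        # Rubin's degrees of freedom
    return qbar, t, df

# Made-up estimates and variances from M = 5 imputed data sets:
qbar, t, df = rubins_rules([1.1, 0.9, 1.0, 1.2, 0.8],
                           [0.04, 0.05, 0.04, 0.05, 0.04])
print(round(qbar, 2), round(t, 3), round(df, 1))
```

Here qbar is the pooled estimate, t the total variance (within- plus between-imputation, inflated by 1/M), and df the degrees of freedom for t-based confidence intervals.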
MI algorithms
Algorithms for MI
In order to do multiple imputation, it suffices to fit a model in which the partially observed variables are responses
and the fully observed variables are covariates.
This is tricky in general!
Thus, people have started with the assumption of multivariate normality, and tried to build out from that. Implicit
in this is that the regression of any one variable on the others is linear.
Skew variables can be transformed to (approximate) normality before imputation and then back transformed
afterwards.
With an unstructured multivariate normal distribution, it doesn’t matter whether we condition on fully observed
variables or have them as additional responses: so most software treats them as responses.
Software taxonomy: methods derived from multivariate normal

Response type →      Normal        Normal       Mixed response
Data structure →     Independent   Multilevel   Multilevel
Package:
  Standalone         NORM          PAN          REALCOM*
  SAS                NORM-port     —            —
  Stata              NORM-port     —            —
  R/S+               NORM-port     PAN-port     —
  MLwiN              MCMC algorithm emulates PAN + 1–2 binary

* Standalone & free from www.missingdata.org.uk. Interfaces with MLwiN and Stata.
All methods: General missingness pattern; fitting by Markov Chain Monte Carlo (MCMC) or data augmentation
algorithm (see [27, 23, 11]).
Relationships essentially normal/linear (except MLwiN, REALCOM).
Interactions must be handled by imputing separately in each group.
Schafer has a general location model package, relatively little used.
Chained equations/full conditional specification
An alternative to specifying an explicit joint distribution for the partially observed variables is to specify a sequence of
conditional models. This leads to the chained equations (also known as full conditional specification) algorithm:
1. To get started, for each variable in turn fill in missing values with randomly chosen observed values.
2. ‘Filled-in’ values in the first variable are discarded leaving the original missing values. These missing values
are then imputed using regression imputation on all other variables.
3. The ‘filled-in’ values in the second variable are discarded. These missing values are then imputed using
regression imputation on all other variables.
4. This process is repeated for each variable in turn. Once each variable has been imputed using the
regression method, we have completed one ‘cycle’.
5. The process is continued for several cycles, typically ∼ 10.
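A minimal sketch of this cycling scheme, for two continuous variables that are each partly missing (hypothetical data; for brevity the regressions are fit to all current values and the residual standard deviations are fixed rather than estimated or drawn, so this is regression imputation with noise rather than full FCS):

```python
import random

random.seed(3)

def fit_ols(pairs):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    b1 = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    return my - b1 * mx, b1

# Two correlated variables, each with ~20% of values missing (None)
n = 2000
u, v = [], []
for _ in range(n):
    a = random.gauss(0, 1)
    b = a + random.gauss(0, 1)
    u.append(a if random.random() < 0.8 else None)
    v.append(b if random.random() < 0.8 else None)

miss_u = [i for i in range(n) if u[i] is None]
miss_v = [i for i in range(n) if v[i] is None]
obs_u = [x for x in u if x is not None]
obs_v = [x for x in v if x is not None]

# Step 1: fill in missing values with randomly chosen observed values
for i in miss_u:
    u[i] = random.choice(obs_u)
for i in miss_v:
    v[i] = random.choice(obs_v)

# Steps 2-5: cycle through the variables, re-imputing each from the other
for _ in range(10):
    b0, b1 = fit_ols(list(zip(v, u)))          # regression of u on v
    for i in miss_u:
        u[i] = b0 + b1 * v[i] + random.gauss(0, 0.7)
    b0, b1 = fit_ols(list(zip(u, v)))          # regression of v on u
    for i in miss_v:
        v[i] = b0 + b1 * u[i] + random.gauss(0, 1.0)

mu, mv = sum(u) / n, sum(v) / n
print(round(mu, 2), round(mv, 2))  # completed-data means, near the true values (0, 0)
```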
Comments
The attraction of this approach is that linear regression models can be replaced with GLMs etc. for non-normal responses.
Software
SAS — IVEware
R — mice, mi
Stata — ice
There has been little theory for this approach, but recent work (Lee & Carlin (2010) [17]) found similar
performance to multivariate normal imputation, even for ordinal variables, provided appropriate rounding is used.
They also raised concerns about bias with FCS due to over-specification of models for unordered data, and
difficulties with large numbers of variables.
Ongoing work (unpublished) with Ian White and Rachael Hughes (Bristol) has found a condition for FCS to work,
and suggests when this condition doesn’t hold, errors are often small.
Multilevel and more complex data structures are more naturally handled using joint modelling.
Pitfalls
Likely pitfalls [26]
We have already noted that once the imputation model is specified, MI is essentially automatic.
It follows that most problems arise because of inappropriate choice of imputation model. For example:
1. omitting variables in the model of interest (e.g. response) or handling them incorrectly (e.g. with time to event
data);
2. handling non-normal variables incorrectly (though treating as continuous may work if probabilities are not too
extreme [1]);
3. having an imputation model which is inconsistent (uncongenial) with the model of interest;
4. overlooking key interactions;
5. an inappropriate MAR assumption, and
6. convergence problems with the MI model.
Point (3) usually arises when the model of interest contains non-linear relationships and/or interactions not in the
imputation model.
In passing, note the proposal for imputation of non-linear relationships of von Hippel (2009) [29] only works if
data are MCAR.
Survival analysis
Suppose the model of interest is a survival analysis with partially observed covariates.
When setting up the chained equations for the imputation, we need to remember to include the censoring
indicator as a covariate in all imputation models, together with a suitable measure of survival.
A paper by White and Royston (2009)[31] suggests the cumulative hazard is preferable.
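Assuming the Nelson-Aalen estimator is used for the cumulative hazard (one common choice), here is a small sketch of computing H(t) for each subject; these values, together with the censoring indicator, would then enter the imputation models as covariates:

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard H(t) at each subject's time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    """
    cum, H = 0.0, {}
    for t in sorted(set(times)):
        at_risk = sum(1 for s in times if s >= t)                      # risk set at t
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)  # events at t
        cum += d / at_risk
        H[t] = cum
    return [H[t] for t in times]

# Toy data: H(t_i) for each subject joins the event indicator as a covariate
times = [5, 8, 8, 12, 15]
events = [1, 0, 1, 1, 0]
H = nelson_aalen(times, events)
print([round(h, 3) for h in H])  # [0.2, 0.45, 0.45, 0.95, 0.95]
```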
Sensitivity analysis
Exploring departures from MAR
So far, we have assumed missing data are MAR.
Unfortunately, data may well not be MAR.
Therefore, we need to see if our conclusions are sensitive to plausible MNAR models.
They will almost certainly be sensitive to implausible MNAR models; but this tells us little.
Often, methods for sensitivity analysis are problem specific and opaque.
To be acceptable to collaborators, they need to be fairly general and transparent. However this does not
necessarily mean technically simplistic.
There are two broad approaches, ‘pattern-mixture’ and ‘selection’ models.
Key idea
Let Y be a partially observed variable, and X, Z be fully observed.
Let R = 1 if Y observed, and R = 0 otherwise.
The MAR assumption means [Y |X,Z,R = 0] = [Y |X,Z,R = 1].
In other words the statistical distribution of Y given X,Z is the same whether or not Y is seen.
This is directly exploited by MI under MAR: we estimate this distribution from the observed data and use it to
impute the missing data.
It follows that if data are MNAR, [Y |X,Z,R = 0] ≠ [Y |X,Z,R = 1].
Pattern mixture model
This suggests the following:
1. Multiply impute missing data under MAR
2. Change the imputations to represent likely differences in conditional distributions between observed and
missing data.
3. Fit the model of interest to each imputed data set, and combine the results using Rubin’s rules in the usual
way.
This is an example of the ‘pattern mixture’ approach: the imputed data is a mixture of potentially different
imputations for different patterns.
If the assumptions in Step 2 are valid, inference is valid.
The approach is potentially very general (see, e.g. Ch 6. of Carpenter & Kenward [6]).
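For a continuous outcome, Step 2 can be as simple as post-processing the MAR imputations. A sketch (hypothetical values; the 10% reduction is one assumption a collaborator might suggest):

```python
def delta_adjust(completed, was_missing, factor=0.9):
    """Pattern-mixture adjustment: scale only the values that were imputed
    under MAR (here, make them 10% lower); observed values are untouched."""
    return [v * factor if miss else v
            for v, miss in zip(completed, was_missing)]

completed = [1.8, 2.1, 2.0, 2.4]          # one MAR-imputed completed data set
was_missing = [False, True, False, True]  # which entries were imputed
adjusted = delta_adjust(completed, was_missing)
print(adjusted)
```

The model of interest is then refitted to each adjusted data set and the results combined with Rubin’s rules as usual.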
Example: asthma data

[Figure: FEV1 (litres) at 12 weeks plotted against baseline FEV1 (litres), for the ‘low’ and ‘high’ baseline groups.]

Low baseline group: 15/46 observed at 12 weeks; average of the 15 observed: 1.861 (l).
High baseline group: 20/46 observed at 12 weeks; average of the 20 observed: 2.230 (l).

Estimated average assuming MAR:

    (46 × 1.86 + 46 × 2.23) / (46 + 46) = 2.05 (l)
MNAR ‘pattern mixture’ analysis
First, notice that MAR means that, conditional on baseline, the distribution of the observed and missing responses is the
same.
Suppose the asthma data are MNAR.
To estimate the average % predicted FEV, we have to make additional assumptions.
For example: suppose we say that patients who withdraw have response 10% below that predicted assuming
MAR.
Then our new estimate of the average response at the end of the trial is:
    (1/92) × (1.861 × 15 + 1.675 × 31 + 2.230 × 20 + 2.007 × 26) = 1.920 (l).
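The arithmetic can be checked directly (numbers from the slide; the imputed group means are 10% below the observed ones):

```python
# Pattern-mixture estimate for the asthma data: observed group means as given,
# withdrawals imputed 10% below the MAR-predicted group means.
obs = [(1.861, 15), (2.230, 20)]              # (mean, n observed) by baseline group
imp = [(0.9 * 1.861, 31), (0.9 * 2.230, 26)]  # (imputed mean, n missing) by group
mnar = sum(m * n for m, n in obs + imp) / 92
mar = (46 * 1.861 + 46 * 2.230) / 92
print(round(mnar, 3), round(mar, 3))  # close to the slide's 1.920 and 2.05
```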
Selection Model analysis
The alternative general approach is to fit a selection model. Essentially, within this framework we now have two models: (i) our model of interest, and (ii) a model for the chance of observations being missing.
We fit these jointly for inference about our questions of interest.
The resulting likelihood has to be integrated before it can be maximised, and numerical or Monte-Carlo methods
are often used (e.g. [10, 3, 6]).
Alternatively, one can approximate the results by re-weighting the imputed data (Carpenter, Kenward & White,
2007; Carpenter, Kenward & Rucker, 2010 [7, 5]).
IPW
Inverse probability weighting
Most estimators are the solution of an equation like

    Σ_{i=1}^{n} U(xi, θ) = 0.

For example, if U(xi, θ) = (xi − θ), solving

    Σ_{i=1}^{n} (xi − θ) = 0

gives θ̂ = Σ_{i=1}^{n} xi / n.
Theory says that if the average of Ui(Xi, θtrue) over samples from the population is zero, our estimate will
‘home in’ on the truth as the sample gets large (consistency).
GEE with missing data
If some of the observations xi are systematically unobserved, then it follows that the corresponding U’s are missing
from the above sum.

Then the average of the remaining Ui(θtrue) is not zero, so estimates won’t ‘home in’ on the truth.

Thus, unless data are MCAR, GEEs generally give inconsistent estimates.
Inverse probability weighting estimating equations
However, now let

    Ri = 1 if xi is observed (which happens with probability pi), and
    Ri = 0 if xi is not observed.

Then the average of

    Σ_{i=1}^{n} (Ri / pi) U(xi, θtrue)

is zero, so parameter estimates will ‘home in’ on the truth as the sample size gets large.
Inverse probability weighting recovers consistent estimates when data are missing at random.
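A small simulation (hypothetical setup, with known response probabilities) illustrates this for the mean, i.e. U(xi, θ) = xi − θ; the IPW estimate solves the weighted equation and is a weighted mean:

```python
import random

random.seed(4)

# True mean of x is 0.5*2.0 + 0.5*0.0 = 1.0; x is MAR given fully observed z.
n = 40000
num = den = cc_num = cc_den = 0.0
for _ in range(n):
    z = random.random() < 0.5
    x = random.gauss(2.0 if z else 0.0, 1.0)
    p = 0.9 if z else 0.3                 # known probability that x is observed
    if random.random() < p:
        num += x / p                      # contributes (R_i/p_i) * x_i
        den += 1 / p                      # contributes (R_i/p_i)
        cc_num += x
        cc_den += 1

ipw_mean = num / den                      # solves sum (R_i/p_i)(x_i - theta) = 0
cc_mean = cc_num / cc_den                 # unweighted complete-case mean
print(round(ipw_mean, 2), round(cc_mean, 2))  # IPW near 1.0; complete-case near 1.5
```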
IPW in practice
Using IPW in practice, 1
Two things are needed for IPW methods:
1. a model relating the response to the explanatory variables or covariates, and
2. a model for the probability of missing observations.
Model (1) is usually a regression; it is the part of the analysis we would do if the data were fully observed.
Model (2) is usually a logistic regression, which provides an estimate of the probability of seeing the partially
observed variable for each individual in the data set.
In cases where there are several patterns of missingness, several logistic regressions will be required for (2).
Both (1) and (2) can be fitted in all standard statistical packages.
Using IPW in practice, 2
To obtain IPW estimates, we simply calculate
weighti = 1/{fitted probabilityi from model (2)},
and then refit model (1) using these weights. Again, this can be done in all standard software packages.
Estimating a standard error for the IPW estimate is more tricky, and should ideally include the variability in the
weights. This is rarely available in standard software packages.
One option is a sandwich estimator, e.g. [4].
Beyond IPW
• IPW estimates are consistent, but biased for finite samples. This is not usually an issue in practice.
• If the probability of response model is wrong, then the IPW estimates may be badly biased: it’s a good idea to
graph the weights.
• IPW analyses are not terribly efficient relative to likelihood based analyses (including multiple imputation).
• Interestingly, even if the true weights are known, perhaps because observations are missing by design, more
precise estimates will be obtained with estimated weights, provided the model is correct. The advantage of
knowing the true weights lies in knowing exactly which variables to put into your model for response, not in
their value.
Double Robustness
Double Robustness, 1
Robins, Rotnitzky and co-workers at Harvard have published a series of papers over the last decade outlining
methods for improving the efficiency of IPW estimators. This is most accessibly reviewed by Tsiatis (2006) [28].
Their approach leads to doubly robust estimators. Such estimators require three models:
(i) Model relating the response to explanatory variables/covariates of interest
(ii) Model for the probability of observing variables
(iii) Joint model of the mean of (functions of) partially observed variables given fully observed variables.
Double robustness, 2
Whatever method we use (indeed, even if we have no missing data), if model (i) is wrong, our estimates will typically be inconsistent.
However, doubly robust estimators have the intriguing property that if either model (ii) or model (iii) is wrong, the
estimators in model (i) are still consistent.
This is because in either case, the expectation of the estimating equation is still zero.
This robustness to two possible sources of error is known as double robustness.
By contrast, in MI, if the imputation model is wrong, parameter estimates are generally inconsistent.
The paper by Kang and Schafer (2007) [13] is a useful starting point for the debate about the pros and cons of
doubly robust estimators.
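A sketch of such an estimator for the mean of a partially observed Y (the augmented IPW form; data and models here are hypothetical). With the outcome model correct, deliberately misspecifying the response-probability model leaves the estimate consistent, as the double robustness property says:

```python
import random

random.seed(5)

n = 40000
dr = dr_wrong_p = 0.0
for _ in range(n):
    z = random.gauss(0, 1)                  # fully observed covariate
    y = 1.0 + 2.0 * z + random.gauss(0, 1)  # partially observed; true mean is 1.0
    p = 0.8 if z > 0 else 0.4               # true P(y observed | z): MAR given z
    r = 1 if random.random() < p else 0
    m = 1.0 + 2.0 * z                       # correct outcome model E[y | z]
    # Augmented IPW estimating function for the mean of y:
    dr += (r / p) * y - (r / p - 1) * m
    # Same, but with a misspecified response model (constant 0.6):
    dr_wrong_p += (r / 0.6) * y - (r / 0.6 - 1) * m

dr /= n
dr_wrong_p /= n
print(round(dr, 2), round(dr_wrong_p, 2))  # both close to the true mean, 1.0
```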
Simulation exploring double robustness
Carpenter et al (2006) [4] discussed doubly robust methods and multiple imputation, and reported a simulation
study with a covariate MAR, where the mechanism involved the response.
The following table gives a brief summary.
                              Average     Z-score     % of 95% intervals
Method                        estimate    for bias    above    below
OLS                           0.03325     −87         0        50
Doubly robust (DR)            0.08573     −0.4        2.6      2.4
DR IPW (dropout wrong)        0.08563     −0.5        3.0      2.3
DR IPW (cond mean wrong)      0.08563     −0.5        2.2      1.7
MI                            0.08611     −0.2        3.0      2.1
MI (cond mean wrong)          0.15833     −131        77       0

Based on 2000 replicates; true parameter value 0.08596.
UK 1958 birth cohort
Case study: UK 1958 Birth Cohort [8]
The UK National Child Development Study (NCDS) consists of a target sample of 17,634 children born in the UK in 1958.
We model the probability of having any educational qualifications at age 23 as a function of four variables
measured at birth/early childhood: birth weight, mother’s age, living in social housing and spending a period in
care.
By age 23, the target sample had reduced, as a result of death and permanent emigration, to 15,885.
24% of these were missing at age 23 — a non-trivial percentage (see Plewis et al., 2004 [20]).
Model of interest
probit{Pr(child has no educational qualifications at 23 years)}
    = β0 + β1 care + β2 soch7 + β3 invbwt + β4 mo_age + β5 mo_agesq + β6 agehous + β7 agesqhous
We focus particularly on the variable agehous — the interaction between mother’s age when the child was born
and whether they were living in social housing before the child was 7 years old.
Also of interest are the non-linear terms mo_agesq and agesqhous.
The following variables are predictive of educational qualifications at age 23 and/or of the birth variables: sex, reading
ability at age 16, behaviour measure at age 11, and number of family moves between birth and age 16.
NCDS data: item non-response for model of interest
Care   Social    Birth    Mother's   Educ. quals      n      %
       housing   weight   age        (age 23)
 +       +         +        +           +          10279     65
 +       +         +        +           -           2824     18
 -       -         +        +           +           1153    7.3
 -       -         +        +           -            765    4.8
 +       +         -        +           +            349    2.2
 +       +         -        -           +            109   0.67
 +       +         -        +           -             97   0.61
 +       -         +        +           +             59   0.37
 +       -         +        +           -             52   0.33
 -       -         -        -           +             44   0.28
 -       -         -        +           +             37   0.23
Other patterns                                       114   0.72
Naive multiple imputation
We illustrate the Stata ice command by Royston (2007)[21], for imputation using chained equations:
. ice care soch7 invbwt mo_age noqual2, m(15) cycles(10)
saving(imp1) replace dryrun
Variable | Command | Prediction equation
------------+---------+-------------------------------
care | logit | soch7 invbwt mo_age noqual2
soch7 | logit | care invbwt mo_age noqual2
invbwt | regress | care soch7 mo_age noqual2
mo_age | regress | care soch7 invbwt noqual2
noqual2 | logit | care soch7 invbwt mo_age
------------------------------------------------------
End of dry run. No imputations were done, no files were created.
After imputation, we load the imputed data and use the mim prefix to fit the model of interest to each imputed
data set and combine the results for final inference using Rubin’s rules.
More careful multiple imputation
A more careful look at the data suggests the following:
1. The reason for social housing being missing depends on the other covariates and, given these, not on the
response. So we can likely drop those with missing social housing without inducing bias.
We can then impute separately in the two social housing groups, thus allowing interactions, in particular with
mother’s age.
2. There are additional variables on the causal path — a reading score readtest, a behavioural score bsag
and the number of family moves, fammove. These auxiliary variables, together with sex, appear to contain
information and be predictors of missing data, and should be included (after transformation if necessary).
3. We should allow for the non-linear relationship with mother’s age — as best we can.
Code

We impute separately in the two social housing groups; the code for each group is as follows:
ice mo_age mo_agesq noqual2 care invbwt readtest sex
bsag fammove, passive(mo_agesq:mo_age*mo_age)
eq(
noqual2: mo_age mo_agesq care invbwt readtest sex bsag fammove,
care: noqual2 mo_age mo_agesq invbwt readtest sex bsag fammove,
invbwt: noqual2 mo_age mo_agesq care readtest sex bsag fammove,
readtest: noqual2 mo_age mo_agesq care invbwt sex bsag fammove,
sex: noqual2 mo_age mo_agesq care invbwt readtest bsag fammove,
bsag: noqual2 mo_age mo_agesq care invbwt readtest sex fammove,
fammove: noqual2 mo_age mo_agesq care invbwt readtest sex bsag,
mo_age: noqual2 care invbwt readtest sex bsag fammove
)
m(20) cycles(30) saving(ncds_impute) replace orderasis
We then combine the two sets of imputed datasets, round (a few) impossible values, and analyse using Rubin’s
rules.
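The key mechanics of a chained-equations cycle with a passive term can be sketched in miniature. The Python below (simulated continuous data only; simple linear-regression imputation with a normal residual draw, standing in for ice's regress/logit engines) regresses the incomplete variable on a complete one, draws imputations, and recomputes the passive square after every update:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: mo_age (partly missing), its passive square, and a
# fully observed variable x (all values hypothetical).
n = 200
mo_age = rng.normal(28.0, 5.0, n)
x = 0.1 * mo_age + rng.normal(0.0, 1.0, n)
mo_age[rng.random(n) < 0.2] = np.nan        # ~20% of mo_age missing
data = np.column_stack([mo_age, mo_age ** 2, x])
miss = np.isnan(data[:, 0])

# Start from mean imputation, then cycle.
data[miss, 0] = np.nanmean(mo_age)
for cycle in range(10):
    data[:, 1] = data[:, 0] ** 2            # passive: mo_agesq = mo_age^2
    # Regress mo_age on x among cases where mo_age was observed...
    X = np.column_stack([np.ones(n), data[:, 2]])
    beta, *_ = np.linalg.lstsq(X[~miss], data[~miss, 0], rcond=None)
    resid_sd = np.std(data[~miss, 0] - X[~miss] @ beta)
    # ...and replace missing values by prediction plus a residual draw.
    data[miss, 0] = X[miss] @ beta + rng.normal(0.0, resid_sd, miss.sum())
data[:, 1] = data[:, 0] ** 2                # final passive update
```

The passive recomputation is the point von Hippel [29] stresses: the square is never imputed directly, only rebuilt from its parent, so the deterministic relationship is preserved in every imputed dataset.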
Inverse probability weighting I
Hawkes and Plewis (2006) [12] show that the best predictors of attrition are sex, social adjustment (age 11),
number of moves birth to wave 3 (age 16) and reading score wave 3.
Estimates for response vs. attrition at age 23 are:

Variable                               Estimate    SE
Constant                               −1.7        0.20
Sex                                    −0.33       0.098
Social adjustment, wave 2 (bsag)        0.021      0.0052
No. moves, birth to wave 3 (fammove)    0.083      0.025
Reading score, wave 3 (readtest)       −0.046      0.0067

There are only 8072 cases with data for estimating this model.
Note that weights are not available for all the 10279 cases with complete data for the model of interest: in fact the
weighted model is estimated on 5996 cases, so there is considerable loss of power.
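Given such coefficients, a case's weight is the inverse of its fitted probability of responding. A sketch, assuming the linear predictor models the probability of attrition (so the weight is 1/(1 − p)) and with guessed covariate coding and values:

```python
import math

# Coefficients of the response model above (Hawkes and Plewis, 2006).
B0, B_SEX, B_BSAG, B_FAMMOVE, B_READTEST = -1.7, -0.33, 0.021, 0.083, -0.046

def ipw_weight(sex, bsag, fammove, readtest):
    """Inverse probability weight for one complete case.

    Assumes the linear predictor gives the probability of attrition,
    so the weight is 1 / P(response) = 1 / (1 - p). The covariate
    coding here is a guess for illustration only.
    """
    lp = (B0 + B_SEX * sex + B_BSAG * bsag
          + B_FAMMOVE * fammove + B_READTEST * readtest)
    p_attrition = 1.0 / (1.0 + math.exp(-lp))
    return 1.0 / (1.0 - p_attrition)

# Hypothetical case: sex = 1, bsag = 10, fammove = 2, readtest = 20.
w = ipw_weight(sex=1, bsag=10, fammove=2, readtest=20)
```

The model of interest is then fitted with each complete case weighted by its w, so cases with a high estimated attrition probability stand in for similar cases that were lost.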
Inverse probability weighting II
Looking at the patterns of missing values for the non-response model, 1899 cases are missing information just
on number of moves, 1581 just on the reading test and 994 just on the social adjustment measure (and none just
on sex).
Thus we can
• estimate three more sets of weights based on the models that exclude just one of these variables;
• examine the relation of the weights from the incomplete (i.e. misspecified) models for response with those
from the full model and then, if the association is reasonably good,
• calibrate the weights from the full model against those from the incomplete models and replace the missing
weights by the estimated weights from the calibrations.
Weight calibration
The weights from the three incomplete models are highly correlated with the weights from the full model estimated on 8072 cases. The correlations are 0.82 (excluding number of moves), 0.89 (excluding the reading test) and 0.90 (excluding social adjustment). Thus we:
1. Replace, where possible, missing weights by predicted values using estimates from the linear regression of
the weight from the full model against the weight for the model omitting social adjustment.
2. Replace, where possible, weights that are still missing by predicted values using estimates from the linear
regression of the weight from the full model against the weight for the model omitting the reading test.
3. Finally replace, where possible, weights that are still missing by predicted values using estimates from the linear regression of the weight from the full model against the weight for the model omitting the number of moves.
These steps allow us to estimate a weighted model with 8841 rather than 5996 cases.
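Each calibration step is an ordinary linear regression among the cases with both weights, followed by prediction where only the incomplete-model weight is available; a numpy sketch with simulated weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated weights (illustration only): w_full comes from the full
# response model and is missing for some cases; w_sub comes from a
# model omitting one predictor and is available for everyone.
n = 1000
w_sub = 1.0 + rng.gamma(2.0, 0.05, n)
w_full = 0.1 + 0.95 * w_sub + rng.normal(0.0, 0.02, n)
w_full[rng.random(n) < 0.3] = np.nan        # no full-model weight here

# Calibrate: regress w_full on w_sub where both are observed...
have = ~np.isnan(w_full)
X = np.column_stack([np.ones(n), w_sub])
beta, *_ = np.linalg.lstsq(X[have], w_full[have], rcond=None)

# ...then fill the gaps with the calibrated predictions.
w_full[~have] = X[~have] @ beta
```

In the application this is repeated for each of the three incomplete models in turn, in decreasing order of their correlation with the full-model weight.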
Results: multiple imputation
Explanatory variable      Complete cases (N=10,279)   MI, naive            MI, with interaction
In care                   0·65 (0·096)                0·65 (0·091)         0·70 (0·090)
In social housing         0·58 (0·035)                0·53 (0·032)         0·58 (0·033)
Inverse birthweight       73 (8·5)                    70 (7·7)             75 (8·6)
Mo' age at birth          −0·017 (0·0037)             −0·014 (0·0033)      −0·017 (0·0037)
Mo' age squared           0·0020 (0·00047)            0·0016 (0·00042)     0·0019 (0·00048)
Mo' age × housing         0·013 (0·0054)              0·012 (0·0048)       0·015 (0·0051)
Mo' age sq'd × housing    −0·00078 (0·00068)          −0·00068 (0·00061)   −0·00094 (0·00065)
Constant                  −1·6 (0·078)                −1·4 (0·072)         −1·5 (0·075)

Standard errors in parentheses.
Results: inverse probability weighting
Explanatory variable      Complete cases (N=10,279)   IPW I (N=5996)       IPW II (N=8841)
In care                   0·65 (0·096)                0·73 (0·14)          0·64 (0·11)
In social housing         0·58 (0·035)                0·51 (0·046)         0·57 (0·035)
Inverse birthweight       73 (8·5)                    80 (12)              75 (9·5)
Mo' age at birth          −0·017 (0·0037)             −0·011 (0·0053)      −0·014 (0·0042)
Mo' age squared           0·0020 (0·00047)            0·00083 (0·00066)    0·0017 (0·00052)
Mo' age × housing         0·013 (0·0054)              0·0026 (0·0075)      0·0065 (0·0059)
Mo' age sq'd × housing    −0·00078 (0·00068)          −0·00017 (0·00094)   −0·00054 (0·00075)
Constant                  −1·6 (0·078)                −1·7 (0·11)          −1·6 (0·088)

Standard errors in parentheses.
Summary
Discussion
• Analysing partially observed data entails untestable assumptions about the missingness mechanism(s).
• An appropriate strategy is thus
◦ assume a missingness mechanism (consistent with the data and the investigators' experience)
◦ perform a valid analysis under that assumption
◦ explore the robustness of inference to different assumptions
• MI provides a computationally convenient tool for a range of practical problems but (i) watch for
nonlinearities/interactions and (ii) be careful with large numbers of variables, especially with FCS
• IPW is probably too inefficient in most cases, but provides a useful check
• DR is an attractive approach, but estimators for complex data are not known; there remain issues with
stability of the weights (see [2] and talk by Vansteelandt from www.missingdata.org.uk)
• Bringing DR and MI together is potentially attractive (see talk by Daniel from www.missingdata.org.uk)
References
[1] C A Bernaards, T R Belin, and J L Schafer. Robustness of a multivariate normal approximation for imputation of incomplete binary data. Statistics in Medicine, 26(6):1368–1382, 2007.
[2] W Cao, A A Tsiatis, and M Davidian. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika, 96:723–734, 2009.
[3] J Carpenter, S Pocock, and C J Lamm. Coping with missing data in clinical trials: a model based approach applied to asthma trials. Statistics in Medicine, 21:1043–1066, 2002.
[4] J R Carpenter, M G Kenward, and S Vansteelandt. A comparison of multiple imputation and inverse probability weighting for analyses with missing data. Journal of the Royal Statistical Society, Series A (Statistics in Society), 169:571–584, 2006.
[5] J R Carpenter, G Rucker, and G Schwarzer. Assessing the sensitivity of meta-analysis to selection bias: a multiple imputation approach. Biometrics, in press, 2010.
[6] J R Carpenter and M G Kenward. Missing Data in Clinical Trials — A Practical Guide. Birmingham: National Health Service Co-ordinating Centre for Research Methodology, 2008. Freely downloadable from www.missingdata.org.uk, accessed 7 May 2010.
[7] J R Carpenter, M G Kenward, and I R White. Sensitivity analysis after multiple imputation under missing at random — a weighting approach. Statistical Methods in Medical Research, 16:259–275, 2007.
[8] J R Carpenter and I Plewis. Coming to terms with non-response in longitudinal studies. Under revision for The SAGE Handbook of Methodological Innovation, editors Malcolm Williams and Paul Vogt, London: SAGE, 2010.
[9] A-W Chan and D G Altman. Epidemiology and reporting of randomised trials published in PubMed journals. The Lancet, 365:1159–1162, 2005.
[10] P J Diggle and M G Kenward. Informative dropout in longitudinal data analysis (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics), 43:49–94, 1994.
[11] H Goldstein, J R Carpenter, M G Kenward, and K Levin. Multilevel models with multivariate mixed response types. Statistical Modelling, 9:173–197, 2009.
[12] D Hawkes and I Plewis. Modelling non-response in the National Child Development Study. Journal of the Royal Statistical Society, Series A, 169:479–491, 2006.
[13] J D Y Kang and J L Schafer. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statistical Science, 22:523–539, 2007.
[14] M G Kenward and J R Carpenter. Multiple imputation: current perspectives. Statistical Methods in Medical Research, 16:199–218, 2007.
[15] J K Kim, J M Brick, W A Fuller, and G Kalton. On the bias of the multiple-imputation variance estimator in survey sampling. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68:509–521, 2006.
References II
[16] M A Klebanoff and S R Cole. Use of multiple imputation in the epidemiologic literature. American Journal of Epidemiology, 168:355–357, 2008.
[17] K J Lee and J B Carlin. Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology, pages 624–632, 2010.
[18] X L Meng. Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science, 9:538–573, 1994.
[19] U Nur, L G Shack, B Rachet, J R Carpenter, and M P Coleman. Modelling relative survival in the presence of incomplete data: a tutorial. International Journal of Epidemiology, pre-print online, doi:10.1093/ije/dyp309, 2009.
[20] I Plewis, L Calderwood, D Hawkes, and G Nathan. National Child Development Study and 1970 British Cohort Study Technical Report: Changes in the NCDS and BCS70 Populations and Samples over Time (1st ed.). London: Institute of Education, University of London, 2004.
[21] P Royston. Multiple imputation of missing values: further update of ice with emphasis on interval censoring. The Stata Journal, 7:445–464, 2007.
[22] D B Rubin. Multiple Imputation for Nonresponse in Surveys. New York: Wiley, 1987.
[23] J L Schafer. Analysis of Incomplete Multivariate Data. London: Chapman and Hall, 1997.
[24] M Spratt, J A C Sterne, K Tilling, J R Carpenter, and J B Carlin. Strategies for multiple imputation in longitudinal studies. American Journal of Epidemiology, 172:478–487, 2010.
[25] R J Steele, N Wang, and A E Raftery. Inference from multiple imputation for missing data using mixtures of normals. Statistical Methodology, 7:351–365, 2010.
[26] J A C Sterne, I R White, J B Carlin, M Spratt, P Royston, M G Kenward, A M Wood, and J R Carpenter. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. British Medical Journal, 339:157–160, 2009.
[27] M Tanner and W Wong. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528–550, 1987.
[28] A A Tsiatis. Semiparametric Theory and Missing Data. New York: Springer, 2006.
[29] P T von Hippel. How to impute interactions, squares and other transformed variables. Sociological Methodology, 39:265–291, 2009.
[30] N Wang and J M Robins. Large-sample theory for parametric multiple imputation procedures. Biometrika, 85:935–948, 1998.
[31] I R White and P Royston. Imputing missing covariate values for the Cox model. Statistics in Medicine, 28:1982–1998, 2009.
[32] A M Wood, I R White, and S G Thompson. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clinical Trials, 1:368–376, 2004.