Missing data: issues, methods and examples

James R. Carpenter
London School of Hygiene & Tropical Medicine

[email protected]

www.missingdata.org.uk

Support for JRC from ESRC, MRC, & German Research Foundation

October 20, 2010

Overview 2

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Issues raised by missing data 4

Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Shortcomings of current practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Key points and approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Missing data mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Implications for complete case analysis 8

Complete case analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Multiple imputation 9

Beyond a complete case analysis: assuming MAR . . . . . . . . . . . . . . . . . . . . . 9

Motivation for MI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Why and when MI? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Why ‘multiple’ imputation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

MI algorithms 14

Algorithms for MI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Software taxonomy: methods derived from multivariate normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Chained equations/full conditional specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Pitfalls 18

Likely pitfalls [26] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Survival analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Sensitivity analysis 20

Exploring departures from MAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Key idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Pattern mixture model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


Example: asthma data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

MNAR ‘pattern mixture’ analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Selection Model analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

IPW 26

Inverse probability weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

GEE with missing data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Inverse probability weighting estimating equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

IPW in practice 29

Using IPW in practice, 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Using IPW in practice, 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Beyond IPW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Double Robustness 32

Double Robustness, 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Double robustness, 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Simulation exploring double robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

UK 1958 birth cohort 35

Case study: UK 1958 Birth Cohort [8] . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Model of interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

NCDS data: item non response for model of interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Naive multiple imputation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

More careful multiple imputation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Inverse probability weighting I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Inverse probability weighting II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Weight calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Results: multiple imputation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Results: inverse probability weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Summary 46

Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

References II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


Overview 2 / 48

Acknowledgements

John Carlin, Lyle Gurrin, Helena Romaniuk, Kate Lee (Melbourne)

Mike Kenward, Harvey Goldstein (LSHTM)

Geert Molenberghs (Limburgs University, Belgium)

James Roger (GlaxoSmithKline Research)

Sara Schroter (BMJ, London)

Jonathan Sterne, Michael Spratt, Rachael Hughes (Bristol)

Stijn Vansteelandt (Ghent University, Belgium)

Ian White (MRC Biostatistics Unit, Cambridge)

Parts of this talk relevant to clinical trials are based on a peer-reviewed book, ‘Missing data in clinical

trials — a practical guide’ (joint with Mike Kenward), commissioned by the UK National Health Service,

available free on-line at www.missingdata.org.uk. Also available from the website are expanded notes

and practicals.

2 / 48

Outline

– Issues raised by non-trivial proportions of missing data: the central role of assumptions

– When is complete case analysis OK?

– Missing data mechanisms: unpacking common jargon

– Multiple imputation

◦ intuition

◦ algorithms

◦ pitfalls

– Sensitivity analysis

– Inverse probability weighting and doubly robust methods

– Example: 1958 UK birth cohort

– Discussion

3 / 48


Issues raised by missing data 4 / 48

Motivation

Missing data are common. However, they are usually inadequately handled in both observational and experimental research.

• For example, Wood et al (2004)[32] reviewed 71 published papers in the BMJ, JAMA, Lancet and NEJM. Of the 37 trials with repeated outcome measures, 46% performed a complete case analysis, and only 21% reported a sensitivity analysis.

• Also, Chan et al (2005)[9] estimate that 65% of studies in PubMed journals do not report the handling of

missing data.

When data are missing, valid analysis and inference rests on assumptions that cannot be verified from the data

at hand.

Sensitivity analysis—where we explore the robustness of inference to a range of plausible assumptions—is therefore important.

The assumptions need to be framed in terms that those collecting the data can understand and comment on.

4 / 48

Shortcomings of current practice

Sterne et al (2009)[26] searched four major medical journals (NEJM, Lancet, BMJ, JAMA) from 2002–7 for

articles involving original research in which multiple imputation was used. They reported

‘Although multiple imputation is increasingly regarded as a ‘standard’ method, many aspects of

its implementation can vary and very few of the papers that were identified provided full details of the

approach that was used, the underlying assumptions that were made, or the extent to which the

results could confidently be regarded as more reliable than any other approach to handling the

missing data (such as, in particular, restricting analyses to complete cases).’

See also Klebanoff and Cole (2008)[16].

5 / 48

Key points and approach

Key points to keep in mind are:

• the question (i.e. the hypothesis under investigation)

• the information in the observed data

• the reason for missing data

As statisticians we need to

• consult with investigators to design to minimise missing data/information;

• consult with investigators to postulate plausible missingness mechanism(s), and

• perform a valid analysis under each assumed mechanism in turn, and interpret the results.

There are three broad classes of missingness mechanism: Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR). The following picture illustrates these.

6 / 48


Missing data mechanisms

7 / 48

Implications for complete case analysis 8 / 48

Complete case analysis

Suppose we are interested in the regression of Y on multivariate X.

For each individual (unit) let Ri = 1 be the indicator for a complete case.

If, for all i, [Ri|Yi,Xi] = [Ri|Xi], that is, the probability of a complete case given the (possibly partially observed) Xi does not depend on Yi, then the complete case analysis is valid — though it may be quite inefficient.

To see this, consider

[Yi | Xi, Ri = 1] = [Yi, Xi, Ri = 1] / [Xi, Ri = 1]

                  = [Ri = 1 | Xi][Yi, Xi] / ([Ri = 1 | Xi][Xi])   (under the above assumption)

                  = [Yi | Xi].

Note we do not need the same mechanism for each individual, i, but taking this to the limit becomes very

contrived!
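The argument above is easy to check by simulation. Below is a minimal numpy sketch (not from the talk; the data-generating setup is invented for illustration): the complete-case slope is essentially unbiased when the chance of a complete case depends only on X, but attenuated when it depends on Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true intercept 1, true slope 2

def cc_slope(keep):
    # OLS slope of y on x using the complete cases only
    X = np.column_stack([np.ones(keep.sum()), x[keep]])
    return np.linalg.lstsq(X, y[keep], rcond=None)[0][1]

# (a) P(complete case) depends only on X: complete cases remain valid
keep_x = rng.uniform(size=n) < 1 / (1 + np.exp(-x))
# (b) P(complete case) depends on Y: the complete-case slope is biased
keep_y = rng.uniform(size=n) < 1 / (1 + np.exp(-y))

print(cc_slope(keep_x))   # close to the true slope of 2
print(cc_slope(keep_y))   # attenuated below 2
```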

8 / 48


Multiple imputation 9 / 48

Beyond a complete case analysis: assuming MAR

It will often be plausible that the data are MAR with a mechanism depending on Y .

1. In this case the complete case analysis is invalid.

2. However, valid inference can be drawn using multiple imputation under the MAR assumption.

Further, suppose we have additional covariates Z :

1. If these have information about the missing values, we can include this using multiple imputation under MAR.

2. If, in addition, they are important for the missingness mechanism to be plausibly MAR, including them will

generally reduce bias in the parameter estimates.

9 / 48

Motivation for MI

Suppose our data set has variables X,Y with some Y values MAR given fully observed X.

Using only subjects with both observed, we can get valid estimates of the regression of Y on X.

However, inference based on observed values of Y alone (eg sample mean, variance) is typically biased.

This suggests the following idea

1. Fit the regression of Y on X.

2. Use this to impute the missing Y.

3. With this completed data set, calculate our statistic of interest (e.g. sample mean, variance, regression of X on Y).

As we can only ever know the distribution of missing data (given observed), it seems natural to repeat steps 2 &

3, and then average the results in some way.
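The three steps above can be sketched as follows (a simplified numpy illustration with invented data; unlike proper multiple imputation, it does not draw the imputation-model parameters from their posterior):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                   # true mean of Y is 1
miss = rng.uniform(size=n) < 1 / (1 + np.exp(-x))  # Y MAR given X
y_obs = np.where(miss, np.nan, y)

# Step 1: regression of Y on X using the subjects with Y observed
obs = ~miss
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
sd = np.std(y[obs] - X[obs] @ beta)

# Steps 2 & 3, repeated: impute the missing Y with noise, take the mean
means = []
for _ in range(20):
    y_fill = y_obs.copy()
    y_fill[miss] = X[miss] @ beta + rng.normal(scale=sd, size=miss.sum())
    means.append(y_fill.mean())

print(np.nanmean(y_obs))   # mean of the observed Y alone: biased (downwards here)
print(np.mean(means))      # average over imputed data sets: close to 1
```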

10 / 48

Why and when MI?

Why?

1. MI is attractive, because once we have imputed the missing data, we can analyse the completed data sets

as we would have done if no data were missing.

2. MI is particularly attractive when we have missing covariates and/or auxiliary data, when other options are

relatively tricky.

When?

1. MI is not often needed if only (multivariate) responses are missing and our model of interest is a regression,

and we are prepared to assume MAR given the covariates in the model — for then we get valid estimates

from the observed data.

2. Thus MI is not as frequently used in trials as elsewhere, since in trials it is usually the outcome data that are missing(a).

11 / 48

(a) Note that a missing baseline can be treated as an outcome in the analysis.

6

Why ‘multiple’ imputation?

One of the main problems with the single stochastic imputation methods is the need to develop appropriate

variance formulae for each different setting.

Multiple imputation attempts to provide a procedure that can get the appropriate measures of precision relatively

simply in (almost) any setting(a).

Once we choose the imputation model, it proceeds automatically.

The key is thus appropriate choice of the imputation model. This should

1. be consistent (also known as congenial) with the scientific model of interest, and

2. appropriately incorporate relevant auxiliary variables.

12 / 48

(a) If anything, inferences are mostly conservative. To explore this issue see [18, 15, 30, 25].

Summary

Assuming data are MAR, multiple imputation proceeds as follows:

1. model the data—including auxiliary variables—so that all the partially observed variables are responses;

2. impute the missing data from this model multiple times, taking full account of the variability (i.e. including the uncertainty in estimating the parameters of the imputation model), and

3. fit the model of interest to each ‘completed’ data set, and combine the results using Rubin’s rules.

A more formal treatment of MI is given in, for example, Rubin (1987)[22]. See also [26, 14].

For a recent example in relative survival see Nur et al, 2009 [19], and for a discussion of issues in longitudinal

data see Spratt et al, [24].
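Step 3's combination rules are simple enough to state in a few lines of code. The sketch below gives Rubin's rules for a scalar estimate (point estimate and variance only; the degrees-of-freedom formula used for interval estimation is omitted):

```python
import numpy as np

def rubins_rules(estimates, variances):
    # Combine M completed-data results for a scalar quantity (Rubin's rules)
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()              # pooled point estimate
    w = u.mean()                  # within-imputation variance
    b = q.var(ddof=1)             # between-imputation variance
    t = w + (1 + 1 / m) * b       # total variance of q_bar
    return q_bar, t

# e.g. three completed-data estimates of a coefficient and their variances
q_bar, t = rubins_rules([1.2, 1.1, 1.3], [0.040, 0.050, 0.045])
```

Interval estimation additionally needs Rubin's degrees-of-freedom formula, which standard MI routines compute for you.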

13 / 48

MI algorithms 14 / 48

Algorithms for MI

In order to do multiple imputation, it suffices to fit a model in which the partially observed variables are the responses and the fully observed variables are the covariates.

This is tricky in general!

Thus, people have started with the assumption of multivariate normality, and tried to build out from that. Implicit in this is that the regression of any one variable on the others is linear.

Skew variables can be transformed to (approximate) normality before imputation and then back transformed

afterwards.

With an unstructured multivariate normal distribution, it doesn’t matter whether we condition on fully observed variables or include them as additional responses, so most software treats them as responses.

14 / 48


Software taxonomy: methods derived from multivariate normal

Response type →    Normal            Normal         Mixed response
Data structure →   Independent       Multilevel     Multilevel
Package:
  Standalone       NORM              PAN            REALCOM(a)
  SAS              NORM-port         —              —
  Stata            NORM-port         —              —
  R/S+             NORM-port         PAN-port       —
  MLwiN            MCMC algorithm emulates PAN + 1–2 binary

(a) Standalone & free from www.missingdata.org.uk. Interfaces with MLwiN, Stata.

All methods: General missingness pattern; fitting by Markov Chain Monte Carlo (MCMC) or data augmentation

algorithm (see [27, 23, 11]).

Relationships essentially normal/linear (except MLwiN, REALCOM).

Interactions must be handled by imputing separately in each group.

Schafer has a general location model package, relatively little used.

15 / 48

Chained equations/full conditional specification

An alternative to specifying an explicit joint distribution for the partially observed variables is to specify a sequence of

conditional models. This leads to the chained equations (also known as full conditional specification) algorithm:

1. To get started, for each variable in turn fill in missing values with randomly chosen observed values.

2. ‘Filled-in’ values in the first variable are discarded leaving the original missing values. These missing values

are then imputed using regression imputation on all other variables.

3. The ‘filled-in’ values in the second variable are discarded. These missing values are then imputed using

regression imputation on all other variables.

4. This process is repeated for each variable in turn. Once each variable has been imputed using the regression method we have completed one ‘cycle’.

5. The process is continued for several cycles, typically ∼ 10.
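For two jointly related continuous variables, the five steps above can be sketched as follows (invented data; a simplification in that, unlike mice or ice, the regression parameters are not drawn from their posterior, and the missingness is MCAR just to keep the setup short):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
mx = rng.uniform(size=n) < 0.2          # 20% of x missing
my = rng.uniform(size=n) < 0.2          # 20% of y missing
x_na = np.where(mx, np.nan, x)
y_na = np.where(my, np.nan, y)

def regression_impute(target, covariate, miss):
    # Regression imputation of target on covariate, adding residual noise
    obs = ~miss
    X = np.column_stack([np.ones(obs.sum()), covariate[obs]])
    beta, *_ = np.linalg.lstsq(X, target[obs], rcond=None)
    sd = np.std(target[obs] - X @ beta)
    X_mis = np.column_stack([np.ones(miss.sum()), covariate[miss]])
    return X_mis @ beta + rng.normal(scale=sd, size=miss.sum())

# Step 1: fill each variable's missing values with random observed values
x_fill = x_na.copy(); x_fill[mx] = rng.choice(x_na[~mx], size=mx.sum())
y_fill = y_na.copy(); y_fill[my] = rng.choice(y_na[~my], size=my.sum())

# Steps 2-4: discard the fills and re-impute each variable in turn;
# one pass over both variables is a 'cycle'.  Step 5: run ~10 cycles.
for _ in range(10):
    x_fill[mx] = regression_impute(x_na, y_fill, mx)
    y_fill[my] = regression_impute(y_na, x_fill, my)
```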

16 / 48


Comments

The attraction of this approach is that linear regression models can be replaced with GLMs etc. for non-normal responses.

Software

SAS — IVEware

R — mice, mi

Stata — ice

There has been little theory for this approach, but recent work (Lee & Carlin (2010) [17]) found similar

performance to multivariate normal imputation, even for ordinal variables, provided appropriate rounding is used.

They also raised concerns about bias with FCS due to over-specification of models for unordered data, and

difficulties with large numbers of variables.

Ongoing work (unpublished) with Ian White and Rachael Hughes (Bristol) has found a condition for FCS to work,

and suggests when this condition doesn’t hold, errors are often small.

Multilevel and more complex data structures are more naturally handled using joint modelling.

17 / 48

Pitfalls 18 / 48

Likely pitfalls [26]

We have already noted that once the imputation model is specified, MI is essentially automatic.

It follows that most problems arise because of inappropriate choice of imputation model. For example:

1. omitting variables in the model of interest (e.g. response) or handling them incorrectly (e.g. with time to event

data);

2. handling non-normal variables incorrectly (though treating as continuous may work if probabilities are not too

extreme [1]);

3. having an imputation model which is inconsistent (uncongenial) with the model of interest;

4. overlooking key interactions;

5. an inappropriate MAR assumption, and

6. convergence problems with the MI model.

Point (3) usually arises when the model of interest contains non-linear relationships and/or interactions not in the

imputation model.

In passing, note the proposal for imputation of non-linear relationships of von Hippel (2009) [29] only works if

data are MCAR.

18 / 48


Survival analysis

Suppose the model of interest is a survival analysis with partially observed covariates.

When setting up the chained equations for the imputation, we need to remember to include the censoring

indicator as a covariate in all imputation models, together with a suitable measure of survival.

A paper by White and Royston (2009)[31] suggests the cumulative hazard is preferable.
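A minimal sketch of this suggestion (hypothetical toy data; tie-handling conventions are glossed over): compute the Nelson-Aalen cumulative hazard and carry it, together with the event indicator, into every imputation model.

```python
import numpy as np

# Hypothetical toy data: survival time t, event indicator d (1 = event,
# 0 = censored).  Illustration only.
t = np.array([2.0, 3.0, 3.0, 5.0, 8.0, 9.0, 12.0, 15.0])
d = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# Nelson-Aalen cumulative hazard at each subject's own time:
# H(t_i) = sum over times <= t_i of d_j / (number still at risk).
order = np.argsort(t, kind="stable")
n_at_risk = len(t) - np.arange(len(t))
H = np.empty(len(t))
H[order] = np.cumsum(d[order] / n_at_risk)

# Candidate covariates for every chained-equations imputation model:
# the censoring/event indicator and the cumulative hazard.
imputation_covariates = np.column_stack([d, H])
```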

19 / 48

Sensitivity analysis 20 / 48

Exploring departures from MAR

So far, we have assumed missing data are MAR.

Unfortunately, data may well not be MAR.

Therefore, we need to see if our conclusions are sensitive to plausible MNAR models.

They will almost certainly be sensitive to implausible MNAR models; but this tells us little.

Often, methods for sensitivity analysis are problem specific and opaque.

To be acceptable to collaborators, they need to be fairly general and transparent. However this does not

necessarily mean technically simplistic.

There are two broad approaches, ‘pattern-mixture’ and ‘selection’ models.

20 / 48

Key idea

Let Y be a partially observed variable, and X, Z be fully observed.

Let R = 1 if Y observed, and R = 0 otherwise.

The MAR assumption means [Y |X,Z,R = 0] = [Y |X,Z,R = 1].

In other words the statistical distribution of Y given X,Z is the same whether or not Y is seen.

This is directly exploited by MI under MAR: we estimate this distribution from the observed data and use it to

impute the missing data.

It follows that if data are MNAR, [Y |X,Z,R = 0] ≠ [Y |X,Z,R = 1].

21 / 48


Pattern mixture model

This suggests the following:

1. Multiply impute missing data under MAR

2. Change the imputations to represent likely differences in conditional distributions between observed and

missing data.

3. Fit the model of interest to each imputed data set, and combine the results using Rubin’s rules in the usual

way.

This is an example of the ‘pattern mixture’ approach: the imputed data are a mixture of potentially different

imputations for different patterns.

If the assumptions in Step 2 are valid, inference is valid.

The approach is potentially very general (see, e.g. Ch 6. of Carpenter & Kenward [6]).

22 / 48

Example: asthma data

[Figure: FEV1 (litres) at 12 weeks plotted against baseline FEV1 (low and high groups); vertical scale 1.2–4.0 litres]

15/46 observed at 12 weeks; average of the 15 observed: 1.861 (l)

20/46 observed at 12 weeks; average of the 20 observed: 2.230 (l)

Estimated average assuming MAR: (46 × 1.86 + 46 × 2.23)/(46 + 46) = 2.05 (l)

23 / 48

MNAR ‘pattern mixture’ analysis

First, notice that MAR means conditional on baseline, the distribution of observed and missing response is the

same.

Suppose the asthma data are MNAR.

To estimate the average % predicted FEV, we have to make additional assumptions.

For example: suppose we say that patients who withdraw have response 10% below that predicted assuming

MAR.

Then our new estimate of the average response at the end of the trial is:

(1/92) × (1.861 × 15 + 1.675 × 31 + 2.230 × 20 + 2.007 × 26) = 1.920 (l).
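Both estimates are easy to reproduce (the counts and observed means are read off the asthma example; the 10% reduction is the assumed pattern-mixture offset):

```python
# Counts and observed means from the asthma example (litres)
n_obs = [15, 20]         # observed at 12 weeks, by baseline group
n_mis = [31, 26]         # withdrawn by 12 weeks, by baseline group
m_obs = [1.861, 2.230]   # mean FEV1 among those observed
N = 92

# MAR: the missing share each group's observed mean
mar = sum((o + m) * mu for o, m, mu in zip(n_obs, n_mis, m_obs)) / N

# MNAR pattern mixture: withdrawers assumed 10% below the MAR prediction
mnar = sum(o * mu + m * 0.9 * mu for o, m, mu in zip(n_obs, n_mis, m_obs)) / N

print(round(mar, 2), round(mnar, 2))   # prints: 2.05 1.92
```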

24 / 48


Selection Model analysis

The alternative general approach is to fit a selection model. Essentially, within this framework we now have two models: (i) our model of interest, and (ii) a model for the chance of observations being missing.

We fit these jointly for inference about our questions of interest.

The resulting likelihood has to be integrated before it can be maximised, and numerical or Monte-Carlo methods

are often used (e.g. [10, 3, 6]).

Alternatively, one can approximate the results by re-weighting the imputed data (Carpenter, Kenward & White,

2007; Carpenter, Kenward & Rucker, 2010 [7, 5]).

25 / 48

IPW 26 / 48

Inverse probability weighting

Most estimators are the solution of an equation like

∑_{i=1}^n U(xi, θ) = 0.

For example, if Ui(xi, θ) = (xi − θ), solving

∑_{i=1}^n (xi − θ) = 0

gives θ̂ = (1/n) ∑_{i=1}^n xi.

Theory says that if the average of Ui(Xi, θtrue) over samples from the population is zero, our estimate will

‘home in’ on the truth as the sample gets large (consistency).

26 / 48

GEE with missing data

If some of the observations xi are systematically unobserved, then it follows that the corresponding U’s are missing from the above sum.

Then the average of the remaining Ui(θtrue) is not zero, so estimates won’t ‘home in’ on the truth.

Thus, unless data are MCAR, GEEs generally give inconsistent estimates.

27 / 48


Inverse probability weighting estimating equations

However, now let

Ri = 1 if xi is observed (which happens with probability pi), and Ri = 0 if xi is not observed.

Then the average of

∑_{i=1}^n (Ri/pi) U(xi, θtrue)

is zero, so parameter estimates will ‘home in’ on the truth as the sample size gets large.

Inverse probability weighting recovers consistent estimates when data are missing at random.
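A minimal sketch of this estimator for a mean (invented data; the response probabilities pi are taken as known, computed from a fully observed covariate so the data are MAR):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.normal(size=n)                 # fully observed covariate
x = z + rng.normal(size=n)             # true mean of x is 0
p = 1 / (1 + np.exp(-z))               # P(x observed), known given z (MAR)
r = rng.uniform(size=n) < p            # R_i = 1 if x_i observed

naive = x[r].mean()                    # complete-case mean: biased upwards

# Solving  sum_i (R_i/p_i)(x_i - theta) = 0  for theta gives
theta_ipw = np.sum((r / p) * x) / np.sum(r / p)
print(naive, theta_ipw)                # naive well away from 0; IPW near 0
```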

28 / 48

IPW in practice 29 / 48

Using IPW in practice, 1

Two things are needed for IPW methods:

1. a model relating the response to the explanatory variables or covariates, and

2. a model for the probability of missing observations.

Model (1) is usually a regression; it is the part of the analysis we would do if the data were fully observed.

Model (2) is usually a logistic regression, which provides an estimate of the probability of seeing the partially

observed variable for each individual in the data set.

In cases where there are several patterns of missingness, several logistic regressions will be required for (2).

Both (1) and (2) can be fitted in all standard statistical packages.

29 / 48

Using IPW in practice, 2

To obtain IPW estimates, we simply calculate

weighti = 1/{fitted probabilityi from model (2)},

and then refit model (1) using these weights. Again, this can be done in all standard software packages.

Estimating a standard error for the IPW estimate is more tricky, and should ideally include the variability in the

weights. This is rarely available in standard software packages.

One option is a sandwich estimator, e.g. [4].
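The two-model workflow can be sketched as follows (invented data; the small Newton-Raphson routine merely stands in for the logistic regression command of any standard package, and the sandwich standard error is not computed here):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
z = rng.normal(size=n)                    # covariate in the model of interest
e = rng.normal(size=n)
y = 1.0 + 2.0 * z + e                     # model (1): regression of y on z
v = e + rng.normal(size=n)                # fully observed auxiliary variable
r = rng.uniform(size=n) < 1 / (1 + np.exp(-v))   # y seen with prob expit(v)

def logistic_fit(X, resp):
    # Model (2): logistic regression of the response indicator, Newton-Raphson
    beta = np.zeros(X.shape[1])
    for _ in range(30):
        mu = 1 / (1 + np.exp(-(X @ beta)))
        H = X.T @ (X * (mu * (1 - mu))[:, None])
        beta += np.linalg.solve(H, X.T @ (resp - mu))
    return beta

# Fitted response probabilities, then weights for the complete cases
Xr = np.column_stack([np.ones(n), z, v])
p_hat = 1 / (1 + np.exp(-(Xr @ logistic_fit(Xr, r.astype(float)))))
w = 1 / p_hat[r]

# Refit model (1) on the complete cases, unweighted and weighted
X = np.column_stack([np.ones(r.sum()), z[r]])
b_cc = np.linalg.lstsq(X, y[r], rcond=None)[0]
b_ipw = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y[r] * w))

print(b_cc[0], b_ipw[0])   # CC intercept biased upwards; IPW intercept near 1
```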

30 / 48


Beyond IPW

• IPW estimates are consistent, but biased for finite samples. This is not usually an issue in practice.

• If the probability of response model is wrong, then the IPW estimates may be badly biased: it’s a good idea to

graph the weights.

• IPW analyses are not terribly efficient relative to likelihood based analyses (including multiple imputation).

• Interestingly, even if the true weights are known, perhaps because observations are missing by design, more

precise estimates will be obtained with estimated weights, provided the model is correct. The advantage of

knowing the true weights lies in knowing exactly which variables to put into your model for response, not in

their value.

31 / 48

Double Robustness 32 / 48

Double Robustness, 1

Robins, Rotnitzky and co-workers at Harvard have published a series of papers over the last decade outlining

methods for improving the efficiency of IPW estimators. This is most accessibly reviewed by Tsiatis (2006) [28].

Their approach leads to doubly robust estimators. Such estimators require three models:

(i) Model relating the response to explanatory variables/covariates of interest

(ii) Model for the probability of observing variables

(iii) Joint model of the mean of (functions of) partially observed variables given fully observed variables.

32 / 48

Double robustness, 2

Whatever method we use (indeed, even if we have no missing data), if model (i) is wrong, our estimates will typically be inconsistent.

However, doubly robust estimators have the intriguing property that if either model (ii) or model (iii) is wrong, the

estimators in model (i) are still consistent.

This is because in either case, the expectation of the estimating equation is still zero.

This robustness to two possible sources of error is known as double robustness.

By contrast, in MI, if the imputation model is wrong, parameter estimates are generally inconsistent.

The paper by Kang and Schafer (2007) [13] is a useful starting point for the debate about the pros and cons of

doubly robust estimators.
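The property can be seen in a toy example, here for the simpler target of a population mean (an AIPW-type estimator with invented data; models (ii) and (iii) of the previous slide are deliberately broken one at a time):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
x = rng.normal(size=n)                       # fully observed
y = 1.0 + x + rng.normal(size=n)             # E[y] = 1; y partially observed
p = 1 / (1 + np.exp(-x))                     # true P(y observed | x)
r = rng.uniform(size=n) < p

def dr_mean(y, r, p_model, m_model):
    # AIPW estimator of E[y]: IPW term plus an outcome-model correction
    return np.mean(r * y / p_model - (r - p_model) / p_model * m_model)

m_true  = 1.0 + x                            # correct model (iii): E[y | x]
p_wrong = np.full(n, r.mean())               # wrong model (ii): constant prob
m_wrong = np.full(n, y[r].mean())            # wrong model (iii): constant mean

est_p_ok = dr_mean(y, r, p, m_wrong)         # model (ii) right  -> near 1
est_m_ok = dr_mean(y, r, p_wrong, m_true)    # model (iii) right -> near 1
print(est_p_ok, est_m_ok)
```

In each case the expectation of the estimating function is still zero, so either correct model rescues the estimate.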

33 / 48


Simulation exploring double robustness

Carpenter et al (2006) [4] discussed doubly robust methods and multiple imputation, and reported a simulation study with a covariate MAR, where the mechanism involved the response.

The following table gives a brief summary.

                           Average     Z-score    % of 95% intervals
Method                     estimate    for bias   above     below
OLS                        0.03325     −87        0         50
Doubly robust (DR)         0.08573     −0.4       2.6       2.4
DR IPW (dropout wrong)     0.08563     −0.5       3.0       2.3
DR IPW (cond mean wrong)   0.08563     −0.5       2.2       1.7
MI                         0.08611     −0.2       3.0       2.1
MI (cond mean wrong)       0.15833     −131       77        0

Based on 2000 replicates; true parameter value 0.08596

34 / 48

UK 1958 birth cohort 35 / 48

Case study: UK 1958 Birth Cohort [8]

The UK National Child Development Study consists of a target sample of 17,634 children born in the UK in 1958.

We model the probability of having any educational qualifications at age 23 as a function of four variables

measured at birth/early childhood: birth weight, mother's age, living in social housing and spending a period in

care.

By age 23, the target sample had been reduced, as a result of death and permanent emigration, to 15,885.

24% of these were missing at age 23 — a non-trivial percentage (see Plewis et al., 2004[20]).

35 / 48

Model of interest

probit{Pr(child has no educational qualifications at 23 years)}
    = β0 + β1 care + β2 soch7 + β3 invbwt + β4 mo_age + β5 mo_agesq + β6 agehous + β7 agesqhous

We focus particularly on the variable agehous — the interaction between mother's age when the child was born and whether they were living in social housing before the child was 7 years old.

Also of interest are the non-linear terms mo_agesq and agesqhous.

The following variables are predictive of educational qualifications at age 23 and/or the birth variables: sex, reading ability at age 16, behaviour measure at age 11, and number of family moves between birth and age 16.

36 / 48


NCDS data: item non-response for the model of interest

Care Social Birth Mother’s Educational n %housing weight age qualifications

age 23

+ + + + + 10279 65+ + + + - 2824 18- - + + + 1153 7.3- - + + - 765 4.8+ + - + + 349 2.2+ + - - + 109 0.67+ + - + - 97 0.61+ - + + + 59 0.37+ - + + - 52 0.33- - - - + 44 0.28- - - + + 37 0.23

Other patterns 114 0.72

37 / 48

Naive multiple imputation

We illustrate the Stata ice command by Royston (2007)[21], for imputation using chained equations:

. ice care soch7 invbwt mo_age noqual2, m(15) cycles(10)

saving(imp1) replace dryrun

Variable | Command | Prediction equation

------------+---------+-------------------------------

care | logit | soch7 invbwt mo_age noqual2

soch7 | logit | care invbwt mo_age noqual2

invbwt | regress | care soch7 mo_age noqual2

mo_age | regress | care soch7 invbwt noqual2

noqual2 | logit | care soch7 invbwt mo_age

------------------------------------------------------

End of dry run. No imputations were done, no files were created.
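The cycling that ice performs can be caricatured in a few lines. The sketch below (Python, toy bivariate data, simple normal linear conditionals; real chained equations use an appropriate regression draw for each variable type, as in the prediction equations above) shows the idea of repeatedly re-imputing each variable from its conditional distribution given the others:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Two correlated variables with values missing completely at random.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)
x_mis = rng.uniform(size=n) < 0.3
y_mis = rng.uniform(size=n) < 0.3
xo = np.where(x_mis, np.nan, x)
yo = np.where(y_mis, np.nan, y)

# Start from mean imputation, then cycle: regress each variable on the
# other (using its observed values) and redraw its missing values with
# residual noise -- one "cycle" per pass, as in cycles(10) above.
xi = np.where(x_mis, np.nanmean(xo), xo)
yi = np.where(y_mis, np.nanmean(yo), yo)
for _ in range(10):
    # Impute x given current y.
    b = np.polyfit(yi[~x_mis], xi[~x_mis], 1)
    resid = xi[~x_mis] - np.polyval(b, yi[~x_mis])
    xi[x_mis] = np.polyval(b, yi[x_mis]) + rng.normal(0.0, resid.std(), x_mis.sum())
    # Impute y given current x.
    b = np.polyfit(xi[~y_mis], yi[~y_mis], 1)
    resid = yi[~y_mis] - np.polyval(b, xi[~y_mis])
    yi[y_mis] = np.polyval(b, xi[y_mis]) + rng.normal(0.0, resid.std(), y_mis.sum())
```

Because the draws include residual noise, the imputed data preserve the association between the variables rather than attenuating it, which deterministic (regression-mean) imputation would not.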

After imputation, we load the imputed data and use the mim prefix to fit the model of interest to each imputed data set and combine the results for final inference using Rubin's rules.
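Rubin's rules, which mim applies, combine the M point estimates and within-imputation variances. A minimal sketch for a scalar parameter (Python; the input numbers in the usage line are invented for illustration, not from the NCDS analysis):

```python
import math

def rubins_rules(estimates, variances):
    """Combine M complete-data point estimates and their variances
    across imputed data sets (Rubin, 1987 [22])."""
    m = len(estimates)
    q_bar = sum(estimates) / m                              # pooled estimate
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                 # total variance
    return q_bar, t, math.sqrt(t)

est, var, se = rubins_rules([0.52, 0.48, 0.55], [0.010, 0.012, 0.011])
```

With these invented inputs the pooled estimate is about 0.517 with standard error about 0.112; the between-imputation term inflates the variance to reflect uncertainty about the missing values.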

38 / 48

More careful multiple imputation

A more careful look at the data suggests the following:

1. The reason for social housing being missing depends on the other covariates and, given these, not on the response. So we can likely drop those with missing social housing without inducing bias.

We can then impute separately in the two social housing groups, thus allowing interactions, in particular with mother's age.

2. There are additional variables on the causal path: a reading score (readtest), a behavioural score (bsag) and the number of family moves (fammove). These auxiliary variables, together with sex, appear both to carry information and to predict missingness, and should be included (after transformation if necessary).

3. We should allow for the non-linear relationship with mother’s age — as best we can.

39 / 48


Code

We impute separately in the two social housing groups; the code for each group is as follows:

ice mo_age mo_agesq noqual2 care invbwt readtest sex

bsag fammove, passive(mo_agesq:mo_age*mo_age)

eq(

noqual2: mo_age mo_agesq care invbwt readtest sex bsag fammove,

care: noqual2 mo_age mo_agesq invbwt readtest sex bsag fammove,

invbwt: noqual2 mo_age mo_agesq care readtest sex bsag fammove,

readtest: noqual2 mo_age mo_agesq care invbwt sex bsag fammove,

sex: noqual2 mo_age mo_agesq care invbwt readtest bsag fammove,

bsag: noqual2 mo_age mo_agesq care invbwt readtest sex fammove,

fammove: noqual2 mo_age mo_agesq care invbwt readtest sex bsag,

mo_age: noqual2 care invbwt readtest sex bsag fammove

)

m(20) cycles(30) saving(ncds_impute) replace orderasis

We then combine the two sets of imputed datasets, round (a few) impossible values, and analyse using Rubin's rules.
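The passive() option in the call above guarantees that mo_agesq is recomputed as mo_age squared whenever mo_age is imputed, rather than being imputed as if it were a free-standing variable. A toy sketch of that passive step (Python, simulated data; a deliberately crude single stochastic imputation stands in for one chained-equations cycle):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
age = rng.normal(28.0, 5.0, size=n)        # a mo_age-like variable, simulated
missing = rng.uniform(size=n) < 0.2
age_obs = np.where(missing, np.nan, age)

# Crude stochastic imputation of the base variable: draw from a normal
# fitted to the observed values (standing in for one FCS cycle).
mu, sd = np.nanmean(age_obs), np.nanstd(age_obs)
age_imp = np.where(missing, rng.normal(mu, sd, size=n), age_obs)

# Passive step: the derived term is recomputed from the imputed variable,
# so the identity agesq == age**2 holds in every imputed data set.
agesq_imp = age_imp ** 2
```

Imputing the square as a separate variable would break this identity and can distort the fitted non-linear relationship; see von Hippel (2009) [29] for the fuller debate.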

40 / 48

Inverse probability weighting I

Hawkes and Plewis (2006) [12] show that the best predictors of attrition are sex, social adjustment (age 11), number of moves from birth to wave 3 (age 16) and reading score at wave 3.

Estimates for response vs. attrition at age 23 are:

Variable                                  Estimate   SE
Constant                                  −1.7       0.20
Sex                                       −0.33      0.098
Social adjustment, wave 2 (bsag)           0.021     0.0052
No. moves, birth to wave 3 (fammove)       0.083     0.025
Reading score, wave 3 (readtest)          −0.046     0.0067

There are only 8072 cases with data for estimating this model.

Note that weights are not available for all the 10279 cases with complete data for the model of interest: in fact the weighted model is estimated on 5996 cases, so there is a considerable loss of power.
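For illustration only, the fitted model above can be turned into weights as below (Python). The slide does not state the outcome coding or the covariate scales, so treating the linear predictor as the log-odds of attrition (and hence the weight for a complete case as 1/(1 − P(attrition))) is an assumption of this sketch, as are the covariate values in the test:

```python
import math

# Coefficients read off the response/attrition model on this slide.
# ASSUMPTION: the linear predictor is the log-odds of attrition; the slide
# does not state the baseline category or how sex is coded.
beta = {"const": -1.7, "sex": -0.33, "bsag": 0.021,
        "fammove": 0.083, "readtest": -0.046}

def p_attrition(sex, bsag, fammove, readtest):
    """Logistic-model probability of attrition under the assumed coding."""
    lp = (beta["const"] + beta["sex"] * sex + beta["bsag"] * bsag
          + beta["fammove"] * fammove + beta["readtest"] * readtest)
    return 1.0 / (1.0 + math.exp(-lp))

def ip_weight(sex, bsag, fammove, readtest):
    """Inverse probability weight for a complete case: 1 / P(respond)."""
    return 1.0 / (1.0 - p_attrition(sex, bsag, fammove, readtest))
```

Cases with a low predicted probability of responding receive large weights, so that the weighted complete cases stand in for similar cases lost to attrition.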

41 / 48

Inverse probability weighting II

Looking at the patterns of missing values for the non-response model, 1899 cases are missing information just on the number of moves, 1581 just on the reading test and 994 just on the social adjustment measure (and none just on sex).

Thus we can

• estimate three more sets of weights based on the models that exclude just one of these variables;

• examine the relation of the weights from the incomplete (i.e. misspecified) response models to those from the full model and then, if the association is reasonably good,

• calibrate the weights from the full model against those from the incomplete models and replace the missing weights by the estimated weights from the calibrations.

42 / 48


Weight calibration

The weights from the three incomplete models are highly correlated with the weights from the full model estimated on 8072 cases. The correlations are 0.82 (excluding the number of moves), 0.89 (excluding the reading test) and 0.90 (excluding social adjustment). Thus we:

1. Replace, where possible, missing weights by predicted values from the linear regression of the full-model weight on the weight from the model omitting social adjustment.

2. Replace, where possible, weights that are still missing by predicted values from the linear regression of the full-model weight on the weight from the model omitting the reading test.

3. Finally, replace, where possible, weights that are still missing by predicted values from the linear regression of the full-model weight on the weight from the model omitting the number of moves.

These steps allow us to estimate a weighted model with 8841 rather than 5996 cases.
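The three calibration steps above can be sketched as a single helper (Python, NumPy arrays; the function name and the toy data in the test are our own, and a real analysis would apply it once per reduced model in the order given):

```python
import numpy as np

def calibrate_weights(w_full, w_reduced):
    """Fill in missing full-model weights from a reduced-model weight.

    w_full    : weights from the full response model, np.nan where that
                model's covariates are incomplete
    w_reduced : weights from the model omitting one covariate
    Returns a copy of w_full with missing entries replaced by predictions
    from the linear regression of w_full on w_reduced, fitted on the
    cases where both weights are available.
    """
    w = np.asarray(w_full, dtype=float).copy()
    w_reduced = np.asarray(w_reduced, dtype=float)
    both = ~np.isnan(w) & ~np.isnan(w_reduced)
    slope, intercept = np.polyfit(w_reduced[both], w[both], 1)
    fill = np.isnan(w) & ~np.isnan(w_reduced)
    w[fill] = intercept + slope * w_reduced[fill]
    return w
```

Applying such a helper successively for the models omitting social adjustment, the reading test and the number of moves reproduces steps 1-3 above.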

43 / 48

Results: multiple imputation

Standard errors in parentheses.

                           Complete cases        Multiple imputation
Explanatory variable       (N=10,279)            Naive              With interaction
In care                    0·65 (0·096)          0·65 (0·091)       0·70 (0·090)
In social housing          0·58 (0·035)          0·53 (0·032)       0·58 (0·033)
Inverse birthweight        73 (8·5)              70 (7·7)           75 (8·6)
Mo' age at birth           −0·017 (0·0037)       −0·014 (0·0033)    −0·017 (0·0037)
Mo' age squared            0·0020 (0·00047)      0·0016 (0·00042)   0·0019 (0·00048)
Mo' age × housing          0·013 (0·0054)        0·012 (0·0048)     0·015 (0·0051)
Mo' age sq'd × housing     −0·00078 (0·00068)    −0·00068 (0·00061) −0·00094 (0·00065)
Constant                   −1·6 (0·078)          −1·4 (0·072)       −1·5 (0·075)

44 / 48


Results: inverse probability weighting

Standard errors in parentheses.

                           Complete cases        Inverse probability weighting
Explanatory variable       (N=10,279)            IPW I (N=5996)     IPW II (N=8841)
In care                    0·65 (0·096)          0·73 (0·14)        0·64 (0·11)
In social housing          0·58 (0·035)          0·51 (0·046)       0·57 (0·035)
Inverse birthweight        73 (8·5)              80 (12)            75 (9·5)
Mo' age at birth           −0·017 (0·0037)       −0·011 (0·0053)    −0·014 (0·0042)
Mo' age squared            0·0020 (0·00047)      0·00083 (0·00066)  0·0017 (0·00052)
Mo' age × housing          0·013 (0·0054)        0·0026 (0·0075)    0·0065 (0·0059)
Mo' age sq'd × housing     −0·00078 (0·00068)    −0·00017 (0·00094) −0·00054 (0·00075)
Constant                   −1·6 (0·078)          −1·7 (0·11)        −1·6 (0·088)

45 / 48

Summary 46 / 48

Discussion

• Analysing partially observed data entails untestable assumptions about the missingness mechanism(s).

• An appropriate strategy is thus

◦ assume a missingness mechanism (consistent with the data and the investigator's experience)

◦ perform a valid analysis under that assumption

◦ explore the robustness of inference to different assumptions

• MI provides a computationally convenient tool for a range of practical problems, but (i) watch for nonlinearities/interactions and (ii) be careful with large numbers of variables, especially with FCS

• IPW is probably too inefficient in most cases, but provides a useful check

• DR is an attractive approach, but estimators for complex data are not known; there remain issues with the stability of the weights (see [2] and the talk by Vansteelandt at www.missingdata.org.uk)

• Bringing DR and MI together is potentially attractive (see talk by Daniel from www.missingdata.org.uk)

46 / 48


References

[1] C A Bernaards, T R Belin, and J L Schafer. Robustness of a multivariate normal approximation for imputation of incomplete binary data. Statistics in Medicine, 26(6):1368–1382, 2007.

[2] W Cao, A A Tsiatis, and M Davidian. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika, 96:723–734, 2009.

[3] J Carpenter, S Pocock, and C J Lamm. Coping with missing data in clinical trials: a model based approach applied to asthma trials. Statistics in Medicine, 21:1043–1066, 2002.

[4] J R Carpenter, M G Kenward, and S Vansteelandt. A comparison of multiple imputation and inverse probability weighting for analyses with missing data. Journal of the Royal Statistical Society, Series A (Statistics in Society), 169:571–584, 2006.

[5] J R Carpenter, G Rucker, and G Schwarzer. Assessing the sensitivity of meta-analysis to selection bias: a multiple imputation approach. Biometrics, in press, 2010.

[6] J R Carpenter and M G Kenward. Missing data in clinical trials — a practical guide. Birmingham: National Health Service Co-ordinating Centre for Research Methodology, 2008. Freely downloadable from www.missingdata.org.uk (accessed 7 May 2010).

[7] J R Carpenter, M G Kenward, and I R White. Sensitivity analysis after multiple imputation under missing at random — a weighting approach. Statistical Methods in Medical Research, 16:259–275, 2007.

[8] J R Carpenter and I Plewis. Coming to terms with non-response in longitudinal studies. Under revision for The SAGE Handbook of Methodological Innovation, editors Malcolm Williams and Paul Vogt, London: SAGE, 2010.

[9] A-W Chan and D G Altman. Epidemiology and reporting of randomised trials published in PubMed journals. The Lancet, 365:1159–1162, 2005.

[10] P J Diggle and M G Kenward. Informative dropout in longitudinal data analysis (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics), 43:49–94, 1994.

[11] H Goldstein, J R Carpenter, M G Kenward, and K Levin. Multilevel models with multivariate mixed response types. Statistical Modelling, 9:173–197, 2009.

[12] D Hawkes and I Plewis. Modelling non-response in the National Child Development Study. Journal of the Royal Statistical Society, Series A, 169:479–491, 2006.

[13] J D Y Kang and J L Schafer. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statistical Science, 22:523–539, 2007.

[14] M G Kenward and J R Carpenter. Multiple imputation: current perspectives. Statistical Methods in Medical Research, 16:199–218, 2007.

[15] J K Kim, J M Brick, W A Fuller, and G Kalton. On the bias of the multiple-imputation variance estimator in survey sampling. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68:509–521, 2006.

47 / 48


References II

[16] M A Klebanoff and S R Cole. Use of multiple imputation in the epidemiologic literature. American Journal of Epidemiology, 168:355–357, 2008.

[17] K J Lee and J B Carlin. Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology, pages 624–632, 2010.

[18] X L Meng. Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science, 10:538–573, 1994.

[19] U Nur, L G Shack, B Rachet, J R Carpenter, and M P Coleman. Modelling relative survival in the presence of incomplete data: a tutorial. International Journal of Epidemiology, pre-print online, doi:10.1093/ije/dyp309, 2009.

[20] I Plewis, L Calderwood, D Hawkes, and G Nathan. National Child Development Study and 1970 British Cohort Study Technical Report: Changes in the NCDS and BCS70 Populations and Samples over Time (1st ed.). London: Institute of Education, University of London, 2004.

[21] P Royston. Multiple imputation of missing values: further update of ice with emphasis on interval censoring. The Stata Journal, 7:445–464, 2007.

[22] D B Rubin. Multiple imputation for nonresponse in surveys. New York: Wiley, 1987.

[23] J L Schafer. Analysis of incomplete multivariate data. London: Chapman and Hall, 1997.

[24] M Spratt, J A C Sterne, K Tilling, J R Carpenter, and J B Carlin. Strategies for multiple imputation in longitudinal studies. American Journal of Epidemiology, 172:478–487, 2010.

[25] R J Steele, N Wang, and A E Raftery. Inference from multiple imputation for missing data using mixtures of normals. Statistical Methodology, 7:351–365, 2010.

[26] J A C Sterne, I R White, J B Carlin, M Spratt, P Royston, M G Kenward, A M Wood, and J R Carpenter. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. British Medical Journal, 339:157–160, 2009.

[27] M Tanner and W Wong. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528–550, 1987.

[28] A A Tsiatis. Semiparametric Theory and Missing Data. New York: Springer, 2006.

[29] P T von Hippel. How to impute interactions, squares and other transformed variables. Sociological Methodology, 39:265–291, 2009.

[30] N Wang and J M Robins. Large-sample theory for parametric multiple imputation procedures. Biometrika, 85:935–948, 1998.

[31] I R White and P Royston. Imputing missing covariate values for the Cox model. Statistics in Medicine, 28:1982–1998, 2009.

[32] A M Wood, I R White, and S G Thompson. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clinical Trials, 1:368–376, 2004.

48 / 48
