Handling Data with Three Types of Missing Values
7/31/2019 Handling Data with Three Types of Missing Values
Handling Data with Three Types of Missing Values
Jennifer Boyko
Department of Statistics, University of Connecticut
Storrs, CT
Jennifer Boyko Handling Data with Three Types of Missing Values 1 / 3 3
Outline
1 Missing Data
  Problem
  Characterization
  Methods for Handling
2 Multiple Imputation
  Standard MI
  Two Stage MI
3 Proposed Research
  Procedure
  Combining Rules
  Ignorability and Rates of Missing Information
  Application
4 Conclusion
The Missing Data Problem
Present in many areas of research
Small amounts can cause issues (Belin, 2009)
Most statistical package defaults use complete case analysis
Problems include
  bias
  inefficiency
  unrealistic standard errors
Pattern of Missingness
Maps which values are missing in a data set
Figure: Schafer & Graham (2002)
Mechanisms of Missingness
Missing At Random (MAR)
  $P(R \mid Y, \xi) = P(R \mid Y_{obs}, \xi)$
  Missingness depends on observed values of Y only
Missing Completely At Random (MCAR)
  $P(R \mid Y, \xi) = P(R \mid \xi)$
  Missingness not dependent on observed or unobserved values of Y
  Special case of MAR
Missing Not At Random (MNAR)
  Occurs when condition of MAR is violated
  Missingness is dependent on $Y_{mis}$ or some unobserved covariate
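To make the three mechanisms concrete, the following Python sketch simulates each one on toy data and compares the complete-case mean of y with the full-data mean. The function name, the missingness probabilities, and the data model (y equal to x plus noise, with x always observed) are assumptions chosen for illustration; none of them come from the talk.

```python
import random

def observed_flags(rows, mechanism, rng):
    """Return one flag per (x, y) row: True if y is observed, False if y is missing."""
    flags = []
    for x, y in rows:
        if mechanism == "MCAR":
            p_missing = 0.3                      # same probability for every row
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.1    # depends only on x, which is always observed
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 0 else 0.1    # depends on the possibly missing y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        flags.append(rng.random() >= p_missing)
    return flags

rng = random.Random(1)
rows = []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    rows.append((x, x + rng.gauss(0.0, 1.0)))    # toy model: y = x + noise

full_mean = sum(y for _, y in rows) / len(rows)
observed_mean = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    flags = observed_flags(rows, mechanism, rng)
    kept = [y for (_, y), keep in zip(rows, flags) if keep]
    observed_mean[mechanism] = sum(kept) / len(kept)
```

In this toy setup the complete-case mean stays close to the full-data mean only under MCAR. Under MAR the shift can be corrected by conditioning on x, whereas under MNAR it cannot be corrected from the observed data alone.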
Ignorability
A missing data mechanism is classified as ignorable if two conditions are met:
1 The data must be MAR or MCAR
2 $\theta$ and $\xi$ must be distinct
  $P(\theta, \xi) = P(\theta)P(\xi)$
  Joint parameter space is the Cartesian cross-product of the individual parameter spaces
Ignorability represents the weakest set of conditions under which the distribution of R does not need to be considered in Bayesian or likelihood-based inference of $\theta$ (Rubin, 1976)
Older Methods
Complete Case Analysis (CCA)
  Can produce biased results
  Default in many statistical packages
  Loss of information
Single Imputation
  Fills in missing values with plausible values
  Imputing unconditional means
  Hot deck imputation
  Conditional mean imputation
  Last Observation Carried Forward (LOCF)
Standard Multiple Imputation
Multiple imputation (Rubin, 1987) uses a three step process to analyze incomplete data sets:
1 Imputation
2 Analysis
3 Combination
Imputation Stage
Idea: fill in m > 1 plausible values for the missing data to account for model uncertainty
Create m complete data sets by drawing from the posterior predictive distribution of the missing values
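As a minimal sketch of what drawing from the posterior predictive distribution can look like, the Python code below imputes a single variable under assumptions that are not from the talk: a univariate normal model, the noninformative prior proportional to 1/sigma^2, and an ignorable mechanism. The function name and the example data are hypothetical. Each imputation first draws the parameters from their posterior and then draws the missing values given those parameters.

```python
import random
from statistics import mean, variance

def impute_normal(y, m, rng):
    """Return m completed copies of y, where None marks a missing value."""
    obs = [v for v in y if v is not None]
    n_obs = len(obs)
    if n_obs < 2:
        raise ValueError("need at least two observed values")
    y_bar, s2 = mean(obs), variance(obs)
    completed = []
    for _ in range(m):
        # Posterior draw of the variance: (n_obs - 1) * s2 / chi-square(n_obs - 1).
        chi2 = rng.gammavariate((n_obs - 1) / 2, 2.0)
        sigma2 = (n_obs - 1) * s2 / chi2
        # Posterior draw of the mean given the drawn variance.
        mu = rng.gauss(y_bar, (sigma2 / n_obs) ** 0.5)
        # Fill each missing entry with a draw from N(mu, sigma2); keep observed entries.
        completed.append(
            [v if v is not None else rng.gauss(mu, sigma2 ** 0.5) for v in y]
        )
    return completed

y = [4.1, None, 5.3, 3.8, None, 4.9]
data_sets = impute_normal(y, m=5, rng=random.Random(0))
```

Redrawing the parameters for every imputation is what lets the m data sets differ by more than residual noise; filling in values from a single fixed estimate of the mean and variance would understate the uncertainty.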
Analysis Stage
Analyze each of the m data sets using complete data methods
Let $Q$ denote the parameter of interest
Let $\hat{Q}$ be the complete data estimate
Let $U$ be the variance of $\hat{Q}$
Assumption: $(\hat{Q} - Q)/\sqrt{U} \sim N(0, 1)$
Combination Stage
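$\bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}^{(j)}$
$\bar{U} = \frac{1}{m} \sum_{j=1}^{m} U^{(j)}$
$B = \frac{1}{m-1} \sum_{j=1}^{m} \left(\hat{Q}^{(j)} - \bar{Q}\right)^2$
$T = \bar{U} + (1 + m^{-1})B$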
Combination Stage
$(\bar{Q} - Q)/\sqrt{T} \sim t_{\nu}$
$\nu = (m - 1)\left[1 + \frac{\bar{U}}{(1 + m^{-1})B}\right]^2$
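As a rough illustration of the combination stage, the Python sketch below pools m completed-data estimates and their variances into the pooled estimate, the within- and between-imputation variances, the total variance T, and the degrees of freedom of the t reference distribution. The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance

def combine_standard_mi(q_hats, u_hats):
    """Pool m completed-data estimates (q_hats) and their variances (u_hats)."""
    m = len(q_hats)
    if m < 2 or len(u_hats) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(q_hats)             # pooled point estimate
    u_bar = mean(u_hats)             # within-imputation variance
    b = variance(q_hats)             # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    if b == 0:
        nu = float("inf")            # no between-imputation variability
    else:
        nu = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, nu

# Made-up estimates from m = 3 completed data sets.
q_bar, t, nu = combine_standard_mi([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

An interval estimate would then take the form q_bar plus or minus a t quantile with nu degrees of freedom times the square root of t.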
Benefits of Multiple Imputation
Adds variability to the imputed values
Uses standard data analysis procedures after imputation
Can be very efficient
Can use the same set of imputations for several analyses
Two Stage Multiple Imputation
Two stage multiple imputation (Harel, 2009) considers a situation where we can have data missing for two different reasons
Dropout in a longitudinal study vs. intermittent missing follow-up
Refusal to answer a question vs. a "don't know" response
Latent variable vs. missing planned observed values
Death vs. dropout for other reasons
Unit nonresponse vs. item nonresponse
Computational Efficiency
Originally developed by Shen (2000) with the intention of improving computational efficiency.
Figure: missingness pattern for variables Y1 to Y5, with "?" marking missing values
Procedure
Imputation step is broken into two stages:
1 First draw m imputations of $Y^A_{mis}$
2 Conditioned on $Y^A_{mis}$, draw n imputations of $Y^B_{mis}$
Yields a total of mn completed data sets
Two Stage MI Combining Rules
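$\bar{Q} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \hat{Q}^{(j,k)}$
$\bar{U} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} U^{(j,k)}$
$B = \frac{1}{m-1} \sum_{j=1}^{m} \left(\bar{Q}_{j.} - \bar{Q}_{..}\right)^2$
$W = \frac{1}{m(n-1)} \sum_{j=1}^{m} \sum_{k=1}^{n} \left(\hat{Q}^{(j,k)} - \bar{Q}_{j.}\right)^2$
$T = \bar{U} + (1 + m^{-1})B + (1 - n^{-1})W$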
Two Stage MI Combining Rules
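$(\bar{Q} - Q)/\sqrt{T} \sim t_{\nu}$
$\nu^{-1} = \frac{1}{m(n-1)}\left[\frac{(1 - 1/n)W}{T}\right]^2 + \frac{1}{m-1}\left[\frac{(1 + 1/m)B}{T}\right]^2$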
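The two stage combining rules can be sketched in Python as follows. The code assumes the mn completed-data estimates and variances are stored as rectangular nested lists indexed by first-stage draw j and second-stage draw k; the function name, the returned dictionary, and the example numbers are hypothetical.

```python
from statistics import mean

def combine_two_stage_mi(q, u):
    """Pool two stage MI results; q[j][k] and u[j][k] are estimate and variance."""
    m, n = len(q), len(q[0])
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 and n >= 2")
    q_j = [mean(row) for row in q]                 # nest means, one per first-stage draw
    q_bar = mean(q_j)                              # grand mean (nests have equal size)
    u_bar = mean(v for row in u for v in row)      # average completed-data variance
    b = sum((x - q_bar) ** 2 for x in q_j) / (m - 1)             # between-nest variance
    w = sum((q[j][k] - q_j[j]) ** 2
            for j in range(m) for k in range(n)) / (m * (n - 1))  # within-nest variance
    t = u_bar + (1 + 1 / m) * b + (1 - 1 / n) * w
    inv_nu = (((1 - 1 / n) * w / t) ** 2 / (m * (n - 1))
              + ((1 + 1 / m) * b / t) ** 2 / (m - 1))
    nu = float("inf") if inv_nu == 0 else 1 / inv_nu
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "w": w, "t": t, "nu": nu}

# Made-up results with m = 2 first-stage and n = 2 second-stage imputations.
res = combine_two_stage_mi([[1.0, 3.0], [5.0, 7.0]], [[1.0, 1.0], [1.0, 1.0]])
```

For these made-up inputs the pieces are easy to check by hand: the nest means are 2 and 6, so B = 8 and W = 2, and T = 1 + 1.5(8) + 0.5(2) = 14.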
Benefits
Can simplify imputation computationally
Able to quantify how much missing information is due to each type of missing value which can help in planning future studies
Allows for different mechanisms of missingness for each type of missing value (one ignorable and one nonignorable type of missing data)
Proposed Research
1 Multiple imputation in three stages including derivation of combining rules
2 Ignorability and rates of missing information
3 Application of methodology to cognitive functioning data
Benefits
Extend the benefits of two stage MI to allow for greater specificity regarding the data analysis
Allows for missing data to be of three different types
Allows for three different assumptions of the mechanisms of missingness
Can quantify the variability and missing information due to each type of missing value
Example 1
Example of missing data due to dropout, intermittent missingness, and a missing covariate
Figure: missingness pattern for variables Y1 to Y5, shown twice: first with "?" marking each missing value, then with each missing value labeled by type (A, B, or C)
Example 2
Example with missing values due to item nonresponse, unit nonresponse, and latent class
Figure: missingness pattern for variables Y1 to Y5, shown twice: first with "?" marking each missing value, then with each missing value labeled by type (A, B, or C)
Process
Same as standard and two stage MI but with three stages in the imputation step and different combining rules
1 Impute L values of $Y^A_{mis}$
2 Conditioned on $Y^A_{mis}$, impute M values of $Y^B_{mis}$
3 Conditioned on $Y^A_{mis}$ and $Y^B_{mis}$, impute N values of $Y^C_{mis}$
Yields a total of LMN completed data sets
A second, but equivalent, method draws simultaneously from the joint distribution of $Y^A_{mis}$, $Y^B_{mis}$, and $Y^C_{mis}$
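The nesting in the first, stage-by-stage method can be sketched as three loops. In the Python code below the three draw functions are hypothetical stand-ins for real imputation models (here simple normal draws on a single number), so this shows only the control flow that yields LMN completed data sets, not an actual imputation model.

```python
import random

def three_stage_imputations(draw_a, draw_b, draw_c, L, M, N):
    """Return a dict keyed by (l, m, n) with one (y_a, y_b, y_c) draw per completed data set."""
    completed = {}
    for l in range(L):
        y_a = draw_a()                      # stage 1: type A missing values
        for m in range(M):
            y_b = draw_b(y_a)               # stage 2: type B values given the A draw
            for n in range(N):
                y_c = draw_c(y_a, y_b)      # stage 3: type C values given the A and B draws
                completed[(l, m, n)] = (y_a, y_b, y_c)
    return completed

rng = random.Random(0)
# Hypothetical stand-in models: each stage draws one number centered on earlier draws.
nested = three_stage_imputations(
    draw_a=lambda: rng.gauss(0.0, 1.0),
    draw_b=lambda a: rng.gauss(a, 1.0),
    draw_c=lambda a, b: rng.gauss(a + b, 1.0),
    L=3, M=2, N=2,
)
```

Completed data sets that share the index l share the same stage 1 draw, and those that share (l, m) also share the stage 2 draw, which is the dependence the three stage combining rules account for.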
Three Stage MI Combining Rules
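$\bar{Q} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \hat{Q}^{(l,m,n)}$
$\bar{U} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} U^{(l,m,n)}$
$B = \frac{1}{L-1} \sum_{l=1}^{L} \left(\bar{Q}_{l..} - \bar{Q}_{...}\right)^2$
$W_1 = \frac{1}{L(M-1)} \sum_{l=1}^{L} \sum_{m=1}^{M} \left(\bar{Q}_{lm.} - \bar{Q}_{l..}\right)^2$
$W_2 = \frac{1}{LM(N-1)} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \left(\hat{Q}^{(l,m,n)} - \bar{Q}_{lm.}\right)^2$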
Three Stage MI Combining Rules
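$T = \bar{U} + (1 + L^{-1})B + (1 - M^{-1})W_1 + (1 - N^{-1})W_2$
$\nu^{-1} = \left[\frac{(1 + 1/L)B}{T}\right]^2 (L-1)^{-1} + \left[\frac{(1 - 1/M)W_1}{T}\right]^2 (L(M-1))^{-1} + \left[\frac{(1 - 1/N)W_2}{T}\right]^2 (LM(N-1))^{-1}$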
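A sketch of the three stage combining rules in Python follows, assuming the LMN completed-data estimates and variances are stored as rectangular nested lists indexed by (l, m, n). The function name, the returned dictionary, and the example numbers are hypothetical.

```python
from statistics import mean

def combine_three_stage_mi(q, u):
    """Pool three stage MI results; q[l][m][n] and u[l][m][n] are estimate and variance."""
    L, M, N = len(q), len(q[0]), len(q[0][0])
    if min(L, M, N) < 2:
        raise ValueError("need L, M and N all >= 2")
    q_lm = [[mean(cell) for cell in block] for block in q]   # means over n
    q_l = [mean(row) for row in q_lm]                        # means over m and n
    q_bar = mean(q_l)                                        # grand mean
    u_bar = mean(v for block in u for cell in block for v in cell)
    b = sum((x - q_bar) ** 2 for x in q_l) / (L - 1)
    w1 = sum((q_lm[l][m] - q_l[l]) ** 2
             for l in range(L) for m in range(M)) / (L * (M - 1))
    w2 = sum((q[l][m][n] - q_lm[l][m]) ** 2
             for l in range(L) for m in range(M) for n in range(N)) / (L * M * (N - 1))
    t = u_bar + (1 + 1 / L) * b + (1 - 1 / M) * w1 + (1 - 1 / N) * w2
    inv_nu = (((1 + 1 / L) * b / t) ** 2 / (L - 1)
              + ((1 - 1 / M) * w1 / t) ** 2 / (L * (M - 1))
              + ((1 - 1 / N) * w2 / t) ** 2 / (L * M * (N - 1)))
    nu = float("inf") if inv_nu == 0 else 1 / inv_nu
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "w1": w1, "w2": w2, "t": t, "nu": nu}

# Made-up results with L = M = N = 2.
q = [[[0.0, 2.0], [4.0, 6.0]], [[8.0, 10.0], [12.0, 14.0]]]
u = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
res3 = combine_three_stage_mi(q, u)
```

For these made-up inputs B = 32, W1 = 8 and W2 = 2, so T = 1 + 1.5(32) + 0.5(8) + 0.5(2) = 54.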
Ignorability
Extension of Rubin's theory of MAR and ignorability as presented in Rubin (1976)
Harel & Schafer (2009) present an extension to two types of missing values
Conditional ignorability; possible to define weaker conditions under which M+ can be ignored in one or more stages
Rates of Missing Information
Helps with determination of number of imputations required at each stage
  Small numbers of imputations are required when the main concern is relative efficiency of point estimates
  Estimates for rates of missing information can be noisy for small numbers of imputations
Derivation of the asymptotic distribution of rates of missing information
  I will derive the estimates and asymptotic distribution for the rates of missing information for three types of missing values
Application
Cognitive functioning data
Three types of missing values will be dropout due to dementia, dropout due to death unrelated to dementia, and an intermittently missing covariate
Large amounts of missing data are common in studies of cognitive functioning (Coley et al., 2011)
Conclusion
Applicable in analysis of many types of data sets
Allows researchers to quantify amount of variance attributable to each type of missing value
Informative in analysis of data and planning of future studies
Belin, T. (2009). Missing data: what a little can do and what researchers can do in response. American Journal of Ophthalmology 148, 820-822.
Coley, N. et al. (2011). How should we deal with missing data in clinical trials involving Alzheimer's disease patients? Current Alzheimer Research 8, 421-433.
Harel, O. (2009). Strategies for Data Analysis with Two Types of Missing Values: From Theory to Application. Saarbrücken, Germany: Lambert Academic Publishing.
Harel, O. & Schafer, J. L. (2009). Partial and latent ignorability in missing-data problems. Biometrika 96, 37-50.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581-592.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Hoboken, New Jersey: John Wiley & Sons, Ltd, 1st ed.
Shen, Z. (2000). Nested Multiple Imputation. Ph.D. thesis, Department of Statistics, Harvard University, Cambridge, MA.