Handling Data with Three Types of Missing Values


  • 7/31/2019 Handling Data with Three Types of Missing Values

    Handling Data with Three Types of Missing Values

    Jennifer Boyko

    Department of Statistics, University of Connecticut

    Storrs, CT


    Outline

    1 Missing Data
        Problem
        Characterization
        Methods for Handling

    2 Multiple Imputation
        Standard MI
        Two Stage MI

    3 Proposed Research
        Procedure
        Combining Rules
        Ignorability and Rates of Missing Information
        Application

    4 Conclusion


    The Missing Data Problem

    Present in many areas of research

    Small amounts can cause issues (Belin, 2009)

    Most statistical packages default to complete case analysis

    Problems include

        bias
        inefficiency
        unrealistic standard errors


    Pattern of Missingness

    Maps which values are missing in a data set

    Figure: Schafer & Graham (2002)


    Mechanisms of Missingness

    Missing At Random (MAR)

        $P(R \mid Y, \xi) = P(R \mid Y_{obs}, \xi)$, where $R$ is the missingness indicator and $\xi$ denotes the parameters of the missingness mechanism
        Missingness depends on observed values of Y only

    Missing Completely At Random (MCAR)

        $P(R \mid Y, \xi) = P(R \mid \xi)$
        Missingness not dependent on observed or unobserved values of Y
        Special case of MAR

    Missing Not At Random (MNAR)

        Occurs when condition of MAR is violated
        Missingness is dependent on $Y_{mis}$ or some unobserved covariate
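The three mechanisms can be illustrated with a small simulation. The sketch below is not from the slides: the bivariate model, the logistic deletion probabilities, and the name `simulate_mechanisms` are all assumptions chosen for illustration. It deletes values of Y under each mechanism and compares the complete-case mean of Y with its true mean of 0.

```python
import math
import random
import statistics


def simulate_mechanisms(n=20000, seed=0):
    """Return the observed Y values left by three deletion mechanisms."""
    rng = random.Random(seed)
    observed = {"MCAR": [], "MAR": [], "MNAR": []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)        # fully observed covariate
        y = x + rng.gauss(0.0, 1.0)    # variable subject to missingness
        # MCAR: every Y has the same 50% chance of being missing.
        if rng.random() >= 0.5:
            observed["MCAR"].append(y)
        # MAR: the chance of missingness rises with the observed x only.
        if rng.random() >= 1.0 / (1.0 + math.exp(-2.0 * x)):
            observed["MAR"].append(y)
        # MNAR: the chance of missingness rises with y itself.
        if rng.random() >= 1.0 / (1.0 + math.exp(-2.0 * y)):
            observed["MNAR"].append(y)
    return observed


observed = simulate_mechanisms()
cc_means = {name: statistics.fmean(ys) for name, ys in observed.items()}
print(cc_means)  # the true mean of Y is 0
```

In this toy setup the complete-case mean stays close to 0 under MCAR and is pulled below 0 under both MAR and MNAR. Under MAR the distortion can in principle be corrected by modeling Y given the observed x; under MNAR it cannot, because the deletion depends on the missing values themselves.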


    Ignorability

    A missing data mechanism is classified as ignorable if two conditions are met:

    1 The data must be MAR or MCAR
    2 $\theta$ (the parameters of the data model) and $\xi$ (the parameters of the missingness mechanism) must be distinct

        $P(\theta, \xi) = P(\theta)P(\xi)$
        Joint parameter space is the Cartesian cross-product of the individual parameter spaces

    Ignorability represents the weakest set of conditions under which the distribution of R does not need to be considered in Bayesian or likelihood-based inference of $\theta$ (Rubin, 1976)
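A short derivation shows why these two conditions are enough. The symbols are assumptions of this sketch: $\theta$ for the parameters of the data model, $\xi$ for the parameters of the missingness mechanism, and $R$ for the missingness indicator. The joint distribution of the observed data and $R$ factors as follows.

```latex
\begin{align*}
P(Y_{obs}, R \mid \theta, \xi)
  &= \int P(Y_{obs}, Y_{mis} \mid \theta)\,
          P(R \mid Y_{obs}, Y_{mis}, \xi)\, dY_{mis} \\
  &= P(R \mid Y_{obs}, \xi) \int P(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis}
     && \text{(by MAR)} \\
  &= P(R \mid Y_{obs}, \xi)\, P(Y_{obs} \mid \theta)
\end{align*}
```

When $\theta$ and $\xi$ are distinct, the first factor carries no information about $\theta$, so likelihood-based or Bayesian inference on $\theta$ can work with $P(Y_{obs} \mid \theta)$ alone.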


    Older Methods

    Complete Case Analysis (CCA)

        Can produce biased results
        Default in many statistical packages
        Loss of information

    Single Imputation

        Fills in missing values with plausible values
        Imputing unconditional means
        Hot deck imputation
        Conditional mean imputation
        Last Observation Carried Forward (LOCF)
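A minimal sketch of three of these older methods on a single variable, assuming missing values are coded as `None`. The function names are hypothetical and the five-point data set is made up for illustration.

```python
import statistics


def complete_cases(values):
    """Complete case analysis for one variable: drop the missing entries."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Unconditional mean imputation: fill each gap with the observed mean."""
    mean = statistics.fmean(complete_cases(values))
    return [mean if v is None else v for v in values]


def impute_locf(values):
    """Last observation carried forward; leading gaps stay missing."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


y = [1.0, 2.0, 3.0, None, None]
var_cc = statistics.variance(complete_cases(y))   # 1.0
var_mean = statistics.variance(impute_mean(y))    # 0.5
locf = impute_locf(y)                             # [1.0, 2.0, 3.0, 3.0, 3.0]
print(var_cc, var_mean, locf)
```

Mean imputation halves the sample variance in this example even though no new information was added, which is one way single imputation can lead to unrealistically small standard errors.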


    Standard Multiple Imputation

    Multiple imputation (Rubin, 1987) uses a three step process to analyze incomplete data sets:

    1 Imputation
    2 Analysis
    3 Combination


    Imputation Stage

    Idea: fill in m > 1 plausible values for the missing data to account for model uncertainty

    Create m complete data sets by drawing from the posterior predictive distribution of the missing values


    Analysis Stage

    Analyze each of the m data sets using complete data methods

    Let $Q$ denote the parameter of interest
    Let $\hat{Q}$ be the complete data estimate
    Let $U$ be the variance of $\hat{Q}$

    Assumption: $(\hat{Q} - Q)/\sqrt{U} \sim N(0, 1)$


    Combination Stage

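    \[ \bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}^{(j)} \]

    \[ \bar{U} = \frac{1}{m} \sum_{j=1}^{m} U^{(j)} \]

    \[ B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \hat{Q}^{(j)} - \bar{Q} \right)^2 \]

    \[ T = \bar{U} + (1 + m^{-1}) B \]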


    \[ \frac{\bar{Q} - Q}{\sqrt{T}} \sim t_{\nu} \]

    \[ \nu = (m-1) \left[ 1 + \frac{\bar{U}}{(1 + m^{-1}) B} \right]^2 \]
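The combination stage can be sketched in a few lines of Python. The function name `pool_standard_mi` and the toy inputs are assumptions made for illustration; the arithmetic follows the rules for the pooled estimate, the within and between variances, the total variance, and the degrees of freedom given on the slides.

```python
def pool_standard_mi(q_hats, u_hats):
    """Combine m complete-data estimates and their variances."""
    m = len(q_hats)
    if m < 2 or len(u_hats) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = sum(q_hats) / m                               # pooled estimate
    u_bar = sum(u_hats) / m                               # within variance
    b = sum((q - q_bar) ** 2 for q in q_hats) / (m - 1)   # between variance
    t = u_bar + (1 + 1 / m) * b                           # total variance
    # With no between-imputation variance the t reference becomes normal.
    if b == 0:
        nu = float("inf")
    else:
        nu = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "t": t, "nu": nu}


# Toy input: three estimates that each have complete-data variance 0.5.
pooled = pool_standard_mi([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

For this input the pooled estimate is 2.0, the between-imputation variance is 1.0, and the total variance is 0.5 + (4/3)(1.0), roughly 1.83.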


    Benefits of Multiple Imputation

    Adds variability to the imputed values

    Uses standard data analysis procedures after imputation

    Can be very efficient

    Can use the same set of imputations for several analyses


    Two Stage Multiple Imputation

    Two stage multiple imputation (Harel, 2009) considers a situation where we can have data missing for two different reasons

    Dropout in a longitudinal study vs. intermittent missing follow-up

    Refusal to answer a question vs. a "don't know" response

    Latent variable vs. missing planned observed values

    Death vs. dropout for other reasons

    Unit nonresponse vs. item nonresponse


    Computational Efficiency

    Originally developed by Shen (2000) with the intention of improving computational efficiency.

    Figure: missingness pattern for Y1 to Y5, with a few isolated missing values (?) and four rows in which every variable is missing


    Procedure

    Imputation step is broken into two stages:

    1 First draw m imputations of $Y^{A}_{mis}$
    2 Conditioned on $Y^{A}_{mis}$, draw n imputations of $Y^{B}_{mis}$

    Yields a total of mn completed data sets


    Two Stage MI Combining Rules

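    \[ \bar{Q} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \hat{Q}^{(j,k)} \]

    \[ \bar{U} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} U^{(j,k)} \]

    \[ B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \bar{Q}_{j.} - \bar{Q}_{..} \right)^2 \]

    \[ W = \frac{1}{m(n-1)} \sum_{j=1}^{m} \sum_{k=1}^{n} \left( \hat{Q}^{(j,k)} - \bar{Q}_{j.} \right)^2 \]

    \[ T = \bar{U} + (1 + m^{-1}) B + (1 - n^{-1}) W \]

    where $\bar{Q}_{j.} = \frac{1}{n} \sum_{k=1}^{n} \hat{Q}^{(j,k)}$ is the mean within nest $j$ and $\bar{Q}_{..} = \bar{Q}$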


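    \[ \frac{\bar{Q} - Q}{\sqrt{T}} \sim t_{\nu} \]

    \[ \nu^{-1} = \frac{1}{m(n-1)} \left[ \frac{(1 - 1/n) W}{T} \right]^2 + \frac{1}{m-1} \left[ \frac{(1 + 1/m) B}{T} \right]^2 \]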
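The two stage combining rules can be sketched the same way. The function name `pool_two_stage_mi` and the 2 x 2 toy input are assumptions; `q_hats[j][k]` holds the estimate from the k-th second-stage imputation nested in the j-th first-stage imputation.

```python
def pool_two_stage_mi(q_hats, u_hats):
    """Combine m x n nested estimates (q_hats[j][k]) and variances."""
    m, n = len(q_hats), len(q_hats[0])
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 nests with n >= 2 imputations each")
    q_nest = [sum(row) / n for row in q_hats]             # nest means
    q_bar = sum(q_nest) / m                               # overall mean
    u_bar = sum(sum(row) for row in u_hats) / (m * n)     # within variance
    # Between-nest and within-nest variance components.
    b = sum((qj - q_bar) ** 2 for qj in q_nest) / (m - 1)
    w = sum((q - qj) ** 2
            for row, qj in zip(q_hats, q_nest) for q in row) / (m * (n - 1))
    t = u_bar + (1 + 1 / m) * b + (1 - 1 / n) * w
    inv_nu = (((1 - 1 / n) * w / t) ** 2 / (m * (n - 1))
              + ((1 + 1 / m) * b / t) ** 2 / (m - 1))
    nu = 1 / inv_nu if inv_nu > 0 else float("inf")
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "w": w, "t": t, "nu": nu}


# Toy input: m = 2 nests, n = 2 imputations per nest, unit variances.
pooled2 = pool_two_stage_mi([[1.0, 3.0], [5.0, 7.0]],
                            [[1.0, 1.0], [1.0, 1.0]])
print(pooled2)
```

Here the nest means are 2 and 6, so B = 8 and W = 2, and the total variance is 1 + 1.5(8) + 0.5(2) = 14. Most of the variance comes from the first type of missing value, which is the kind of attribution the method is meant to provide.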


    Benefits

    Can simplify imputation computationally

    Able to quantify how much missing information is due to each type of missing value, which can help in planning future studies

    Allows for different mechanisms of missingness for each type of missing value (one ignorable and one nonignorable type of missing data)


    Proposed Research

    1 Multiple imputation in three stages including derivation of combining rules

    2 Ignorability and rates of missing information

    3 Application of methodology to cognitive functioning data


    Benefits

    Extend the benefits of two stage MI to allow for greater specificity regarding the data analysis

    Allows for missing data to be of three different types

    Allows for three different assumptions of the mechanisms of missingness

    Can quantify the variability and missing information due to each type of missing value


    Example 1

    Example of missing data due to dropout, intermittent missingness, and a missing covariate

    Figure: missingness pattern for Y1 to Y5, shown first with missing values marked ? and then with each missing value labeled by type (A, B, or C)


    Example 2

    Example with missing values due to item nonresponse, unit nonresponse, and latent class

    Figure: missingness pattern for Y1 to Y5, shown first with missing values marked ? and then with each missing value labeled by type (A, B, or C)


    Process

    Same as standard and two stage MI but with three stages in the imputation step and different combining rules

    1 Impute L values of $Y^{A}_{mis}$
    2 Conditioned on $Y^{A}_{mis}$, impute M values of $Y^{B}_{mis}$
    3 Conditioned on $Y^{A}_{mis}$ and $Y^{B}_{mis}$, impute N values of $Y^{C}_{mis}$

    Yields a total of LMN completed data sets

    A second, but equivalent, method draws simultaneously from the joint distribution of $Y^{A}_{mis}$, $Y^{B}_{mis}$, and $Y^{C}_{mis}$
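The nesting of the three imputation stages can be sketched as three loops. Everything named here is hypothetical: `three_stage_imputations` and the callables `draw_a`, `draw_b`, and `draw_c` stand in for whatever imputation models are used at each stage, and the Gaussian draws are toy placeholders rather than real posterior predictive distributions.

```python
import random


def three_stage_imputations(draw_a, draw_b, draw_c, L, M, N):
    """Return a dict keyed by (i, j, k) with the L*M*N imputed triples."""
    completed = {}
    for i in range(L):
        y_a = draw_a()                     # stage 1: type A missing values
        for j in range(M):
            y_b = draw_b(y_a)              # stage 2: type B given A
            for k in range(N):
                y_c = draw_c(y_a, y_b)     # stage 3: type C given A and B
                completed[(i, j, k)] = (y_a, y_b, y_c)
    return completed


rng = random.Random(1)
# Toy stand-ins for the three conditional imputation models.
draws = three_stage_imputations(
    draw_a=lambda: rng.gauss(0.0, 1.0),
    draw_b=lambda a: rng.gauss(a, 1.0),
    draw_c=lambda a, b: rng.gauss(a + b, 1.0),
    L=3, M=2, N=4,
)
print(len(draws))  # 3 * 2 * 4 completed data sets
```

Every completed data set inside the same first-stage nest shares its type A imputation, and every one inside the same second-stage nest also shares its type B imputation. That shared structure is what the variance components B, W1, and W2 in the combining rules pick apart.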


    Three Stage MI Combining Rules

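    \[ \bar{Q} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \hat{Q}^{(l,m,n)} \]

    \[ \bar{U} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} U^{(l,m,n)} \]

    \[ B = \frac{1}{L-1} \sum_{l=1}^{L} \left( \bar{Q}_{l..} - \bar{Q}_{...} \right)^2 \]

    \[ W_1 = \frac{1}{L(M-1)} \sum_{l=1}^{L} \sum_{m=1}^{M} \left( \bar{Q}_{lm.} - \bar{Q}_{l..} \right)^2 \]

    \[ W_2 = \frac{1}{LM(N-1)} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \left( \hat{Q}^{(l,m,n)} - \bar{Q}_{lm.} \right)^2 \]

    where $\bar{Q}_{lm.}$ and $\bar{Q}_{l..}$ are averages of $\hat{Q}^{(l,m,n)}$ over the dotted indices and $\bar{Q}_{...} = \bar{Q}$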


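    \[ T = \bar{U} + (1 + L^{-1}) B + (1 - M^{-1}) W_1 + (1 - N^{-1}) W_2 \]

    \[ \nu^{-1} = \frac{1}{L-1} \left[ \frac{(1 + 1/L) B}{T} \right]^2 + \frac{1}{L(M-1)} \left[ \frac{(1 - 1/M) W_1}{T} \right]^2 + \frac{1}{LM(N-1)} \left[ \frac{(1 - 1/N) W_2}{T} \right]^2 \]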
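A sketch of the three stage combining rules as stated on the slides. The function name `pool_three_stage_mi` and the 2 x 2 x 2 toy input are assumptions; `q_hats[i][j][k]` is the estimate from the completed data set with first-, second-, and third-stage indices i, j, and k.

```python
def pool_three_stage_mi(q_hats, u_hats):
    """Combine L x M x N nested estimates (q_hats[i][j][k]) and variances."""
    L, M, N = len(q_hats), len(q_hats[0]), len(q_hats[0][0])
    if min(L, M, N) < 2:
        raise ValueError("need at least two imputations at every stage")
    q_lm = [[sum(cell) / N for cell in block] for block in q_hats]
    q_l = [sum(row) / M for row in q_lm]
    q_bar = sum(q_l) / L
    u_bar = sum(u for block in u_hats
                for cell in block for u in cell) / (L * M * N)
    # Variance components: between stage 1 nests, between stage 2 nests
    # within a stage 1 nest, and between stage 3 draws within a stage 2 nest.
    b = sum((ql - q_bar) ** 2 for ql in q_l) / (L - 1)
    w1 = sum((q_lm[i][j] - q_l[i]) ** 2
             for i in range(L) for j in range(M)) / (L * (M - 1))
    w2 = sum((q_hats[i][j][k] - q_lm[i][j]) ** 2
             for i in range(L) for j in range(M)
             for k in range(N)) / (L * M * (N - 1))
    t = u_bar + (1 + 1 / L) * b + (1 - 1 / M) * w1 + (1 - 1 / N) * w2
    inv_nu = (((1 + 1 / L) * b / t) ** 2 / (L - 1)
              + ((1 - 1 / M) * w1 / t) ** 2 / (L * (M - 1))
              + ((1 - 1 / N) * w2 / t) ** 2 / (L * M * (N - 1)))
    nu = 1 / inv_nu if inv_nu > 0 else float("inf")
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b,
            "w1": w1, "w2": w2, "t": t, "nu": nu}


# Toy input: L = M = N = 2 with unit complete-data variances.
q = [[[1.0, 3.0], [5.0, 7.0]], [[9.0, 11.0], [13.0, 15.0]]]
u = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
pooled3 = pool_three_stage_mi(q, u)
print(pooled3)
```

For this input B = 32, W1 = 8, and W2 = 2, giving T = 1 + 1.5(32) + 0.5(8) + 0.5(2) = 54.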


    Ignorability

    Extension of Rubin's theory of MAR and ignorability as presented in Rubin (1976)

    Harel & Schafer (2009) present an extension to two types of missing values

    Conditional ignorability; possible to define weaker conditions under which M+ can be ignored in one or more stages


    Rates of Missing Information

    Helps with determination of number of imputations required at each stage

        Small numbers of imputations are required when the main concern is relative efficiency of point estimates
        Estimates for rates of missing information can be noisy for small numbers of imputations

    Derivation of the asymptotic distribution of rates of missing information

        I will derive the estimates and asymptotic distribution for the rates of missing information for three types of missing values
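The point about relative efficiency can be made concrete with the standard one-stage result from Rubin (1987): with rate of missing information gamma, an estimate based on m imputations has efficiency of roughly 1 / (1 + gamma / m) relative to one based on infinitely many. The sketch below covers only that standard case; the function name is hypothetical, and the three stage analogue is what the slides propose to derive, so it is not shown.

```python
def relative_efficiency(rate_missing, m):
    """Approximate efficiency of m imputations versus infinitely many."""
    if not 0.0 <= rate_missing <= 1.0:
        raise ValueError("rate of missing information must lie in [0, 1]")
    if m < 1:
        raise ValueError("need at least one imputation")
    return 1.0 / (1.0 + rate_missing / m)


# Efficiency for a 30% rate of missing information.
table = {m: relative_efficiency(0.3, m) for m in (3, 5, 10, 20)}
for m, eff in table.items():
    print(m, round(eff, 3))
```

With a 30% rate of missing information, five imputations already give about 94% efficiency, which is why small numbers of imputations suffice for point estimates even though the estimated rates themselves remain noisy.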


    Application

    Cognitive functioning data

    Three types of missing values will be dropout due to dementia, dropout due to death unrelated to dementia, and an intermittently missing covariate

    Large amounts of missing data are common in studies of cognitive functioning (Coley et al., 2011)


    Conclusion

    Applicable in analysis of many types of data sets

    Allows researchers to quantify amount of variance attributable to each type of missing value

    Informative in analysis of data and planning of future studies


    References

    Belin, T. (2009). Missing data: what a little can do and what researchers can do in response. American Journal of Ophthalmology 148, 820-822.

    Coley, N. et al. (2011). How should we deal with missing data in clinical trials involving Alzheimer's disease patients? Current Alzheimer Research 8, 421-433.

    Harel, O. (2009). Strategies for Data Analysis with Two Types of Missing Values: From Theory to Application. Saarbrücken, Germany: Lambert Academic Publishing.

    Harel, O. & Schafer, J. L. (2009). Partial and latent ignorability in missing-data problems. Biometrika 96, 37-50.

    Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581-592.

    Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Hoboken, New Jersey: John Wiley & Sons, Ltd, 1st ed.

    Shen, Z. (2000). Nested Multiple Imputation. Ph.D. thesis, Department of Statistics, Harvard University, Cambridge, MA.
