1 Practical Approaches for Dealing with Missing Data in Longitudinal Analyses of Adolescent...

1

Practical Approaches for Dealing with Missing Data in Longitudinal Analyses of Adolescent Addiction Programs

Michael Dennis, Ph.D., Chestnut Health Systems, Bloomington, IL

Presentation at the Advisory Committee Meeting for “Economic Evaluation Methods: Development and Applications (R01 DA018645)”, Coconut Grove, FL, November 10-11, 2006. Preparation of this manuscript was supported by funding from the Center for Substance Abuse Treatment (CSAT Contract no. 270-2003-00006). The content of this presentation reflects the opinions of the author and does not reflect the views or policies of the government. Available online at www.chestnut.org/LI/Posters or by contacting Joan Unsicker at 720 West Chestnut, Bloomington, IL 61701, phone: (309) 827-6026, fax: (309) 829-4661, e-mail: [email protected]

2

This presentation provides...

• A quick review of the problems of missingness and methods of imputation, based on Schafer & Graham (2002)

• A summary of the practical approach Chestnut uses to deal with missing data

• A focus on the conceptual issues and actual effectiveness – not the math or computational formulas per se

3

Types of Missingness

• By design

• Logical skipouts

• Item missing

• Wave missing

• Unobserved latent constructs

4

Key Terms (From Rubin)

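• Missing Completely at Random (MCAR): Missingness has no relationship to the predictors or the dependent variable

• Missing at Random (MAR): Missingness can be predicted from observed variables but, given those, has no relationship to the missing values of the dependent variable

• Missing Not at Random (MNAR): Missingness is related to the unobserved values themselves (predictors and/or dependent variables), even after accounting for the observed data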
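The three mechanisms can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the data-generating model, the missingness probabilities, and the function name `simulate` are all hypothetical and do not come from the presentation. It reports the bias in the mean of an outcome `y` when cases with missing `y` are simply dropped, and again after adjusting for a fully observed predictor `x`.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=0):
    """Simulate one missingness mechanism on hypothetical data.

    Returns (complete_case_bias, stratified_bias) for the mean of y.
    The stratified estimate averages observed y within strata of the
    fully observed predictor x (x > 0 versus x <= 0).
    """
    rng = random.Random(seed)
    p_by_mechanism = {
        "MCAR": lambda x, y: 0.3,                    # unrelated to x or y
        "MAR": lambda x, y: 0.5 if x > 0 else 0.1,   # depends only on observed x
        "MNAR": lambda x, y: 0.5 if y > 0 else 0.1,  # depends on the missing y itself
    }
    p_missing = p_by_mechanism[mechanism]

    all_y = []
    strata = {True: {"n": 0, "obs": []}, False: {"n": 0, "obs": []}}
    for _ in range(n):
        x = rng.gauss(0, 1)          # predictor, always observed
        y = x + rng.gauss(0, 1)      # outcome, sometimes missing
        all_y.append(y)
        stratum = strata[x > 0]
        stratum["n"] += 1
        if rng.random() >= p_missing(x, y):
            stratum["obs"].append(y)

    true_mean = statistics.fmean(all_y)
    observed = strata[True]["obs"] + strata[False]["obs"]
    complete_case = statistics.fmean(observed)
    stratified = sum(
        s["n"] / n * statistics.fmean(s["obs"]) for s in strata.values()
    )
    return complete_case - true_mean, stratified - true_mean


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        cc_bias, adj_bias = simulate(mechanism)
        print(f"{mechanism}: complete-case {cc_bias:+.3f}, adjusted {adj_bias:+.3f}")
```

In this toy setup the complete-case mean is close to unbiased only under MCAR. Under MAR the bias largely disappears once the observed predictor is used. Under MNAR it persists after adjustment.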

5

The Problem With Listwise Deletion (default)

Source: Schafer & Graham (2002)

Estimates are increasingly biased as we move away from MCAR

Smaller SD inflates significance tests

Unstable

Changes correlations & Relationships

Loss of sample is also problematic for multivariate analyses

6

Pair-wise

• Pair-wise is particularly efficient and unbiased under the assumption of MCAR

• Becomes rapidly unstable even under MAR

• Often narrows covariance or variance estimates and distorts relationships in regression or structural equation models (SEM)

7

Problems with other common methods of replacement

Source: Schafer & Graham (2002)

• Mean substitution: narrows variance

• Regression estimate: still narrows variance

• Hot deck: better, but still biased

• Only models using real variance are relatively unbiased
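The variance-narrowing effect of mean substitution is easy to see on a toy example. The scores below are hypothetical and the helper name `mean_substitute` is invented for illustration. The sketch shows only that filling gaps with the observed mean leaves the mean unchanged while shrinking the standard deviation.

```python
import statistics


def mean_substitute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical scores with three missing values.
scores = [2, 4, None, 8, None, 10, 12, None, 6, 14]

observed = [v for v in scores if v is not None]
filled = mean_substitute(scores)

sd_observed = statistics.stdev(observed)  # SD of the 7 real values
sd_filled = statistics.stdev(filled)      # SD after adding 3 copies of the mean

print(f"mean: {statistics.fmean(observed):.2f} -> {statistics.fmean(filled):.2f}")
print(f"SD:   {sd_observed:.2f} -> {sd_filled:.2f}")
```

Every imputed value sits exactly at the mean, so the sum of squared deviations stays the same while the sample size grows. The SD therefore shrinks, and significance tests look better than they should.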

8

Examples of Predictive

• Weighted hot-deck: sort people based on related variables, then randomly replace

• Maximum Likelihood (ML): predict from all other available data.

• Restricted Maximum Likelihood (RML): predict from all other available data within the same condition (site, time, etc) to preserve differences

• Multiple imputation: Average over several imputations – a form of bootstrapping that does not assume a normal distribution

9

Problem with these methods…

• Complicated on many variables and/or for multiple analyses

• All methods have unknown biases under MNAR unless there is a known a priori basis for modeling missingness (e.g., a common factor)

• In longitudinal analysis, this includes knowing the expected trajectory over time.

10

Chestnut Strategy 1: Minimize it

• Train, monitor, and do quality assurance to get staff to minimize missing data

• Use simple logical skips to minimize not applicable questions and burden

• Differentiate between refusals (rare), don’t knows (more common) and skip outs (common) – track and do problem solving if refusals start occurring on specific items (which is MNAR)

• Put more effort into follow-up

11

Follow-up Rates are PRIMARILY related to effort

Source: Scott (2004)

12

Accepting a lower follow-up rate “biases” results

Source: Scott (2004)

• The easiest to find people are different on the outcome – which is MNAR

• The differences are as large as or larger than the treatment effects we are looking for

13

Strategy 2: Make Logical Edits

1. Design questionnaire so that there are clear simple logical edits with implied value

2. Test logic of edits (not all of them work, e.g., M1)

3. Replace logical skip outs with implied value

4. Test logic of complex edits to create summary measures (not all of them work, e.g., NHSDA)

5. Make complex edits

14

Strategy 3: Replace missing data within known factors

• Recall that this was one of the few ways to deal with MNAR

• Known common factors should have a Cronbach’s alpha of at least .7

• Evaluate the amount and type of missing data:

- missing by design (e.g., adding an item in a new version) is MCAR

- systematic refusal is MNAR

• Calculate the scale as the mean of valid items × the expected number of items (require at least 3 valid items)

• Generally do above within subscale, then sum up to higher order scales
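The prorating rule above can be sketched in a few lines. The function names and the use of `None` for a missing item are assumptions made for illustration. The minimum of 3 valid items comes from the slide. Returning no score when any subscale cannot be computed is one possible choice, not necessarily Chestnut's.

```python
def prorated_scale(items, min_valid=3):
    """Mean of valid items times the expected number of items.

    `items` holds one respondent's item scores, with None for missing.
    Returns None when fewer than `min_valid` items were answered.
    """
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid) * len(items)


def higher_order_scale(subscale_item_sets, min_valid=3):
    """Prorate within each subscale, then sum the subscale scores.

    Returns None if any subscale cannot be scored (an assumption here).
    """
    scores = [prorated_scale(items, min_valid) for items in subscale_item_sets]
    if any(score is None for score in scores):
        return None
    return sum(scores)


# Four of five items answered: mean 0.75 times 5 expected items = 3.75
print(prorated_scale([1, 0, None, 1, 1]))
# Only two valid items, so no score is produced
print(prorated_scale([1, None, None, 0, None]))
```

Prorating within a subscale before summing keeps a gap in one subscale from being filled with information from an unrelated one.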

15

[Figure: Rasch person-item map, with persons on the left (each '#' represents 24 people) and items on the right, on a scale running from about -4 to 2. Items from rarest to most common: HlthProbs; Withdrawal/ill; ProbW/Law; Unsafe, GiveUpActs, DespiteMedPsyProbs; DepressedNervous, NeededMoreAOD, UnableCutDown; ResponNotMet, LargerAmnt/more; HideWhenUseAOD, Fights/trouble; SpentTimeGetting; ParentComplained; WeeklyAOD]

Example: GAIN Substance Problems Scale (SPS)

Rasch Model Demonstrating That the Severity of Items Is NOT Equal

Source: Riley et al (in press)

16

Use of Rasch Measurement Model / Computer Adaptive Tests (CAT) models

GAIN Substance Problem Scale (SPS)

Construct validation: Comparing alternative measures to “expected” correlates

Measure             Withdrawal  Frequency  Emotional  Recovery     Health
                    Symptoms    of Use     Problems   Environment  Problems
Symptom Count (16)  0.53        0.38       0.36       0.37         0.19
Full Rasch (16)     0.54        0.43       0.41       0.39         0.22
CAT (5-11 items)    0.57        0.45       0.44       0.40         0.23

• Weighting items with Rasch does a little better

• CAT can closely approximate with a fraction of items

Source: Riley et al (in press)

17

Strategy 4: Replace structural missing data (e.g., by site)

• Where data is missing structurally by design (i.e., MCAR), use regression to impute value based on correlated factors in other sites (seeking formula with 70% or more of variance explained).

• Simple regression if small percent of data (under 5%)

• As the amount of missing data goes up to 15%, it is worth considering the use of ML or MI

• Above 15% missing, all methods are questionable

• At this point we usually have less than 1% missing within wave, but 5-20% or more by wave
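The regression step can be sketched as follows, assuming a single correlated predictor that is available both in the sites that collected the variable (donor sites) and in the site where it is structurally missing. The function names, the single-predictor simplification, and the example numbers are hypothetical. The 70% variance-explained threshold comes from the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def impute_by_regression(donor_pairs, missing_xs, min_r2=0.70):
    """Fit on (predictor, target) pairs from donor sites, then predict the
    target for cases where only the predictor was collected.

    Raises ValueError when the fit explains less than `min_r2` of the variance.
    """
    xs, ys = zip(*donor_pairs)
    slope, intercept, r_squared = fit_line(xs, ys)
    if r_squared < min_r2:
        raise ValueError(
            f"R^2 = {r_squared:.2f} is below {min_r2:.2f}; do not impute"
        )
    return [intercept + slope * x for x in missing_xs]


# Hypothetical donor-site data: (correlated measure, target measure)
donors = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
print(impute_by_regression(donors, [6]))
```

Predicted values fall exactly on the regression line, so this kind of imputation still narrows variance. That is one reason it is best kept to small amounts of missing data.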

18

Strategy 5: Replacement within wave

• Identify remaining items with more than 1-2% missing and the feasibility of replacing via regression (or ML/MI)

• For the rest, sort data on key dimensions of variation and do modified weighted hot deck on the 2-3 people above or below

- we typically sort on a total symptom count and the baseline dependent variable within count, condition & site

- Can replace with mean, median or random choice – we have found that the median was more stable because of the skewed nature of several distributions and use it by default
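One way to sketch the sorted hot deck described above is shown below. The field names, the record layout (a list of dicts with `None` for missing), and the function name are hypothetical. Records are grouped by strata such as condition and site, sorted within each group, and each missing value is replaced by the median of up to `k` observed donors above and below it in the sort order.

```python
import statistics


def hot_deck_median(records, target, sort_keys, strata_keys, k=2):
    """Return a copy of `records` with missing `target` values filled.

    Donors are the nearest `k` observed values above and below the case in
    the sort order, taken only from the same stratum. Donor values always
    come from the original data, never from earlier imputations.
    """
    filled = [dict(r) for r in records]
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(tuple(rec[s] for s in strata_keys), []).append(i)

    for idxs in groups.values():
        idxs.sort(key=lambda i: tuple(records[i][s] for s in sort_keys))
        for pos, i in enumerate(idxs):
            if records[i][target] is not None:
                continue
            above = [records[j][target] for j in reversed(idxs[:pos])
                     if records[j][target] is not None][:k]
            below = [records[j][target] for j in idxs[pos + 1:]
                     if records[j][target] is not None][:k]
            donors = above + below
            if donors:  # leave the value missing if the stratum has no donors
                filled[i][target] = statistics.median(donors)
    return filled


# Hypothetical records; "days_used" is missing for the third person.
people = [
    {"site": "A", "condition": "ctl", "symptoms": 1, "days_used": 2},
    {"site": "A", "condition": "ctl", "symptoms": 2, "days_used": 4},
    {"site": "A", "condition": "ctl", "symptoms": 3, "days_used": None},
    {"site": "A", "condition": "ctl", "symptoms": 4, "days_used": 10},
    {"site": "A", "condition": "ctl", "symptoms": 5, "days_used": 30},
    {"site": "A", "condition": "exp", "symptoms": 3, "days_used": 0},
]
result = hot_deck_median(people, "days_used", ["symptoms"], ["site", "condition"])
print(result[2]["days_used"])
```

The median is used because a single extreme neighbor, such as the 30 in the toy data, would pull a mean-based replacement upward. The experimental-condition case is never used as a donor for the control case.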

19

Understanding Multidimensional Nature can be used to Create Additional Strata for Replacement

[Figure: risk groups plotted along four risk dimensions (female sex risk, needle risk, crack risk, male sex risk), identifying high risk needle sharers, male sex buyers, and female sex traders]

Source: Dennis et al (2001)

20

Important to block on Condition in Experiments or Quasi-Experiments

[Figure: two panels, each with an axis from 0% to 100%, comparing the Control and Experimental conditions]

Unrestricted replacement would average out the real variance due to the experimental condition

21

Strategy 6: Replacement Across Waves

• Create a summary measure based on the average across waves times the expected number of waves to get a total (e.g., total days of abstinence)

- Works best when most people have only 1-2 of several (e.g., 4-8) waves missing

- The above can become biased if missing data by wave is high or systematic

• Can regress from first/last or all available to fill in

• Need to know the expected trajectory
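The across-wave summary can be sketched the same way as the item-level prorating. The function name and the cutoff of at most 2 missing waves are assumptions; the cutoff is loosely based on the "1-2 waves" guidance above and is not a rule stated in the presentation.

```python
def prorated_total(wave_values, max_missing=2):
    """Average across observed waves times the expected number of waves.

    `wave_values` holds one person's value per wave, with None for a
    missed wave. Returns None when more than `max_missing` waves are missing.
    """
    observed = [v for v in wave_values if v is not None]
    n_missing = len(wave_values) - len(observed)
    if not observed or n_missing > max_missing:
        return None
    return sum(observed) / len(observed) * len(wave_values)


# Hypothetical days of abstinence per 90-day wave; wave 3 was missed.
# The observed mean of 80 times 4 expected waves gives 320.
print(prorated_total([80, 70, None, 90]))
# Three of four waves missing, so no total is produced
print(prorated_total([None, None, None, 90]))
```

This treats the missed wave as if it looked like the person's average wave. That is reasonable only when the expected trajectory is roughly flat.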

22

Special Case of A Curvilinear Trajectory

[Figure: line chart of the Actual trajectory; vertical axis 0 to 35, horizontal axis Intake, 3, 6, 9, 12]

Source: Godley et al (2004)

23


[Figure: line chart comparing Mean Replacement with Actual; vertical axis 0 to 35, horizontal axis Intake, 3, 6, 9, 12]

Very Biased


24


[Figure: line chart comparing Mean Replacement, Avg of Neighbors, and Actual; vertical axis 0 to 35, horizontal axis Intake, 3, 6, 9, 12]

Much less biased
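The contrast between the two replacement rules can be sketched on a made-up trajectory. The values below are hypothetical and are not taken from Godley et al. (2004); the function names are also invented for illustration. The sketch compares filling a missed wave with the person's own mean against filling it with the average of the nearest observed neighboring waves.

```python
def fill_person_mean(waves):
    """Replace each missing wave with the mean of the person's observed waves."""
    observed = [v for v in waves if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in waves]


def fill_neighbor_average(waves):
    """Replace each missing wave with the average of the nearest observed
    wave before it and the nearest observed wave after it.

    If only one side has an observed wave (first or last wave missing),
    that single value is carried over.
    """
    filled = list(waves)
    for i, value in enumerate(waves):
        if value is not None:
            continue
        before = next((v for v in reversed(waves[:i]) if v is not None), None)
        after = next((v for v in waves[i + 1:] if v is not None), None)
        neighbors = [v for v in (before, after) if v is not None]
        if neighbors:
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


# Hypothetical curvilinear trajectory (Intake, 3, 6, 9, 12): a sharp drop
# after intake followed by a slow rise. Suppose the true wave-6 value is 12.
waves = [30, 10, None, 15, 17]
print(fill_person_mean(waves)[2])       # 18.0, pulled up by the intake value
print(fill_neighbor_average(waves)[2])  # 12.5, close to the assumed true 12
```

Neighbor averaging is not automatically safe. If the missed wave falls at the turning point of the curve or at either end of the series, it can also be badly off, which is why the expected trajectory needs to be known.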


25

Strategy 7: Use of Maximum Likelihood (ML)

• Where possible, use ML or Restricted ML (RML) as part of software applications like AMOS, Stata etc.

• Need to evaluate how much data it is replacing

• Need to be confident that it is MAR (not MNAR) by virtue of a small n missing, knowledge of the reason, or other analyses

• Restricted ML (RML) preferred to control for site, condition, and/or subject differences.

Alternative: We have not used them yet, but have been thinking about exploring some of the new methods of multiple imputation

26

References

• Dennis, M. L., Wechsberg, W. M., McDermeit (Ives), M., Campbell, R. S., & Rasch, R.R. (2001). The correlates and predictive validity of HIV risk groups among drug users in a community-based sample: Methodological findings from a multi-site cluster analysis. Evaluation and Program Planning, 24, 187-206.

• Godley, S. H., Dennis, M. L., Godley, M. D., & Funk, R. R. (2004). Thirty-month relapse trajectory cluster groups among adolescents discharged from outpatient treatment. Addiction, 99, 129-139.

• Riley, B. B., Conrad, K. J., Bezruczko, N., & Dennis, M. (in press). Relative precision, efficiency and construct validity of different starting and stopping rules for a Computerized Adaptive Test: The GAIN Substance Problem Scale. Journal of Applied Measurement.

• Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

• Scott, C. K. (2004). A replicable model for achieving over 90% follow-up rates in longitudinal studies of substance abusers. Drug and Alcohol Dependence, 74, 21-36.
