Analysis of multiple informant/multiple source data in Stata

Nicholas J. Horton

Department of Mathematics

Smith College, Northampton MA

Garrett M. Fitzmaurice

Harvard University

nhorton at email.smith.edu

http://www.biostat.harvard.edu/multinform

Acknowledgements

• Joint research project with Nan Laird and colleagues, Harvard School of Public Health

• Jane Murphy and the Stirling County Study for use of their example dataset (see Horton et al, AJE, 2001 for more details)

• Supported by NIH grant R01-MH54693

Outline

• Motivation for multiple source data

• Examples of multiple sources/informants

• Models for correlated multiple source data

• Accounting for complex survey design

• Accounting for incomplete/missing data

• Example (Stirling County Study)

• Conclusions

Why multiple source data?

• to provide better measures of some underlying construct that is difficult to measure or likely to be missing

• also known as multiple informant reports, proxy reports, co-informants, etc.

• discordance is expected, otherwise there is no need to collect multiple reports

• Statistical framework developed in Horton and Fitzmaurice (SIM tutorial, 2004)

Definition of multiple source data

• data obtained from multiple informants or raters (e.g., self-reports, family members, health care providers, teachers)

• or via different/parallel instruments or methods (e.g., symptom rating scales, standardized diagnostic interviews, or clinical diagnoses)

• None of the reports is a “gold” standard

• We consider multiple source data that are commensurate (multiple measures of the same underlying variable on a similar scale)

Examples of multiple source data

• child psychopathology (ask parents, teachers and children about underlying psychological state)

• service utilization studies (collect information from subjects and databases)

• medical comorbidity (query providers and charts to assess medical problems)

Examples of multiple source data (cont.)

• adherence studies (collect self-report of adherence, electronic pill caps [MEMS] plus pharmacy records)

• nutritional epidemiology (utilize multiple dietary instruments such as food frequency questionnaires, 24-hour recalls, food diaries)

Incomplete/missing reports

• Multiple source reports are commonly incomplete since, by definition, they are collected from sources other than the primary subject of the study

• This missingness may be by design or happenstance (or both!)

Example: missing source reports

• Consider service utilization studies that collect information from subjects and databases

• Subjects may be lost to follow-up (or only contacted periodically)

• Databases may be incomplete (lack of consent, lack of appropriate coverage)

Analytic approach

• Multiple sources can provide information on outcomes or predictors (risk factors)

• Multiple source outcome: what is the prevalence of child psychopathology? (measured using parallel parent and teacher reports)

• Fitzmaurice et al (AJE, 1995), Horton et al (HSOR, 2002), Horton and Fitzmaurice (SIM tutorial, 2004)

Analytic approach (cont.)

• Multiple source predictor: what are the odds of developing depression in adulthood, conditional on parallel reports of anxiety (collected from a child and a parent)?

• Examples: Horton et al (AJE, 2001), Lash et al (AJE, 2003), Liddicoat et al (JGIM, 2004), Horton and Fitzmaurice (SIM tutorial, 2004)

• We will focus on an example using multiple source predictors

Notation

• Let Y denote a univariate outcome for a given subject

• Let X_l denote the l’th multiple source predictor (l = 1, ..., L)

• Let Z denote a vector of other covariates for the subject

• To simplify exposition, we consider two sources with dichotomous reports (L=2)

Questions to consider

• Are the sources reporting on the same underlying construct (are they commensurate or interchangeable?)

• Is it possible to combine the reports in some fashion?

• How to handle missing reports?

Analytic approaches

• Reviewed in Horton, Laird and Zahner (IJMPR, 1999)

• Use only one source

• Fit separate models

Analytic approaches (cont.)

• Combine (pool) the reports in some fashion

• Include both reports in the model

Analytic approaches (cont.)

• We considered simultaneous estimation of the marginal models:

• Non-standard application of GEE

• Method independently suggested by Pepe et al (SIM, 1999)
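The model display for this slide did not survive in the transcript. As a hedged reconstruction only, assuming two source reports written here as $X_1$ and $X_2$, a link function $g$, and source-specific coefficients (all symbols other than $Y$ and $Z$ are chosen for illustration), the two marginal models estimated simultaneously might take the form:

```latex
\begin{align*}
g\{E(Y \mid X_1, Z)\} &= \beta_{01} + \beta_{11} X_1 + \beta_{21}^{\top} Z \\
g\{E(Y \mid X_2, Z)\} &= \beta_{02} + \beta_{12} X_2 + \beta_{22}^{\top} Z
\end{align*}
```

Under this sketch, a test for source differences corresponds to hypotheses such as $\beta_{11} = \beta_{12}$, and a pooled model imposes equality of the coefficients across the two sources.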

Advantages of new approach

• can be used to test for source differences in association with the outcome

• can test if the effects of other risk factors on the outcome differ by source

Advantages of new approach (cont.)

• can allow different source effects where necessary

• a pooled model can be fit if no significant source effects (potentially more efficient)

• can be fit using general purpose statistical software (Stata and others)

Accounting for survey design

• Many health services or epidemiologic studies arise from complex survey samples

• Need to address stratification, multi-stage clustering and unequal sampling weights

• Failing to properly account for survey design may lead to bias and incorrect estimation of variability

Accounting for survey design (cont.)

• Estimation proceeds using the approximate (quasi) log-likelihood (weighted version of the usual score equations for a GLM, accounting for the multi-stage clustering, including multiple source reports)

• Can be fit using general purpose statistical software (elegant and powerful implementation in Stata)

Accounting for incomplete source reports

• Missing source reports are missing predictors

• Use weighted estimating equation methodology of Robins et al (JASA, 1994) and Xie and Paik (Biometrics, 1997), applied by Horton et al, (AJE, 2001)

• Adds an additional “missingness weight”

• Complications to variance estimation
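One way a missingness weight of this kind could be constructed is sketched below. This is an illustration under stated assumptions, not the authors' code: the variable names physrep, observed, pobs, mweight and wt are hypothetical, and the choice of covariates in the missingness model is arbitrary. Only weight is a name that appears in the slides.

```stata
* Sketch only: physrep is a hypothetical name for the physician report,
* which is assumed to be missing for some subjects.
generate byte observed = !missing(physrep)

* Model the probability that the report is observed,
* using covariates assumed to be fully observed
logit observed female ageind1 ageind2

* Predicted probability of being observed; its inverse is the missingness weight
predict double pobs, pr
generate double mweight = 1/pobs if observed == 1

* Combine with the sampling weight for use as the pweight in svyset
generate double wt = weight * mweight
```

The sketch treats the estimated weights as if they were known. Standard errors from a fit that uses wt would therefore still need the kind of adjustment referred to in the last bullet above.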

Example: Stirling County Study

• Outcome: time to event (death) over a 16-year follow-up period (1952-1968) (n=1079)

• multiple source predictors: partially observed dichotomous physician report or self report of psychiatric disorder (dpax)

• other predictors: age (3 categories), gender

• statistical model: piecewise exponential survival with 4 intervals each of 4 years duration (subjects contribute time at risk in each interval)
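For reference, a piecewise exponential model can be fit as a Poisson regression with the log of time at risk as an offset. The generic form below is a sketch. The symbols $d_{ik}$, $t_{ik}$, $\alpha_k$ and $x_i$ are notation introduced here, not taken from the slides:

```latex
\begin{equation*}
\log E(d_{ik}) = \log t_{ik} + \alpha_k + \beta^{\top} x_i , \qquad k = 1, \dots, 4
\end{equation*}
```

Here $d_{ik}$ indicates whether subject $i$ died in interval $k$, $t_{ik}$ is the time at risk contributed in that interval, and $\alpha_k$ is the log baseline rate for the interval. The implied mortality rate, $\exp(\alpha_k + \beta^{\top} x_i)$, is constant within each interval, so exponentiated coefficients are read as mortality rate ratios.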

Stirling County survey design

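[Figure: survey design hierarchy. Strata (Stratum 1, ..., Stratum k, ..., Stratum K) contain PSUs (PSU 1, ..., PSU j, ..., PSU J); each PSU has a self-report and a physician report.]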

Implementation in Stata

Specify primary sampling unit (id, one per subject), probability sampling weights (weight) and stratification variable (district):

svyset id [pweight=weight], strata(district)

Describe the sampling design:

svydes

Survey: Describing stage 1 sampling units

      pweight: weight
          VCE: linearized
     Strata 1: district
         SU 1: id
        FPC 1: <zero>

                                      #Obs per Unit
                              ----------------------------
Stratum    #Units     #Obs      min       mean      max
--------  --------  --------  --------  --------  --------
       1        93       654         2       7.0         8
       2        37       284         4       7.7         8
       3        51       346         2       6.8         8
       4       202      1488         2       7.4         8
       5       291      2104         2       7.2         8
       6       128       946         2       7.4         8
       7        50       374         4       7.5         8
       8        98       706         2       7.2         8
       9       129       968         2       7.5         8
--------  --------  --------  --------  --------  --------
       9      1079      7870         2       7.3         8

Implementation in Stata (cont.)

xi: svy: poisson event dpax int1 int2 int3 female ageind1 ageind2 diag i.diag*ageind1 i.diag*ageind2 i.dpax*female i.dpax*ageind1 i.dpax*ageind2 i.dpax*diag, exposure(atrisk)
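The command above presupposes a stacked file with one row per subject, per source, per 4-year interval. The slides do not show how that file was built; the following is a minimal sketch of one way to do it, assuming a hypothetical wide file stirling_wide with one row per subject and hypothetical variables time (years of follow-up), died, selfrep and physrep. The names event, dpax, diag, atrisk and int1 to int3 are taken from the command.

```stata
* Hypothetical wide file: one row per subject with id, time (0-16 years),
* died (0/1), selfrep and physrep (0/1, possibly missing), plus covariates,
* weight and district. File and variable names are assumptions.
use stirling_wide, clear

* Stack the two source reports: one row per subject per source
rename physrep diag0
rename selfrep diag1
reshape long diag, i(id) j(dpax)     // dpax = 1 marks the self-report row

* Split follow-up into four 4-year intervals
expand 4
bysort id dpax: generate byte interval = _n
generate double atrisk = min(time, 4*interval) - 4*(interval - 1)
drop if atrisk <= 0                  // no time at risk in this interval

* Event indicator: the death falls inside this interval
generate byte event = (died == 1) & (time <= 4*interval)

* Interval indicators (interval 4 is the reference category)
forvalues k = 1/3 {
    generate byte int`k' = (interval == `k')
}
```

With this layout a subject contributes at most 8 rows (2 sources by 4 intervals), which agrees with the maximum in the svydes output. Rows where diag is missing are excluded by the estimation command.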

Implementation in Stata (cont.)

Can then test for significant informant effects (any term with dpax [self-report] in the model):

test dpax=0
test _IdpaXfemal_1, accumulate
test _IdpaXagein_1, accumulate
test _IdpaXageina1, accumulate
test _IdpaXdiag_1, accumulate

Results (separate parameters)

• Initially fit model with separate parameters

• No evidence for source interactions

• Implies that the association between risk factors and mortality did not differ by source

• Dropped these terms from the model, yielding parsimonious shared parameter model with smaller standard errors

Implementation (shared parameter)

xi: svy: poisson event int1 int2 int3 female ageind1 ageind2 diag i.diag*ageind1 i.diag*ageind2, exposure(atrisk)

Results (shared parameter)

Survey: Poisson regression

Number of strata   =         9                  Number of obs      =      7420
Number of PSUs     =      1079                  Population size    = 64723.522
                                                Design df          =      1070
                                                F(   9,   1062)    =     21.94
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
             |             Linearized
       event |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        int1 |  -.9594993   .2058191    -4.66   0.000    -1.363354   -.5556444
        int2 |  -.5680445   .1936756    -2.93   0.003    -.9480716   -.1880174
        int3 |   -.360743   .2002561    -1.80   0.072    -.7536821    .0321962
      female |  -.1298938   .1493215    -0.87   0.385      -.42289    .1631024
     ageind1 |   2.484883   .2820244     8.81   0.000     1.931499    3.038266
     ageind2 |   3.530875   .2894511    12.20   0.000     2.962919    4.098831
        diag |    1.62166   .3256041     4.98   0.000      .982765    2.260555
_IdiaXage~_1 |  -1.351475    .379926    -3.56   0.000     -2.09696   -.6059908
_IdiaXage~a1 |  -1.313849   .4554167    -2.88   0.004     -2.20746   -.4202376
       _cons |  -5.577167   .2941931   -18.96   0.000    -6.154428   -4.999906
      atrisk | (exposure)
------------------------------------------------------------------------------

Results (shared parameters)

Parameter (log MRR)      Estimate (SE)
female                   -0.13 (0.15)
mid-age                   2.48 (0.28)
older-age                 3.53 (0.29)
diagnosis                 1.62 (0.33)
diagnosis*mid-age        -1.35 (0.38)
diagnosis*older-age      -1.31 (0.46)

Interpretation of results (annual mortality rate)

                 Age < 50    Age >= 70
Diagnosis=0      0.001       0.056
Diagnosis=1      0.007       0.093

Results (2 df test of interaction of age and diagnosis)

. test _IdiaXagein_1=0

Adjusted Wald test

( 1) [event]_IdiaXagein_1 = 0

F( 1, 1070) = 12.65 Prob > F = 0.0004

. test _IdiaXageina1, accumulate

Adjusted Wald test

 ( 1)  [event]_IdiaXagein_1 = 0
 ( 2)  [event]_IdiaXageina1 = 0

F( 2, 1069) = 6.67 Prob > F = 0.0013

Results (calculation of MRR and 95% CI)

. lincom diag, eform

 ( 1)  [event]diag = 0

------------------------------------------------------------------
event |   exp(b)   Std.Err.      t    P>|t|   [95% Conf. Interval]
------+-----------------------------------------------------------
  (1) |   5.0615    1.6480     4.98   0.000     2.6718     9.5884
------------------------------------------------------------------

. lincom diag + _IdiaXagein_1, eform

 ( 1)  [event]diag + [event]_IdiaXagein_1 = 0

------------------------------------------------------------------
event |   exp(b)   Std.Err.      t    P>|t|   [95% Conf. Interval]
------+-----------------------------------------------------------
  (1) |   1.3102    .25297     1.40   0.162     .89703     1.9137
------------------------------------------------------------------
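The exponentiated values reported by lincom can be checked by hand from the coefficient table. The arithmetic below uses only the reported estimates; the labels on the MRR symbols are introduced here for illustration:

```latex
\begin{align*}
\mathrm{MRR}_{\text{youngest}} &= \exp(1.62166) \approx 5.06,
  & \text{95\% CI: } (\exp(0.982765),\ \exp(2.260555)) &\approx (2.67,\ 9.59) \\
\mathrm{MRR}_{\text{mid-age}} &= \exp(1.62166 - 1.351475) = \exp(0.270185) \approx 1.31 \\
\mathrm{MRR}_{\text{older-age}} &= \exp(1.62166 - 1.313849) = \exp(0.307811) \approx 1.36
\end{align*}
```

The first two lines reproduce the lincom output. The third is the same arithmetic applied to the older age group, which is not shown on the slides and is given without a confidence interval because that requires the covariance of the two estimates.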

Conclusions

• new methods of analysis of multiple source data are available

• can be implemented using existing software

• methods allow the assessment of the relative association of each source

• each source yielded similar conclusions: association between psychiatric disorder and mortality is stronger for younger subjects

• unified model has less variability, pools information after testing for systematic differences

Conclusions (cont.)

• methods account for complex survey designs

• methods allow partially observed subjects to contribute, under missing at random (MAR) assumptions (Little and Rubin book)

• multiple source reports arise in many settings (not just for children anymore!)

Future work

• Maximum-likelihood estimation instead of GEE approach
  – May yield efficiency gains
  – Particularly useful for missing reports

• Non-commensurate reports
  – Different scales
  – Different underlying constructs
  – Consider latent variable models (e.g. work of Normand and colleagues)
  – See also gllamm and forthcoming Stata book by Rabe-Hesketh and Skrondal

Analysis of multiple informant/multiple source data in Stata

Nicholas Horton

Department of Mathematics

Smith College, Northampton MA

nhorton at email.smith.edu

http://www.biostat.harvard.edu/multinform