© Willett & Singer, Harvard University Graduate School of Education S077/Week #1– Slide 1
S077: Applied Longitudinal Data Analysis
Week #1: What Are The Topics Covered In Today’s Overview?
Topic (Slide)
I. Pre-Amble:
• A Little Past, A Little Present? (2-3)
• Two Kinds of Question About Time? (4)
II. Exploring Change Over Time:
• An Illustrative Data Example. (5)
• Introducing the Person-Period Dataset. (6)
• Questions about Change at Two Levels. (7)
• Level-1: Exploring Individual Change Over Time. (8)
• Level-2: Exploring Differences In Change Over People. (9-10)
III. Introducing The Multilevel Model For Change Over Time:
• Outlining the Process. (11)
• Level-1 Submodel For Individual Change. (12-13)
• Level-2 Submodel For Inter-individual Differences In Change. (14-16)
• Fitting The Multilevel Model For Change To Data. (17)
• Examining the Estimated Fixed Effects. (18-20)
• Examining the Estimated Variance Components. (21)
S077/Week #1– Slide 2
S077: Applied Longitudinal Data Analysis I: First Longitudinal Study of Event Occurrence?
John Graunt’s Notes on the Bills of Mortality (1662):
Analyzed mortality statistics in London, and concluded that:
• More female babies were born than male.
• Women lived longer than men.
Created the first Life Table:
• Summarized, out of every 100 babies born in London, how many survived until ages 6, 16, 26 years, etc.
Unfortunately, his table did not give a realistic representation of the true survival rates because … the figures for all the ages after 6 were guesses!
Age   Died   Survived
  0      -      100
  6     36       64
 16     24       40
 26     15       25
 36      9       16
 46      6       10
 56      4        6
 66      3        3
 76      2        1
 86      1        0
S077/Week #1– Slide 3
S077: Applied Longitudinal Data Analysis I: First Longitudinal Study of Individual Growth Over Time?
Count Philibert Gueneau de Montbeillard (1720-1785) recorded his son’s height every six months, from birth in 1759 through his 18th birthday …
[Figure annotation: Adolescent growth spurt]
S077/Week #1– Slide 4
S077: Applied Longitudinal Data Analysis I: What Kinds Of Research Questions Require Longitudinal Methods?
Questions About: Change in a Continuous Outcome Over Time
1. Within-Person Questions:
• Descriptive: How does an infant’s neurofunction change over time?
• Summary: What is each infant’s per-week rate of change in neurofunction?
2. Between-Person Question:
• Do babies that have been exposed to cocaine have lower weekly rates of development, on average?
Individual Growth Modeling/Multilevel Model for Change (ALDA, Chapters 1 thru 8)
Espy et al. (2000) studied infant neurofunction: 40 infants observed daily for 2 weeks; 20 had been exposed to cocaine, 20 had not. Infants exposed to cocaine had lower rates of change in neurodevelopment.

Questions About: Whether & When a Specified Event Occurs
1. Within-Person Question:
• Does each married couple eventually divorce or not?
2. Between-Person Questions:
• If so, when are couples at greatest risk of divorce?
• Do couples in which the wife is employed have a greater risk of divorce?
Discrete- and Continuous-Time Survival Analysis (ALDA, Chapters 9 thru 15)
South (2001) studied marriage duration: 3,523 couples followed for 23 years, until divorce or until the study ended. Couples in which the wife was employed tended to divorce earlier.
S077: Applied Longitudinal Data Analysis II: Illustrative Example – Change in a Continuous Outcome Over Time
Sample: 103 African American children, born in low-income families:
– 58 were randomly assigned to an early intervention program.
– 45 were randomly assigned to a control group that received “standard” child care.
Research Design:
– “Cognitive performance” of each child was assessed 12 times between ages 6 and 96 months.
– Here, we analyze three of the waves of data, collected at 12, 18 and 24 months.
Broad Research Question:
– Is the age-trajectory of cognitive performance different for children who participated in the early intervention program, versus those who did not?
It’s not even clear right now how to articulate what we mean by “different”!
Data source: Peg Burchinal and colleagues (2000), Child Development.
(ALDA, Section 3.1, pp. 46-49)
S077/Week #1– Slide 5
S077: Applied Longitudinal Data Analysis II: How Are Longitudinal Data Best Formatted and Stored?
To address questions like these, you need longitudinal data formatted in a Person-Period Dataset: one row of data for each occasion of observation for each person in the sample.
• Fully balanced: 3 waves per child.
• AGE = 1.0, 1.5, and 2.0 (converted from months to years, so that we estimate an annual rate of change).
• COG is a continuous outcome measured on a nationally-normed scale. It declines within the empirical growth records, so instead of asking whether the growth rate is higher among program participants, we’ll ask whether the rate of decline is lower.
• PROGRAM is a dummy variable indicating whether the child was randomly assigned to the special early childhood program (=1) or not (=0).
(ALDA, Section 3.1, pp. 46-49)
S077/Week #1– Slide 6
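As a rough illustration of the person-period layout described above, the following sketch reshapes wide records (one row per child) into one row per child per occasion. The IDs, scores, and column names such as COG_12 are hypothetical and are not taken from the study data.

```python
# Hypothetical wide-format records: one row per child, one column per wave.
# IDs, scores, and column names are illustrative only.
wide_rows = [
    {"ID": 1, "PROGRAM": 1, "COG_12": 117, "COG_18": 113, "COG_24": 109},
    {"ID": 2, "PROGRAM": 0, "COG_12": 108, "COG_18": 96, "COG_24": 89},
]

# Ages of the three waves, in months.
WAVE_MONTHS = (12, 18, 24)


def to_person_period(rows, wave_months=WAVE_MONTHS):
    """Return one record per child per measurement occasion."""
    long_rows = []
    for row in rows:
        for months in wave_months:
            long_rows.append({
                "ID": row["ID"],
                "AGE": months / 12,          # months converted to years
                "COG": row[f"COG_{months}"],
                "PROGRAM": row["PROGRAM"],   # time-invariant, repeated on every row
            })
    return long_rows


person_period = to_person_period(wide_rows)
for record in person_period:
    print(record)
```

Each child contributes three rows, and the time-invariant predictor PROGRAM is simply repeated on each of them.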
S077: Applied Longitudinal Data Analysis II: Illustrative Example -- Effects Of Early Intervention On Children’s IQ
The Broad Research Question Really Has Two Parts:
“Level-1” “Within-Child” Question About Individual Change Over Time:
e.g., How does each child’s cognitive performance change over time?
“Level-2” “Between-Child” Question About Inter-Individual Differences in Change:
e.g., How do the cognitive performance trajectories differ from child to child? In particular, do the trajectories of children in the Early Intervention Program differ substantially from those who are not in the program?
(ALDA, Section 3.1, pp. 46-49)
S077/Week #1– Slide 7
S077: Applied Longitudinal Data Analysis II: Exploring Individual Change With Empirical Growth Plots And Fitted OLS Trajectories
First, it makes sense to explore the observed individual growth for each child using simple empirical growth plots, and to summarize each child’s observed change using simple child-by-child OLS regression analyses.
[Figure: empirical growth plots with fitted OLS trajectories for eight children (ID 68, 70, 71, 72, 902, 904, 906, 908); each panel plots COG (50 to 150) against AGE (1 to 2).]
Overall impression: COG tends to decline over time, for each child, but there are interesting differences in the trajectories, and in the quality of fit, from child to child.
• Many OLS-estimated trajectories are smooth and systematic (70, 71, 72, 904, 908).
• Other OLS-fitted trajectories are scattered, irregular and perhaps even curvilinear (68, 902, 906).
(ALDA, Section 3.2, pp. 49-51)
S077/Week #1– Slide 8
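The child-by-child OLS summaries can be sketched in a few lines. This is a minimal illustration, assuming three waves per child and centering AGE at 1 so that the intercept is the fitted initial status. The two children and their scores are hypothetical.

```python
def ols_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, residual_variance), where the residual
    variance uses n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse / (n - 2)


# Hypothetical (AGE, COG) records for two children; not the study data.
records = {
    "A": [(1.0, 117), (1.5, 113), (2.0, 109)],
    "B": [(1.0, 108), (1.5, 96), (2.0, 89)],
}

fits = {}
for child_id, waves in records.items():
    time = [age - 1 for age, _ in waves]   # AGE - 1, so intercept = status at age 1
    cog = [score for _, score in waves]
    fits[child_id] = ols_line(time, cog)

for child_id, (intercept, slope, resid_var) in fits.items():
    print(child_id, round(intercept, 2), round(slope, 2), round(resid_var, 2))
```

With only three waves per child, each fit leaves a single residual degree of freedom, which is one reason these summaries are treated as exploratory.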
S077: Applied Longitudinal Data Analysis II: Exploring Inter-Individual Differences In Change By Collecting OLS-Fitted Trajectories
Second, it makes sense to collect the child-by-child OLS-estimated trajectories together, to display inter-individual differences in change across all children.
[Figure: collected OLS-fitted trajectories, COG (50 to 150) against AGE (1 to 2).]
Average OLS trajectory across the full sample: COG = 110 - 10(AGE - 1)
Stem-and-leaf displays of the child-by-child OLS estimates:

Fitted initial status
14   0
13*  5568
13.  00134
12*  5556778999
12.  02233344
11*  55667777888889
11.  000111112222233334444
10*  55666688999
10.  0012222244
 9*  6666677799
 9.  344
 8*  89
 8.  34
 7*  7
 7.
 6*
 6.
 5*  7

Fitted rate of change
 2.  0
 1*
 1.  0
 0*  79
 0.  134
-0*  4444332
-0.  99998888777765
-1*  4333322211000
-1.  99888877666655
-2*  44322211110000
-2.  9999877776655
-3*  443322100000
-3.  987
-4*  443111

Residual variance
46  8
44
42
40  00
38
36  8
34
32  3
30
28  4
26  7
24  1444
22  8
20
18  3
16  00011
14
12  21
10  44433
 8  1118886666
 6  77744
 4  333844
 2  04444888833338888888
 0  0000111122233334444444466668111114447

Overall Impression
• Most children’s empirical growth trajectories decline over time (although there are a few exceptions).
• But there’s also considerable variation in empirical growth trajectory from child to child.
(ALDA, Section 3.2.3, pp. 55-56)
S077/Week #1– Slide 9
S077: Applied Longitudinal Data Analysis II: Exploring Systematic Inter-Individual Differences In Change By Grouping Fitted Trajectories
Finally, it makes sense to collect all the child-by-child OLS-estimated trajectories together, by values of the principal question predictor, to explore any systematic inter-individual differences in change across children.
[Figure: OLS-fitted trajectories grouped into two panels, PROGRAM = 0 and PROGRAM = 1; COG (50 to 150) against AGE (1 to 2).]
Overall Impression
Program participants tend to have, on average:
• Higher scores at age 1 (higher initial status).
• Perhaps less steep rates of decline (shallower slopes)?
But these are overall trends; there’s great inter-individual heterogeneity too, in both groups!
(ALDA, Section 3.3, pp. 57-60)
S077/Week #1– Slide 10
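One simple way to look for systematic differences is to average the child-level OLS estimates within each value of the question predictor. The sketch below assumes the estimates are already available as tuples. The four records are hypothetical placeholders, not estimates from the study.

```python
from statistics import mean

# Hypothetical child-level OLS estimates:
# (ID, PROGRAM, fitted initial status, fitted rate of change)
ols_estimates = [
    (1, 1, 117.0, -8.0),
    (2, 0, 107.0, -19.0),
    (3, 1, 112.0, -14.0),
    (4, 0, 103.0, -25.0),
]


def summarize_by_program(estimates):
    """Average the fitted intercepts and slopes within each PROGRAM group."""
    summary = {}
    for program in sorted({e[1] for e in estimates}):
        group = [e for e in estimates if e[1] == program]
        summary[program] = {
            "n": len(group),
            "mean_initial_status": mean(e[2] for e in group),
            "mean_rate_of_change": mean(e[3] for e in group),
        }
    return summary


for program, stats in summarize_by_program(ols_estimates).items():
    print(program, stats)
```

Group means like these hide the heterogeneity within each group, so they complement the grouped trajectory plots rather than replace them.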
S077: Applied Longitudinal Data Analysis III: Introducing The Multilevel Model For Change
How Can We Specify A Statistical Model To Capture All These Features? With the “Multilevel Model for Change.”
Four Critical Steps:
1. Specify a “Level-1” Submodel to Represent the Individual Change: What population statistical model best describes the individual change over time, and might have given rise to our observations on the children?
2. Specify a “Level-2” Submodel to Represent Systematic Inter-Individual Differences in Change: What statistical model best represents the way that the individual growth trajectories differ from person to person in the population?
3. Fit the Level-1 and Level-2 Models for Change Simultaneously to Data: Having specified the level-1 and level-2 sub-models, referred to jointly as the “Multilevel Model for Change,” how do we fit them to our data?
4. Test and Interpret the Fitted Multilevel Model for Change: Having fit the multilevel model for change to data, how do we test whether the effects can be generalized to the population? How do we sensibly display and interpret the empirical findings?
• Interpreting estimates of the fixed effects.
• Testing the fixed effects.
• Plotting prototypical fitted growth trajectories.
• Testing and interpreting estimates of the variance components.
(ALDA, Chapter 3 intro, p. 45)
S077/Week #1– Slide 11
S077: Applied Longitudinal Data Analysis III: Thinking About Individual Change Over Time, At “Level-1”
From our exploratory analyses, we can see that we certainly need a Level-1 Statistical Model to represent the individual changes in COG by AGE for each child.
[Figure: empirical growth plots for eight children (ID 68, 70, 71, 72, 902, 904, 906, 908); each panel plots COG (50 to 150) against AGE (1 to 2).]
• Many trajectories are smooth and systematic (70, 71, 72, 904, 908).
• Other trajectories are scattered, irregular and perhaps curvilinear (68, 902, 906).
What Individual Growth Trajectory Could Have Generated These Individual Sample Data?
• Should it be linear or curvilinear?
• Should it be smooth or disjoint?
(ALDA, Section 3.2, pp. 49-51)
S077/Week #1– Slide 12
S077: Applied Longitudinal Data Analysis III: A “Level-1” Sub-Model For Individual Change Over Time?
Key assumption: In the population, $COG_{ij}$ is a linear function of child i’s AGE on occasion j:
$COG_{ij} = \pi_{0i} + \pi_{1i}(AGE_{ij} - 1) + \varepsilon_{ij}$
where i indexes persons (i = 1 to 103) and j indexes occasions (j = 1 to 3).
• Structural portion of the L1 model, $\pi_{0i} + \pi_{1i}(AGE_{ij} - 1)$, embodies our hypotheses about the shape of each child’s true trajectory of change over time.
• Stochastic portion of the L1 model, $\varepsilon_{ij}$, allows the observed scores to deviate from the true trajectory; it’s the random measurement error for person i on occasion j. We usually assume $\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)$.
• $\pi_{0i}$ is the intercept of i’s true change trajectory, his true value of COG at AGE = 1, his “true initial status.”
• $\pi_{1i}$ is the slope of i’s true change trajectory, his yearly rate of change in true COG, his true “annual rate of change.”
[Figure: individual i’s hypothesized true change trajectory, COG (50 to 150) against AGE (1 to 2); $\varepsilon_{i1}$, $\varepsilon_{i2}$, and $\varepsilon_{i3}$ are the level-1 measurement errors, and the slope is marked as the change in true COG over 1 year.]
Once the Level-1 Model has been specified, the ith child’s true trajectory of change is characterized by the individual growth parameters present in the model: initial status, $\pi_{0i}$, and rate of change, $\pi_{1i}$.
(ALDA, Section 3.2, pp. 49-51)
S077/Week #1– Slide 13
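To make the two portions of the level-1 model concrete, the sketch below generates one child’s three observations from a linear true trajectory plus normally distributed measurement error. The parameter values (110, -10, 5) are illustrative assumptions, not estimates from the study, and the function name is hypothetical.

```python
import random


def simulate_child(pi0, pi1, sigma_eps, ages=(1.0, 1.5, 2.0), rng=None):
    """Simulate COG for one child under the linear level-1 model.

    pi0       true initial status (true COG at AGE = 1)
    pi1       true annual rate of change
    sigma_eps standard deviation of the level-1 measurement error
    """
    rng = rng or random.Random()
    records = []
    for age in ages:
        true_cog = pi0 + pi1 * (age - 1)      # structural portion
        eps = rng.gauss(0.0, sigma_eps)       # stochastic portion
        records.append({"AGE": age, "TRUE_COG": true_cog, "COG": true_cog + eps})
    return records


# Illustrative values only.
for row in simulate_child(110.0, -10.0, 5.0, rng=random.Random(42)):
    print(row)
```

Setting sigma_eps to 0 returns the true trajectory itself, which is a quick check that the structural portion is coded as intended.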
S077: Applied Longitudinal Data Analysis III: Thinking About Systematic Inter-Individual Differences In Change, at Level-2?
What are the implications of these features for a potential “Level-2” statistical model for systematic inter-individual differences in change?
Recall that Program Participants tend to have:
• Scores that are higher, at age 1 (higher initial status).
• Less steep rates of decline (shallower slopes).
• But these are only overall trends; there is great inter-individual heterogeneity.
[Figure: OLS-fitted trajectories grouped into two panels, PROGRAM = 0 and PROGRAM = 1; COG (50 to 150) against AGE (1 to 2).]
What Features Must We Build In To A Decent Level-2 Sub-Model?
1. The level-2 outcomes must be the level-1 individual growth parameters $\pi_{0i}$ and $\pi_{1i}$.
2. We need two level-2 sub-models, one for each parameter in the level-1 individual growth model (initial status, rate of change).
3. Each level-2 sub-model will specify the hypothesized relationship between a level-1 individual growth parameter and the question predictor, here PROGRAM:
– We will need to choose a reasonable functional form for these level-2 relationships (linear?).
4. Each level-2 sub-model must allow children who share the same level-2 predictor values to have different individual change trajectories:
– Each level-2 sub-model will need its own error term, and
– We may need to allow for covariances across the level-2 errors.
(ALDA, Section 3.3, pp. 57-60)
S077/Week #1– Slide 14
S077: Applied Longitudinal Data Analysis III: Level-2 Sub-Models For Systematic Inter-Individual Differences In Change
For the level-1 intercept (initial status): $\pi_{0i} = \gamma_{00} + \gamma_{01} PROGRAM_i + \zeta_{0i}$
For the level-1 slope (rate of change): $\pi_{1i} = \gamma_{10} + \gamma_{11} PROGRAM_i + \zeta_{1i}$
Key to remembering subscripts on the gammas (the $\gamma$’s):
• First subscript indicates role in level-1 model (0 for intercept; 1 for slope).
• Second subscript indicates role in level-2 model (0 for intercept; 1 for slope).
What about the zetas (the $\zeta$’s)?
• They’re the level-2 residuals that permit the level-1 individual growth parameters to differ stochastically across children.
• As with most residuals, we’re less interested in their values than their population variances and covariances.
(ALDA, Section 3.3.1, pp. 60-61)
S077/Week #1– Slide 15
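It can help to see how the two levels combine. Substituting the level-2 sub-models for initial status and rate of change into the linear level-1 model gives a single composite equation. This is a standard algebraic step, written here in the notation of the sub-models above. The grouping into a fixed part and a composite residual is added for illustration.

```latex
\begin{aligned}
COG_{ij} &= \pi_{0i} + \pi_{1i}(AGE_{ij} - 1) + \varepsilon_{ij} \\
         &= (\gamma_{00} + \gamma_{01} PROGRAM_i + \zeta_{0i})
            + (\gamma_{10} + \gamma_{11} PROGRAM_i + \zeta_{1i})(AGE_{ij} - 1)
            + \varepsilon_{ij} \\
         &= \underbrace{\gamma_{00} + \gamma_{01} PROGRAM_i
            + \gamma_{10}(AGE_{ij} - 1)
            + \gamma_{11} PROGRAM_i (AGE_{ij} - 1)}_{\text{fixed part}}
            + \underbrace{\zeta_{0i} + \zeta_{1i}(AGE_{ij} - 1)
            + \varepsilon_{ij}}_{\text{composite residual}}
\end{aligned}
```

In this form, $\gamma_{11}$ appears as the coefficient on the PROGRAM by (AGE - 1) interaction, which is why the effect of PROGRAM on the rate of change can be read as a cross-level interaction.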
S077: Applied Longitudinal Data Analysis III: Understanding The Stochastic Elements Of The Level-2 Sub-Models
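General Idea:
• Model posits the existence of an average population trajectory for each program group.
• Because of the presence of the level-2 residuals, each child i has his own true change trajectory (defined uniquely by $\pi_{0i}$ and $\pi_{1i}$).
• The shading suggests the existence of many true population trajectories, one per child.
Level-2 sub-models:
$\pi_{0i} = \gamma_{00} + \gamma_{01} PROGRAM_i + \zeta_{0i}$
$\pi_{1i} = \gamma_{10} + \gamma_{11} PROGRAM_i + \zeta_{1i}$
Assumptions about the level-2 residuals (first row: initial status; second row: rate of change):
$\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix} \right)$
[Figure: two panels, PROGRAM=0 and PROGRAM=1, plotting COG (50 to 150) against AGE (1 to 2). Each panel shows the population trajectory for child i, $\pi_{0i} + \pi_{1i}(AGE - 1)$, within a shaded band around the average population trajectory: $\gamma_{00} + \gamma_{10}(AGE - 1)$ for PROGRAM=0 and $(\gamma_{00} + \gamma_{01}) + (\gamma_{10} + \gamma_{11})(AGE - 1)$ for PROGRAM=1.]
(ALDA, Section 3.3.2, pp. 61-63)
S077/Week #1– Slide 16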
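The role of the level-2 residuals can be sketched by drawing them from a bivariate normal distribution and adding them to the group averages. The code below uses a hand-written Cholesky factor of the 2 x 2 covariance matrix. All numeric values are assumptions chosen only to show the mechanics, and the function names are hypothetical.

```python
import math
import random


def draw_level2_residuals(var0, var1, cov01, rng):
    """Draw (zeta0, zeta1) from a zero-mean bivariate normal distribution.

    Uses the Cholesky factor of [[var0, cov01], [cov01, var1]].
    """
    l11 = math.sqrt(var0)
    l21 = cov01 / l11
    l22 = math.sqrt(var1 - l21 ** 2)   # requires a valid covariance matrix
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


def true_growth_parameters(program, gammas, zetas):
    """Return (pi0, pi1) for one child from the level-2 sub-models."""
    g00, g01, g10, g11 = gammas
    zeta0, zeta1 = zetas
    pi0 = g00 + g01 * program + zeta0   # true initial status
    pi1 = g10 + g11 * program + zeta1   # true rate of change
    return pi0, pi1


# Illustrative values only: fixed effects and level-2 (co)variances are assumptions.
rng = random.Random(7)
gammas = (100.0, 5.0, -20.0, 4.0)
for program in (0, 1):
    zetas = draw_level2_residuals(100.0, 16.0, -20.0, rng)
    print(program, true_growth_parameters(program, gammas, zetas))
```

Two children with the same PROGRAM value share the same average trajectory but receive different residual draws, which is how the model represents heterogeneity within each group.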
S077: Applied Longitudinal Data Analysis III: Fitting The Multilevel Model For Change To Data
Software falls into three groups:
• Statistical programs expressly designed for multilevel modeling.
• Multipurpose statistical packages with multilevel modeling modules.
• Specialty statistical packages originally designed for another purpose that can also fit some multilevel models.
Programs named on the slide include MLwiN and aML.
S077/Week #1– Slide 17
S077: Applied Longitudinal Data Analysis III: Examining and Interpreting The Estimated Fixed Effects
Fitted model for initial status: $\hat{\pi}_{0i} = 107.84 + 6.85\,PROGRAM_i$
Fitted model for rate of change: $\hat{\pi}_{1i} = -21.13 + 5.27\,PROGRAM_i$
In the population from which this sample was drawn, we estimate that:
• True initial status (COG at age 1) for the average non-participant is 107.84.
• For the average participant, true initial status is 6.85 higher.
• True annual rate of change for the average non-participant is -21.13.
• For the average participant, true rate of change is 5.27 higher.
Advice: As you’re learning these methods, take the time to actually write out the fitted level-1/level-2 models before interpreting computer output. It’s the best way to learn what you’re doing!
(ALDA, Section 3.5, pp. 68-71)
S077/Week #1– Slide 18
S077: Applied Longitudinal Data Analysis III: Plotting Prototypical Change Trajectories To Aid Interpretation
General idea: Substitute prototypical values for the level-2 predictors (here, just PROGRAM=0 or 1) into the fitted models, as usual.
Fitted models:
$\hat{\pi}_{0i} = 107.84 + 6.85\,PROGRAM_i$
$\hat{\pi}_{1i} = -21.13 + 5.27\,PROGRAM_i$
When PROGRAM = 0:
$\hat{\pi}_{0i} = 107.84 + 6.85(0) = 107.84$
$\hat{\pi}_{1i} = -21.13 + 5.27(0) = -21.13$
so: $\widehat{COG} = 107.84 - 21.13(AGE - 1)$
When PROGRAM = 1:
$\hat{\pi}_{0i} = 107.84 + 6.85(1) = 114.69$
$\hat{\pi}_{1i} = -21.13 + 5.27(1) = -15.86$
so: $\widehat{COG} = 114.69 - 15.86(AGE - 1)$
[Figure: prototypical fitted trajectories for PROGRAM = 0 and PROGRAM = 1, COG against AGE.]
Tentative Conclusion: Program participants appear to have higher initial status and slower rates of decline.
(ALDA, Section 3.5.1, pp. 69-71)
S077/Week #1– Slide 19
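The substitution step is easy to check numerically. The sketch below plugs PROGRAM = 0 and PROGRAM = 1 into the fitted level-2 models, using the four fixed-effect estimates reported above (107.84, 6.85, -21.13, 5.27), and evaluates each prototypical trajectory at the three observed ages. The function names are illustrative, and AGE is assumed to be centered at 1 as in the level-1 model.

```python
# Fixed-effect estimates as reported on the slides.
GAMMA_00, GAMMA_01 = 107.84, 6.85    # initial status: intercept, PROGRAM effect
GAMMA_10, GAMMA_11 = -21.13, 5.27    # rate of change: intercept, PROGRAM effect


def prototypical_trajectory(program):
    """Return (fitted initial status, fitted annual rate of change)."""
    initial_status = GAMMA_00 + GAMMA_01 * program
    rate_of_change = GAMMA_10 + GAMMA_11 * program
    return initial_status, rate_of_change


def fitted_cog(program, age):
    """Fitted COG at a given age, with AGE centered at 1 year."""
    initial_status, rate_of_change = prototypical_trajectory(program)
    return initial_status + rate_of_change * (age - 1)


for program in (0, 1):
    status, rate = prototypical_trajectory(program)
    values = [round(fitted_cog(program, age), 2) for age in (1.0, 1.5, 2.0)]
    print(f"PROGRAM={program}: status={status:.2f}, rate={rate:.2f}, COG={values}")
```

Plotting the two sets of fitted values against AGE gives the two prototypical lines; the gap between them widens with age because the participants’ fitted decline is shallower.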
S077: Applied Longitudinal Data Analysis III: Testing Hypotheses About Fixed Effects Using Single-Parameter Tests
General formulation: $z = \hat{\gamma} / ase(\hat{\gamma})$
Careful: Most programs provide appropriate tests, but different programs use different terminology. Terms like z-statistic, t-statistic, t-ratio, and quasi-t-statistic, which are not the same, are used interchangeably.
For initial status:
• Average non-participant had a non-zero level of COG at age 1 (surprise!).
• Program participants had higher initial status, on average, than non-participants (probably because the intervention had already started).
For rate of change:
• Average non-participant had a non-zero rate of decline (depressing).
• Program participants had slower rates of decline, on average, than non-participants (the “program effect”).
(ALDA, Section 3.5.2, pp. 71-72)
S077/Week #1– Slide 20
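A single-parameter test of this kind divides an estimate by its asymptotic standard error. The sketch below computes that ratio and, assuming a standard normal reference distribution, a two-sided p-value. The standard error used in the example (2.5) is a hypothetical placeholder, since the slides do not report one, and packages that use a t reference distribution would give a somewhat different p-value.

```python
import math


def z_statistic(estimate, ase):
    """Ratio of a parameter estimate to its asymptotic standard error."""
    return estimate / ase


def two_sided_p_value(z):
    """Two-sided p-value under a standard normal reference distribution.

    P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


# Estimate from the fitted model for rate of change; the ase is hypothetical.
z = z_statistic(5.27, 2.5)
print(round(z, 3), round(two_sided_p_value(z), 4))
```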
S077: Applied Longitudinal Data Analysis III: Examining and Interpreting The Estimated Variance Components
General idea:
• Variance components quantify the amount of outcome variation left, at level-1 and level-2, that is potentially predictable by predictors not yet in the model.
• Interpretation is easiest when comparing nested models that each contain different predictors (which we will soon do).
Level-1 Residual Variance (74.24***):
• Summarizes within-person variability in outcome around individuals’ own trajectories.
• Here, we conclude there is non-zero within-person residual variability, in the population.
Level-2 Residual Variance/Covariance Matrix:
$\begin{bmatrix} 124.64^{***} & -36.41 \\ -36.41 & 12.29 \end{bmatrix}$
• Summarizes between-person variability in change trajectories (here, initial status and growth rates) after controlling for predictor(s) (here, PROGRAM).
• No further residual variance in rates of change remains to be predicted by the addition of further predictors (nor is there a residual covariance).
(ALDA, Section 3.6, pp. 72-74)
S077/Week #1– Slide 21