THE 34TH ANNUAL NATIONAL CONFERENCE ON LARGE-SCALE ASSESSMENT June 20 – 23 2004 Boston, MA
Using Hierarchical Growth Models to Monitor School Performance: The effects of the model, metric and time on the validity of inferences
Pete Goldschmidt, Kilchan Choi, Felipe Martinez
UCLA Graduate School of Education & Information Studies
Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing
Purpose: We use several formulations of multilevel models to investigate four related issues:
one, whether (and when) the metric matters when using longitudinal models to make valid inferences about school quality;
two, whether different longitudinal models yield consistent inferences regarding school quality;
three, the tradeoff between additional time points and missing data; and
four, the stability of school quality inferences across longitudinal models using differing numbers of occasions.
We examine three types of models:
Longitudinal Growth Panel Models
Longitudinal School Productivity Models
Longitudinal Program Evaluation Models
Longitudinal Growth Panel Models (LGPM)
Research Questions:
Inferences affected by test metric (Scale Scores Vs. NCEs)?
Estimates for Growth
Estimates of school effects
Estimates of Program effects
LGPM
• Longitudinal Panel Design
• Keep track of students’ achievement from one grade to the next
• E.g., collect achievement scores at Grades 2, 3, 4, and 5 for students in a school
• Focus on students’ developmental processes
• What do students’ growth trajectories look like?
Choice of Metric:
Scale Scores:
• IRT-based scale scores
• Vertically equated scores across grades and years
• Theoretically represent growth on a continuum that can measure academic progress over time
• Change from year to year is an absolute measure of academic progress
Normal Curve Equivalents:
• Change represents a relative position from year to year, not absolute growth in achievement
• Relative standing compared to a norming population
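To make the contrast between the two metrics concrete, the sketch below converts a national percentile rank to an NCE using the conventional definition (mean 50, standard deviation 21.06, so that NCEs 1, 50, and 99 line up with percentile ranks 1, 50, and 99). The function name is hypothetical and the slides do not say how the NCEs in this study were derived, so treat this as an illustration of the metric only.

```python
from statistics import NormalDist


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE.

    Uses the conventional definition NCE = 50 + 21.06 * z, where z is the
    standard normal deviate corresponding to the percentile rank.
    """
    if not 0.0 < percentile_rank < 100.0:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return 50.0 + 21.06 * z


# A student who stays at the same percentile in consecutive years has an
# NCE change of zero, even though the underlying scale score may have grown.
nce_year1 = percentile_to_nce(40)
nce_year2 = percentile_to_nce(40)
print(round(nce_year1, 1), round(nce_year2 - nce_year1, 1))
```

This is why a flat NCE trajectory is read as "keeping pace with the norming population" rather than as "no learning".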
Student Characteristics (Proportion)
Student Characteristic    District    Sample
Female                    0.50        0.49
African American          0.21        0.21
Asian                     0.17        0.13
Hispanic                  0.41        0.45
Other                     0.03        0.03
White                     0.19        0.18
ELL                       0.37        0.49
Free/Reduced Lunch        0.67        0.88
Special Educ.             0.07        0.08
N = 7,856 students
Sampling Conditions for Monte Carlo Study
Total Number of Schools    Students Sampled (%)    Mean n
60                         25%                     31.3
60                         50%                     65.6
60                         75%                     98.5
60                         100%                    130.9
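The sampling conditions can be pictured as drawing a fixed fraction of students from each of the 60 schools. The sketch below is one way to do that; whether the study sampled within schools, and how, is an assumption here, and the function name, record layout, and the 132 students per school are hypothetical.

```python
import random
from collections import defaultdict


def sample_within_schools(records, fraction, seed=0):
    """Draw a simple random sample of `fraction` of the students in each school.

    `records` is an iterable of (school_id, student_id) pairs. Returns a dict
    mapping each school_id to the list of sampled student_ids.
    """
    rng = random.Random(seed)
    by_school = defaultdict(list)
    for school_id, student_id in records:
        by_school[school_id].append(student_id)

    sample = {}
    for school_id, students in by_school.items():
        k = max(1, round(fraction * len(students)))  # keep at least one student
        sample[school_id] = rng.sample(students, k)
    return sample


# Hypothetical roster: 60 schools with 132 students each.
records = [(s, f"{s}-{i}") for s in range(60) for i in range(132)]
quarter = sample_within_schools(records, 0.25)
mean_n = sum(len(v) for v in quarter.values()) / len(quarter)
print(len(quarter), mean_n)
```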
Summary Parameter Estimates Compared
Question    Scale Scores    NCEs
1)          γ000     vs.    γ000
2)          γ001     vs.    γ001
3)          γ100     vs.    γ100
4)          γ101     vs.    γ101
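The parameter labels γ000, γ001, γ100, and γ101 are what one would expect from a three-level growth model (occasions within students within schools). A minimal sketch of such a model, written without student predictors, is given below. The symbols π, β, e, r, u, and the single school-level predictor W_j are assumptions borrowed from common HLM notation; the slides do not show the LGPM equations.

```latex
\begin{aligned}
\text{Level 1 (occasions):}\quad & Y_{tij} = \pi_{0ij} + \pi_{1ij}\,\mathrm{Time}_{tij} + e_{tij} \\
\text{Level 2 (students):}\quad & \pi_{0ij} = \beta_{00j} + r_{0ij}, \qquad \pi_{1ij} = \beta_{10j} + r_{1ij} \\
\text{Level 3 (schools):}\quad & \beta_{00j} = \gamma_{000} + \gamma_{001} W_j + u_{00j}, \qquad \beta_{10j} = \gamma_{100} + \gamma_{101} W_j + u_{10j}
\end{aligned}
```

Under this reading, γ000 is mean initial status, γ100 is mean growth, and γ001 and γ101 are the effects of the school-level predictor on initial status and growth, respectively.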
Summary of Estimates Compared Using Rank Order Correlations
School            Scale Scores    NCEs
Initial Status    β00j            β00j
Rate of Change    β10j            β10j
Summary of Results Describing SAT-9 Reading Achievement
Sampling condition:              25%             50%             75%
SAT-9 Reading Achievement        NCE     SS      NCE     SS      NCE     SS
Mean Initial Status (γ000)
  Student Predictors
    Special Education (γ010)     -0.47   -0.44   -0.47   -0.44   -0.47   -0.44
    Low SES (γ020)               -0.36   -0.4    -0.35   -0.4    -0.35   -0.39
    LEP (γ030)                   -0.34   -0.35   -0.33   -0.34   -0.32   -0.33
    Minority (γ040)              -0.48   -0.54   -0.48   -0.54   -0.48   -0.53
    Girl (γ050)                  0.1     0.1     0.1     0.1     0.1     0.1
  School Predictors
    LAAMP Effect (γ001)          0.03    0.04    0.02    0.03    0.02    0.02
    Minority (γ002)              -0.01   -0.01   -0.01   -0.01   -0.01   -0.01
    Low (γ003)                   0.13    0.1     0.17    0.15    0.2     0.17
Mean Growth (γ100)               0.07    0.64    0.07    0.63    0.07    0.63
  Student Predictors
    Special Education (γ110)     0       -0.03   0       -0.03   0       -0.03
    Low SES (γ120)               0.05    0.06    0.05    0.06    0.05    0.06
    LEP (γ130)                   0.07    0.07    0.07    0.07    0.07    0.07
    Minority (γ140)              -0.03   -0.02   -0.03   -0.02   -0.03   -0.02
    Girl (γ150)                  0.01    0.01    0.01    0.01    0.01    0.01
  School Predictors
    LAAMP Effect (γ101)          0.01    0.01    0.01    0.01    0.01    0.01
    Minority (γ102)              0.11    0.14    0.12    0.15    0.12    0.16
    Low (γ103)                   -0.08   -0.08   -0.08   -0.08   -0.08   -0.08
Percent Reduction in Between School Variation in Growth
                      Model 1 to 4      Model 2 to 4
Sampling Condition    NCE     SS        NCE     SS
Reading
  25%                 24.5    23.7      9.3     9.2
  50%                 24.4    25.5      9.6     9.7
  75%                 24.5    26.4      9.2     9.3
Math
  25%                 43.8    52.2      16.8    16.8
  50%                 42.7    51.9      16.4    16.5
  75%                 42.9    52.3      16.1    16.1
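A percent reduction in between-school variation is typically computed by comparing the between-school variance in growth under a sparser model with the variance that remains under a fuller model. The sketch below shows that arithmetic; the function name and the variance values are hypothetical, and the slides do not state the exact formula used.

```python
def percent_variance_reduction(tau_sparser: float, tau_fuller: float) -> float:
    """Percent of the sparser model's between-school variance that the
    fuller model accounts for: 100 * (tau_sparser - tau_fuller) / tau_sparser."""
    if tau_sparser <= 0:
        raise ValueError("the reference variance must be positive")
    return 100.0 * (tau_sparser - tau_fuller) / tau_sparser


# Hypothetical variances of school growth rates under Model 1 and Model 4.
tau_model1 = 4.0
tau_model4 = 3.0
print(percent_variance_reduction(tau_model1, tau_model4))  # 25.0
```

Comparing the resulting percentages across metrics, as the table does, asks whether the predictors explain a similar share of school-to-school differences in growth whether achievement is expressed in NCEs or in scale scores.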
Correlations between Estimated Coefficients – Model 4
                     Spearman Correlation        Kendall (Tau) Correlation
Sample  Test Type    Initial Status    Growth    Growth
R25     Read         0.939             0.897     0.754
R25     Math         0.969             0.954     0.847
R50     Read         0.942             0.895     0.75
R50     Math         0.972             0.956     0.856
R75     Read         0.943             0.896     0.748
R75     Math         0.973             0.955     0.858
Kendall (Tau) Correlation, Initial Status (listed in source order): 0.817, 0.878, 0.873, 0.821, 0.88, 0.826
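Both rank-order statistics in this table can be computed directly from two vectors of school estimates, one per metric. The sketch below implements Spearman's rho (Pearson correlation of average ranks) and Kendall's tau-a (no tie correction) in plain Python. The five school growth estimates are made up for illustration, and the slides do not say which tau variant was used.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical growth estimates for five schools in each metric.
growth_ss = [0.60, 0.72, 0.55, 0.81, 0.66]
growth_nce = [0.05, 0.09, 0.04, 0.08, 0.07]
print(round(spearman(growth_ss, growth_nce), 3))       # 0.9
print(round(kendall_tau_a(growth_ss, growth_nce), 3))  # 0.8
```

Tau is usually smaller than rho for the same data, which matches the pattern in the table: a single swapped pair of schools costs more under tau's pairwise counting.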
[Figure: Correlation Pattern between Sampling Condition and Model – Reading SAT-9 Growth. Axes: Model 1 to Model 4 by sampling condition (R25, R50, R75); correlation bands from 0.70 to 1.00.]
[Figure: Comparison of Relative Bias to the Effect Size of Growth. x-axis: Effect Size of Growth (scale scores), 0.62 to 0.73; y-axis: Relative Bias in Growth, -0.89 to -0.79.]
[Figure: Relationship between Relative Bias in NCEs for Initial Status. x-axis: Program Effect Size for Initial Status (scale score), -.09 to .09; y-axis: Relative Bias for Initial Status, -5 to 5.]
Model 4
                     Correlation                 Effect Size Ratio           Effect Size Difference      Agreement
Sample  Test Type    Initial Status    Growth    Initial Status    Growth    Initial Status    Growth    (Growth)
R25     Read         0.988             0.954     1.149             0.348     -0.008            0.004     0.985
R25     Math         0.982             0.94      0.793             0.812     0                 -0.003    0.992
R50     Read         0.99              0.95      0.965             0.693     -0.004            0.003     0.994
R50     Math         0.984             0.94      0.966             0.88      0.003             -0.004    0.999
R75     Read         0.985             0.94      0.265             1.146     -0.002            0.002     1
R75     Math         0.99              0.948     0.71              0.878     0.004             -0.004    1
[Figure: Relationship between Relative Bias in NCEs for Growth. x-axis: Program Effect Size for Growth (scale scores), -.01 to .03; y-axis: Relative Bias for Growth, -5 to 5.]
Longitudinal School Productivity Model (LSPM)
Research Questions:
Inferences affected by test metric (Scale Scores Vs. NCEs)?
Estimates for growth
Estimates of school effects
Estimates of “Type A” and “Type B” effects
LSPM
• Multiple-cohort design (Willms & Raudenbush, 1989; Bryk et al., 1998)
• Monitor student performance at a school for a particular grade over years
• E.g., collect achievement scores for 3rd grade students attending a school in 1999, 2000, and 2001
• Focus on schools’ improvement over subsequent years
Research Question
• To what extent does the choice of the metric matter when the focus is school improvement over time (NCE vs. scale score)?
• A Multiple-cohort school productivity model is used as the basis for inferences about school performance
3-level Hierarchical Model for measuring school improvement
Model I: Unconditional School Improvement Model
Level-1 (within-cohort) model:
$Y_{ijt} = \beta_{jt0} + r_{ijt}$
* $\beta_{jt0}$: estimate of performance for school j (j = 1, ..., J) at cohort t (t = 0, 1, 2, 3, 4)
Level-2 (between-cohort, within-school) model:
$\beta_{jt0} = \theta_{j0} + \theta_{j1}\,\mathrm{Time}_{tj} + u_{jt}$
* $\theta_{j0}$: status at the first year (i.e., $\mathrm{Time}_{tj} = 0$), or initial status, for school j
* $\theta_{j1}$: yearly improvement / growth rate during the span of time for school j
Level-3 (between-school) model:
$\theta_{j0} = \gamma_{00} + V_{j0}$
* $\gamma_{00}$: grand mean initial status
$\theta_{j1} = \gamma_{10} + V_{j1}$
* $\gamma_{10}$: grand mean growth rate
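To see how the three levels of Model I fit together, the sketch below simulates data from the model and then recovers the grand mean initial status and growth rate with a simple two-step approach: cohort means per school, then an ordinary least squares line per school, then an average across schools. This is not the hierarchical estimator the authors would have used; it is only a rough check that the structure behaves as described. The names `theta`, `gamma00`, and `gamma10` for the school-level coefficients and grand means, and every numeric value, are assumptions chosen for illustration.

```python
import random


def simulate_school_cohorts(n_schools=200, n_cohorts=5, n_students=30,
                            gamma00=600.0, gamma10=2.0, seed=1):
    """Simulate cohort mean scores for each school under Model I."""
    rng = random.Random(seed)
    data = {}
    for j in range(n_schools):
        theta_j0 = gamma00 + rng.gauss(0.0, 5.0)   # level 3: initial status
        theta_j1 = gamma10 + rng.gauss(0.0, 0.5)   # level 3: growth rate
        cohort_means = []
        for t in range(n_cohorts):
            # level 2: cohort-specific school mean with random error u_jt
            beta_jt0 = theta_j0 + theta_j1 * t + rng.gauss(0.0, 1.0)
            # level 1: student scores scatter around the cohort mean
            scores = [beta_jt0 + rng.gauss(0.0, 10.0) for _ in range(n_students)]
            cohort_means.append(sum(scores) / n_students)
        data[j] = cohort_means
    return data


def ols_line(y):
    """Intercept and slope of y regressed on time = 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_bar = (n - 1) / 2.0
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def grand_means(data):
    """Average the per-school intercepts and slopes across schools."""
    fits = [ols_line(means) for means in data.values()]
    g00 = sum(f[0] for f in fits) / len(fits)
    g10 = sum(f[1] for f in fits) / len(fits)
    return g00, g10


data = simulate_school_cohorts()
g00_hat, g10_hat = grand_means(data)
print(round(g00_hat, 2), round(g10_hat, 2))
```

Because time is coded 0 for the first cohort, the fitted intercept for each school corresponds to its initial status, which is the interpretation given in the level-2 definitions.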
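Model II: Student characteristics
Level-1 (within-cohort) model:
$Y_{ijt} = \beta_{jt0} + \beta_{jt1}\mathrm{SPED}_{ijt} + \beta_{jt2}\mathrm{LowSES}_{ijt} + \beta_{jt3}\mathrm{LEP}_{ijt} + \beta_{jt4}\mathrm{Girl}_{ijt} + \beta_{jt5}\mathrm{Minority}_{ijt} + r_{ijt}$
Level-2 (between-cohort, within-school) model:
$\beta_{jt0} = \theta_{j0} + \theta_{j1}\,\mathrm{Time}_{tj} + u_{jt}$
Level-3 (between-school) model:
$\theta_{j0} = \gamma_{00} + V_{j0}$
$\theta_{j1} = \gamma_{10} + V_{j1}$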
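Model III: Student characteristics & School intervention indicator
Level-1 (within-cohort) model:
$Y_{ijt} = \beta_{jt0} + \beta_{jt1}\mathrm{SPED}_{ijt} + \beta_{jt2}\mathrm{LowSES}_{ijt} + \beta_{jt3}\mathrm{LEP}_{ijt} + \beta_{jt4}\mathrm{Girl}_{ijt} + \beta_{jt5}\mathrm{Minority}_{ijt} + r_{ijt}$
Level-2 (between-cohort, within-school) model:
$\beta_{jt0} = \theta_{j0} + \theta_{j1}\,\mathrm{Time}_{tj} + u_{jt}$
Level-3 (between-school) model:
$\theta_{j0} = \gamma_{00} + \gamma_{01}\,\mathrm{LAAMP}_j + V_{j0}$
$\theta_{j1} = \gamma_{10} + \gamma_{11}\,\mathrm{LAAMP}_j + V_{j1}$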
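Model IV: Full Model
Level-1 (within-cohort) model:
$Y_{ijt} = \beta_{jt0} + \beta_{jt1}\mathrm{SPED}_{ijt} + \beta_{jt2}\mathrm{LowSES}_{ijt} + \beta_{jt3}\mathrm{LEP}_{ijt} + \beta_{jt4}\mathrm{Girl}_{ijt} + \beta_{jt5}\mathrm{Minority}_{ijt} + r_{ijt}$
Level-2 (between-cohort, within-school) model:
$\beta_{jt0} = \theta_{j0} + \theta_{j1}\,\mathrm{Time}_{tj} + u_{jt}$
Level-3 (between-school) model:
$\theta_{j0} = \gamma_{00} + \gamma_{01}(\%\mathrm{Minority}_j) + \gamma_{02}(\%\mathrm{LowSES}_j) + \gamma_{03}(\%\mathrm{LEP}_j) + \gamma_{04}\,\mathrm{LAAMP}_j + V_{j0}$
$\theta_{j1} = \gamma_{10} + \gamma_{11}(\%\mathrm{Minority}_j) + \gamma_{12}(\%\mathrm{LowSES}_j) + \gamma_{13}(\%\mathrm{LEP}_j) + \gamma_{14}\,\mathrm{LAAMP}_j + V_{j1}$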
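Substituting the level-3 equations of the full model into level 2, and level 2 into level 1, gives a single combined equation that makes the roles of the parameters easier to see. The derivation below writes the school-level coefficients as θ and the fixed effects as γ; those symbol names are assumptions made for this sketch.

```latex
\begin{aligned}
Y_{ijt} ={}& \gamma_{00} + \gamma_{01}(\%\mathrm{Minority}_j) + \gamma_{02}(\%\mathrm{LowSES}_j) + \gamma_{03}(\%\mathrm{LEP}_j) + \gamma_{04}\,\mathrm{LAAMP}_j \\
&+ \bigl[\gamma_{10} + \gamma_{11}(\%\mathrm{Minority}_j) + \gamma_{12}(\%\mathrm{LowSES}_j) + \gamma_{13}(\%\mathrm{LEP}_j) + \gamma_{14}\,\mathrm{LAAMP}_j\bigr]\,\mathrm{Time}_{tj} \\
&+ \beta_{jt1}\mathrm{SPED}_{ijt} + \beta_{jt2}\mathrm{LowSES}_{ijt} + \beta_{jt3}\mathrm{LEP}_{ijt} + \beta_{jt4}\mathrm{Girl}_{ijt} + \beta_{jt5}\mathrm{Minority}_{ijt} \\
&+ V_{j0} + V_{j1}\,\mathrm{Time}_{tj} + u_{jt} + r_{ijt}
\end{aligned}
```

The first line is the expected initial status of a school given its composition and program participation, the bracketed term is its expected yearly improvement, and the last line collects the random parts at the school, cohort, and student levels.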
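Comparison of Key Parameters: NCE vs. Scale Score
Question                               Parameter
1) School mean initial status          $\theta_{j0}$
2) School improvement / growth rate    $\theta_{j1}$
3) Type A effect                       $\gamma_{11}(\%\mathrm{Minority}_j) + \gamma_{12}(\%\mathrm{LowSES}_j) + \gamma_{13}(\%\mathrm{LEP}_j) + \gamma_{14}\,\mathrm{LAAMP}_j + V_{j1}$
4) Type B effect                       $\gamma_{14}\,\mathrm{LAAMP}_j + V_{j1}$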
• Type A effect: includes the effects of school policies and practice, educational context, and wider social influences
• Type B effect: includes the effects of tractable policies and practices, but excludes factors that lie outside the control of the school
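The two effect types differ only in which terms of the school growth-rate equation they keep. The sketch below computes both for one school, following the parameter definitions listed in the comparison of key parameters: Type A keeps the school composition terms, the program term, and the school residual, while Type B keeps only the program term and the residual. The dictionary keys, coefficient values, and school values are all hypothetical.

```python
def type_a_effect(coef, school, residual):
    """Composition terms + program term + school growth residual."""
    return (coef["pct_minority"] * school["pct_minority"]
            + coef["pct_low_ses"] * school["pct_low_ses"]
            + coef["pct_lep"] * school["pct_lep"]
            + coef["laamp"] * school["laamp"]
            + residual)


def type_b_effect(coef, school, residual):
    """Program term + school growth residual (composition terms removed)."""
    return coef["laamp"] * school["laamp"] + residual


# Hypothetical growth-rate coefficients and one hypothetical school.
coef = {"pct_minority": -0.02, "pct_low_ses": -0.01, "pct_lep": 0.03, "laamp": 0.5}
school = {"pct_minority": 80.0, "pct_low_ses": 90.0, "pct_lep": 40.0, "laamp": 1}
residual = 0.25

print(round(type_a_effect(coef, school, residual), 2))
print(round(type_b_effect(coef, school, residual), 2))
```

The gap between the two numbers for a given school is the part of its growth that is attributed to student body composition rather than to anything the school controls.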
NCE Results vs. Scale Score Results
• School Ranking (rank-order corr.)
• School improvement / growth rate parameter (corr. between estimates)
• Effect size (par. est. / s.d. of outcome)
• Statistical significance of the effect of the school intervention indicator variable
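The effect size used in these comparisons is a parameter estimate divided by the standard deviation of the outcome, which puts NCE and scale score results on a common footing. The sketch below computes it for each metric and then forms a ratio and a difference. The direction of the ratio and difference (NCE relative to scale score), the function names, and all numbers are assumptions for illustration.

```python
def effect_size(estimate: float, outcome_sd: float) -> float:
    """Parameter estimate expressed in standard deviation units of the outcome."""
    if outcome_sd <= 0:
        raise ValueError("outcome standard deviation must be positive")
    return estimate / outcome_sd


def compare_effect_sizes(es_nce: float, es_ss: float) -> dict:
    """Ratio and difference of two effect sizes (NCE relative to scale score)."""
    return {"ratio": es_nce / es_ss, "difference": es_nce - es_ss}


# Hypothetical program effects and outcome standard deviations.
es_nce = effect_size(estimate=1.2, outcome_sd=20.0)   # NCE metric
es_ss = effect_size(estimate=2.0, outcome_sd=40.0)    # scale score metric
print(compare_effect_sizes(es_nce, es_ss))
```

A ratio near 1 and a difference near 0 indicate that the two metrics tell the same story about the size of the effect. When both effect sizes are very small, the ratio can swing widely even though the difference stays negligible.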
Conclusion
• NCE vs. scale score – for the purpose of measuring school improvement under the multiple-cohort design:
• Differences are minimal in terms of:
school ranking
school improvement / growth rate
effect size
statistical significance of the effect of the school intervention indicator
Conclusion (cont’d)
• Results are consistent across sampling conditions, models, and content area
Demographic
Samples
Average # of Measures    % of      # of Students
Per Student              Sample    3 yrs     4 yrs     5 yrs     6 yrs     7 yrs
7                        100%      17,349    23,132    28,915    34,698    40,481
6                        85%       17,349    22,265    27,181    32,097    37,013
5                        70%       17,349    21,397    25,445    29,493    33,541
4                        55%       17,349    20,530    23,711    26,892    30,073
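The counts in this table appear to follow a simple pattern: every student contributes the first three years (17,349 = 3 × 5,783), and each additional year adds records only for the retained percentage of students. The sketch below reproduces that arithmetic. The base of 5,783 students is inferred from the 3-year column rather than stated in the slides, and the function name is hypothetical.

```python
BASE_STUDENTS = 5783   # inferred: 17,349 records / 3 years
BASE_YEARS = 3         # years for which every student has a record


def records(pct_sample: int, years: int) -> int:
    """Total records when `pct_sample` percent of students are retained
    in each year beyond the first three."""
    # Students retained per additional year, rounded to the nearest student.
    per_year = (pct_sample * BASE_STUDENTS + 50) // 100
    return BASE_STUDENTS * BASE_YEARS + per_year * (years - BASE_YEARS)


for pct in (100, 85, 70, 55):
    print(pct, [records(pct, y) for y in range(3, 8)])
```

Under this pattern the 70% sample at 6 years comes to 29,493 records, and the other cells match the tabled values for the 100%, 85%, and 55% samples.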
Reduction in Standard Error (SE) for Average Growth between Schools
Number of Occasions: 3 4 5 5 6 6 7
Sample    Model
100%      Baseline Model    0.229    0.223    0.228    0.292
100%      Full Model        0.260    0.296    0.351    0.449
85%       Baseline Model    -0.010   -0.008   0.068
85%       Full Model        0.060    0.107    0.272
70%       Baseline Model    0.013    0.022    0.119
70%       Full Model        0.054    0.144    0.259
55%       Baseline Model    0.007    0.033    0.108
55%       Full Model        0.053    0.119    0.234
[Figure: 100% Sample SE Reduction AVG GROWTH. x-axis: Number of Occasions; y-axis: Proportion SE Reduction, 0 to 0.5; series: Full Model, Baseline Model.]
[Figure: 85%-70%-55% Samples SE Reduction AVG GROWTH. x-axis: Number of Occasions; y-axis: Proportion SE Reduction, -0.1 to 0.5; series: 85% Full, 70% Full, 55% Full, 85% Baseline, 70% Baseline, 55% Baseline.]
Reduction in Standard Error (SE) for Average Status between Schools
Sample    Model
100%      Baseline Model    0.042    0.042    0.039    0.051
100%      Full Model        0.046    0.048    0.042    0.054
85%       Baseline Model    0.000    -0.003   0.011
85%       Full Model        0.002    -0.003   0.010
70%       Baseline Model    -0.002   -0.004   0.012
70%       Full Model        0.000    -0.002   0.014
55%       Baseline Model    -0.002   -0.003   0.012
55%       Full Model        0.000    0.000    0.013
[Figure: 100% Sample SE Reduction AVG STATUS. x-axis: Number of Occasions; y-axis: Proportion SE Reduction, 0.030 to 0.060; series: Full Model, Baseline Model.]
[Figure: 85%-70%-55% Samples SE Reduction AVG STATUS. x-axis: Number of Occasions; y-axis: Proportion SE Reduction, -0.006 to 0.016; series: 85% Full, 70% Full, 55% Full, 85% Baseline, 70% Baseline, 55% Baseline.]
Reduction in Tau for Average Growth between Schools
Sample    Model
100%      Baseline Model    0.279    0.368    0.346    0.522
100%      Full Model        0.451    0.451    0.466    0.623
85%       Baseline Model    0.105    0.059    0.310
85%       Full Model        0.106    0.068    0.350
70%       Baseline Model    0.095    0.113    0.330
70%       Full Model        0.110    0.155    0.388
55%       Baseline Model    0.115    0.131    0.262
55%       Full Model        0.140    0.183    0.346
[Figure: 100% Sample Tau Reduction AVG GROWTH. x-axis: Number of Occasions; y-axis: Proportion Tau Reduction, 0.0 to 0.7; series: Full Model, Baseline Model.]
[Figure: 85%-70%-55% Samples Tau Reduction AVG GROWTH. x-axis: Number of Occasions; y-axis: Proportion Tau Reduction, 0.00 to 0.45; series: 85% Full, 70% Full, 55% Full, 85% Baseline, 70% Baseline, 55% Baseline.]
Reduction in Tau for Average Status between Schools
Sample    Model
100%      Baseline Model    0.066    0.061    0.047    0.062
100%      Full Model        0.070    0.065    0.053    0.068
85%       Baseline Model    -0.006   -0.017   0.002
85%       Full Model        -0.005   -0.015   0.004
70%       Baseline Model    -0.002   -0.007   0.017
70%       Full Model        -0.001   -0.004   0.020
55%       Baseline Model    0.001    -0.005   0.019
55%       Full Model        0.002    -0.004   0.022
[Figure: 100% Sample Tau Reduction AVG STATUS. x-axis: Number of Occasions; y-axis: Proportion Tau Reduction, 0.03 to 0.075; series: Full Model, Baseline Model.]
[Figure: 85%-70%-55% Samples Tau Reduction AVG STATUS. x-axis: Number of Occasions; y-axis: Proportion Tau Reduction, -0.020 to 0.025; series: 85% Full, 70% Full, 55% Full, 85% Baseline, 70% Baseline, 55% Baseline.]
[Figure: Program Effect on AVG GROWTH between Schools by Sample and Number of Occasions. x-axis: Number of Occasions, 3 to 7; y-axis: T-Value, -3 to 1.5; series: 100% Sample, 85% Sample, 70% Sample, 55% Sample.]
[Chart 8: Effect of Additional Occasions on Between-School Achievement Growth Reliabilities. x-axis: Number of occasions, 1 to 9; y-axis: Reliability, 0.40 to 0.70.]
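One way to see why reliability rises with additional occasions: the reliability of a school's estimated growth rate is the share of its total variance that is true between-school variance, and the sampling variance of a fitted slope shrinks as occasions are added. The sketch below uses the textbook sampling variance of an OLS slope over equally spaced occasions. It is a simplified illustration, not the computation behind the chart, and the variance values and function names are hypothetical.

```python
def slope_sampling_variance(sigma2: float, n_occasions: int) -> float:
    """Sampling variance of an OLS slope fitted to equally spaced occasions
    0, 1, ..., n_occasions - 1 with residual variance sigma2."""
    if n_occasions < 2:
        raise ValueError("at least two occasions are needed to fit a slope")
    t_bar = (n_occasions - 1) / 2.0
    ss_time = sum((t - t_bar) ** 2 for t in range(n_occasions))
    return sigma2 / ss_time


def growth_reliability(tau_growth: float, sigma2: float, n_occasions: int) -> float:
    """True between-school growth variance over total variance of the estimate."""
    error_var = slope_sampling_variance(sigma2, n_occasions)
    return tau_growth / (tau_growth + error_var)


# Hypothetical variance components.
for n in range(3, 10):
    print(n, round(growth_reliability(tau_growth=1.0, sigma2=10.0, n_occasions=n), 3))
```

Because the sum of squared time deviations grows roughly with the cube of the number of occasions, the gains are steep at first and then level off, which is the general shape one would expect in a chart of this kind.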
[Chart 9: Relationship among Correlations of Empirical Bayes Estimates of School Performance. Axes: Correlations Among EB Estimates for EBXS1, EBXS2, EBXS3, EBXS4, EBGRTH01, EBGRTH12, EBGRTH34, EBGRTH02, EBGRTH13, EBGRTH24, EBGRTH03, EBGRTH14, EBGRTH04; correlation bands from -0.75 to 1.00.]
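Empirical Bayes (EB) estimates of school performance are shrinkage estimates: each school's own estimate is pulled toward the overall mean in proportion to how unreliable it is. A minimal sketch for the unconditional case is shown below; the function name and the numbers are hypothetical, and a fitted hierarchical model would supply the reliability for each school.

```python
def empirical_bayes_estimate(school_estimate: float, grand_mean: float,
                             reliability: float) -> float:
    """Shrink a school's own estimate toward the grand mean.

    With reliability 1 the school's estimate is kept as is; with
    reliability 0 the estimate collapses to the grand mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return reliability * school_estimate + (1.0 - reliability) * grand_mean


# Hypothetical school growth estimate of 3.0 against a grand mean of 2.0.
for rel in (0.25, 0.5, 1.0):
    print(rel, empirical_bayes_estimate(3.0, 2.0, rel))
```

Since reliability depends on the number of occasions, EB estimates from models with different numbers of occasions are shrunk by different amounts, which is one reason their correlations are worth examining.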