THE 34TH ANNUAL NATIONAL CONFERENCE ON LARGE-SCALE ASSESSMENT June 20 – 23 2004 Boston, MA

Using Hierarchical Growth Models to Monitor School Performance: The effects of the model, metric and time on the validity of inferences


Pete Goldschmidt, Kilchan Choi, Felipe Martinez

UCLA Graduate School of Education & Information Studies

Center for the Study of Evaluation

National Center for Research on Evaluation, Standards, and Student Testing

Purpose: We use several formulations of multilevel models to investigate four related issues:

one, whether (and when) the test metric matters when longitudinal models are used to make valid inferences about school quality;

two, whether different longitudinal models yield consistent inferences regarding school quality;

three, the tradeoff between additional time points and missing data; and

four, the stability of school quality inferences across longitudinal models using differing numbers of occasions.

We examine three types of models:

Longitudinal Growth Panel Models

Longitudinal School Productivity Models

Longitudinal Program Evaluation Models:

Longitudinal Growth Panel Models (LGPM)

Research Questions:

Inferences affected by test metric (Scale Scores Vs. NCEs)?

Estimates for Growth

Estimates of school effects

Estimates of Program effects

LGPM

• Longitudinal Panel Design

• Keep track of students’ achievement from one grade to the next

• E.g., collect achievement scores at Grades 2, 3, 4, and 5 for students in a school

• Focus on students’ developmental processes

• What do students’ growth trajectories look like?

Choice of Metric:

Scale Scores:

• IRT-based scale scores

• Vertically equated scores across grades and years

• Theoretically represent growth on a continuum that can measure academic progress over time

• Change from year to year is an absolute measure of academic progress

Normal Curve Equivalents (NCEs):

• Change represents a relative position from year to year, not absolute growth in achievement

• Relative standing compared to a norming population
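To make the contrast between the two metrics concrete, here is a minimal sketch of the standard NCE transformation (NCE = 50 + 21.06 × z, where z is the normal deviate of the national percentile rank). The function name and the example student are illustrative assumptions, not taken from the study data.

```python
from statistics import NormalDist

NCE_SD = 21.06  # standard NCE scaling constant (NCE 1 and 99 match percentiles 1 and 99)


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0.0 < percentile_rank < 100.0:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return 50.0 + NCE_SD * z


# Hypothetical student who stays at the 30th percentile in two successive grades:
# the NCE change is zero even though the scale score would normally rise.
nce_grade2 = percentile_to_nce(30)
nce_grade3 = percentile_to_nce(30)
nce_change = nce_grade3 - nce_grade2
```

Because the NCE is anchored to the norming population in each grade, a student who learns at the same pace as the norm group shows no change in NCEs, which is why NCE change reflects relative position rather than absolute growth.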

Student Characteristics (Proportion)

Student Characteristic    District   Sample
Female                    0.50       0.49
African American          0.21       0.21
Asian                     0.17       0.13
Hispanic                  0.41       0.45
Other                     0.03       0.03
White                     0.19       0.18
ELL                       0.37       0.49
Free/Reduced Lunch        0.67       0.88
Special Educ.             0.07       0.08

N = 7,856 students

2

Sampling Conditions for Monte Carlo Study

Total Number of Schools   Students Sampled (%)   Mean n
60                        25%                    31.3
60                        50%                    65.6
60                        75%                    98.5
60                        100%                   130.9

3
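The sampling conditions can be sketched as drawing a fixed percentage of students within each of the 60 schools. This is only an illustration: the assumption that sampling was done within schools, the roster, and the function name are all hypothetical.

```python
import random


def sample_students(students_by_school, fraction, seed=0):
    """Draw a simple random sample of `fraction` of the students in each school."""
    rng = random.Random(seed)
    sampled = {}
    for school, students in students_by_school.items():
        n_keep = max(1, round(fraction * len(students)))
        sampled[school] = rng.sample(students, n_keep)
    return sampled


# Hypothetical roster: 60 schools with 131 students each (illustrative only).
roster = {s: [f"s{s}_{i}" for i in range(131)] for s in range(60)}

for fraction in (0.25, 0.50, 0.75, 1.00):
    draw = sample_students(roster, fraction)
    mean_n = sum(len(v) for v in draw.values()) / len(draw)
    print(f"{fraction:.0%}: {len(draw)} schools, mean n = {mean_n:.1f}")
```

Sampling within schools keeps the number of schools fixed across conditions, so only the within-school sample size varies.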

Summary Parameter Estimates Compared

Question   Scale Scores         NCEs
1)         γ000          vs.    γ000
2)         γ001          vs.    γ001
3)         γ100          vs.    γ100
4)         γ101          vs.    γ101

4

Summary of Estimates Compared Using Rank Order Correlations

School            Scale Scores   NCEs
Initial Status    β00j           β00j
Rate of Change    β10j           β10j

5

Summary of Results Describing SAT-9 Reading Achievement

                                    25%             50%             75%
SAT-9 Reading Achievement        NCE     SS      NCE     SS      NCE     SS
Mean Initial Status (γ000)
  Student Predictors
    Special Education (γ010)    -0.47   -0.44   -0.47   -0.44   -0.47   -0.44
    Low SES (γ020)              -0.36   -0.4    -0.35   -0.4    -0.35   -0.39
    LEP (γ030)                  -0.34   -0.35   -0.33   -0.34   -0.32   -0.33
    Minority (γ040)             -0.48   -0.54   -0.48   -0.54   -0.48   -0.53
    Girl (γ050)                  0.1     0.1     0.1     0.1     0.1     0.1
  School Predictors
    LAAMP Effect (γ001)          0.03    0.04    0.02    0.03    0.02    0.02
    Minority (γ002)             -0.01   -0.01   -0.01   -0.01   -0.01   -0.01
    Low (γ003)                   0.13    0.1     0.17    0.15    0.2     0.17
Mean Growth (γ100)               0.07    0.64    0.07    0.63    0.07    0.63
  Student Predictors
    Special Education (γ110)     0      -0.03    0      -0.03    0      -0.03
    Low SES (γ120)               0.05    0.06    0.05    0.06    0.05    0.06
    LEP (γ130)                   0.07    0.07    0.07    0.07    0.07    0.07
    Minority (γ140)             -0.03   -0.02   -0.03   -0.02   -0.03   -0.02
    Girl (γ150)                  0.01    0.01    0.01    0.01    0.01    0.01
  School Predictors
    LAAMP Effect (γ101)          0.01    0.01    0.01    0.01    0.01    0.01
    Minority (γ102)              0.11    0.14    0.12    0.15    0.12    0.16
    Low (γ103)                  -0.08   -0.08   -0.08   -0.08   -0.08   -0.08

6

Sampling Condition   NCE    SS     NCE    SS
25%                  24.5   23.7   9.3    9.2
50%                  24.4   25.5   9.6    9.7
75%                  24.5   26.4   9.2    9.3

25%                  43.8   52.2   16.8   16.8
50%                  42.7   51.9   16.4   16.5
75%                  42.9   52.3   16.1   16.1

Math

Model 2 to 4

Model 1 to 4

Reading

Percent Reduction in Between School Variation in Growth

7
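A percent reduction in between-school variation is conventionally computed by comparing the between-school variance component of a simpler model with that of a fuller model. The sketch below assumes that convention; the variance values are hypothetical and the function name is illustrative.

```python
def percent_reduction(tau_baseline: float, tau_conditional: float) -> float:
    """Percent of baseline between-school variance removed by the fuller model."""
    if tau_baseline <= 0:
        raise ValueError("baseline variance must be positive")
    return 100.0 * (tau_baseline - tau_conditional) / tau_baseline


# Hypothetical between-school variance in growth under a simpler and a fuller model.
tau_simple = 0.040
tau_full = 0.030
reduction = percent_reduction(tau_simple, tau_full)
```

A value near 25 would mean the added predictors account for about a quarter of the between-school variation in growth found under the simpler model.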

Correlations between Estimated Coefficients – Model 4

                      Spearman Correlation       Kendall (Tau) Correlation
Sample   Test Type    Initial Status   Growth    Growth
R25      Read         0.939            0.897     0.754
         Math         0.969            0.954     0.847
R50      Read         0.942            0.895     0.75
         Math         0.972            0.956     0.856
R75      Read         0.943            0.896     0.748
         Math         0.973            0.955     0.858

Kendall (Tau) Correlation, Initial Status: 0.817, 0.878, 0.873, 0.821, 0.88, 0.826

11
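For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two rank-order statistics, applied to school-level estimates from the two metrics. It assumes no tied values and uses the tau-a form of Kendall's statistic; the five school estimates are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks of the values, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation (no-ties formula)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical growth estimates for five schools under each metric.
nce_growth = [0.10, 0.40, 0.25, 0.55, 0.30]
ss_growth = [1.1, 3.9, 3.0, 5.2, 2.8]
rho = spearman(nce_growth, ss_growth)   # 0.9
tau = kendall_tau(nce_growth, ss_growth)  # 0.8
```

Kendall's tau is typically smaller than Spearman's rho for the same data, which matches the pattern in the table above.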

Figure: Correlation Pattern between Sampling Condition and Model – Reading SAT-9 Growth. Sampling conditions R25, R50, and R75 by Model 1 to Model 4; correlations shown in bands of 0.02 from 0.70 to 1.00.

12

Figure: Comparison of Relative Bias to the Effect Size of Growth. X-axis: Effect Size of Growth (scale scores), 0.62 to 0.73. Y-axis: Relative Bias in Growth, -0.89 to -0.79.

13

Figure: Relationship between Relative Bias in NCEs for Initial Status. X-axis: Program Effect Size for Initial Status (scale score), -.09 to .09. Y-axis: Relative Bias for Initial Status, -5 to 5.

14

Model 4

                      Correlation               Effect Size Ratio         Effect Size Difference    Agreement
Sample   Test Type    Initial Status   Growth   Initial Status   Growth   Initial Status   Growth   (Growth)
R25      Read         0.988            0.954    1.149            0.348    -0.008           0.004    0.985
         Math         0.982            0.94     0.793            0.812    0                -0.003   0.992
R50      Read         0.99             0.95     0.965            0.693    -0.004           0.003    0.994
         Math         0.984            0.94     0.966            0.88     0.003            -0.004   0.999
R75      Read         0.985            0.94     0.265            1.146    -0.002           0.002    1
         Math         0.99             0.948    0.71             0.878    0.004            -0.004   1

17

Figure: Relationship between Relative Bias in NCEs for Growth. X-axis: Program Effect Size for Growth (scale scores), -.01 to .03. Y-axis: Relative Bias for Growth, -5 to 5.

15

Longitudinal School Productivity Model (LSPM)

Research Questions:

Inferences affected by test metric (Scale Scores Vs. NCEs)?

Estimates for growth

Estimates of school effects

Estimates of “Type A” and “Type B” effects

LSPM

• Multiple-cohorts design (Willms & Raudenbush, 1989; Bryk et al., 1998)

• Monitor student performance at a school for a particular grade over years

• E.g., collect achievement scores for 3rd grade students attending a school in 1999, 2000, and 2001

• Focus on schools’ improvement over subsequent years

Research Question

• To what extent does the choice of the metric matter when the focus is school improvement over time (NCE vs. scale score)?

• A multiple-cohort school productivity model is used as the basis for inferences about school performance

3-level Hierarchical Model for measuring school improvement

Model I: Unconditional School Improvement Model

Level-1 (within-cohort) model:

Y_{ijt} = β_{jt0} + r_{ijt}

* β_{jt0}: estimate of performance for school j (j = 1, ..., J) at cohort t (t = 0, 1, 2, 3, 4)

Level-2 (between-cohort, within-school) model:

β_{jt0} = θ_{j0} + θ_{j1} Time_{tj} + u_{jt}

* θ_{j0}: status at the first year (i.e., Time_{tj} = 0), or initial status, for school j

* θ_{j1}: yearly improvement / growth rate during the span of time for school j

Level-3 (between-school) model:

θ_{j0} = γ_{00} + V_{j0}

θ_{j1} = γ_{10} + V_{j1}

* γ_{00}: grand mean initial status

* γ_{10}: grand mean growth rate
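Substituting the Level-3 equations into Level-2, and the result into Level-1, gives the unconditional model in a single equation. This is a worked restatement of the three levels above, not an additional model; the symbols θ (school-level coefficients) and γ (grand means) are notational assumptions.

```latex
\begin{aligned}
Y_{ijt} &= \beta_{jt0} + r_{ijt} \\
        &= \theta_{j0} + \theta_{j1}\,\mathrm{Time}_{tj} + u_{jt} + r_{ijt} \\
        &= \underbrace{\gamma_{00} + \gamma_{10}\,\mathrm{Time}_{tj}}_{\text{fixed part}}
         + \underbrace{V_{j0} + V_{j1}\,\mathrm{Time}_{tj} + u_{jt} + r_{ijt}}_{\text{random part}}
\end{aligned}
```

Written this way, the school-specific deviation in improvement rate, V_{j1}, is the term that carries the inference about school improvement, while u_{jt} and r_{ijt} absorb cohort-to-cohort and student-level noise.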

Model II: Student characteristics

Y_{ijt} = β_{jt0} + β_{jt1} SPED_{ijt} + β_{jt2} LowSES_{ijt} + β_{jt3} LEP_{ijt} + β_{jt4} Girl_{ijt} + β_{jt5} Minority_{ijt} + r_{ijt}

θ_{j0} = γ_{00} + V_{j0}

θ_{j1} = γ_{10} + V_{j1}

Model III: Student characteristics & School intervention indicator

Y_{ijt} = β_{jt0} + β_{jt1} SPED_{ijt} + β_{jt2} LowSES_{ijt} + β_{jt3} LEP_{ijt} + β_{jt4} Girl_{ijt} + β_{jt5} Minority_{ijt} + r_{ijt}

θ_{j0} = γ_{00} + γ_{01} LAAMP_j + V_{j0}

θ_{j1} = γ_{10} + γ_{11} LAAMP_j + V_{j1}

Model IV: Full Model

Y_{ijt} = β_{jt0} + β_{jt1} SPED_{ijt} + β_{jt2} LowSES_{ijt} + β_{jt3} LEP_{ijt} + β_{jt4} Girl_{ijt} + β_{jt5} Minority_{ijt} + r_{ijt}

θ_{j0} = γ_{00} + γ_{01}(%Minority_j) + γ_{02}(%LowSES_j) + γ_{03}(%LEP_j) + γ_{04} LAAMP_j + V_{j0}

θ_{j1} = γ_{10} + γ_{11}(%Minority_j) + γ_{12}(%LowSES_j) + γ_{13}(%LEP_j) + γ_{14} LAAMP_j + V_{j1}

Comparison of Key Parameters: NCE vs. Scale Score

Question                                Parameter
1) School mean initial status           θ_{j0}
2) School improvement / growth rate     θ_{j1}
3) Type A effect                        γ_{11}(%Minority_j) + γ_{12}(%LowSES_j) + γ_{13}(%LEP_j) + γ_{14} LAAMP_j + V_{j1}
4) Type B effect                        γ_{14} LAAMP_j + V_{j1}

• Type A effect: Includes the effects of school policies and practice, educational context, and wider social influences

• Type B effect: Includes the effects of tractable policies and practices, but excludes factors that lie outside the control of the school

NCE Results vs. Scale Score Results

• School Ranking (rank-order corr.)

• School improvement / growth rate parameter (corr. between estimates)

• Effect size (par. est. / s.d. of outcome)

• Statistical significance of the effect of the school intervention indicator variable
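The effect size criterion in the list above (parameter estimate divided by the standard deviation of the outcome) puts NCE and scale score estimates on a common footing. The sketch below uses hypothetical estimates and standard deviations; taking the ratio as NCE over scale score is also an assumption.

```python
def effect_size(estimate: float, outcome_sd: float) -> float:
    """Standardize a parameter estimate by the SD of the outcome metric."""
    if outcome_sd <= 0:
        raise ValueError("outcome SD must be positive")
    return estimate / outcome_sd


# Hypothetical program effect estimates and outcome SDs in each metric.
nce_es = effect_size(estimate=0.8, outcome_sd=20.0)  # 0.04
ss_es = effect_size(estimate=1.5, outcome_sd=40.0)   # 0.0375

# Two simple ways to compare the metrics (direction of the ratio is an assumption).
es_ratio = nce_es / ss_es
es_difference = nce_es - ss_es
```

A ratio near 1 and a difference near 0 indicate that the two metrics lead to the same substantive conclusion about the size of the effect.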

Conclusion

• NCE vs. scale score – for the purpose of measuring school improvement under the multiple-cohort design:

• Differences are minimal in terms of:

school ranking

school improvement / growth rate

effect size

statistical significance of the effect of the school intervention indicator

Conclusion (cont’d)

• Results are consistent across sampling conditions, models, and content area

Demographic

Samples

Average # of Measures   % of      # of Students
Per Student             Sample    3 yrs    4 yrs    5 yrs    6 yrs    7 yrs
7                       100%      17,349   23,132   28,915   34,698   40,481
6                       85%       17,349   22,265   27,181   32,097   37,013
5                       70%       17,349   21,397   25,445   29,493   33,541
4                       55%       17,349   20,530   23,711   26,892   30,073

Reduction in Standard Error (SE) for Average Growth b/t Schools

Number of occasions: 3 4 5 5 6 6 7

Sample   Model
100%     Baseline Model    0.229    0.223    0.228    0.292
100%     Full Model        0.260    0.296    0.351    0.449
85%      Baseline Model   -0.010   -0.008    0.068
85%      Full Model        0.060    0.107    0.272
70%      Baseline Model    0.013    0.022    0.119
70%      Full Model        0.054    0.144    0.259


Figure: 100% Sample SE Reduction AVG GROWTH. X-axis: Number of Occasions. Y-axis: Proportion SE Reduction, 0 to 0.5. Series: Full Model, Baseline Model.

Figure: 85%-70%-55% Samples SE Reduction AVG GROWTH. X-axis: Number of Occasions. Y-axis: Proportion SE Reduction, -0.1 to 0.5. Series: 85% Full, 70% Full, 55% Full, 85% Baseline, 70% Baseline, 55% Baseline.

Reduction in Standard Error (SE) for Average Status b/t Schools

Sample   Model
100%     Baseline Model    0.042    0.042    0.039    0.051
100%     Full Model        0.046    0.048    0.042    0.054
85%      Baseline Model    0.000   -0.003    0.011
85%      Full Model        0.002   -0.003    0.010
70%      Baseline Model   -0.002   -0.004    0.012
70%      Full Model        0.000   -0.002    0.014
55%      Baseline Model   -0.002   -0.003    0.012
55%      Full Model        0.000    0.000    0.013

Figure: 100% Sample SE Reduction AVG STATUS. X-axis: Number of Occasions. Y-axis: Proportion SE Reduction, 0.030 to 0.060. Series: Full Model, Baseline Model.

Figure: 85%-70%-55% Samples SE Reduction AVG STATUS. X-axis: Number of Occasions. Y-axis: Proportion SE Reduction, -0.006 to 0.016. Series: 85% Full, 70% Full, 55% Full, 85% Baseline, 70% Baseline, 55% Baseline.

Reduction in Tau for Average Growth b/t Schools

Sample   Model
100%     Baseline Model    0.279    0.368    0.346    0.522
100%     Full Model        0.451    0.451    0.466    0.623




Figure: 100% Sample Tau Reduction AVG GROWTH. X-axis: Number of Occasions. Y-axis: Proportion Tau Reduction, 0.0 to 0.7. Series: Full Model, Baseline Model.

Figure: 85%-70%-55% Samples Tau Reduction AVG GROWTH. X-axis: Number of Occasions. Y-axis: Proportion Tau Reduction, 0.00 to 0.45.

Reduction in Tau for Average Status b/t Schools

Sample   Model
100%     Baseline Model    0.066    0.061    0.047    0.062
100%     Full Model        0.070    0.065    0.053    0.068
85%      Baseline Model   -0.006   -0.017    0.002
85%      Full Model       -0.005   -0.015    0.004
70%      Baseline Model   -0.002   -0.007    0.017
70%      Full Model       -0.001   -0.004    0.020
55%      Baseline Model    0.001   -0.005    0.019
55%      Full Model        0.002   -0.004    0.022

Figure: 100% Sample Tau Reduction AVG STATUS. X-axis: Number of Occasions. Y-axis: Proportion Tau Reduction, 0.03 to 0.075. Series: Full Model, Baseline Model.

Figure: 85%-70%-55% Samples Tau Reduction AVG STATUS. X-axis: Number of Occasions. Y-axis: Proportion Tau Reduction, -0.020 to 0.025.


Figure: Program Effect on AVG GROWTH b/t Schools by Sample and Number of Occasions. X-axis: Number of Occasions, 3 to 7. Y-axis: T-Value, -3 to 1.5. Series: 100% Sample, 85% Sample, 70% Sample, 55% Sample.

Chart 8: Effect of Additional Occasions on Between-School Achievement Growth Reliabilities. X-axis: Number of occasions, 1 to 9. Y-axis: Reliability, 0.40 to 0.70.
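To illustrate why reliability of growth estimates rises with additional occasions, here is a simplified sketch. It assumes equally spaced occasions and the usual reliability form (true slope variance over true plus sampling variance); the variance values are hypothetical and this is not the calculation behind Chart 8.

```python
def slope_sampling_variance(sigma2: float, n_occasions: int) -> float:
    """Sampling variance of an OLS slope over equally spaced occasions 0..T-1."""
    t_bar = (n_occasions - 1) / 2
    ss_time = sum((t - t_bar) ** 2 for t in range(n_occasions))
    return sigma2 / ss_time


def growth_reliability(tau: float, sigma2: float, n_occasions: int) -> float:
    """Reliability = true slope variance / (true slope variance + sampling variance)."""
    return tau / (tau + slope_sampling_variance(sigma2, n_occasions))


# Hypothetical variance components: tau = between-school slope variance,
# sigma2 = residual variance of a school's yearly means around its trend line.
for n_occasions in range(3, 8):
    rel = growth_reliability(tau=1.0, sigma2=4.0, n_occasions=n_occasions)
    print(n_occasions, round(rel, 3))
```

Because the sum of squared time deviations grows roughly with the cube of the number of occasions, each added occasion shrinks the sampling variance of the slope substantially, provided the added occasions are not offset by missing data.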

Chart 9: Relationship among Correlations of Empirical Bayes Estimates of School Performance (Correlations Among EB Estimates). Estimates compared: EBXS1, EBXS2, EBXS3, EBXS4, EBGRTH01, EBGRTH12, EBGRTH34, EBGRTH02, EBGRTH13, EBGRTH24, EBGRTH03, EBGRTH14, EBGRTH04. Correlations shown in bands of 0.25 from -0.75 to 1.00.
