Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach

Mark D. Reckase and Tianli Li
Michigan State University

The Problem

State curriculum frameworks often change from one grade to the next, reflecting the addition of new instructional content. For example, algebra may be introduced as an instructional goal at grade 7, whereas at grade 6 algebra is not an important component of the curriculum. Tests at the two grades reflect the instructional content, so the 6th grade test does not include algebra and the 7th grade test does.

How can the score scales of these tests be linked?

Research Questions

What do changes on the linked score scale mean, when the scale is produced using the usual unidimensional IRT models?

Can multidimensional IRT be used to form vertical scales? If so, how do the results compare to the unidimensional results?

The Approach

State testing data were analyzed using multidimensional IRT to develop a realistic model for the test data at two grade levels.

The results of the real data analyses were idealized to create the specifications for simulating the tests at two grade levels.

Data with known structure were simulated to determine how unidimensional and multidimensional procedures function.

The Simulated Data Design

Grade 6 – two major constructs: Arithmetic, Problem Solving

Grade 7 – three major constructs: Arithmetic, Problem Solving, Algebra

Simulated Test Structure

Test Level   Algebra   Arithmetic   Problem Solving   Total
Grade 6      0         17 (4)       23 (6)            40 (10)
Grade 7      11 (0)    11 (4)       18 (6)            40 (10)

Note: The numbers in parentheses are the common items between the two forms of the tests.

Mean Vectors at each Grade Level

Class Level   Algebra        Arithmetic   Problem Solving
Grade 6       -1.5 (-1.50)   .5 (.51)     -.2 (-.21)
Grade 7       0 (.03)        .7 (.73)     0 (.01)

Note: Values in parentheses are the observed means from the simulated data.

Covariance Matrices

Covariance Matrix for Grade 6

                  Algebra      Arithmetic   Problem Solving
Algebra           .25 (.25)    0 (.00)      0 (.00)
Arithmetic        0 (.00)      .8 (.84)     .7 (.76)
Problem Solving   0 (.00)      .7 (.76)     1.2 (1.29)

Covariance Matrix for Grade 7

                  Algebra      Arithmetic   Problem Solving
Algebra           1 (1.05)     .4 (.42)     .6 (.64)
Arithmetic        .4 (.42)     .6 (.60)     .3 (.32)
Problem Solving   .6 (.64)     .3 (.32)     1 (1.02)

Note: Values in parentheses are estimated from the simulated data.
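As a rough illustration of how ability vectors with the structure above could be generated, the following sketch draws multivariate normal θ vectors using a hand-written Cholesky factorization. It is not the authors' simulation code. The means follow one reading of the mean-vector table (grade 6: -1.5, .5, -.2; grade 7: 0, .7, 0, in the order algebra, arithmetic, problem solving), the sample size of 2000 per grade is the one reported later in the slides, and the function names and the random seed are assumptions made for the example.

```python
import math
import random

# Generating means and covariances as read from the tables above
# (order: algebra, arithmetic, problem solving).
MEANS = {6: [-1.5, 0.5, -0.2], 7: [0.0, 0.7, 0.0]}
COVS = {
    6: [[0.25, 0.0, 0.0], [0.0, 0.8, 0.7], [0.0, 0.7, 1.2]],
    7: [[1.0, 0.4, 0.6], [0.4, 0.6, 0.3], [0.6, 0.3, 1.0]],
}


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def simulate_thetas(grade, n_students, rng):
    """Draw n_students ability vectors from the grade's multivariate normal."""
    mean, low = MEANS[grade], cholesky(COVS[grade])
    thetas = []
    for _ in range(n_students):
        z = [rng.gauss(0.0, 1.0) for _ in mean]  # independent standard normals
        thetas.append([m + sum(low[i][k] * z[k] for k in range(i + 1))
                       for i, m in enumerate(mean)])
    return thetas


rng = random.Random(2024)  # arbitrary seed, chosen only for reproducibility
thetas6 = simulate_thetas(6, 2000, rng)
thetas7 = simulate_thetas(7, 2000, rng)
```

Sample means and covariances computed from `thetas6` and `thetas7` should land close to the generating values, in the same way the parenthesized values in the tables sit close to the specified ones.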

Orientation of Items

[Figure: plot of the item orientations; axis labels not recoverable]

Effect Size Built into Data

Algebra   Arithmetic   Problem Solving
1.9       .26          .21

Unidimensional Basis for Comparison

Imagine that the full set of 70 items from both test levels is administered to the students at both grade levels.

The matrix of 2000 + 2000 students from the two grades by 70 items can be analyzed with the unidimensional models to serve as a basis for comparison for the vertical scaling result.

The matrix was analyzed using the 2PL and Rasch models.

2PL Solution

[Figure: plot of the 2PL solution; axis labels not recoverable]

Rasch Model Solution

[Figure: plot of the Rasch model solution; axis labels not recoverable]

Vertical Scaling Analysis

Common-item concurrent calibration with BILOG-MG

Off-grade items coded as not reached

Both the 2PL and Rasch models used for analysis

Determine the effect size of the difference in means between the two grade levels
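The data layout implied by the steps above can be sketched as follows. Each student's 40 form responses are spread over the 70 distinct items, and the off-grade items receive a "not reached" placeholder. The column ordering (common items first, then grade 6 only, then grade 7 only), the helper names, and the use of `None` as the placeholder are assumptions for illustration; the actual coding depends on how the calibration software is configured.

```python
N_COMMON, N_UNIQUE = 10, 30          # per the test-structure table: 40 items per form, 10 shared
N_ITEMS = N_COMMON + 2 * N_UNIQUE    # 70 distinct items across the two forms
NOT_REACHED = None                   # placeholder code; the real code depends on the software setup


def administered_items(grade):
    """Column indices a student in `grade` actually sees (hypothetical ordering)."""
    common = list(range(N_COMMON))
    start = N_COMMON if grade == 6 else N_COMMON + N_UNIQUE
    return common + list(range(start, start + N_UNIQUE))


def concurrent_row(grade, form_responses):
    """Spread one student's 40 form responses over the 70-item concurrent layout."""
    cols = administered_items(grade)
    if len(form_responses) != len(cols):
        raise ValueError("expected one response per administered item")
    row = [NOT_REACHED] * N_ITEMS
    for col, resp in zip(cols, form_responses):
        row[col] = resp
    return row


row6 = concurrent_row(6, [1] * 40)   # a grade 6 student answering every item correctly
row7 = concurrent_row(7, [0] * 40)   # a grade 7 student answering every item incorrectly
```

Stacking such rows for all 4000 students gives a 4000 by 70 matrix in which only the 10 common items are observed at both grades, which is what ties the two grade-level scales together in a concurrent calibration.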

Vertically Scaled Effect Sizes

                    2PL Model     Rasch Model   2PL Model     Rasch Model
                    70 Items      70 Items      Concurrent    Concurrent
Mean (SD) Grade 6   -.54 (.78)    -.42 (.93)    -.22 (1.16)   -.14 (1.06)
Mean (SD) Grade 7   .56 (1.13)    .45 (1.15)    .26 (1.20)    .21 (1.38)
Effect Size         1.13          .83           .41           .28
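The slides do not state how the effect size was computed. A common definition, assumed here, divides the difference in grade means by the square root of the average of the two grade variances. Under that assumption, the short sketch below reproduces the effect-size row of the table to two decimals from the reported means and standard deviations. The function name and dictionary labels are hypothetical.

```python
import math


def effect_size(mean_lo, sd_lo, mean_hi, sd_hi):
    """Standardized mean difference using the root of the average variance."""
    pooled_sd = math.sqrt((sd_lo ** 2 + sd_hi ** 2) / 2)
    return (mean_hi - mean_lo) / pooled_sd


# (grade 6 mean, grade 6 SD, grade 7 mean, grade 7 SD) from the table above
table = {
    "2PL, 70 items": (-0.54, 0.78, 0.56, 1.13),
    "Rasch, 70 items": (-0.42, 0.93, 0.45, 1.15),
    "2PL, concurrent": (-0.22, 1.16, 0.26, 1.20),
    "Rasch, concurrent": (-0.14, 1.06, 0.21, 1.38),
}
for label, stats in table.items():
    print(f"{label}: {effect_size(*stats):.2f}")
# prints 1.13, 0.83, 0.41, 0.28 for the four columns
```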

Vertically Scaled Effect Sizes

Linked effect size is smaller than full data effect size.

Rasch effect size is less than 2PL effect size.

Full data set effect size is less than modeled effect size.

Alternative Linking Method

Common-item, separate calibration

The relationship between the common-item parameter estimates was poor.

[Figure: scatterplot of common-item b-parameters; axes: b-parameters Grade x and b-parameters Grade x + 1]

MIRT Analysis

Full data analysis with TESTFACT

Three-dimensional analysis

Determine effect size for each dimension

Correlate each estimated θ with the generating θs to determine the meaning of the results.
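The last step above can be sketched as a cross-correlation matrix with one row per generating dimension and one column per estimated dimension. This is a minimal illustration on made-up toy data, not the analysis reported in these slides, and the helper names are assumptions. A correlation near -1 in such a matrix is the signature of an estimated axis whose direction is reversed relative to a generating dimension.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlations(true_thetas, est_thetas):
    """Rows: generating dimensions; columns: estimated dimensions."""
    true_cols = list(zip(*true_thetas))
    est_cols = list(zip(*est_thetas))
    return [[pearson(t, e) for e in est_cols] for t in true_cols]


# Toy data (made up): the estimated axes swap the two true axes,
# and the second estimated axis also has its sign reversed.
true_thetas = [[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.5]]
est_thetas = [[t[1], -t[0]] for t in true_thetas]
r = cross_correlations(true_thetas, est_thetas)
```

In this toy case the first true dimension correlates -1 with the second estimated dimension, and the second true dimension correlates +1 with the first estimated dimension.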

MIRT Effect Sizes

                  θ1           θ2           θ3
Mean (SD) Total   .01 (.95)    -.01 (.90)   .05 (.72)
Mean (SD) 6       -.57 (.54)   .16 (.99)    .03 (.74)
Mean (SD) 7       .60 (.90)    -.19 (.77)   .06 (.69)
Effect Size       1.56         -.40         .05

Correlation between True and Estimated Values

          Est θ1   Est θ2   Est θ3
True θ1   .92      -.08     .02
True θ2   .47      .50      -.18
True θ3   .46      .80      -.03

Interpretation of MIRT Solution

Results are difficult to interpret because of the default procedures in TESTFACT.

Solution needs to be rotated to have axes align with content dimensions.

The current solution shows that θ1 is related to algebra and shows the big algebra effect.

θ2 is a combination of arithmetic and problem solving, with the emphasis on problem solving. Most likely it has the sign of the a-parameters reversed.

Concurrent MIRT Analysis

Use concurrent calibration of data from the two grade levels.

Three-dimensional solution

No rotation

Determine effect sizes and correlations with true values.

Concurrent MIRT Calibration

                  θ1           θ2           θ3
Mean (SD) Total   .06 (.75)    -.09 (.57)   -.38 (1.01)
Mean (SD) 6       -.02 (.87)   -.29 (.56)   .18 (.64)
Mean (SD) 7       .14 (.59)    .10 (.50)    -.94 (.99)
Effect Size       .22          .74          -1.34


          Est θ1   Est θ2   Est θ3
True θ1   .16      .57      -.87
True θ2   .54      .02      -.40
True θ3   .77      -.05     -.43


Scale on Dimension 3 is reversed and it has a large effect size (algebra).

Dimension 1 is most related to arithmetic and problem solving with a moderate effect size.

Dimension 2 is moderately related to algebra and has a large effect size.

The overall result gives a reasonable estimate of effects, but the dimensions need to be rotated to match the constructs.

Conclusions

Unidimensional linking of the two test levels underestimates the effect size.

The Rasch model gives a smaller effect size than the two-parameter logistic model.

The MIRT solution shows promise. It remains to be determined how to rotate the solution to match the constructs.

TESTFACT has problems converging on estimates because of a mismatch between assumptions and reality.
