University of Ostrava Czech republic 26-31, March, 2012.
![Page 1: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/1.jpg)
University of Ostrava, Czech Republic
26-31 March 2012
![Page 2: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/2.jpg)
Different forms of a test
Item banking
Achievement monitoring
![Page 3: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/3.jpg)
Classical Test Theory
• Equating is applied only to different test forms
• Equating is often ignored (the concept of parallel test forms)
• Establishes equivalent scores on different test forms
• Doesn’t create a common scale
Item Response Theory
• Satisfies all equating needs
• Places all estimates of item and examinee parameters on a common scale
![Page 4: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/4.jpg)
Test equating is a procedure that establishes the relation between examinee scores on different test forms and places them on the same scale.
As a result, a measure based on responses to one test can be matched to a measure based on responses to another test, and the conclusions drawn about an examinee are identical regardless of the test form that produced the measure.
Equating of different test forms of the same difficulty level is called horizontal equating.
![Page 5: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/5.jpg)
The purpose: comparison of student achievement at different grade levels
Test forms are designed to be of different difficulty
Measures from the different tests should be placed on the same linear continuum
Equating test forms of this kind is called vertical equating.
![Page 6: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/6.jpg)
• Item bank – a set of items from which test forms that create equivalent measures may be constructed.
• Item bank is composed of a set of test items that have been placed onto a common scale, so that different subsets of these items produce interchangeable measures for an examinee.
• Once an item bank exists, no further equating is needed.
![Page 7: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/7.jpg)
Test equating and item banking are both designed to place estimated parameters on a common scale
In test equating the goal is to place person measures from multiple test forms on the same scale
In item banking the goal is to place item calibrations on the same scale
The procedures are nearly identical when Rasch measurement is used
![Page 8: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/8.jpg)
Equating – procedure that ensures the examinee measures obtained from different subsets of items are interchangeable. When two tests are equated, the resulting measures are placed onto the same scale.
Scaling – procedure that associates numbers with the performance of examinees. Tests can be scaled identically without having been equated.
![Page 9: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/9.jpg)
Equating in Classical Test Theory is applied only to compare examinee test scores on two different test forms
The problem can be ignored (by introducing “parallel” test forms)
It only establishes a relation between test scores on different test forms
It doesn’t imply creation of a common scale
![Page 10: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/10.jpg)
Linear equating
Equipercentile equating
![Page 11: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/11.jpg)
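It is based on equating the standard score on test X to the standard score on test Y:

$$\frac{x - \mu_x}{\sigma_x} = \frac{y - \mu_y}{\sigma_y}$$

Thus,

$$y = Ax + B, \quad \text{where} \quad A = \frac{\sigma_y}{\sigma_x}, \quad B = \mu_y - \frac{\sigma_y}{\sigma_x}\,\mu_x$$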
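A minimal sketch of linear equating, assuming the means and standard deviations are taken over the observed score lists (population standard deviation). The function name `linear_equating` and the score values are hypothetical and serve only as an illustration.

```python
from statistics import mean, pstdev


def linear_equating(x_scores, y_scores):
    """Return (A, B) such that y = A*x + B equates standard scores on X and Y."""
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)
    A = sd_y / sd_x          # ratio of standard deviations
    B = mu_y - A * mu_x      # aligns the two means
    return A, B


# Hypothetical raw scores on test X and test Y
x_scores = [10, 12, 14, 16, 18]
y_scores = [20, 25, 30, 35, 40]

A, B = linear_equating(x_scores, y_scores)
y_equivalent = A * 14 + B    # score on Y equivalent to x = 14
```

With these made-up scores a score at the mean of X maps to the mean of Y, which is what equal standard scores require.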
![Page 12: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/12.jpg)
Scores on tests X and Y are considered to be equivalent if their respective percentile ranks in any given group are equal.
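A simplified sketch of equipercentile equating. It assumes the mid-percentile rank definition (percentage of scores below plus half the percentage equal) and picks the observed Y score with the closest percentile rank; operational procedures usually add interpolation and smoothing, which are omitted here. The function names and data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, value):
    """Mid-percentile rank: % of scores below value plus half the % equal to it."""
    below = bisect_left(sorted_scores, value)
    equal = bisect_right(sorted_scores, value) - below
    return 100.0 * (below + 0.5 * equal) / len(sorted_scores)


def equipercentile_equate(x, x_scores, y_scores):
    """Return the observed Y score whose percentile rank is closest to that of x."""
    xs, ys = sorted(x_scores), sorted(y_scores)
    target = percentile_rank(xs, x)
    candidates = sorted(set(ys))
    return min(candidates, key=lambda y: abs(percentile_rank(ys, y) - target))


# Hypothetical score distributions in one group of examinees
x_scores = [1, 2, 3, 4, 5]
y_scores = [10, 20, 30, 40, 50]
y_for_3 = equipercentile_equate(3, x_scores, y_scores)
```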
![Page 13: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/13.jpg)
Both methods require assumptions about the identity of the test score distributions and the equivalence of the examinee groups
Equating in CTT doesn’t imply creation of a common scale
![Page 14: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/14.jpg)
Measuring the same trait – tests of different content cannot be equated (but can be scaled in a similar manner).
Invariance of the equating results across samples of examinees
Independence of the equating results from the choice of the reference test
![Page 15: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/15.jpg)
• Method of common items: linkage between two test forms is accomplished by means of a set of items that are common to both test forms
• Method of common persons: linkage between two test forms is accomplished by means of a set of persons who respond to both test forms
• Combined methods: linkage between two test forms is accomplished by means of common items and/or common persons plus common raters
![Page 16: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/16.jpg)
Internal anchor: each test form has one set of items that is shared with the other forms and another set of items that is unique to this form
![Page 17: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/17.jpg)
External anchor: each test form is accompanied by an additional set of items that are not part of the test forms themselves
![Page 18: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/18.jpg)
All examinees respond to both test forms.
There are two approaches to this design:
- same group/ same time
- same group/ different time
![Page 19: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/19.jpg)
Linkage between two test forms is accomplished by means of a set of examinees who respond to all items.
![Page 20: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/20.jpg)
• Selecting an equating method
• Parameter estimation
• Transformation of parameters from different test forms to the same scale
• Evaluating the quality of the links between test forms
![Page 21: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/21.jpg)
Simultaneous calibration: all parameters are estimated simultaneously in one run of the estimation software. Data are automatically scaled to the same scale.
Separate calibration: parameters are estimated for each test form separately. That is, the data are calibrated in multiple runs of the estimation software.
Separate calibration may be more difficult to accomplish because the test developer needs to transform measures to a common scale
![Page 22: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/22.jpg)
• Separate calibration of all test forms, followed by transformation of the measures to the common scale
• Simultaneous calibration of all test forms, which places all measures on the common scale
• Separate calibration of all test forms with the difficulty values of the common items anchored, which places all parameters on the common scale one form after another
![Page 23: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/23.jpg)
As a rule, this procedure is used with the method of common items, which are called nodal items in this case
Each test form is calibrated separately. As a result, the estimates for each test form lie on that form's own scale. The scales differ only in their origins
This difference can be removed by calculating a location shift
It is desirable to have no fewer than 15-20 % nodal items (some of them may be deleted from the link later).
![Page 24: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/24.jpg)
• Choice of a common scale
• Selection of nodal items
• Calibration of all test forms
• Calculating equating constants
• Link quality evaluation
• Transforming all parameters onto a common scale
![Page 25: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/25.jpg)
$$t_{12} = \frac{1}{l}\sum_{i=1}^{l}\left(\delta_{i2} - \delta_{i1}\right)$$

where t12 – shift constant from test form 1 to test form 2; δi1 – difficulty estimate of item i in test form 1; δi2 – difficulty estimate of item i in test form 2; l – the number of common items.
Sometimes other formulas are applied: weighted mean, dispersion shift, etc.
![Page 26: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/26.jpg)
δi1' = δi1 + t12,
where δi1 – difficulty estimate for item i in test form 1; δi1' – difficulty estimate for the same item on the scale of test form 2; i = 1,…,k; k – the total number of test items;
θn1' = θn1 + t12,
where θn1 – ability estimate for examinee n who responded to the items of test form 1; θn1' – ability estimate for the same examinee on the scale of test form 2; n = 1,…,N; N – the total number of examinees who responded to the items of test form 1.
Parameter estimates of test form 1 shifted in this way are placed on the scale of test form 2.
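The shift can be sketched in a few lines of Python. This is an illustration only: it assumes the shift constant is the mean of δi2 − δi1 over the common items, which is the sign convention implied by δi1' = δi1 + t12. The function names, item labels, and all numeric values are hypothetical.

```python
def shift_constant(delta_form1, delta_form2, common_items):
    """Mean difference (form 2 minus form 1) of common-item difficulties."""
    diffs = [delta_form2[i] - delta_form1[i] for i in common_items]
    return sum(diffs) / len(diffs)


def shift_to_form2(estimates_form1, t12):
    """Move form 1 estimates (item difficulties or abilities) to the form 2 scale."""
    return {key: value + t12 for key, value in estimates_form1.items()}


# Hypothetical separate calibrations; items 3 and 4 are common to both forms
delta_form1 = {1: -0.5, 2: 0.0, 3: 0.5, 4: 1.0}
delta_form2 = {3: 0.9, 4: 1.2, 5: 0.1}
theta_form1 = {"examinee_a": -0.2, "examinee_b": 1.1}

t12 = shift_constant(delta_form1, delta_form2, common_items=[3, 4])
delta_form1_on_2 = shift_to_form2(delta_form1, t12)
theta_form1_on_2 = shift_to_form2(theta_form1, t12)
```

The same constant is added to item difficulties and to examinee abilities, because only the origin of the scale changes.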
![Page 27: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/27.jpg)
Item-within-link (fit analysis of linking items);
Item-between-link (stability of the item calibrations between two test forms)
![Page 28: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/28.jpg)
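$$U_i = \frac{\delta_{i1}' - \delta_{i2}}{\sigma_{i12}}, \qquad \sigma_{i12}^2 = \sigma_{i1}^2 + \sigma_{i2}^2$$

where σi1, σi2 – standard errors of the difficulty estimates of item i in the calibrations of test forms 1 and 2;
δi1' – difficulty estimate for item i from test form 1, shifted to the scale of test form 2; δi2 – difficulty estimate for the same item in test form 2.
For an item whose calibration is stable between the two forms, Ui ~ N(0,1) approximately.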
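A hedged sketch of the item-between-link check. It assumes the statistic compares the two estimates of the same item once both are on one scale, divided by the combined standard error; the critical value of 2.0 (roughly a 5 % two-sided level for N(0,1)) and the function names are assumptions, not taken from the slides.

```python
from math import sqrt


def between_link_u(delta_shifted, delta_other, se1, se2):
    """Standardized difference between two calibrations of one common item."""
    sigma_12 = sqrt(se1 ** 2 + se2 ** 2)   # combined standard error
    return (delta_shifted - delta_other) / sigma_12


def flag_unstable(u_values, critical=2.0):
    """Return the items whose |U| exceeds the (assumed) critical value."""
    return [item for item, u in u_values.items() if abs(u) > critical]


# Hypothetical values for one item: both estimates already on the same scale
u_example = between_link_u(0.5, 0.2, se1=0.3, se2=0.4)
flagged = flag_unstable({"item_a": u_example, "item_b": -2.4})
```

Flagged items are candidates for removal from the link, after which the shift constant would be recomputed from the remaining common items.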
![Page 29: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/29.jpg)
All parameters of all test forms are estimated simultaneously
This is the simplest approach to equating test forms or calibrating an item bank, because it requires no subsequent transformation of the estimated measures or calibrations. The data are automatically placed on the same scale in one run of the estimation software
![Page 30: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/30.jpg)
![Page 31: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/31.jpg)
As a rule, this procedure is used with the method of common items, which are called anchor items in this case
Common items are estimated once, during calibration of the first test form
During calibration of another test form, the calibration values of these items are treated as fixed (known) and are not estimated. As a result, the remaining parameter estimates are forced onto the same scale as the anchor items
Most estimation software makes it easy to anchor items
![Page 32: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/32.jpg)
IAFILE=*
2 -0.29
4 -1.06
8 -0.49
11 -0.04
17 -0.28
37 -2.20
38 -1.34
*
The numbers of the anchor items and their difficulties are specified. These difficulty values are fixed and are not estimated during calibration of the new test form
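When there are many anchor items, the block can be generated from the first calibration instead of being typed by hand. The helper below is a hypothetical convenience that only reproduces the text layout shown above (one "item number, difficulty" pair per line between `IAFILE=*` and `*`); it does not call any estimation software.

```python
def iafile_block(anchors):
    """Format {item_number: difficulty} as an IAFILE=* ... * block of text."""
    lines = ["IAFILE=*"]
    for item in sorted(anchors):
        lines.append(f"{item} {anchors[item]:.2f}")   # two decimals, as above
    lines.append("*")
    return "\n".join(lines)


# First two anchor items from the listing above
block = iafile_block({4: -1.06, 2: -0.29})
print(block)
```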
![Page 33: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/33.jpg)
• Choice of a common scale
• Selection of anchor items
• Calibration of the test form whose scale is accepted as the common scale
• Sequential calibration of the other test forms with the difficulty values of the anchor items fixed
• Item-within-link fit (fit analysis of linking items)
![Page 34: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/34.jpg)
If different equating procedures are used, the resulting scales will differ and cannot be compared directly. This is because the procedures select the origin of the scale in different ways.
There are papers (for example, Smith R.M. «Applications of Rasch Measurement». Chicago: Mesa Press, 1992) in which all three procedures are analyzed. The precision of the estimated examinee and item parameters is approximately the same, and the correlation between the measures is high.
![Page 35: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/35.jpg)
• Each test form has 26 dichotomous items
• Both test forms have 6 common items: № 4, 6, 7, 14, 20, 24 (23 % of the total number of items)
• The total number of examinees is 654 for test form 1 and 661 for test form 2
• Winsteps software was used for test calibration
• Means of examinee measures are -1.07 and -0.72 logits for test forms 1 and 2, respectively
• The first test form scale was chosen as the common scale
![Page 36: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/36.jpg)
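| Item number | Test form 1: difficulty estimate δi | Test form 1: standard error σi | Test form 2: difficulty estimate δi | Test form 2: standard error σi | Test form 2: shifted difficulty estimate δi' | ui |
|---|---|---|---|---|---|---|
| 4 | -1.39 | 0.09 | -1.07 | 0.09 | -1.368 | -0.17 |
| 6 | -0.93 | 0.1 | -0.54 | 0.09 | -0.838 | 0.69 |
| 7 | -2.57 | 0.1 | -1.99 | 0.1 | -2.288 | 2.01 |
| 14 | -0.44 | 0.1 | -0.32 | 0.09 | -0.618 | -1.33 |
| 20 | 0.88 | 0.12 | 0.96 | 0.11 | 0.662 | -1.34 |
| Sum | -4.45 | | -2.96 | | -4.45 | |
| Mean | -0.89 | | -0.592 | | -0.89 | |

Because the first test form scale is the common scale, the test form 2 estimates are shifted. Shift constant from test form 2 to test form 1: t21 = -0.89 - (-0.592) = -0.298.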
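The shift in this example can be checked directly from the tabulated common-item difficulties. The sketch below uses only the values in the table; the variable names are illustrative, and the constant is computed as the form 1 mean minus the form 2 mean because the form 2 estimates are the ones moved to the form 1 scale.

```python
# (difficulty in test form 1, difficulty in test form 2) for the tabulated common items
common = {
    4: (-1.39, -1.07),
    6: (-0.93, -0.54),
    7: (-2.57, -1.99),
    14: (-0.44, -0.32),
    20: (0.88, 0.96),
}

mean_form1 = sum(d1 for d1, _ in common.values()) / len(common)
mean_form2 = sum(d2 for _, d2 in common.values()) / len(common)

# Constant that moves form 2 estimates onto the form 1 scale
shift_2_to_1 = mean_form1 - mean_form2

# Form 2 difficulties after the shift, rounded as in the table
shifted_form2 = {item: round(d2 + shift_2_to_1, 3) for item, (_, d2) in common.items()}
```

After the shift the mean of the form 2 common-item difficulties coincides with the form 1 mean, which is the defining property of the location shift.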
![Page 37: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/37.jpg)
Simultaneous calibration requires a common response matrix for both test forms, containing 1315 examinees and 46 different items.
The measures of all examinees and the difficulty values of all items are placed on a common scale centered at the mean difficulty of all 46 items
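A minimal sketch of how such a common response matrix could be assembled. It assumes responses are stored as one row per examinee, and it marks items an examinee did not take with `None`; how missing responses must be coded for a particular estimation program is software-specific and not covered here. The function name, item labels, and toy data are hypothetical.

```python
def stack_forms(form1_rows, form2_rows, items1, items2):
    """Combine two response matrices into one, with None for items not administered."""
    all_items = list(dict.fromkeys(items1 + items2))      # union, order preserved
    column = {item: k for k, item in enumerate(all_items)}
    matrix = []
    for rows, items in ((form1_rows, items1), (form2_rows, items2)):
        for row in rows:
            full_row = [None] * len(all_items)
            for item, response in zip(items, row):
                full_row[column[item]] = response
            matrix.append(full_row)
    return all_items, matrix


# Toy forms: C1 and C2 are common items, A1 and B1 are unique to one form
items1 = ["A1", "C1", "C2"]
items2 = ["C1", "C2", "B1"]
form1_rows = [[1, 0, 1], [0, 0, 1]]
form2_rows = [[1, 1, 0]]

all_items, matrix = stack_forms(form1_rows, form2_rows, items1, items2)
```

For the example on this slide the same construction would give 654 + 661 = 1315 rows and 26 + 26 - 6 = 46 columns.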
![Page 38: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/38.jpg)
• Calibration of test form 1
• Calibration of test form 2 with the difficulty values of the anchor items fixed at their values from the first calibration:
IAFILE=*
4 -1.39
6 -0.93
7 -2.57
14 -0.44
20 0.88
*
• As a result, examinee measures from both test forms will be on the first test form scale
![Page 39: University of Ostrava Czech republic 26-31, March, 2012.](https://reader035.fdocuments.us/reader035/viewer/2022062519/5697bf951a28abf838c91032/html5/thumbnails/39.jpg)
Comparison of examinee measures from the three equating procedures showed approximately the same results: the correlation is close to 1
The choice of equating procedure is determined by the design of the real data and the purpose of the research