Newton-Raphson Estimate Procedure in Test Assembly
ESTIMATION TO EXAMINEE PARAMETER USING NEWTON-RAPHSON METHOD: AN APPLICATION FOR LANGUAGE TESTING
Widiatmoko
E.: [email protected]
Center for Languages Teacher Training and Development, MOE
Presented at the Annual Conference on Linguistics
Atma Jaya Catholic University Jakarta
15-16 February 2006
Why?
[Diagram: Curriculum/Syllabus, Learning Process, Evaluation; Materials; Method, Approach, Technique; Evaluation]
Steps in the Construction of Calibrated Items
► Outlining Topics
► Domain Specification
► Constructing Items
► Reviewing and Revising Items
► Items Try-out
► Selecting Items
► Good Items
► Items Calibration (Estimation)
► Item Banking
Test & Items
Test:
► questions to measure an examinee’s trait in a situation
► the examiner focuses on what the examinees are like in their norm groups
► when the test is easy, the examinees appear to have higher ability, and vice versa (Hambleton, Swaminathan, & Rogers, 1991; Naga, 1992)
Items:
► their statistics are subject to change, or inconsistent, depending on the traits of the examinee groups
► designed based on the aforementioned judgment
► difficulty (i.e., proportion of examinees passing the item) and discrimination (i.e., item-total test biserial or point biserial) are group-dependent, implying that the values of these statistics depend on the examinee group in which they are acquired (Magnusson, 1967; Hambleton, 1989)
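The group dependence of these classical statistics can be illustrated with a minimal sketch. The two response matrices are made up for illustration (they are not the study's data), and `classical_item_stats` is a hypothetical helper name.

```python
import numpy as np

def classical_item_stats(responses):
    """Return (difficulty, point_biserial) for a 0/1 response matrix.

    responses: shape (n_examinees, n_items), 1 = item passed.
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)            # total test score per examinee
    difficulty = responses.mean(axis=0)      # proportion passing each item
    # Item-total correlation; the item itself is included in the total
    # (an uncorrected point biserial).
    point_biserial = np.array([
        np.corrcoef(responses[:, j], total)[0, 1]
        for j in range(responses.shape[1])
    ])
    return difficulty, point_biserial

# Hypothetical data: the same three items answered by two different groups.
strong_group = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
weak_group = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1]]

p_strong, rpb_strong = classical_item_stats(strong_group)
p_weak, rpb_weak = classical_item_stats(weak_group)
print("difficulty in strong group:", p_strong)
print("difficulty in weak group:  ", p_weak)
```

The same items receive different difficulty values in the two groups, which is the dependence that motivates the move to IRT.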
IRT
► IRT is used by nearly all of the largest test publishers, many state departments of education, and industrial and professional organizations (Hambleton & Murray, 1983; Hambleton, 1989)
IRT: Local independence
► item scores within a homogeneous subpopulation of examinees are independent (Naga, 1992 cited in Widiatmoko, 2005)
► responses to any two items are uncorrelated in a homogeneous subpopulation with a particular level of θ (Hulin, Drasgow, & Parsons, 1983)
► within any group of examinees all characterized by the same values θ1, θ2, ..., θk, the (conditional) distributions of the item scores are all independent of each other (Lord & Novick, 1968) and (McDonald, 1999)
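For the unidimensional case, local independence is usually written as a factorization of the joint response probability. The notation is an assumption added here (it does not appear on the slides): U_i is the scored response to item i, u_i its observed value, and n the number of items.

```latex
P(U_1 = u_1, \ldots, U_n = u_n \mid \theta)
  = \prod_{i=1}^{n} P(U_i = u_i \mid \theta)
```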
IRT: Parameter invariance
► parameters characterising an item do not depend on the trait distribution of the examinees, and the parameter characterising an examinee does not depend on the set of test items (Hambleton, Swaminathan, & Rogers, 1991)
IRT: Unidimensionality
► presence of a dominant component or factor influencing test performance (Hambleton, Swaminathan, & Rogers, 1991)
► each item measures one trait or characteristic of the examinees (Traub, 1983; Naga, 1992)
Language tests
► designed using the concept of IRT in discrete-points test paradigm (Weir, 1990)
► discrete-points test focusing on one point of grammar at a time; at only one element of a particular component of a grammar; only one skill at a time and one aspect of a skill (Oller, 1979)
Question formulated
► Does the test characteristic curve generated by the Newton-Raphson method satisfy the one-parameter logistic model?
Theory: Parameter estimation
► determining the value of an examinee’s trait with adequate precision and classifying an examinee into trait categories with small probabilities of misclassification (Lord & Novick, 1968)
► incorporates item parameter and examinee parameter
► basic consideration: parameter estimates are chosen by selecting the values that make an observed data set appear most likely in light of a particular model (Hulin, Drasgow, & Parsons, 1983)
► item parameter: estimation of item difficulty and discrimination
► examinee parameter: estimation of the examinee’s trait
► concerns item banking
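The idea of choosing the value that makes the observed data most likely can be sketched with a brute-force grid search. This assumes a 1PL response function with known item difficulties. The response pattern, the difficulties, and the grid are hypothetical values chosen for illustration.

```python
import numpy as np

def log_likelihood(theta, responses, difficulties):
    """Log-likelihood of a 0/1 response pattern under an assumed 1PL model."""
    u = np.asarray(responses, dtype=float)
    b = np.asarray(difficulties, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(theta - b)))   # probability of a correct response
    return float(np.sum(u * np.log(p) + (1.0 - u) * np.log(1.0 - p)))

# Hypothetical examinee: passes the two easier items, fails the hardest one.
responses = [1, 1, 0]
difficulties = [-1.0, 0.0, 1.0]

# Evaluate the likelihood on a grid of candidate trait values and keep the best.
grid = np.linspace(-4.0, 4.0, 801)
values = [log_likelihood(t, responses, difficulties) for t in grid]
theta_hat = float(grid[int(np.argmax(values))])
print("grid-search estimate of theta:", theta_hat)
```

A grid search is slow and only as precise as the grid spacing, which is one reason an iterative method such as Newton-Raphson is used in practice.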
IRT models
► 1PL, 2PL, 3PL, and 4PL models
► systematic procedure for considering and quantifying the probability or improbability of individual item and examinee’s response patterns in a set of test data (Henning, 1987)
► appropriate for dichotomous data
► distinction among the models: the numbers of parameters
Model parameters
► 1st: scale of examinee’s trait and item difficulty
► 2nd: continuous estimate of discriminability
► 3rd: index of pseudo chance-level (guessing)
► 4th: index of carelessness by the high achiever (Hambleton, 1989)
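How the four parameters nest can be sketched in one function. The symbols `a`, `b`, `c`, and `d` follow common IRT usage and are assumptions here, since the slides do not give the formula. The scaling constant 1.7 that some texts include is left out.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0, d=1.0):
    """Probability of a correct response under an assumed 4PL logistic model.

    theta: examinee trait        b: item difficulty (1st parameter)
    a: discrimination (2nd)      c: pseudo chance level (3rd)
    d: upper asymptote, below 1 when high achievers are careless (4th)
    The defaults a=1, c=0, d=1 reduce the model to the 1PL form.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose trait equals the item difficulty:
print(irt_probability(0.0, 0.0))                 # 1PL
print(irt_probability(0.0, 0.0, a=2.0))          # 2PL
print(irt_probability(0.0, 0.0, c=0.2))          # 3PL
print(irt_probability(0.0, 0.0, c=0.2, d=0.9))   # 4PL
```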
1PL model
► widely used
► probabilistic, where the examinees and items are not only graded for trait and difficulty, but also judged according to the probability or likelihood of their response patterns given the observed examinee’s trait and item difficulty (Henning, 1987)
► assumption: all items are equally discriminating
► application is for relatively easy tests
► much smaller sample sizes are required if the main purpose is to estimate θ (Hulin et al. cited in Crocker & Algina, 1986)
N-R Methods
► finds zeros of the derivative of the function being maximized (Krass, 2005)
► obtains results where the drift of parameter estimates is arrested and the parameters are estimated more accurately than with the joint maximum likelihood procedure (Swaminathan, Hambleton, Sireci, Xing, & Rizavi, 2003)
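A minimal sketch of Newton-Raphson estimation of one examinee's trait, assuming a 1PL model with known item difficulties. It is not the procedure used in the study. The starting value, tolerance, and example data are hypothetical. Each update divides the first derivative of the log-likelihood by the negative of its second derivative.

```python
import math

def p_1pl(theta, b):
    """Assumed 1PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, theta0=0.0, tol=1e-4, max_iter=50):
    """Return (theta_hat, iterations) for one 0/1 response pattern."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        # The likelihood has no finite maximum for these patterns.
        raise ValueError("all-correct or all-incorrect pattern")
    theta = theta0
    for iteration in range(1, max_iter + 1):
        probs = [p_1pl(theta, b) for b in difficulties]
        gradient = sum(u - p for u, p in zip(responses, probs))   # dlogL/dtheta
        information = sum(p * (1.0 - p) for p in probs)           # -d2logL/dtheta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            return theta, iteration
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical examinee and item difficulties.
difficulties = [-1.0, 0.0, 1.0]
theta_hat, n_iter = estimate_theta([1, 1, 0], difficulties)
print(theta_hat, n_iter)
```

The `ValueError` branch reflects a known limitation of maximum likelihood: examinees with all-correct or all-incorrect patterns have no finite estimate, so they have to be excluded or handled separately.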
Methodology
► a survey
► purposive random sampling
► the population: 45 items along with their b_i (item difficulty) values and 2000 examinees responding to the items
► 40 b_i drawn randomly as the subpopulation of items
► only examinees who respond to the items both correctly and incorrectly are purposively selected
► 70 examinees responding to the items, drawn randomly as the subpopulation of examinees
► the research analysis units: 40 b_i and 70 examinees responding to the items
► the values of examinees’ latent traits are analyzed
Analysis
► the initial θ values of the examinees’ latent traits range from -1.735 to +3.664
► the first iteration includes the examinees 10, 20, 30, and 50
► the second iteration includes the examinees 5, 9, and 39
► the third iteration includes the examinees 1, 2, 4, 7, 8, 11, 12, 13, 14, 15, 17, 18, 19, 21, 22, 24, 25, 27, 28, 31, 32, 34, 35, 37, 41, 42, 43, 44, 49, 51, 53, 54, 55, 57, 59, 61, 63, 64, 65, 67, 68, and 69
► the fourth iteration includes the examinees 3, 6, 16, 23, 26, 29, 36, 38, 45, 46, 47, 56, 58, and 66
► the fifth iteration includes the examinees 33, 40, 48, 52, 62, and 70
► the sixth iteration includes the examinee 60
► it results in the examinees’ traits θ varying from -1.735 to +2.912
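One way to produce such per-iteration groupings is to record the iteration at which each examinee's estimate stops changing. The sketch below assumes that reading of the iteration lists, a 1PL model, and a log-odds starting value. The data and the name `newton_raphson_batch` are hypothetical.

```python
import numpy as np

def newton_raphson_batch(responses, difficulties, tol=1e-3, max_iter=20):
    """Estimate theta for many examinees; also return the iteration at
    which each estimate converged (0 means it never converged)."""
    u = np.asarray(responses, dtype=float)
    b = np.asarray(difficulties, dtype=float)
    raw = u.sum(axis=1)
    if np.any((raw == 0) | (raw == b.size)):
        raise ValueError("patterns must contain both correct and incorrect responses")
    theta = np.log(raw / (b.size - raw))          # assumed starting values
    converged_at = np.zeros(len(theta), dtype=int)
    for iteration in range(1, max_iter + 1):
        active = np.flatnonzero(converged_at == 0)   # examinees still iterating
        if active.size == 0:
            break
        p = 1.0 / (1.0 + np.exp(-(theta[active, None] - b[None, :])))
        step = (u[active] - p).sum(axis=1) / (p * (1.0 - p)).sum(axis=1)
        theta[active] += step
        converged_at[active[np.abs(step) < tol]] = iteration
    return theta, converged_at

# Hypothetical data: three examinees, four items.
difficulties = [-1.0, 0.0, 1.0, 2.0]
responses = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
theta, converged_at = newton_raphson_batch(responses, difficulties)
for it in np.unique(converged_at):
    print("iteration", it, "examinees", np.flatnonzero(converged_at == it) + 1)
```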
Test Characteristic Curve for 1PL
[Figure: x-axis "Examinees Latent Trait", from -1.74 to 2.91; y-axis "Probability of Examinees Latent Trait", from 0.0 to 1.0]
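A curve of this kind can be computed by averaging the item response probabilities at each trait value. The sketch assumes a 1PL model and reports the curve as a proportion so that it stays between 0 and 1, as on the figure's vertical axis. The item difficulties are hypothetical. The trait values are a few of the tick labels from the horizontal axis.

```python
import numpy as np

def characteristic_curve(thetas, difficulties, as_proportion=True):
    """Test characteristic curve under an assumed 1PL model.

    Returns the expected proportion correct at each theta, or the
    expected number-correct score when as_proportion is False.
    """
    thetas = np.asarray(thetas, dtype=float)
    b = np.asarray(difficulties, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(thetas[:, None] - b[None, :])))  # examinee x item
    return p.mean(axis=1) if as_proportion else p.sum(axis=1)

difficulties = [-1.0, 0.0, 1.0]                  # hypothetical item difficulties
thetas = [-1.74, -0.79, 0.08, 0.90, 2.06, 2.91]  # trait values from the axis
curve = characteristic_curve(thetas, difficulties)
for t, value in zip(thetas, curve):
    print(f"theta = {t:+.2f}  expected proportion correct = {value:.3f}")
```

Under the 1PL model the curve increases monotonically with θ, so a departure from that shape in the empirical curve is one sign of misfit.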
Conclusion
► The 1PL model is not sufficiently satisfied. Hypothetically, this may be due to the number of examinees, the method employed, the model chosen, the test length, and other factors
Recommendation & Implication
► continuous study
► in language testing, it is recommended to employ several methods of estimation for widely ranged test items using the 2PL model, 3PL model, and other models
► computer programs for the sake of accurate and quick iteration
► item banking
Questions
C U Next Year