
DOCUMENT RESUME

ED 395 036    TM 025 056

AUTHOR        Yamamoto, Kentaro; Everson, Howard T.
TITLE         Modeling the Mixture of IRT and Pattern Responses by a Modified HYBRID Model.
INSTITUTION   Educational Testing Service, Princeton, N.J.
REPORT NO     ETS-RR-95-16
PUB DATE      Jul 95
NOTE          29p.; Paper presented at a symposium titled "Applications of Latent Trait and Latent Class Models in the Social Sciences" (Hamburg, Germany, May 1994).
PUB TYPE      Reports - Evaluative/Feasibility (142); Speeches/Conference Papers (150)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   *Ability; College Students; Estimation (Mathematics); Higher Education; *Item Response Theory; *Psychometrics; Reading Comprehension; *Reading Tests; *Timed Tests
IDENTIFIERS   *HYBRID Model; Item Parameters; Latent Class Models; *Response Patterns; Speededness (Tests)

ABSTRACT
This study demonstrates the utility of a HYBRID psychometric model, which incorporates both item response theoretic and latent class models, for detecting test speededness. The model isolates where in a sequence of test items examinee response patterns shift from ones providing reasonable estimates of ability to those best characterized by a random response pattern. The study applied the HYBRID model to three distinct data sets: (1) simulated data representing the performance of 3,000 examinees on a 70-item test; (2) data from a statewide field test (n=5,997) of a 40-item reading comprehension test with a fixed time limit; and (3) data from a study of urban university students (n=752) who took a similar 40-item reading comprehension test under varying time limits. The HYBRID model successfully identified "switch points" in examinees' item response patterns in all three data sets. This paper discusses applications of the model used to detect speededness and to provide adjusted estimates of item parameters and examinee abilities. (Contains 4 tables, 4 figures, and 38 references.) (Author)



RR-95-16

MODELING THE MIXTURE OF IRT AND PATTERN RESPONSES BY A MODIFIED HYBRID MODEL

Kentaro Yamamoto
Howard T. Everson


Educational Testing Service
Princeton, New Jersey

July 1995


Modeling the Mixture of IRT and Pattern Responses by a Modified Hybrid Model¹

Kentaro Yamamoto²
Educational Testing Service

Howard T. Everson³
The College Board

March 1995

¹ This paper was presented at the IPN symposium entitled "Applications of Latent Trait and Latent Class Models in the Social Sciences," held at Sankelmark, University of Kiel, Hamburg, May 1994.

² Dr. Yamamoto's work was supported by the Educational Testing Service's Program Research Planning Council.

³ Dr. Everson's work was supported, in part, by a Postdoctoral Fellowship from the Educational Testing Service.


Copyright © 1995. Educational Testing Service. All rights reserved.


Modeling the Mixture of IRT and Pattern Responses by a Modified Hybrid Model

Abstract

This study demonstrates the utility of a HYBRID psychometric model, which incorporates both item response theoretic and latent class models, for detecting test speededness. The model isolates where in a sequence of test items examinee response patterns shift from ones providing reasonable estimates of ability to those best characterized by a random response pattern. The study applied the HYBRID model to three distinct data sets: (1) simulated data representing the performance of 3,000 examinees on a 70-item test; (2) data from a statewide field test (n=5,997) of a 40-item reading comprehension test with a fixed time limit; and (3) data from a study of urban university students (n=752) who took a similar 40-item reading comprehension test under varying time limits. The HYBRID model successfully identified "switch points" in examinees' item response patterns in all three data sets. This paper discusses applications of the model used to detect speededness and to provide adjusted estimates of item parameters and examinee abilities.


INTRODUCTION


It is no secret that many standardized tests used in educational settings are administered under timed conditions. As a result, it is rare that all examinees have all the time they need when tested. Critics are quick to point out that arbitrary limits to testing time may be unfair to some groups of examinees, particularly groups for whom English is a second language. Some examinees, it is argued, may simply run out of time because they work slowly, or because they spend a good deal of precious testing time deciphering the test's language. Issues of test bias may arise when examinees from one or another minority group are differentially affected by the test's time limits. Thus, a central problem for test developers and test users--particularly when a test is scored "rights only" (i.e., the total number of correct responses) and examinees are not discouraged from responding randomly at the end of a test (i.e., guessing)--is determining the appropriate time limit for a test.

Unfortunately, model-based methods for determining appropriate time limits during test development have remained elusive, particularly when examinees are not discouraged from guessing and the proportion of examinees not reaching the last few items is low. For example, the majority of approaches available in the psychometric literature are analyses of patterns of "not reached" items. When guessing occurs, surface examinations of the examinee response patterns, i.e., the number and position of items attempted, reveal little about the solution strategies used by examinees as testing time is exhausted. Do all examinees have sufficient time to attempt all the items on the test, or do some respond randomly as time runs out? From the test development perspective, what is needed is a psychometric model that serves to define and measure the degree of speededness when tests are scored "rights only".

The purpose of this study was to demonstrate the utility of the HYBRID model (Yamamoto, 1989)--a psychometric model that combines both item response theoretic and latent class approaches--for detecting speededness in such a test. The model's utility, we argue, derives from its ability to detect the point in a test where a significant proportion of examinees "switch" response strategies from meaningful (ability driven) responses to random responses. By applying the model to both simulated and field test data, we show how the HYBRID model can be used in the test development phase to determine the optimal length or time limit of a test, thereby strengthening the measurement properties of the test.

In the following section, we briefly review a number of earlier attempts to develop model-based approaches for estimating speededness, and discuss their limitations. We then describe the HYBRID model and show how it was extended to help identify the "switch points" (i.e., points in a test where examinees' response patterns "switch" from meaningful to random responses) presumably related to the test's length and/or time limits. This description is followed by an overview of the research design, including descriptions of the simulated and field test data analyzed in this study. We conclude by summarizing the results of our analyses and discussing the utility of the HYBRID model for estimating the effects of test length and test speededness during the test development phase.

Earlier Approaches

More than forty years ago, Cronbach and Warrington (1951) argued that "test theory will be clarified if we can determine and measure degree of speeding" (p. 184). Over the years a number of researchers have attempted to develop rules of thumb and indices for measuring test speededness. The conventional measure of test speededness is the proportion of examinees attempting all test items within the allotted time period. According to criteria used by commercial test developers, including ETS (Nunnally, 1978; Swineford, 1974), a test may be considered unspeeded if (1) virtually all examinees reach 75% of the items, and (2) at least 90% of the candidates respond to the last item. In a discussion of time limits on standardized tests, Nunnally (1978) speaks of a "comfortable time limit," which he defines as "the amount of time required for 90% of the persons to complete a test under power conditions" (1978, p. 632). In general, the number of items completed by 80% of the test takers has served as an index of the relative


speededness of a test (Schmitt et al., 1991). This index of speededness, however, is much less useful for "rights only" scored tests, where examinees are encouraged by virtue of the scoring protocol to guess randomly as testing time runs out. Ignoring the random responses of "test-wise" examinees, no doubt, underestimates test speededness.
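
As a concrete illustration of these conventional criteria (a minimal sketch we add for clarity, not part of the original analyses; the function name, array layout, and use of NumPy are our own assumptions), the indices can be computed from a scored response matrix in which not-reached items are coded as missing:

import numpy as np

def conventional_speededness_indices(responses):
    """responses: examinees x items array; 1/0 for attempted items, np.nan for not reached."""
    n_examinees, n_items = responses.shape
    attempted = ~np.isnan(responses)
    # Position (1-based) of the last attempted item for each examinee; 0 if none attempted.
    last_attempted = np.where(attempted.any(axis=1),
                              n_items - np.argmax(attempted[:, ::-1], axis=1), 0)
    return {
        # (1) proportion of examinees reaching at least 75% of the items
        "prop_reaching_75pct_of_items": np.mean(last_attempted >= 0.75 * n_items),
        # (2) proportion of examinees responding to the last item
        "prop_responding_to_last_item": np.mean(attempted[:, -1]),
        # number of items completed by at least 80% of the test takers
        "items_completed_by_80pct": int(np.sum(attempted.mean(axis=0) >= 0.80)),
    }

The 75%, 90%, and 80% thresholds are those quoted above from Swineford (1974) and Nunnally (1978).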

Modeling only omitted responses may also provide biased ability estimates. Ability estimates derived from item response theory (IRT; Lord, 1980), for example, do not explicitly incorporate speededness estimates. According to the conventional argument, this implies that the IRT based ability estimate is unaffected by variations in speededness or by testing time limits, although it is commonly assumed that performance levels decline when testing time is insufficient.

A number of earlier studies (Bejar, 1985; Donlon, 1980; Secolsky, 1989; Secolsky and Steffen, 1990) bear directly on this issue. Bejar (1985), working within an IRT framework, assumed that when speed was a factor, the less able examinees would leave the more difficult test items for last and, therefore, could be characterized by a random response pattern. He proposed an index of speededness that compared the observed performance on the most difficult test items with the performance predicted by an overall IRT model. Bejar's index, analogous to a chi-square statistic, was calculated for several ability levels. As Bejar himself noted, the method was circular because the IRT parameters were estimated using the items that were assumed to be affected by speededness. More important, the speededness index did not reflect the fit of the three-parameter IRT model. He reported, for example, model misfit in the extreme high and low ability regions, regions where the IRT model parameters were least accurately estimated. As a result, Bejar's index failed to detect speededness in the more important regions of the ability distribution.

Secolsky (1989) attempted to address the issue of test speededness by using techniques based on regression methods. Secolsky worked from the premise that under power (non-speeded) test conditions, scores on items early in the test would correlate highly with scores on items at the end of the test (i.e., scores on beginning test items should predict scores on test items at the end). Under speeded conditions, the expected relationship between the two portions of the test would be different.

He examined data from four administrations of the Test of English as a Foreign Language (TOEFL) and concluded that the test was slightly speeded because the observed scores of a subset of examinees on the last 4 to 6 test items were significantly lower than the scores predicted by performance on the first 4 to 6 items. Unfortunately, regression methods based on such a small number of items are less than reliable, and may be subject to errors of classification based on this uncertainty. Conclusions about test speededness may be an artifact of the unreliability inherent in this application of regression methods. In addition, the techniques used by Secolsky (1989) were somewhat arbitrary in that they examined speededness at a single point in the test sequence, the last 4 to 6 items, and ignored individual differences in speed of responding.

In sum, these earlier approaches shared a number of shortcomings: (1) they did not examine the performance of the procedures when the test was unspeeded; (2) they examined speededness at somewhat arbitrary points in the test; (3) they were, for the most part, unconcerned with the bias in the model parameters that the test's speededness may have introduced; and (4) these approaches did not address the detection of differential speededness by subpopulations of examinees. Recent developments in IRT modeling, i.e., the development of a "hybrid" model (see Yamamoto, 1990, for details), address these problems by changing the essential question from "Is a test speeded?" to "How speeded is a test?" In the next section, we discuss extensions to the HYBRID model that permit estimates of the proportion of examinees who shift or switch to a random response pattern at various points in the test item sequence. This model-based approach also enables us to examine carefully the effects of speededness on both the item and ability parameter estimates.

THE HYBRID MODEL

Both classical test theory and item response theory (IRT), including the one-, two-, and three-parameter normal and logistic models, use a single measurement model to characterize the examinee responses. These traditional psychometric models are efficient methods for ordering


examinees on a unidimensional ability continuum, and they work reasonably well for tests where examinees use essentially the same strategy to solve the items. They are less well suited to testing situations that are decidedly multidimensional, or when examinees switch solution strategies at various points in the test (Kyllonen, Lohman, and Snow, 1984; Secolsky, 1989). Currently, there are two psychometric models that incorporate multiple response strategies: the HYBRID model (Yamamoto, 1989) and the Mixed Strategies model (Mislevy and Verhelst, 1990). When tests are used for diagnostic purposes or academic placement decisions, more information about examinees' cognitive processing characteristics is often required. Thus, there is a need for psychometric models that capture the qualitative nature of an individual's performance on a test.

The HYBRID model supplements the discrete latent class model (LCM) of item responses with a continuous IRT model. In its basic conceptualization, the model presupposes a class of examinees whose responses are characterized well by an IRT model. For such a group of examinees, differences in their response patterns do not typically indicate qualitative differences in their solution strategies. The response patterns of other examinees, however, may be more accurately described by a model that places test takers in discrete classes that are associated with qualitatively different solution strategies. The HYBRID model uses both psychometric approaches to achieve an optimal fit for a sample of examinees. For detecting test speededness, the model can be extended to capture strategy switching; in this way a subset of examinees' responses is best described by a latent class model (i.e., a guessing class) while the remainder of the responses fit an IRT model.

The HYBRID model assumes that conditional independence holds for both the IRT and the latent class groups. The model produces three sets of parameters: (1) IRT parameters--i.e., a set of item parameters for each item and an ability parameter for each examinee; (2) an estimate of the proportion of the population of examinees in the IRT and latent classes; and (3) a set of conditional probability estimates for each latent class. The ability parameter, however, is useful only for the proportion of examinees that are characterized adequately by the IRT model. Parameter estimation is done via the Marginal Maximum Likelihood (MML) method (Bock & Aitkin, 1981; Mislevy, 1983). Additional descriptions of the MML method are found in Harwell, Baker, and Zwarts (1988) and Yamamoto (1989).

The HYBRID model uses conventional IRT parameters--i.e., a one-, two-, or three-parameter logistic model. For example, the conditional probability of a correct response to item i under the two-parameter logistic IRT model is P(x_i = 1 \mid \theta_j, \beta_i), with item parameter values \beta_i = (a_i, b_i) and examinee ability \theta_j. This is given by

P(x_i = 1 \mid \theta_j, \beta_i) = \frac{1}{1 + \exp(-D a_i(\theta_j - b_i))} .    (1)

The probability of a correct response to item i for an examinee in latent class k (k = 2, ..., K, if there are K-1 latent classes) is denoted by P(x_i = 1 \mid \gamma = k). A large number of latent class models have been proposed in the past; they differ in the type of constraints placed on the conditional probabilities. Some latent class probabilities can be specified as a function of \theta, and Yamamoto (1987) proposed several such constraints in addition to more traditional constraints; some of them can be expressed as P(x_i = 1 \mid \gamma = k, \theta). In this paper, we limit our discussion to the standard latent class models.
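
For readers who prefer code to notation, a minimal sketch of the response model in Equation 1 follows (added here for illustration; the function name, the use of NumPy, and the scaling constant D = 1.7 are our assumptions, not specifications from the original report):

import numpy as np

D = 1.7  # logistic scaling constant (assumed value)

def p_2pl(theta, a, b):
    """Equation 1: probability of a correct response under the 2PL IRT model."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

# For a "guessing" latent class, the conditional probability of a correct response to
# item i is simply a constant, e.g. c_i = 0.2 for five-option multiple-choice items
# (the value used later in the simulation study).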

By the assumption of conditional independence in the IRT model as well as in the latent class groups, the conditional probabilities of observing a response vector x under the IRT class and under the K-1 latent classes are

P(x \mid \theta, \beta) = \prod_{i=1}^{I} P(x_i = 1 \mid \theta, \beta_i)^{x_i} \bigl(1 - P(x_i = 1 \mid \theta, \beta_i)\bigr)^{1-x_i},    (2)  (IRT)

P(x \mid \gamma = k) = \prod_{i=1}^{I} P(x_i = 1 \mid \gamma = k)^{x_i} \bigl(1 - P(x_i = 1 \mid \gamma = k)\bigr)^{1-x_i}.    (3)  (LC)

Let \gamma = 1 indicate the class modeled by IRT, and let \gamma = 2, ..., K indicate the K-1 latent classes. Further, let P(\gamma = k) denote the probability of membership in class k. The marginal probability of observing a response pattern x, given the model parameters \beta and \gamma, is the summation of conditional probabilities over all classes, including the IRT group and the latent classes, and is expressed by

P(x \mid \beta) = \sum_{k=1}^{K} P(x \mid \beta, \gamma = k)\, P(\gamma = k)    (4)

             = \int P(x \mid \theta, \beta) f(\theta)\, d\theta \; P(\gamma = 1) + \sum_{k=2}^{K} P(x \mid \beta, \gamma = k)\, P(\gamma = k).    (5)

Since calculating the integral in the above function and the succeeding derivations is cumbersome, a method based on Dempster, Laird, and Rubin's (1977) EM algorithm is used for actual parameter estimation.
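
As an illustration of Equations 2-5 (a sketch we add for concreteness; the function names, the quadrature representation of f(theta), and the use of NumPy are assumptions rather than details of the authors' program), the marginal probability of a response vector under the mixture can be approximated as follows:

import numpy as np

D = 1.7  # assumed logistic scaling constant

def p_2pl(theta, a, b):
    # Equation 1: 2PL probability of a correct response
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def hybrid_marginal_prob(x, a, b, quad_points, quad_weights, class_probs, latent_item_probs):
    """Equations 2-5: marginal probability of response vector x under the HYBRID mixture.

    x                 : 0/1 responses, shape (n_items,)
    a, b              : 2PL item parameters, shape (n_items,)
    quad_points/weights: quadrature approximation of the ability density f(theta)
    class_probs       : P(gamma = k) for k = 1..K, with index 0 the IRT class
    latent_item_probs : (K-1, n_items) array of P(x_i = 1 | gamma = k) for the latent classes
    """
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    # IRT class: Equation 2 integrated over theta (first term of Equation 5)
    p_irt = 0.0
    for theta_q, w_q in zip(quad_points, quad_weights):
        p = p_2pl(theta_q, a, b)
        p_irt += w_q * np.prod(p ** x * (1.0 - p) ** (1 - x))
    total = class_probs[0] * p_irt
    # Latent classes: Equation 3, weighted by class membership (second term of Equation 5)
    for k, pi_k in enumerate(np.atleast_2d(latent_item_probs)):
        total += class_probs[k + 1] * np.prod(pi_k ** x * (1.0 - pi_k) ** (1 - x))
    return total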

Extending the Model

When a test is speeded and scoring is based on the number of correct responses, patterned responses are often observed at the end of the test. This occurs, for example, when an examinee runs out of time on a multiple-choice test: option A may be selected for the last n items. Yet unless the algorithm of patterned responses is obvious, it is often quite difficult to determine whether a portion of the overall responses is patterned or not.

Under the extended model, examinee j responds according to the IRT model through item k_j and randomly thereafter, so that the conditional probability of a correct response to item i is

P(x_{ij} = 1 \mid \theta_j, \beta_i, k_j) = \bigl(1 + \exp(-D a_i(\theta_j - b_i))\bigr)^{m} c_i^{\,m+1},    (6)

where m = -1 when i \le k_j and m = 0 when i > k_j, x_{ij} is a dichotomous response (0 = wrong, 1 = right) on item i by examinee j, \beta_i is a vector of item parameters (a_i, b_i), \theta_j is the ability of examinee j, and k_j indicates the last item answered by examinee j under the IRT model. The likelihood of observing x_j given \theta_j and k_j is

P(x_j \mid \theta_j, \beta, k_j) = \prod_{i=1}^{k_j} P_i^{\,x_{ij}} (1 - P_i)^{1-x_{ij}} \prod_{i=k_j+1}^{n} c_i^{\,x_{ij}} (1 - c_i)^{1-x_{ij}},    (7)

where P_i denotes the IRT probability of a correct response to item i given in Equation 1.

(Notice that for those examinees who did not switch response strategy, the likelihood is identical to that of the IRT only model given in Equation 6.)
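
A minimal sketch of the switch-point likelihood in Equation 7 (our own illustration; the function name, the 1-based switch point mapped to Python slicing, and the NumPy usage are assumptions):

import numpy as np

D = 1.7  # assumed logistic scaling constant

def switch_point_likelihood(x, a, b, c, theta, k):
    """Equation 7: likelihood of responses x for an examinee with ability theta who
    follows the 2PL model through item k (1-based) and responds randomly afterwards,
    with per-item random-response probabilities c."""
    x, a, b, c = (np.asarray(v, dtype=float) for v in (x, a, b, c))
    p = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))          # Equation 1 for every item
    p_mix = np.where(np.arange(len(x)) < k, p, c)           # Equation 6: IRT up to item k, then c_i
    return np.prod(p_mix ** x * (1.0 - p_mix) ** (1 - x))

# Example: an examinee who switches to random responding after item 50 of 70
# (as in the simulation study below) has k = 50; k = 70 means no switch at all.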


Moreover, the marginal probability of observing x_j given the item parameters is

P(x_j \mid B) = \sum_{k} \int P(x_j \mid \theta, B, k)\, f(\theta \mid k)\, d\theta \, f(k),    (8)

where f(\theta \mid k) represents the conditional distribution of \theta given a switch point k, and f(k) is the distribution of switch points in the population. The joint likelihood of the parameters given the observed response matrix X = (x_1, x_2, ..., x_J) from a total of J examinees is

L(B \mid X) = \prod_{j=1}^{J} P(x_j \mid B).    (9)

The IRT item parameters can be estimated by maximizing the marginalized likelihood function in Equation 9 using an iterative method such as the Newton-Raphson (N-R; Kendall and Stuart, 1967) method. The N-R method can be described as P_{n+1} = P_n - D_2^{-1} D_1, where P_{n+1} is a vector of parameters updated from P_n by an amount determined by D_2 (a matrix of second derivatives) and D_1 (a vector of first derivatives). However, D_2 can be quite large, and the off-diagonal elements need not be zero. Consequently, standard application of the N-R method would present too great a computational burden. Bock and Aitkin (1981) advanced the idea of using the EM algorithm (Dempster, Laird, and Rubin, 1977) with probit-analysis inner cycles for IRT parameter estimation by replacing continuous \theta with discrete \theta points, chosen as convenient quadrature points for the integration. Thus, with respect to a model parameter u of either an item or a population density, the first derivative of the log-likelihood of Equation 9 can be expressed as

\frac{\partial \ln L(B \mid X)}{\partial u} = \sum_{j=1}^{J} \frac{ \sum_{k} \int \frac{\partial P(x_j \mid \theta, B, k)}{\partial u}\, f(\theta \mid k)\, f(k)\, d\theta }{ P(x_j \mid B) }.    (10)

Following the empirical Bayes method and an approximation of the integration by summation, denoted by q quadrature points with conditional weights A(\theta_q \mid k) approximating f(\theta \mid k), Equation 10 for an item parameter u_i can be rewritten as

\frac{\partial \ln L}{\partial u_i} = \sum_{k} \sum_{q} \sum_{j} \frac{A(\theta_q \mid k)}{P_{ik}(\theta_q)\, Q_{ik}(\theta_q)}\, \frac{\partial P_{ik}(\theta_q)}{\partial u_i}\, \bigl[x_{ij} - P_{ik}(\theta_q)\bigr]\, f(k)\, \frac{P(x_j \mid \theta_q, B, k)}{P(x_j \mid B)},    (11)

where Q_{ik}(\theta_q) = 1 - P_{ik}(\theta_q). Since x_{ij} can be either 1 or 0, the right-hand side of Equation 11 can be rewritten as

\sum_{k} \sum_{q} \frac{1}{P_{ik}(\theta_q)\, Q_{ik}(\theta_q)}\, \frac{\partial P_{ik}(\theta_q)}{\partial u_i}\, \bigl[\bar{r}_{iqk} - P_{ik}(\theta_q)\, N_{qk}\bigr],    (12)

where

\bar{r}_{iqk} = \sum_{j} x_{ij}\, \frac{P(x_j \mid \theta_q, B, k)\, A(\theta_q \mid k)\, f(k)}{P(x_j \mid B)}    (13)

and

N_{qk} = \sum_{j} \frac{P(x_j \mid \theta_q, B, k)\, A(\theta_q \mid k)\, f(k)}{P(x_j \mid B)}.    (14)
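
The expected counts in Equations 13-14 are what the EM algorithm accumulates in its E-step. A sketch of that accumulation (our own illustration, under the assumption that the conditional likelihood is the switch-point likelihood of Equation 7; the names and array layouts are hypothetical):

import numpy as np

D = 1.7  # assumed scaling constant

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def conditional_like(x, a, b, c, theta, k):
    # Equation 7: 2PL through item k (1-based), random responding (probability c_i) afterwards
    x, a, b, c = map(np.asarray, (x, a, b, c))
    p_mix = np.where(np.arange(len(x)) < k, p_2pl(theta, a, b), c)
    return np.prod(p_mix ** x * (1.0 - p_mix) ** (1 - x))

def expected_counts(X, a, b, c, switch_points, quad_points, quad_weights, f_k):
    """Equations 13-14: expected correct counts r_bar[i, q, k] and expected examinee
    counts N[q, k] used in the item-parameter (M-step) equations.

    X            : 0/1 responses, shape (n_examinees, n_items)
    switch_points: candidate switch points k (1-based last item answered under IRT)
    quad_weights : A(theta_q | k), shape (n_switch, n_quad)
    f_k          : switch-point distribution f(k), shape (n_switch,)
    """
    X = np.asarray(X)
    J, I = X.shape
    Q, K = len(quad_points), len(switch_points)
    r_bar = np.zeros((I, Q, K))
    N = np.zeros((Q, K))
    for x in X:
        cond = np.array([[conditional_like(x, a, b, c, tq, k) for k in switch_points]
                         for tq in quad_points])          # P(x | theta_q, B, k), shape (Q, K)
        post = cond * quad_weights.T * f_k                # numerator of Equations 13-14
        post /= post.sum()                                # divide by P(x | B), Equation 8
        N += post
        r_bar += x[:, None, None] * post[None, :, :]
    return r_bar, N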

For the 2PL model the required derivatives are

\frac{\partial P_{ik}(\theta_q)}{\partial a_i} = D(\theta_q - b_i)\, P_{ik}(\theta_q)\, Q_{ik}(\theta_q)    (15)

and

\frac{\partial P_{ik}(\theta_q)}{\partial b_i} = -D a_i\, P_{ik}(\theta_q)\, Q_{ik}(\theta_q).    (16)

The matrix of second derivatives can be expressed as

\frac{\partial^2 \ln L}{\partial a_i^2} = -D^2 \sum_{k} \sum_{q} f(k)\, (\theta_q - b_i)^2\, N_{qk}\, P_{ik}(\theta_q)\, Q_{ik}(\theta_q),

\frac{\partial^2 \ln L}{\partial b_i^2} = -D^2 a_i^2 \sum_{k} \sum_{q} f(k)\, N_{qk}\, P_{ik}(\theta_q)\, Q_{ik}(\theta_q),    (17)

\frac{\partial^2 \ln L}{\partial a_i \partial b_i} = D^2 a_i \sum_{k} \sum_{q} f(k)\, (\theta_q - b_i)\, N_{qk}\, P_{ik}(\theta_q)\, Q_{ik}(\theta_q).    (18)

Once the item parameters are estimated, the estimation of the examinees' proficiency can be carried out using several existing methods, including maximum likelihood estimation (MLE), Bayes modal estimation (MAP), and the expected a posteriori (EAP) method. The MLE of ability is described by Lord (1980), and MAP and EAP are both described by Bock and Aitkin (1981). Here, the EAP estimator \hat{\theta}_j for the typical model is

\hat{\theta}_j = E(\theta_j \mid x_j) = \frac{ \sum_{k} \sum_{q} \theta_q\, P(x_j \mid \theta_q)\, A(\theta_q \mid k)\, f(k) }{ \sum_{k} \sum_{q} P(x_j \mid \theta_q)\, A(\theta_q \mid k)\, f(k) }.    (19)

The variance of the EAP estimator is approximately

\operatorname{Var}(\hat{\theta}_j) = \frac{ \sum_{k} \sum_{q} (\theta_q - \hat{\theta}_j)^2\, P(x_j \mid \theta_q)\, A(\theta_q \mid k)\, f(k) }{ \sum_{k} \sum_{q} P(x_j \mid \theta_q)\, A(\theta_q \mid k)\, f(k) }.    (20)

Thus, the posterior distributions of the proficiency and of the switching population can be calculated as

P(\theta_q \mid x, B, k) = \frac{ P(x \mid \theta_q, B, k)\, A(\theta_q \mid k) }{ P(x \mid B, k) }    (21)

and

P(k \mid x, B) = \frac{ \sum_{q} P(x \mid \theta_q, B, k)\, A(\theta_q \mid k)\, f(k) }{ \sum_{k} \sum_{q} P(x \mid \theta_q, B, k)\, A(\theta_q \mid k)\, f(k) }.    (22)
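
To make Equations 19-22 concrete, here is a sketch of the EAP ability estimate, its posterior variance, and the posterior distribution of switch points for a single examinee (an illustration we add; function and variable names are hypothetical, and the conditional likelihood is taken to be the switch-point likelihood of Equation 7):

import numpy as np

D = 1.7  # assumed scaling constant

def eap_and_switch_posterior(x, a, b, c, switch_points, quad_points, quad_weights, f_k):
    """Equations 19, 20, and 22 for one examinee.

    switch_points: candidate switch points k (1-based last item answered under IRT)
    quad_points  : quadrature points theta_q
    quad_weights : A(theta_q | k), shape (n_switch, n_quad)
    f_k          : switch-point distribution f(k), shape (n_switch,)
    """
    x, a, b, c = (np.asarray(v, dtype=float) for v in (x, a, b, c))
    quad_points = np.asarray(quad_points, dtype=float)
    joint = np.zeros((len(quad_points), len(switch_points)))
    for qi, theta_q in enumerate(quad_points):
        p = 1.0 / (1.0 + np.exp(-D * a * (theta_q - b)))                    # Equation 1
        for ki, k in enumerate(switch_points):
            p_mix = np.where(np.arange(len(x)) < k, p, c)                   # Equation 6
            like = np.prod(p_mix ** x * (1.0 - p_mix) ** (1 - x))           # Equation 7
            joint[qi, ki] = like * quad_weights[ki, qi] * f_k[ki]
    marginal = joint.sum()                                                  # P(x | B), Eq. 8
    theta_hat = (quad_points[:, None] * joint).sum() / marginal             # Equation 19
    theta_var = ((quad_points[:, None] - theta_hat) ** 2 * joint).sum() / marginal  # Eq. 20
    switch_posterior = joint.sum(axis=0) / marginal                         # Equation 22
    return theta_hat, theta_var, switch_posterior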

The notion of a prior distribution on the item parameters, proficiency distributions, and switching population distribution can be used during the maximization phase. The item parameters, for example, can be viewed as being drawn from a particular distribution, and updating the parameters could be constrained to meet that particular distribution. Similarly, the proficiency distribution can be assumed to be normal at each switch point, including at the last test item. In addition, f(k) can be constrained to have a specific functional form in relation to the value of k.

Developing psychometric models that incorporate strategy switching is important for a number of reasons: (1) to characterize examinees' strategy use when it is salient; (2) to detect extraneous strategy influences in estimated model parameters; and (3) to provide an opportunity to incorporate partial knowledge of latent classes. The extension of the HYBRID model discussed above attempts to provide a qualitative evaluation of the response strategies used by examinees, and it does this by allowing a closer examination of the interaction between the location of the test items and strategy switching patterns. Standard IRT models do not capture this phenomenon and, as a result, can prompt misleading inferences about the proficiencies of the examinees and the properties of the test items.

Fit Indices for the Extended Model

An index of goodness of fit was needed for this model. Under ideal conditions, G2 could be used. Use of the chi-square test, however, on data derived from a sparse response pattern distribution was not warranted. In light of this, we sought convergent evidence from both the chi-square test and Akaike's (1985; 1987) "information coefficient" (the AIC). The AIC is defined as -2 times the log-likelihood plus twice the number of estimated parameters. Although the chi-square distribution may not be exactly appropriate, the likelihood ratio for nested models was available for examining model fit. Comparing the fit of two models, such as the 1PL IRT model versus the 2PL IRT model, can be done by examining the improvement in the log-likelihood, while taking into account the number of degrees of freedom expended. However, when competing models are non-nested, the log-likelihood ratio test is less appropriate. In such instances, the AIC can be used.
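
As a small worked example (ours, not the authors'), the AIC values reported later in Table 1 for the simulated data follow directly from this definition:

def aic(minus_two_log_likelihood, n_parameters):
    # AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
    return minus_two_log_likelihood + 2 * n_parameters

print(aic(189731, 140))   # IRT only model : 190011
print(aic(185348, 229))   # HYBRID model   : 185806  (smaller AIC, better fit)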

In this study, then, the LCM class is associated with a unique group of examinees exhibiting strategy switching. At issue is whether the extended HYBRID model, for which classes are suggested via a theory of item performance, can better account for test performance under speeded conditions than can more traditional psychometric models. To examine this issue we applied the extended HYBRID model to simulated data, as well as data derived from two field studies of a standardized, multiple-choice reading comprehension test.

In an effort to demonstrate the efficacy of the HYBRID model for detecting test speededness by identifying strategy switching, we applied the model to three distinct data sets: (1) simulated data representing the performance of 3,000 examinees on a 70-item test; (2) data from a statewide field test (n=5,997) of a 40-item reading comprehension test with a fixed time limit; and (3) data from a study of urban university students (n=752) who took a similar 40-item reading comprehension test under varying time limits. The extended HYBRID model was used to analyze all three data sets, and the simulated and actual strategy switch points were mapped onto the structure of the test.


METHOD

Simulation Study

The simulated data set consists of 3,000 ability parameters drawn from a standard normal distribution, N(0,1), and 70 pairs of item parameters for the standard 2PL IRT model, simulated from independent normal distributions: N(1.0, 0.4) for the slope parameter a and N(0.0, 0.8) for the difficulty parameter b. Based on these simulated parameters, a 3000 x 70 response matrix was generated. The number of simulees switching to random responses was increased in increments of 50 starting with the 51st item. Thus, among the 3,000 simulees, 2,000 did not switch to random responses (kj = 70), while 50 simulees switched to random responding at the 51st item (kj = 50), 50 more at the 52nd item (kj = 51), and so on through the last (70th) item (kj = 69). Responses of simulees who switched to random responding were generated with ci = 0.2.
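
A sketch of how such a data set can be generated (added for illustration; the random seed, the NumPy routines, and the D = 1.7 scaling constant are our assumptions, while the distributions, sample sizes, switch points, and c_i = 0.2 follow the description above):

import numpy as np

rng = np.random.default_rng(12345)        # arbitrary seed (assumption)
D = 1.7                                   # assumed logistic scaling constant
N_SIM, N_ITEMS, PER_SWITCH_POINT, C_GUESS = 3000, 70, 50, 0.2

# Simulated person and item parameters, as described above
theta = rng.normal(0.0, 1.0, N_SIM)       # ability ~ N(0, 1)
a = rng.normal(1.0, 0.4, N_ITEMS)         # slope a
b = rng.normal(0.0, 0.8, N_ITEMS)         # difficulty b

# Switch points k_j: 2,000 simulees never switch (k = 70); 50 simulees switch at each
# of the last 20 items, i.e. k = 50, 51, ..., 69
k = np.full(N_SIM, N_ITEMS)
k[2000:] = np.repeat(np.arange(50, 70), PER_SWITCH_POINT)

# 3000 x 70 response matrix: 2PL responses through item k_j, random (c_i = 0.2) afterwards
p_irt = 1.0 / (1.0 + np.exp(-D * a * (theta[:, None] - b)))
p = np.where(np.arange(N_ITEMS) < k[:, None], p_irt, C_GUESS)
X = (rng.random((N_SIM, N_ITEMS)) < p).astype(int)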

Three sets of model parameters were estimated for the simulated data: (1) ordinary 2PL IRT parameters (140 parameters were estimated); (2) HYBRID model parameters (140 + 60 + 29 = 229 parameters); and (3) ordinary 2PL IRT parameters with random responses treated as not presented (140 parameters). For the HYBRID model parameter estimation, f(\theta \mid k) was constrained to be normal and considered for the last 30 items, making 60 parameters to be estimated. In addition, 29 multinomial parameters were estimated to represent f(k), each parameter representing the proportion of examinees switching at a particular point. The rationale was that if item parameters were estimated using the third option, the IRT item parameter estimation would depend only on the portion of the data that corresponds to the IRT model; hence, the estimation error would be minimized. This would also mean that less data would be available to estimate item parameters for the last 20 items, and thus less accurate estimation would be expected. The IRT item parameter estimates based on the competing models were compared with the estimates of this third option.

Field Study 1

Data for the first field study come from a statewide administration of a 47-item multiple-choice reading comprehension test administered to nearly 6,000 examinees (N=5,997) in the Fall of 1990. The reading comprehension test was comprised of 47 multiple-choice items based on 10 reading passages of varying lengths. The test was administered in 50 minutes. Seven items--items 26 to 32--were designated as experimental items and were omitted from the analysis. The examinees, all of whom were enrolled in a large public university system, were both ethnically and linguistically diverse and included roughly 64% Whites, 6% Asian Americans, 15% African Americans, and 15% Hispanics. Approximately 11% of the sample were identified as students for whom English was a second language (ESL).

The extended HYBRID model was used in the analysis of these data in an attempt to map the switch points onto the structure of the reading comprehension test. Since this particular reading comprehension test contained a number of brief reading passages followed by a short series of multiple-choice items, mapping the points where examinees affected by the speededness of the test "switch" to random response patterns could be achieved. Applying the extended HYBRID model to these field trial data, then, permitted us to test the utility of the model for detecting the effects, if any, of speededness for different groups of examinees.

Field Study 2

The second field study was conducted to further examine the utility of the extended HYBRID model for analyzing test data and detecting speededness for different subgroups of the test-taking population--English as a second language (ESL) and English as a primary language (EPL) examinees--when tests are administered under different time conditions. In the second field study, a quasi-experimental research design was used, in which students from a large public urban university (n=752) took the test in groups of 30 that were randomly assigned to either a 45-minute or a 60-minute time condition. The sample was ethnically diverse, consisting of 40% African


Americans, 17% Hispanics, 12% Asian Americans, 12% Whites, and 19% unclassified. Moreover, 16% were self-identified as students for whom English was a second language.

Using a strategy similar to that of Study 1, the extended HYBRID model was used to analyze these data in an attempt to map the switch points onto the structure of the reading comprehension test under the two administration conditions. Again, since this test contained a number of brief reading passages followed by a short series of multiple-choice items, mapping the points where examinees affected by speededness "switch" to random response patterns could be achieved. Applying the extended HYBRID model to these field trial data, then, permitted us to test the utility of the model for detecting the effects, if any, of the different time limits on both test speededness and the ability estimates for different subgroups of examinees.

RESULTS

Simulation Study

We discuss the results of the simulations from the perspective of the accuracy of the item parameter and ability parameter estimates. As noted earlier, three sets of model parameters were estimated: (1) ordinary 2PL IRT parameters; (2) HYBRID model parameters; and (3) ordinary 2PL IRT parameters with random responses treated as not presented. Table 1 shows the three sets of estimated item parameters and the values of the model fit indices (i.e., -2*log-likelihood and the AIC). The first 39 items in the item sequence are not included since the results were virtually identical across the three methods. Summary statistics comparing the estimated item and ability parameters are presented in Table 2.

INSERT TABLE 1 and 2 HERE

To further examine the amount and direction of the bias in the item parameter estimates of the competing models, we plotted the item parameter estimates produced by both the IRT only model and the HYBRID model against the estimated item parameters based on method 3. These plots are presented in Figure 1.

INSERT FIGURE 1 HERE

The IRT only model overestimated the values of the location parameters for the last 20 items, while the HYBRID model produced much less biased estimates. Similarly, when we compare the plots for the item slope estimates, we see that the IRT only model consistently underestimated the value of the a parameter for the last 20 items. Again, the HYBRID model produced less biased estimates of the location parameters for the last 20 items. When the accuracy of the estimated parameters for the last 10 items is considered, the IRT only model is clearly inaccurate.

Accuracy of the estimates of the ability parameter is, perhaps, more important when test data are believed to be influenced by speededness. Careful inspection of the ability estimates by the IRT only method suggests that this method leads to biased estimates: ability for the first 2,000 simulated examinees is slightly overestimated, while the θ's for the remaining 1,000 simulated examinees are underestimated. Figure 1 presents a plot of the ability estimates by the IRT only method against the ability estimates by method three, distinguishing the overestimation and underestimation of θ when standard IRT models are applied to speeded test data. Figure 1 also includes a plot of the HYBRID model θ estimates against method three. There is a clear reduction in the bias of the ability estimates under the HYBRID model.

Analysis of the mean deviations and the root mean square deviations (RMSDs) of the ability estimates produced when the methods were used to analyze the data suggests that the HYBRID model produced significant improvements over the IRT only model. Table 2 indicates that the RMSD of the ability estimates is nearly twice as large for the IRT only model for those in sequence


positions 2501 through 3000. The severity of estimation bias is greater for simulees with higher ability, more specifically for those with θ greater than 1.

The estimated proportions of simulees who switched to a random responding strategy, the f(k)'s, are presented in Table 1, with a clear demarcation at the 51st item where random responding started. Slight overestimation of the f(k)'s before the 50th item (about 1%) and near the 70th item (about 3%) was found. The cumulative distribution of the switched population is presented in Figure 2a. It shows that the estimates closely follow the true distribution indicated by the solid line.

INSERT FIGURE 2a HERE

The competing models were also evaluated in terms of their ability to identify a lack of speededness when appropriate. This was done by using the original IRT only data, before random responses replaced part of the data. The IRT only and HYBRID models produced nearly identical results in estimated parameters and model fit, and the cumulative distribution of the switched proportion amounted to only 0.4% at the last item. The cumulative distribution of the switched population is presented in Figure 2b. Comparing the two cumulative distributions of switched simulees presented in Figures 2a and 2b makes it easy to identify which data set contains the speeded simulees. In addition, the HYBRID model correctly identified the data as not speeded in this simulated data set, and the estimated IRT parameters remain unbiased.

INSERT FIGURE 2b HERE

Field Study 1

When analyzing the field study data for speededness, we relied on the traditional approach of observing the performance of the various subgroups of examinees on items appearing three-fourths of the way through the test. Since items 26-32 of the test booklet were omitted from our analyses, the item sequence was relabeled 1 through 40. Item 30, therefore, represents the point at which three-fourths of the test has been completed. Table 3 presents the proportions of the various subgroups, across items, classified as belonging to the "switch" group.

INSERT TABLE 3 HERE

At the point where 75% of the test is completed, 21% of Blacks, 18% of Hispanics, and 17% of Asians were identified as having switched to a random response strategy. These proportions are dramatic when contrasted with the fact that only 8% of the White examinees apparently switched response strategies at this juncture. Similarly, 18% of the ESL and 11% of the EPL subgroups switched to a random response strategy at this point in the test.

As expected, the increase in speededness continues as we extend our analysis to the point at which 87.5% of the test was completed, i.e., item 35 in Table 3. At this point, 29% of the Blacks had switched to a random response strategy, compared to 12% of Whites. Unexpectedly, we found that Asians were affected by the speededness of the test. The previous analysis using only omitted responses did not indicate any evidence of speededness among Asians; in fact, these data include similar proportions of omitted responses for Whites and Asians. Moreover, nearly 25% of the ESL group switched to random responding when attempting the items linked to the next-to-last reading passage. Figure 3 presents the increase in the cumulative proportions of the three minority groups compared to White examinees.

INSERT FIGURE 3 HERE

Field Study 2

In the second field study, we set out to examine the effects of differing time limits on test speededness. The analysis focused on the cumulative proportions of EPL and ESL examinees


who switched to a random response strategy in the 45-minute and 60-minute test conditions. Table 4 shows the cumulative proportions of the switching groups over the last twenty test items.

INSERT TABLE 4 HERE

Looking particularly at the distributions after item 36, the last item associated with the penultimate reading passage, we see, as expected, that both testing time and linguistic competence seem to affect the switching strategy. In general, those examinees who had 60 minutes to complete the test had a somewhat lower rate of switching to a random response strategy: 19% for the 60-minute group versus 25% for the shorter, 45-minute time condition. The difference is greater among EPL students, 26% vs. 16%. Contrary to our expectation, the shortened testing time seemed not to affect strategy switching for the ESL examinees. Across groups, however, the organization and structure of the reading test is captured well by the cumulative proportions in each of the strategy switching categories presented in Table 4. We see, for example, that there are two points where the cumulative proportions show marked changes, at the 34th and 37th items. These two items correspond well to the structure of the test itself, since both are the first items in the testlets associated with the test's last two reading passages. Figure 4 presents the increase in the cumulative proportions of the ESL and non-ESL populations affected by speededness.

INSERT FIGURE 4 HERE

CONCLUSION

The extended HYBRID model appears to be a promising method for estimating the effects of test length and testing time on test speededness. The analyses of the simulated data, where the item and ability parameters were known, suggest that the model was robust to both a "switching" and an "omitting" speededness strategy. The two field studies revealed much the same thing. The extended HYBRID model tended to characterize examinees' strategy use when it was salient. Perhaps more importantly, it detected the extraneous influences of strategy switching in the latter portions of a test on the estimated model parameters, both item parameters and ability parameters.

When we used this model to map the salient strategy switches onto the structure of the multiple-choice reading comprehension tests in our field studies, we found that strategy switching followed patterns closely related to the testlet structure of the tests. Examinees tended to switch their response strategies more abruptly between items that are structurally linked to test passages near the end of the test, as expected by classical models of test speededness.

Moreover, the model pointed up the potential differences in test speededness for members of different ethnic and linguistic groups. These differences, however, may be more salient on tests of reading comprehension, where surface features of language play a prominent role in the test content. Further research is needed to shed light on the important issue of how test speededness affects the scores of various subgroups on tests that vary in content and purpose. This research focused on tests of reading comprehension, and the model worked relatively well. It remains to be seen whether this model-based approach to detecting test speededness will work as well on tests of mathematical ability or vocabulary, where the comprehension demands would be expected to affect performance less.


REFERENCES

Aitkin, M. (1989). Direct likelihood inference. Unpublished manuscript. Israel: Tel Aviv University.

Akaike, H. (1985). Prediction and entropy. In A.C. Atkinson & S.E. Fienberg (Eds.), A celebration of statistics (pp. 1-24). NY: Springer-Verlag.

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-332.

Bejar, I.I. (1985). Test speededness under number-right scoring: An analysis of the Test of English as a Foreign Language. ETS Research Report (RR-85-11). Princeton, NJ: Educational Testing Service.

Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: An application of an EM algorithm. Psychometrika, 46, 443-459.

Cronbach, L.J., & Warrington, W.G. (1951). Time-limit tests: Estimating their reliability and degree of speeding. Psychometrika, 16(2), 167-188.

Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 39, 1-38.

Donlon, T.F. (1980). An annotated bibliography of studies of test speededness. GRE Board Research Report (GREB 76-9R). Princeton, NJ: Educational Testing Service.

Gulliksen, H. (1987). Theory of mental tests. Hillsdale, NJ: Erlbaum Associates.

Gitomer, H., & Yamamoto, K. (1991). Performance modeling that integrates latent trait and class theory. ETS Research Report (RR-91-1). Princeton, NJ: Educational Testing Service.

Haertel, E.H. (1984a). An application of latent class models to assessment data. Applied Psychological Measurement, 8, 333-346.

Haertel, E.H. (1984b). Detection of a skill dichotomy using standardized achievement test items. Journal of Educational Measurement, 21, 59-72.

Haertel, E.H. (1991, in press). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement.

Hambleton, R., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer-Nijhoff Publishing.

Harwell, M.R., Baker, F.B., & Zwarts, M. (1988). Item parameter estimation via marginal maximum likelihood and an EM algorithm: A didactic. Journal of Educational Statistics, 13(3), 234-271.

Hulin, C.L., Drasgow, F., & Parsons, C.K. (1983). Item response theory: Application to psychological measurement. IL: Dow Jones-Irwin.

Kendall, M.G., & Stuart, A. (1967). The Advanced Theory of Statistics (Vol.2). NY: Hafner.

Kyllonen, P.C., Lohman, D.F., & Snow, R.E. (1984). Effects of aptitudes, strategy training, and task facets on spatial task performance. Journal of Educational Psychology, 76(1), 130-145.


Kyllonen, P.C., Lohman, D.F., & Woltz, D.J. (1984). Componential modeling of alternative strategies for performing spatial tasks. Journal of Educational Psychology, 76(4), 1325-1345.

Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum Associates.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Mislevy, R.J. (1983). Item response models for grouped data. Journal of Educational Statistics, 8, 271-288.

Mislevy, R.J. (1991, in press). Foundations of a new test theory. In N. Frederiksen, R.J. Mislevy, & Bejar (Eds.), Test theory for a new generation of tests. Hillsdale, NJ: Erlbaum.

Mislevy, R.J., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(2), 195-215.

Mislevy, R.J., Yamamoto, K., & Anacker, S. (1991, in press). Toward a test theory for assessing student understanding. In R.A. Lesh (Ed.), Assessing higher-level understanding in middle-school mathematics. Hillsdale, NJ: Erlbaum.

Nunnally, J.C. (1978). Psychometric theory. NY: McGraw-Hill, Inc.

Rindler, S.E. (1979). Pitfalls in assessing test speededness. Journal of Educational Measurement, 16(4), 261-270.

Schmitt, A.P., Dorans, N.J., Crone, C.R., & Maneckshana, B.T. (1991). Differential speededness and item omit patterns on the SAT. ETS Research Report (RR-91-50). Princeton, NJ: Educational Testing Service.

Secolsky, C. (1989). Accounting for random responding at the end of the test in assessing speededness on the Test of English as a Foreign Language. ETS Research Report (RR-89-11). Princeton, NJ: Educational Testing Service.

Snow, R.E., & Peterson, P. (1985). Cognitive analyses of tests: Implications for redesign. In S. Embretson (Ed.), Test design: Developments in psychology and psychometrics. NY: Academic Press.

Stafford, R.E. (1971). The speededness quotient: A new descriptive statistic for tests. Journal of Educational Measurement, 8(4), 275-278.

Swineford, F. (1956). Technical manual for users of test analyses. ETS Statistical Report (SR 56-42). Princeton, NJ: Educational Testing Service.

Swineford, F. (1974). The test analysis manual. ETS Statistical Report (SR-74-06). Princeton, NJ: Educational Testing Service.

Wise, L.L. (1986). Latent trait models for partially speeded tests. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.


Yamamoto, K. (1987). A model that combines IRT and latent class models. Unpublished doctoral dissertation, University of Illinois, Champaign-Urbana, IL.

Yamamoto, K. (1989). Hybrid model of IRT and latent class models. ETS Research Report (RR-89-41). Princeton, NJ: Educational Testing Service.

Yamamoto, K. (1990). HYBIL: A computer program to estimate the HYBRID model. Princeton, NJ: Educational Testing Service.

Yamamoto, K., & Gitomer, D. (1993). Application of a HYBRID model to a test of cognitive skill representation. In N. Frederiksen, I. Bejar, & R. Mislevy (Eds.), Test theory for a new generation of tests. Hillsdale, NJ: Erlbaum.


List of Tables

TABLE 1. Estimated item parameters in the simulated data set.

TABLE 2. Fit of the model and accuracy of estimated parameters by the HYBRID model and the 2PL IRT model.

TABLE 3. Cumulative proportion of "switching" response patterns by examinee subgroup.

TABLE 4. The cumulative proportion of EPL and ESL examinees in the strategy switching groups for each time condition.


List of Figures

FIGURE 1. Comparisons of estimated IRT parameters for the IRT only and the HYBRID models.

FIGURE 2a. Estimated and true cumulative proportions of the population that switched to random responding in the simulated speeded data.

FIGURE 2b. Estimated and true cumulative proportions of the population that switched to random responding in the simulated unspeeded data.

FIGURE 3. Estimated cumulative proportion of the population that switched to random responding, by ethnicity, on the reading comprehension test.

FIGURE 4. Estimated cumulative proportion of the population that switched to random responding for ESL and EPL examinees under the 60-minute time limit condition.


TABLE 1. Estimated Item Parameters in the Simulated Data Set


                  (1) IRT only       (2) HYBRID model           (3) IRT only with
                                                                    not-presented
Item               a       b          a       b     f(k) (%)     a       b

41                1.15   -2.78       1.11   -2.89     0.0       1.17   -2.72
42                1.39   -1.59       1.38   -1.65     0.0       1.42   -1.57
43                0.68    0.27       0.68    0.22     0.1       0.69    0.27
44                0.71    1.59       0.74    1.51     0.0       0.74    1.54
45                0.93    0.23       0.94    0.18     0.1       0.95    0.23
46                0.50    1.55       0.51    1.48     0.1       0.51    1.51
47                1.22   -0.52       1.22   -0.56     0.1       1.24   -0.51
48                0.95   -0.43       0.95   -0.49     0.1       0.97   -0.43
49                0.73    1.06       0.75    1.00     0.3       0.76    1.03
50                0.91   -0.15       0.92   -0.20     0.3       0.93   -0.14
51                1.14    0.96       1.21    0.89     0.8       1.23    0.93
52                0.61   -0.89       0.65   -1.00     1.9       0.66   -0.91
53                0.72    2.49       1.01    2.08     1.8       0.99    2.15
54                0.92    1.26       1.08    1.11     1.5       1.08    1.17
55                0.62   -0.45       0.72   -0.63     1.2       0.72   -0.56
56                0.92    0.18       1.08    0.04     1.1       1.10    0.08
57                0.54    0.88       0.61    0.68     1.7       0.61    0.72
58                0.92    1.14       1.22    0.93     1.6       1.25    0.99
59                0.54    3.18       1.15    2.22     1.6       1.15    2.26
60                0.78    1.56       1.10    1.28     2.2       1.09    1.33
61                0.23   -2.62       0.43   -2.73     1.6       0.44   -2.59
62                0.69    1.27       0.86    1.01     1.7       0.88    1.05
63                0.68   -0.41       0.96    0.84     1.7       0.93    0.88
64                0.14    1.14       0.57   -0.41     1.9       0.58   -0.33
65                0.50   -0.91       1.14   -1.27     1.8       1.16   -1.18
66                0.34    3.71       0.73    2.26     2.0       0.68    2.43
67                0.72    0.17       1.47   -0.25     2.0       1.46   -0.19
68                0.56    2.12       1.33    1.44     2.3       1.31    1.47
69                0.24   -2.06       1.01   -2.31     2.1       1.09   -2.01
70                0.33   -0.92       0.81   -1.59     1.0       0.90   -1.41

Fit of the model
-2*log-likelihood¹  189,731           185,348                   not comparable
No. of parameters       140               229
AIC                 190,011           185,806

¹ The -2*log-likelihood for the IRT model on the omit data is not comparable because the responses treated as not presented were never included in calculating the likelihood; hence, it is not reported here.


TABLE 2. Fit of the Model and Accuracy of Estimated Parameters² by the HYBRID Model and the 2PL IRT Model

RMSD of item parameter estimates against option (3)

                                (1) IRT only                (2) HYBRID
Parameters                  Mean Deviation   RMSD       Mean Deviation   RMSD

Items 1-70     Slope a           .02          .22            .02          .03
               Location b        .09          .24           -.06          .08
Items 1-50     Slope a           .02          .03            .02          .02
               Location b        .00          .03           -.06          .06
Items 51-60    Slope a           .21          .25            .00          .01
               Location b        .22          .33           -.05          .06
Items 61-70    Slope a           .48          .51            .01          .05
               Location b        .39          .53           -.11          .14

Ability (simulees)
1-2000                           .03          .06           -.00          .05
2001-2500                       -.03          .08           -.01          .07
2501-3000                       -.10          .21           -.00          .09
2501-3000, θ<0                   .02          .08           -.02          .06
2501-3000, 0<θ<1                -.02          .07           -.02          .05
2501-3000, 1<θ<2                -.34          .25            .04          .10
2501-3000, 2<θ                  -1.0          .61            .22          .19

² Deviation was calculated using the following formulas: for the slope parameter, deviation = 1 - estimate (by method 1 or 2) / estimate (by method 3); for the location and ability parameters, deviation = estimate (by method 1 or 2) - estimate (by method 3). The RMSD was calculated using the above values.


TABLE 3. Cumulative Proportion of "Switching" Response Patterns by Examinee Subgroup

          Race/Ethnicity                           Language             Total
Item    Asian   Black   Hispanic   White        ESL      EPL
n        348     920      514      3819         405     5359            5997

20       .00     .00      .00       .00         .00      .00             .00
21       .06     .06      .07       .02         .06      .03             .04
22       .07     .07      .08       .02         .07      .04             .04
23       .07     .07      .08       .02         .07      .04             .04
24       .07     .07      .08       .02         .07      .04             .04
25       .09     .10      .10       .03         .09      .05             .06
26       .10     .13      .11       .04         .11      .07             .07
27       .10     .13      .11       .05         .12      .07             .07
28       .12     .15      .12       .05         .13      .08             .08
29       .14     .17      .15       .06         .15      .09             .10
30       .14     .18      .15       .06         .15      .09             .10
31       .17     .21      .18       .08         .18      .11             .12
32       .20     .25      .22       .10         .21      .14             .14
33       .20     .25      .22       .10         .21      .14             .14
34       .21     .26      .23       .11         .22      .15             .15
35       .24     .29      .26       .12         .25      .17             .18
36       .24     .29      .26       .12         .25      .17             .18
37       .26     .31      .28       .14         .27      .19             .20
38       .28     .34      .31       .16         .29      .21             .22
39       .28     .34      .31       .16         .30      .21             .22
40       .34     .38      .36       .27         .36      .30             .31


TABLE 4. The Cumulative Proportion of EPL and ESL Examinees in the Strategy Switching Groups for Each Time Condition

                 45 minutes                     60 minutes
Item #      All     EPL     ESL            All     EPL     ESL

20          .00     .00     .00            .00     .00     .00
21          .00     .01     .02            .00     .00     .01
22          .00     .01     .03            .00     .01     .01
23          .00     .01     .03            .02     .01     .02
24          .03     .02     .06            .04     .02     .05
25          .06     .05     .09            .06     .04     .09
26          .06     .06     .09            .07     .05     .10
27          .08     .08     .11            .08     .06     .11
28          .08     .08     .11            .08     .06     .11
29          .09     .09     .11            .08     .06     .11
30          .09     .09     .12            .09     .06     .11
31          .09     .09     .12            .09     .06     .11
32          .10     .09     .12            .09     .07     .12
33          .10     .10     .12            .09     .07     .12
34          .16     .16     .21            .14     .11     .19
35          .17     .17     .22            .14     .11     .19
36          .17     .17     .22            .14     .11     .19
37          .25     .26     .25            .19     .16     .23
38          .29     .30     .27            .22     .20     .26
39          .29     .31     .27            .22     .20     .27
40          .30     .32     .27            .23     .21     .27


Figure 1: Comparison of Estimated IRT Parameters for the IRT Only and the HYBRID Models

[Figure 1 comprises scatter plots of the slope, location, and ability parameter estimates from the IRT only model and from the HYBRID model, each plotted against the corresponding Method 3 (IRT only with not-presented responses) estimates on the horizontal axis. The item panels distinguish items 1-50 from items 51-70; the ability panels distinguish simulees 1-2000 from simulees 2001-3000.]


FIGURE 2a. Estimated and true cumulative proportions of the population that switched to random responding in the simulated speeded data. (X: estimated proportion; solid line: true proportion.)

FIGURE 2b. Estimated and true cumulative proportions of the population that switched to random responding in the simulated unspeeded data. (X: estimated proportion. Vertical axis: cumulative percentage, 0-40%; horizontal axis: item, 1-70.)


FIGURE 3. Estimated cumulative proportion of the population that switched to random responding, by ethnicity, on the reading comprehension test. (Vertical axis: cumulative percentage, 0-40%; horizontal axis: item sequence, items 20-40. Groups plotted: Asian, Black, Hispanic, White.)


FIGURE 4. Estimated cumulative proportion of the population that switched to random responding for ESL and EPL examinees under the 60-minute time limit condition. (Vertical axis: cumulative percentage, 0-25%; horizontal axis: item sequence, items 20-40. Groups plotted: English is primary language; English is second language.)