11 adaptive testing-irt

Adaptive Testing (Item Response Theory) Timothy K. Shih


Page 1: 11 adaptive testing-irt

Adaptive Testing (Item Response Theory)

Timothy K. Shih

Page 2: 11 adaptive testing-irt

Item Response Theory

1. The Item Characteristic Curve

2. Item Characteristic Curve Models

3. Estimating Item Parameters

4. The Test Characteristic Curve

5. Estimating an Examinee’s Ability

6. The Information Function

7. Test Calibration

8. Specifying the Characteristics of a Test

Source: FRANK B. BAKER, University of Wisconsin

Page 3: 11 adaptive testing-irt

Item Characteristic Curve

• What is an Item Characteristic Curve?
– At each ability level, there is a certain probability that an examinee with that ability will give a correct answer to the item
– This probability is denoted by P(θ)

1.The Item Characteristic Curve

Page 4: 11 adaptive testing-irt

Item Characteristic Curve under one-parameter model

1.The Item Characteristic Curve

Higher ability, higher probability

Page 5: 11 adaptive testing-irt

Three Item Characteristic Curves with the same discrimination

1.The Item Characteristic Curve

Higher difficulty lower probability

Page 6: 11 adaptive testing-irt

Three Item Characteristic Curves with the same difficulty

1.The Item Characteristic Curve

Higher discrimination, steeper curve: the probability is lower below the item difficulty and higher above it

Page 7: 11 adaptive testing-irt

Logistic Function

• The Logistic Function

$$P(\theta) = \frac{1}{1 + e^{-L}} = \frac{1}{1 + e^{-a(\theta - b)}}$$

– e is the constant 2.718
– b is the difficulty
• typical value is between -3 and 3
– a is the discrimination
• typical value is between -2.80 and 2.80
– L = a(Θ-b) is the logistic deviate
– Θ is an ability level

2. Item Characteristic Curve Models

Page 8: 11 adaptive testing-irt

Logistic Function(two-parameter model)

• Example:
– b = 1.0 (difficulty); a = 0.5 (discrimination)
– Illustrative computation with ability level: -3 (Θ=-3)

1. L = a(Θ-b) = 0.5*(-3.0-1.0) = -2.0

2. EXP(-L) = EXP(2.0) = 2.718^2.0 = 7.389

3. 1 + EXP(-L) = 1 + 7.389 = 8.389

4. P(Θ) = 1/(1+EXP(-L)) = 1/8.389 = 0.12

2. Item Characteristic Curve Models

Page 9: 11 adaptive testing-irt

Logistic Function(two-parameter model)

Ability Logit EXP(-L) 1+EXP(-L) P

-3 -2 7.389 8.389 0.12

-2 -1.5 4.482 5.482 0.18

-1 -1 2.718 3.718 0.27

0 -0.5 1.649 2.649 0.38

1 0 1 2 0.5

2 0.5 0.607 1.607 0.62

3 1 0.368 1.368 0.73

2. Item Characteristic Curve Models
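The four computation steps of the two-parameter example can be sketched in Python. This is a minimal illustration only; the function name `icc_2pl` is an assumption made for this sketch and does not come from the slides. The loop recomputes the Logit, EXP(-L), 1+EXP(-L), and P columns for the item with b = 1.0 and a = 0.5.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    logit = a * (theta - b)          # L, the logistic deviate
    return 1.0 / (1.0 + math.exp(-logit))

# Item from the example: b = 1.0 (difficulty), a = 0.5 (discrimination).
a, b = 0.5, 1.0
for theta in range(-3, 4):
    logit = a * (theta - b)
    exp_term = math.exp(-logit)
    print(theta, logit, round(exp_term, 3), round(1 + exp_term, 3),
          round(icc_2pl(theta, a, b), 2))
```

Setting a = 1.0 in the same function gives the one-parameter (Rasch) curve.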

Page 10: 11 adaptive testing-irt

Logistic Function(two-parameter model)

2. Item Characteristic Curve Models

b = 1.0 (difficulty); a = 0.5 (discrimination)

Page 11: 11 adaptive testing-irt

Logistic Function(one-parameter model)

• One Parameter Logistic Model (Rasch)
– The discrimination parameter of the two-parameter logistic model is fixed at a value of a = 1.0 for all items; only the difficulty parameter can take on different values

$$P(\theta) = \frac{1}{1 + e^{-1(\theta - b)}} = \frac{1}{1 + e^{-(\theta - b)}}$$

b = difficulty
a = discrimination

2. Item Characteristic Curve Models

Page 12: 11 adaptive testing-irt

Logistic Function(one-parameter model)

• Example:
– b = 1.0 (difficulty)
– Illustrative computation with ability level: -3 (Θ=-3)

1. L = Θ-1.0 = -3.0-1.0 = -4.0

2. EXP(-L) = EXP(4.0) = 2.718^4.0 = 54.598

3. 1 + EXP(-L) = 1 + 54.598 = 55.598

4. P(Θ) = 1/(1+EXP(-L)) = 1/55.598 = 0.02

2. Item Characteristic Curve Models

Page 13: 11 adaptive testing-irt

Logistic Function(one-parameter model)

Ability Logit EXP(-L) 1+EXP(-L) P

-3 -4 54.598 55.598 0.02

-2 -3 20.086 21.086 0.05

-1 -2 7.389 8.389 0.12

0 -1 2.718 3.718 0.27

1 0 1 2 0.5

2 1 0.368 1.368 0.73

3 2 0.135 1.135 0.88

2. Item Characteristic Curve Models

Page 14: 11 adaptive testing-irt

Logistic Function(one-parameter model)

2. Item Characteristic Curve Models

a = 1.0 (fixed); b = 1.0

Page 15: 11 adaptive testing-irt

Logistic Function(three-parameter model)

• Three Parameter Model
– One of the facts of life in testing is that examinees will get items correct by guessing. Thus, the probability of correct response includes a small component that is due to guessing.

$$P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}$$

– b is difficulty
– a is discrimination
– c is guessing
» Theoretical value is between 0 and 1.0
» But values of c > 0.35 are not considered acceptable
» Hence c is between 0 and 0.35
– Θ is an ability level

2. Item Characteristic Curve Models

That is why multiple choice questions have 4 answers

Page 16: 11 adaptive testing-irt

Logistic Function(three-parameter model)

• Example:
– b = 1.5 (difficulty); a = 1.3 (discrimination); c = 0.2 (guessing)
– Illustrative computation with ability level: -3 (Θ=-3)

1. L = a(Θ-b) = 1.3*(-3.0-1.5) = -5.85

2. EXP(-L) = EXP(5.85) = 2.718^5.85 = 347.234

3. 1 + EXP(-L) = 1 + 347.234 = 348.234

4. 1/(1+EXP(-L)) = 1/348.234 = 0.0029

5. P(Θ) = c + (1 - c) * 0.0029 = 0.2 + (1 - 0.2) * 0.0029 = 0.2 + 0.8 * 0.0029 = 0.2 + 0.0023 = 0.2023

2. Item Characteristic Curve Models
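The three-parameter computation can be sketched the same way. The name `icc_3pl` is an assumption for illustration; the body follows the steps of the example directly (logistic term first, then the guessing adjustment).

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter ICC: P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    logistic_part = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic_part

# Item from the example: b = 1.5, a = 1.3, c = 0.2, evaluated at theta = -3.
p = icc_3pl(-3.0, a=1.3, b=1.5, c=0.2)
print(round(p, 4))
```

With c = 0 the function reduces to the two-parameter model, which is one way to sanity-check it.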

Page 17: 11 adaptive testing-irt

Logistic Function(three-parameter model)

Ability Logit EXP(-L) 1+EXP(-L) P

-3 -5.85 347.234 348.234 0.2

-2 -4.55 94.632 95.632 0.21

-1 -3.25 25.79 26.79 0.23

0 -1.95 7.029 8.029 0.3

1 -0.65 1.916 2.916 0.47

2 0.65 0.522 1.522 0.73

3 1.95 0.142 1.142 0.9

2. Item Characteristic Curve Models

Page 18: 11 adaptive testing-irt

Logistic Function(three-parameter model)

2. Item Characteristic Curve Models

– b = 1.5 (difficulty); a = 1.3 (discrimination); c = 0.2 (guessing)

Page 19: 11 adaptive testing-irt

Negative Discrimination

• Most test items discriminate in a positive manner
– the probability of correct response increases as the ability level increases
• Some items have negative discrimination. In such items, the probability of correct response decreases as the ability level increases from low to high

2. Item Characteristic Curve Models

Page 20: 11 adaptive testing-irt

Negative Discrimination

2. Item Characteristic Curve Models

Page 21: 11 adaptive testing-irt

Negative Discrimination

Items with negative discrimination occur in two ways.
• The incorrect response to a two-choice item will always have a negative discrimination parameter if the correct response has a positive value.
• Sometimes the correct response to an item will yield a negative discrimination index.
– This tells you that something is wrong with the item: either it is poorly written or there is some misinformation prevalent among the high-ability students.
• For most of the item response theory topics of interest, the value of the discrimination parameter will be positive.

2. Item Characteristic Curve Models

Page 22: 11 adaptive testing-irt

Discussion

Incorrect

Correct

2. Item Characteristic Curve Models

Page 23: 11 adaptive testing-irt

Discussion

1. The two item characteristic curves have the same value for the difficulty parameter (b = 1.0)

2. And the discrimination parameters have the same absolute value. However, they have opposite signs, with the correct response being positive and the incorrect response being negative.

2. Item Characteristic Curve Models

Page 24: 11 adaptive testing-irt

Observed Proportion

• M examinees respond to the N items in the test
– These examinees are divided into J groups along the ability scale so that all the examinees within a given group have the same ability level θj
• There are mj examinees within group j, where j = 1, 2, 3, . . ., J.
– Within a particular ability score group, rj examinees answer the given item correctly.
• At an ability level of θj, the observed proportion of correct response is p(θj) = rj/mj
• p(θj) is an estimate of the probability of correct response at ability level θj

3. Estimating Item Parameters

Page 25: 11 adaptive testing-irt

Observed Proportion

• If the observed proportions of correct response in each ability group are plotted, the result will look like this

3. Estimating Item Parameters

Page 26: 11 adaptive testing-irt

Find the ICC that best fits the observed proportions of correct response

1. Select a model for the curve to be fitted
– the two-parameter model will be employed here

2. Set initial values for the item parameters
– b = 0.0, a = 1.0

3. Using these estimates, the value of P(θj) is computed at each ability level via the equation of the two-parameter model.

4. The agreement of the observed value of p(θj) and computed value P(θj) is determined across all ability groups.

5. Adjustments to the estimated item parameters are found that result in better agreement between the ICC defined by the estimated values of the parameters and the observed proportions of correct response.

6. This process is continued until the adjustments get so small that little improvement in the agreement is possible.

7. At this point, the estimation procedure is terminated and the current values of b and a are the item parameter estimates.

3. Estimating Item Parameters
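The seven steps above can be sketched as a small fitting loop. The slides do not say how the adjustments are computed, so this sketch swaps in plain gradient ascent on the grouped-data log-likelihood, working in slope-intercept form L = a*θ + d with d = -a*b. The function names, the step size `rate`, the tolerance, and the synthetic data (generated from a = 1.27, b = -0.39) are all assumptions made for illustration.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_2pl(thetas, m, r, a=1.0, b=0.0, rate=0.5, tol=1e-10, max_iter=20000):
    """Fit a and b to grouped data (m[j] examinees, r[j] correct at thetas[j])."""
    d = -a * b                      # intercept of L = a * theta + d
    total = float(sum(m))
    for _ in range(max_iter):
        grad_a = grad_d = 0.0
        for t, mj, rj in zip(thetas, m, r):
            p = logistic(a * t + d)
            resid = rj - mj * p     # observed minus expected number correct
            grad_a += resid * t
            grad_d += resid
        step_a = rate * grad_a / total
        step_d = rate * grad_d / total
        a += step_a
        d += step_d
        # Stop when the adjustments become negligibly small (steps 6 and 7).
        if abs(step_a) < tol and abs(step_d) < tol:
            break
    return a, -d / a                # convert the intercept back to difficulty b

# Synthetic check: noise-free counts generated from known parameters.
thetas = [-3, -2, -1, 0, 1, 2, 3]
m = [1000] * len(thetas)
r = [mj * logistic(1.27 * (t - (-0.39))) for t, mj in zip(thetas, m)]
a_hat, b_hat = fit_2pl(thetas, m, r)
print(round(a_hat, 2), round(b_hat, 2))
```

The starting values b = 0.0 and a = 1.0 match step 2; with real, noisy data the recovered values would only approximate the generating parameters.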

Page 27: 11 adaptive testing-irt

The Chi-square goodness-of-fit index

$$\chi^2 = \sum_{j=1}^{J} m_j \frac{\left[p(\theta_j) - P(\theta_j)\right]^2}{P(\theta_j)\,Q(\theta_j)}$$

– J is the number of ability groups
– Θj is the ability level of group j
– mj is the number of examinees having ability Θj
– p(Θj) is the observed proportion of correct response for group j
– P(Θj) is the probability of correct response for group j computed from the ICC model using the parameter estimates
– Q(Θj) = 1 - P(Θj)

3. Estimating Item Parameters
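A minimal sketch of the chi-square index for a two-parameter item follows. It assumes the index is the sum over ability groups of m_j [p(θ_j) - P(θ_j)]² / [P(θ_j) Q(θ_j)] with Q = 1 - P; the function name and the single-group data in the usage lines are hypothetical.

```python
import math

def chi_square_fit(thetas, m, p_obs, a, b):
    """Chi-square goodness-of-fit index for a two-parameter ICC."""
    total = 0.0
    for theta, mj, pj in zip(thetas, m, p_obs):
        P = 1.0 / (1.0 + math.exp(-a * (theta - b)))   # model probability
        Q = 1.0 - P
        total += mj * (pj - P) ** 2 / (P * Q)
    return total

# One group of 10 examinees at theta = b, where the model says P = 0.5,
# with an observed proportion of 0.7: 10 * 0.2**2 / 0.25 = 1.6.
index = chi_square_fit([1.0], [10], [0.7], a=1.0, b=1.0)
print(round(index, 3))
```

The index is zero when every observed proportion lies exactly on the fitted curve and grows as the proportions scatter away from it.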

Page 28: 11 adaptive testing-irt

The Chi-square goodness-of-fit index

• If the value of the “Chi-square goodness-of-fit index” is greater than a criterion value
– the item characteristic curve specified by the values of the item parameter estimates does not fit the data
– This can have two causes:
• the wrong item characteristic curve model may have been employed.
• the values of the observed proportions of correct response are so widely scattered that a good fit, regardless of model, cannot be obtained.

3. Estimating Item Parameters

Page 29: 11 adaptive testing-irt

The Group Invariance of Item Parameters

• Assume two groups of examinees are drawn from the same population of examinees

• The first group has a range of ability scores from -3 to -1, with a mean of -2; The second group has a range of ability scores from +1 to +3 with a mean of +2

• the observed proportion of correct response to a given item is computed from the item response data for every ability level within each of the two groups.

3. Estimating Item Parameters

Page 30: 11 adaptive testing-irt

The Group Invariance of Item Parameters

For the first group, the proportions of correct response are plotted as this

The maximum likelihood procedure is then used to fit an item characteristic curve to the data, and numerical values of the item parameter estimates, b(1) = -.39 and a(1) = 1.27, were obtained.

3. Estimating Item Parameters

Page 31: 11 adaptive testing-irt

The Group Invariance of Item Parameters

For the second group, the proportions of correct response are plotted like this

The maximum likelihood procedure is then used to fit an item characteristic curve to the data, and numerical values of the item parameter estimates, b(2) = -.39 and a(2) = 1.27, were obtained.

3. Estimating Item Parameters

Page 32: 11 adaptive testing-irt

The Group Invariance of Item Parameters

3. Estimating Item Parameters

• b(1) = b(2) and a(1) = a(2)
• The item parameters are group invariant.
• The values of the item parameters are a property of the item, not of the group that responded to the item.
• The value of the classical item difficulty index is not group invariant.

Page 33: 11 adaptive testing-irt

True score

$$TS_j = \sum_{i=1}^{N} P_i(\theta_j)$$

TSj is the true score for examinees with ability level θj.
i denotes an item
Pi(θj) depends upon the particular ICC model employed (i.e., computed from the ICC model)

4. The Test Characteristic Curve

Page 34: 11 adaptive testing-irt

True score

• Example
– with two-parameter model; at an ability level of 1.0.

– Item 1: P1(1.0) = 1/(1 + exp(-0.5 (1.0 - (-1.0)))) = 0.73

– Item 2: P2(1.0) = 1/(1 + exp(-1.2 (1.0 - (0.75)))) = 0.57

– Item 3: P3(1.0) = 1/(1 + exp(-0.8 (1.0 - (0)))) = 0.69

– Item 4: P4(1.0) = 1/(1 + exp(-1.0 (1.0 - (0.5)))) = 0.62

4. The Test Characteristic Curve

Page 35: 11 adaptive testing-irt

True score

4. The Test Characteristic Curve

Page 36: 11 adaptive testing-irt

True score

$$TS = \sum_{i=1}^{4} P_i(1.0) = 0.73 + 0.57 + 0.69 + 0.62 = 2.61$$

4. The Test Characteristic Curve
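The true score at a given ability level is just the sum of the item probabilities, which can be sketched as below. The function name `true_score` and the `(a, b)` tuple layout are assumptions for illustration; the item parameters are the four from the example.

```python
import math

def true_score(theta, items):
    """Sum of two-parameter ICC probabilities over all items at ability theta.

    items is a list of (a, b) pairs: discrimination and difficulty.
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in items)

# (a, b) for items 1 to 4 of the example.
items = [(0.5, -1.0), (1.2, 0.75), (0.8, 0.0), (1.0, 0.5)]
ts = true_score(1.0, items)
print(round(ts, 3))
```

Summing unrounded probabilities gives about 2.62; the 2.61 in the example comes from adding the four probabilities after rounding each to two decimals. Evaluating `true_score` over a range of ability values traces out the test characteristic curve.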

Page 37: 11 adaptive testing-irt

Test Characteristic Curve

• Test Characteristic Curve (TCC)– The vertical axis would be the true scores and

would range from zero to the number of items in the test

– The horizontal axis would be the ability scale

4. The Test Characteristic Curve

Page 38: 11 adaptive testing-irt

Test Characteristic Curve

• The primary role of the TCC in IRT is to provide a means of transforming ability scores to true scores

• Given your ability, provides your “True Score”

4. The Test Characteristic Curve

Page 39: 11 adaptive testing-irt

Primary purpose for administering a test to an examinee

• Under IRT, the primary purpose for administering a test to an examinee is to locate that person on the ability scale. If such an ability measure can be obtained for each person taking the test, two goals can be achieved.
– The examinee can be evaluated in terms of how much underlying ability he or she possesses.
– Comparisons among examinees can be made for purposes of assigning grades, awarding scholarships, etc.

5. Estimating an Examinee’s Ability

Page 40: 11 adaptive testing-irt

Estimating an Examinee’s Ability

• Ability Estimation Procedures

$$\hat{\theta}_{s+1} = \hat{\theta}_s + \frac{\sum_{i=1}^{N} a_i\left[u_i - P_i(\hat{\theta}_s)\right]}{\sum_{i=1}^{N} a_i^2\,P_i(\hat{\theta}_s)\,Q_i(\hat{\theta}_s)}$$

Θ^s is the estimated ability of the examinee within iteration s

ai is the discrimination parameter of item i, i = 1, 2, . . ., N

ui is the response made by the examinee to item i:
ui = 1 for a correct response
ui = 0 for an incorrect response

Pi(θ^s) is the probability of correct response to item i, under the given ICC model, at ability level θ^ within iteration s.

Qi(θ^s) = 1 - Pi(θ^s) is the probability of incorrect response to item i, under the given ICC model, at ability level θ^ within iteration s.

5. Estimating an Examinee’s Ability

Page 41: 11 adaptive testing-irt

Estimating an Examinee’s Ability

• Example
– 3 items test:
• Item_1: b=-1; a=1.0
• Item_2: b=0; a=1.2
• Item_3: b=1; a=0.8
– Under ICC two-parameter model
– The examinee’s item responses were:
• Item_1: 1
• Item_2: 0
• Item_3: 1

1st iteration:

The examinee’s ability is set to θ^s = 1.0

item u P(1) Q=(1-P) a(u-P) a*a(PQ)

1 1 0.88 0.12 0.119 0.105

2 0 0.77 0.23 -0.922 0.255

3 1 0.5 0.5 0.4 0.160

sum -0.403 0.520

ΔΘ^s = -0.403/0.520 = -0.773,

Θ^s+1 = 1.0 - 0.773 = 0.227

5. Estimating an Examinee’s Ability

Page 42: 11 adaptive testing-irt

Estimating an Examinee’s Ability

2nd iteration:

item u P(0.227) Q=(1-P) a(u-P) a*a(PQ)

1 1 0.77 0.23 0.227 0.175

2 0 0.57 0.43 -0.681 0.353

3 1 0.35 0.65 0.520 0.146

sum 0.066 0.674

ΔΘ^s = 0.066/0.674 = 0.097,

Θ^s+1 = 0.227 + 0.097 = 0.324

3rd iteration:

item u P(0.324) Q=(1-P) a(u-P) a*a(PQ)

1 1 0.79 0.21 0.2102 0.1660

2 0 0.60 0.40 -0.7152 0.3467

3 1 0.37 0.63 0.5056 0.1488

sum 0.0006 0.6615

ΔΘ^s = 0.0006/0.6615 = 0.0009,

Θ^s+1 = 0.324 + 0.0009 = 0.3249

5. Estimating an Examinee’s Ability

The iteration is terminated because the value of the adjustment (0.0009) is very small. The examinee’s estimated ability is 0.3249.
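The iterations in this example can be sketched as a short loop. This is an illustration under stated assumptions: the update adds the ratio of the sum of a(u - P) to the sum of a²PQ at each step, as in the worked tables; the function name, the `(a, b)` tuple layout, and the stopping tolerance of 0.001 are choices made for this sketch.

```python
import math

def estimate_ability(items, responses, theta=1.0, tol=0.001, max_iter=50):
    """Iteratively estimate ability under the two-parameter model.

    items: list of (a, b) pairs; responses: list of 0/1 item scores.
    """
    for _ in range(max_iter):
        numerator = denominator = 0.0
        for (a, b), u in zip(items, responses):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            numerator += a * (u - p)              # column a(u-P)
            denominator += a * a * p * (1.0 - p)  # column a*a(PQ)
        delta = numerator / denominator
        theta += delta
        if abs(delta) < tol:                      # adjustment is very small
            break
    return theta

items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]     # (a, b) for Item_1..Item_3
responses = [1, 0, 1]
theta_hat = estimate_ability(items, responses, theta=1.0)
print(round(theta_hat, 2))
```

Because the sketch carries full floating-point precision rather than two-decimal table entries, intermediate values differ slightly from the tables, but the estimate settles near the same point.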

Page 43: 11 adaptive testing-irt

Standard error

• The standard error is a measure of the variability of the values of θ^ around the examinee’s unknown parameter value θ.

5. Estimating an Examinee’s Ability

$$SE(\hat{\theta}) = \frac{1}{\sqrt{\sum_{i=1}^{N} a_i^2\,P_i(\hat{\theta})\,Q_i(\hat{\theta})}}$$

Page 44: 11 adaptive testing-irt

Standard error

5. Estimating an Examinee’s Ability

item u P(0.324) Q=(1-P) a(u-P) a*a(PQ)

1 1 0.79 0.21 0.2102 0.1660

2 0 0.60 0.40 -0.7152 0.3467

3 1 0.37 0.63 0.5056 0.1488

sum 0.0006 0.6615

$$SE(\hat{\theta} = 0.3249) = \frac{1}{\sqrt{0.6615}} = 1.23$$
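The standard error computation can be sketched as follows, assuming the two-parameter form in which SE is the reciprocal of the square root of the sum of a²PQ over the items. The function name and the `(a, b)` tuple layout are assumptions for this sketch.

```python
import math

def standard_error(theta, items):
    """Standard error of an ability estimate under the two-parameter model."""
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)   # the a*a(PQ) column, summed
    return 1.0 / math.sqrt(total)

items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]   # (a, b) for the three items
se = standard_error(0.3249, items)
print(round(se, 2))
```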

Page 45: 11 adaptive testing-irt

Estimating an Examinee’s Ability

• The examinee’s ability (0.3249) was not estimated very precisely because the standard error is very large (1.23).
– This is primarily due to the fact that only three items were used here and one would not expect a very good estimate.

5. Estimating an Examinee’s Ability

Page 46: 11 adaptive testing-irt

Estimating an Examinee’s Ability

• There are two cases in which the estimation procedure fails to yield an ability estimate
– When an examinee answers none of the items correctly
• the corresponding ability estimate is negative infinity.
– When an examinee answers all the items in the test correctly
• the corresponding ability estimate is positive infinity.
• The computer programs used to estimate ability must protect themselves against these two conditions

5. Estimating an Examinee’s Ability
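One simple way a program might protect itself is to screen response vectors before estimation. The helper below is a hypothetical sketch (its name is not from the slides); it only flags the two patterns named above, all incorrect and all correct.

```python
def has_mixed_responses(responses):
    """Return True if the 0/1 response vector contains both a 1 and a 0.

    All-incorrect and all-correct vectors are rejected, because the ability
    estimate for them runs off to negative or positive infinity.
    """
    n_correct = sum(responses)
    return 0 < n_correct < len(responses)

for pattern in ([1, 0, 1], [0, 0, 0], [1, 1, 1]):
    print(pattern, has_mixed_responses(pattern))
```

A caller would run the iterative estimator only when the check passes and report the remaining cases separately.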

Page 47: 11 adaptive testing-irt

Item Invariance of an Examinee’s Ability Estimate

• The examinee’s ability is invariant with respect to the items used to determine it
– All the items measure the same underlying latent trait
– The values of all the item parameters are in a common metric

5. Estimating an Examinee’s Ability

Page 48: 11 adaptive testing-irt

Item Invariance of an Examinee’s Ability Estimate

• A set of 10 items having an average difficulty of -2 were administered to this examinee
– the item responses could be used to estimate the examinee’s ability, yielding θ^1 for this test.

• Another set of 10 items having an average difficulty of +1 were also administered to this examinee
– these item responses could be used to estimate the examinee’s ability, yielding θ^2 for this second test.

• Under the item invariance principle
– θ^1 = θ^2
– i.e., the two sets of items should yield the same ability estimate, within sampling variation, for the examinee

5. Estimating an Examinee’s Ability

Page 49: 11 adaptive testing-irt

The Information Function

• What’s “Information”
– having information => knowing something about a particular object or topic
– In statistics & psychometrics
• The reciprocal of the precision with which a parameter could be estimated

6. The Information Function

Page 50: 11 adaptive testing-irt

The Information Function

• The measure of precision is the variance of the estimators, denoted by σ²

• The amount of information, denoted by I, is

$$I = \frac{1}{\sigma^2}$$

6. The Information Function

Page 51: 11 adaptive testing-irt

The Information Function

• If the amount of information is large, it means that an examinee whose true ability is at that level can be estimated with precision;
– i.e., all the estimates will be reasonably close to the true value
• If the amount of information is small, it means that the ability cannot be estimated with precision and the estimates will be widely scattered about the true ability

6. The Information Function

Page 52: 11 adaptive testing-irt

The Information Function

The amount of information has a maximum at an ability level of -1.0 and is about 3 for the ability range of -2 <= θ <= 0. Within this range, ability is estimated with some precision. Outside this range, the amount of information decreases rapidly, and the corresponding ability levels are not estimated very well.

6. The Information Function

• The information function does not depend upon the distribution of examinees over the ability scale.

• In a general purpose test, the ideal information function would be a horizontal line at some large value of I and all ability levels would be estimated with the same precision.

• Unfortunately, such an information function is hard to achieve.

• Different ability levels are estimated with differing degrees of precision.

Page 53: 11 adaptive testing-irt

Item Information Function

6. The Information Function

1. The amount of information, based upon a single item, can be computed at any ability level and is denoted by Ii (θ ), where i indexes the item.

2. Because only a single item is involved, the amount of information at any point on the ability scale is going to be rather small.

3. The amount of item information decreases as the ability level departs from the item difficulty and approaches zero at the extremes of the ability scale.

Page 54: 11 adaptive testing-irt

Definition of Item Information

• Two-Parameter Item Characteristic Curve Model

$$I_i(\theta) = a_i^2\,P_i(\theta)\,Q_i(\theta)$$

ai is the discrimination parameter for item i

Pi(θ) = 1 / (1 + EXP(-ai(θ - bi)))

Qi(θ) =1 - Pi(θ)

θ is the ability level of interest

6. The Information Function

Page 55: 11 adaptive testing-irt

Definition of Item Information

θ L EXP(-L) Pi(θ) Qi(θ) Pi(θ)Qi(θ) a² Ii(θ)

-3 -6 403.43 0.00 1.00 0.00 2.25 0.00

-2 -4.5 90.02 0.01 0.99 0.01 2.25 0.02

-1 -3.0 20.09 0.05 0.95 0.05 2.25 0.11

0 -1.5 4.48 0.18 0.82 0.15 2.25 0.34

1 0.0 1.00 0.50 0.50 0.25 2.25 0.56

2 1.5 0.22 0.82 0.18 0.15 2.25 0.34

3 3.0 0.05 0.95 0.05 0.05 2.25 0.11

Calculation of item information under a two-parameter model

b = 1.0, a = 1.5

6. The Information Function
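The item information calculation for the two-parameter model can be sketched as below, assuming the definition I_i(θ) = a_i² P_i(θ) Q_i(θ). The function name is an assumption; the loop uses the item with b = 1.0 and a = 1.5.

```python
import math

def item_information_2pl(theta, a, b):
    """Item information under the two-parameter model: a^2 * P * Q."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

for theta in range(-3, 4):
    print(theta, round(item_information_2pl(theta, a=1.5, b=1.0), 2))
```

The printed values come from unrounded P and Q, so an entry can differ in the last digit from a table built from two-decimal intermediate values. The maximum, a²/4, occurs at θ = b. Passing a = 1.0 gives the one-parameter case, where the information is simply P times Q.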

Page 56: 11 adaptive testing-irt

Definition of Item Information

6. The Information Function

Page 57: 11 adaptive testing-irt

Definition of Item Information

• One-Parameter Item Characteristic Curve Model

$$I_i(\theta) = P_i(\theta)\,Q_i(\theta)$$

Pi(θ) = 1 / (1 + EXP(-(θ - bi)))

Qi(θ) =1 - Pi(θ)

θ is the ability level of interest

6. The Information Function

Page 58: 11 adaptive testing-irt

Definition of Item Information

θ L EXP(-L) Pi(θ) Qi(θ) Pi(θ)Qi(θ) a² Ii(θ)

-3 -4.0 54.60 0.02 0.98 0.02 1 0.02

-2 -3.0 20.09 0.05 0.95 0.05 1 0.05

-1 -2.0 7.39 0.12 0.88 0.11 1 0.11

0 -1.0 2.72 0.27 0.73 0.20 1 0.20

1 0.0 1.00 0.50 0.50 0.25 1 0.25

2 1.0 0.37 0.73 0.27 0.20 1 0.20

3 2.0 0.14 0.88 0.12 0.11 1 0.11

Calculation of item information under a one-parameter model

b = 1.0

6. The Information Function

Page 59: 11 adaptive testing-irt

Definition of Item Information

6. The Information Function

Page 60: 11 adaptive testing-irt

Definition of Item Information

• Three-Parameter Item Characteristic Curve Model

$$I_i(\theta) = a^2\,\frac{Q_i(\theta)}{P_i(\theta)}\,\frac{\left[P_i(\theta) - c\right]^2}{(1 - c)^2}$$

Pi(θ) = c + (1 - c) (1/(1 + EXP (-L)))

L = a (θ - b)

Qi(θ) =1 - Pi(θ)

θ is the ability level of interest

6. The Information Function

Page 61: 11 adaptive testing-irt

Definition of Item Information

• Example
– b = 1.0; a = 1.5; c = 0.2
– ability level of θ = 0.0.

1. L = a (θ - b) = 1.5 (0 - 1) = -1.5
EXP (-L) = EXP (1.5) = 4.482
1/(1 + EXP (-L)) = 1/(1 + 4.482) = 0.182
Pi (θ) = c + (1 - c) (1/(1 + EXP (-L))) = 0.2 + 0.8 (0.182) = 0.346

2. Qi (θ) = 1 - 0.346 = 0.654

3. Qi (θ)/Pi (θ) = 0.654/0.346 = 1.890

4. (Pi (θ) - c)^2 = (0.346 - 0.2)^2 = (0.146)^2 = 0.021

5. (1 - c)^2 = (1 - 0.2)^2 = (0.8)^2 = 0.64

6. a^2 = (1.5)^2 = 2.25

7. Ii (θ) = (2.25) (1.890) (0.021)/(0.64) = 0.142

6. The Information Function

Page 62: 11 adaptive testing-irt

Definition of Item Information

θ L Pi(θ) Qi(θ) Qi(θ)/Pi(θ) (Pi(θ)-c)² Ii(θ)

-3 -6.0 0.20 0.80 3.950 0.000 0.000

-2 -4.5 0.21 0.79 3.785 0.000 0.001

-1 -3.0 0.24 0.76 3.202 0.001 0.016

0 -1.5 0.35 0.65 1.890 0.021 0.142

1 0.0 0.60 0.40 0.667 0.160 0.375

2 1.5 0.85 0.15 0.171 0.428 0.257

3 3.0 0.96 0.04 0.040 0.481 0.082

Calculation of item information under a three-parameter model

b = 1.0; a = 1.5; c = 0.2

6. The Information Function
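The three-parameter information calculation can be sketched in the same way, assuming the definition I_i(θ) = a² [Q_i(θ)/P_i(θ)] [(P_i(θ) - c)² / (1 - c)²]. The function name is an assumption; the loop uses the item with b = 1.0, a = 1.5, c = 0.2.

```python
import math

def item_information_3pl(theta, a, b, c):
    """Item information under the three-parameter model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    q = 1.0 - p
    return a * a * (q / p) * (p - c) ** 2 / (1.0 - c) ** 2

for theta in range(-3, 4):
    print(theta, round(item_information_3pl(theta, a=1.5, b=1.0, c=0.2), 3))
```

With c = 0 the expression collapses to a²PQ, the two-parameter definition.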

Page 63: 11 adaptive testing-irt

Definition of Item Information

6. The Information Function

Page 64: 11 adaptive testing-irt

Test Information Function

$$I(\theta) = \sum_{i=1}^{N} I_i(\theta)$$

I (θ) is the amount of test information at an ability level of θ

Ii(θ) is the amount of information for item i at ability level θ

N is the number of items in the test

6. The Information Function

Page 65: 11 adaptive testing-irt

Computing a Test Information Function

• Example
– 5-item test
– Under two-parameter model

item b a

1 -1.0 2.0

2 -0.5 1.5

3 0.0 1.5

4 0.5 1.5

5 1.0 2.0

6. The Information Function

Page 66: 11 adaptive testing-irt

Computing a Test Information Function

θ 1 2 3 4 5 Test Information

-3 0.071 0.051 0.024 0.012 0.001 0.159

-2 0.420 0.194 0.102 0.051 0.010 0.777

-1 1.000 0.490 0.336 0.194 0.071 2.091

0 0.420 0.490 0.563 0.490 0.420 2.383

1 0.071 0.194 0.336 0.490 1.000 2.091

2 0.010 0.051 0.102 0.194 0.420 0.777

3 0.001 0.012 0.024 0.051 0.071 0.159

6. The Information Function

Page 67: 11 adaptive testing-irt

The Test Calibration Process

• The Birnbaum paradigm is an iterative procedure employing two stages of maximum likelihood estimation.
– Stage 1: the parameters of the N items in the test are estimated,
– Stage 2: the ability parameters of the M examinees are estimated.
• The two stages are performed iteratively until a stable set of parameter estimates is obtained
• At that point the test has been calibrated and an ability scale metric defined

7. Test Calibration

Page 68: 11 adaptive testing-irt

The Test Calibration Process

• Stage one:
– The estimated ability of each examinee is treated as if it is expressed in the true metric of the latent trait.
– The parameters of each item in the test are estimated via the maximum likelihood procedure discussed in Estimating Item Parameters.

– This is done one item at a time, because an underlying assumption is that the items are independent of each other.

– The result is a set of values for the estimates of the parameters of the items in the test.

7. Test Calibration

Page 69: 11 adaptive testing-irt

The Test Calibration Process

• Stage two:
– The ability of each examinee is estimated using the maximum likelihood procedure presented in Estimating an Examinee’s Ability
– It is assumed that the ability of each examinee is independent of all other examinees. Hence, the ability estimates are obtained one examinee at a time

7. Test Calibration

Page 70: 11 adaptive testing-irt

The Test Calibration Process

• The two-stage process is repeated until some suitable convergence criterion is met

• The overall effect is that the parameters of the N test items and the ability levels of the M examinees have been estimated simultaneously, even though they were done one at a time

7. Test Calibration

Page 71: 11 adaptive testing-irt

Test Calibration Under the one-parameter Model

1 2 3 4 5 6 7 8 9 10 RS

01 0 0 1 0 0 0 0 1 0 0 2

02 1 0 1 0 0 0 0 0 0 0 2

03 1 1 1 0 1 0 1 0 0 0 5

04 1 1 1 0 1 0 0 0 0 0 4

05 0 0 0 0 1 0 0 0 0 0 1

06 1 1 0 1 0 0 0 0 0 0 3

07 1 0 0 0 0 1 1 1 0 0 4

08 1 0 0 0 1 1 0 0 1 0 4

09 1 0 1 0 0 1 0 0 1 0 4

10 1 0 0 0 1 0 0 0 0 1 3

11 1 1 1 1 1 1 1 1 1 0 9

12 1 1 1 1 1 1 1 1 1 0 9

13 1 1 1 0 1 0 1 0 0 1 6

14 1 1 1 1 1 1 1 1 1 0 9

15 1 1 0 1 1 1 1 1 1 1 9

16 1 1 1 1 1 1 1 1 1 1 10

Rows are examinees (01 to 16); columns are items (1 to 10); RS is the raw score. 1 for correct and 0 for incorrect.

If an item is answered correctly by all of the examinees or by none of the examinees, its item difficulty parameter cannot be estimated.

Test calibration under the Rasch model: all examinees having the same number of items correct will obtain the same estimated ability.

7. Test Calibration

Page 72: 11 adaptive testing-irt

Test Calibration Under the one-parameter Model

1 2 3 4 5 6 7 8 9 10 ROW Total

1 1 1

2 1 2 1 4

3 2 1 1 1 1 6

4 4 1 2 2 3 1 1 2 16

5 1 1 1 1 1 5

6 1 1 1 1 1 1 6

9 4 4 2 4 4 4 4 4 4 2 36

COL Total

13 8 8 5 10 7 7 6 7 3 74

7. Test Calibration

items

score

Page 73: 11 adaptive testing-irt

Test Calibration Under the one-parameter Model

item difficulty

1 -2.37

2 -0.27

3 -0.27

4 0.98

5 -1

6 0.11

7 0.11

8 0.52

9 0.11

10 2.06

7. Test Calibration

Examinee Ability obtained Raw Score

1 -1.50 2

2 -1.50 2

3 +0.02 5

4 -0.42 4

5 -2.37 1

6 -0.91 3

7 -0.42 4

8 -0.42 4

9 -0.42 4

10 -0.91 3

11 +2.33 9

12 +2.33 9

13 +0.46 6

14 +2.33 9

15 +2.33 9

16 ***** 10

Page 74: 11 adaptive testing-irt

Test Calibration Under the one-parameter Model

• Under the Rasch model, the value of the discrimination parameter is fixed at 1 for all of the items in the test. This aspect of the Rasch model is appealing to practitioners because they intuitively feel that examinees obtaining the same raw test score should receive the same ability estimate.

7. Test Calibration

Page 75: 11 adaptive testing-irt

Test Calibration Under the 2/3-parameter Model

• When the two- and three-parameter item characteristic curve models are used, an examinee’s ability estimate depends upon the particular pattern of item responses rather than the raw score.

7. Test Calibration

Page 76: 11 adaptive testing-irt

Test Calibration Under the 2/3-parameter Model

• Under these models, examinees with the same item response pattern will obtain the same ability estimate. Thus, examinees with the same raw score could obtain different ability estimates if they answered different items correctly.

7. Test Calibration

Page 77: 11 adaptive testing-irt

The Framework of IRT

• In order to obtain the many advantages of IRT, tests should be designed, constructed, analyzed, and interpreted within the framework of the theory.

• This chapter provides experience with the technical aspects of test construction within the framework of IRT.

8. Specifying the Characteristics of a Test

Page 78: 11 adaptive testing-irt

Item Banking

• Test construction process is usually based upon having a collection of items from which to select those to be included in a particular test. (Item pools)

• Items are selected from such pools on the basis of both their content and their technical characteristics, i.e., their item parameter values

• Under IRT, a well-defined set of procedures is used to establish and maintain such item pools. The name “item banking” has been given to these procedures

8. Specifying the Characteristics of a Test

Page 79: 11 adaptive testing-irt

Item Banking

• Basic Goal
– have an item pool in which the values of the item parameters are expressed in a known ability-scale metric.

8. Specifying the Characteristics of a Test

Page 80: 11 adaptive testing-irt

Developing a Test From a Pre-calibrated Item Pool

• ICC model is selected, the examinees’ item response data are analyzed via the Birnbaum paradigm, and the test is calibrated.

• The ability scale resulting from this calibration is considered to be the baseline metric of the item pool.

• From a test construction point of view, we now have a set of items whose item parameter values are known; in technical terms, a “pre-calibrated item pool” exists.

8. Specifying the Characteristics of a Test

Page 81: 11 adaptive testing-irt

Developing a Test From a Pre-calibrated Item Pool

• The advantage of having a pre-calibrated item pool is that the parameter values of the items included in the test can be used to compute the test characteristic curve and the test information function before the test is administered.

8. Specifying the Characteristics of a Test

Page 82: 11 adaptive testing-irt

Some Typical Testing Goals

• Screening tests
– Tests used for screening purposes have the capability to distinguish rather sharply between examinees whose abilities are just below a given ability level and those who are at or above that level.
– Such tests are used to assign scholarships and to assign students to specific instructional programs such as remediation or advanced placement.

8. Specifying the Characteristics of a Test

Page 83: 11 adaptive testing-irt

Some Typical Testing Goals

• Broad-ranged tests
– These tests are used to measure ability over a wide range of the underlying ability scale. The primary purpose is to be able to make a statement about an examinee’s ability and to make comparisons among examinees.
– Tests measuring reading or mathematics are typically broad-range tests.

8. Specifying the Characteristics of a Test

Page 84: 11 adaptive testing-irt

Some Typical Testing Goals

• Peaked tests
– Such tests are designed to measure ability quite well in a region of the ability scale where most of the examinees’ abilities will be located, and less well outside this region.
– When one deliberately creates a peaked test, it is to measure ability well in a range of ability that is wider than that of a screening test, but not as wide as that of a broad-range test.

8. Specifying the Characteristics of a Test

Page 85: 11 adaptive testing-irt

Summary

• Classical Test Theory
• IRT
– Item Characteristic Curve
– Test Characteristic Curve
– Estimating an Examinee’s Ability
– Test Calibration
– Item Banking
• Automatic Test Generation