Modern Test Theory
Item Response Theory (IRT)
Limitations of classical test theory
• An examinee’s ability is defined in terms of a particular test
• The difficulty of a test item is defined in terms of a particular group of test-takers
• In short, “examinee characteristics and test item characteristics cannot be separated: each can be interpreted only in the context of the other” (Hambleton et al., 1991, p. 3)
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: SAGE Publications, Inc.
Joe and the 8-item test
[Figure: Joe’s ability shown against Items 1 to 8 of three tests (Very Easy Test, Very Hard Test, Narrow Hard Test). Scores shown: 8, 0, and 3.]
Adapted from: Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press.
Non-linearity of scores
[Figure: Joe’s and Tom’s abilities shown against Items 1 to 8 of three tests. Score pairs shown: 0 and 8; 8 and 8; 4 and 4.]
Latent trait and performance
[Figure: two diagrams. Classical Test Theory: a latent variable (true score) with Form 1, Form 2, and Form 3 scores and Error 1, Error 2, and Error 3. Item Response Theory: a latent variable with Item 1, Item 2, and Item 3 responses.]
Embretson, S. E. (1999). Issues in the measurement of cognitive abilities. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement (pp. 1-15). Mahwah, NJ: Lawrence Erlbaum Associates.
Item Response Theory (IRT)
• The performance of an examinee on a test item can be predicted (explained) by latent traits
• As a person’s level of the underlying trait increases, the probability of a correct response to an item increases
• This relationship (person and item) can be visualized by an Item Characteristic Curve (ICC)
(Hambleton et al., 1991)
Understanding Item Characteristic Curves
Imagine a continuum of vocabulary knowledge
Sleepy Somnolent Oscitant
Thorndike, R. M. (1999). IRT and intelligence testing: Past, present, and future. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement (pp. 17-36). Mahwah, NJ: Lawrence Erlbaum Associates.
Understanding ICC (2)
(Thorndike, 1999, p. 20)
Item Difficulty
Item Discrimination
3-Parameter Model
Vocabulary ICC revisited
Basic IRT concept
PROB(Item Passed) = FUNCTION[(Trait Level) - (Item Difficulty)]
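The slide leaves FUNCTION unspecified. A minimal sketch, assuming the logistic function of the one-parameter model; the name prob_pass and the logit scale for both arguments are illustrative choices, not taken from the slides:

```python
import math

def prob_pass(trait_level: float, item_difficulty: float) -> float:
    """Probability of passing an item, assuming a logistic FUNCTION (one-parameter model)."""
    # Only the difference between trait level and item difficulty matters.
    return 1.0 / (1.0 + math.exp(-(trait_level - item_difficulty)))

# When trait level equals item difficulty, the chance of passing is 50%.
print(prob_pass(0.0, 0.0))  # 0.5
```

Raising the trait level above the item difficulty pushes the probability toward 1; lowering it pushes the probability toward 0.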
Assumptions of IRT
• Unidimensionality – only one ability is measured by a set of items on a test
• Local independence – an examinee’s responses to any two items are statistically independent once ability is held constant
• 1-parameter model – no guessing, item discrimination is the same for all items
• 2-parameter model – no guessing
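The model assumptions above can be sketched with the three-parameter logistic form, of which the 1- and 2-parameter models are special cases. This is an illustration under stated assumptions: the function names are hypothetical, and the scaling constant D = 1.7 used in some presentations is omitted.

```python
import math

def icc_3pl(theta, b, a=1.0, c=0.0):
    """P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b))).

    b = difficulty, a = discrimination, c = guessing (lower asymptote).
    c = 0 gives the 2-parameter model (no guessing); c = 0 with the same
    a for every item gives the 1-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(theta, responses, items):
    """Likelihood of a 0/1 response pattern at ability theta.

    items is a list of (b, a, c) tuples. Local independence is what
    allows the item probabilities to be multiplied together.
    """
    likelihood = 1.0
    for u, (b, a, c) in zip(responses, items):
        p = icc_3pl(theta, b, a, c)
        likelihood *= p if u == 1 else (1.0 - p)
    return likelihood
```

For example, pattern_likelihood(0.0, [1, 0], [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]) multiplies two probabilities of 0.5.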
Advantages of IRT
• Sample-free item calibration
• Test-free person measurement
• Item banking facility
• Computer delivery of tests
• Test tailoring facility
• Score reporting facility
• Item bias detection
Henning, G. (1987). A guide to language testing: development, evaluation, research. Boston: Heinle & Heinle.
Linking items across test forms
• As long as there are some common items (linking items), person ability estimates can be made from performance on different items
Items common to Test A and B
(Henning, 1987, p. 133)
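One simple way to use linking items, sketched here under the one-parameter model, is to shift Test B’s scale by the difference in mean difficulty of the common items (often called mean-mean linking). This is not necessarily the procedure Henning describes; the function names and difficulty values are hypothetical.

```python
from statistics import mean

def linking_constant(common_on_a, common_on_b):
    """Shift that places Test B values on the Test A scale.

    Both lists hold difficulties of the SAME linking items, in the same
    order, as estimated separately in the Test A and Test B calibrations.
    """
    return mean(common_on_a) - mean(common_on_b)

def to_scale_a(values_on_b, constant):
    """Apply the shift to item difficulties or person abilities from Test B."""
    return [v + constant for v in values_on_b]

# Hypothetical difficulties of three linking items in each calibration.
common_a = [-0.5, 0.0, 1.0]
common_b = [-1.0, -0.5, 0.5]
shift = linking_constant(common_a, common_b)
abilities_b_on_a = to_scale_a([0.2, 1.1], shift)
```

Here every linking item looks about half a logit easier in the Test B calibration, so Test B values are moved up by roughly that amount.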
Score reporting facility
(McNamara, 1996, p.201)
Test tailoring facility
An untailored standardized test gives maximum information near its mean
Imagine that a university requires a score above 67 for admission and above 82 for exemption from language classes
A tailored test can be “loaded” with items that provide maximum information at the cut-points
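“Loading” a test with informative items can be sketched with the item information function of the two-parameter model, I(theta) = a^2 * P * (1 - P), which peaks where ability equals item difficulty. The item bank, the function names, and the cut point expressed on the ability scale are all hypothetical; the slides give cut points only as reported scores (67 and 82).

```python
import math

def item_information(theta, b, a=1.0):
    """Information of a two-parameter item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def most_informative(item_bank, cut_point, n):
    """Return the ids of the n items with the most information at cut_point.

    item_bank maps item id -> (difficulty b, discrimination a).
    """
    ranked = sorted(
        item_bank,
        key=lambda item_id: item_information(cut_point, *item_bank[item_id]),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical bank: with equal discrimination, the item whose difficulty
# is closest to the cut point is the most informative there.
bank = {"easy": (-2.0, 1.0), "mid": (0.0, 1.0), "hard": (2.0, 1.0)}
chosen = most_informative(bank, cut_point=1.8, n=1)
```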
Computerized testing
• Computer-delivered tests
– Tests which use a computer rather than pencil and paper for test content delivery
– Items can take advantage of computer’s multimedia capabilities
• Computer-adaptive tests
– Test is created “on the fly” to match examinee’s ability level
• Web-based tests
– Delivered over the World Wide Web
– Test-takers can access from anywhere
Adaptive testing
Sands, W. A., & Waters, B. K. (1997). Introduction to ASVAB and CAT. In W. A. Sands & B. K. Waters & J. R. McBride (Eds.), Computerized adaptive testing (pp. 3-10). Washington: American Psychological Association.
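The idea of building a test “on the fly” can be sketched as a loop that picks the unused item closest in difficulty to the current ability estimate, then moves the estimate up or down depending on the response. This is a deliberately simplified illustration: operational CATs typically re-estimate ability with maximum likelihood or Bayesian methods, and run_cat, its parameters, and the step-halving rule are assumptions made for this example.

```python
def run_cat(item_difficulties, answer, n_items=5, start=0.0, step=1.0):
    """Administer up to n_items adaptively.

    item_difficulties: dict mapping item id -> difficulty.
    answer: callable taking an item id and returning True if answered correctly.
    Returns the final ability estimate and the list of (item id, correct) pairs.
    """
    estimate = start
    remaining = dict(item_difficulties)  # copy so the caller's bank is untouched
    administered = []
    for _ in range(min(n_items, len(remaining))):
        # Choose the unused item whose difficulty best matches the estimate.
        item_id = min(remaining, key=lambda i: abs(remaining[i] - estimate))
        del remaining[item_id]
        correct = answer(item_id)
        # Move up after a correct answer, down after an incorrect one,
        # with a step that halves each time so the estimate settles.
        estimate += step if correct else -step
        step /= 2
        administered.append((item_id, correct))
    return estimate, administered
```

With a five-item bank of difficulties -2 to 2 and an examinee who passes every item easier than 1.2, a three-item run administers the items at 0, 1, and 2 in that order and never presents the two easiest items.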
CAT advantages
• Increased efficiency
– More able examinees are not bored with easy questions
– Less able examinees are not frustrated with incredibly difficult questions
• Immediate feedback is possible
• Examinees can work at own pace
• Audiovisual material can be incorporated
• Potential for “on demand” testing
CAT Challenges
• Technical sophistication required to develop and administer CAT
• Need for large item pool
• Overexposure of best items
• Ensuring consistency of measures and content across candidates
• Public perception of computer-based scores
– Completely infallible
– Completely bogus