A Practitioner’s Introduction to Equating
Transcript of A Practitioner’s Introduction to Equating
![Page 1: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/1.jpg)
A Practitioner’s Introduction to Equating
Joseph Ryan, Arizona State University
Frank Brockmann, Center Point Assessment Solutions
with Primers on Classical Test Theory (CTT) and Item Response Theory (IRT)
Workshop: Assessment, Research and Evaluation Colloquium, Neag School of Education, University of Connecticut, October 22, 2010
![Page 2: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/2.jpg)
Acknowledgments
• Council of Chief State School Officers (CCSSO)
• Technical Issues in Large Scale Assessment (TILSA) and Subcommittee on Equating, part of the State Collaborative on Assessment and Student Standards (SCASS)
• Doug Rindone and Duncan MacQuarrie, CCSSO TILSA Co-Advisers; Phoebe Winter, Consultant; Michael Muenks, TILSA Equating Subcommittee Chair
• Technical Special Interest Group of National Assessment of Educational Progress (NAEP) coordinators
• Hariharan Swaminathan, University of Connecticut
• Special thanks to Michael Kolen, University of Iowa
![Page 3: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/3.jpg)
Workshop Topics
The workshop covers the following topics:
1. Overview - Key concepts of assessment, linking, and equating
2. Measurement Primer – Classical and IRT theories
3. Equating Basics
4. The Mechanics of Equating
5. Equating Issues
![Page 4: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/4.jpg)
1. Overview
Key Concepts in Assessment, Linking, and Equating
![Page 5: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/5.jpg)
Assessment, Linking, and Equating
Validity is… an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.
(Messick, 1989, p. 13)
Validity is the essential motivation for developing and evaluating appropriate linking and equating procedures.
![Page 6: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/6.jpg)
Assessment, Linking, and Equating: The Linking Continuum

[Figure: a continuum from weaker kinds of linking to the strongest link, equating. At the weak end, scores are matched or paired but do NOT have the same meaning or interpretation (e.g., linking the 2007 NRT Gr 5 test to the 2007 SBA Gr 5 test). At the strong end, equating, scores have the SAME meaning or interpretation (e.g., equating the 2006 SBA Gr 5 test to the 2007 SBA Gr 5 test).]
![Page 7: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/7.jpg)
Linking and Equating
• Equating
• Scale aligning
• Predicting/Projecting
Holland in Dorans, Pommerich and Holland (2007)
![Page 8: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/8.jpg)
Misconceptions About Equating
Equating is…

• …a threat to measuring gains. (MYTH)
• …a tool for universal applications. (WISHFUL THOUGHT)
• …a repair shop. (MISCONCEPTION)
• …a semantic misappropriation. (MISUNDERSTANDING)
![Page 9: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/9.jpg)
2. Measurement Primer
Classical Test Theory (CTT) Item Response Theory (IRT)
![Page 10: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/10.jpg)
Classical Test Theory
The Basic Model

O (observed score) = T (true score) + E (error)

(with some MAJOR assumptions)

• Reliability is derived from the ratio of true score variance to observed score variance
• Key item features include: Difficulty, Discrimination, Distractor Analysis
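As a rough illustration (not from the workshop), the classical model O = T + E can be simulated with hypothetical true scores and errors, and reliability recovered as the ratio of true-score variance to observed-score variance; all numbers below are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
n_students = 10_000

# True scores T and independent errors E (independence is one of the
# MAJOR CTT assumptions)
true_scores = rng.normal(loc=50, scale=10, size=n_students)   # T
errors = rng.normal(loc=0, scale=5, size=n_students)          # E
observed = true_scores + errors                               # O = T + E

# Reliability = var(T) / var(O); the theoretical value here is
# 10^2 / (10^2 + 5^2) = 0.80
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))
```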
![Page 11: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/11.jpg)
Reliability reflects the consistency of students' scores
• Over time: test-retest
• Over forms: alternate form
• Within forms: internal consistency
Validity reflects the degree to which scores assess what the test is designed to measure in terms of

• Content
• Criterion-related measures
• Construct
Classical Test Theory
![Page 12: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/12.jpg)
Item Response Theory (IRT)
The Concept
An approach to item and test analysis that estimates students’ probable responses to test questions, based on
• the ability of the students
• one or more characteristics of the test items
![Page 13: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/13.jpg)
Item Response Theory (IRT)
• IRT is now used in most large-scale assessment programs
• IRT models apply both to dichotomously scored items, with right (1) or wrong (0) answers, and to polytomously scored items with ordered categories (1, 2, 3, 4), common with written essays and open-ended constructed-response items
• IRT is used in addition to procedures from CTT
INFO
![Page 14: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/14.jpg)
Item Response Theory (IRT)
IRT Models
All IRT models reflect the ability of students. In addition, the most common basic IRT models include:
The 1-parameter model – (aka Rasch model) models item difficulty
The 2-parameter model – models item difficulty and discrimination
The 3-parameter model – models item difficulty, discrimination, and pseudo-guessing
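A minimal sketch of the response function the three models share (the logistic form; some texts multiply the exponent by a scaling constant D = 1.7, omitted here). The parameter values below are hypothetical:

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """P(correct answer) under the 3-parameter logistic (3PL) IRT model.

    theta: student ability
    b: item difficulty      (the 1-parameter/Rasch model uses only this)
    a: item discrimination  (the 2-parameter model adds this)
    c: pseudo-guessing      (the 3-parameter model adds this)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A student whose ability matches the item's difficulty answers a
# 1-parameter item correctly with probability 0.5
print(irt_probability(theta=0.0, b=0.0))          # 0.5
# A nonzero guessing parameter raises that probability
print(irt_probability(theta=0.0, b=0.0, c=0.2))   # 0.6
```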
![Page 15: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/15.jpg)
Item Response Theory (IRT)
IRT Assumptions
Item Response Theory requires major assumptions:
• Unidimensionality
• Item Independence
• Data-Model Fit
• Fixed but arbitrary scale origin
![Page 16: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/16.jpg)
Item Response Theory (IRT)
A Simple Conceptualization
[Figure: students and items placed on a single scale running from -3 to +3. Easier items and lower-ability students fall toward the left; harder items and higher-ability students toward the right. Four students (Alex, Blake, Chris, Devon) and three items are located on the scale, with item difficulties such as -1.5 and +2.25.]
![Page 17: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/17.jpg)
Item Response Theory (IRT): Probability of a Student Answer
![Page 18: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/18.jpg)
Item Response Theory (IRT): Item Characteristic Curve for Item 2
![Page 19: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/19.jpg)
Item Response Theory (IRT)
![Page 20: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/20.jpg)
IRT and Flexibility
IRT provides considerable flexibility in terms of

• constructing alternate test forms
• administering tests well matched or adapted to students' ability level
• building sets of connected tests that span a wide range (perhaps two or more grades)
• inserting or embedding new items into existing test forms for field testing purposes, so new items can be placed on the measurement scale
INFO
![Page 21: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/21.jpg)
3. Equating Basics
Basic Terms (Sets 1, 2, and 3)
Equating Designs (a, b, c)
Item Banking (a, b, c, d)
![Page 22: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/22.jpg)
Basic Terms Set 1
| Column A | Column B |
|---|---|
| __ Anchor Items | A. Sleepwear |
| __ Appended Items | B. Nautically themed apparel |
| __ Embedded Items | C. Vestigial organs |
| | D. EMIP learning module |
USEFUL TERMS
![Page 23: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/23.jpg)
Basic Terms Set 2
Pre-equating –

Post-equating –
USEFUL TERMS
For each term, make some notes on your handout:
![Page 24: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/24.jpg)
Basic Terms Set 3
Horizontal Equating –
Vertical Equating (Vertical Scaling) –
Form-to-Form (Chained) Equating –

Item Banking –
USEFUL TERMS
For each term, make some notes on your handout:
![Page 25: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/25.jpg)
Equating Designs
a. Random Equivalent Groups
b. Single Group
c. Anchor Items
![Page 26: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/26.jpg)
Equating Designs
a. Random Equivalent Groups
Testing Population
Random Sample Group 1
Random Sample Group 2
Test Form A
Test Form B
![Page 27: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/27.jpg)
Equating Designs
b. Single Group
Testing Population
Random Sample Group
Test Form A (first)
Test Form B (second)
CAUTION
The potential for order effects is significant: equating designs that use this data collection method should always be counterbalanced!
![Page 28: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/28.jpg)
Equating Designs
b. Single Group with Counterbalance
Testing Population or Tested Sample
Random Subgroup 1
Random Subgroup 2
Test Form A (first)
Test Form B (second)
Test Form B (first)
Test Form A (second)
![Page 29: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/29.jpg)
Equating Designs
c. Anchor Item Design
common items
Testing Sample 1
Test Form A Items
Anchor Items
Test Form B Items
Anchor Items
Testing Sample 2
not always at the end
![Page 30: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/30.jpg)
Equating Designs
c. Anchor Item Set
Anchor Selection
GRADE 5 Mathematics Test Form A (50 test items):
• Content Standard 1: 10 items
• Content Standard 2: 10 items
• Content Standard 3: 10 items
• Content Standard 4: 10 items
• Content Standard 5: 10 items

GRADE 5 Mathematics Anchor Set (10 items) (PROPER):
• Content Standard 1: 2 items
• Content Standard 2: 2 items
• Content Standard 3: 2 items
• Content Standard 4: 2 items
• Content Standard 5: 2 items
![Page 31: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/31.jpg)
c. Anchor Item Designs
• Internal/Embedded
• Internal/Appended
• External
USEFUL TERMS
Equating Designs
![Page 32: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/32.jpg)
Equating Designs: Internal Embedded Anchor Items

Test Form A: Items 2, 5, 9, 12, and 15 are anchor items; the remaining items (1, 3, 4, 6, 7, 8, 10, 11, 13, 14) are unique Form A items.

Test Form B: the same anchor items (2, 5, 9, 12, 15) appear in the same positions; the remaining items are unique Form B items.

The embedded, internal anchor items are interspersed throughout each form.
![Page 33: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/33.jpg)
Equating Designs: Internal Appended Anchor Items

Test Form A: Items 1–10 are form-specific (A) items; Items 11–15 are common (C) anchor items appended after the form-specific items.

Test Form B: Items 1–10 are form-specific (B) items; Items 11–15 are the same common (C) anchor items.

Appended, Internal Anchor Items
![Page 34: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/34.jpg)
Equating Designs: External Anchor Items

Test Form A: Part 1 contains Items 1–10, the unique Form A items; Part 2 contains Items 1–5 (C), the external anchor items.

Test Form B: Part 1 contains Items 1–10, the unique Form B items; Part 2 contains the same Items 1–5 (C).

Appended, External Anchor Items
![Page 35: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/35.jpg)
Equating Designs
Guidelines for Anchor Items
• Mini-Test
• Similar Location
• No Alterations
• Item Format Representation
RULES of THUMB
![Page 36: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/36.jpg)
3. Equating Basics
Basic Terms (Sets 1, 2, and 3)
Equating Designs (a, b, c)
Item Banking (a, b, c, d)
![Page 37: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/37.jpg)
Item Banking
a. Basic Concepts
b. Anchor-Item Based Field Test
c. Matrix Sampling
d. Spiraling Forms
![Page 38: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/38.jpg)
• An item bank is a large collection of calibrated and scaled test items representing the full range, depth, and detail of the content standards
• Item Bank development is supported by field testing a large number of items, often with one or more anchor item sets.
• Item banks are designed to provide a pool of items from which equivalent test forms can be built.
• Pre-equated forms are based on a large and stable item bank.
Item Banking
a. Basic Concepts
![Page 39: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/39.jpg)
b. Anchor Item Based Field Test Design
RULE of THUMB
Field test items are most appropriately embedded within, not appended to, the common items.
Item Banking
![Page 40: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/40.jpg)
Item Banking
• Items can be assembled into relatively small blocks (or sets) of items.
• A small number of blocks can be assigned to each test form to reduce test length.
• Blocks may be assigned to multiple forms to enhance equating.
• Blocks need not be assigned to multiple forms if randomly equivalent groups are used.
c. Matrix Sampling
![Page 41: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/41.jpg)
Item Banking
c. Matrix Sampling
![Page 42: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/42.jpg)
Test forms can be assigned to individual students, or to students grouped in classrooms, schools, districts, or some other units.
1. “Spiraling” at the student level involves assigning different forms to different students within a classroom.
2. “Spiraling” at the classroom level involves assigning different forms to different classrooms within a school.
3. “Spiraling” at the school or district level follows a similar pattern.
Item Banking
d. Spiraling Forms
![Page 43: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/43.jpg)
d. Spiraling Forms
Item Banking
![Page 44: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/44.jpg)
Item Banking
d. Spiraling Forms
Spiraling at the student level is technically desirable:
• provides randomly equivalent groups
• minimizes classroom effect on IRT estimates (most IRT procedures assume independent responses)

Spiraling at the student level is logistically problematic:
• exposes all items in one location
• requires careful monitoring of test packets and distribution
• requires matching test form to answer key at the student level
![Page 45: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/45.jpg)
It’s Never Simple!
Linking and equating procedures are employed in the broader context of educational measurement, which includes at least the following sources of random variation (statistical error variance) or imprecision:

• Content and process representation
• Errors of measurement
• Sampling errors
• Violations of assumptions
• Parameter estimation variance
• Equating estimation variance
IMPORTANT
CAUTION
![Page 46: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/46.jpg)
4. The Mechanics of Equating
The Linking-Equating Continuum Classical Test Theory (CTT) Approaches Item Response Theory (IRT) Approaches
![Page 47: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/47.jpg)
The Linking-Equating Continuum
Linking is the broadest term used to refer to a collection of procedures through which performance on one assessment is associated or paired with performance on a second assessment. Equating is the strongest claim made about the relationship between performance on two assessments; it asserts that the scores that are equated have the same substantive meaning.
USEFUL TERMS
![Page 48: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/48.jpg)
The Linking-Equating Continuum
different forms of linking
equating (strongest kind of linking)
![Page 49: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/49.jpg)
The Linking-Equating Continuum: Frameworks
![Page 50: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/50.jpg)
The Linking-Equating Continuum
[Figure: a continuum from Moderation, where scores do NOT have the same meaning or interpretation, to Equating, where scores have the SAME meaning or interpretation. Linking procedures/approaches shown include CTT methods (Linear, Equipercentile) and IRT methods (Common Item, Common Person, Calibration, Projection, Pre- and Post-equating, and Pool/Item Bank Development).]
In 1992, Mislevy described four typologies of linking test forms: moderation, projection, calibration, and equating (Mislevy, 1992, pp. 21-26). In his model, moderation is the weakest form of linking tests, while equating is considered the strongest type. Thus, equating is done to make scores as interchangeable as possible.
![Page 51: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/51.jpg)
The Linking-Equating Continuum

Equating – strongest form of linking; invariant across populations; maintains substantive meaning

Calibration – may use equating procedures; not necessarily invariant across populations; substantive meaning might not be preserved

Prediction/Projection – unidirectional statistical procedure for predicting scores or projecting distributions

Moderation – weakest form of linking; may be statistical or judgmental (social); based on comparisons of distributions or panel/reviewer decisions
USEFUL TERMS
![Page 52: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/52.jpg)
CTT Linking-Equating Approaches
a. Mean Method
b. Linear Method
c. Equipercentile Method
![Page 53: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/53.jpg)
CTT Linking-Equating Approaches
a. Mean Method
• Adjusts one set of scores based on the difference in the means of two tests
• Assumes a constant difference in the scales across all scores
• Useful for carefully developed and parallel or close-to-parallel forms
• Simple, but strains assumptions of parallel forms
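A minimal sketch of the mean method under these assumptions; the form means and scores below are hypothetical:

```python
import numpy as np

def mean_equate(scores_x, mean_x, mean_y):
    """Mean equating: shift Form X scores onto the Form Y scale by the
    difference in form means (assumes a constant difference across all
    score points)."""
    return np.asarray(scores_x, dtype=float) + (mean_y - mean_x)

# Hypothetical example: Form Y ran 2 points easier than Form X on average
form_x_scores = np.array([10, 15, 20])
print(mean_equate(form_x_scores, mean_x=18.0, mean_y=20.0))  # [12. 17. 22.]
```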
![Page 54: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/54.jpg)
CTT Linking-Equating Approaches
b. Linear Method
• Based on setting standardized deviation scores from two tests equal
• Can be done in raw score scale with simple linear regression
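Setting standardized deviation scores equal, (y - mean_y)/sd_y = (x - mean_x)/sd_x, and solving for y gives the linear equating function; a small sketch with hypothetical form moments:

```python
import numpy as np

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear equating: set the standardized deviation scores of the two
    forms equal and solve for the Form Y equivalent of a Form X score x."""
    return mean_y + (sd_y / sd_x) * (np.asarray(x, dtype=float) - mean_x)

# A Form X score of 30 is one SD above its mean (z = 1), so it maps to
# one SD above the Form Y mean: 28 + 6 = 34
print(linear_equate(30, mean_x=25, sd_x=5, mean_y=28, sd_y=6))  # 34.0
```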
![Page 55: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/55.jpg)
CTT Linking-Equating Approaches
b. Linear Method

[Figure: Linear equating plot showing the line relating the two forms; x-axis: Raw Score on Form B (1 to 10); y-axis: Raw Score on Form A (0 to 10).]
![Page 56: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/56.jpg)
CTT Linking-Equating Approaches
c. Equipercentile Method
• Based on scores that correspond to the same percentile rank position from two tests
• Does not assume a linear relationship between the two tests
• Provides for linking scores across the full range of possible test scores
• May require “smoothing” of the distributions, especially with small samples
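A bare-bones sketch of the percentile-rank mapping (the hypothetical distributions below are made up, and operational equipercentile equating adds smoothing, especially with small samples):

```python
import numpy as np

def equipercentile_equate(score_x, scores_x, scores_y):
    """Map a Form X score to the Form Y score at the same percentile rank."""
    scores_x = np.sort(np.asarray(scores_x))
    scores_y = np.sort(np.asarray(scores_y))
    # Percentile rank of score_x within the Form X score distribution
    rank = np.searchsorted(scores_x, score_x, side="right") / scores_x.size
    # Form Y score holding that same percentile rank
    return float(np.quantile(scores_y, min(rank, 1.0)))

# Hypothetical data: Form Y scores run about 5 points higher than Form X
form_x = np.arange(1, 101)   # scores 1..100
form_y = form_x + 5          # scores 6..105
equated = equipercentile_equate(50, form_x, form_y)
print(equated)
```

With these toy distributions, a Form X score of 50 maps to roughly 55 on the Form Y scale, mirroring the 5-point shift.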
![Page 57: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/57.jpg)
CTT Linking-Equating Approaches
c. Equipercentile Method
![Page 58: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/58.jpg)
a. common items
b. common people, or randomly equivalent groups treated as being the same people
IRT Linking-Equating Approaches
![Page 59: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/59.jpg)
IRT Linking-Equating Approaches
IRT linking and equating approaches:
• provide flexibility and are applicable to many settings
• provide consistency by employing the IRT model being used for calibration and scaling
• provide indices that reveal departures from what is expected (tests of fit)
![Page 60: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/60.jpg)
IRT Linking-Equating Approaches
a. Common Items
Approaches can be based on:
1. Applying an equating constant
2. Estimating item parameters with fixed or concurrent/simultaneous calibration
3. Applying the Test Characteristic Curve procedure (TCC) of Stocking & Lord, 1983
![Page 61: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/61.jpg)
IRT Linking-Equating Approaches
a. Common Items
Applying an equating constant
• Appropriate when two or more tests have a common set of anchor items and also some items unique to each form
• Requires selecting one form or some other location on the scale as the origin of the scale
![Page 62: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/62.jpg)
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant

[Figure: Test Form X (20 items) calibrated on a scale from -3 to +3, with the origin at 0. Easier items and lower-ability students fall toward the left; harder items and higher-ability students toward the right. Anchor items A, B, and C are marked on the scale.]
![Page 63: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/63.jpg)
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant

[Figure: Test Form Y (20 items) calibrated on its own scale from -3 to +3, with the origin at 0. The same anchor items A, B, and C appear, located differently than on Form X.]
![Page 64: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/64.jpg)
IRT Linking-Equating Approaches
Common Item Approach: Applying an Equating Constant

[Figure: Test Form X scale (top) vs. Test Form Y scale (bottom). Adding the equating constant of 2 maps each Form X value onto the Form Y scale: -3 becomes -1 (-3+2), -2 becomes 0 (-2+2), and so on up to 3 becoming 5 (3+2). Anchor items A, B, and C then line up across the two forms.]
THREE ITEMS IS NEVER -- EVEN UNDER MASSIVE DELUSIONARY INFLUENCES -- ENOUGH ITEMS!
EVEN WITH 15 TO 20 ITEMS -- A MINIMUM -- IT NEVER WORKS THIS SIMPLY.
CAUTION
![Page 65: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/65.jpg)
IRT Linking-Equating Approaches
a. Common Items
1. Applying an Equating Constant

[Figure: Test Form X* (adjusted) and Test Form Y shown together on a common scale running from -3 to +5. After the adjustment, anchor items A, B, and C align across the two forms.]
![Page 66: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/66.jpg)
IRT Linking-Equating Approaches
a. Common Items
1. Determining an Equating Constant

| | Form Y | Form X | (Y - X) |
|---|---|---|---|
| Item A | 0.5 | -1.5 | 2 |
| Item B | 1.0 | -1.0 | 2 |
| Item C | 1.5 | -0.5 | 2 |
| Sum | 3.0 | -3.0 | 6 |
| Average | 1.0 | -1.0 | 2 |

Constant = Form Y - Form X = 2

If C = Y - X, then Y = X + C
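The table's arithmetic can be reproduced directly (difficulty values taken from the table above):

```python
# Anchor-item difficulty estimates for the same items on each form
form_y = {"Item A": 0.5, "Item B": 1.0, "Item C": 1.5}
form_x = {"Item A": -1.5, "Item B": -1.0, "Item C": -0.5}

# The equating constant is the average difficulty difference: C = mean(Y - X)
diffs = [form_y[item] - form_x[item] for item in form_y]
constant = sum(diffs) / len(diffs)
print(constant)  # 2.0

# Place a Form X item difficulty on the Form Y scale: Y = X + C
print(form_x["Item A"] + constant)  # 0.5
```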
![Page 67: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/67.jpg)
CLOSER LOOK
IRT Linking-Equating Approaches
![Page 68: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/68.jpg)
a. Common Items
1. Applying an Equating Constant

• The common items used for equating are the anchor items
• Generally 15 to 20 items are needed for common item equating
• Not all items designated as anchor items will work effectively
• The anchor items should be in the same location on the tests
• The anchor items should reflect the content, format, and difficulty range of the whole test
IRT Linking-Equating Approaches
![Page 69: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/69.jpg)
IRT Linking-Equating Approaches
a. Common Items
2. Fixed Calibration

Test Form X:
1. Designate this as the base form, which defines the scale origin.
2. Calibrate parameters (difficulty, discrimination, and guessing) of all items.
3. Treat the item parameters of the anchor items (e.g., items 1, 5, 7, 10, etc.) as fixed.

Test Form Y:
4. Use the parameters of the anchor items from Form X for the same items (anchors) on Form Y.
5. Calibrate the Form Y items using the fixed parameter values of the anchor items.
6. Treat all other items and their parameters as free to vary.

The resultant calibration of Form Y will be on the same scale as Form X; it is "anchored" through the fixed values of the common items.
![Page 70: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/70.jpg)
IRT Linking-Equating Approaches
a. Common Items
2. Concurrent or Simultaneous Calibration

Consider the following:

• 500 students take Test Form X (40 items)
• 500 students take Test Form Y (40 items)
• All 1,000 students take the 15 anchor items
• The data for all students are stacked as shown on the next slide…
![Page 71: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/71.jpg)
IRT Linking-Equating Approaches
a. Common Items
2. Concurrent or Simultaneous Calibration

• Data are calibrated on 1,000 students.
• Students each “take” 65 items.
• Students are missing data on the form they did not take.
• All students respond to the anchor items.

| | Form X items (25 items) | Anchor items (15 items) | Form Y items (25 items) |
|---|---|---|---|
| 500 students who take Form X (40 items) | responses to 25 items | responses to 15 items | missing data |
| 500 students who take Form Y (40 items) | missing data | responses to 15 items | responses to 25 items |
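The stacked layout can be sketched as a response matrix with missing data. The counts follow the slides (500 + 500 students, 25 + 15 + 25 item columns); the 0/1 responses are random placeholders, not real data:

```python
import numpy as np

# 1,000 students x 65 item columns:
# columns 0-24  = Form X unique items
# columns 25-39 = anchor items (taken by everyone)
# columns 40-64 = Form Y unique items
n_x, n_y = 500, 500
stacked = np.full((n_x + n_y, 65), np.nan)  # NaN = item never administered

rng = np.random.default_rng(0)
# Form X students answer columns 0-39 (their 25 unique items + 15 anchors)
stacked[:n_x, :40] = rng.integers(0, 2, size=(n_x, 40))
# Form Y students answer columns 25-64 (15 anchors + their 25 unique items)
stacked[n_x:, 25:] = rng.integers(0, 2, size=(n_y, 40))

# Every student has data on the anchor columns; none is missing there
print(np.isnan(stacked[:, 25:40]).sum())  # 0
```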
![Page 72: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/72.jpg)
• Developed by Stocking and Lord (1983)
• Very flexible and widely used
• Commonly applied with the 2- and 3- parameter IRT models.
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
![Page 73: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/73.jpg)
• IRT scales have an arbitrary origin and an arbitrary scale spacing (i.e., the size of each unit of measurement): the origin is selected and fixed, and the scale spread is expanded or reduced
• Item parameter estimates for the same items from two independent calibrations will differ due to origin and scale differences, characteristics of other items, and possibly sampling and estimation error

IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
![Page 74: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/74.jpg)
• If two scales differ in origin (location) and spread (variability), a linear transformation can be applied to one scale to re-express or transform it to be on the other scale
• The choice of what scale to use is informed by considering the intended use of the items, test forms, or item bank
• The figures on the next slide illustrate the basic idea of the TCC method
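The linear re-expression of one scale onto another can also be shown numerically. This is a mean-sigma sketch (one common way to obtain the two constants; not necessarily the exact procedure a given program uses), with invented difficulty estimates for the same five anchor items from two calibrations:

```python
import numpy as np

# Hypothetical difficulty (b) estimates for the SAME anchor items from
# two independent calibrations; the values are illustrative only
b_form = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])    # scale to be transformed
b_target = np.array([-0.7, 0.1, 0.6, 1.3, 2.0])   # target (base) scale

# Mean-sigma method: the slope matches the spreads,
# the intercept matches the origins
A = b_target.std() / b_form.std()
B = b_target.mean() - A * b_form.mean()

b_transformed = A * b_form + B

# After the linear transformation the anchor difficulties share the
# target scale's mean and standard deviation
print(bool(np.allclose(b_transformed.mean(), b_target.mean())))  # True
print(bool(np.allclose(b_transformed.std(), b_target.std())))    # True
```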
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
![Page 75: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/75.jpg)
IRT Linking-Equating Approaches
![Page 76: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/76.jpg)
• Transforms the item parameter values for the common items on one test form to be on the same scale as their corresponding parameter values on the other (target) form
• Requires two constants: the parameters are multiplied by one constant, and the second constant is then added
• Begins with carefully chosen initial values for the constants
• Refines the constants to minimize the differences in estimated scores based on the transformed test form and the target form
• Never as simple in practice as the theory suggests
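A minimal numerical sketch of the TCC idea, assuming a 2PL model and invented anchor-item estimates. Real implementations (e.g., Stocking-Lord software) refine the constants with derivative-based optimization; a coarse grid refinement around mean-sigma starting values stands in for that step here:

```python
import numpy as np

def tcc(theta, a, b):
    """Test characteristic curve for a 2PL model: expected score at each theta."""
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
    return p.sum(axis=0)

# Hypothetical 2PL estimates for the same anchor items from two calibrations
a_form, b_form = np.array([1.0, 1.2, 0.8]), np.array([-0.5, 0.3, 1.1])
a_targ, b_targ = np.array([0.9, 1.1, 0.7]), np.array([0.1, 0.9, 1.7])

theta = np.linspace(-4, 4, 81)   # grid of ability points

def loss(A, B):
    # Rescaling one form: b* = A*b + B and a* = a / A; the criterion is the
    # squared gap between the two test characteristic curves
    return float(np.mean((tcc(theta, a_targ, b_targ)
                          - tcc(theta, a_form / A, A * b_form + B)) ** 2))

# Initial values from the mean-sigma method, then a coarse grid refinement
A0 = b_targ.std() / b_form.std()
B0 = b_targ.mean() - A0 * b_form.mean()
best = min(((loss(A0 + dA, B0 + dB), A0 + dA, B0 + dB)
            for dA in np.linspace(-0.3, 0.3, 61)
            for dB in np.linspace(-0.3, 0.3, 61)), key=lambda t: t[0])
print(f"A = {best[1]:.3f}, B = {best[2]:.3f}")
```

The refinement can only improve on (or match) the starting values, which mirrors the "begin with good initial values, then refine" logic of the bullets above.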
IRT Linking-Equating Approaches
a. Common Items
3. Test Characteristic Curve (TCC) Procedures
![Page 77: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/77.jpg)
IRT Linking-Equating Approaches
![Page 78: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/78.jpg)
The same students, or two groups sampled to be equivalent on critical relevant characteristics, take Form X and Form Y; the forms do not have any common items
• Example: students’ average ability on Form X is -1.0 (low ability) and their average ability on Form Y is +1.0 (high ability)
• Differences in students’ abilities cannot explain the differences in the performance on Forms X and Y since the same students (common students) take both forms
IRT Linking-Equating Approaches
b. Common Persons
or Random Equivalent Groups
QUESTION
How can the same group of students have two different mean abilities?
![Page 79: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/79.jpg)
• The difference in mean performance reflects the difference in the difficulty of the two forms
• The test forms must be different in difficulty since the students’ abilities were held constant (same students)
• On Form X, students look less able with a mean of -1; on Form Y students look more able with a mean of +1
• Form X is harder than Form Y in that it makes students look less able; the test forms differ by +2 units
• The difference of +2 is used as a linking constant to adjust the tests onto a single scale in the same way as a linking constant derived from common items
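The common-persons adjustment amounts to one line of arithmetic. The mean abilities come from the slide's example; the individual ability estimates are invented for illustration:

```python
# Same students, two forms (means taken from the example above)
mean_theta_form_x = -1.0   # students look less able: Form X is harder
mean_theta_form_y = +1.0

# The +2 difference reflects form difficulty, not ability, and serves
# as the linking constant
linking_constant = mean_theta_form_y - mean_theta_form_x   # +2.0

# Shifting Form X ability estimates by the constant places them on the
# Form Y scale (hypothetical estimates, chosen for clean arithmetic)
theta_on_x = [-2.0, -1.0, 0.0]
theta_on_y_scale = [t + linking_constant for t in theta_on_x]
print(theta_on_y_scale)   # [0.0, 1.0, 2.0]
```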
IRT Linking-Equating Approaches
b. Common Persons
or Random Equivalent Groups
![Page 80: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/80.jpg)
5. Equating Issues
• Substantive Concerns
• Technical Issues
• Quality Control Issues
   – Test design, development & administration
   – Scoring, analysis, and equating
• Technical Documentation
• Accountability Compliance
• Item Formats and Platforms
![Page 81: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/81.jpg)
Common Equating Concerns/Issues
Substantive Concerns
• Validity is the central issue
• Validity evidence must document fairness, absence of bias, and equal access for all students
• Carefully planned and rigorously monitored item and test form development are the most essential ingredients for successful equating
• Equating goes bad through items and test forms, not in the psychometrics
![Page 82: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/82.jpg)
Common Equating Concerns/Issues
Technical Issues
• Examining and testing IRT assumptions
• Conducting and documenting IRT tests of fit
   – Data-to-model fit
   – Linking/equating fit
• Item Parameter Drift
![Page 83: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/83.jpg)
Common Equating Concerns/Issues
Quality Control Issues
Test design, development, and administration problems
• Changes in content standards or test specifications
• Item contexts that differ between forms and affect performance on anchor items
• Anchor items that appear in very different locations among forms
• Item misprints/errors
• Unintended accommodations (maps or periodic tables on walls, calculators, etc.)
• All manner of weird and unimaginable stuff and happenings
![Page 84: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/84.jpg)
Common Equating Concerns/Issues
Quality Control Issues
Item scoring, analysis, and equating quality issues
• Non-standard scoring criteria or changes in scoring procedures
• Redefinition of scoring rubrics, variation in benchmark papers
• Item parameter drift
• Departures from specified equating procedures
• Unreliable and/or inconsistent item performance or score distributions
• Departures from specified data processing protocols
• All manner of weird and unimaginable stuff and happenings
![Page 85: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/85.jpg)
Common Equating Concerns/Issues
Technical Documentation
• General technical reports
• Standards-setting reports
• Equating technical reports
• Specify requirements for documentation in RFPs, with TAC reviews and due dates
QUESTION
Can an independent contractor replicate the equating results?
![Page 86: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/86.jpg)
Common Equating Concerns/Issues
Accountability Concerns
• Standard Setting
• Adequate Yearly Progress (AYP)
![Page 87: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/87.jpg)
Common Equating Concerns/Issues
Item Formats and Platforms
• Open-ended or Constructed Response Tasks
• Writing Assessment
• Paper-and-Pencil and Computerized Assessments
![Page 88: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/88.jpg)
References
Dorans, N. J., Pommerich, M., & Holland, P. W. (2007). Linking and aligning scores and scales. Statistics for social and behavioral sciences. New York: Springer.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6, 83-102.
Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and prospects. Princeton, NJ: Educational Testing Service.
Ryan, J., & Brockmann, F. (2009). A Practitioner’s Introduction to Equating with Primers on Classical Test Theory and Item Response Theory. Washington, DC: Council of Chief State School Officers (CCSSO).
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
![Page 89: A Practitioner’s Introduction to Equating](https://reader035.fdocuments.us/reader035/viewer/2022062323/5681533e550346895dc15ae9/html5/thumbnails/89.jpg)
A Practitioner’s Introduction to Equating
Joseph Ryan, Arizona State University, [email protected]
Frank Brockmann, Center Point Assessment Solutions, [email protected]
END