Scoring Technology Enhanced Items
Sue Lottridge, Director of Machine Scoring
Amy Burkhardt, Senior Research Associate of Machine Scoring
Technology Enhanced Items
• Seeing more TEIs in assessments
– Consortia
– Formative assessments
• Decisions around TEIs
– Count-based (e.g., 25 MCs, 2 CRs, 3 TEIs)
– Content-based
Drag and Drop TEIs
• Select
– Drag N objects to a single drop target
– Similar to a 'Check all that apply' selected-response item
• Categorize
– Drag N objects to M drop targets
– Limits: an object can be dragged to multiple targets, or to none
• Order
– Drag N objects to M drop targets in the proper order
• Composites (multi-part)
– Dependencies

TEI Considerations
• Claims
– Choice of TEI
– Justification
• Creation
– Environment
– Format
– Complexity
– Constraints
• Interoperability
– Rendering
– Data storage
– Porting
• Performance
– Response time
– Latency
– Efficiency
• Cost
– Time to develop
– Permissions
– Storage
– QA
• Scoring
– Combinatorics
– Who sets rules
TEIs Live in the "Grey Area" between MCs and CRs
[Figure: a continuum from Multiple Choice Items to Constructed Response Items, with TEIs falling in between]
Evaluating TE Item Scoring
• Classical test theory methods (p-value, score distribution, point-biserial)
• Analyze trends in responses
– Frequency of response patterns
– Counts of object choices
– Proportion of 'blank' responses
– Frequent, incorrect responses
• Analysis may
– Suggest where examinees may not understand the item
– Highlight alternative correct answers
– Suggest need for partial credit or collapsing categories
TEI Scoring and Performance Factors
• Item Design
– Structure
– Clarity
– Constraints
• Examinee
– "Gets" the item
– Facility with tools
– Experience with item type
• Scoring
– Rubric alignment
– Rubric clarity
– Scoring quality
Item 1
Key:
– 2 points if the response matches the key.
– 1 point if the top or bottom row matches the key.
– 0 otherwise.
There are 19,531 ways to answer a single part, and so 381,459,961 ways to answer both parts.
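Because the two parts are answered independently, the per-part counts multiply; a quick arithmetic check:

```python
ways_per_part = 19531  # possible responses to one part, as given above
# The parts are independent, so the counts multiply.
both_parts = ways_per_part ** 2
print(both_parts)  # 381459961
```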
What do the data tell us? Response pattern frequencies
More students dragged 2/3 and then 1/3 into boxes than answered the item correctly.
Part 1 and Part 2 Frequencies
Summation versus expression representation?
         Original Rubric      New Rubric
Score    Count    Percent     Count    Percent
0        2432     81%         2257     75%
1        212      7%          335      11%
2        375      12%         427      14%
p-value           .16                  .20
• 190 examinees would have received a higher score
– 138 moved from 0 to 1
– 37 moved from 0 to 2
– 15 moved from 1 to 2
Item 1 Summary
• Item Design
– Clarify question
– Clarify directions
– Review drag target size
– Revisit number of drag objects
• Examinee
– Enable practice with infinite wells
– Observe examinees answering the item
• Scoring
– Summation versus expression?
– 14% of responses are blank; why?
Item 2
Score    Number of Correct    Number of Incorrect
         Objects Present      Objects Present
2        4                    0
1        4                    1 or 2
1        3                    0
0        Otherwise
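Read this way, the rubric can be sketched as a small scoring function; the score-1 conditions are my reading of the flattened slide table, so treat them as an assumption:

```python
def score_item2(n_correct, n_incorrect):
    # 2 points: all 4 correct objects and no incorrect ones.
    if n_correct == 4 and n_incorrect == 0:
        return 2
    # 1 point (assumed reading of the rubric table): all 4 correct objects
    # with 1 or 2 incorrect ones, or 3 correct objects with none incorrect.
    if (n_correct == 4 and n_incorrect in (1, 2)) or (n_correct == 3 and n_incorrect == 0):
        return 1
    return 0
```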
Ignoring order, there are 2^10 (1,024) possible answers. Preserving order, there are about 10,000,000 possible answers.
Ignoring order, there were 573 unique answers. Preserving order, there were 2,961 unique answers.
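With 10 drag objects, the possible-answer counts above can be reproduced directly (assuming each object can be placed at most once, as the 2^10 figure implies):

```python
from math import perm

n = 10  # number of drag objects
ignoring_order = 2 ** n  # each object is either placed or not
# Preserving order: ordered selections of any length, sum over k of P(n, k).
preserving_order = sum(perm(n, k) for k in range(n + 1))
print(ignoring_order, preserving_order)  # 1024 9864101
```

9,864,101 is the "about 10,000,000" quoted on the slide.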
Response pattern frequencies
What objects are chosen by examinees?
Object     Mean    Correlation with item score
3(x)       87%     .13
x+x+x      69%     .26
x^3        65%     -.52
5x-2x      46%     .35
x+3        43%     -.37
3x+3       37%     -.36
3(2x-x)    33%     .17
x/3        55%     -.49
5(x-2)     26%     -.18
x-x-x      23%     -.25
Object selection by score
Object     0 (N=5814)    1 (N=1212)    2 (N=312)
3(x)       85%           94%           100%
x+x+x      62%           92%           100%
x^3        78%           20%           0%
5x-2x      37%           73%           100%
x+3        53%           7%            0%
3x+3       46%           2%            0%
3(2x-x)    31%           24%           100%
x/3        68%           6%            0%
5(x-2)     30%           13%           0%
x-x-x      28%           1%            0%
New Scoring Rules
• Student needs to drag more correct objects than incorrect objects to earn a score of 1
Score      Original Rubric    New Rubric
0          79%                63%
1          17%                33%
2          4%                 4%
p-value    .12                .21
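A sketch of the revised rule stated above; the score-2 condition is assumed unchanged (all four correct objects and nothing else), consistent with the 4% of 2s under both rubrics:

```python
def score_item2_revised(n_correct, n_incorrect):
    # Score 2 assumed unchanged: all 4 correct objects and no incorrect ones.
    if n_correct == 4 and n_incorrect == 0:
        return 2
    # New rule: more correct than incorrect objects earns 1 point.
    if n_correct > n_incorrect:
        return 1
    return 0
```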
Relationship of parts to item score
Object     Percent    Original Correlation    New Correlation
3(x)       87%        .13                     .12
x+x+x      69%        .26                     .30
x^3        65%        -.52                    -.53
5x-2x      46%        .35                     .29
x+3        43%        -.37                    -.52
3x+3       37%        -.36                    -.50
3(2x-x)    33%        .17                     .04
x/3        55%        -.49                    -.62
5(x-2)     26%        -.18                    -.24
x-x-x      23%        -.25                    -.36
Object Selections by Score Point
           Original Rubric                      Revised Rubric
Object     0 (N=5814)  1 (N=1212)  2 (N=312)   0 (N=4624)  1 (N=2402)  2 (N=312)
3(x)       85%         94%         100%        85%         91%         100%
x+x+x      62%         92%         100%        58%         85%         100%
x^3        78%         20%         0%          84%         38%         0%
5x-2x      37%         73%         100%        36%         57%         100%
x+3        53%         7%          0%          64%         10%         0%
3x+3       46%         2%          0%          57%         4%          0%
3(2x-x)    31%         24%         100%        35%         19%         100%
x/3        68%         6%          0%          79%         16%         0%
5(x-2)     30%         13%         0%          34%         15%         0%
x-x-x      28%         1%          0%          35%         2%          0%
Item 2 Summary
• Item Design
– Review drag target size
– Revisit number of drag objects
• Examinee
– Examinees appeared to understand the task
• Scoring
– Are more generous rules aligned with the standard/claim?
– Other rules?
Item 3
A student earns a 2 if she drags 4 or 5 correct steps in order and the last step is x-3.
A student earns a 1 if she drags 3 correct steps in order and the last step is x-3.
A student earns a 0 otherwise.
There are 19,081 ways to answer this item.
20 ways to earn a 2; 16 ways to earn a 1.
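One way to operationalize the key above is to count the leading dragged steps that agree with the correct path; the step labels and the "in order" reading here are assumptions for illustration, since the item content is not shown:

```python
# Hypothetical labels for the five steps of the correct solution path;
# the actual item content is not in the transcript. The final step is x-3.
KEY_PATH = ["step 1", "step 2", "step 3", "step 4", "x-3"]

def score_item3(response):
    # Count leading dragged steps that match the key in order
    # (one possible reading of "correct steps in order").
    n_in_order = 0
    for dragged, key in zip(response, KEY_PATH):
        if dragged != key:
            break
        n_in_order += 1
    if response and response[-1] == "x-3":
        if n_in_order >= 4:
            return 2
        if n_in_order == 3:
            return 1
    return 0
```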
Response Frequencies (1108 unique responses)
Score distributions
Score      Original Rubric    Revised Rubric
           N       %          N       %
0          3891    75%        3758    73%
1          40      1%         173     3%
2          1227    24%        1227    24%
p-value    .24                .25
The revised rubric allows partial credit when the response contains the correct path but the student drags 'extra' objects to fill the remaining spaces.
775 responses (13%) were blank.
Item 3 Summary
• Item
– Remove infinite wells
– Add 'distractors'?
– Remove borders around drop targets or make them dynamic
• Examinee
– Students seem compelled to drag objects to fill all spaces
– Students do not reduce to a final answer
• Scoring
– Combinatorics: complicated scoring rules
– Reversals?
– Same-level transformations?
Conclusions
• A review of responses and frequencies can reveal areas of misunderstanding, potential for item revision, or uncaptured correct responses
• Complexity of the item leads to complexity in scoring
– More 'objects' = more possible correct responses!
– Object content influences scoring
• Placing constraints on the item can help
– Infinite wells
– Size and number of objects
• Changes to scoring don’t always add value