Post on 01-Apr-2015
The Journey Toward
Accessible Assessments
Karen Barton
CTB/McGraw-Hill
Validity & Accommodations: Validity
• Validity: the ongoing trust in the accuracy of the test, the administration, and interpretations and use of results
• According to Messick (1995), “validity is not a property of the test . . . as such, but rather of the meaning of the test scores . . . (that) are a function not only of the items or stimulus conditions, but also of the persons responding . . .” (p. 741).
• Validation must therefore encompass the full testing environment:
– test constructs
– items
– persons
– characteristics and interactions of each
This goes beyond the validity of accommodations to the heart of assessment validity.
[Diagram, built up over several slides: Validity, with Examinee, Context (CRT, NCLB), Construct, and Test/Items; annotated with Accommodations Classification, Targeted, Accessible, and Predetermined Pieces]
Confounding Variables: Persons
Persons
• Disabilities
• Accommodations
• Access/barriers
• Response
Person Variables & Variation
• Identification of disability
• Accommodation policies, selection and provision
• Access to and instruction in varied standards and depth/breadth of coverage
• Access to test information and opportunity to accurately respond
Given the current state of the state, where diverse examinees approach the assessment platform with various accommodations and in non-standard administrations, what can be done to improve the validity of the assessments?
What is a construct?
• “A product of informed scientific imagination, an idea developed to permit categorization and description of some directly observable behavior . . . (The construct itself is) not directly observable . . . (and) it must first be operationally defined” (Crocker and Algina, 1986, p. 230).
• Construct ≠ Trait.
[Diagram: a Construct, three Targeted Traits beneath it, and six Evidence boxes beneath the traits]
The operational definition includes the specification of traits and observable skills that, together, represent the unobservable construct.
The operational definition should be researched and empirically supported.
[Diagram, example: Math Intelligence, with Computation, Problem Solving, and Numbers beneath it, and six Item boxes beneath those]
[Diagrams: the Validity diagram repeated under Predetermined Pieces; then Access, Precision, and Validity]
Access
• Student access to
– test information (directions, stimulus),
– requirements (expectation of how to respond),
– response capabilities (the way in which students respond)
• Item access to student ability – true performance
Improved Access
• Improved student access:
– Accommodations: Access tools specific to examinees that allow for assessment such that disability or language does not misrepresent true performance.
• Improved item access:
– Minimizing Construct Irrelevant Variance (systematic error)
improved precision
Precision threat: Error
Random error
– Random or inconsistent
– Inherent to the assessment
– Examples: content sampling, summative “snap-shot” assessment, scoring, distractions
– Reduces usefulness of scores
Systematic error
– Consistent
– Inherent to examinee
– Examples: students with disabilities without needed accommodation(s), low item accessibility
– Reduces accuracy of scores
When error is minimized, scores are more trustworthy!
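The contrast between random and systematic error can be made concrete with a small simulation. The sketch below is illustrative only: the function names, the true score of 50, and the error magnitudes are assumptions chosen for the example, not values from the presentation. It treats an observed score as the true score plus a constant shift (systematic error) plus noise (random error).

```python
import random
import statistics


def simulate_scores(true_scores, random_sd=0.0, systematic_shift=0.0, seed=0):
    """Observed score = true score + constant shift + random noise.

    All parameters are hypothetical knobs for illustration.
    """
    rng = random.Random(seed)
    return [t + systematic_shift + rng.gauss(0.0, random_sd) for t in true_scores]


def mean_error(observed, true_scores):
    """Average signed error; far from zero suggests systematic error (bias)."""
    return statistics.fmean([o - t for o, t in zip(observed, true_scores)])


def error_spread(observed, true_scores):
    """Standard deviation of the errors; larger means more random error."""
    return statistics.pstdev([o - t for o, t in zip(observed, true_scores)])


true_scores = [50.0] * 2000

# Random error only: scores scatter around the truth but are not biased.
noisy = simulate_scores(true_scores, random_sd=3.0)

# Systematic error only: e.g. a needed accommodation is missing, so every
# affected score is shifted the same way (an assumed 5-point underestimate).
biased = simulate_scores(true_scores, systematic_shift=-5.0)

print(mean_error(noisy, true_scores), error_spread(noisy, true_scores))
print(mean_error(biased, true_scores), error_spread(biased, true_scores))
```

In this toy setup the noisy scores average close to the truth but vary from examinee to examinee, while the biased scores are perfectly consistent and consistently wrong, which is why the two kinds of error threaten usefulness and accuracy differently.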
[Diagram repeated: the Validity diagram, with its components marked RE (random error) and SE (systematic error)]
Minimizing Error
• Random: Standardization – belief that random error can be minimized by standardizing test administrations.
• Systematic: Construct Irrelevant Variance (CIV)
– Constant ~ group specific
– Over/underestimation of scores
~ “Students potentially provide the most serious threat to CIV.” (Haladyna & Downing, 2004, p. 23)
~ This brings us back to the test and how students interact with the constructs to be measured.
Accommodations
• Such tools change administrations from standard to non-standard, threatening comparability of results.
• Providing either a standard or non-standard administration requires sacrifices:
– random errors in a non-standard environment
– systematic errors when a test is standard and inflexible to the access of students to test information
The question is: at what point do the sacrifices impede measurement precision and the validity thereof?
Back to Basics: Valid Assessment Systems
• Improved student data
– Improved collection, particularly in light of Peer Review, to include subgroup data
– Supporting students
– Improving decisions on accommodations and standardization of the provisions thereof
– Recognizing the assumptions of policy decisions: classifications and accommodations
• Re-conceptualization of “standardization.” A more valid conceptualization may be what is standard for each examinee.
Back to Basics: Valid Assessment Systems
• Well targeted to a clearly and operationally defined construct
– If we can’t define what we want to know, how do we know that what we know is what we want to know?
• Balanced and aligned expectations of:
– standards
– skills
– range of difficulties
• Improved measurement precision
– Reduction in random AND systematic errors
– Expanded item sampling
– Increased accessibility
– Flexibility
– UDA
~ the Goldilocks approach
Past Research
• Ways to “validate” accommodations – DIF, EFA, cluster analyses, qualitative reviews, etc.
• Inconclusive results
• Difficulties in conducting research:
– Experimental designs
– Concurrent accommodations vs. single accommodations
– Confounding variables
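As one illustration of the DIF analyses listed above, the sketch below computes a Mantel-Haenszel common odds ratio for a single item, matching examinees on total score. It is a minimal sketch under stated assumptions, not the procedure used in any particular study: the group labels "reference" and "focal", the function names, and the toy counts are all hypothetical. In accommodations research the focal group might be examinees tested with an accommodation.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, total_score, correct), where group is
    "reference" or anything else (treated as focal; hypothetical labels)
    and correct is 0 or 1. One 2x2 table is built per total-score level.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, total_score, correct in records:
        column = 0 if group == "reference" else 2
        tables[total_score][column + (0 if correct else 1)] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """ETS delta-scale transformation of the odds ratio: -2.35 * ln(alpha)."""
    return -2.35 * math.log(odds_ratio)


def make_records(group, total_score, n_correct, n_incorrect):
    """Build toy records for one group at one total-score level."""
    return [(group, total_score, 1)] * n_correct + [(group, total_score, 0)] * n_incorrect


# Toy data (invented): at score level 10 the focal group does worse on the item
# than matched reference examinees; at level 20 the two groups perform alike.
records = (
    make_records("reference", 10, 6, 4) + make_records("focal", 10, 2, 3)
    + make_records("reference", 20, 8, 2) + make_records("focal", 20, 4, 1)
)
alpha = mantel_haenszel_odds_ratio(records)
print(alpha, mh_d_dif(alpha))
```

An odds ratio near 1 suggests the item behaves similarly for matched examinees in both groups; values above 1 favor the reference group. As the slide notes, such statistics alone have given inconclusive results, partly because concurrent accommodations and other confounding variables are not separated by the matching.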
Past Research
• Lack of consensus on what constitutes “valid” accommodations
– Does “boost” = validity?
– Isn’t it possible that a valid accommodation might increase precision in measurement and possibly reveal student inability – no boost?
Continued & Future Research
Given the confounding variables of both persons and tests, accommodations cannot be validated apart from an in-depth look at the assessment and what it is trying to measure, in concert with how the accommodation, the student, and the test items interact. (Ex.: research on construct irrelevant variance by Abedi, Kopriva, Winters, et al.)
It must be clear how the accommodations affect skill measurement.
Therefore, future research should focus deeply on assessment validity in light of how the wide range of students, with all their diversities (and confounding variables), approach assessments.
Continued & Future Research
• Re-evaluation of test constructs
• Research on all students, not limited to disability classifications
• Is there a way to measure individual systematic error?
• Research on distractors
– What types of errors do students make, and which distractors do they choose?
• Think-aloud studies focused on access and student response preferences
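A first pass at the distractor question could simply tabulate which wrong options each group chooses. The sketch below is a hypothetical illustration: the function name, the group labels, the option letters, and the responses are invented for the example.

```python
from collections import Counter


def distractor_shares(responses, key):
    """Share of each wrong option among a group's incorrect responses.

    responses: iterable of (group, chosen_option); key: the correct option.
    Group labels and option letters are hypothetical.
    """
    wrong = {}
    for group, option in responses:
        if option != key:
            wrong.setdefault(group, Counter())[option] += 1
    return {
        group: {opt: n / sum(counts.values()) for opt, n in counts.items()}
        for group, counts in wrong.items()
    }


# Invented responses to one multiple-choice item whose correct option is "A".
responses = [
    ("accommodated", "B"), ("accommodated", "B"), ("accommodated", "C"),
    ("accommodated", "A"),
    ("standard", "C"), ("standard", "D"), ("standard", "A"), ("standard", "A"),
]
shares = distractor_shares(responses, key="A")
print(shares)
```

If one group concentrates on a particular distractor while another spreads its errors evenly, that pattern is a candidate for follow-up, for example in the think-aloud studies mentioned above, to see whether the distractor reflects the targeted skill or an access barrier.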
Continued & Future Research
• Flexibility:
– New item types and acceptable student response modes
– Approach flexible item types and research thereof as parallel item forms and formats for more than the “accommodated” sample.
General, accommodated, alternate, and modified alternate assessments can and should be
– Better aligned to clearly defined constructs
– More innovative by design
– Valid for more than the middle of the bell
– More meaningful and useful