ASSESSMENT LITERACY AS A MEANS FOR ENHANCING VALIDITY
#eatpconf
An Approach from US K-12 Education
Ellen Forte, Ph.D.
CEO & Chief Scientist
edCount, LLC
September 26, 2018
ASSESSMENT LITERACY -> VALIDITY
Overview
I. Defining assessment literacy
II. An application
III. Key concepts for assessment literacy
    A. Validity
    B. Purpose → Validity
    C. The life cycle of a test
    D. Validity questions
    E. Validity evidence
Assessment Literacy
Being assessment literate means that one understands
key principles about how tests are designed, developed,
administered, scored, analyzed, and reported upon in
ways that yield meaningful and useful scores.
An assessment literate person can accurately interpret
assessment scores and use them appropriately for
making decisions.
Strengthening Claims-based Interpretations and Uses of Local and Large-scale Science Assessment Scores (SCILLSS)
Addressing Assessment Literacy
• Assessment Literacy Modules
• Self-Evaluation Protocols
• Evidence-centered Design resources
Standard 1.0. Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.
(AERA, APA, & NCME, 2014, p. 23)
Why use tests?
To…
➢ guide next steps in instruction
➢ evaluate instruction
➢ evaluate curriculum
These uses are more formative. They have relatively low stakes for students and educators, as long as scores are considered in combination with other information and decisions allow for flexibility in implementation.
➢ evaluate learning for calculating grades
➢ determine eligibility for program entry or exit
➢ diagnose learning difficulties
➢ predict performance in a later setting
These uses have high stakes for individual students, and scores must always be considered in combination with other information.
➢ evaluate teachers
➢ evaluate schools or districts
➢ evaluate programs or services
These uses have high stakes for educators and some administrators, and scores must always be considered in combination with other information.
Linking Validity with Assessment Purpose
Purpose → Validity
Connecting Validity, Purpose, and the Life Cycle of a Test
Design and development
Administration
Scoring
Analysis
Reporting
Use of Scores
Purpose → Validity
Validity Questions
Design and development
Administration
Scoring
Analysis
Reporting
Use of Scores
Validity
Construct Coherence
Comparability
Accessibility & Fairness
Consequences
Validity Evidence Relates to Validity Questions
Construct Coherence: To what extent do the test scores reflect the knowledge and skills we’re intending to measure, for example, those defined in the academic content standards?
Comparability: To what extent are the test scores reliable and consistent in meaning across all forms, students, test sites, and time?
Accessibility & Fairness: To what extent does the test allow all students to demonstrate what they know and can do?
Consequences: To what extent are the test scores used appropriately to achieve specific goals?
Construct Coherence
To what extent does the assessment yield scores that reflect the knowledge and skills we intend to measure (e.g., academic standards)?
Why is this evidence important? To ensure that the assessment has been designed, developed, and implemented to yield scores that reflect the constructs we intend to measure.
Examples of Construct Coherence Questions
1. What are you intending to measure with this test? (What are the measurement targets?)
2. How was the assessment developed to measure these measurement targets?
3. How were items reviewed and evaluated during the development process to ensure they appropriately address the intended measurement targets and not other content, skills, or irrelevant student characteristics?
4. How are items scored in ways that allow students to demonstrate, and scorers to recognize and evaluate, their knowledge and skills? How are the scoring processes evaluated to ensure they accurately capture and assign value to students’ responses?
5. How are scores for individual items combined to yield a total test score? What evidence supports the meaning of this total score in relation to the measurement target(s)? How do items contribute to subscores and what evidence supports the meaning of these subscores?
6. What independent evidence supports the alignment of the assessment items and forms to the measurement targets?
7. How are scores reported in relation to the measurement targets? Do the reports provide adequate guidance for interpreting and using the scores?
Comparability
To what extent are the assessment scores reliable and consistent in meaning across all students, classes, and schools?
Why is this evidence important? To ensure the assessment scores carry consistent meaning across test forms, students, administration sites, and time.
Examples of Comparability Questions
1. How is the assessment designed to yield consistent, reliable scores? What evidence supports score reliability?
2. How is the assessment designed to support comparability of scores across forms? What evidence supports such comparability?
3. How is the assessment designed to support comparability of scores across time? What evidence supports such comparability?
4. How is the assessment designed to support comparability of scores across administration sites? What evidence supports such comparability?
5. How is the assessment administered to protect against various types of cheating so that the scores reflect students’ knowledge and skills and not inappropriate access to testing materials? What evidence supports the implementation of these safeguards and security protocols?
6. How is the assessment scored such that scores reflect students’ knowledge and skills and not inaccuracies or inconsistencies in the scoring process? What evidence supports implementation of these scoring procedures?
7. How are scores reported in ways that appropriately support or disrupt comparability in score interpretation across time, administration sites, or variations in student characteristics?
Accessibility and Fairness
To what extent does the assessment allow all students to access the content and demonstrate their knowledge and skills?
Why is this evidence important? To ensure that test scores reflect what we’re intending to measure about students’ knowledge and skills and not irrelevant characteristics.
Examples of Accessibility and Fairness Questions
1. How were the assessment questions developed to ensure that scores do not reflect student characteristics that are irrelevant to the measurement targets?
2. How were the needs of students with disabilities addressed during assessment development? What evidence supports these efforts and their effectiveness?
3. How were the needs of English learners addressed during assessment development? What evidence supports these efforts and their effectiveness?
4. How are students with disabilities able to demonstrate their knowledge and skills through the availability and use of any necessary accommodations? What evidence supports the identification and use of these accommodations at the time of testing?
5. How are English learners able to demonstrate their knowledge and skills through the availability and use of any necessary accommodations? What evidence supports the identification and use of these accommodations at the time of testing?
6. How are students’ responses scored in ways that reflect only the construct-relevant aspects of those responses? What evidence supports the minimization of construct-irrelevant influences on students’ responses?
7. How are assessment scores interpreted in relation to knowledge and skills that test takers have had an opportunity to learn or are preparing to learn? What evidence supports the interpretation of students’ scores in relation to their learning opportunities?
Consequences
To what extent does the assessment yield information that is used appropriately to achieve specific goals?
Why is this evidence important? To ensure that test scores are interpreted and used in ways that are appropriate, and not in ways that are inappropriate.
Examples of Consequences Questions
1. How is the assessment developed, administered, scored, and reported in ways that deter and limit instances of cheating by students or others associated with the assessment or its stakes? What evidence supports the implementation and effectiveness of these efforts?
2. How are the scores from the assessment intended to be used as described by the test developers and how are they used by your state? How well do these uses align?
3. If your state is using test scores for purposes other than those for which the test developers intended, what evidence supports those uses?
4. If assessment scores are associated with recommendations for instruction or other interventions for individual students, what evidence supports such interpretations and uses of these scores? What tools and resources are available to educators for evaluating and implementing these recommendations?
5. If assessment scores are associated with recommendations for whole-class or group instruction, what evidence supports such interpretations and uses of these scores? What tools and resources are available to educators for evaluating and implementing these recommendations?
6. If assessment scores are associated with high stakes decisions for teachers, administrators, schools, or other entities or individuals, what evidence supports such interpretations and uses of these scores?
7. How are scores reported to students and parents in ways that support their understanding of the scores and any associated recommendations or decisions?
Self-Evaluation Protocol, Steps One and Two: Identifying Purposes and Assessments Used to Serve those Purposes
Need/purpose | Assessment(s) Used to Serve this Purpose
Evaluate science curricula | ISTEP+: science in grades 4, 6, and 10
Monitor learning and guide instruction in math | MAP Growth in grades K-10
Monitor reading development | Edmentum Reading Eggs
MAP Growth in grades K-10
All* students in grades K-10
*Except some students with disabilities and English learners?
✓
Four times annually: September, December, February, April
MAP Growth in grades K-10
An Example from England Primary Assessment
Barbara H. Donahue, Ph.D.
Head of Assessment Research & Psychometrics
Standards and Testing Agency
Organogram
Minister of State for School Standards
Standards and Testing Agency
Assessment Strategy, Policy and Communications
Test Development
Test Operations
Ofqual
Standards and Testing Agency (STA)
Executive agency of the Department for Education
Civil servants who serve the government of the day
STA’s Chief Executive has a role in test content and in setting and maintaining standards
Assessment specialists with required professional development
Develop and deliver tests for primary-age children in reading, mathematics and grammar, punctuation & spelling (GPS)
Purpose of the tests
The main purpose of these tests is to ascertain what pupils
have achieved in relation to the attainment targets outlined in
the most recent national curriculum publication.
The intended uses of the outcomes:
• Hold schools accountable for the attainment and progress made by their children
• Inform parents and secondary schools about the performance of individual children
• Enable benchmarking between schools and monitoring of performance locally and nationally.
Validity, reliability and fairness with respect to the primary purpose and uses
Being mindful of the relevant Standards for Educational and Psychological Testing (2014)
Ensure our processes collect the evidence necessary to support our claims to validity, reliability and fairness
Be transparent by publishing test frameworks and handbooks – thereby educating the general public to be more assessment literate.
Continuous improvement – validity framework
Assessment literacy thread throughout
Assessment professionals within government
Ongoing professional development of staff
Transparency with respect to the purpose and aims and evidence for those
Improved assessment literacy of teachers, parents, general public
Credentialing Policymakers
Chad Buckendahl, Ph.D.
Founding Partner
ACS Ventures, LLC
Primary psychometric principles for credentialing policymakers
➢ Validity
• Domain analysis
• Standard setting
➢ Reliability
• Scores
• Scorers
• Decisions
➢ Fairness
• Substantive
• Procedural
Policymakers
• Includes: test owners, credentialing board
• Contribution to test development: Make key policy decisions in the test development and validation process guided by input from content experts and test developers.
What level of assessment literacy should be expected of stakeholders?
➢ Levels of understanding
• Level 1: Recognition/Awareness
• Level 2: Understanding/Application
• Level 3: Analysis/Evaluation
Assessment Literacy: Policymakers
➢ Expectations
• Recognition of policy-related concepts and phases within the test development and validation process
• Understanding of how phases within test development are interrelated and focused on supporting intended use and interpretation of test scores
• Being critical consumers of testing evidence
➢ Increasing assessment literacy
• Training provided by test developers throughout the test development process to move beyond intuitive understanding
Validity
➢ Domain analysis
• Defining the competencies, knowledge, skills, abilities of the credential
• Representative of the meaning of the credential
➢ Standard setting
• Systematic consideration of the passing score
• Alignment of policy expectations with test performance expectations
Reliability
➢ Scores
• Internal consistency of scores
➢ Scorers
• Tolerable levels of variation among scorers (human and/or machine)
➢ Decisions
• Classification consistency to provide confidence in pass/fail decisions
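The reliability ideas above can be made concrete with a small sketch. The slides do not name specific statistics, so the choices here are assumptions made for illustration: Cronbach's alpha as one common index of internal consistency, and Cohen's kappa as one common index of agreement between two scorers (it can equally be applied to pass/fail decisions from two test forms). All function names and data are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(scores):
    """Internal consistency for a score matrix.

    scores: list of rows, one per candidate; each row lists that
    candidate's item scores. Population variance is used for both the
    items and the totals, so the ratio matches the usual formula.
    """
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two sets of categorical ratings."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up data: four candidates, three right/wrong items.
item_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
# Made-up pass/fail ratings of the same four responses by two scorers.
scorer_1 = ["pass", "pass", "fail", "fail"]
scorer_2 = ["pass", "fail", "fail", "fail"]

print(round(cronbach_alpha(item_scores), 2))
print(round(cohen_kappa(scorer_1, scorer_2), 2))
```

For this made-up data the two functions return 0.75 and 0.5. What counts as a tolerable value is a policy judgment for the credentialing program, not something the code can decide.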
Fairness
➢ Substantive
• Content representation
• Consideration of all candidate groups
➢ Procedural
• Adequate notice
• Equivalent conditions