HO #3-Reviewing the Assessment-Scored Example-22JAN14
Handout #3- Reviewing the Assessment-Scored Example 1
HANDOUT #3
STEPS 7-10 Performance Measure Rubric [SCORED EXAMPLE]
Task ID | Descriptor | Rating | Evidence
1.1
The purpose of the performance measure is explicitly stated
(who, what, why).
1
See completed Template #1, Step 1.
1.2
The performance measure has targeted content standards
representing a range of knowledge and skills students are
expected to know and demonstrate.
1
See completed Template #1, Step 2.
1.3
The performance measure’s design is appropriate for the
intended audience and reflects challenging material needed
to develop higher-order thinking skills.
1
See Enduring Understanding/Key
Concept in Template #1, Step 2 and
completed Test Blueprint, Step 3.
1.4
Specification tables articulate the number of items/tasks,
item/task types, passage readability, and other information
about the performance measure -OR- Blueprints are used to
align items/tasks to targeted content standards.
1
See completed Template #1, Step 3;
Only targeted content standards within
the identified Key Concept were used
in creating the assessment.
1.5
Items/tasks are rigorous (designed to measure a range of
cognitive demands/higher-order thinking skills at
developmentally appropriate levels) and of sufficient
quantities to measure the depth and breadth of the targeted
content standards.
1
See completed Template #1, Step 3.
Items at cognitive demand Level 2 or
higher represent over 70% of the overall
available points on the assessment.
The assessment has at least five points
per standard.
Strand 1 Summary: 5 out of 5
Additional Comments/Notes
100% of all items/tasks were reviewed by the department-level committee. Refinements to the test blueprint in the
subsequent year will include 10% field test/try-out items.
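The two blueprint checks cited as evidence for Task 1.5 (over 70% of points at cognitive demand Level 2 or higher, and at least five points per standard) can be verified mechanically. The following is a minimal sketch only: the item IDs, standard codes, levels, and point values are invented for illustration and do not come from the completed Template #1.

```python
from collections import defaultdict

# Hypothetical blueprint rows: (item_id, standard, cognitive_level, points).
# These values are illustrative only, not taken from the handout's templates.
blueprint = [
    ("Q1", "STD.A", 1, 2),
    ("Q2", "STD.A", 2, 3),
    ("Q3", "STD.B", 2, 5),
    ("Q4", "STD.B", 3, 4),
    ("Q5", "STD.C", 1, 1),
    ("Q6", "STD.C", 3, 5),
]

def blueprint_summary(rows, min_level=2):
    """Return (share of points at or above min_level, points per standard)."""
    total = sum(points for _, _, _, points in rows)
    rigorous = sum(points for _, _, level, points in rows if level >= min_level)
    per_standard = defaultdict(int)
    for _, standard, _, points in rows:
        per_standard[standard] += points
    return rigorous / total, dict(per_standard)

share, per_standard = blueprint_summary(blueprint)
meets_rigor = share > 0.70  # "over 70%" of the available points
meets_coverage = all(p >= 5 for p in per_standard.values())  # five points per standard
print(f"Level 2+ share: {share:.0%}; points per standard: {per_standard}")
print(f"Rigor check: {meets_rigor}; coverage check: {meets_coverage}")
```

Keeping the blueprint as plain rows means the same check can be rerun whenever items are added or replaced, for example when field-test items are introduced in a later year.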
Task ID | Descriptor | Rating | Evidence
2.1
Items/tasks and score keys are developed using standardized
procedures, including scoring rubrics for human-scored,
open-ended questions (e.g., short constructed response,
writing prompts, performance tasks, etc.).
1
See completed items and tasks,
following Template #2, Step 4; See
completed score key and scoring
rubrics following Template #2, Step
5.
2.2
Items/tasks are created and reviewed in terms of: (a)
alignment to the targeted content standards, (b) content
accuracy, (c) developmental appropriateness, (d) cognitive
demand, and (e) bias, sensitivity, and fairness.
1
100% of all items and tasks were
reviewed prior to the development of
the test form, Template #2, Step 6.
2.3
Administrative guidelines are developed that contain the
step-by-step procedures used to administer the performance
measure in a consistent manner, including scripts to orally
communicate directions to students, day and time
constraints, and allowable accommodations/adaptations.
1
See completed Template #2, Step 6;
Administrative guidelines are
embedded within the test document.
Directions to test-takers are provided
for each item type.
2.4
Scoring guidelines are developed for human-scored
items/tasks to promote score consistency across items/tasks
and among different scorers. These guidelines articulate
point values for each item/task used to combine results into
an overall score.
1
See completed Template #2, Step 5.
Scoring rubrics articulate the
performance continuum in content-
based terms. No behavioral
dimensions are included. Overall
point values are listed in the test-taker
directions.
2.5
Summary scores are reported using both raw score points
and performance level. Performance levels reflect the
range of scores possible on the assessment and use terms or
symbols to denote each level.
.5
Performance levels are only
articulated within the scoring rubrics.
Students must attain 70% of the
points available to receive a passing
grade.
2.6
The total time to administer the performance measure is
developmentally appropriate for the test-taker. Generally, this
is 30 minutes or less for young students and up to 60
minutes per session for older students (high school).
1
See completed Template #2, Step 6;
The “test window” is divided into
multiple class periods of no more than
50 minutes each.
Strand 2 Summary: 5.5 out of 6
Additional Comments/Notes
During the refinement [Step 10], the department-level committee will evaluate the established performance standard of 70% to
determine the appropriateness of the performance level. An “advanced” level will also be explored at that time. These
improvements will address the shortcomings in Task 2.5.
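Task 2.5 asks for summary scores reported as both raw points and a performance level. A minimal sketch of such a report follows, assuming a simple percent-of-points rule. Only the 70% passing threshold comes from this handout; the level labels and the 90% "Advanced" cut are hypothetical placeholders, since that level has not yet been defined.

```python
# Hypothetical performance levels as (label, minimum percent of points).
# Only the 70% passing threshold comes from the handout; the labels and the
# 90% "Advanced" cut are invented placeholders.
LEVELS = [("Advanced", 90.0), ("Passing", 70.0), ("Not yet passing", 0.0)]

def report_score(raw_points, max_points, levels=LEVELS):
    """Return a summary containing both the raw score and a performance level."""
    if max_points <= 0 or not 0 <= raw_points <= max_points:
        raise ValueError("raw_points must lie between 0 and max_points")
    percent = 100.0 * raw_points / max_points
    # Levels are ordered from highest cut to lowest; take the first one met.
    label = next(name for name, cut in levels if percent >= cut)
    return {"raw": raw_points, "max": max_points,
            "percent": round(percent, 1), "level": label}

for raw in (19, 14, 13):
    print(report_score(raw, 20))
```

Storing the cuts in one ordered list makes it straightforward to revise the 70% standard or add a level during refinement without touching the reporting logic.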
STRAND 3: REVIEW
Task ID | Descriptor | Rating | Evidence
3.1
The performance measures are reviewed in terms of design fidelity:
Items/tasks are distributed based upon the design properties
found within the specification or blueprint documents;
Item/task and form statistics are used to examine levels of
difficulty, complexity, distractor quality, and other properties;
and,
Items/tasks and forms are rigorous and free of biased, sensitive,
or unfair characteristics.
.5
100% of the items and tasks
were reviewed for quality
and content consistency.
70% of all points are
obtained from items/tasks
requiring at least cognitive
demand Level 2.
3.2
The performance measures are reviewed in terms of editorial
soundness, while ensuring consistency and accuracy of all documents
(e.g., administration guide):
Identifies words, text, reading passages, and/or graphics that
require copyright permission or acknowledgements;
Applies Universal Design principles; and,
Ensures linguistic demands and readability are developmentally
appropriate.
1
The test form, including
administrative guidelines,
and the scoring rubrics for
the extended performance
tasks were reviewed for
editorial soundness.
Linguistic “load” for all
prompts was appropriate for
high school students.
3.3
The performance measure was reviewed in terms of alignment
characteristics:
Pattern consistency (within specifications and/or blueprints);
Targeted content standards match;
Cognitive demand; and,
Developmental appropriateness.
1
See completed Template #1,
Step 3. Also, the items/tasks
on the test form were
“mapped” back to the Test
Specification to ensure two-
way alignment (e.g.,
blueprint to test and test to
blueprint).
3.4
Cut scores are established for each performance level. Performance
level descriptors describe the achievement continuum using content-
based competencies for each assessed content area.
.5
Performance level descriptors
were developed for the EA
and EP rubrics only. A
performance standard of 70%
of the points was established
by the grade-level committee.
Additional Comments/Notes
During Step 9, Data Reviews, impact statistics from the first year of implementation will be examined following the
guidelines within Research in Action’s “Smartbook,” the Standards for Educational and Psychological Testing (AERA,
APA, NCME, 1999), and other assessment/psychometric literature. These data will inform the refinement [Step 10], if
needed, of the current performance standards and lead to the development of performance level descriptors for the overall
score.
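One common form of impact statistic is the share of students who would fall at or above each candidate cut score. The sketch below shows that calculation under stated assumptions: the student scores and the candidate cuts of 60% and 80% are invented for illustration, and only the 70% standard comes from this handout. It is not a description of the Smartbook's procedure.

```python
def impact_table(percent_scores, candidate_cuts):
    """Map each candidate cut score to the share of students at or above it."""
    if not percent_scores:
        raise ValueError("at least one score is required")
    n = len(percent_scores)
    return {cut: sum(s >= cut for s in percent_scores) / n
            for cut in candidate_cuts}

# Invented percent-of-points scores for ten students (illustration only).
scores = [45, 58, 62, 68, 70, 74, 79, 83, 88, 95]
impact = impact_table(scores, candidate_cuts=(60, 70, 80))
for cut, share in impact.items():
    print(f"cut {cut}%: {share:.0%} of students at or above")
```

Comparing the shares across candidate cuts gives a committee a concrete view of how many students a change to the performance standard would affect.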
Note: The indicators below are evaluated after students have taken the assessment
(i.e., post-administration).
Task ID | Descriptor | Rating | Evidence
3.5
As part of the assessment cycle, post administration analyses are
conducted to examine such aspects as items/tasks performance,
scale functioning, overall score distribution, rater drift, content
alignment, etc.
0
Will be completed after
administering the assessment.
3.6
The performance measure has score validity evidence that
demonstrates item responses were consistent with content
specifications. Data suggest the scores represent the intended
construct by using an adequate sample of items/tasks within the
targeted content standards. Other sources of validity evidence
such as the interrelationship of items/tasks and alignment
characteristics of the performance measure are collected.
0
Will be completed after
administering the assessment.
3.7
Reliability coefficients are reported for the performance
measure, which includes estimating internal consistency.
Standard errors are reported for summary scores. When
applicable, other reliability statistics such as classification
accuracy, rater reliabilities, and others are calculated and
reviewed.
0
Will be completed after
administering the assessment.
Strand 3 Summary: 3.0 out of 7
Additional Comments/Notes
During Step 9, item statistics and the overall distribution of scores, including those above the preliminary cut score of 70%,
will be reviewed. Rater agreement among the department-level graders will be computed. Other impact data will be
examined during the refinement [Step 10] following the guidelines in Research in Action’s “Smartbook,” the Standards
for Educational and Psychological Testing (AERA, APA, NCME, 1999), and other assessment/psychometric literature.
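The post-administration statistics named in Task 3.7 and in the note above can be sketched in a few lines. This is a minimal illustration, not the committee's procedure. It assumes Cronbach's alpha as the internal consistency estimate, the classical formula SEM = SD × sqrt(1 − reliability) for the standard error of measurement, and simple exact agreement as the rater statistic. All scores shown are invented.

```python
from statistics import pstdev, pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of per-student item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Population variances are used throughout; alpha depends only on the
    # ratio of variances, so the choice cancels out as long as it is consistent.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def standard_error_of_measurement(item_scores):
    """Classical SEM: SD of total scores times sqrt(1 - reliability)."""
    totals = [sum(row) for row in item_scores]
    return pstdev(totals) * (1 - cronbach_alpha(item_scores)) ** 0.5

def exact_agreement(rater_a, rater_b):
    """Share of responses on which two raters assigned the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rater lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Invented data: five students by four dichotomously scored items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
sem = standard_error_of_measurement(responses)
agreement = exact_agreement([3, 2, 4, 1], [3, 2, 3, 1])
print(f"alpha={alpha:.3f}, SEM={sem:.3f}, exact agreement={agreement:.0%}")
```

With real data, every student would need a score on every item, and the total-score variance would need to be nonzero for alpha to be defined. Exact agreement is the simplest rater statistic; chance-corrected indices are an alternative the committee may prefer.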