HO #3-Reviewing the Assessment-Scored Example-22JAN14

HANDOUT #3

STEPS 7-10 Performance Measure Rubric [SCORED EXAMPLE]

Task ID | Descriptor | Rating | Evidence

1.1 | The purpose of the performance measure is explicitly stated (who, what, why). | 1 | See completed Template #1, Step 1.

1.2 | The performance measure has targeted content standards representing a range of knowledge and skills students are expected to know and demonstrate. | 1 | See completed Template #1, Step 2.

1.3 | The performance measure’s design is appropriate for the intended audience and reflects challenging material needed to develop higher-order thinking skills. | 1 | See Enduring Understanding/Key Concept in Template #1, Step 2 and completed Test Blueprint, Step 3.

1.4 | Specification tables articulate the number of items/tasks, item/task types, passage readability, and other information about the performance measure -OR- Blueprints are used to align items/tasks to targeted content standards. | 1 | See completed Template #1, Step 3. Only targeted content standards within the identified Key Concept were used in creating the assessment.

1.5 | Items/tasks are rigorous (designed to measure a range of cognitive demands/higher-order thinking skills at developmentally appropriate levels) and of sufficient quantities to measure the depth and breadth of the targeted content standards. | 1 | Items/tasks at cognitive demand Level 2 or higher represent over 70% of the overall available points on the assessment. Assessment has at least five points per standard.

Strand 1 Summary: 5 out of 5

Additional Comments/Notes

All items/tasks (100%) were reviewed by the department-level committee. Refinements to the test blueprint in the subsequent year will include 10% field test/try-out items.
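The evidence cited for Task 1.5 rests on two blueprint checks: the share of points at cognitive demand Level 2 or higher, and the number of points per targeted standard. The sketch below shows one way those checks could be run. The blueprint rows, item labels, and standard names are hypothetical and are not taken from the completed Template #1; only the thresholds (over 70%, at least five points per standard) come from the scored example.

```python
from collections import defaultdict

# Hypothetical blueprint rows: (item id, content standard, cognitive demand level, points).
BLUEPRINT = [
    ("MC-1", "Standard A", 1, 1),
    ("MC-2", "Standard A", 2, 1),
    ("SCR-1", "Standard A", 2, 3),
    ("MC-3", "Standard B", 1, 2),
    ("SCR-2", "Standard B", 2, 3),
    ("PT-1", "Standard B", 3, 4),
]


def check_blueprint(rows, min_level=2, min_share=0.70, min_points_per_standard=5):
    """Check the two conditions cited as evidence for Task 1.5."""
    total_points = sum(points for _, _, _, points in rows)
    rigorous_points = sum(points for _, _, level, points in rows if level >= min_level)

    # Accumulate the points available for each targeted content standard.
    points_per_standard = defaultdict(int)
    for _, standard, _, points in rows:
        points_per_standard[standard] += points

    share = rigorous_points / total_points
    return {
        "rigorous_share": share,
        "meets_share": share > min_share,  # "over 70%" is read as strictly greater
        "points_per_standard": dict(points_per_standard),
        "meets_points": all(
            p >= min_points_per_standard for p in points_per_standard.values()
        ),
    }


result = check_blueprint(BLUEPRINT)
print(result)
```

A blueprint that fails either check would point to adding or re-weighting items before the test form is assembled.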


Task ID | Descriptor | Rating | Evidence


2.1 | Items/tasks and score keys are developed using standardized procedures, including scoring rubrics for human-scored, open-ended questions (e.g., short constructed response, writing prompts, performance tasks, etc.). | 1 | See completed items and tasks, following Template #2, Step 4. See completed score key and scoring rubrics following Template #2, Step 5.

2.2 | Items/tasks are created and reviewed in terms of: (a) alignment to the targeted content standards, (b) content accuracy, (c) developmental appropriateness, (d) cognitive demand, and (e) bias, sensitivity, and fairness. | 1 | All items and tasks (100%) were reviewed prior to the development of the test form (Template #2, Step 6).

2.3 | Administrative guidelines are developed that contain the step-by-step procedures used to administer the performance measure in a consistent manner, including scripts to orally communicate directions to students, day and time constraints, and allowable accommodations/adaptations. | 1 | Administrative guidelines are embedded within the test document. Directions to test-takers are provided for each item type.

2.4 | Scoring guidelines are developed for human-scored items/tasks to promote score consistency across items/tasks and among different scorers. These guidelines articulate point values for each item/task used to combine results into an overall score. | 1 | See completed Template #2, Step 5. Scoring rubrics articulate the performance continuum in content-based terms. No behavioral dimensions are included. Overall point values are listed in the test-taker directions.

2.5 | Summary scores are reported using both raw score points and performance level. Performance levels reflect the range of scores possible on the assessment and use terms or symbols to denote each level. | 0.5 | Performance levels are only articulated within the scoring rubrics. Students must attain 70% of the points available to receive a passing grade.

2.6 | The total time to administer the performance measure is developmentally appropriate for the test-taker. Generally, this is 30 minutes or less for young students and up to 60 minutes per session for older students (high school). | 1 | The “test window” is divided into multiple class periods of no more than 50 minutes each.

Strand 2 Summary: 5.5 out of 6


During the refinement [Step 10], the department-level committee will evaluate the established performance standard of 70% to determine the appropriateness of the performance level. An “advanced” level will also be explored at that time. These improvements will address the shortcomings in Task 2.5.
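Task 2.5 asks for summary scores reported as both raw points and a performance level. A minimal sketch of such a report follows, assuming the 70% passing standard from the scored example. The level names, the 50-point total, and the 90% "Advanced" threshold are illustrative assumptions only; the handout says an advanced level is still to be explored and does not set a cut for it.

```python
def report_score(raw_points, points_available, passing_percent=70, advanced_percent=None):
    """Return the raw score together with a performance level.

    Thresholds are compared with integer arithmetic so that a score of
    exactly 70% is classified without floating-point rounding issues.
    """
    if points_available <= 0:
        raise ValueError("points_available must be positive")
    if not 0 <= raw_points <= points_available:
        raise ValueError("raw_points must be between 0 and points_available")

    scaled = raw_points * 100  # compare against percent * points_available
    if advanced_percent is not None and scaled >= advanced_percent * points_available:
        level = "Advanced"  # hypothetical level, not yet established
    elif scaled >= passing_percent * points_available:
        level = "Passing"
    else:
        level = "Not passing"

    return {
        "raw_score": raw_points,
        "points_available": points_available,
        "percent": round(100 * raw_points / points_available, 1),
        "performance_level": level,
    }


# Hypothetical 50-point assessment.
print(report_score(35, 50))
print(report_score(46, 50, advanced_percent=90))
```

Reporting both fields on every score report would move the rating for Task 2.5 beyond levels that are "only articulated within the scoring rubrics."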


STRAND 3: REVIEW

Task ID | Descriptor | Rating | Evidence


3.1 | The performance measures are reviewed in terms of design fidelity: items/tasks are distributed based upon the design properties found within the specification or blueprint documents; item/task and form statistics are used to examine levels of difficulty, complexity, distractor quality, and other properties; and items/tasks and forms are rigorous and free of biased, sensitive, or unfair characteristics. | 0.5 | 100% of the items and tasks were reviewed for quality and content consistency. 70% of all points are obtained from items/tasks requiring at least cognitive demand Level 2.

3.2 | The performance measures are reviewed in terms of editorial soundness, while ensuring consistency and accuracy of all documents (e.g., administration guide): identifies words, text, reading passages, and/or graphics that require copyright permission or acknowledgements; applies Universal Design principles; and ensures linguistic demands and readability are developmentally appropriate. | 1 | The test form, including administrative guidelines, and the scoring rubrics for the extended performance tasks were reviewed for editorial soundness. Linguistic “load” for all prompts was appropriate for high school students.

3.3 | The performance measure was reviewed in terms of alignment characteristics: pattern consistency (within specifications and/or blueprints); targeted content standards match; cognitive demand; and developmental appropriateness. | 1 | See completed Template #1, Step 3. Also, the items/tasks on the test form were “mapped” back to the Test Specification to ensure two-way alignment (i.e., blueprint to test and test to blueprint).

3.4 | Cut scores are established for each performance level. Performance level descriptors describe the achievement continuum using content-based competencies for each assessed content area. | 0.5 | Performance level descriptors were developed for the EA and EP rubrics only. A performance standard of 70% of the points was established by the grade-level committee.


During Step 9, Data Reviews, impact statistics from the first year of implementation will be examined following the guidelines within Research in Action’s “Smartbook”, the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999), and other assessment/psychometric literature. These data will inform the refinement [Step 10], if needed, of the current performance standards and lead to the development of performance level descriptors for the overall score.
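One impact statistic commonly examined when a performance standard is reviewed is the percentage of students at or above the cut score, compared across candidate cuts. The sketch below illustrates that calculation. The ten raw scores, the 50-point total, and the alternative cuts of 60% and 80% are hypothetical; only the 70% standard comes from the scored example, and the handout does not specify which impact statistics will be used.

```python
def impact_table(raw_scores, points_available, cut_percents=(60, 70, 80)):
    """Percent of students at or above each candidate cut score."""
    if not raw_scores:
        raise ValueError("raw_scores must not be empty")

    table = {}
    for cut in cut_percents:
        # Integer comparison: score/points_available >= cut/100.
        at_or_above = sum(
            1 for score in raw_scores if score * 100 >= cut * points_available
        )
        table[cut] = round(100 * at_or_above / len(raw_scores), 1)
    return table


# Hypothetical first-year raw scores on a 50-point assessment.
scores = [22, 28, 30, 31, 35, 36, 38, 41, 44, 47]
print(impact_table(scores, points_available=50))
```

Seeing how the passing rate shifts as the cut moves gives the committee a concrete basis for judging whether 70% is an appropriate standard.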


Note: The indicators below are evaluated after students have taken the assessment (i.e., post-administration).

Task ID | Descriptor | Rating | Evidence


3.5 | As part of the assessment cycle, post-administration analyses are conducted to examine such aspects as item/task performance, scale functioning, overall score distribution, rater drift, content alignment, etc. | 0 | Will be completed after administering the assessment.

3.6 | The performance measure has score validity evidence that demonstrates item responses were consistent with content specifications. Data suggest the scores represent the intended construct by using an adequate sample of items/tasks within the targeted content standards. Other sources of validity evidence such as the interrelationship of items/tasks and alignment characteristics of the performance measure are collected. | 0 |



3.7 | Reliability coefficients, including estimates of internal consistency, are reported for the performance measure. Standard errors are reported for summary scores. When applicable, other reliability statistics such as classification accuracy, rater reliabilities, and others are calculated and reviewed. | 0 |



Strand 3 Summary: 3.0 out of 7
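Task 3.7 calls for an internal consistency estimate and a standard error for summary scores. The sketch below uses Cronbach's alpha and the classical standard error of measurement (total-score standard deviation times the square root of one minus alpha). These are common choices and are assumed here; the handout does not name a specific coefficient. The score matrix is hypothetical, with one row per student and one column per item/task.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items matrix of item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")

    # Sample variance of each item (column) and of the total scores.
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(scores):
    """Classical SEM: SD of total scores * sqrt(1 - alpha)."""
    total_sd = sqrt(variance([sum(row) for row in scores]))
    # Clamp at zero to guard against tiny negative values from rounding.
    return total_sd * sqrt(max(0.0, 1 - cronbach_alpha(scores)))


# Hypothetical item scores: five students, four items.
item_scores = [
    [1, 2, 3, 4],
    [1, 1, 2, 3],
    [0, 1, 2, 2],
    [1, 2, 4, 4],
    [0, 0, 1, 2],
]
print(round(cronbach_alpha(item_scores), 3))
print(round(standard_error_of_measurement(item_scores), 3))
```

Both functions need complete data for every student, so missing item scores would have to be handled before the matrix is built.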


During Step 9, item statistics and the overall distribution of scores, including those above the preliminary cut score of 70%, will be reviewed. Rater agreement among the department-level graders will be computed. Other impact data will be examined during the refinement [Step 10] following the guidelines in Research in Action’s “Smartbook”, the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999), and other assessment/psychometric literature.
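Rater agreement for two graders scoring the same set of responses can be summarized as exact agreement and, to adjust for chance, Cohen's kappa. The sketch below shows both. The choice of kappa is an assumption, since the handout does not say which agreement statistic the department will compute, and the rubric scores shown are hypothetical.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Proportion of responses given the same score by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of responses")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)

    # Chance agreement from each rater's marginal score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters gave every response the same single score
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (0-2) from two graders on eight responses.
grader_1 = [0, 1, 2, 2, 1, 0, 2, 1]
grader_2 = [0, 1, 2, 1, 1, 0, 2, 2]
print(exact_agreement(grader_1, grader_2))
print(round(cohens_kappa(grader_1, grader_2), 3))
```

With more than two graders, the same functions could be applied to each pair and the results averaged, although other multi-rater statistics exist.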
