Post on 22-Dec-2015
The Assessment Review Process
September 30, 2014
Introductions
• Sue Brookhart, Ph.D.
• Juliette Lyons-Thomas, Ph.D. (Fellow, Regents Research Fund)
Webinar Norms
• All phones will be placed on mute
• If you have a question, you can type it into the chat box, and your question will be addressed during a break. The chat box icon is located at the top right-hand corner of your screen (remember to direct your chat to “Everyone”)
• At the end of the webinar, you will be asked to fill out a survey based on your experience today
Learning Outcomes
• The purpose of this webinar is to help attendees better understand the assessment review process, including:
 - Validity
 - Reliability
 - Instructional decisions
 - Assessment purposes
Webinar Agenda
• Reviewing existing assessments
 - Poll/question period
• Evaluating rigor and learning standards
 - Poll/question period
• Comparability, assessments that inform instruction, and purpose
 - Poll/question period
Why do we need to review existing assessments?
A Sad Tale
Every Friday Story Test:
• 15 points – vocabulary
• 5 points – comprehension
• 20 points total
Example – A Test Blueprint for Friday Story Test
Learning Objective                                   Remember  Understand  Analyze/Create  Total
Know new vocabulary words                               5                                   5 (17%)
Use new vocabulary words in sentences                              5                        5 (17%)
Understand the main points in the story                           10                       10 (33%)
Connect elements from the story (character, plot,
or setting) with own life or other texts                                        10         10 (33%)
Total                                                5 (17%)   15 (50%)     10 (33%)      30 (100%)
Results of Assessment Review Should Inform Subsequent Action Plans
• Keep an assessment
• Revise an assessment
• Replace an assessment
What are validity and reliability, and why are they necessary?
Assessment Quality
VALIDITY
• Soundness of the interpretations and uses of assessment results
• Does the assessment measure what it is intended to measure?
• Reliability is one aspect of validity

RELIABILITY
• Degree to which results are consistent across replications:
 - Occasion (time)
 - Test form
 - Rater/grader
• Measurement error is the degree to which results are inconsistent
Validity and Reliability: A Simple Example
Why are Validity and Reliability necessary?
• Sound, meaningful, accurate information for decisions
District/Consortium Assessment Review Form
• Name of Assessment
• Rigorous
• Comparable
• Informs instruction
• Supports learning goals
• Measures learning standards
• Reliability and Validity
• Utilizes a diverse set of assessment techniques (i.e., performance-based tasks)
• Recommendation (keep the assessment, eliminate the assessment, modify existing assessments, and/or identify or create high-quality assessments that may be used for APPR and/or other formative/instructional purposes)
How do the assessment review criteria represent validity (and reliability) for local assessments?
Question: Which aspects of the assessment review process have you found most difficult to document for the assessments you have reviewed?

In the chat box, please share your thoughts.
What is “rigor” and how can we know (and document) that an assessment is rigorous?
Measuring Learning Standards: Validity
• Does the assessment REALLY indicate intended learning outcomes?
• Do the content and thinking skills match the standard?
• Two recommended methods for answering this question and documenting it:
 - Test blueprint
 - Panel review of items or tasks
Test Blueprint for a Middle School Science Unit
Learning Objective                                    Remember  Understand  Apply   Total
Identify basic parts of cell                             12                   4     16 (40%)
Distinguish between plant & animal cells                             4               4 (10%)
Describe diffusion and the function of cell membrane      6                   2      8 (20%)
Explain the process of cell division                      4          4        4     12 (30%)
Total                                                 22 (55%)    8 (20%)  10 (25%) 40 (100%)
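A blueprint's internal arithmetic can be checked mechanically before it is used in a review. This sketch encodes the science-unit table above (note: the split of each row's points across cognitive levels is inferred from the stated row and column totals, so treat the placement as illustrative) and recomputes every total and percentage:

```python
# Verify that test-blueprint cell values add up to the stated row totals
# and that each row's share of the total points matches its percentage.
blueprint = {
    # objective: points per cognitive level (Remember, Understand, Apply)
    "Identify basic parts of cell": (12, 0, 4),
    "Distinguish between plant & animal cells": (0, 4, 0),
    "Describe diffusion / cell membrane": (6, 0, 2),
    "Explain the process of cell division": (4, 4, 4),
}

grand_total = sum(sum(points) for points in blueprint.values())
print(f"Total points: {grand_total}")  # 40

# Row totals and percentages (emphasis on each learning objective)
for objective, points in blueprint.items():
    row_total = sum(points)
    print(f"{objective}: {row_total} ({row_total / grand_total:.0%})")

# Column totals and percentages (emphasis on each thinking level)
for level, col in zip(("Remember", "Understand", "Apply"),
                      zip(*blueprint.values())):
    print(f"{level}: {sum(col)} ({sum(col) / grand_total:.0%})")
```

The same few lines adapt to any blueprint: if a recomputed percentage disagrees with the one printed on the blueprint, the table needs fixing before it can document validity.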
Use Blueprint-style Thinking to Review Rubrics
Scores run from 5 (highest) to 0 (lowest).

Student Info Included (Name, Date, Period):
 All items included / One item missing / Two items missing / No info provided

General Info Included:
 5 = All 8 items included
 4 = One item missing or inaccurate
 3 = Two items missing or inaccurate
 2 = Three items missing or inaccurate
 1 = More than 3 items missing or inaccurate
 0 = No info provided

Eruption Information:
 5 = All 6 items included
 4 = At least 4 items included and accurate
 3 = Half the information included or accurate
 2 = Only 2 items included and accurate
 1 = Minimal or no information included or accurate
 0 = No info provided

Volcano Diagram:
 5 = Clear, accurate diagram with all 15 parts shown
 4 = Diagram included; 11-14 parts clear and accurately shown
 3 = Diagram included; 6-10 parts clear and accurately shown
 2 = Diagram included; 3-5 labeled parts
 1 = Diagram has fewer than 3 parts labeled
 0 = No info provided

Overall Presentation:
 5 = Clear, neat, organized; layout well planned
 4 = Layout is planned and organized; writing is not neat
 3 = Info could be better organized; writing is sloppy
 2 = Not organized; not all info fits properly; some attempt to make it work; writing and lines are hastily done
 1 = Very disorganized and poorly prepared; lines not straight; spacing is sloppy; writing is hastily done; may have been done in homeroom
 0 = No suggestions followed for organization and neatness

Use of Creativity:
 5 = Various materials are used for effect; attention to detail obvious; good use of color (all 3/3)
 4 = Some use of materials, attention to detail, and/or use of color (some 3/3)
 3 = Moderate use of varied materials, attention to detail, and/or color (minimal 3/3)
 2 = Minimal use of either varied materials, attention to detail, or color (some 1-2/3)
 1 = No use of either varied materials, attention to detail, or color (only 1-2/3)
 0 = No creativity (0/3)
Only half of this score is about understanding volcanoes, and all of those points have to do with counting facts (requiring copying, not even recall)!
Panel Review of Items or Tasks
• Panel of people different from those who wrote the test or performance assessment
• Document their qualifications
• Go through the test item by item (or the performance assessment task and rubrics), evaluating the match of both content and thinking skills to the standard. Tally the matches and report the results.
Other Ways to Provide Validity Evidence
• Panel uses a protocol; document participants and results
• Correlate scores on the test or performance assessment with other known measures of the same standard
• Compare scores on the test or performance assessment from students with and without instruction on the standard
Reliability
• For tests with right/wrong scoring, report internal consistency (KR-20, KR-21, alpha, often available from scanning/scoring programs)
• For performance assessments or essay tests scored with teacher judgment (rubrics or other multi-point scoring), document that 2 or more raters give similar scores (use percent agreement or kappa, expectancy tables)
• For comparing forms, report the consistency (correlation) of scores for a sample of students taking both forms
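When a scanning/scoring program is not available, KR-20 can be computed directly from an item-score matrix. This is a minimal sketch in plain Python with made-up data (five students, five right/wrong items); real reviews would use the actual score file:

```python
# KR-20 internal-consistency estimate for a right/wrong item-score matrix.
# Rows = students, columns = items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
]

n_items = len(scores[0])
n_students = len(scores)

# Proportion answering each item correctly (p); q = 1 - p
p = [sum(row[i] for row in scores) / n_students for i in range(n_items)]
pq_sum = sum(pi * (1 - pi) for pi in p)

# Variance of the students' total scores (population variance, as in KR-20)
totals = [sum(row) for row in scores]
mean_total = sum(totals) / n_students
var_total = sum((t - mean_total) ** 2 for t in totals) / n_students

kr20 = (n_items / (n_items - 1)) * (1 - pq_sum / var_total)
print(f"KR-20 = {kr20:.2f}")  # KR-20 = 0.65
```

For items scored on multi-point scales, the same formula with item variances in place of p*q gives Cronbach's alpha.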
Expectancy Table for Two Raters Using a Four-point Rubric
                   Rater 1
Rater 2      1     2     3     4   Total
   4         0     0     2     8    10
   3         0     1     5     3     9
   2         1     7     1     0     9
   1         4     1     0     0     5
Total        5     9     8    11    33

The diagonal cells, where both raters gave the same score, are the exact agreements.

Agreement = (4 + 7 + 5 + 8)/33 = 24/33 = 73%
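Both statistics named earlier (percent agreement and kappa) can be computed from an expectancy table. The sketch below uses the cell counts from the two-rater table above (rows reordered so score 1 comes first) to get the exact-agreement rate and Cohen's kappa, which corrects that rate for the agreement two raters would reach by chance:

```python
# Exact-agreement rate and Cohen's kappa for a two-rater expectancy table.
# Rows = Rater 2's score (1-4), columns = Rater 1's score (1-4).
table = [
    [4, 1, 0, 0],  # Rater 2 gave 1
    [1, 7, 1, 0],  # Rater 2 gave 2
    [0, 1, 5, 3],  # Rater 2 gave 3
    [0, 0, 2, 8],  # Rater 2 gave 4
]

n = sum(sum(row) for row in table)                 # 33 scored papers
observed = sum(table[i][i] for i in range(4)) / n  # diagonal = exact matches

# Chance agreement expected from the marginal totals
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"Agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

Kappa is worth reporting alongside percent agreement because a rubric whose scores cluster at one level can show high raw agreement purely by chance.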
Question: Thinking about validity and reliability of assessments often leads educators to think about other aspects of schooling. Which other topics crossed your mind during the presentation about validity?

In the chat box, please share your thoughts.
Comparability
What is “comparability” and how can we know (and document) that an assessment supports comparable inferences across students, classrooms, and schools?
Comparability
• Scores are comparable if they mean the same thing from student to student, classroom to classroom, and school to school. Validity and reliability are the foundation of comparability.
• Comparability also means that the test or performance assessment is administered in the same way (directions, timing, access to materials, atmosphere) across classrooms and schools, in any regard that would make a difference to the scores.
How can we know (and document) that an assessment informs instruction?
Document Instructional Decisions
• Teacher team, PLC, or department meetings: Document who reviewed what data, and what instructional decisions were made (like the minutes from a meeting)
Resources for Making Instructional Decisions from Assessment Data
• Data-driven instruction www.engageny.org/ddi-library
• DDI Rubric www.engageny.org/resource/driven-by-data-data-driven-implementation-rubric/
Resources for Planning Instruction
• Student Learning Objectives www.engageny.org/resource/student-learning-objectives
• Tri-State/EQuIP Rubric to evaluate the quality of lessons and units intended to teach Common Core Standards www.engageny.org/resource/tri-state-quality-review-rubric-and-rating-process
Document Instructional Decisions
• Individual teacher(s) or informal groups of colleagues: Document who reviewed what data, and what instructional decisions were made (like a journal entry)
Less Direct
• Teachers doing the assessment review can survey or tally the number of colleagues who report taking the results of the assessment into account when making instructional decisions, and describe those reported decisions
How can we know (and document) that an assessment supports the learning of all students?
Document the Effects of Instructional Decisions on Learning
• Pre-test/post-test studies: Use the assessment as a pretest, base a unit of instruction on the pretest results, and document improvement from pre- to post-test
• Be careful about:
 - Test form
 - Fidelity of instruction to standard (e.g., Tri-State/EQuIP Rubric)

Pretest → Instruction on Standard Based on Pretest → Posttest
Document the Effects of Instructional Decisions on Learning
• Comparison group studies: Compare achievement for a group of students for whom assessment results were used to inform instruction with a “business as usual” group
• Be careful about:
 - Comparability of students
 - Fidelity of instruction to standard (e.g., Tri-State/EQuIP Rubric)

Group 1: Pretest → Instruction on Standard Based on Pretest → Posttest
Group 2: Pretest → Instruction on Standard As Usual → Posttest
Assessment Purpose
How can we decide whether a test or performance assessment is more appropriate for a particular assessment purpose?
Tests
• Can cover a larger amount of material (better sampling of domain, goes to validity)
• Can sample a larger amount of student performance (goes to reliability)
• Best for recall, comprehension, application, and some analysis
Performance Assessments
• Require students to create a product, demonstrate a process, or both
• Are evaluated with observation and judgment, using criteria
• Best for learning outcomes requiring students to use their learning to create something or do/perform something
Example – CCSS Math 5.NBT.7
Add, subtract, multiply, and divide decimals to hundredths, using concrete models or drawings and strategies based on place value, properties of operations, and/or the relationship between addition and subtraction; relate the strategy to a written method and explain the reasoning used.
Which part(s) of this standard could be assessed with a test and which part(s) would better be assessed with a performance assessment?
Example – CCSS Rdg Info Text 7.7
Compare and contrast a text to an audio, video, or multimedia version of the text, analyzing each medium’s portrayal of the subject (e.g., how the delivery of a speech affects the impact of the words).
Which part(s) of this standard could be assessed with a test and which part(s) would better be assessed with a performance assessment?
Question: Educators often find the process of documenting assessment review results to be difficult at first, but ultimately they report learning a lot about assessment. Has your participation in assessment reviews so far increased your own assessment literacy?

In the chat box, please share your thoughts.
Thank you
• The slides and a video of this webinar will be posted at www.engageny.org/video-library
• Next webinar: Performance Assessment 3:30pm-5:00pm on November 5th, 2014
• Feedback: https://www.surveymonkey.com/s/nysedassessmentreview