MSP Evaluation Rubric and Working Definitions
Xiaodong Zhang, PhD, Westat
Annual State Coordinators Meeting
Washington, DC, June 10-12, 2008
Relevant GPRA Measure
• GPRA measure—the percentage of MSP projects that use an experimental or quasi-experimental design for their evaluations that are conducted successfully and that yield scientifically valid results
• Westat’s Data Quality Initiative (DQI) developed a rubric to determine whether a grantee’s evaluation meets the GPRA measure
• The rubric is applied to each grantee’s final evaluation report
Evaluation Rubric
• The criteria on the rubric are the minimum criteria that must be met for an evaluation to be considered successfully conducted and to yield valid data
• An evaluation must meet every criterion in order to meet the GPRA measure
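The every-criterion requirement is a simple conjunction: one failed criterion fails the whole evaluation. A minimal sketch of that logic, using illustrative criterion names drawn from the rubric components (not Westat's actual field names):

```python
# Sketch of the rubric's pass/fail logic: an evaluation meets the
# GPRA measure only if it satisfies EVERY criterion (a conjunction).
# Criterion names are illustrative, not Westat's actual field names.

def meets_gpra_measure(criteria: dict[str, bool]) -> bool:
    """Return True only if every rubric criterion is met."""
    return all(criteria.values())

report = {
    "adequate_sample_size": True,
    "quality_instruments": True,
    "quality_data_collection": True,
    "acceptable_data_reduction": True,
    "relevant_statistics_reported": False,  # one failure ...
}
meets_gpra_measure(report)  # ... fails the whole evaluation
```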
Evaluation Components Covered in Rubric
For Experimental Designs:
• Sample size
• Quality of measurement instruments
• Quality of data collection methods
• Data reduction rates
• Relevant statistics reported
For Quasi-Experimental Designs:
All of the above, plus…
• Baseline equivalence of groups
Working Definitions
• DQI developed working definitions to help implement the rubric criteria
• Report eligibility: final evaluation report that contains post-test results on key outcomes
• Multicomponent evaluations: each component will be coded separately
– Teacher content knowledge
– Teacher instructional practice
– Student achievement
Working Definitions (Continued)
• Baseline equivalence
– Pretest on key outcomes is most relevant
– Other related variables are acceptable (e.g., student SES for student outcomes; education and experience for teacher outcomes)
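Baseline equivalence of the treatment and comparison groups is commonly checked by comparing pretest means via a standardized mean difference. A minimal sketch, assuming pretest scores are available per group; the 0.25 cutoff shown is a common convention, not a threshold stated in the slides:

```python
import math

def standardized_mean_difference(treatment, comparison):
    """Cohen's d-style effect size between two groups' pretest scores."""
    n1, n2 = len(treatment), len(comparison)
    m1 = sum(treatment) / n1
    m2 = sum(comparison) / n2
    v1 = sum((x - m1) ** 2 for x in treatment) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in comparison) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

treat = [72, 75, 71, 78, 74, 69]   # hypothetical pretest scores
comp = [73, 74, 70, 77, 75, 70]
d = standardized_mean_difference(treat, comp)
# Groups are often judged baseline-equivalent when |d| is small
# (e.g., below 0.25 -- a common convention, not stated in the slides).
abs(d) < 0.25
```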
Working Definitions (Continued)
• Minimum sample sizes: based on the final sample
– Balanced designs:
• Teacher outcomes – School/district level: N=12; Teacher level: N=60
• Student outcomes – School/district level: N=12; Classroom level: N=18; Student level: N=130
– Unbalanced designs: the smaller group must be at least the minimum sample size divided by 2
• Sample size recommendations are based on power analysis
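The minimums above can be encoded as a lookup keyed by outcome type and unit of analysis. This is a hypothetical sketch, and it interprets the unbalanced-design rule as the smaller group needing at least half the listed minimum:

```python
# Minimum final sample sizes from the slide, keyed by outcome type
# and unit of analysis. The unbalanced-design rule is interpreted here
# as: the smaller group must be at least the listed minimum divided by 2.
MINIMUMS = {
    ("teacher", "school_district"): 12,
    ("teacher", "teacher"): 60,
    ("student", "school_district"): 12,
    ("student", "classroom"): 18,
    ("student", "student"): 130,
}

def meets_minimum(outcome, unit, group_a, group_b):
    """Check the final analytic sample against the rubric minimums."""
    minimum = MINIMUMS[(outcome, unit)]
    if group_a == group_b:                        # balanced design
        return group_a + group_b >= minimum
    return min(group_a, group_b) >= minimum / 2   # unbalanced design

meets_minimum("teacher", "teacher", 30, 30)  # True: 60 total meets N=60
meets_minimum("student", "student", 50, 90)  # False: smaller group < 65
```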
Working Definitions (Continued)
• Quality of instruments
– Existing state accountability assessment or widely used assessment (e.g., Iowa Test)
– Selected items from a validated assessment: must include a minimum of 10 items, 70% of which must come from a validated and reliable instrument(s)
– Grantee-developed assessment: must demonstrate reliability and validity
– All instruments must have face validity
• Data reduction
– Flexibility is allowed if the study population is highly mobile or if potential differences were addressed in the analysis
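The selected-items criterion (at least 10 items, 70% drawn from a validated instrument) lends itself to a mechanical check. A minimal sketch; the representation of items as a list of source-instrument labels is illustrative:

```python
# Sketch of the "select items from a validated assessment" criterion:
# the assembled instrument needs at least 10 items, and at least 70%
# of them must come from a validated and reliable source instrument.
# Representing items by source-instrument label is illustrative.

def selected_items_acceptable(item_sources, validated_sources):
    """item_sources: source instrument of each item.
    validated_sources: sources known to be validated and reliable."""
    if len(item_sources) < 10:
        return False
    n_validated = sum(src in validated_sources for src in item_sources)
    return n_validated / len(item_sources) >= 0.70

items = ["ITBS"] * 8 + ["local"] * 4           # 12 items, 8 validated
selected_items_acceptable(items, {"ITBS"})     # False: 8/12 is under 70%
```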
Conclusions
• Written guidance is forthcoming
• Q&A at next breakout session: Analysis of Final Reports