Determining Validity and Reliability of Key Assessments


Transcript of Determining Validity and Reliability of Key Assessments

Page 1: Determining Validity and Reliability of Key Assessments

Susan Malone, Mercer University

Page 2: Determining Validity and Reliability of Key Assessments

Standard 2a excerpt: “The unit has taken effective steps to eliminate bias in assessments and is working to establish the fairness, accuracy, and consistency of its assessment procedures and unit operations.”

Page 3: Determining Validity and Reliability of Key Assessments

Address contextual distractions (inappropriate noise, poor lighting, discomfort, lack of proper equipment)

Address problems with assessment instruments (missing or vague instructions, poorly worded questions, poorly produced copies that make reading difficult)

Page 4: Determining Validity and Reliability of Key Assessments

[Review candidate performance to determine if candidates perform differentially with respect to specific demographic characteristics]

[ETS: Guidelines for Fairness Review of Assessments; Pearson: Fairness and Diversity in Tests]
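One way to operationalize this review is a simple comparison of score distributions across demographic groups. The sketch below is illustrative only, assuming candidate scores and demographic descriptors are stored in a CSV file; the file, column, and group names are hypothetical.

```python
import pandas as pd

# Hypothetical file: one row per candidate, with a key-assessment score and
# demographic descriptors. All file and column names are illustrative only.
df = pd.read_csv("candidate_scores.csv")

# Score distributions by group; large gaps flag possible differential
# performance and prompt a closer look at the instrument and its administration.
print(df.groupby("race_ethnicity")["portfolio_score"].agg(["count", "mean", "std"]).round(2))

# Rough effect-size check between two groups (standardized mean difference).
a = df.loc[df["race_ethnicity"] == "Group A", "portfolio_score"]
b = df.loc[df["race_ethnicity"] == "Group B", "portfolio_score"]
pooled_sd = ((a.var() + b.var()) / 2) ** 0.5
print(f"Standardized difference (A vs. B): {(a.mean() - b.mean()) / pooled_sd:.2f}")
```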

Page 5: Determining Validity and Reliability of Key Assessments

Ensure candidates are exposed to the knowledge, skills, and dispositions (K, S, & D) that are being evaluated

Ensure candidates know what is expected of them

Instructions and timing of assessments are clearly stated and shared with candidates

Candidates are given information on how the assessments are scored and how they count toward completion of programs

Page 6: Determining Validity and Reliability of Key Assessments

Assessments are of appropriate type and content to measure what they purport to measure

Aligned with the standards and/or learning proficiencies they are designed to measure [content validity]

Page 7: Determining Validity and Reliability of Key Assessments

Produce dependable results or results that would remain constant on repeated trials
◦ Provide training for raters that promotes similar scoring patterns
◦ Use multiple raters
◦ Conduct simple studies of inter-rater reliability
◦ Compare results to other internal or external assessments that measure comparable K, S, & D [concurrent validity]
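The last two sub-bullets lend themselves to quick quantitative checks. Below is a minimal sketch, not the unit's actual procedure, of a simple inter-rater agreement rate and a concurrent-validity correlation; all scores and names are hypothetical.

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical rubric scores (1-4 scale) for the same ten candidates,
# assigned independently by two trained raters.
rater_a = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3]
rater_b = [3, 4, 2, 2, 4, 3, 2, 4, 3, 4]

# Simple inter-rater reliability check: proportion of identical scores.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"Exact agreement: {agreement:.0%}")

# Concurrent validity check: correlate the key assessment with another
# internal or external measure of comparable knowledge, skills, and dispositions.
key_assessment = [3.0, 3.5, 2.0, 2.5, 4.0, 3.0, 2.0, 3.5, 3.0, 3.5]
other_measure = [3.2, 3.6, 2.1, 2.8, 3.9, 3.1, 2.3, 3.4, 2.9, 3.3]
print(f"Concurrent validity r = {correlation(key_assessment, other_measure):.2f}")
```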

Page 8: Determining Validity and Reliability of Key Assessments

Dispositions Assessment
Portfolios
Analysis of Student Learning
Summative Evaluation

Page 9: Determining Validity and Reliability of Key Assessments

Alignment with INTASC standards, program standards, and the Conceptual Framework (accuracy; content validity)
◦ Matrices
◦ LiveText standards mapping
◦ PRS Relationship to Standards section
◦ PRS alignment with program standards requirement in Evidence for Meeting Standards section

Alignment with other assessments (accuracy; concurrent validity)
◦ Matrices
◦ Potential documentation within LiveText Standards Correlation Builder

Page 10: Determining Validity and Reliability of Key Assessments

Rubrics/assessment expectations shared with candidates in courses; field experience orientations; and LiveText (fairness)

Rubrics/assessment expectations shared with cooperating teachers by university supervisors (consistency)

Statistical study (in process) examining correlations among candidate performances on multiple assessments (where those assessments address comparable K, S, & D) (consistency; concurrent validity)
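As a sketch of what such a correlation study might look like (the unit's actual analysis is still in process), the snippet below assumes one row per candidate and one column per key assessment; the file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per candidate, one column per key assessment.
scores = pd.read_csv("key_assessment_scores.csv")
cols = ["dispositions", "portfolio", "analysis_of_student_learning", "summative_evaluation"]

# Pairwise Pearson correlations among assessments that address comparable
# knowledge, skills, and dispositions; higher r supports concurrent validity.
print(scores[cols].corr(method="pearson").round(2))
```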

Page 11: Determining Validity and Reliability of Key Assessments

Multiple assessors (consistency)

Exploration of faculty's assumptions regarding the purpose of the assessment, expectations of behaviors, and meaning of the rating scale (consistency; reliability)

Revision of the rating scale, addition of more specific indicators, development of two versions (courses/field experiences) (accuracy; content validity)

Page 12: Determining Validity and Reliability of Key Assessments

Norming session with supervisors (consistency; reliability)

Revision of instructions to align more closely with rubric expectations and expected process (fairness; avoidance of bias)

Review of coursework and fieldwork to ensure candidates are prepared for assignment (fairness)

Page 13: Determining Validity and Reliability of Key Assessments

Seeking feedback from experts (P-12 partners) on whether assignment requirements and assessment criteria are authentic (accuracy; content validity)

Annual review of data disaggregated by demographic factors (gender, race/ethnicity, site, degree program) (fairness, avoidance of bias)
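A minimal sketch of how such a disaggregated review might be produced is below, assuming one row per candidate per year with the assessment score and demographic descriptors; the file, year, and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per candidate per year, with the assessment score
# and the demographic descriptors named above.
df = pd.read_csv("key_assessment_data.csv")
current = df[df["academic_year"] == "2023-24"]  # the year under review

# One summary table per demographic factor for the annual review.
for factor in ["gender", "race_ethnicity", "site", "degree_program"]:
    print(f"\n--- {factor} ---")
    print(current.groupby(factor)["assessment_score"].agg(["count", "mean"]).round(2))
```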

Page 14: Determining Validity and Reliability of Key Assessments

Recent revision of portfolios and rubrics to align with the new INTASC standards (accuracy; content validity)

Revision of rubrics to include more specific indicators related to the standards (change from generic rating descriptors) (accuracy; content validity)

Cross-college workshop on artifact selections (accuracy; content validity)

Page 15: Determining Validity and Reliability of Key Assessments

Annual review of artifact selections (accuracy; content validity)

Inter-rater reliability study (consistency; reliability)
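One common statistic for such a study is Cohen's kappa, which corrects raw agreement for chance. The sketch below is illustrative only; the ratings are hypothetical and the unit's study may well use a different index.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal distribution of levels.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[k] / n) * (c2[k] / n) for k in set(rater1) | set(rater2))
    return (observed - expected) / (1 - expected)

# Hypothetical portfolio ratings (levels 1-3) assigned to the same ten
# artifacts by two independent scorers.
print(round(cohens_kappa([2, 3, 3, 1, 2, 3, 2, 2, 3, 1],
                         [2, 3, 2, 1, 2, 3, 2, 3, 3, 1]), 2))
```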

Page 16: Determining Validity and Reliability of Key Assessments

Review of rubric expectations and all other PR and ST assignments to ensure opportunities to demonstrate all standards and indicators during experience (fairness)

Workshops for supervisors on the rubric expectations (consistency)

Feedback from cooperating teachers on relevance of the assessment (accuracy; content validity)

Annual review of data disaggregated by demographic variables (fairness; avoidance of bias)

Page 17: Determining Validity and Reliability of Key Assessments

Statistical study to identify correlations between entry requirements and successful program completion

Statistical study to determine if key assessments and entry criteria are predictive of program success (as defined by success in student teaching)
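A sketch of one way such a predictive study could be run is below, assuming admissions data and a binary completion outcome derived from the student-teaching summative evaluation; the file, predictor names, and outcome column are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one row per candidate; 'completed_successfully' is 1/0
# based on the student-teaching summative evaluation. All names are illustrative.
df = pd.read_csv("admissions_and_outcomes.csv")
predictors = ["entry_gpa", "admission_interview_score", "dispositions_score"]

model = LogisticRegression()
model.fit(df[predictors], df["completed_successfully"])

# Coefficient signs and sizes suggest which entry criteria and key assessments
# are associated with success in student teaching.
for name, coef in zip(predictors, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```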