Post on 18-Mar-2016
State Exemplar: Maryland’s Alternate Assessment
Using Alternate Achievement Standards
The Alternate Maryland School Assessment
Presenters
Sharon Hall, U.S. Department of Education
Martin Kehe, Maryland State Department of Education
William Schafer, University of Maryland
Session Summary
This session highlights the Alternate Assessment based on Alternate Academic Achievement Standards in Maryland – The Alternate Maryland School Assessment (Alt-MSA).
Discussion will focus on:
A description of the assessment and the systems-change process which was required to develop and implement the testing program
Development of reading, mathematics and science item banks
The process to ensure alignment with grade-level content standards and results of independent alignment studies
Technical documentation and research agenda to support validity and reliability
Agenda
Developing Maryland’s AA-AAAS: A Systems Change Perspective
Conceptual Framework
Alt-MSA Design
Developing the Mastery Objective Banks
Evaluation of the Alt-MSA’s alignment with content standards
Technical Documentation and Establishing a Research Agenda to Support Validity and Reliability
Questions and Answers
A Systems Change Perspective
Process
Collaboration
  Divisions of Special Education and Assessment
  Stakeholder Advisory
  Alt-MSA Facilitators
  Alt-MSA Facilitators and LACs
  MSDE and Vendor
Instruction and Assessment
Students assigned to age-appropriate grade (for purposes of Alt-MSA)
Local School System Grants
A Systems Change Perspective
Content
Reading and Mathematics mastery objectives and artifacts (evidence) linked with grade level content standards
No program evaluation criteria
Maryland’s Alternate Assessment Design (Alt-MSA)
Portfolio Assessment
10 Reading and 10 Mathematics Mastery Objectives (MOs)
Evidence of Baseline (50% or less attained)
Evidence of Mastery (80% - 100%): 1 artifact for each MO
2 Reading and 3 Mathematics MOs aligned with science
Vocabulary and informational text; measurement and data analysis
What’s Assessed: Reading
Maryland Reading Content Standards
1.0 General Reading Processes
  Phonemic awareness, phonics, fluency (2 MOs)
  Vocabulary (2 MOs; 1 aligned with science)
  General reading comprehension (2 MOs)
2.0 Comprehension of Informational Text (2 MOs; 1 aligned with science)
3.0 Comprehension of Literary Text (2 MOs)
What’s Assessed: Mathematics
Algebra, Patterns, and Functions (2 MOs)
Geometry (2 MOs)
Measurement (2 MOs; 1 aligned with science)
Statistics-Data Analysis (2 MOs aligned with science)
Number Relationships and Computation (2 MOs)
What’s Assessed: Science (2008)
Grades 5, 8, 10
Grades 5 and 8; select 1 MO each
  Earth/Space Science
  Life Science
  Chemistry
  Physics
  Environmental Science
Grade 10
  5 Life Science MOs
Steps in the Alt-MSA Process
Step 1: September
Principal meets with Test Examiner Teams
Review results or conduct pre-assessment
Steps in the Alt-MSA Process
Step 2: September-November
TET selects or writes Mastery Objectives
Principal reviews and submits
Share with parents
Revise (written) Mastery Objectives
Steps in the Alt-MSA Process
Step 3: September-March
Collect Baseline Data for Mastery Objectives: 50% or less accuracy
Teach Mastery Objectives
Assess Mastery Objectives
Construct Portfolio
Standardized
Number of mastery objectives assessed
Format of mastery objectives
Content standards/topics assessed
All MOs must have baseline data and evidence of mastery at 80%-100%
Types of artifacts permissible
Components of artifacts
Training and Handbook provided
Scoring training and procedures
MO Format
Evidence (Artifacts)
Acceptable Artifacts (Primary Evidence)
  Videotapes: 1 reading and 1 math mandatory
  Audiotape
  Student work (original)
  Data collection charts (original)
Unacceptable Artifacts
  Photographs, checklists, narrative descriptions
Artifact Requirements
Aligned with Mastery Objective
Must include baseline data that demonstrates student performs MO with 50% or less accuracy
Data chart must show 3-5 demonstrations of instruction prior to mastery
The observable, measurable student response must be evident (not “trial 1”)
Mastery is 80%-100% accuracy
Name, date, accuracy score, prompts
Scores and Condition Codes
A  MO is not aligned
B  Artifact is missing or not acceptable
C  Artifact is incomplete
D  Artifact does not align with MO, or components of MO are missing
E  Data Chart does not show 3-5 observations of instruction on different days prior to demonstration of mastery
F  Accuracy score is not reported
Reliability: Scorer Training
Conducted by contractor scoring director, MSDE always present
Must attain 80% accuracy on each qualifying set
Every portfolio is scored twice by 2 different teams
Daily backreading by supervisors and scoring directors
Daily inter-rater reliability data
Twice weekly validity checks
Ongoing retraining
Maryland’s Alt-MSA Report
Development of the Mastery Objective Banks
Initial three years of program involved teachers writing individualized reading and mathematics Mastery Objectives (approximately 100,000 objectives each year)
Necessary process to help staff learn the content standards
Maryland and contractor staff reviewed 100% of MOs for alignment and technical quality
Mastery Objective Banks
Prior to year 4, Maryland conducted an analysis of written MOs to create the MO Banks for reading and mathematics
Banked items available in an online application, linked to and aligned with content standards
Provided additional degree of standardization
Process still allows for writing of customized MOs, as needed
Mastery Objective Banks
In year 4, Baseline MO measurement was added
Teachers take stock of where a student is, without prompts, at the beginning of the year on each proposed MO
This helps to ensure that students are learning and assessed on skills and knowledge that have not already been mastered
Year 5 added Science MO Bank
National Alternate Assessment Center (NAAC)
Alignment Study of the Alt-MSA
NAAC Alt-MSA Alignment Study
Conducted by staff from University of North Carolina at Charlotte and Western Carolina University from March – August, 2007
Study was an investigation of the alignment of Alt-MSA Mastery Objectives in Reading and Mathematics to grade-level content standards
NAAC Alt-MSA Alignment Study
Eight (8) criteria used to evaluate alignment
Developed through collaboration of content experts, special educators and measurement experts at University of North Carolina at Charlotte (Browder, Wakeman, Flowers, Rickleman, Pugalee, & Karvonen, 2006)
A stratified random sampling method (stratified on grade level) was used to select the portfolios from grades 3 – 8 and 10: 225 reading and 231 mathematics
Alignment Results by Criterion
Criterion 1: The content is academic and includes the major domains/strands of the content area as reflected in state and national standards (e.g., reading, math, science)
Outcome:
Reading: 99% of MOs were rated academic
Math: 94% of MOs were rated academic
Criterion 2: The content is referenced to the student’s assigned grade level (based on chronological age)
Outcome:
Reading: 82% of the MOs reviewed were referenced to a grade level standard (2% were not referenced to a grade level standard; 16% were referenced to off-grade (K-2) standards of phonics and phonemic awareness)
Math: 97% were referenced to a grade level standard
Criterion 3: The focus of achievement maintains fidelity with the content of the original grade level standards (content centrality) and, when possible, the specified performance
Outcome:
Reading: 99% MOs rated as far or near for content centrality, 92% MOs rated partial or full performance centrality, and 90% rated as being linked to the MO
Math: 92% MOs rated as far in content centrality, 92% MOs rated partial performance centrality, and 92% rated as being linked to the MO
Criterion 4: The content differs from grade level in range, balance, and Depth of Knowledge (DOK), but matches high expectations set for students with significant cognitive disabilities.
Outcome:
Reading: All the reading standards had multiple MOs that were linked to the standard, and although 73% were rated at the depth of knowledge level of memorize/recall, there were MOs rated at the highest depth of knowledge levels (i.e., comprehension, application, and analysis)
Math: MOs were aligned to all grade level standards and distributed across all levels of depth of knowledge except the lowest level (i.e., attention), with the largest percentage of MOs at the performance and analysis/synthesis/evaluation levels.
Criterion 5: There is some differentiation in achievement across grade levels or grade bands.
Outcome:
Reading: Overall the reading has good differentiation across grade levels
Math: While there is some limited differentiation, some items were redundant from lower to upper grades
Criterion 6: The expected achievement for students is for the students to show learning of grade referenced academic content
Outcome: The Alt-MSA score is not augmented with program factors. However, in cases where more intrusive prompting is used, the level of inference that can be made is limited.
Criterion 7: The potential barriers to demonstrating what students know and can do are minimized in the assessment
Outcome: Alt-MSA minimizes barriers for the broadest range of heterogeneity within the population, because flexibility is built into the tasks teachers select. (92% of the MOs were accessible at an abstract level of symbolic communication, while the remaining MOs were accessible to students at a concrete level of symbolic communication.)
Criterion 8: The instructional program promotes learning in the general curriculum
Outcome: The Alt-MSA Handbook is well developed and covers the grade level domains that are included in alternate assessment. Some LEAs in MD have exemplary professional development materials.
Study Summary
Overall the Alt-MSA demonstrated good access to the general curriculum
The Alt-MSA was well developed and covered the grade level standards
The quality of the professional development materials varied across the different counties
Technical Documentation of the Alt-MSA
Sources
Alt-MSA Technical Manuals (2004, 2005, 2006)
Schafer, W. D. (2005). Technical Documentation for Alternate Assessments. Practical Assessment, Research and Evaluation, 10(10). At PAREonline.net.
Marion, S. F. & Pellegrino, J. W. (2007). A validity framework for evaluating the technical adequacy of alternate assessments. Educational Measurement: Issues and Practice, 25(4), 47-57.
Report from the National Alternate Assessment Center from a panel review of the Alt-MSA.
Contracted technical studies on Alt-MSA
Validity of the Criterion Is Always Important
To judge proficiency in any assessment, a student’s score is compared with a criterion score
Regular assessment: standard setting generates a criterion score for all examinees
Regular assessment: the criterion score is assumed appropriate for everyone
  It defines an expectation for minimally acceptable performance
  It is interpreted in behavioral terms through achievement level descriptions
Criterion in Alternate Assessment
A primary question in alternate assessment is: Should the same criterion score apply to everyone?
Our answer was no, because behaviors that imply success for some students imply failure for others
This implies that flexible criteria are needed to judge the success of a student or of a teacher – unlike the regular assessment
Criterion Validity
The quality of criteria is documented for the regular assessment through a standard setting study
When criteria vary, then each different criterion needs to be documented
So we need to consider both score and criterion reliability & validity for Alt-MSA.
Technical Research Agenda
There are four sorts of technical research we should undertake:
  Reliability of Criteria
  Reliability of Scores
  Validity of Criteria
  Validity of Scores
We will describe some examples and possibilities for each.
Reliability of Criteria
Could see if the criteria (MOs) are internally consistent for a student in terms of difficulty, cognitive demand, and/or levels of the content elements they represent
Could do that for, say, 9 samples of students: low, medium, and high (L-M-H) degrees of challenge crossed with L-M-H grade levels
Degree of challenge might be assessed by age of identification of disability or by location in the extended standards of last year’s MOs
Reliability of Scores
2007 rescore of a 5% sample of 2006 portfolios (n=266) showed agreement rates of 82%-89% for reading & 83%-89% for math
A NAAC review concluded the inter-rater evidence of scorer reliability is strong
Amount of evidence could be evaluated using Smith’s (2003) approach of modeling error using the binomial distribution to get decision accuracy estimates:
Decision Accuracy Study
Assume each student produces a sample of size 10 from a binomial population of MOs
Can use the binomial distribution to generate the probabilities of all outcomes (X = 0 to 10) for any π
For convenience, use the midpoints of ten equally-spaced intervals for π (.05 … .95)
Using 0-50% of MOs mastered (X = 0-5) for Basic, 60-80% (X = 6-8) for Proficient, and 90-100% (X = 9-10) for Advanced yields:
Classification Probabilities for Students with Various πs

π      Basic    Proficient   Advanced
.95    .0001    .0861        .9138
.85    .0098    .4458        .5443
.75    .0781    .6779        .2440
.65    .2485    .6656        .0860
.55    .4956    .4812        .0232
.45    .7384    .2571        .0045
.35    .9052    .0944        .0005
.25    .9803    .0207        .0000
.15    .9986    .0013        .0000
.05    1.000    .0000        .0000
3x3 Decision Accuracy
Collapsing across π with True Basic = .05-.55, True Proficient = .65-.85, True Advanced = .95:

              Classification
True Level    Basic    Proficient   Advanced   Total
Advanced      .0000    .0086        .0914      .1000
Proficient    .0336    .1789        .0874      .3000
Basic         .5118    .0855        .0028      .6000

P(Accurate) = .5118 + .1789 + .0914 = .7821
This assumes equally-weighted πs
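The equally weighted calculation above can be reproduced with a short script. This is a minimal sketch, not the presenters' own code: the function names (`level_probs`, `true_level`, `decision_table`) are hypothetical, and the cut points assume that Basic means 0-5 of 10 MOs mastered, Proficient 6-8, and Advanced 9-10, as the slide describes.

```python
from math import comb

def level_probs(pi, n=10):
    """P(Basic), P(Proficient), P(Advanced) for a student whose true
    probability of mastering any one MO is pi, with n MOs assessed."""
    pmf = [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]
    basic = sum(pmf[0:6])        # X = 0..5  (0-50% mastered)
    proficient = sum(pmf[6:9])   # X = 6..8  (60-80% mastered)
    advanced = sum(pmf[9:11])    # X = 9..10 (90-100% mastered)
    return basic, proficient, advanced

def true_level(pi):
    """Index of the true level: 0 = Basic, 1 = Proficient, 2 = Advanced."""
    if pi < 0.60:
        return 0
    if pi < 0.90:
        return 1
    return 2

def decision_table(midpoints, weights):
    """3x3 joint table: rows are true levels, columns are observed levels."""
    table = [[0.0] * 3 for _ in range(3)]
    for pi, w in zip(midpoints, weights):
        row = true_level(pi)
        for col, p in enumerate(level_probs(pi)):
            table[row][col] += w * p
    return table

# Midpoints of ten equally spaced intervals for pi, equally weighted.
midpoints = [round(0.05 + 0.10 * k, 2) for k in range(10)]
weights = [0.1] * 10

table = decision_table(midpoints, weights)
p_accurate = sum(table[i][i] for i in range(3))
print(f"P(Accurate) = {p_accurate:.4f}")
```

Passing a different `weights` list (summing to 1) gives a weighted version of the same table; how the presenters derived their empirical weights from the observed mastery percentages is not stated in the slides, so that step is left out of the sketch.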
Empirically Weighted πs
Mastery Objectives Mastered in 2006 for Reading and Math (N = 4851 students)

Percent Mastered   Reading Percent   Math Percent
100                21.8              26.4
90                 16.1              16.7
80                 11.6              10.3
70                 8.0               7.8
60                 6.7               6.1
50                 5.5               5.8
40                 4.9               4.6
30                 5.1               4.1
20                 4.7               4.1
10                 6.7               6.3
0                  6.9               7.7
3x3 Decision Accuracy with Empirical Weights - Reading

              Observed Achievement Level
True Level    Basic    Proficient   Advanced   Total
Advanced      .0000    .0258        .2726      .2984
Proficient    .0274    .1768        .1057      .3099
Basic         .3414    .0486        .0017      .3917

P(Accurate) = .3414 + .1768 + .2726 = .7908
NCLB requires decisions in terms of Proficient/Advanced vs. Basic
Observed Level Group - Reading

True Level               Basic    Proficient or Advanced
Proficient or Advanced   .0451    .9549
Basic                    .8716    .1284

These are conditional probabilities – they sum to 1 by rows.
P[Type I Error (taking action)] = .0451
P[Type II Error (taking no action)] = .1284
These are less than Cohen’s guidelines of .05 and .20.
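The step from the 3x3 Reading table to these two error rates is a row-wise normalization after merging the Proficient and Advanced levels. The sketch below shows that arithmetic using the joint probabilities reported for Reading; the function name `error_rates` and the dictionary layout are illustrative assumptions, not part of the original study.

```python
# Joint probabilities from the Reading 3x3 table.
# Rows: true level. Columns: observed Basic, Proficient, Advanced.
reading = {
    "Advanced":   [0.0000, 0.0258, 0.2726],
    "Proficient": [0.0274, 0.1768, 0.1057],
    "Basic":      [0.3414, 0.0486, 0.0017],
}

def error_rates(joint):
    """Collapse to Proficient/Advanced vs. Basic and return the two
    conditional misclassification probabilities (each row sums to 1)."""
    # Merge the two passing true levels column by column.
    true_pass = [a + p for a, p in zip(joint["Advanced"], joint["Proficient"])]
    true_basic = joint["Basic"]
    # P(observed Basic | truly Proficient or Advanced)
    type_1 = true_pass[0] / sum(true_pass)
    # P(observed Proficient or Advanced | truly Basic)
    type_2 = (true_basic[1] + true_basic[2]) / sum(true_basic)
    return type_1, type_2

type_1, type_2 = error_rates(reading)
print(f"Type I  = {type_1:.4f}")
print(f"Type II = {type_2:.4f}")
```

Because the inputs are the rounded table entries, the results can differ from the slide values in the fourth decimal place.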
3x3 Decision Accuracy with Empirical Weights - Math

              Observed Achievement Level
True Level    Basic    Proficient   Advanced   Total
Advanced      .0000    .0299        .3174      .3474
Proficient    .0256    .1676        .1014      .2946
Basic         .3092    .0472        .0017      .3581

P(Accurate) = .3092 + .1676 + .3174 = .7942
NCLB requires decisions in terms of Proficient/Advanced vs. Basic
Observed Level Group - Math

True Level               Basic    Proficient or Advanced
Proficient or Advanced   .0398    .9602
Basic                    .8635    .1365

These are conditional probabilities – they sum to 1 by rows.
P[Type I Error (taking action)] = .0398
P[Type II Error (taking no action)] = .1365
These are also less than Cohen’s guidelines of .05 and .20.
Reliability of Scores
Conclusions
Decision accuracy of Reading is 79.1%
Decision accuracy of Math is 79.4%
Misclassification probabilities are

False        Reading   Math
Prof.        12.8%     13.6%
Not Prof.    4.5%      4.0%

These are within Cohen’s guidelines
Validity of Criteria: Content Evidence
Could study MO development & review process for 9 samples of students, L-M-H degrees of challenge for L-M-H grade levels
Could map student progress along content standard strands over time
Could evaluate and monitor the use of the bank
Could survey parents: are MOs too modest, about right, or too idealistic
MSDE will conduct a new cut-score study
Validity of Criteria: Quantitative Evidence
For n=267 same-student portfolio pairs from 2006 & 2007:
  95% of 2007 reading MOs
  90% of 2007 math MOs
were completely new or more demanding than the respective student’s 2006 MOs (suggesting growth)
Alternate standard-setting studies could generate evidence about validity of the existing (or resulting) criteria:
Possible Alternate Standard Setting Study Approaches
Develop percentage cut-scores for groups with different degrees of disability (e.g., modified Angoff) & articulate vertically & horizontally
Establish criterion groups using an external criterion and identify cut scores that minimize classification errors (contrasting groups)
Set cutpoints that match the percentages of students in the achievement levels in the general population (equipercentile)
Validity of Criteria: Consequential Evidence
Could study IEPs to see if they have become more oriented toward academic goals over time
Could study the ability of Alt-MSA to drive instruction – e.g., do the enacted content standards move toward the assessed content standards?
Validity of Scores: Content Evidence
Could study how well raters can categorize samples of artifacts into the content strand elements their MOs were designed to represent
Validity of Scores: Consequential Evidence
Could survey stakeholders: How have the scores been used? How have the scores been misused?
Two Philosophical Issues
Justification is needed for implementing flexible performance expectations all the way down to the individual student
Justification is needed for using standardized percentages for success categories across the flexible performance expectations
Contact Information
Sharon Hall – Sharon.Hall@ed.gov
Martin Kehe – mkehe@msde.state.md.us
William Schafer – wschafer@umd.edu