Role of Educational Leadership in Confronting Classroom Assessment … · 2020-02-28 · assessment...

Role of Educational Leadership in Confronting Classroom Assessment Inequities, Biased Practices, and a Pedagogy of Poverty

39

Connie M. Moss

Contents
Why Injustices Inherent in Classroom Assessment Practices Receive Little Attention
Understanding Assessment for Social Justice: An Executive Summary of Assessment Types and Purposes
The Characteristics of High Quality Classroom Summative Assessments
Students Have Assessment Rights and Should Learn Assessment Responsibilities
The Relationship Between Summative Assessment Practices and a “Pedagogy of Poverty”
Not All Classroom Summative Assessments Are Created Equal
A Two-Pronged Issue: The Quality of the Data and the Soundness of the Interpretation
The Gap Between Teacher Assessment Competence and Confidence
The Accuracy of Classroom Assessment Practices and the Reasons Teachers Give for Their Summative Decisions
The Impact of Assessment Practices on Student Motivation to Learn
Conclusion
References

Abstract
This chapter examines the connection between educational leadership, classroom assessment practices, and a “pedagogy of poverty” as a social justice issue that has been ignored for too long. Unlike classroom observations of instruction, principals and other educational leaders rarely observe, question, or investigate what goes into the grades that students receive and the tests that inform those grades. The chapter explores possible reasons that assessment problems continue to exist unnoticed, year after year, despite the insidious impact they have on all aspects of teaching and learning. The chapter explains the two types of assessment, formative and summative, but focuses squarely on the teachers’ summative assessment practices and the impacts they have on student expectations, motivation to learn, and achievement. Part of that impact can be seen in the ways that assessment practices foster and are supported by a pedagogy of poverty – a watered down form of curriculum that is most often served to students in poor urban schools but can also be found in rural districts as well as in affluent suburbs. The chapter argues for assessments that contribute to student learning. Classroom assessment practices are under the direct influence of educational leaders. But improving the quality of classroom assessments will take both increased assessment literacy at all levels of schooling and a commitment to moving beyond data-driven decision making to recognizing that sound educational decisions require high quality data. The best source of that data comes not from standardized tests, but rather from the classrooms themselves.

C. M. Moss (*)
Duquesne University, Pittsburgh, PA, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Papa (ed.), Handbook on Promoting Social Justice in Education, https://doi.org/10.1007/978-3-030-14625-2_147

Keywords
Assessment literacy · Summative assessment · Assessment injustice · Pedagogy of poverty · Social justice · Deficit ideologies · Education outcome disparities · Purpose of assessment · Making sound assessment inferences · Reasoning from evidence · Assessment process · Formative assessment · High-stakes assessment · Measurement error · Assessment bias · Learning outcomes · Assessment planning and design · Assessment rights and responsibilities · Learning challenges · Urban classrooms · Teacher behavioral burnout · School climate · Garbage in. Garbage out · Educational reform · Evidence-based decision-making · Validity · Test interpretation · Performance assessment · Measuring learning · Assessment competence · Non-achievement factors · Grading effort · Classroom walkthroughs · Learning progressions · Normative scale · Norm referenced · Criteria · Variance · Motivation to learn · Extrinsic motivation · Self-efficacy · Intrinsic motivation · Feedback · Student achievement

There is a common idiom cautioning against watching how sausages are made. The point is that if people watched sausages getting made, they might be disturbed by the hidden details, and becoming aware of those details might taint both the appeal of and the fondness for the final product. The idiom reminds us that there are revelations about certain processes that we might not want, or should choose not, to hear about.

Sadly this is the case regarding the many hidden decisions surrounding classroom assessment – those made by teachers and students in particular – regarding what students know and can do. First, many of the decisions are hidden inside what Black and Wiliam (1998a) characterized as the “black box” of the classroom. And second, many educational leaders rarely investigate or question the content of classroom tests, the process of classroom assessment, and the decisions made all along the way that result in the grades students receive. Too often, the assessments themselves and the conclusions teachers draw from the assessment process may be tainted and inaccurate. Yet, these assessments result in accepted public grades, scores, and marks that classify each student’s progress and achievement. Central to social justice issues related to classroom assessment is that educational leaders and the general public often assume that each factor in summarizing student achievement occurs with the utmost care, fidelity, and competence. And while in some classrooms, that is certainly the case, in all too many classrooms summative assessment practices do not provide a clear picture of student learning and often misrepresent student achievement, ability, and potential. In fact, there is a wealth of evidence that the everyday assessment practices of the majority of classroom teachers are plagued by serious problems and shortcomings (Black & Wiliam, 2010; Moss, 2013). Classrooms are complex environments, and the process of assessing learning within these ever-changing dynamics requires teacher expertise, honest administrative coaching, and collaborative vigilance. The problems that arise when teachers and administrators fail to realistically evaluate their levels of assessment literacy are often invisible to them, complicated, deeply rooted in the daily operations of their classrooms and schools, and consistently connected to conditions that already put too many students at risk. Teachers with the best of intentions, along with well-meaning principals and other instructional leaders, often miss the inconsistencies and injustices that are part of the everyday testing and grading of school students.

Classroom assessments have consequences that affect students’ lives in and out of schools. And even though local, national, and international governments regularly spend time and treasure to assess students, these high-stakes assessments tied to each reform effort show little evidence of actually contributing to the improvement of student learning. While these high-stakes tests have revealed many achievement issues over the years, they have been powerless to fix them (Black & Wiliam, 2010; Moss, 2015; Stiggins, 2007). Most of the reform initiatives to which the state and national standardized tests are tied “treat the classroom as a black box.” The tests are not designed to yield information that would provide “direct help and support to the work of teachers in the classroom. . . [Instead they send the message] that it is up to teachers: They have to make the inside work better” (Black & Wiliam, 2010, p. 81). In fact, an African proverb aptly and vividly describes the situation: Weighing a cow doesn’t make it fatter. In other words, it is impossible to test students into increased learning. But the problem with classroom assessment practices does not stop there. In the current obsession with high-stakes standardized tests, there is a failure to recognize that the majority of the tests students experience are neither highly funded nor carefully studied. The most frequent testing events are teacher designed, scored, and interpreted (Brookhart & Durkin, 2003). Regardless of this fact, teachers and educational leaders rarely spend time investigating the validity of classroom assessments even though they account for “99.9% of the assessments in a student’s school life” (Stiggins, 2007). To complicate matters, millions of classroom assessments are created by thousands of teachers daily. Going back to the African proverb about weighing cows to make them fatter, consider what would happen if the scales used to weigh cows were created in thousands of different ways, by thousands of different people (some with little to no ability to build a scale at all, let alone one that was accurate). This would result in a wide variety of weight accuracy discrepancies across that many home-made scales. And what would happen if the questionable weights from that many inaccurate scales were used to support serious decisions about what happened to each cow? And what if some of the scale builders even used the information their scales recorded about the weight of the cows to make decisions that were either marginally or not at all connected to weight, like bovine cleanliness, for example? With that many different scale designs, weighing procedures, questionable weights, and conclusions regarding what the weights mean, it is easy to imagine the countless opportunities for bias, inequity, discrimination, and distraction. Now consider what happens daily in classrooms with assessment conditions that perfectly mirror this fantasy scenario.

This chapter investigates the realities of classroom assessments with a specific focus on summative assessments. The purpose is to look into the “black box” (Black & Wiliam) of the classroom to reveal how the sausage is made so that educational leaders and the teachers they serve can work to improve their practices and eliminate social injustice for countless students. Each section of the chapter works to reveal areas where assessment injustices breed unnoticed. And even though principals are encouraged to walk through classrooms to coach and evaluate teacher effectiveness and student learning, many principals are unable to recognize pitfalls with assessment because they lack the assessment knowledge and understanding themselves (Moss, 2013). The reason is that evaluating classroom practice depends on the “eye of the beholder.” Educators do not actually describe what they see. That is because they can only see what they can describe (Moss, 2002). For example, if a principal does not understand the concept of self-regulation, the principal will be unable to notice the degree to which a lesson promotes a student’s ability or disposition to make sound decisions regarding monitoring and improving the student’s own learning.

What follows is an examination of classroom assessment practices to help make public how the sausage is made. Each section focuses on a specific issue of classroom assessment practice to reveal relationships between classroom assessment, social justice, and a “Pedagogy of Poverty” (Haberman, 1991). By examining these issues, current and aspiring educational leaders can develop and sharpen their assessment literacy – their understanding of “fundamental assessment concepts and procedures deemed likely to influence educational decisions” (Popham, 2018, p. 2).

Why Injustices Inherent in Classroom Assessment Practices Receive Little Attention

Inequities in educational assessment contribute to the impacts of social injustice in schools. These inequities and their relation to effective school leadership are not discussed thoroughly enough in the literature. Yet they deserve increased attention in the preparation of teachers and educational leaders (Black & Wiliam, 2010), and in the conclusions educators draw regarding the effectiveness of classroom practices and school leadership (Brookhart & Moss, 2013; Moss & Brookhart, 2015; Moss, Brookhart, & Long, 2013).


One reason for this seeming lack of attention may be that assessment injustice, compared to the host of injustices that students face each day in school, can be seen as less significant and therefore remains less common in social justice discourse than other lines of inquiry. For example, social justice oriented educational leaders are routinely informed that school discipline practices contribute to learning opportunity disparities, with school suspensions accounting for approximately one-fifth of Black-White racial differences in school performance (Rosenbaum, 2018) and unequal loss of instructional days for students with disabilities (Losen & Whitaker, 2018). Likewise, leaders are apt to investigate and debate educational structures, policies, and beliefs that promote and maintain racial, gender, and class discrimination in access to advanced placement and honors classes (Solórzano & Ornelas, 2002). Also, in the age of increased school violence, research cautions educational leaders that students who perceive a lack of fairness in regard to their treatment by teachers and administrators, along with uneven support from other adults in the school, have an increased likelihood to carry weapons and engage in group fights and other forms of violence in schools (James, Bunch, & Clay-Warner, 2015). And as a final example, educational leaders can readily find research regarding deficit ideologies and biased understandings of conditions such as socioeconomic-based outcome disparities that continue to drive economic structural inequalities and education outcome disparities (Gorski, 2016).

While these and countless other areas of educational injustice in schools rightly hold places of urgency for educational leaders, the insidious impacts of assessment injustices continue to fly comfortably under the radar. Examining the sources and consequences of injustices linked to educational assessments and the role that educational leaders can play to mitigate and even eliminate those disparities in schools and classrooms is a crucial area of concern. These injustices promote, among other biased conditions, a “Pedagogy of Poverty” (Haberman, 1991) and exacerbate injustices for students who are already in conditions that put them at risk.

Understanding Assessment for Social Justice: An Executive Summary of Assessment Types and Purposes

At its core, assessment is a process designed to observe student performances in ways that allow educators to draw reasonable inferences about what students know and can do. In order to align with the tenets of social justice, assessments must promote equitable learning environments for all students (Milner, 2018). Specifically, equitable assessments should help educators “learn about students, not sort them.” Through a social justice lens this stipulation holds special importance in regard to “students who are often placed at the margin of learning – black and brown students, students whose first language is not English, students who have learning differences, and students who live below the poverty line” (p. 88). The purpose of assessment is not to compare one student or one district to another – those disadvantaged to those privileged by their zip codes, for example. And for assessment to fulfill its purpose educators must use results to not only draw conclusions about the quality of student learning, but also to discern the quality of the instruction that led to or derailed student learning. Social justice demands that the adults in the school use assessments to hold themselves accountable for what was taught, how it was taught, what students were asked to do to learn it, and the impacts of those decisions on student learning (Milner, 2018; Moss & Brookhart, 2015).

Simply put, assessment means that the assessor is looking for evidence of something (Guskey, 2007; Moss & Brookhart, 2012). Inherent in that definition is the assumption that the assessor can describe exactly what that “something” is for each daily lesson and can design a tool to measure where individual students are in their journeys toward learning that “something.” Assessments that come at the end of a unit of study, then, must be designed to measure the sum of relationships among each of those “somethings” that make up the overall concepts, skills, and reasoning processes that are the focus of the unit of study (Moss & Brookhart, 2015). This foundational definition of assessment prompts four important lines of inquiry for educational leaders. Are educators sure of what they are looking for? Can they design assessments that provide information regarding that something? Are they able to make sound inferences based on that information? What are the social justice impacts of negative responses to the first three questions?

Assessments of learning in schools provide educators, policy makers, students, and parents with the information they need to make decisions. The specific purpose for which an assessment will be utilized is an important consideration in every aspect of its design, implementation, and interpretation. For example, a classroom assessment designed by a teacher to audit what students have learned at the end of a unit of study might be designed to provide specific details about the nature of the content, the reasoning processes promoted in the unit, and the ability of students to apply their new understandings to a novel situation. The acts of collecting and using assessment information support inferences about what students know. That is why assessment is framed as a process of reasoning from evidence (Mislevy, 1994, 1996). The word “process” here is key. Assessment is a process – a series of events, rather than a stand-alone, one time occurrence whether that assessment is formative or summative. Formative assessment involves gathering data for improving student learning, whereas summative assessment uses data to audit or certify how much a student knows or has retained at the completion of a learning sequence (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014).

Formative assessment or assessment for learning is student and learning centered (Stiggins, 1994) and functionally bonded to effective instruction (Wiliam & Thompson, 2007). It occurs during learning and is intended to result in improved learning. When done right, it can impact the quality of student learning and the effectiveness of teacher instruction (Moss & Brookhart, 2019; Wylie, Lyon, & Goe, 2009). Formative assessment can be defined as “an active and intentional learning process that partners the teacher and the students to continuously gather evidence of student learning with the express goal of raising student achievement” (Moss & Brookhart, 2012, p. 6). Formative assessment is assessment that happens minute-by-minute and day-to-day as learning is happening. What makes it “formative” is that it contributes to student learning rather than simply auditing it (Moss & Brookhart, 2019; Wiggins, 1998). Formative assessment is commonly seen as “activities undertaken by teachers – and by their students in assessing themselves – that provide information to be used as feedback to modify teaching and learning activities” (Black & Wiliam, 2010, p. 82). Formative assessment involves strategies such as the following (Moss & Brookhart, 2019; Black & Wiliam, 2010):

• Sharing learning targets and criteria for success with students
• Feedback that feeds forward, from teachers, peers, or other sources
• Student self-assessment and goal setting
• Using strategic questions and engaging students in asking effective questions

Summative assessment, on the other hand, is assessment of learning that occurs after an episode of learning and is intended to summarize the student’s achievement level at a particular time (Moss, 2013). All summative assessments are “cumulative assessments . . . that intend to capture what a student has learned, or the quality of the learning, and judge performance against some standards” (National Research Council, 2001a, p. 25). And while formative assessments are primarily used to inform teacher and student learning by providing up-to-the-minute information about how learning is unfolding during a lesson, summative assessments are seen as high-stakes assessments since they certify how much learning has taken place during a certain block of time (Gardner, 2010). By their nature, summative assessments are “almost always graded, are typically less frequent, and occur at the end of segments of instruction. Examples of summative assessments are final exams, state tests, college entrance exams (e.g., GRE, SAT, & LSAT), final performances, and term papers” (Dixon & Worrell, 2016). Teachers and schools use summative assessments to determine student eligibility for special programs, to determine if a student should be retained in the same grade, to provide guidance on career paths and choices, and to determine qualifications for awards (Harlen & Gardner, 2010). These examples illustrate a few of the common uses of summative assessments. Summative assessments also include mandated state tests like the Pennsylvania System of School Assessment (PSSA) and Keystone Exams, the Rhode Island Partnership for Assessment of Readiness for College and Careers (PARCC), and the New York State Assessments and Regents Exams (Gewertz, 2018). While there are problems inherent in state and nationally mandated tests, this chapter focuses squarely on the use and abuse of teacher made summative assessments – those assessments that students take daily in classrooms.

The Characteristics of High Quality Classroom Summative Assessments

Assessment is a process by which educators make inferences about what students know, understand, and can do. To support appropriate inferences, an assessment must be designed to align with and provide an appropriate measure of the specific learning goal the assessment purports to measure. And because all assessments are subject to measurement error, a quality assessment plan for a unit of study should use multiple sources of evidence to ensure that the inferences that result from them are dependable. Teachers and educational leaders should continually reflect upon the alignment that exists (or does not exist) between the stated learning goals of the lesson or unit of study and the classroom assessments. Specifically, educators should pursue the following line of inquiry: Do the summative assessments used in each classroom measure everything that matters or only those things that are easiest to test? (McTighe, 2018, p. 16). Studies of classroom summative assessment raise serious doubts in response to this important question (Frey & Schmitt, 2010). For example, when a school district decided to survey the quality of the assessments used in their classrooms, the results highlighted a disturbing pattern. After gathering 664 classroom assessments from across the district, a panel of teachers and administrators rated the quality of a random sample of 20% of the total (138 tests). The panel concluded that most of the assessments (75.5%) measured the lowest levels of cognition – recall and comprehension. And a majority of the assessments (80%) used multiple-choice, true-false, matching, and fill-in-the-blank test items (Gibble, 2000). And while all assessment formats have their pros and cons, these types of assessment items are clearly the simplest and most convenient to score, which may explain their overuse.

The qualities of sound classroom assessment begin with six areas of responsibility for teachers (Brookhart & Nitko, 2019). First, teachers should design or choose high quality assessments that match their learning outcomes and are understandable to their students. If teachers build their own summative assessment, they should take care to follow the principles of sound assessment planning and design, item writing, rubric development and rubric sharing, and writing directions for the assessment that are easily understood by all students to promote student success. That means paying attention to all the details to make sure that the test items do not contain errors or inaccuracies. And should a teacher discover that a test contains errors, it is important to correct the error as soon as possible and rescore or re-administer the assessment depending on the nature of the error. Second, teachers should choose appropriate procedures for using the assessment. That means that the procedures they use should not disadvantage students by gender, ethnicity, or social or economic class and should not promote stereotypes or other forms of bias. Third, educators must be sure that they administer the assessments fairly to all students and that the procedures they use will not yield assessment results that are difficult or impossible to interpret. Fourth, teachers must take care to score the assessment accurately and fairly for each student and do so in a timely manner. Fifth, teachers must be scrupulous to ensure that their interpretations of the assessment are not only as valid as possible, but that they also use their conclusions to promote positive student outcomes and minimize negative student outcomes. And finally, teachers must recognize their responsibility to communicate complete, useful, and correct information.

In addition to the responsibility to design and use high quality assessment tasks, procedures, and interpretation and communication of results, teachers have a responsibility to adequately prepare their students for an assessment. Part of that preparation means making sure the assessment is understandable to all students. This seems simple on its face, but research (Jakwerth, Stancavage, & Reed, 1999) shows that when students fail to answer a question or leave an item blank, they do so for a variety of reasons linked to their inability to understand the test itself, including not being able to “figure out what the question was asking” (p. 9), “didn’t really get the question” (p. 9), or “didn’t realize [I]. . . had to do both parts” (p. 10). Teachers who fail to clearly write and thoroughly explain the assessment to their students are setting them up for problems and allowing factors such as misunderstanding the directions to bias the assessment results.

Students Have Assessment Rights and Should Learn Assessment Responsibilities

Classroom assessments and assessment procedures should be fair to all students – those from all ethnic and socioeconomic backgrounds (e.g., Mahalingappa, Rodriguez, & Polat, 2017), those for whom English is not their first language (Polat, 2016), and those students with learning challenges and disabilities (Thurlow & Kopriva, 2015). Tests should be written so that the words used in problems or tasks allow all students to appropriately interpret what they are being asked to do. Students have the right to understand the directions and be able to follow them. And if the length or the complexity of the task disadvantages students with learning challenges, students have the right to appropriate accommodations of the wording or how the test is administered. For example, students can demonstrate their understanding of the water cycle verbally if they are unable to write the extensive explanation required by the test. The purpose of the assessment is to measure the student’s understanding of the concept, not to assess the student’s ability to write a cogent explanation in a set amount of time.

All students have the right to be prepared to do their best on a planned assessment, and these rights focus on fairness, respect, and transparency. Some of the considerations that are important include being given assessments that strongly match the purpose of the assessment and that have been crafted and administered using sound assessment principles. Students should be told in advance when a test is going to take place and what accommodations are available to them. Students also have the right to be fully informed about the consequences of not taking a test or failing to finish a test. They should also understand what options they have during a test if they find themselves unable to continue (Brookhart & Nitko, 2008). For example, a second-grade class was cautioned that talking during the test, even to raise a hand to talk to the teacher, was forbidden. Unfortunately, half-way through the test, the pencil of one of the students broke. Afraid to talk and confused about what to do, the student stopped completely, unable to complete the rest of the test. In this case, an absolute command for young students without the cognitive development to think abstractly about what it means to obey a rule rendered the student unable to make the best decision and resulted in the student turning in incomplete work.

Classroom assessments also present opportunities for students to learn about the ideas of responsibility. In fact, the Test Takers’ Rights Working Group (1999) prepared a list of student responsibilities regarding assessments. The list includes a comprehensive treatment of obligations that students must fulfill that include the following. Students are responsible for studying and preparing for tests in their classes. Students must also be respectful and courteous toward other students who are taking the test. Students should learn that they should behave honestly during a test and not cheat. Students should be responsible to learn the rules that govern what they should do if illness prevents them from taking the test and how they go about scheduling a make-up test (Geisinger, 2001).

In classrooms where students are treated with respect, that respect must include the communication of high expectations for student conduct during an assessment and the support students need to reach them. Teachers and educational leaders should be mindful of the expectations that testing procedures and directions communicate to students. When students are not held to high standards, when teachers and principals accept substandard assessments, when they ignore students who are consistently unprepared and blame the students who are not motivated to persist, they disrespect their students and in turn contribute to conditions that promote a pedagogy of poverty.

The Relationship Between Summative Assessment Practices and a “Pedagogy of Poverty”

In 1991, Martin Haberman introduced the concept of a “Pedagogy of Poverty” as the watered-down version of education served daily to poor, urban students who were disproportionately children of color. Teachers in these urban classrooms demonstrate an overreliance on direct instruction and routine seatwork, supported by the belief that mastering basic skills must precede students’ exposure to higher order thinking (Means & Knapp, 1991). A pedagogy of poverty uses strategies to uphold classroom management above all else. It requires constant teacher direction and unquestioning student compliance. This kind of environment appeals “to those who fear minorities and the poor. . . to those who have low expectations for minorities and the poor. . . [and] to those who do not know the full range of pedagogical options available” (Haberman, 1991, pp. 82–83). Haberman included school administrators in the category of those unfamiliar with a range of effective teaching and assessment practices.

A pedagogy of poverty is characterized by four underlying assumptions about how to best teach in poor, urban schools: (1) teachers teach, students learn, and they do not engage in the same activities; (2) teachers are in charge, and the student’s job is to follow the teacher’s directions so teachers can teach them appropriate behaviors; (3) urban students have challenging home lives and many have handicaps, so it is inevitable that many of them will end up not learning, although some students may actually excel; and (4) students need to learn basic skills so they can earn a living, and the best way to teach them those basic skills is by using a directive pedagogy (Haberman, 1991, p. 83). Classrooms based on these beliefs do not work for the teacher or the students, but rather, the conditions they produce work on them. The actions based on these assumptions foster the kind of teaching that is mind-numbingly repetitive. Over time, the teachers experience a unique behavioral burnout, a condition that sees them remaining as paid employees but no longer functioning as professionals. They go through the motions of teaching without emotional and personal investment, believing that no matter what they do, they will never make a tangible difference in the lives of their students. In this burned-out mode, teachers are able to cope with the serious problems faced by their students and the negative conditions in their schools supported by dysfunctional bureaucracies that no longer see failures as a sign of any personal inadequacies in the administrative team (Haberman, 2005a).

In turn, the type of instruction the teachers are able to provide increases the causes of behavior problems for bored, unengaged, and demotivated students. To control the students, teachers must become increasingly authoritarian, causing increased teacher burnout as teachers realize that their goals for being a teacher who helps and guides eager learners have been replaced by the reality of what they have become in order to function in this kind of classroom environment. And as standardized test scores fail to rise year after year, those who believe in these pedagogical methods double down to do more of the same. And so it goes, year after year, since teachers themselves are never faulted when “their ‘deprived,’ ‘disadvantaged,’ ‘abused,’ ‘low-income’ students are not learning” (p. 85).

When Haberman (1991) first introduced the idea of a pedagogy of poverty, he predicted that its influence was powerful enough to undermine any reform effort to dilute or overthrow it, because by its very nature a pedagogy of poverty completely controls every aspect of the lives of the teachers and their students. It determines exactly how students spend their class time, what behaviors they are taught and expected to strictly enact, what students learn about any topic, how they learn it, what attributes they should value in themselves, and what they should count as success, that is, how they will know when they have learned something. Haberman’s (1991) prophecy has proven to be extremely accurate. Over a decade later, Kozol (2005) found little change in poor urban schools and classrooms designed to serve those with learning challenges. Unfortunately, a pedagogy of poverty continues unabated to foster and maintain a school climate where the primary lesson students learn is that they can succeed without engaging or thinking critically.

Despite a call for change, routine instruction and the assessment of rote memorization and low-level facts continue to dominate poor urban schools (McKinney, Chappell, Berry, & Hickman, 2009). These factors communicate to urban parents, some of whose own school experiences were unrewarding, that as in their day, students should be forced to learn. Additionally, a pedagogy of poverty communicates accepted norms to all those involved in the education of disadvantaged children and youth, including expectations for what teachers do, what students should expect, and what parents and the general public should assume teaching to be (Haberman, 2005b).

An examination of mathematics instruction in typical urban schools reveals that routine, traditional assessment practices dominate mathematics education (Hiebert, 2003; Van De Walle, 2006) and continue to fuel the engines of the pedagogy of poverty. In a study exploring the summative assessment practices of 99 elementary teachers from high poverty schools (McKinney et al., 2009), researchers concluded that teachers were not engaging in the kinds of summative assessment practices recommended by the National Council of Teachers of Mathematics (NCTM) (2000). The recommendations encourage math teachers to align their assessment practices with their assessment purposes; be mindful of the ways classroom assessment can be used to enhance student learning; and employ alternative strategies like student self-assessments, portfolios, interviews and conferences, analysis of error patterns, and authentic assessments. Instead of using these recommended approaches, which both teach and assess students’ mathematical inquiry and sharpen student ability to use problem-solving and reasoning skills, teachers commonly used traditional tests of computation and math facts. Teachers who did use some of the promoted approaches admitted to using those practices infrequently (McKinney et al., 2009).

Clearly, inadequate summative assessment practices continue to strengthen a pedagogy of poverty. Pop quizzes, tests that require the regurgitation of disparate facts, and grades that are composed of scores on products and performances that do not indicate student learning progress are part of the problem. For example, in many classrooms, students receive points and grades for things like accurately copying notes from the board or from the teacher’s PowerPoint, or completing study guides that require them to find the missing word or phrase from a verbatim sentence in the text. Is this evidence of learning, or does what students are asked to do simply certify their ability to comply with the directions and complete the assignment? Teachers and educational leaders should ask themselves this question: Would another qualified educator determine that a copied set of notes or a completed low-level study guide was compelling evidence that students had mastered the concepts and skills being taught (Moss & Brookhart, 2015)? Or to put the question into a specific context, would the correctly copied notes be an indicator that a student understands and can explain the War of 1812? Clearly, the answer is no. In classrooms where these practices are the rule of thumb, students can earn good grades for simply completing assignments whether they reach conceptual understanding or not.

Not All Classroom Summative Assessments Are Created Equal

As the previous examples and discussion indicate, summative assessments do not function in isolation. The quality of any summative assessment depends on its connection to the curriculum and the ways that the teacher asks students to learn that curriculum (National Research Council, 2001b). Summative assessments are part of a complex process of collecting evidence of student learning and drawing high-stakes conclusions based on that evidence. These conclusions, however, can be biased in numerous ways (as in the example of using copied notes as evidence of learning) and flawed at various decision points along the way, as educators design specific summative assessments and pass judgments based on the results the assessments yield. To put a finer point on it, the decisions that educators make about student learning are only as useful as the quality of the information educators use to make those decisions.

In the assessment world, the logic that weak assessments yield weak data is summarized in the phrase “garbage in, garbage out” (Moss & Brookhart, 2012; Rose & Fischer, 2011). The process of crafting the assessment, establishing assessment procedures, administering the assessment, scoring the assessment, interpreting and using the assessment results, drawing conclusions based on assessment information, and communicating the results of the assessment can lead to judgments with serious consequences for students when any one of those actions is the weak link in the chain. Teacher judgments based on poorly collected or weakly connected information can directly influence student achievement, study patterns, self-perceptions, attitudes, effort, and motivation to learn (Black & Wiliam, 1998b; Brookhart, 1997; Rodriguez, 2004).

Assessment, and specifically testing, has long been used to promote educational actions that ameliorate inequities in achievement. During each wave of educational reform over the last five decades, policy makers promoted some form of high-stakes test designed to close the achievement gap (Moss, 2015). It stands to reason, then, that educators spend a great deal of time debating the cost and impact of state-mandated tests compared to the information they yield (e.g., Nichols & Berliner, 2005). Even more crucial, educators should hold serious debates regarding the impacts of low-quality classroom assessments that disadvantage countless students because of diversity circumstances and influences that have nothing to do with the constructs and contents those tests and quizzes purport to measure. These circumstances and influences include the student’s race, gender, ethnicity/nationality, organizational role, age, sexual orientation, socioeconomic status, mental/physical ability, and religion/culture/language (Plummer, 2003). They also include aspects that cloud teacher judgment and render teachers unable to distinguish between student achievement and student traits like perceived ability, motivation, and engagement that relate to achievement (Gittman & Koster, 1999; Sharpley & Edgar, 1986). These poor judgments can be further exacerbated when teachers assess students with diverse backgrounds and characteristics (Darling-Hammond, 1995; Martínez & Mastergeorge, 2002; Tiedemann, 2002).

Educational leaders often describe themselves as “evidence-based decision makers,” a term so ubiquitous that its meaning becomes cloudier and more dissipated with each use. That’s why Wiliam (2013) framed data-driven decision-making as not particularly helpful and instead called for a “commitment to decision-driven data collection” (p. 17). To commit to decision-driven data collection, classroom teachers and educational leaders should pay more attention to what is being collected via classroom tests, how and why it is being collected, and the usefulness of the collected information to inform educational decisions (Perkins & Engelhard, 2011; Rose & Fischer, 2011). While standardized tests supply mountains of data to schools and educators on a regular basis, they often leave teachers drowning in information and thirsting for meaning. The focus on standardized test results has pulled educators off course to routinely see their job as working to raise test scores rather than working to increase learning for all students in their care.

Classroom summative assessments, when done right, offer teachers richer information about what students know and can do. They occupy a very important place in decision-driven data collection. Large-scale accountability results are exactly what the term implies: “large; . . . they don’t contain any information about the details of learning and instruction that are needed to craft a plan” for improvement (Brookhart, 2016, p. 44). Large-scale tests can isolate an issue (for example, reading scores in third grade are poor), but they cannot explain what led to the poor scores. Without the detailed information that classroom-level assessments provide, the only conclusion that educators in this scenario can draw is “work harder in reading” (p. 45). Information from the classroom level is needed if educators are to design an improvement plan that serves the needs of the students who are struggling with reading in the first place.

A Two-Pronged Issue: The Quality of the Data and the Soundness of the Interpretation

Validity is the term that describes the suitability of assessment data for a particular purpose. It is the “degree to which an evidence-based argument supports the accuracy of a test’s interpretation for a proposed use of a test’s results” (Popham, 2018, p. 19). All types of assessment data have a place, but when data are used without a clear understanding of what the data are for, educators quickly experience trouble. Those difficulties occur because test scores are only measures of student learning and not the learning itself. And while it is possible to count, add up, and average test scores, it is impossible to tally learning. The closest educators can get to measuring learning happens when they design and use a mental measurement that gauges student learning in a limited domain (Brookhart, 2013, 2016). That means that teachers must be able to clearly define “what that domain is, use a test or performance assessment that taps this domain in known ways, use a score scale with known properties that maps the student’s performance back to the domain, and interpret that score scale correctly when making inferences about student learning” (Brookhart, 2016, pp. 2–3). These mental measures can provide educators with good estimates of what a student knows and can do, if they are scrupulously constructed and soundly interpreted. In other words, educators must address several indicators of validity before they can claim confidence in what a particular summative assessment score means. Were the questions on the test about things that the students had the opportunity to learn? Was the test clearly written in language the students could understand and use? Would students need other knowledge besides the content being tested in order to answer the test questions (e.g., students must understand how to read a map in order to plot the distance between two cities)? Must students use thinking skills that were not taught along with the content (e.g., the test asks students to draw conclusions about events in history that were presented as low-level, factual lectures)? In short, teachers must be capable of building “a bridge of reasoning and evidence between the score on any assessment and its meaning” (Brookhart, 2013, pp. 14–15).

As teachers make the journey through designing, using, and interpreting summative assessments, pitfalls abound, and conditions of social justice and equity are easily breached. What follows is an examination of some prevalent pitfalls.

The Gap Between Teacher Assessment Competence and Confidence

Studies of classroom teachers’ data use reveal that teachers tend not to notice data in the first place and often discount or ignore data that are contrary to what they believe or that do not conform to their preexisting expectations (Bickel & Cooley, 1985; David, 1981; Hannaway, 1989; Ingram, Louis, & Schroeder, 2004; Kennedy, 1982; Young & Kim, 2010). It is important, therefore, to clarify the tensions that exist between classroom assessment practices and the assessment competencies of classroom teachers and to recognize the impact of that gap on students.

A meta-analysis of 50 years of classroom summative assessment research (Moss, 2013) documents a dangerous gap between teachers’ perceived and actual assessment competence. In fact, the study concluded that teachers are overconfident and undercompetent when it comes to summative assessment practices. Historically, the literature on classroom assessment practices has alerted educational leaders to a specific area of weakness: teachers routinely use varied assessment techniques without adequate preservice preparation or inservice professional development that teaches them how to best design, interpret, and use summative assessments (Goslin, 1967; O’Sullivan & Chalnick, 1991; Roeder, 1972).

Teachers routinely and without hesitation include nonachievement factors like behavior and attitude, degree of effort, or perceived low levels of student motivation for the topic or assignment in their summative assessments. Consider this example. Teachers routinely lower the grades of students who they believe do not put forth the correct amount of effort on graded assignments. Yet the same teachers rarely lower the grade of a gifted student who is able to complete the same assignment without putting forth any effort at all. If the amount of student effort is part of the criteria the teacher uses to assign good grades on an assignment, then all students should be expected to put forth the same degree of effort to earn the same grade. But that is not usually the case. More often than not, the misguided practice of grading effort privileges students who arrive in the classroom with sufficient knowledge of the topic or enough proficiency with the specific skill to pass the test or complete the assignment without expending any real effort studying or preparing. Grading effort disadvantages students who struggle with studying or who view studying as a waste of their time, since their efforts do not translate into better grades. Simply put, teachers rarely lower the grades of those who do well on a test because they did not have to study, but often deduct additional points from the scores of students who did poorly on the test because the teacher judges them as not trying their best. And finally, teachers often compound the impact of their biased decisions regarding student achievement, like including effort in the grade, by calculating grades without weighting the various assessments or components of the assessment by importance (Black & Wiliam, 2010; Griswold, 1993; Hills, 1991; Moss, 2013; Stiggins, Frisbie, & Griswold, 1989).
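The weighting problem can be made concrete with a short sketch. The component names, scores, and weights below are hypothetical, chosen only for illustration; they do not come from the studies cited above. The sketch compares an unweighted mean, which lets a minor compliance task count as much as a major assessment, with a mean weighted by importance.

```python
def unweighted_grade(scores):
    """Simple mean: every component counts equally, whatever its importance."""
    return sum(scores.values()) / len(scores)


def weighted_grade(scores, weights):
    """Mean weighted by importance; weights are normalized, so they need not sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical student record (percent scores) and hypothetical importance weights.
scores = {"unit_test": 60.0, "essay": 70.0, "copied_notes": 100.0}
weights = {"unit_test": 0.5, "essay": 0.4, "copied_notes": 0.1}

print(round(unweighted_grade(scores), 1))
print(round(weighted_grade(scores, weights), 1))
```

In this made-up record, the compliance task pulls the unweighted mean well above the weighted one, so the unweighted grade overstates what the two substantive assessments show.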

Many teachers use performance assessments, considered a best practice by today’s teacher evaluation standards, without fully understanding how to create them or assess them. Classroom teachers commonly fail to define success criteria for the various levels of the performance or plan appropriate scoring schemes and procedures prior to instruction. Moreover, teachers have a tendency to record their judgments after a student’s performance rather than assessing each performance as it takes place. This practice consistently weakens the accuracy of conclusions about how each student performed (Goldberg & Roswell, 2000). What’s more, researchers found that teachers often taught test items, provided clues and hints, extended time frames, and even changed students’ answers (Hall & Kleine, 1992; Nolen, Haladyna, & Haas, 1992). Even when summative tests were not compromised, many teachers were unable to accurately interpret the test results (Hills, 1991; Impara, Divine, Bruce, Liverman, & Gay, 1991) and lacked the skills and knowledge to effectively communicate the meaning behind the scores (Plake, 1993).

Clearly, a look at teachers’ summative assessment practices over past decades points to patterns of assessment practice abuse that result in conditions of injustice and inequity for students. The teachers and principals included in the studies, and those who were prepared in the 1980s and 1990s, may still be practicing in classrooms. Joining them are teachers who were more recently prepared to use data-driven decision-making and are supported by principals taught to perform rigorous classroom walkthroughs focused on coaching instruction and assessment (Moss & Brookhart, 2015). Does that mean, then, that assessment conditions in schools have improved in the new millennium?

The simple answer is no. Investigations of classroom assessment practices still note significant discrepancies between teacher perceptions of effective summative assessment practices and even their self-reports of their actual classroom practices (Black, Harrison, Hodgen, Marshall, & Serret, 2010; McKinney et al., 2009; McMillan & Nash, 2000; Rieg, 2007). Secondary teachers demonstrate a general trend toward objective tests over alternative assessments (McMillan, 2001; McMillan & Lawson, 2001) even though higher usage of essays in mathematics and English was related to higher test scores (McMillan & Lawson, 2001). These discrepancies might be explained in part by the influence of high-stakes testing on the choices teachers make based on their changing views of the essential purposes for summarizing student achievement (McMillan, 2003, 2005). Another influence may lie in the level of assessment knowledge that teachers possess and the grade levels that they teach. This tendency may be partially attributed to the teachers’ perceived assessment knowledge, a factor found to exert more influence on a teacher’s assessment practices than the teacher’s actual teaching experience (Zhang & Burry-Stock, 2003). These factors support a conclusion that teachers continue to be overconfident and undercompetent when it comes to classroom assessment.

The next section investigates current assessment practices with a focus on the arguments teachers make to support their practices and to bolster their claims that they are competent classroom assessors.

The Accuracy of Classroom Assessment Practices and the Reasons Teachers Give for Their Summative Decisions

Teachers tend to misestimate student achievement at all levels of schooling. In preschools, classroom teachers’ estimations of their young children’s number sense, geometry, and measurement skill showed that teachers were unable to accurately rate their students in these areas. When the factors contributing to the teachers’ inability to accurately judge their students were investigated, researchers found that “approximately 40% of the variation in teachers’ ratings of students’ mathematics skills stem[ed] from characteristics inherent to the teacher and not the skills of the child” (Kilday, Kinzie, Mashburn, & Whittaker, 2011, p. 7). One of the conclusions the researchers drew was that early childhood teachers were unfamiliar with student learning progressions, that is, how students normally go from little to no understanding of a concept to more sophisticated understandings of that concept within a specific domain. In this case, teachers lacked a solid grasp of how young children learn mathematics. This finding reveals that knowledge of development, content expertise, instruction practices, and assessment practices are all interrelated. If a teacher has a limited understanding of learning progressions, one can assume that the logic leading to how the teacher plans to sequence the lessons (what should be taught first, second, and so on) in order to advance the cognitive and conceptual development of students will also be lacking.

In another study (Martínez, Stecher, & Borko, 2009), researchers examined the types of assessments teachers most commonly used and which factors teachers saw as important to consider when assessing student performance. The researchers also investigated whether the teachers held the same standards when they assessed and graded all students in their classrooms or if they applied different standards depending on their perceptions of student need or ability (p. 85). The findings showed that the teachers used what is known as a normative scale when judging the students in their classrooms. A normative scale means that scores on a test are compared to what is “normal” for that age, grade, or class and is something that standardized test makers employ. What is different here is that the teachers were using “in the head criteria” based on the teachers’ very limited experience with students, as opposed to the large, carefully selected, and studied populations of students against which standardized tests are norm-referenced. As a consequence, the teachers tended to rate their students as high or low in relation to other students they had personally taught in the same grade or class, in the same district, and even compared current students to students they had taught in the past (p. 90). Consider the implications this finding has for children in school. A student could receive high grades if their teacher concluded that the student was one of the smartest children ever to be in the teacher’s class. This could result in inflated judgments of the student’s achievement. The student might well be the smartest eighth grader in the teacher’s school but could still pale in comparison to eighth graders in another school district.
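A small sketch can illustrate why the choice of comparison group matters in norm-referenced judgment. All numbers below are hypothetical and are not drawn from Martínez et al. (2009); the sketch simply standardizes the same score against two different reference groups, a teacher’s own small class and a larger group with a higher average.

```python
from statistics import mean, pstdev


def z_score(score, reference_scores):
    """Standardize a score against a reference group (population standard deviation)."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


# Hypothetical data: one student's score, the teacher's own class,
# and a larger comparison group with a higher average.
student = 80
own_class = [55, 60, 62, 65, 70, 80]
larger_group = [70, 75, 80, 82, 85, 88, 90, 92, 95, 98]

print(z_score(student, own_class))     # positive: above the class average
print(z_score(student, larger_group))  # negative: below the larger group's average
```

The same score looks strong against the small class and below average against the larger group, which is the risk when the norm exists only in one teacher’s experience.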

The same study (Martínez et al., 2009) found that elementary teachers were less able to accurately appraise the achievement of students with disabilities because of the complexities of evaluating students with various challenges. The researchers noted that, when assessing students traditionally disadvantaged by gender, race, and socioeconomic status, the teachers routinely adjusted their ratings upward, or adjusted the criteria they used downward, to compensate for the students’ disadvantages. These actions open teachers’ summative assessments to areas of error and bias. Again, connections to a pedagogy of poverty are apparent. By adjusting their criteria downward, teachers perpetuate low expectations for their students. And while the teachers’ actions are well-meaning, their time would be better spent enriching their instructional environments, raising their expectations for their students, and providing students with additional supports to help them reach deeper levels of understanding.

Thus far, the focus has been on disadvantaged students and disadvantaged schools, but a pedagogy of poverty can exist in affluent districts and classrooms within those districts. McMillan and Lawson (2001) investigated grading and assessment practices of 213 secondary science teachers from urban, suburban, and rural schools. They found that most teachers, regardless of location, relied most heavily on objective test items that emphasized the recall of information. This pattern related specifically to the teachers’ perceptions of the ability level of the students. Students who were perceived by their teachers to have higher abilities, regardless of location, were advantaged by multiple assessment strategies, the use of more performance assessments, and a greater emphasis on assessing higher cognitive levels.

This finding plays out in schools across the country. In fact, according to the Program for International Student Assessment (PISA) (Kastberg, Chan, & Murray, 2016), an international assessment that measures 15-year-old students in reading, mathematics, and science every 3 years in more than 70 countries and educational systems, the variance between schools in all countries is 36%. When the findings regarding student achievement in the United States were evaluated, the researchers found that the variation between schools in the United States was 30%, but the variation within a school was 70%. That means that in the United States, the variation within a school is more than twice the variation between schools, which suggests that the teacher a child gets has more impact on the student’s quality of learning than the school or district the child attends. In other words, the differences among classrooms within a single building outweigh the differences between buildings. Consider the impact that has on what students learn no matter the zip code.
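The comparison rests on simple arithmetic with the two United States percentages quoted above. The sketch below only divides one share by the other. Reading the within-school share as a teacher effect is an interpretive assumption, since within-school variation also reflects differences among students.

```python
# Shares of variation in U.S. student achievement as quoted in the text.
between_schools = 0.30
within_schools = 0.70

# How many times larger the within-school share is than the between-school share.
ratio = within_schools / between_schools
print(round(ratio, 2))
```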

The Impact of Assessment Practices on Student Motivation to Learn

Motivation to learn can be seen in the form of student persistence, curiosity, and performance (Lei, 2010). Students exhibit two kinds of motivation. Extrinsic motivation to learn describes the kind of motivation that is present when students decide to engage in learning for some kind of outside reinforcement like recognition, a good grade, stars on a chart, candy, or free time, to cite a few examples. This kind of motivation has drawbacks for learners in that students who are motivated through external reinforcers tend to put forth only minimal effort (e.g., What do I have to do to get the grade I want? Is this going to be on the test?), stop the learning process once the goal is achieved (e.g., stop reading a book a week once they receive a certificate for a free pizza after they read so many books), are less cooperative (e.g., Why should I work on a team when it won’t contribute to my grade?), and have lower perceptions of self-efficacy, the belief that they can accomplish the task at hand (e.g., students who memorize material for a test without truly understanding it begin to believe in luck and fate rather than in their own ability to master difficult concepts) (Bandura, 1997; Lei, 2010).

Intrinsic motivation, in contrast, propels students to participate in an activity because they derive pleasure or satisfaction from the activity itself. For example, teachers rarely provide candy or stickers to encourage students to play video games during free time. Students want to play the video games; they are intrinsically motivated to do so. Intrinsic motivation, the motivation that resides within the student, is highly prized. As intrinsic motivation increases for students, they learn about what it takes to complete a specific task, like writing a quality essay or solving a quadratic equation, and become more confident in their ability to do it again, which increases their sense of self-efficacy for the specific task (e.g., writing the essay). Self-efficacy, then, is a personal belief in one’s own capability to execute specific strategies and actions, under one’s own control, to reach a designated goal (Bandura, 1997). In fact, intrinsic motivation is highly correlated with self-efficacy (Zimmerman & Cleary, 2006) and leads to increased student effort (Schunk, Hanson, & Cox, 1987).

What teachers assess and the ways they assess it promote classroom climates that influence students’ motivation and their interactions with their teachers. Classroom assessment environments also shape student perceptions regarding the purposes of summative assessments. Because they influence student motivation to learn, summative assessments also influence students’ goal-setting, effort, and beliefs about their competence and self-efficacy. Brookhart (1997) proposed a theoretical model to explain the classroom assessment environment as a dynamic context, continuously experienced by students, as their teachers communicate assessment purposes, assign assessment tasks, create success criteria, provide feedback, and monitor student outcomes. These interwoven assessment events communicate what is valued, establish the culture of the classroom, and have a significant influence on students’ motivation and achievement goals (Ames, 1992; Brookhart, 1997; Harlen & Crick, 2003). For example, if a teacher creates a summative assessment that relies on the memorization of low-level and even irrelevant facts, students come to believe that learning means memorizing, and if they are not good at memorizing disparate facts, they declare themselves no good at the particular subject (English Literature) or content (The Great Gatsby). Their self-efficacy for interpreting literature is dampened by their experience with the test, and they come to believe that no matter how hard they study, they can never pass the test, even when the extrinsic motivator of a good test score is on the line. In a very real way, hopelessness always trumps extrinsic motivation.

Similarly, Alkharusi (2008) concluded that summative assessment practices have a profound effect on student motivation to learn. In particular, when classrooms emphasize “the importance of grades rather than learning and focus on public rather than private evaluation and recognition of student achievement” (p. 262), they tend to reduce and even derail student motivation to learn. Moreover, students come to believe that learning means memorizing facts, getting a certain grade, or acquiring a set number of points, rather than mastering important concepts and skills over time through practice and study. These assessment conditions contribute to lower self-efficacy in students. And students who have low self-efficacy for accomplishing a task often avoid the task altogether; in this case, they may decide not to study for the test at all (Bandura, 1997; Banfield & Wilkerson, 2014).

Conclusion

Classroom assessment practices have occurred out of the public eye and have gone unchecked for too long. Because few people are in the “black box” of the classroom, the summative assessment and grading sausage that is made daily does little to help students learn, improve teaching, or paint an accurate picture of what students know and can do. Often teachers’ summative assessment practices do not honor each student’s potential or help to enrich the learning opportunities that produce self-confident students who are motivated to learn.

Although teachers are interpreting more test results and testing more frequently, many teachers are underprepared and insufficiently skilled. This leads to summative judgments that are often inaccurate and unreliable. Yet teachers commonly report positive beliefs about and high levels of confidence in their assessment skills and competence despite evidence to the contrary (Black et al., 2010; Rieg, 2007). Many teachers misinterpret student achievement or misestimate students’ abilities (Kilday et al., 2011). Frequently teachers arrive at their judgments of student achievement through idiosyncratic methods and interpret assessment results using flexible criteria. These tendencies allow teachers to pull for students who deserve better grades or adjust scores down for students with poor attitudes or behavior (Wyatt-Smith, Klenowski, & Gunn, 2010). Traditional and routine practices are common across the board, with low-level recall and objective tests figuring prominently in the assessment arsenals of teachers regardless of grade level or subject area. Low-level testing can be found in many classrooms, where it impacts both the quality of the learning that happens there and the motivation of the students who must engage in those assessments (McKinney et al., 2009), which are, more often than not, the product of a pedagogy of poverty. Sadly, the impact of poor assessment practice cuts even deeper in classrooms with poorer or less able students. Yet even when teachers recognize effective assessment practices, they often see the realities of their classroom environments and other external factors imposed on them as prohibitive (McMillan & Nash, 2000) and continue to resort to the lowest common denominator when it comes to assessing student learning.

When teachers collaborate with each other and are coached by those with expertise in summative assessment practices, they are more likely to recognize the realities of their assessment competencies and begin to address their assessment needs. They can mediate for each other a more systematic and intentional inquiry process into the quality of their assessments and become mindful of how the quality of those assessments influences student learning and achievement (Black et al., 2010; Moss et al., 2013).

Educational leaders have ignored this problem for too long. Poor assessment practices are an important social justice issue that is under the direct influence of schools. Unlike poverty, drugs, discrimination, and a host of other maladies impacting children outside of the classroom, assessment practices can be monitored, coached, and changed. As principals and other educational leaders become better able to accurately recognize, describe, and lead initiatives to improve summative assessment practices of teachers in their buildings and districts, students at all levels and in all zip codes will profit.

This chapter is just the beginning of the learning and inquiry that needs to occur. Just like watching sausage being made, the information that tells the whole story lies in the hidden details that occur within each individual classroom and the decisions made by each individual teacher. Factors discussed here reveal some of the ways that assessment sausage is made to unfairly position students for their future. We can no longer assume that all teachers are testing fairly, making sound judgments regarding student achievement, or using accurate assessment information to improve and create conditions of optimal learning for all students. It will take educators at all levels to intentionally increase their awareness and deepen their assessment literacy to move beyond data-driven decision making based on incomplete, irrelevant, or inaccurate data regarding what students know and can do.

References

Alkharusi, H. (2008). Effects of classroom assessment practices on students’ achievement goals. Educational Assessment, 13(4), 243–266.

American Educational Research Association, American Psychological Association, & the National Council on Measurement in Education. (2014). Standards for educational & psychological testing. Washington, DC: Author.

Ames, C. (1992). Classrooms: Goals, structures, and student motivation. Journal of Educational Psychology, 84, 261–271.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York, NY: Freeman.

Banfield, J., & Wilkerson, B. (2014). Increasing students intrinsic motivation and self-efficacy through gamification pedagogy. Contemporary Issues In Education Research, 7(4), 291–298.

Bickel, W. E., & Cooley, W. W. (1985). Decision-oriented educational research in school districts: The role of dissemination processes. Studies in Educational Evaluation, 11(2), 183–203.

Black, P., Harrison, C., Hodgen, J., Marshall, B., & Serret, N. (2010). Validity in teachers’ summative assessments. Assessment in Education: Principles, Policy & Practice, 17(2), 215–232.

Black, P., & Wiliam, D. (1998a). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139–148.

Black, P., & Wiliam, D. (1998b). Assessment and classroom learning. Assessment in Education, 5, 7–74.

Black, P., & Wiliam, D. (2010). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 92(1), 81–90.

Brookhart, S. M. (1997). A theoretical framework for the role of classroom assessment in motivating student effort and achievement. Applied Measurement in Education, 10(2), 161–180.

Brookhart, S. M. (2013). Comprehensive assessment systems in service of learning: Getting the balance right. In R. W. Lissitz (Ed.), Informing the practice of teaching using formative and interim assessment: A systems approach (pp. 165–184). Charlotte, NC: Information Age Publishing.

Brookhart, S. M. (2016). How to make decisions with different kinds of student assessment data. Alexandria, VA: ASCD.

Brookhart, S. M., & Durkin, D. T. (2003). Classroom assessment, student motivation and achievement in high school social studies classes. Applied Measurement in Education, 16(1), 27–54.

Brookhart, S. M., & Moss, C. M. (2013). Leading by learning. Phi Delta Kappan, 94(8), 12–17.


Brookhart, S. M., & Nitko, A. J. (2008). Assessment and grading in classrooms. Upper Saddle River, NJ: Pearson.

Brookhart, S. M., & Nitko, A. J. (2019). Educational assessment of students (8th ed.). Upper Saddle River, NJ: Pearson.

Darling-Hammond, L. (1995). Equity issues in performance-based assessment. In M. T. Nettles & A. L. Nettles (Eds.), Equity and excellence in educational testing and assessment (pp. 89–114). Boston, MA: Kluwer.

David, J. L. (1981). Local uses of Title I evaluations. Educational Evaluation and Policy Analysis, 3(1), 27–39.

Dixon, D. D., & Worrell, F. C. (2016). Formative and summative assessment in the classroom. Theory Into Practice, 55, 153–159.

Frey, B. B., & Schmitt, V. L. (2010). Teachers’ classroom assessment practices. Middle Grades Research Journal, 5(3), 107–117.

Gardner, J. (2010). Developing teacher assessments: An introduction. In J. Gardner, W. Harlen, L. Hayward, G. Stobart, & M. Montgomery (Eds.), Developing teacher assessment (pp. 1–11). New York, NY: Open University Press.

Geisinger, K. F. (2001). Development of a statement of Test Taker Rights and Responsibilities. In G. R. Walz & J. C. Bleuer (Eds.), Assessment: Issues and challenges for the millennium (pp. 143–162). Greensboro, NC: CAPS Publications/ERIC Clearinghouse for Counseling & Student Services.

Gewertz, C. (2018). What test does each state require? An interactive breakdown of states’ 2016–2017 testing plans. Education Week. Retrieved from https://www.edweek.org/ew/section/multimedia/what-tests-does-each-state-require.html

Gibble, J. (2000). Report from the office of the supervisor of curriculum and instruction. Mt Joy, PA: Donegal School District.

Gittman, E., & Koster, E. (1999, October). Analysis of ability and achievement scores for students recommended by classroom teachers to a gifted and talented program. Paper presented at the annual meeting of the Northeastern Educational Research Association, Ellenville, NY.

Goldberg, G. L., & Roswell, B. S. (2000). From perception to practice: The impact of teachers’ scoring experience on performance-based instruction and classroom assessment. Educational Assessment, 6, 257–290.

Gorski, P. C. (2016). Poverty and the ideological imperative: A call to unhook from deficit and grit ideology and to strive for structural ideology in teacher education. Journal of Education for Teaching, 42(4), 378–386.

Goslin, D. A. (1967). Teachers and testing. New York, NY: Russell Sage.

Griswold, P. A. (1993). Beliefs and inferences about grading elicited from student performance sketches. Educational Assessment, 1(4), 311–328.

Guskey, T. R. (2007). Formative classroom assessment and Benjamin S. Bloom: Theory, research, and practice. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice (pp. 63–78). New York, NY: Teachers College Press.

Haberman, M. (1991; 2010). The pedagogy of poverty versus good teaching. Phi Delta Kappan, 73(4), 290–294.

Haberman, M. (2005a). Teacher burnout in black and white. The New Educator, 1, 153–175.

Haberman, M. (2005b). Star teachers: The ideology and best practice of effective teachers of diverse children and youth in poverty. Houston, TX: The Haberman Educational Foundation.

Hall, J. L., & Kleine, P. F. (1992). Educators’ perceptions of NRT misuse. Educational Measurement: Issues and Practice, 11(2), 18–22.

Hannaway, J. (1989). Managers managing: The workings of an administrative system. New York, NY: Oxford University Press.

Harlen, W., & Crick, R. D. (2003). Testing and motivation for learning. Assessment in Education: Principles, Policy & Practice, 10, 169–207.

Harlen, W., & Gardner, J. (2010). Assessment to support learning. In J. Gardner, W. Harlen, L. Hayward, G. Stobart, & M. Montgomery (Eds.), Developing teacher assessment (pp. 15–28). New York, NY: Open University Press.

Hiebert, J. (2003). What research says about the NCTM standards. In J. Kilpatrick, W. G. Martin, & D. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 5–23). Reston, VA: National Council of Teachers of Mathematics.

Hills, J. R. (1991). Apathy concerning grading and testing. Phi Delta Kappan, 72(7), 540–545.

Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 319–320.

Ingram, D., Louis, K. S., & Schroeder, R. G. (2004). Accountability policies and teacher decision making: Barriers to the use of data to improve practice. Teachers College Record, 106, 1258–1287.

Jakwerth, P. R., Stancavage, F. B., & Reed, E. D. (1999). An investigation of why students do not respond to questions (NAEP validity studies). Palo Alto, CA: American Institutes for Research.

James, K., Bunch, J., & Clay-Warner, J. (2015). Perceived injustice and school violence: An application of general strain theory. Youth Violence and Juvenile Justice, 13(2), 169–189.

Kastberg, D., Chan, J. Y., & Murray, G. (2016). Performance of U.S. 15-year old students in science, reading, and mathematics literacy in an international context: First look at PISA 2015 (NCES 2017–048). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved from https://nces.ed.gov/pubs2017/2017048.pdf

Kennedy, J. M. (1982). Metaphor in pictures. Perception, 11, 589–605.

Kilday, C. R., Kinzie, M. B., Mashburn, A. J., & Whittaker, J. V. (2011). Accuracy of teacher judgments of preschoolers’ math skills. Journal of Psychoeducational Assessment, 29(4), 1–12.

Kozol, J. (2005). The shame of the nation: The restoration of apartheid schooling in America. New York, NY: Random House.

Lei, S. A. (2010). Intrinsic and extrinsic motivation: Evaluating benefits and drawbacks from college instructors’ perspectives. Journal of Instructional Psychology, 37(2), 153–160.

Losen, D. J., & Whitaker, A. (2018). 11 million days lost: Race, discipline and safety at U.S. Public Schools. Retrieved from https://www.aclu.org/sites/default/files/field_document/final_11-million-days_ucla_aclu.pdf

Mahalingappa, L., Rodriguez, T. L., & Polat, N. (2017). Supporting Muslim students: A guide to understanding the diverse issues of today’s classrooms. Lanham, MD: Rowman & Littlefield.

Martínez, J. F., & Mastergeorge, A. (2002, April). Rating performance assessments of students with disabilities: A generalizability study of teacher bias. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Martínez, J. F., Stecher, B., & Borko, H. (2009). Classroom assessment practices, teacher judgments, and student achievement in mathematics: Evidence in the ECLS. Educational Assessment, 14, 78–102.

McKinney, S. E., Chappell, S., Berry, R. Q., & Hickman, B. T. (2009). An examination of the instructional practices of mathematics teachers in urban schools. Preventing School Failure: Alternative Education for Children and Youth, 53(4), 278–284.

McMillan, J. H. (2001). Secondary teachers’ classroom assessment and grading practices. Educational Measurement: Issues and Practice, 20(1), 20–32.

McMillan, J. H. (2003). The relationship between instructional and classroom assessment practices of elementary teachers and students scores on high-stakes tests (Report). ERIC Document Reproduction Service No. ED472164.

McMillan, J. H. (2005). The impact of high-stakes test results on teachers’ instructional and classroom practices (Report). ERIC Document Reproduction Service No. ED490648.

McMillan, J. H., & Lawson, S. (2001). Secondary science teachers’ classroom assessment and grading practices (Report). ERIC Document Reproduction Service No. ED450158.

McMillan, J. H., & Nash, S. (2000). Teacher classroom assessment and grading practices decision making (Report). ERIC Document Reproduction Service No. ED447195.

McTighe, J. (2018). Three key questions on measuring learning. Educational Leadership, 75(5), 14–20.

Means, B., & Knapp, M. S. (1991). Cognitive approaches to teaching advanced skills to educationally disadvantaged students. Phi Delta Kappan, 73, 282–289.


Milner, H. R., IV. (2018). Assessment for equity: Assessment should help us learn about students—Not sort them. Educational Leadership, 75(5), 88–89.

Mislevy, R. J. (1994). Evidence and inference in educational assessment. Psychometrika, 59(4), 439–483.

Mislevy, R. J. (1996). Test theory reconceived. Journal of Educational Measurement, 33(4), 379–416.

Moss, C. M. (2002). In the eye of the beholder: The role of educational psychology in teacher inquiry. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Moss, C. M. (2013). Research on classroom summative assessment. In J. H. McMillan (Ed.), Handbook of research on classroom assessment (pp. 235–255). Los Angeles, CA: Sage.

Moss, C. M. (2015). Achievement gaps: Causes, false promises, and bogus reforms. In F. English (Ed.), The SAGE guide to educational leadership and management (pp. 169–184). Los Angeles, CA: Sage.

Moss, C. M., & Brookhart, S. M. (2012). Learning targets: Helping students aim for understanding in today’s lesson. Alexandria, VA: ASCD.

Moss, C. M., & Brookhart, S. M. (2015). Formative classroom walkthroughs: How principals and teachers collaborate to raise student achievement. Alexandria, VA: ASCD.

Moss, C. M., & Brookhart, S. M. (2019). Advancing formative assessment in every classroom: A guide for instructional leaders (2nd ed.). Alexandria, VA: ASCD.

Moss, C. M., Brookhart, S. M., & Long, B. A. (2013). Administrators’ roles in helping teachers use formative assessment information. Applied Measurement in Education, 26(3), 205–218.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

National Research Council. (2001a). Classroom assessment and the National Science Education Standards. Washington, DC: National Academies Press. Retrieved from https://www.nap.edu/catalog/9847/classroom-assessment-and-the-national-science-education-standards

National Research Council. (2001b). Knowing what students know: The science and design of educational assessment. Washington, DC: The National Academies Press. Retrieved from https://www.nap.edu/catalog/10019/knowing-what-students-know-the-science-and-design-of-educational

Nichols, S. L., & Berliner, D. C. (2005). The inevitable corruption of indicators and educators through high-stakes testing. East Lansing, MI: The Great Lakes Center for Educational Research & Practice. Retrieved from http://greatlakescenter.org/docs/early_research/g_l_new_doc/EPSL-0503-101-EPRU.pdf

Nolen, S. B., Haladyna, T. M., & Haas, N. S. (1992). Uses and abuses of achievement test scores. Educational Measurement: Issues and Practice, 11(2), 9–15.

O’Sullivan, R. G., & Chalnick, M. K. (1991). Measurement-related course work requirements for teacher certification and recertification. Educational Measurement: Issues and Practice, 10(1), 17–19.

Perkins, A., & Engelhard, G., Jr. (2011). Talking back to data: Comments on a framework for data use. Measurement: Interdisciplinary Research and Perspectives, 9(4), 211–216.

Plake, B. S. (1993). Teacher assessment literacy: Teachers’ competencies in the educational assessment of students. Mid-Western Educational Researcher, 6(1), 21–27.

Plummer, D. L. (Ed.). (2003). Handbook of diversity management. Lanham, MD: University Press of America.

Polat, N. (2016). L2 learning, teaching, and assessment: A comprehensible input perspective. Tonawanda, NY: Multilingual Matters.

Popham, W. J. (2018). Assessment literacy for educators in a hurry. Alexandria, VA: ASCD.

Rieg, S. A. (2007). Classroom assessment strategies: What do students at-risk and teachers perceive as effective and useful? Journal of Instructional Psychology, 34(4), 214–225.

Rodriguez, M. C. (2004). The role of classroom assessment in student performance on TIMSS. Applied Measurement in Education, 17(1), 1–24.

Roeder, H. H. (1972). Are today’s teachers prepared to use tests? Peabody Journal of Education, 59, 239–240.

Rose, L. T., & Fischer, K. W. (2011). Garbage in, garbage out: Having useful data is everything. Measurement, 9, 222–226.

Rosenbaum, J. (2018). Educational and criminal justice outcomes 12 years after school suspension. Youth & Society. Retrieved from https://doi.org/10.1177/0044118x17752208

Schunk, D. H., Hanson, A. R., & Cox, P. D. (1987). Peer-model attributes and children’s achievement behaviors. Journal of Educational Psychology, 79(1), 54–61.

Sharpley, C. F., & Edgar, E. (1986). Teachers’ ratings vs. standardized tests: An empirical investigation of agreement between two indices of achievement. Psychology in the Schools, 23, 106–111.

Solórzano, D. G., & Ornelas, A. (2002). A critical race analysis of advanced placement classes: A case of educational inequity. Journal of Latinos and Education, 1(4), 215–229.

Stiggins, R. J. (1994). Student-centered classroom assessment. New York, NY: Merrill.

Stiggins, R. J. (2007). Five assessment myths and their consequences. Education Week. Retrieved from http://www.ewcupdate.com/userfiles/assessmentnetwork_net/file/Five%20Assessment%20Myths%20Stiggins.pdf

Stiggins, R. J., Frisbie, R. J., & Griswold, P. A. (1989). Inside high school grading practices: Building a research agenda. Educational Measurement: Issues and Practice, 8(2), 5–14.

Test-Takers’ Rights Working Group. (1999). Statement of test takers’ rights. Washington, D.C.: American Psychological Association, Joint Committee on Testing Practices.

Thurlow, M. L., & Kopriva, R. J. (2015). Teacher assessment and the assessment of students with diverse learning needs. Review of Research in Education, 39, 331–369.

Tiedemann, J. (2002). Teachers’ gender stereotypes as determinants of teacher perceptions in elementary school mathematics. Educational Studies in Mathematics, 50(1), 49–62.

Van De Walle, J. (2006). Raising achievement in secondary mathematics. Buckingham, UK: Open University Press.

Wiggins, G. P. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.

Wiliam, D. (2013). Assessment: The bridge between teaching and learning. Voices from the Middle, 21(2), 15–20.

Wiliam, D., & Thompson, M. (2007). Integrating assessment with instruction: What will it take to make it work? In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 53–82). Mahwah, NJ: Erlbaum.

Wyatt-Smith, C., Klenowski, V., & Gunn, S. J. (2010). The centrality of teachers’ judgement practice in assessment: A study of standards in moderation. Assessment in Education, 17(1), 59–75.

Wylie, E. C., Lyon, C. J., & Goe, L. (2009, March). Teacher professional development focused on formative assessment: Changing teachers, changing schools. ETS Research Report Series. https://doi.org/10.1002/j.2333-8504.2009.tb02167.x

Young, V. M., & Kim, D. H. (2010). Using assessments for instructional improvement: A literature review. Education Policy Analysis Archives, 18(19), 1–3.

Zhang, Z., & Burry-Stock, J. A. (2003). Classroom practices and teachers’ self-perceived assessment skills. Applied Measurement in Education, 16(4), 323–342.

Zimmerman, B. J., & Cleary, T. J. (2006). Adolescents’ development of personal agency: The role of self-efficacy beliefs and self-regulatory skill. In F. Pajares & T. Urdan (Eds.), Self-efficacy beliefs of adolescence (pp. 45–69). Mahwah, NJ: Information Age Publishing.

