
DOCUMENT RESUME

ED 390 919 TM 024 326

AUTHOR       Gearhart, Maryl; Herman, Joan L.
TITLE        Portfolio Assessment: Whose Work Is It? Issues in the Use of Classroom Assignments for Accountability. Evaluation Comment.

INSTITUTION  California Univ., Los Angeles. Center for the Study of Evaluation.; National Center for Research on Evaluation, Standards, and Student Testing, Los Angeles, CA.
SPONS AGENCY Office of Educational Research and Improvement (ED), Washington, DC.

PUB DATE 95

CONTRACT     R117G10027
NOTE         26p.

PUB TYPE Reports Evaluative/Feasibility (142)

EDRS PRICE   MF01/PC02 Plus Postage.
DESCRIPTORS  *Accountability; Cooperation; Educational Assessment; Educational Change; Inferences; *Performance Based Assessment; *Portfolio Assessment; Portfolios (Background Materials); State Programs; *Student Evaluation; Teacher Student Relationship; Teaching Methods; *Testing Programs; Test Use; *Validity

IDENTIFIERS *Large Scale Programs; Vermont

ABSTRACT
This article examines one of the challenges to the integration of classroom and large-scale portfolio assessment, a challenge posed by the use of a student's classroom portfolio for large-scale assessment of his or her individual competencies. When work is composed with the support of peers, teachers, and parents, whose work is being judged? The validity of inferences drawn from the assessment can be compromised unless the question can be answered. Experience with portfolio assessment in Vermont and in another study conducted by the Center for Research on Evaluation, Standards, and Student Testing (California) suggests that the quality of student work reflects not only a student's competence, but also the amount and quality of support received from others. Procedures that highlight a student's contribution to the work must be developed, complicated though this will be. Nevertheless, large-scale portfolio assessment programs appear to carry significant benefits for instructional reform. (Contains 5 tables and 73 references.) (SLD)



EVALUATION COMMENT

Winter 1995

Portfolio Assessment: Whose Work Is It?
Issues in the Use of Classroom Assignments for Accountability

Maryl Gearhart and Joan L. Herman1

Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing

University of California, Los Angeles

To many engaged in educational reform, portfolio assessment captures a vision of assessment integrated with instruction. Concerned about the equity and validity of large-scale assessment, portfolio advocates argue that students' classroom work and their reflections on that work provide a richer and truer picture of students' competencies than do traditional or other on-demand assessments. Concerned about the impact of testing on teaching, advocates point out that as displays of the products of instruction, portfolios challenge teachers and students to focus on meaningful outcomes. Furthermore, portfolio assessment practices support the assessment of long-term projects over time, encourage student-initiated revision, and provide a context for presentation, guidance, and critique. Given such an ambitious agenda for assessment, instruction, and accountability, it is no surprise that what is meant by "portfolio" or "portfolio assessment" varies markedly in practice and purpose.2


Shared by most large-scale portfolio assessment programs, however, is a commitment to bridge the worlds of public accountability and classroom practice. The goal is to give students, teachers, and policy makers authentic roles in the assessment of students at all levels of an accountability system and to provide data that are appropriate and useful at each level. The portfolio spans one level of decision making to the next, providing detailed evidence at the classroom level of the process and outcomes of student performance to guide instruction and learning, and then supporting more abridged inferences at the large-scale level about the quality of performance and schooling. Integrated with instruction and targeted on high standards for student performance, the portfolio is the bridge that supports reform of classroom practices on the one side and accountability on the other. The vision is enticing, but will it work? Can classroom work be utilized for large-scale, high-stakes assessment?

In this article, we examine one of the challenges to the integration of classroom and large-scale portfolio assessment, a challenge posed by the use of a student's classroom portfolio for large-scale assessment of his or her individual competencies.3 When raters working outside the classroom context4 are asked to make judgments about an individual student based on a portfolio of work composed with the support of peers, teachers, and parents, whose work is being judged? We argue that certain answers to this question could threaten the validity of inferences that can be drawn about individual performance from portfolios constructed in the social complexities of classroom life. Thus the investigation of possible answers to the "whose work" question becomes an essential component to the study of the validity of portfolio assessment.

...patterns of relationships among on-demand assessments and portfolio assessments raise questions about the validity of test scores.

Concerns regarding the validity of individual student scores are already emerging in the fledgling technical literature on portfolio assessment. Consider, for example, findings from two CRESST efforts to provide evidence of validity. In both of these studies, patterns of relationships among on-demand assessments and portfolio assessments raise questions about the validity of test scores.

Koretz and his RAND colleagues have been evaluating Vermont's statewide portfolio assessment program since 1990.5 The Vermont program targets writing and mathematics at Grades 4 and 8 and includes three components for each subject area: year-long student portfolios, "best pieces" drawn from the portfolios, and state-sponsored "uniform" tests which are standardized but not necessarily multiple-choice. Patterns of relationships between the results of portfolio assessment and uniform tests in both subjects were problematic (Koretz, Klein, McCaffrey, & Stecher, 1993). While recognizing that portfolios and standard assessment may well emphasize different aspects of a subject domain, the researchers expected correlations between the two types of assessments within a subject to be stronger than those across subject areas. Instead, they found essentially the same level of correlation within and across subject areas: For example, in writing, writing portfolio scores correlated moderately with the standard measure of writing and with the portfolio and standard measures of mathematics.
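The logic of this convergent/discriminant check can be made concrete with a small simulation. The sketch below uses fabricated scores (our own illustration, not the Vermont data) to show the pattern the researchers expected: a same-subject correlation that exceeds the cross-subject ones.

    # Hedged illustration of the validity logic described above. The scores
    # are simulated, NOT the Vermont data: each subject has its own factor,
    # so same-subject correlations should exceed cross-subject correlations.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200  # hypothetical number of students

    general = rng.normal(size=n)            # general proficiency factor
    writing = general + rng.normal(size=n)  # writing-specific skill
    math_ = general + rng.normal(size=n)    # math-specific skill

    scores = {
        "writing_portfolio": writing + rng.normal(size=n),  # method error
        "writing_uniform":   writing + rng.normal(size=n),
        "math_portfolio":    math_ + rng.normal(size=n),
        "math_uniform":      math_ + rng.normal(size=n),
    }

    def r(a, b):
        return np.corrcoef(scores[a], scores[b])[0, 1]

    print("within subject :", round(r("writing_portfolio", "writing_uniform"), 2))
    print("across subjects:", round(r("writing_portfolio", "math_uniform"), 2))

In data simulated this way, the within-subject correlation comes out clearly higher; the troubling Vermont result was that the observed correlations were roughly uniform within and across subjects.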

Gearhart and Herman have conducted two technical studies of the ratability of classroom writing portfolios. In an initial study, the researchers found no relationship between scores for writing portfolios and for standard writing assessments: Two-thirds of the students classified as competent based on the portfolio score were not so classified on the basis of the standard assessment. Similarly, there was only a weak relationship between contrasting procedures for portfolio scoring: Half the students classified as competent on the basis of the single portfolio score were not so classified when scores for individual pieces were averaged, though correlations between the two kinds of portfolio scores were moderately high (in the .6 range) (Gearhart, Herman, Baker, & Whittaker, 1992, 1993; Herman, Gearhart, & Baker, 1993). In a subsequent comparative study of two writing rubrics, the researchers found a positive relationship between portfolio scores and standard writing assessments for only one of the rubrics (Gearhart, Novak, & Herman, in press).
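A brief numerical aside may clarify why scores that correlate in the .6 range can nonetheless reclassify many students. The sketch below is our own idealized simulation, not the study data:

    # Illustration (not study data): two scores correlated about .6 still
    # disagree often once each is dichotomized at a cut score.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    rho = 0.6

    x = rng.normal(size=n)                                   # score 1
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)   # score 2

    cut = 0.0  # classify the top half as "competent" on each score
    flipped = np.mean(y[x >= cut] < cut)  # competent on x, not on y
    print(f"reclassified: {flipped:.0%}")  # roughly 30% when rho = .6

Even in this idealized bivariate-normal case, nearly a third of the students judged competent by one score flip categories on the other; with the additional sources of error in real portfolio ratings, the half-and-half splits reported above are less surprising than they might first appear.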

Granted, these researchers were hampered in their quest for validation by the paucity of technically sound, performance-based criterion measures to which portfolio scores could be compared. Nevertheless, within the constraints set by the current state of the art in performance assessment, findings like those we have illustrated do raise questions about the validity of portfolio scores as measures of individual performance. What factors may have contributed to these weak relationships between portfolio scores and on-demand assessments? No doubt there are many, and each will require further investigation.

As summed up by a Vermont teacher after rating portfolios for several days: "Whose work is this anyway?"

Consider just two that focus on measurement design: The portfolio and on-demand assessments may have tapped different domains of performance within a subject area; the on-demand and portfolio tasks may have differed in difficulty.6 The factor that we consider in this paper arises from the classroom context of portfolio assessment. As summed up by a Vermont teacher after rating portfolios for several days: "Whose work is this anyway?"

We begin our discussion by examining the ways in which the nature of classroom work may undermine the validity of "individual" portfolio scores. We illustrate with CRESST data from both an evaluation of a statewide assessment program and a laboratory study of the scorability of elementary writing portfolios. We conclude with a discussion of the implications of the "whose work" issue for portfolio assessment policy and practice.

Whose Work Is It? An Issue for the Validity of Large-Scale Portfolio Assessment

The "whose work is it?" question arises because individual student portfolios are constructed in a social context. Portfolios contain the products of classroom instruction, and good classroom instruction according to current pedagogical and curriculum reforms involves an engaged community of practitioners in a supportive learning process (Camp, 1993; Duschl & Gitomer, 1991; Wolf, D. P., 1989; Wolf, Bixby, Glenn, & Gardner, 1991; Wolf & Gearhart, 1993a, 1993b). Exemplary instructional practice, in short, supports student performance. Central to the National Writing Project, for example, is a core instructional model which features multiple stages: prewriting, precomposing, writing, sharing, revising, editing, and evaluation. Each of these stages stands for instructional activities that engage a student with resources and with others: related readings, classroom discussions, field trips, idea webs, small group collaboration, outlining, peer review, review and feedback. The socially contexted character of student writing is seen both as a scaffold for students' writing process and a replication of what "real" writing entails, in that writing is often a very social endeavor. Consider as well what is regarded as exemplary portfolio assessment practice. A "portfolio culture" is viewed as "replacing ... the entire envelope of assessment ... with extended, iterative processes, agreeing that we are interested in what students produce when they are given access to models, criticism, and the option to revise" (Wolf, D. P., 1993, p. 221). Assessment opportunities are available at multiple classroom moments: in the course of the work that may be added to a portfolio, in the construction of the portfolio, and in a presentation of the portfolio, making collaboration, assessment, and revision continual processes within the classroom.

These visions of an engaged community of learners and reviewers have implications for the validity of classroom portfolios for large-scale assessment purposes: The more developed the community, the more engaged others will be in the work tagged with an individual student's name. While the locus of authorship may shift outward from the individual student to the community of writers, the shift is unlikely to be systematic: Others' contributions to students' work are likely to vary across assignments, students, and classrooms. An irony emerges that when the student's work is more her own, that work may index practices and curriculum that lack certain key features of current reforms.

How is a rater unfamiliar with a student or the classroom context to assign an individual student a score for a portfolio collection that includes assisted or collaborative work? Research by Webb (1993) suggests that an individual's performance in the context of group activity may or may not represent his or her capability. Her finding, for example, that low-ability students had higher scores on the basis of group work than on individual work suggests that a rater's score for a portfolio may overestimate student performance because it constitutes a rating of efforts that were assisted. Alternatively, the rater who is aware that work is assisted may adjust downward the individual's score, again biasing the rating.7

Whose Work Is It? Data From CRESST Studies

While questions regarding the roles of authorship and assisted performance in large-scale portfolio assessment have been raised (Condon & Hamp-Lyons, 1991; Gitomer, personal communication, September, 1994; Herman et al., 1993; Koretz, McCaffrey, Klein, Bell, & Stecher, 1993; Koretz, Stecher, & Deibert, 1992; Koretz, personal communication, September, 1994; Stecher & Hamilton, 1994), they have been neither directly investigated

These studies suggest substantial variability in instructional support for students' work...

nor widely discussed. As we discuss next, however, preliminary results from the CRESST Vermont studies (Koretz, Stecher, Klein, & McCaffrey, in press) and the laboratory-based studies of portfolio ratability (Gearhart et al., 1992; Gearhart, Herman, Novak, Wolf, & Abedi, 1994; Herman et al., 1993) add some empirical basis for concern. These studies suggest substantial variability in instructional support for students' work, variability which may well compromise the meaning and comparability of scores within as well as between classrooms and schools.

Vermont

While the RAND evaluation addresses three broad issues (the actual implementation of the program in schools and classrooms, the program's diverse effects, and the quality of the information yielded by the assessment), of interest here are results from a survey distributed to all fourth- and eighth-grade math teachers during the second year (1992-93) of Vermont's statewide implementation.8 Results are based on the responses of approximately 52% of the mathematics teachers at Grade 4 (N = 382) and 41% at Grade 8 (N = 137) (p. 6).


Teachers' responses to a number of questions indicated substantial variation in how mathematics portfolios were implemented across classrooms, and consequently substantial variation in how much help and support students received in putting their "best face forward" for the portfolio assessment. Teachers' reported policies on revising best pieces are a first case in point: Although more teachers encouraged revision of most best pieces (57%), many teachers departed from this pattern by either requiring revision (19%), simply permitting it (19%), or generally prohibiting it (5%). Similarly, the amount of time students spent revising varied widely. The average time in revision was 30-40 minutes, but in 17% of classrooms students did not revise at all, and in another 15% of classrooms students took more than one full class period to revise a best piece. Provision of time and support for revision clearly represents an aid to performance, and thus students who are not encouraged to revise their best pieces may well be at a disadvantage relative to those students who are provided greater opportunities to revise.

There also was considerable variation in teachers' policies regarding who was permitted to assist students in revising their best pieces (Table 1). One in four teachers did not report assisting their own students in revisions, and a similar proportion did not report permitting students to help each other. Seventy percent of fourth-grade teachers and 39% of eighth-grade teachers forbade parental or other outside assistance.

Table 1
Assistance Allowed by Teachers on Best Pieces (Percentage of Teachers)

                               Assistance allowed on which best pieces?   Rules differ for
Source                 Grade   None    Some    Most    All                individual students
Teacher                4       13      14      16      21                 --
                       8       27      9       13      19                 --
Other students         4       34      31      11      12                 1
                       8       13      11      17      15                 --
Parents or others      4a      71      13      4       4                  8
  outside school       8       39      28      8       13                 11

a Grade difference significant at the 5% level (p < .05).


Further complicating these findings regarding classroom variation, roughly 10% of teachers reported that their policies regarding assistance varied for different students within their classrooms. Teachers' policies also differed with respect to acknowledgment of outside help. Only about 20% of teachers required their students to acknowledge or describe the assistance they received, and, therefore, the raters of most students' portfolios would not know who contributed to the entries or the nature of their assistance.

Finally, the Vermont teachers reported substantially different degrees of influence on students' choices of "best pieces" for their portfolios (Table 2): Some teachers reported playing an equal role with their students in making portfolio selections, while others reported no role at all. Certainly the type and quality of the work that becomes part of a student's portfolio can be influenced by who selects the pieces for inclusion. In particular, since teachers presumably have a better understanding of the scoring criteria than do students, the portfolios of students whose choices were assisted by their teachers may be more likely to show students' capabilities.

Thus the RAND/CRESST study found sizable variations among classrooms in factors such as the amount of revision that was permitted and the extent to which teachers limited assistance from others. These implementation findings may help to explain the weak patterns of relationships between portfolio scores and on-demand assessments, relationships described earlier. If some teachers provide (directly or through other adults or students) more help than others, comparisons among the portfolio scores of their students would be clouded by the contributions that others make to a given student's portfolio. Because such factors enter only into portfolio scores and not into scores on a standardized, on-demand assessment, they would tend to weaken the relationships between portfolio scores and scores from on-demand assessments.

Table 2
Who Selects Best Pieces? (Percentage of Teachers)

Who selects best pieces?                  Grade 4a   Grade 8
Students on their own                     21         30
Students with limited teacher input       55         57
Students and teachers have equal role     18         8
Teacher with limited student input        5          3
Teacher                                   1          1

a Grade-level difference significant at the 5% level (p < .05).


CRESST Laboratory Studies of the Scorability of Writing Portfolios

To document the contributions of others to the writing contained within students' writing portfolios, Gearhart et al. (1993)9 asked teachers to rate the level of their instructional support for their writing assignments. Data were collected in the spring of 1991 from nine teachers spanning Grades 1 to 6. Each teacher was asked to designate two students at each of three levels of writing competency (high, medium, and low), to collect complete portfolios of all of their work, and for each writing assignment to document the instructional support provided during the composing and editing phases.

Ratings were keyed to the same dimensions used at that time to assess students' writing progress (Baker, Gearhart, & Herman, 1991): Content/Organization (topic/subtopics or theme, and their structure and development); Style (elements of text like descriptive language, word choice, sentence choice, tone, mood, voice, and audience); and Mechanics (spelling, grammar, punctuation, and other conventions). The scale points were defined along a continuum from 0 (no support) to 3 (teacher has specified the requirement in detail). Teachers also were asked to rate each assignment in terms of Copied work (the extent to which the student's work appeared to be copied from peers or from direct modeling by a teacher or parent) and to estimate the time the child spent on the assignment in hours or fractional parts of hours. The dataset consisted of spring 1991 ratings of 228 assignments from a total of 54 students. The number of assignments per student ranged from 1 to 21, with a modal number of 3. (One teacher returned 14-21 assignments per target student, compared with 1-5 for the remaining eight teachers.10) Across all assignments, teachers reported providing generally low to moderate levels of support to their target students, but their reported support differed substantially among students' competency levels: Teachers were far more likely to report providing higher levels of support to their "low" students than to their more able students (Table 3), a finding that raises concerns about the differential meaning of scores that may be assigned to students' portfolios.
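For readers who want the rating scheme at a glance, here is a minimal sketch of the per-assignment record that the teachers' ratings imply. The field and class names are our own invention for illustration; the 0-3 anchors and the "greater support" threshold follow the description above.

    # Sketch of the per-assignment support record implied by the study's
    # rating scheme (names are hypothetical, invented for illustration).
    # Support ratings run 0 (no support) to 3 (requirement specified in
    # detail); ratings of 2 or 3 were counted as "greater" support.
    from dataclasses import dataclass

    @dataclass
    class AssignmentSupportRating:
        student_id: str
        ability_level: str          # "high", "medium", or "low"
        content_organization: int   # 0-3 support rating
        style: int                  # 0-3 support rating
        mechanics: int              # 0-3 support rating
        copied_work: int            # extent work appears copied/modeled
        hours_spent: float          # teacher's estimate of time on task

        def greater_support(self, dimension: str) -> bool:
            """True if the rating on `dimension` is 2 or 3 (some or
            detailed guidelines and feedback)."""
            return getattr(self, dimension) >= 2

    # Example: one rating like those tallied in Tables 3-5.
    r = AssignmentSupportRating("student_07", "low", 3, 2, 1, 0, 0.75)
    print(r.greater_support("content_organization"))  # True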

The patterns of teachers' reported support dif-fered across the three writing dimensions, reflecting,

Table 3
Percentage of Teachers Reporting Greater Support by Writing Dimension and Student Ability Level

                          Student ability level
Writing dimension         High    Low
Content/organization      34      72
Style                     13      55
Mechanics                 26      60

Note. A "greater" level of support was defined as ratings of 2 or 3, where 2 indicated some guidelines and feedback, and 3 represented detailed guidelines and feedback.


it seems, variations in curriculum. In Table 4, we see that teachers' experience with portfolio assessment was related to their patterns of instructional support. The three teachers who had been using portfolios in their classrooms for over a year tended to report providing higher levels of support than did the six teachers who had just begun experimenting with portfolio assessment, and we believe that the more experienced teachers' engagement with a writing process approach contributed to their greater involvement with students' assignments (and/or to their greater perceptions of involvement). Table 5 hints at the ways that teachers' reported levels of support may be related to grade level as well as portfolio experience. While these data are purely an illustration from a very small dataset, we see here two second-grade teachers providing quite different levels of assistance with style vs. mechanics, and two fifth-grade teachers differing more in levels of support for style. Furthermore, the second- and fifth-grade teachers with a year of portfolio experience reported emphases on different writing dimensions: the second-grade teacher more concerned with mechanics, the fifth-grade teacher more concerned with style.

Thus teachers in the Gearhart et al. (1993) study reported variations in instructional practices that were likely to have impacted differentially the quality of student work in the portfolios. As in Vermont, these implementation findings may help to explain the weak relationships between students' portfolio scores and their standard writing assessments.

Reflections and Recommendations

Teacher self-report data from two CRESST studies have produced evidence of variation in how portfolio work is produced and supported. We acknowledge the flaws of the preliminary self-report data that we have presented and fully recognize that further research is needed: studies that employ larger sample sizes and multiple methodologies to verify the variety of support provided to students and the impact of such support on assessed performance. But if findings like these can be substantiated in more systematic research, they suggest that the quality of student work reflects not only a student's

Table 4
Comparison of Teachers With Little vs. One Year Experience With Portfolios: Percentage of Assignments Given Greater Support by Writing Process Dimension

Portfolio experience     Focus/organization   Style   Mechanics
Little (n = 6)           54                   41      36
One year (n = 3)         92                   82      74

Note. A "greater" level of support was defined as ratings of 2 or 3, where 2 indicated some guidelines and feedback, and 3 represented detailed guidelines and feedback.


Table 5
Illustrative Comparison of Selected Teachers: Percentage of Assignments Given Greater Support by Writing Process Dimension

Teacher's grade level and
portfolio experience          Focus/organization   Style   Mechanics

Second grade
  Little experience           83                   58      67
  One year experience         65                   18      100

Fifth grade
  Little experience           50                   79      21
  One year experience         72                   44      --

Note. A "greater" level of support was defined as ratings of 2 or 3, where 2 indicated some guidelines and feedback, and 3 represented detailed guidelines and feedback.

competence but also the amount and quality of support received from others. Thus, whose work is the classroom work contained in a student's portfolio? From the preliminary evidence presented here, it seems it may depend on students' competence and a range of variable circumstances: teachers' methods of instruction, the nature of their assignments, peer and other resources available in the classroom, and home support.

What meaning, then, can a large-scale portfolio assessment program ascribe to student work contained in portfolio collections? A professional, whether a writer, scientist, or educational researcher, who is accustomed to others' input may respond to this question with philosophical reflection or an identity crisis. Indeed, whose work is this article, for example? In what ways does it reflect the writing and research competencies of either of its authors? We value our own opportunities for collaborative work as much as we value the efforts to engage students in authentic communities in the classroom. But, from a measurement perspective, the validity of inferences about student competence based solely on portfolio work appears suspect. While this is not a grave concern for classroom assessment, where teachers and students can judge performances with knowledge of their context, the problem is troubling indeed for large-scale assessment purposes where comparability of data is an issue. Under what circumstances, then, can portfolio assessments be used to rank or make serious decisions about students, teachers, schools, or districts?


The question requires attention to (a) the purposes of portfolio assessment, (b) the integrated design of portfolio contents, rubric contents, rating procedures, and uses of the results, and (c) a recognition of possible conflicts between the measurement and instructional aims of portfolio assessment. The apparent inverse relationship between support and students' ability level in the Gearhart et al. (1993) study is a telling example in this regard. Certainly if low-performing students are to achieve high standards, it is likely they will need an enriched instructional process to give them the capability for transferable performance: ample models, coaching and mentoring, and multiple opportunities for practice, feedback and revision. But if the work that emerges from this same instructional process is used to assess students' individual performance, then there will be problems of comparability of scores across students. Can we bridge this apparent gulf between what is required to serve the purposes of classroom instruction and large-scale accountability?

While no easy solutions come to mind, it does appear that any valid assignment of an individual student score to a portfolio for large-scale purposes will require procedures to highlight the student's contribution to the work. Adjustments in either composition of the portfolio or rating procedures, or both, will be necessary to assure comparability of student results. As we outline below, a number of strategies have been suggested, although most have so far been rejected for reasons of feasibility, cost, or violation of certain fundamental portfolio program assumptions, such as instructional freedom, seamless integration of portfolios with instruction, and commitment to honor diversity (D. Gitomer, personal communication, September, 1994; D. Koretz, personal communication, September, 1994; K. Sheingold, personal communication, September, 1994).

- Restrictions could be imposed on work students produce for their portfolios, controlling who is permitted to provide assistance and under what circumstances. These procedures, largely rejected as violations of instructional freedom, would require certification that the controls on assistance were in place.

- Portfolios could be "seeded" with students' responses to a standard performance-based writing assessment; ratings of these entries might be used to adjust overall portfolio scores, or to raise "red flags" when scores for standard assessments are discrepant with other portfolio material. But this option would bring additional complications.

Portfolio procedures could incorporate strategies for documenting others' assistance and input...

First, many performance-based assessments of writing and reading currently incorporate components of a process approach (for example, shared readings and class discussion, or even peer response), and thus the "seeded" assessments might also need checks that the assistance provided was comparable across students.

Second, procedures for adjusting portfolio scores would require consensus on a framework for justifying those procedures: On what grounds can a student's individual writing be compared against his writing supported by others' responses and guidance?

- Portfolio procedures could incorporate strategies for documenting others' assistance and input:


Portfolio entries could be accompanied with context descriptions that document the resources that were available to a student and the contributions of others to the process of composing the work. However, this option would require procedures for (a) producing those descriptions (training teachers and/or students to provide comparable information across assignments, students, and classrooms), and (b) using the context information when rating the portfolio material (e.g., will raters first rate the work on its own merits, and then adjust the score after examining evidence of support? Will raters rate the work and let someone else rate the support and make the adjustment? Will raters look at all sources of evidence and make some kind of integrated, summary judgment of the student's individual contribution and competence?).

- Raters could score a sample of portfolios from a given classroom at virtually the same time, to enable them to see how an individual's work compares with or duplicates the responses of others from the class. To date, this procedure has simply been used as an exercise, but it is of interest that the exercise has yielded two different conclusions. In the Vermont project, researchers reviewing mathematics portfolios from one class noticed identically worded key phrases in many students' responses to the same assignment. This suggested that the work had been structured so heavily by the teacher that the responses did not really represent the performance of individual students (S. Klein and D. Koretz, personal communication, September, 1994). In another project, six target students were interviewed in each of several portfolio classrooms, and, in some classrooms, worrisome commonalities in the content of students' writing reflected impressive commonalities in students' understandings of the work based on their interview responses; students had learned a great deal from an assignment that emerged from intensive classroom collaboration. How could we tell the difference, then, between common understandings and copying?

There is interest in incorporating assessment of group process.

- Portfolios could be constructed to provide evidence for the very interactive processes that endanger the validity of individual scores assigned to the final product. That is, if students are using resources, and soliciting and making use of input from others, then it makes sense to document and assess students' competencies with these ways of working within a writing community (cf. a current analytic review on methods for group assessment by Webb, 1994).

Similarly, students' unique contributions to group products could be documented:

- The design of portfolio assessment could document more explicitly an individual student's role in a given product. The inclusion of student self-assessments, peer assessments (when the work was collaborative), and teacher assessments could help to clarify a student's unique contribution; a follow-up individual assignment could demonstrate what a student had learned from a collaborative project or a project heavily guided by the teacher. However, once again, little is known about the ways that raters would utilize these inclusions in making a portfolio judgment.


Listed here to stimulate thinking about possible solutions, our collated suggestions provide imperfect and somewhat unwieldy answers. The alternative? A large-scale, high-stakes portfolio program could produce individual student scores but do nothing to address the "whose work" problem, thereby ensuring invalid comparisons among students.

This is not to say that large-scale portfolio assessment is without value. Our own research and that of others indicates that large-scale portfolio assessment programs carry significant benefits for instructional reform (Gearhart & Wolf, 1994; Herman & Winters, 1994; Koretz, Stecher, Klein, McCaffrey, & Deibert, 1993; Koretz et al., in press; Sheingold, Heller, & Paulukonis, in press). In Vermont, for example, principals and teachers

...in the hands of skilled teachers, parents, and students, portfolios should provide critical contexts for discussing, assessing and improving the process and the outcomes of students' learning.

reported that the portfolio program has induced sizable and diverse changes in instruction that are largely consistent with the goals of reform efforts both in Vermont and nationwide. This is an important consequence even if portfolios fail to provide valid individual measurement. But can portfolio assessment provide us with valid indices of student competencies usable for large-scale accountability? There are some promising possibilities: For example, the question of "whose work" may be less troublesome if portfolio assessment results are aggregated for decision making at the school or district level, using matrix sampling and excluding individual-level data. As Moss (1992, 1994) argues, multilevel designs could assign responsibility for individual-level decisions to the local level, where portfolio-based decisions about the capability of individual students can be informed by professional judgment and knowledge of the local school context.
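To make the matrix-sampling idea concrete, here is a minimal sketch under our own assumptions (the data, names, and sampling plan are hypothetical, not an operational design): each portfolio is rated on only one randomly assigned dimension, no individual score is ever reported, and pooling across students still yields a school-level estimate for every dimension.

    # Hedged sketch of matrix sampling for school-level reporting
    # (fabricated data; illustrates the design idea, not a real system).
    import random
    from collections import defaultdict
    from statistics import mean

    random.seed(0)
    DIMENSIONS = ["content", "style", "mechanics"]

    # Fabricated "true" portfolio quality per student, per dimension (0-4).
    school = {f"student_{i}": {d: random.gauss(2.5, 0.8) for d in DIMENSIONS}
              for i in range(120)}

    # Matrix sampling: each student's portfolio is rated on ONE dimension.
    ratings = defaultdict(list)
    for student, quality in school.items():
        d = random.choice(DIMENSIONS)                  # sampled dimension
        observed = quality[d] + random.gauss(0, 0.5)   # rater error
        ratings[d].append(observed)                    # pooled by school

    # Report school-level means only; no student-level score exists.
    for d in DIMENSIONS:
        print(f"{d}: school mean = {mean(ratings[d]):.2f} (n = {len(ratings[d])})")

Because every student contributes to some dimension but no student is scored on all of them, the "whose work" problem attaches to no individual: the unit of inference is the school, which is the level at which the aggregated evidence is defensible.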

Students' portfolio work could also provide invaluable evidence of students' "opportunities to learn." If some controls over inequitable help from sources outside the classroom were in place, then portfolio assessment (at any level of the system) could provide a window on the quality of curriculum and instruction by showing what work students are asked to do and how well they are able to do it. From this perspective, we would expect students' portfolios to show the best of what students can do with help from an effective instructional process.

Finally, in moving forward on large-scale portfolio assessment, we will need to remember that the very complexity of portfolio assessment is at once its strength and its weakness. While portfolios may resist attempts to reduce their contents to simple, reliable individual scores, in the hands of skilled teachers, parents, and students, portfolios should provide critical contexts for discussing, assessing and improving the process and the outcomes of students' learning.

References

Baker, E. L., Gearhart, M., & Herman, J. L. (1991). The Apple Classrooms of Tomorrow: 1990 evaluation study. Report to Apple Computer, Inc. Los Angeles: University of California, Center for the Study of Evaluation.

Baker, E. L., Gearhart, M., Herman, J. L., Tierney, R., & Whittaker, A. K. (1991). Stevens Creek portfolio project: Writing assessment in the technology classroom. Portfolio News, 2(3), 7-9.

Belanoff, P., & Dickson, M. (Eds.). (1991). Portfolios: Process and product. Portsmouth, NH: Heinemann.


Calfee, R. C., & Perfumo, P. A. (1992). A survey of portfolio practices. Berkeley: University of California, Center for the Study of Writing.

Calfee, R. C., & Perfumo, P. A. (1993a). Student portfolios and teacher logs: Blueprint for a revolution in assessment (CSW Tech. Rep. TR 65). Berkeley: University of California, Center for the Study of Writing.

Calfee, R. C., & Perfumo, P. A. (1993b). Student portfolios: Opportunities for a revolution in assessment. Journal of Reading, 532-537.

Camp, R. (1990). Thinking together about portfolios. The Quarterly of the National Writing Project and the Center for the Study of Writing, 12(2), 8-14, 27.

Camp, R. (1993). The place of portfolios in our changing views of writing assessment. In R. E. Bennett & W. C. Ward (Eds.), Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment (pp. 183-212). Hillsdale, NJ: Erlbaum.

Camp, R., & Levine, D. S. (1991). Portfolios evolving: Background and variations in sixth- through twelfth-grade classrooms. In P. Belanoff & M. Dickson (Eds.), Portfolios: Process and product (pp. 194-205). Portsmouth, NH: Heinemann.

Condon, W., & Hamp-Lyons, L. (1991). Introducing a portfolio-based writing assessment: Progress through problems. In P. Belanoff & M. Dickson (Eds.), Portfolios: Process and product (pp. 231-247). Portsmouth, NH: Heinemann.

Duschl, R. A., & Gitomer, D. H. (1991). Epistemological perspectives on conceptual change: Implications for educational practice. Journal of Research in Science Teaching, 28(9), 839-858.

Freedman, S. (1993). Linking large-scale testing and classroom portfolio assessments of student writing. Educational Assessment, 1(1), 27-52.

Gearhart, M., Herman, J. L., Baker, E. L., & Whittaker, A. K. (1992). Writing portfolios at the elementary level: A study of methods for writing assessment (CSE Tech. Rep. 337). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Gearhart, M., Herman, J. L., Baker, E. L., & Whittaker, A. K. (1993). Whose work is it? A question for the validity of large-scale portfolio assessment (CSE Tech. Rep. 363). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Gearhart, M., Herman, J. L., Novak, J. R., Wolf, S. A., & Abedi, J. (1994). Toward the instructional utility of large-scale writing assessment: Validation of a new narrative rubric (CSE Tech. Rep. 389). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Gearhart, M., Novak, J. R., & Herman, J. L. (in press). Issues in portfolio assessment: The scorability of narrative collections (CSE Tech. Rep.). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Gearhart, M., & Wolf, S. A. (1994). Engaging teachers in assessment of their students' writing: The role of subject matter knowledge. Assessing Writing, 1(1), 67-90.

Gearhart, M., & Wolf, S. A. (in press). The student's role in large-scale portfolio assessment: Providing evidence of writing competency. Part I: The purposes and processes of writing (CSE Tech. Rep.). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Gearhart, M., Wolf, S. A., Burkey, B., & Whittaker, A. K. (1994). Engaging teachers in assessment of their students' narrative writing: Impact on teachers' knowledge and practice (CSE Tech. Rep. 377). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Gentile, C. (1992). Exploring new methods for collecting students' school-based writing: NAEP's 1990 portfolio study. Washington, DC: National Center for Education Statistics.

Gill, K. (Ed.). (1993). Process and portfolios in writing instruction. Classroom practices in teaching English, Vol. 26. Urbana, IL: National Council of Teachers of English.

Gitomer, D. H. (1993). Performance assessment and educational measurement. In R. E. Bennett & W. C. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 241-263). Hillsdale, NJ: Lawrence Erlbaum.

Glazer, S. M., Brown, C. S., Fantauzzo, P. D., Nugent, D. W., & Searfoss, L. (1993). Portfolios and beyond: Collaborative assessment in reading and writing. Norwood, MA: Christopher Gordon.


Hamp-Lyons, L., & Condon, W. (1993). Questioning assumptions about portfolio-based assessment. College Composition and Communication, 44, 176-190.

Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development (ASCD).

Herman, J. L., Gearhart, M., & Aschbacher, P. R. (in press). Portfolios for classroom assessment: Design and implementation issues. In R. C. Calfee (Ed.), Portfolio assessment. Hillsdale, NJ: Erlbaum.

Herman, J. L., Gearhart, M., & Baker, E. L. (1993). Assessing writing portfolios: Issues in the validity and meaning of scores. Educational Assessment, 1(3), 201-224.

Herman, J. L., & Winters, L. (1994). Portfolio research: A slim collection. Educational Leadership, 52(2), 48-55.

Hewitt, G. (1991). Leading and learning: A portfolio of change in Vermont schools (Tech. Rep.). Vermont: Governor's Institutes.

Hewitt, G. (1993, May-June). Vermont's portfolio-based writing assessment program: A brief history. Teachers and Writers, 24(5), 1-6.

Hiebert, E., & Calfee, R. C. (1992). Assessment of literacy: From standardized tests to performances and portfolios. In A. E. Farstrup & S. J. Samuels (Eds.), What research says about reading instruction (pp. 70-100). Newark, DE: International Reading Association.

Hill, R. (1992, June). Assessments developed in support of educational reform in contrast to assessments developed in support of educational refinement. Paper presented at the Assessment Conference of the Education Commission of the States, Boulder, CO.

Howard, K. (1990). Making the writing portfolio real. The Quarterly of the National Writing Project and the Center for the Study of Writing, 12(2), 4-7, 27.

Koretz, D. (1993). New report of Vermont Portfolio Project documents challenges. National Council on Measurement in Education Quarterly Newsletter, 1(4), 1-2.

Koretz, D., Klein, S., McCaffrey, D., & Stecher, B. (1993). Interim report: The reliability of the Vermont portfolio scores in the 1992 school year (CSE Tech. Rep. 370). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Koretz, D., McCaffrey, D., Klein, S., Bell, R., & Stecher, B. (1993). The reliability of scores from the 1992 Vermont Portfolio Assessment Program (CSE Tech. Rep. 355). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Koretz, D., Stecher, B., & Deibert, E. (1992). The Vermont Assessment Program: Interim report on implementation and impact, 1991-92 school year (CSE Tech. Rep. 350). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The evolution of a portfolio program: The impact and quality of the Vermont program in its second year (1992-93) (CSE Tech. Rep. 385). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (in press). The Vermont Portfolio Assessment Program: Findings and implications. Educational Measurement: Issues and Practice, 13(3).

Koretz, D., Stecher, B., Klein, S., McCaffrey, D., & Deibert, E. (1993). Can portfolios assess student performance and influence instruction? The 1991-92 Vermont experience (CSE Tech. Rep. 371). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

LeMahieu, P. G., Eresh, J. T., & Wallace, R. C. (1992). Using student portfolios for a public accounting. School Administrator, 49(11), 8-15.

LeMahieu, P. G., Gitomer, D. H., & Eresh, J. T. (in press). Portfolios in large-scale assessment: Difficult but not impossible. Educational Measurement: Issues and Practice.

Meyer, C. A., Schuman, S., & Angello, N. (1990, September). Aggregating portfolio data (White Paper). Lake Oswego, OR: Northwest Evaluation Association.

Mills, R. P. (1989, December). Portfolios capture a rich array of student performance. The School Administrator, pp. 8-11.

Mills, R. P., & Brewer, W. (1988). Working together to show results: An approach to school accountability in Vermont. Montpelier: Vermont Department of Education, October 18/November 10.


Moss, P. A. (1992). Portfolios, accountability, and an interpretive approach to validity. Educational Measurement: Issues and Practice, 11(3), 12-21.

Moss, P. A. (1994, March). Can there be validity without reliability? Educational Researcher, 23(2), 5-12.

Moss, P. A., Beck, J. S., Ebbs, C., Herter, R., Matson, B., Muchmore, J., Steele, D., & Taylor, C. (1991, April). Further enlarging the assessment dialogue: Using portfolios to communicate beyond the classroom. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Murphy, S., & Smith, M. (1990, Spring). Talking about portfolios. The Quarterly of the National Writing Project and the Center for the Study of Writing, 12(2), 1-3, 24-27.

Murphy, S., & Smith, M. (1992). Looking into portfolios. In K. B. Yancey (Ed.), Portfolios in the writing classroom: An introduction (pp. 49-61). Urbana, IL: National Council of Teachers of English.

O'Neil, J. (1992). Putting performance assessment to the test. Educational Leadership, 49(8), 14-19.

Paulson, F. L., Paulson, P. R., & Meyer, C. A. (1991, February). What makes a portfolio a portfolio? Educational Leadership, 48(5), 60-63.

Reidy, E. (1992, June). What does a realistic state assessment program look like in the near term? Paper presented at the annual ECS/CDE meeting, Boulder, CO.

Saylor, K., & Overton, J. (1993, March). Kentucky writing and math portfolios. Paper presented at the National Conference on Creating the Quality School.

Sheingold, K. (1994, September). Designing a large-scale portfolio assessment system for classroom consequences. Presentation at the 1994 CRESST Conference: Getting Assessment Right!, Los Angeles.

Sheingold, K., Heller, J., & Paulukonis, S. (in press). Actively seeking evidence: Shifts in teachers' thinking and practice through assessment development (Tech. Rep.). Princeton, NJ: Educational Testing Service, Center for Performance Assessment.

Simmons, J. (1990). Portfolios as large-scale assessment. Language Arts, 67, 262-268.

Smith, M. A., & Murphy, S. A. (1992, Winter). "Could you please come and do portfolio assessment for us?" The Quarterly of the National Writing Project and the Center for the Study of Writing and Literacy, 14(1), 14-17.


Steelier, B. SI., & I lamiiiltimii. E. G. 1994, April). Port-j'r'dio assessment in Vermont, 1902-93: The teachers'perspective on implementation and impact. Paper pre .

sented at the annual ineeting °idle American Educational Research Association, New Orleans.

Stroble, E. J. ( 1993, December). Kentuck\ studentEspectations oisuccess. Equity and Excel-

lence in Education, 20(3), 54 59.

Tierne, R. I. (1992 ). Port tblios: Windm% s on learning.Learning, 2, 61-64.

fierney, R. J., Carter, NI. A., & Desai, L. E. I 199I).Port.l'olio assessment in the reading writing classroom.Norwood, MA: Christopher Gordon.

Valencia, S. \V., & Calfee, R. C. t 1991 ). The des elopmentand use of literacy porttblios for students, classesindteachers. Applied Educational Meant) cult nt, 4, 333-345.

Verniont Department of Educati(m. 11990, S.:ptember).Vernumt Writing Assessment: 'Pie pilot y.ar. Montpe-lier, VT: Author.

Vermont Department of Education. (1991a ). "ihis is mybest": Vermont's1Vriting Assessment Program. pilot Vt al*1990-199E Montpelier, VT: Author.

Vermont Department of Education. ( 1991b ). Lookingbeyond "The Answer": Vie report of Vermont's Math-ematics Portfolio Assisi-mew Program. Montpelier, VI.:Author.

Vernumt Department of Education. (1991c). VermontMathematics Portiblio Project: IL esil 1 I lit' book. 511i111 pclier, VT : hor.

Vermont r .part ment of Education. ( 199Id1. I "ermontMathematics PortfOlio Project:1Cacher'sguide. Minitpelier, VT: Author.

Webb, N. (1993). Collaborative group versus individual assessment in mathematics: Group processes and outcomes (CSE Tech. Rep. 352). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Webb, N. M. (1994). Group collaboration in assessment: Competing objectives, processes, and outcomes (CSE Tech. Rep. 386). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Wolf, D. P. (1989, April). Portfolio assessment: Sampling student work. Educational Leadership, 46(7), 35-39.


Wolf, D. P. (1993). Assessment as an episode of learning. In R. Bennett & W. Ward (Eds.), Construction vs. choice in cognitive measurement (pp. 213-240). Hillsdale, NJ: Lawrence Erlbaum Associates.

Wolf, D. P., Bixby, J., Glenn, J., & Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. In G. Grant (Ed.), Review of Research in Education (Vol. 17, pp. 31-74). Washington, DC: American Educational Research Association.

Wolf, S. A., & Gearhart, M. (1993a). Writing what you read: Assessment as a learning event (CSE Tech. Rep. 358). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Wolf, S. A., & Gearhart, M. (1993b). Writing what you read: A guidebook for the assessment of children's narratives (CSE Resource Paper No. 10). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing.

Yancey, K. B. (1992). Portfolios in the writing classroom: A final reflection. In K. B. Yancey (Ed.), Portfolios in the writing classroom: An introduction (pp. 102-116). Urbana, IL: NCTE.

Both authors contributed equally to this article. Portions of this article are closely adapted from Gearhart, Herman, Baker, & Whittaker (1993). We thank Noreen Webb, Dan Koretz, Brian Stecher, Drew Gitomer, and Ron Dietel for detailed comments and revisions of an earlier draft.

2 Examples include: Baker, Gearhart, Herman, Tierney, & Whittaker, 1991; Belanoff & Dickson, 1991; Calfee & Perfumo, 1992, 1993a, 1993b; Camp, 1990, 1993; Camp & Levine, 1991; Freedman, 1993; Gentile, 1992; Gill, 1993; Gitomer, 1993; Glazer, Brown, Fantauzzo, Nugent, & Searfoss, 1993; Hamp-Lyons & Condon, 1993; Herman, Aschbacher, & Winters, 1992; Herman, Gearhart, & Aschbacher, in press; Hewitt, 1991, 1993; Hiebert & Calfee, 1992; Hill, 1992; Howard, 1990; Koretz, 1993; Koretz, Klein, McCaffrey, & Stecher, 1993; Koretz, McCaffrey, Klein, Bell, & Stecher, 1993; Koretz, Stecher, & Deibert, 1992; Koretz, Stecher, Klein, & McCaffrey, in press; Koretz, Stecher, Klein, McCaffrey, & Deibert, 1993; LeMahieu, Eresh, & Wallace, 1992; LeMahieu, Gitomer, & Eresh, in press; Meyer, Schuman, & Angello, 1990; Mills, 1989; Mills & Bresser, 1988; Moss, 1992; Moss, Beck, Ebbs, Herter, Matson, Muchmore, Steele, & Taylor, 1991; Murphy & Smith, 1990, 1992; O'Neil, 1992; Paulson, Paulson, & Meyer, 1991; Reidy, 1992; Saylor & Overton, 1993; Sheingold, 1994; Simmons, 1990; Smith & Murphy, 1992; Stecher & Hamilton, 1994; Stroble, 1993; Tierney, 1992; Tierney, Carter, & Desai, 1991; Valencia & Calfee, 1991; Vermont Department of Education, 1990, 1991a, 1991b, 1991c, 1991d; Wolf, D. P., 1989, 1993; Wolf, Bixby, Glenn, & Gardner, 1991; Yancey, 1992.

3 Moss (1994) has proposed that high-stakes, large-scale writing assessment could be conducted in ways that afford greater local authority and that foster dialogue and ongoing review of both the methods of portfolio assessment and the results. Her work is the impetus for very productive debate concerning a complex set of issues, including the ways that the goals and models of portfolio assessment programs may need to be designed differently for different levels of an assessment system. In this paper, we reify the features of a "large-scale portfolio assessment program," both simplifying their complexity and drawing on features of several current programs.

4 In the Kentucky program, the classroom teacher scores his or her own portfolios, but a sampling of portfolios is rescored by another teacher (usually from another school) and/or by a KIRIS staff member. Because a given teacher's scores may be adjusted by patterns of relationships between his scores and an outsider's, we still view the issue for Kentucky in the same way: Credibility for a student's score still derives ultimately from an outsider's judgment.

5 Koretz, 1993; Koretz, McCaffrey, & Stecher, 1993; Koretz, McCaffrey, Klein, Bell, & Stecher, 1993; Koretz, Stecher, & Deibert, 1992; Koretz, Stecher, Klein, & McCaffrey, 1994, in press; Koretz, Stecher, Klein, McCaffrey, & Deibert, 1993.

6 There are certainly other possible explanations. Moss (1994), for example, discusses potential incompatibilities among methods and purposes of different approaches to assessment.

7 Again we are assuming a rater who is scoring outside the classroom context.

8 The description below is taken from Koretz et al., 1994. Readers are referred to that technical report for more details about the study.

9 The description below is taken from Gearhart et al., 1993. Readers are referred to that technical report for more details about the study.

10 To compensate for variation in the number of assignments rated per child and the number of children designated as high, medium, or low in writing ability, weighted averages were computed for each teacher. Thus, a teacher's ratings were averaged for each student's assignments and then, as appropriate, a "mean of means" was computed for each teacher, or for "high," "medium," and "low" students for each teacher.
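To make the footnote's procedure concrete, the minimal sketch below works through a "mean of means" with invented ratings (the data and variable names are hypothetical, not from the study): each student's assignment ratings are averaged first, and those student means are then averaged to the teacher level, so that students with many rated assignments do not dominate a teacher's average.

```python
from statistics import mean

# Hypothetical data: ratings[student] = ratings given to that student's assignments.
ratings = {
    "student_a": [3, 4, 4],  # three rated assignments
    "student_b": [2],        # only one rated assignment
    "student_c": [4, 3],
}

# Step 1: average within each student, so assignment counts do not skew results.
student_means = {student: mean(scores) for student, scores in ratings.items()}

# Step 2: average the student means into the teacher-level "mean of means."
teacher_mean_of_means = mean(student_means.values())

for student, m in sorted(student_means.items()):
    print(student, round(m, 2))          # per-student means, rounded
print(round(teacher_mean_of_means, 2))   # 3.06
```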


New & Recent CRESST/CSE TECHNICAL REPORTS

On Concept Maps as Potential "Authentic" Assessments in Science
Richard Shavelson, Heather Lang, and Bridget Lewin
CSE Technical Report 388, 1994 ($4.00)

Using concept maps as an assessment tool will urge educators to teach students more than simple facts and concepts, but how different concepts relate to each other. An evaluational tool such as concept mapping urges the individual to think on a deeper cognitive level than a "fill-in-the-blank" test would require. There is value in both assessment tools; neither should be ignored.

Fourth-grade teacher

The preceding statement from On Concept Maps as Potential "Authentic" Assessments in Science points to the multiple expected benefits from the use of concept maps as alternative assessments. In this report, CRESST researchers Richard Shavelson, Heather Lang, and Bridget Lewin examine concept mapping issues as an assessment technique, exploring questions related to validity and reliability of student scores. The authors begin with a clear definition of concept mapping:

"A concept map," write the authors, "constructedby a student, is a graph consisting of nodes repre-senting concepts and labeled lines denoting therelation between a pair of nodes (concepts)."

Based on an extensive review of concept map usage, the authors found that concept mapping techniques differed widely.

"No less than 128 possible variations were iden-tified" say the authors. "Methods for scoring mapsvaried almost as widely, fi-om the admonition 'don'tscore maps' to a detailed scoring system for hierar-chical maps."

The researchers' review led to the conclusion that an integrative "working" cognitive theory is needed to begin to limit this great variation for alternative assessment purposes. "Such a theory," conclude the authors, "would also serve as a basis for much needed psychometric studies of the reliability and construct validity of concept maps since such studies are almost nonexistent in the literature."

The review presents issues arising from large-scale use of mapping techniques, including the importance of students' skills in using concept maps and the possible negative impact of teachers teaching to the assessment if students have to memorize concept maps provided by textbooks or themselves.

Specifications for the Design of Problem-Solving Assessments in Science
Brenda Sugrue
CSE Technical Report 387, 1994 ($4.00)

In Specifications for the Design of Problem-Solving Assessments in Science, CRESST researcher Brenda Sugrue draws on the CRESST performance assessment model to develop a new set of test specifications for science. Sugrue recommends that designers follow a straightforward approach for developing alternative science assessments.

"Carry out an analysis of the subjct matter con-tent to be assessed," says Sugrue, "identifying keyconcerts, principlesmd procedures that are embodied in the content." She adds that much of thisanalysis already exists n state frameworks or in thenational science stanC.frds.

Either multiple-choice, open-ended, or hands-on science tasks can then be created or adapted to measure individual constructs, such as concepts and principles, and the links between concepts and principles.

In addition to measuring content-related constructs, Sugrue's model advocates measuring metacognitive constructs and motivational constructs in the context of the content. This permits more specific identification of the sources of students' poor performance. Students may perform poorly because of deficiencies in content knowledge, and/or deficiencies in constructs such as planning and monitoring, and/or maladaptive perceptions of self and task. The more specific the diagnosis of the source of poor performance, the more specific can be instructional interventions to improve performance.

Sugrue's model includes specifications for task design, task development, and task scoring, all linked to specific components of problem-solving ability. An upcoming CRESST report will discuss the results of a study designed to evaluate the effectiveness of the model for attributing variance in performance to particular components of problem solving and particular formats for measuring them.

Group Collaboration in Assessment: Competing Objectives, Processes, and Outcomes
Noreen Webb
CSE Technical Report 386, 1994 ($4.00)

Learning from other students, developing interpersonal skills, and maximizing collaborative performance are three primary goals of small-group collaboration. But according to Noreen Webb in Group Collaboration in Assessment: Competing Objectives, Processes, and Outcomes, group assessment may or may not be an effective strategy for measuring such goals.

"There may be an appropriate place for collabo-rative group work in educational assessment," assertsWebb. "Most importantly," she adds, "how groupwork is used in an assessment should coincide withthe purpose of an assessment."

Webb's own research and that of others strongly suggests that the purposes must be clearly understood from the start of a group assessment project. Measuring individual student learning versus group productivity, for example, may call for differences in the assessments.

If the purpose is to measure individual achievement, suggests Webb, then the instructions might be worded to encourage individual effort. A Connecticut assessment, for example, told students that "each person should be able to explain fully the conclusions reached by the group" and should be prepared to give an oral presentation on their group's experiment. Thus, students expected that they would be held accountable individually.

But if the assessment purpose is different, so tooshould be the focus, says Webb. She cites groupproductivity as one example:

"Assessment focusing on group productivity,"says Webb, "would give a group a task to complete,and evaluation would focus on the completed task,not on individual students' contributions to com-pleting the task."

Webb says that teachers need to prepare students for group assessment. Small-group collaboration can help students to develop valuable communications skills and give them a better "understanding of what kinds of group processes help them learn as individuals and what kinds of processes help maximize group productivity," she concludes.

The Evolution of a Portfolio Program: The Impact and Quality of the Vermont Program in Its Second Year
Daniel Koretz, Brian Stecher, Stephen Klein, and Daniel McCaffrey
CSE Technical Report 385, 1994 ($4.00)

As part of an ongoing evaluation of the Vermont portfolio assessment program by RAND/CRESST researchers, this report presents recent analyses of the reliability of Vermont portfolio scores, and the results of school principal interviews and teacher questionnaires.

The message, especially from Vermont teachers, say the researchers, remains mixed. Math teachers, for example, have modified their curricula and teaching practices to emphasize problem solving and mathematical communication skills, but many feel they are doing so at the expense of other areas of the curriculum. About one-half of the teachers report that student learning has improved, but an equal number feel that there has been no change. Additionally, teachers reported great variation in the implementation of portfolios into their classrooms, including the amount of assistance provided to students.

"One in tbur teachers," found the authors, "doesnot assist his or her own students in revisions, anda similar proportion does not permit students tohelp each other. Seventy percent of fourth-gradeteachers and 39% of eighth-grade teachers forbidparental or other outside assistance."

Consequently, students who receive more support from teachers, parents, and other students may have a significant advantage over students who receive little or no outside help.

Reliability problems continue. "The degree of agreement," write the authors, "among Vermont's portfolio raters was much lower than among raters in studies with other types of constructed-response measures."

The authors suggest that one cause of the low reliability was the diversity of tasks within each portfolio. Because teachers and students are free to select their own pieces, performance on the tasks is much more difficult to assess than if the work were standardized.
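As a generic illustration of what "degree of agreement" means here (invented scores, not Vermont data or the authors' exact statistic), one simple index is the correlation between two raters' scores over the same set of portfolios:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Invented scores two raters assigned to the same six portfolios.
rater_1 = [2, 3, 4, 1, 3, 2]
rater_2 = [3, 3, 4, 2, 2, 2]

print(round(pearson_r(rater_1, rater_2), 2))  # 0.7
```

The diversity of freely chosen pieces tends to push such indices down, since raters must compare unlike work against the same rubric.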

Despite these problem areas, support for the portfolio program remains high. Teachers, for example, expressed strong support for expanding portfolios to all grade levels. Seventy percent of principals said that their schools had extended portfolio usage beyond the original Vermont state mandate.

A Conceptual Framework for Analyzing the Costs of Alternative Assessment
Lawrence O. Picus
CSE Technical Report 384, 1994 ($4.00)

Despite the fact that many states are investing millions of dollars in the development of alternative assessments, little is known about the actual costs of such assessments. In A Conceptual Framework for Analyzing the Costs of Alternative Assessments, CRESST partner Lawrence O. Picus analyzes many of the issues related to identifying the costs of new assessments, including the relationship between costs and goals.

"If, as is often the case in education," says Picus,"there are multiple goals established for an alterna-tive assessment program, then estimation of thecosts of that program must include all of the re-sources necessary to accomplish all of those goals."

To identify alternative assessment costs, Picus suggests the use of a three-dimensional model comprised of levels of expenditures, kinds of expenditures, and expenditure components. Levels of expenditures are the source of expense, such as national, state, district, school, classroom, or private market levels. Kinds of expenditures include personnel, materials, supplies, and travel. Components include assessment development, production, training, scoring, reporting, and program evaluation.
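A hypothetical sketch of how the three dimensions combine (the category names come from the summary above; the dollar figures and code are invented for illustration): each cost entry is keyed by (level, kind, component), so expenditures can be rolled up along any one dimension.

```python
from collections import defaultdict

# Invented cost entries keyed by (level, kind, component) -> dollars.
costs = {
    ("state",    "personnel", "development"): 250_000,
    ("district", "personnel", "training"):     40_000,
    ("district", "materials", "production"):   15_000,
    ("school",   "personnel", "scoring"):      60_000,
}

def roll_up(costs, dimension):
    """Total costs along one dimension: 0 = level, 1 = kind, 2 = component."""
    totals = defaultdict(int)
    for key, dollars in costs.items():
        totals[key[dimension]] += dollars
    return dict(totals)

print(roll_up(costs, 1))  # {'personnel': 350000, 'materials': 15000}
```

Rolling up by kind, for instance, makes Picus's point about personnel dominating visible at a glance.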

"The laigest single expenditure item in any assess-ment program," concludes Picus, "seems likely tohe personnel."

Opportunity costs must also be considered, adds Picus. Resources committed to creating an alternative assessment program are resources that could have been used to support a former testing program or resources that could be spent on other programs, such as bilingual education.

The framework developed in A Conceptual Framework for Analyzing the Costs of Alternative Assessments addresses both opportunity costs and assessment costs matched to goals.


Economic Analysis of Testing: Competency, Certification, and "Authentic" Assessments
James S. Catterall and Lynn Winters
CSE Technical Report 383, 1994 ($4.00)

Cost-benefit and cost-effectiveness analyses, say researchers James Catterall and Lynn Winters in Economic Analysis of Testing: Competency, Certification, and "Authentic" Assessments, have similar policy purposes.

"Both analyses," note the authors, "aim at whatchoices might be made either to reach given goalswith lower costs, or to attain more results for a givenbudget allocation."

But trying to use either economic analysis is difficult when applied to educational assessment because anticipated benefits are moot. The authors note, for example, that policy makers, test coordinators, principals, and counselors have used minimum competency tests to motivate students, encouraging them, albeit negatively, to develop and improve basic skills. Yet a study by James Catterall in 1990 showed that fewer than half of 736 students in eight high schools (four states) were even aware that their high school required them to pass a minimum competency test prior to graduation. Clearly the intended motivational benefit or effect sought by policy makers was not reflected by what was truly happening. Thus, tying the costs of assessments to benefits and effects that may not really occur is problematic.

Regarding performance assessments, the authors suggest that linking policy and costs will be equally challenging. Because performance assessments should be good instructional activities themselves, it is difficult to differentiate the costs into specific categories such as assessment, curriculum, or, in cases of scoring, professional development for teachers.

"That they r perfiirmance assessments I have re-turns over and above current tests is presentlyassumed," conclude Catterall and Winters. "Estab-

lishing the linkages between the costs and benefitsmay be an important factor in the course of testingreform in the 1990s."

Analysis of Cognitive Demand in Selected Alternative Science Assessments
Gail Baxter, Robert Glaser, and Kalyani Raghavan
CSE Technical Report 382, 1994 ($4.00)

Working with pilot science assessments in California and Connecticut, the researchers in Analysis of Cognitive Demand in Selected Alternative Science Assessments focused on the cognitive activity required for successful completion of performance assessment tasks. Of special interest was the degree to which task performance reflected differences in student understanding.

"We focused," wrote Baxter, Glaser, and Raghavan,"on the extent to which: (a) tasks allowed studentsthe opportunity to engage in higher order thinkingskills and (b) scoring systems reflected differentialperformance of students with respect to the natureof cognitive activity in which they engaged."

Data came from three types of science assessment tasks (exploratory investigation, conceptual integration, and component identification), each varying with respect to grade level, prior knowledge, stage of development, and purpose. Analyses of the data resulted in some important recommendations for the development of assessment tasks and scoring.

In general, wrote the authors, "tasks should: (a) be procedurally open-ended, affording students an opportunity to display their understanding; (b) draw on subject matter knowledge as opposed to knowledge of generally familiar facts; and (c) be cognitively rich enough to require thinking."

The authors concluded that scoring systems should: (a) link score criteria to task expectations; (b) be sensitive to the meaningful use of knowledge; and (c) capture the problem-solving processes the students engage in.


Measurement-Driven Reform: Research on Policy, Practice, Repercussion
Audrey J. Noble and Mary Lee Smith
CSE Technical Report 381, 1994 ($4.00)

Demonstrating a top-down educational reform strategy and a belief that assessment can leverage educational change, the Arizona state legislature in 1990 passed Arizona Revised Statute 15-741. The legislation resulted in the Arizona Student Assessment Program (ASAP), a program that incorporated both standardized and performance-based assessments. Measurement-Driven Reform: Research on Policy, Practice, Repercussion reports on how ASAP was conceived, negotiated, and implemented. The CRESST researchers conducting the study, Audrey Noble and Mary Lee Smith, were critical of the policy process that created ASAP.

ASAP "reveals both the ambiguities characteristic of the [assessment] policy-making process," write Noble and Smith, "and the dysfunctional side effects that evolve from the policy's disparities."

Employing multiple research methods, the researchers interviewed members of the policy-shaping community and examined documents and artifacts of the testing policy. Their analysis determined that competing ideas about student learning, teachers, curriculum, and assessment resulted in ineffective implementation of the assessment program. Five inconsistencies were reported:

Policy makers' definitions of "learning" were incoherent;

Policy makers held dissonant expectations of teachers;

Policy makers clashed regarding the role of curriculum;

Policy makers alleged that a single performance assessment could fulfill the dual purposes of instructional improvement and accountability;

The implementation plan of the Arizona Student Assessment Program was a dysfunctional side effect of a policy built on contradictory ideals.

"Although ASAP appeals to mans. because of itsambiguity," conclude Noble and Smith, "this samecharacteristic may undermine its capacity to effectany substantial change in educational practice."

What Happens When the Test Mandate Changes? Results of a Multiple Case Study
Mary Lee Smith, Audrey J. Noble, Marilyn Cabay, Walt Heinecke, M. Susan Junker, and Yvonne Saffron
CSE Technical Report 380, 1994 ($4.50)

The Arizona Student Assessment Program (ASAP) was designed to solve what policy makers perceived to be the state's most pressing educational problems: moving schools toward the state curriculum framework and making schools more accountable for student achievement. However, as findings from this multiple case study demonstrate, the actions of practitioners were far from uniform in response to this policy mandate.

In this report, CRESST researcher Mary Lee Smith and colleagues outline the results to date of a three-year, qualitative study of school reactions to the ASAP mandate. One of a series of reports on a larger project, What Happens When the Test Mandate Changes?, the present study addresses the consequences of the change mandate in four Arizona elementary schools during the first year of implementation.

Using a case study methodology, the researchers focused on the interplay of policy and practice by engaging directly in the local, school-site scene. This particular approach allowed them to gain access to participant meanings and to show how meanings in action evolved over time.


Results from this study indicated that local school responses to the policy mandate varied substantially. The goal of transforming classroom instructional practices was achieved in only one school, which had adopted such practices prior to the state mandate.

"Local interpretations and organizational normsintervened to color, distort, delay, enhance, orthwart the intentions of the policy and the policy-shaping community," concluded the authors.Expectations for school reform based on mandatesmust consider the vast disparities that exist betweenindividual schools and teachers.

Assessment, Testing, and Instruction: Retrospect and Prospect
Robert Glaser and Edward Silver
CSE Technical Report 379, 1994 ($4.50)

Increasing concern about the nature and form of student assessments and the uses made of test results forms the basis for Assessment, Testing, and Instruction: Retrospect and Prospect by CRESST researchers Robert Glaser and Edward Silver. The authors explore the nature of testing and assessment by examining some of the deficiencies and abuses associated with past practices in educational measurement, then investigating present and future possibilities for alternative forms of assessment.

"At this point in timc," write Glaser and Silver,"assessment and testing in American schools arecaught between the extensive rhetoric of reform andthe intransigence of long-established practices."Through an informative discussion of the two standard purposes of educational assessmenttestingfor selection and placement and assessing educa-tional outcomesthe authors demonstrate the needfor an evaluation of the purposes of educationaltesting.


MORE TECHNICAL REPORTS

Policy Makers' Views of Student Assessment
Lorraine McDonnell
CSE Technical Report 378, 1994 ($4.00)

Engaging Teachers in Assessment of Their Students' Narrative Writing: Impact on Teachers' Knowledge and Practice
Maryl Gearhart, Shelby A. Wolf, Bette Burker, and Andrea K. Whittaker
CSE Technical Report 377, 1994 ($4.00)

Test Theory Reconceived
Robert J. Mislevy
CSE Technical Report 376, 1994 ($4.50)

Linking Statewide Tests to the National Assessment of Educational Progress: Stability of Results
Robert L. Linn and Vonda L. Kiplinger
CSE Technical Report 375, 1994 ($4.00)

UCLA's Center for the Study of Evaluation & The National Center for Research on Evaluation, Standards, and Student Testing

Eva L. Baker, Co-director
Robert L. Linn, Co-director
Joan L. Herman, Associate Director
Ronald Dietel, Editor
Katharine Fry, Editorial Assistant

The work reported in this publication was supported under the Educational Research and Development Center Program, cooperative agreement number R117G10027 and CFDA catalog number 84.117G, as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The findings and opinions expressed in this publication do not reflect the position or policies of the Office of Educational Research and Improvement or the U.S. Department of Education.


OTHER CURRENT CRESST/CSE PRODUCTS

DATABASES

Alternative Assessments in Practice Database
Listings from over 250 developers of new assessments, Macintosh version only, 1993 ($15.00)

VIDEOTAPES

Assessing the Whole Child
18-minute video program, includes new practitioner's guidebook, V5, 1994 ($15.00)

Portfolio Assessment and High Technology
10-minute video program, includes practitioner's guidebook on portfolio assessment, V2, 1992 ($12.50)

HANDBOOK

CRESST Performance Assessment Models: Assessing Content Area Explanations
Eva L. Baker, Pamela R. Aschbacher, David Niemi, and Edynn Sato
The CRESST Handbook, 1992 ($10.00)


RESOURCE PAPER

Writing What You Read: A Guidebook for the Assessment of Children's Narratives
Shelby Wolf and Maryl Gearhart
CSE Resource Paper 10, 1993 ($4.00)

For a complete list of all CRESST products, please contact Kim Hurst at 310-206-1532, e-mail: [email protected], or UCLA Center for the Study of Evaluation, 10880 Wilshire Blvd., Suite 700, Los Angeles, CA 90024-1394.

FOLD AND SECURE

UCLA Center for the Study of Evaluation
10880 Wilshire Blvd., Suite 700
Los Angeles, California 90024-1394


Order Form
Attach additional sheet if more room is needed. Form is pre-addressed on reverse.

CSE/CRESST Products
CSE Number | Title | Number of Copies | Price per Copy | Total

POSTAGE & HANDLING (Special 4th Class Book Rate)
Subtotal of $0 to $10: add $1.50
$10 to $20: add $2.50
$20 to $50: add $3.50
Over $50: add 10% of subtotal

ORDER SUBTOTAL
POSTAGE & HANDLING (scale at left)
California residents add 8.25%
TOTAL

Your name & mailing address (please print or type):

Orders of less than $10.00 must be prepaid

[ ] Payment enclosed    [ ] Please bill me

[ ] I would like to receive free copies of the CRESST Line and Evaluation Comment publications.

UCLA Center for the Study of Evaluation
10880 Wilshire Blvd., Suite 700
Los Angeles, California 90024-1394
