DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot...

28
DOCUMENT RESUME ED 424 254 TM 029 126 AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of Large-Scale Educational Testing. A Policy Information Perspective. INSTITUTION Educational Testing Service, Princeton, NJ. Policy Information Center. PUB DATE 1998-06-00 NOTE 27p.; Based on an invited paper presented to the National Research Council's Board on Testing and Assessment (Orlando, FL, February 1997). AVAILABLE FROM Policy Information Center, Mail Stop 04-R, Educational Testing Service, Rosedale Road, Princeton, NJ 08541-0001; Web site: http://www.ets.org; e-mail: [email protected] ($9.50). PUB TYPE Reports Descriptive (141) EDRS PRICE MF01/PCO2 Plus Postage. DESCRIPTORS Accountability; Computer Assisted Testing; *Educational Assessment; Educational Planning; *Futures (of Society); *Test Construction; Test Content; Test Format; Test Use; *Testing; Testing Problems; Testing Programs IDENTIFIERS *Large Scale Assessment ABSTRACT This paper offers a scenario for how educational assessment might change in response to market forces that affect not only the future of large-scale testing but also society in general. The scenario divides into three generations distinguished by the purpose of testing, test format and content, and the extent to which testing capitalizes on new technology. The first generation lays the basic infrastructure for electronic tes'zing. It combines advances in psychometrics with technology to deliver large-scale tests adaptively. In the second generation, large-scale tests undergo qualitative change, but their purposes and delivery mechanisms remain essentially the same. The last generation brings a rethinking of the purposes and mechanisms of large-scale assessment. One hallmark of this generation will be the emergence of interactive environments that facilitate individual growth in addition to serving the accountability functions normally fulfilled by large-scale tests. Large-scale electronic distance examinations will play a role in this scenario. Whether this scenario represents the way testing will develop, there is no doubt that large-scale testing must change to meet the different demands of the future. (Contains 1 table and 48 references.) (SLD) ******************************************************************************** Reproductions supplied by EDRS are the best that can be made from the original document. ********************************************************************************

Transcript of DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot...

Page 1: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

DOCUMENT RESUME

ED 424 254 TM 029 126

AUTHOR Bennett, Randy ElliotTITLE Reinventing Assessment. Speculations on the Future of

Large-Scale Educational Testing. A Policy InformationPerspective.

INSTITUTION Educational Testing Service, Princeton, NJ. PolicyInformation Center.

PUB DATE 1998-06-00NOTE 27p.; Based on an invited paper presented to the National

Research Council's Board on Testing and Assessment (Orlando,FL, February 1997).

AVAILABLE FROM Policy Information Center, Mail Stop 04-R, EducationalTesting Service, Rosedale Road, Princeton, NJ 08541-0001;Web site: http://www.ets.org; e-mail: [email protected] ($9.50).

PUB TYPE Reports Descriptive (141)EDRS PRICE MF01/PCO2 Plus Postage.DESCRIPTORS Accountability; Computer Assisted Testing; *Educational

Assessment; Educational Planning; *Futures (of Society);*Test Construction; Test Content; Test Format; Test Use;*Testing; Testing Problems; Testing Programs

IDENTIFIERS *Large Scale Assessment

ABSTRACTThis paper offers a scenario for how educational assessment

might change in response to market forces that affect not only the future oflarge-scale testing but also society in general. The scenario divides intothree generations distinguished by the purpose of testing, test format andcontent, and the extent to which testing capitalizes on new technology. Thefirst generation lays the basic infrastructure for electronic tes'zing. Itcombines advances in psychometrics with technology to deliver large-scaletests adaptively. In the second generation, large-scale tests undergoqualitative change, but their purposes and delivery mechanisms remainessentially the same. The last generation brings a rethinking of the purposesand mechanisms of large-scale assessment. One hallmark of this generationwill be the emergence of interactive environments that facilitate individualgrowth in addition to serving the accountability functions normally fulfilledby large-scale tests. Large-scale electronic distance examinations will playa role in this scenario. Whether this scenario represents the way testingwill develop, there is no doubt that large-scale testing must change to meetthe different demands of the future. (Contains 1 table and 48 references.)(SLD)

********************************************************************************

Reproductions supplied by EDRS are the best that can be madefrom the original document.

********************************************************************************

Page 2: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

U.S. DEPARTMENT OF EDUCATIONOffice of Educational Research and Improvement

EDUCATIONAL RESOURCES INFORMATIONCENTER (ERIC)

ffrIfil'is document has been reproduced asreceived from the person or organizationoriginating it.

0 Minor changes have been made toimprove reproduction quality.

Points of view or opinions stated in thisdocument do not necessarily representofficial OERI position or policy.

PERMISSION TO REPRODUCE ANDDISSEMINATE THIS MATERIAL HAS

BEEN GRANTED BY

001( 0 - te-)1

TO THE EDUCATIONAL RESOURCESINFORMATION CENTER (ERIC)

1

SPECULATIONS

ON THE

FUTURE OF,

LARGE-SCALE

EDUCATIONAL

TESTING

BEST COPY AVAILABLE

2

ETSPOLICY INFORMATION CENTER

Research DivisionEducational Testing Service

Princeton. New Jersey 08541-0001

Page 3: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

Additional copies of this report can beordered for $9.50 (prepaid) from:

Policy Information CenterMail Stop 04-REducational Testing ServiceRosedale RoadPrinceton, NJ 08541-0001(609) 734-5694Internet [email protected]

http://www ets.org

Copyright © 1998 by EducationalTesting Service. All rights reserved.Educational Testing Service is anAffirmative Action/Equal OpportunityEmployer. The modernized ETS logo is atrademark of Educational Testing Service.

June 1998

Page 4: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

TABLE 6F CdNTENTS

Preface iii

Acknowledgments

Introduction 1

First-Generation Computer-Based Tests 3

Next-Generation Electronic Tests 5

Generation "R" 1 1

Conclusion 15

References 17

4REINVENTING ASSESSMENT i

Page 5: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

PREFACE

Reform in elementary and secondaryeducation remains in the forefront of thepublic's mind, and therefore also remainshigh on the agenda of elected officials andeducators generally And the demand for per-formance and accountability is spreading tohigher education as well. Most of thesereforms rely on testing testing to showincreased rigor of school curricula, testing todetermine if students advance and graduate,testing to judge the effectiveness of schoolsand teachers, and testing to compare districts,

states, and nations.

Despite the increased demands on test-ing, Randy Bennett points out that large-scaleassessment has little changed over the lasttwo decades. And that cannot continue.Bennett closes this brief report saying:"...large-scale assessment must change in themost fundamental ways, for nothing shortof reinvention will prepare it to meet the dra-matically different demands it will soon face."

Bennett tells us what these demands are andmight come to be. And he gives us a glimpseinto the future of the testing enterprise atdifferent stages of change and reinvention.

Paul E. BartonDirector

Policy Information Center

REINVENTING ASSESSMENT ° iii

Page 6: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

ACKNOWLEDGMENTS

This report is based on an invited paperpresented to the National Research Council'sBoard on Testing and Assessment, Orlando,February 1997. I appreciate the helpfulcomments of Paul Barton, Isaac Bejar,Nancy Cole, Larry Frase, and Bob Mislevyon an earlier version of the manuscript. The

positions presented are, however, my ownand not necessarily those of the reviewers,the National Research Council, or ETS. I amalso grateful for the production contribu-tions of Al Benderson, Carla Cooper, andKelly Gibson, without whom this PolicyInformation Center publication would nothave been possible.

REINVENTING ASSESSMENT 0 v

Page 7: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

INTRODUCTION

arge-scale educational assessmentconsists of those tests administered to sizable-numbers of people for such purposesas placement, course credit, graduation, educational admissions, and schoolaccountability. It includes group-administered, standardized tests used most often inthe secondary through postsecondary years.

Although there has been much recent intel-lectual ferment and experimentation ineducational assessment, the practice of large-

scale testing is much the same today as itwas 20 years ago. Most large-scale tests stillserve only institutional purposes, are admin-istered to big groups in single sittings on afew dates per year, make little use of newtechnology, and are premised on a psycho-logical model that probably owes more tothe behaviorism of the first half of thiscentury than to the cognitive science of thecurrent half.

There are good reasons to believe thatthis situation is about to change. Perhapsmost important is the emergence of a globaleconomy, which has created a general busi-ness climate of intensified competition(Sherman, 1995). Many domestic compa-nies now find themselves not only compet-ing on the home front with new foreignrivals but, at the same time, having toexpand to offshore markets where they mayencounter more competition. This environ-ment has encouraged successful U.S. busi-nesses to emphasize, among other things,innovation that responds to rapidlyshifting market needs, productivity, and

customer service as keys to competitiveadvantage (Treacy & Wiersema, 1995, p. 9).To foster these innovations, productivity, andcustomer-service goals, businesses arecapitalizing upon rapid advances inhardware, software, and communicationstechnology. Finally, our changing demo-graphics especially, the growth of minor-ity populations is making responsivenessto diverse customers yet another path tocompetitive edge.'

These market forces are having substan-tial impact on our society generally, so it isunderstandable that they are affecting large-scale educational assessment too. The testingenterprise has grown dramatically during thepast several decades, attracting a larger cor-porate presence and fueling competition.(Witness the purchase of testing and test-preparation companies by big corporations,and the partnership of nonprofit agencies with

for-profit ones.) At the same time, constitu-ents are beginning to expect the same thingsfrom testing agencies that they get in every-day commerce: innovation, productivity (asreflected in competitive pricing), andcustomer service. Increasingly, they areexpecting adaptation to diversity too.

'Diversity may belong under market needs; however, it is such an important issue in educational testing that I havecategorized it independently.

REINVENTING ASSESSMENT ° 1

Page 8: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

This paper offers a scenario for howeducational assessment might change inresponse to these forces. In doing so, thepaper seeks to stimulate thinking about thefuture of large-scale testing. This thinkingshould include the development of alterna-tive scenarios that can be played off againstone another to help define valued charac-teristics for future assessments.

The scenario divides into three genera-tions distinguished by purpose of testing,test format and content, test delivery loca-tion, and the extent to which testingcapitalizes on new technology The first gen-eration lays the basic infrastructure forelectronic testing. In the second generation,large-scale tests undergo qualitative change,but their purposes and delivery mechanismsremain essentially the same. The last gen-eration brings a rethinking of the purposesand mechanisms of large-scale assessment.

2 o REINVENTING ASSESSMENT 8

Page 9: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

FIRST-GENERATION COMI'UTER-BASED TESTS

lie early effects of innovation, customerservice, and (to a lesser extent) productivity can be seen in the first generation ofcomputer-based tests just emerging.

This first generation combines advances inpsychometrics with technology to deliverlarge-scale tests adaptively The computerselects questions based in part on previousresponses, tailoring the test to individualskill levels. Depending on the testing pro-gram, individuals can register by phone oremail; pay by credit card; test by appoint-ment in a relatively small, comfortable cen-ter; and receive scores at the conclusion ofthe session. Testing organizations can elec-tronically exchange questions and examineeresponses with test centers, and send scoresto institutions in the same way

In the 1997-98 academic year, rela-tively few tests were offered in this mode,accounting for roughly a million examinees.

Among those tests were the Graduate RecordExaminations (GRE) General Test (offeredas an alternative to the paper version), theGraduate Management Admission Test,Praxis I (for entry into teacher education pro-

grams), the SAT I: Reasoning Test (only forstudents applying to special precollege pro-grams for the academically talented), ACT'sCOMPASS, and the College Board'sACCUPLACER (the last two being for place-

ment in first-year college courses). However,

in the next several years, volumes willincrease dramatically as the Test of English

as a Foreign Language comes on line and theGRE General Test eliminates all paperexaminations. In addition, the Law SchoolAdmissions Council has started exploratorywork on a computerized version of the LSAT,

and the National Assessment GoverningBoard has recommended that the next NAEPcontractor be required to computer adminis-ter the National Assessment in at least onesubject at one grade. level (National Assess-ment Governing Board, 1996).

Whereas there is certainly a concertedmove toward electronic large-scale tests(especially in educational admissions), thereis no question that this assessment mode isstill in its infancy Like many innovations intheir early stages, today's computerized tests.automate an existing process withoutreconceptualizing it to realize the dramaticimprovements that the innovation couldallow. Thus, these tests are substantively thesame as those administered on paper: theymeasure the same skills, use the same behav-ioral designs, and depend primarily on thesame types of tasks. In addition, like manyinitial implementations, this first generationhas increased costs since, at least forhigh-stakes programs, item pools mustbe continuously replenished to supportongoing administration.

REINVENTING ASSESSMENT o 3

9

Page 10: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

To become firmly established, computer-

based assessment will clearly have to offermuch more. Fortunately, this first generationmay do just that. In fact, its most importantlegacy may be the basic infrastructure for anew generation of tests that uses technologymore pointedly to innovate in response toeducation community needs, improve pro-ductivity, enhance customer service, andaccommodate diversity

4 ° REINVENTING ASSESSMENT 1 0

Page 11: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

NEXT-GENERATION -ELECTRONIC TESTS

he next generation of large-scale elec-tronic tests will steadily incorporate advances in technology, psychometrics, and to agrowing extent, cognitive science.'

These tests will still be largely aimed at cre-

ating value for institutional constituencies,and they will continue to be administeredin dedicated centers.

The typical test in this period will bequalitatively different from those of the first

generation. This difference will be evidentin the test questions (and, in some cases,the characteristics they measure), as well asin development, scoring, and administrativeprocesses. The impetus for this shift, ofcourse, predates electronic testing. It comesfrom sustained criticism by educators(NCTM, 1989), the measurement commu-nity (Mislevy, 1993a; Shepard, 1992), cog-nitive psychologists (Pellegrino, 1992;Resnick & Resnick, 1990, 1992; Sternberg,1992), and the public, challenging the rel-evance of standardized tests.

A major impediment to making thesechanges has been the cost of deliveringalternative forms of assessment (Office ofTechnology Assessment, 1992, p. 243). This

next generation will deliver qualitatively dif-

ferent tests cost-effectively, opening the way

to significant change.

Logically, change should occur first in test

design which, as noted, today reflects an out-moded psychology Design, however, willprobably not be the first element to changebecause existing designs may well be adequate

for the limited selection purposes served bymost tests (Everson, in press; Mislevy, 1996).Also, design change is fundamental and willbe a difficult shift for institutional testing pro-

grams to make. It will occur, but only as test-ing purposes themselves transform.

Widespread change will come first in thenature of test questions and the formats forresponse, and the possibilities both providefor measuring new skills. In conventional test-ing, feasibility concerns dictate that audio orvideo be used only when a critical skill can-not be adequately assessed by standard meth-ods (e.g., when measuring foreign-languagelistening skills). Even under these circum-stances the temporary nature of conventionaltest centers makes using audio or video anadministrative burden. However, the adventof permanent computer-based test centerswith equipment capable of delivering high-quality multimedia will allow us to

2 Much has already been learned from cognitive science about how to improve ability and achievement testing (Glaser,1991; Sternberg; 1991). So why is this discipline likely to have limited influence on tests of the next generation? One reason isthat, as bureaucracies with multiple competing constituencies, large-scale assessment programs change incrementally From theperspective of many program directors and test developers, the move to computers is change enough. Even so, the importanceof this transition is readily understandable because computers are infusing most other parts of our society The need to rethinktests to reflect cognitive principles is not as well appreciated and, with few exceptions, serious efforts at conceptual redesignhave not yet begun.

FIST COPY AYAIILULE 11REINVENTING ASSESSMENT ° 5

Page 12: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

administer tests incorporating sound andvideo more efficiently Perhaps more impor-tantly, it will permit us to introduce audio,video, and animation in the assessment ofother skill areas.

One result of this introduction will bethe capability to measure traditional skillsmore comprehensively (Bennett, Goodman,Hessinger, Ligget, Kahn, Marshall, & Zack,in press). For example, most introductorycollege history courses include the analysis

of artifacts. To reflect this emphasis, theAdvanced Placement Examination in U.S.History asks students to use artifacts (e.g.,excerpts from diaries, news articles, letters,maps, and political cartoons) as evidence toformulate a written argument in responseto a question prompt. Twentieth-centuryhistory, of course, is documented by film andbroadcasts, as well as by print. Computerscan accommodate all the artifacts used onthe conventional test as well as historical

FIGURE 1: A QUESTION USING A MULTIMEDIA STIMULUS

TO TEST HISTORICAL ANALYSIS SKILLS

°IItem 1 of 11

DirectionsThis question is based on historicaldocuments horn the Office of War Information(OWI) between 1942-1944.

Analyze the different ways the DWI tried toinfluence Americans on the homebont duringthe Second World War.

In the film. "Inflation." the OWI tries toconvince the citizenry that it is beingmanipulated by the Nazis into weakening thewar effort through capricious spending. -Who ........is..._Died" takes a very different tack. Here theOWI tries to convince the viewer that

11 II

Who Died? Any Bonds Today i

Inflation Careless Talk 1__ ._ ___ _

iSection,

E>it Tata ,.,- ,IIIIMI

To play a selection, the examinee clicks on one of the four labeled buttons under the video windowandthen on the sideways triangle just above "Who Died?". Responses are typed into the lower-left entry box(which shows a partially complete answer). From Using multimedia in large-scale computer-based testingprograms (RR-97-3), by R. E., Bennett, M. Goodman, J. Hessinger, J. Ligget, G. Marshall, H. Kahn, & J.Zack, 1997, Princeton, NJ: Educational Testing Service. The scene depicted above comes from theCD-ROM, "Powers of Persuasion: The Art of Propaganda in World War II," produced by Fife and DrumSoftware, Silver Spring, MD, from records of the National Archives, Washington, D:C.

6 0 REINVENTING ASSESSMENT 12 EST COPY AVAIIIIA Iili LE

Page 13: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

films, TV, and radio broadcasts, therebyextending the range of source material withwhich students must be proficient (seeFigure 1).

At the same time as multimedia is usedto extend measurement of traditional skills,it will be used to measure new skills. In both

paper and computerized tests, we oftenassess skill in getting information from print.

We do this assessment because we considerreading critical to success in school, in mostjobs, and in activities of daily living. Theimportance of electronic media in commu-nicating information is clearly growing.(Witness the fact that most Americans gettheir news from TV and also note the rapidascent of the World Wide Web.) Conse-quently, we will increasingly expect students

to be able to process information from avariety of sources. Given this expectation,perhaps we should evaluate not only howeffectively people handle print but howwell they reason with information from film,radio, TV, and computers.

In addition to changes in the nature oftest questions, response formats will shift dra-

matically (Bennett, 1993). Some items onthese next-generation tests will be much thesame type of short, single-best-answer prob-lem as found on today's multiple-choice tests(but without the response options). Otherquestions will be somewhat lengthier and per-

haps less well-determined the kind ofproblem in which one is looking not for thebest answer, but only for a reasonable onegiven certain constraints (see Figure 2). Still

FIGURE 2: A CONSTRUCTED-RESPONSE PROBLEM WITH SEVERAL REASONABLE ANSWERS

°I

Item 1 of 1

A business needs to move cartons horn its warehouses to its stores at a reasonable cost. Here arethe relevant data:

Number of Cartons at Each Warehouse Number of Cartons Needed at Each Stole

Warehouse 1 5000 Store 1 2000

Warehouse 2 4000 Store 2 1000

Warehouse 3 3000 Store 3 8000

Store 4 1000

Transportation Cost Per Carton Between Store and WarehouseStore 1 Store 2 Store 3 Store 4

Warehouse 1 $1 $2 $2 $3

Warehouse 2 $2 $1 $2 $3

Warehouse 3 $2 $2 $1 $3

Fill in a distribution below, so each store will receive the number of cartons it needs, with a totaltransportation cost below $24,000.

Number That Should Be Shipped Between Each Warehouse and Each Store.

, Store 1 Store 2 1 Store 3 Store 41

Warehouse 1 0 0 0 0

arehouse 2W10 0 0 01

Warehouse 3 0 0 0 0

Isection

E> rt

<14= 1=#Lila_ X=

Responses are typed into the bottom matrix. Fully creditable answers must meet all stated conditions.Copyright © 1997 by Educational Testing Service.

BEST COPY AVAILABLE .13REINVENTING ASSESSMENT 0 7

Page 14: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

other problems will be completely open-ended; for instance, the educationcommunity's insistence on testing writingperformance will compel most programs toinclude an essay component that may, as stu-dents move universally to word processing,be offered only on computer. Tasks callingfor other types of performance oral pre-sentation, sign language, the display of anartifact will also be administered routinely,with responses captured digitally throughcomputer-controlled microphones and TVcameras (Bennett, in press-a). Short simu-lation exercises will also appear.3

Along with redefining the test questionwill come the reengineering of test devel-opment, scoring, and administrative pro-cesses, all driven primarily by the needto improve productivity With continuoustesting, item pools for high-stakes programshave to be replenished constantly, otherwisesecurity may be compromised. Replenish-ing the pool involves not only writing newquestions but also pretesting them to gen-erate the statistical calibrations used inadaptive testing.

To help create and calibrate items at therequired rate, new tools will emerge (Sing ley

& Bennett, 1995). These tools will allowdevelopers to construct question templates(or select them from large libraries) and then

vary both surface elements tangential tosolution as well as deeper structural ele-ments. Variants of questions will be created

based on theories of item difficulty drawnfrom cognitive psychology (e.g., Kirsch &Mosenthal, 1990; Sebrechts, Enright,Bennett, & Martin, 1997). Using these theo-ries, item parameters will be estimatedprecisely enough to reduce significantly thesample sizes needed for pretesting (Mislevy,

Sheehan, & Wingersky 1993; Sheehan &Mislevy, 1994).

Item generation tools may also eventu-ally help in specifying test designs thetype, organization, number, and parametersof tasks needed to achieve a specific result.Once a test design is specified, these toolscould automatically suggest which questiontemplates to use. A variety of standarddesigns derived from cognitive principlesmight exist from which the developer couldselect or create a new design, thereby push-ing tests toward formulations based moresolidly on cognitive theory (e.g., see Bennett& Sebrechts, 1997).

At some point the tools could wellbecome sophisticated enough to generateitems without human intervention. Adap-tive tests would no longer be built by creat-ing pools from which items were selected.Items would be created on the fly, eachone fashioned to a particular specificationat the moment of administration (Bejar,1993, 1996).

Test design will also be the focal pointfor responding to diversity (though the mostfar-reaching design changes for this purpose

3 Why not long domain-based simulations? The economics of large-scale educational testing in dedicated centerswon't permit it. Such simulations will become a standard pan of occupational and professional assessment. This fieldalready employs lengthy tests, sometimes taking days instead of hours, and can more easily charge the fees required tocover the considerable costs that simulations incur.

8 ° REINVENTING ASSESSMENT1 4

Page 15: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

will be yet to come). The effects of differenttest designs on minority group members,females, and older examinees will be rou-tinely simulated in deciding what skills andwhich task formats to use in large-scaleassessments (Willingham & Cole, 1997).(The recent addition of an essay to the PSAT-

NMSQT is an early, if post hoc, example.)Also, as the complexity of responses

required by computer-delivered perfor-mance tasks increases, test designers will find

it particularly challenging to build examineeinterfaces that are equally easy for all(Bennett, in press-b). The need to achievesuch a result will make systematic interfaceplanning and evaluation, as well as thedevelopment of more extensive preparationmaterials, central to equitable test design.

The need to develop more extensivepreparation materials to compensate forincreasing response complexity will, itself,force innovation. Test preparation softwarewill incorporate intelligent tutoring meth-ods (Wenger, 1987), that enable test takersto master new interfaces and test contentmore rapidly. The former purpose willrecede in prominence as more natural waysof interacting with computers become con-ventional. Employing sophisticated techno-logical tools to teach educational substance,however, will play an important role in thereconceptualization of large-scale testingdescribed in the next section.

Shifts in test design, and especiallychanges in the nature of test questions, willcause scoring processes to alter considerablytoo. Many constructed-response tasks that arenow graded manually will be processed auto-matically (e.g., Bennett, Steffen, Sing ley,Morley, & Jacquemin, 1997; Page & Petersen,1995; Sebrechts, Bennett, & Rock, 1991). Thecharacter of the responses will be simple atfirst mathematical expressions, equations,and graphs; words, phrases, and singlesentences but will eventually encompassmore complex performances, including essays(e.g., Burstein et al., 1998). Responses that can-

not be graded automatically will be handledcollaboratively by humans and computers(Bennett, in press-a).

Obviously, human judges will alsoretain purview over those tasks that are notcomputer deliverable some important skillswon't soon be amenable to assessment in thismedium (e.g., in the performing arts) buteven those tasks that can be electronicallyadministered may not be delivered that wayuntil all the requisite technological pieces arein place. Where responses can be recorded onpaper, they will be scanned and digitized.Other responses will be digitally recorded bycomputer-controlled camera or microphone.Once in digital form, the encoded results willbe sent electronically to human judges whowill grade the performance on screen. Judgesmay be at the same general location as theexaminee or elsewhere. If they are elsewhere,

15 REINVENTING ASSESSMENT ° 9

Page 16: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

they may work at the same site or at sepa-rate locations. If they work at separate sites,they may do their grading simultaneouslyor at different times. Software will trainjudges in the scoring rules, make workassignments, facilitate the judges' interac-tions with one another and with supervi-sors, introduce pre-calibrated benchmarksto maintain grading standards, and moni-tor any drift in standards that might occur.

As scoring and development processesevolve, there will be an inevitable linkage ofthe two. Software tools will help developersspecify and test automatic scoring keys asquestions are composed. Automatic itemgeneration will bring with it automatic keygeneration: the templates used to spawnvariants of questions and estimate theirparameters will simultaneously define thefeatures of correct responses that scoringengines will need to do their grading.

The last notable change will be admin-istrative. The establishment of internationalsystems of test centers will cause a shift from

the dedicated electronic networks of the first

generation to the Internet. This shift will bemotivated by cost efficiencies, improvements

in Internet data security, and improvedtransmission capability, allowing hugeamounts of data to be exchanged andmanipulated quickly Competition among

testing agencies will lead to alternativetest-center networks, which should help fuel

innovation, productivity, customer service,and greater response to diversity4

One result is that test makers will forgenew public-private partnerships, openingschool-based centers to supplement thosealready in commercial establishments. A sec-ond result is that customer interactions withtesting companies will be entirely electronic:

students will register, get preparation mate-rials, practice, explore institutions of highereducation, apply, receive scores, send scores

and performance samples, and resolveservice problems all via the Internet.

In sum, several key transformations will

define this next generation of large-scaletests. These transformations will be in thecharacter of questions, development andscoring processes, test design, and test cen-ter networks. Most notably, the ability todeliver multimedia questions, capture andscore complex constructed responses, cre-ate tasks efficiently, and move and manipu-late large amounts of data electronically will

make performance assessment a vital, if notprincipal, element of large-scale testing.Finally, although the administrative aspects

of testing will improve for test takers, inoverall purpose the enterprise will still beinstitutionally driven.

4 In planning or already established are test center networks run by ETS/Sylvan Learning Systems, NationalComputer Systems/VUE, and Harcourt-Brace Educational Measurement/Assessment Systems, Inc.

10 0 REINVENTING ASSESSMENT

Page 17: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

GENERATIONR.11

n this third generation, testing will reinvent

itself, breaking radically with tradition in several ways.

One hallmark will be the emergence ofinteractive environments that facilitateindividual growth in addition to serving theaccountability functions normally fulfilledby large-scale tests. To achieve thisend, large-scale assessment will join withinstruction, first at "arm's length" but in timecommingling totally. Along with otherforces, instructional integration willhave profound effects, producing the even-tual decline of conventional, one-time,center-administered examinations.

The need for new tests will be drivenby an increasingly competitive globaleconomy in which continuous learningbecomes central to success for much of ourpopulation; the establishment of the Internetas a pivotal commercial and social structure,making the delivery of quality educationmore cost-effective; and the imperativeto formulate a testing system that activelyhelps the nation's expected non-Whitemajority succeed.

The quickly escalating cost of a tradi-tional college education coupled with theconvenience of electronic networks willestablish distance learning as a dominantforce in higher education. Distance learn-ing will be international, permitting foreign

students to enroll at U.S. institutions with-out leaving home and U.S. students to takecourses abroad. Once established, thisapproach will migrate down to secondaryschool, permitting more students to learn inneighborhood institutions from world-classteachers and curricular materials, move to anew community without interrupting theirschooling, receive a home education, orreturn to pursue a General EquivalencyDiploma (Owston, 1997).

Large-scale electronic distance examina-tions will play a key role in this scenario.Computerized assessments based on conven-tionally recognized standards (e.g., NCTM,1989), will be embedded in the schoolcurriculum and occur frequently Sometimesan assessment will be made known inadvance; at other times with informedconsent it will simply be embeddedseamlessly in the distance-learning sessionand be indistinguishable from the instruc-tional components of that session. Decisionslike certification of course mastery gradua-tion eligibility, and school effectiveness willno longer be based largely on one examina-tion given at a single time but will alsoincorporate information from a series ofmeasurements (Bennett, in press-a). College

1 7REINVENTING ASSESSMENT 0 11

Page 18: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

admissions and placement may follow suitas the assessments used in standards-basedhigh school courses like the College Board'sPacesetter program and its Advanced Place-ment program, migrate to this curriculum-embedded, distance model. (See Bunderson,Inouye, and Olsen, 1989, for a moreelaborated, though not distance-based, con-ceptualization of continuous curriculum-embedded measurement.)

Such a model has profound implications

for accommodating population diversity, inparticular for addressing group differencesin test performance. In the current model,where large-scale tests are too often divorced

from schooling in both content and deliv-ery, it is easy to become fixated on ques-tioning the veracity of assessment. And,indeed, enormous energy has beeninvested with relatively little returnin debating how well current tests reflect the

skills of different population groups. Byvirtue of moving assessment into the cur-riculum, the locus of the debate over per-formance differences must logically shiftfrom the accuracy of assessment to theadequacy of instruction.

By this generation, the influence of cog-nitive science will be more strongly evident,

driving course and test design, and makingpossible closer articulation of assessmentand instruction. For assessment in particu-lar, design will be "theory-based," restingsquarely on fundamental conceptions of thenature of subject-matter expertise and the

structure of intellectual abilities (Everson, inpress). These conceptions will help design-ers organize tests around descriptions of whatskills are important to proficiency in a givenfield, how those skills are composed andinterrelated, and how they might be trained.

This merger of assessment and instruc-tion will be realized in some significant partthrough the use of electronic learning tools.These tools will implement a range of instruc-tional methods. For present purposes, how-ever, their most salient characteristic will bein leaving an electronic record of studentactivity that might contribute almost inciden-tally to summative decision making.

Fot example, intelligent tutors (Wenger,1987), microworlds (Shute & Glaser, 1990),and simulations (Schank & Cleary, 1995,chap. 5) exemplify environments that couldbe used for delivering instruction and certi-fying competence in specific skill areas (much

the way flight simulators are used for thesedual purposes in aviation today). With intel-ligent tutors particularly, student knowledgewill be dynamically modeled using cognitiveand statistical approaches capable both ofguiding instruction on a highly detailed leveland of providing a general summary of over-all standing (e.g., Mislevy & Gitomer, 1996).

Instruction will be adapted not only to themultiple dimensions that characterize stand-ing in a broad skill area, but to personalinterests and background, allowing moremeaningful accommodation to diversity thanwas possible with earlier approaches.

12 o REINVENTING ASSESSMENT

16'

Page 19: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

While these tools will typically be usedby individuals in isolation, other tools willbe oriented toward collaborative workundertaken in electronic learning commu-nities (Gordin, Gomez, Pea, & Fishman,1996). Such communities should permitstudents and teachers from different schoolsto exchange ideas and jointly carry outprojects. Mentoring may also be providedby practitioners who share expertise via theInternet to help inculcate the methods,mores, and social organization of a field. For

summative assessment purposes, a commonframework might be developed that allowsconsiderable latitude in the choice of groupprojects and, at the same time, comparablescores across the differing results (Mislevy1993b; Myford & Mislevy, 1995). Assum-ing the ability to grade even the most com-plex projects remotely and assignment of thesame grade to all members of a project team,

work in learning communities might inci-dentally serve large-scale assessmentpurposes too.5

The tasks associated with electroniclearning and assessment tools will bevery different from those of prioreras. For instance, simple multimediaexercises will give way to virtual realitysimulations. These simulations will modelcomplex environments science labs, field

experiences giving students a chance tolearn and be assessed under conditions simi-lar to those encountered by practitioners.(Again, today's flight simulators suggest thetype of educational environment that coulddevelop.)

In addition to improving the realism ofquestions, Generation "R's" computers willlet individuals respond more naturally Forexample, they may respond directly to thestudent's physical actions in virtual realitysimulations. Speech also will be accuratelyunderstood, as the additional informationin lip movements is employed to reduceambiguity (Gates, 1996). This understand-ing will further aid diversity by allowinginstruction and its embedded assessment tobe delivered through multiple presentationmodes (e.g., by presenting the same infor-mation in text and through audio), so thatall students (but especially those withdisabilities) can use computers more easily

What skills will these new learning andassessment tasks target? The proliferation ofsophisticated information and communica-tion technologies will undoubtedly influence

what skills are valued, taught, and assessed.Many valued skills will be the same as those

we consider important today, but some willbe new. With continuous learning requiredalmost universally, interest may reemerge in

5 Work in virtual communities may someday come to dominate education, as well as the world of work. Butcollaboration will not eliminate the need to assess individuals. Our political and economic systems value individualaccomplishment, reward, and accountability even in the presence of group assignment (e.g., note the commondistribution of individual awards and sanctions in team sports). As long as those ingrained values remain, assessing theindividual will continue to be important in educational decision making.

19 REINVENTING ASSESSMENT ° 13

Page 20: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

cognitive modifiability, or how effectively an

individual benefits from instruction (Lidz,1987). Also, the growth of electronic learn-ing and work communities will make remote

collaborative skills essential. Finally, in aknowledge-rich environment, we will haveeasy, cheap, and almost instantaneousaccess to information. The ability_ to posethe right questions and find, analyze, andorganize relevant knowledge may take onincreased importance. Intelligent agents willbe widely available to help us do some ofthese tasks (Gilbert, undated), making one'sdeftness in deploying virtual assistants acritical skill.

Where will this distance learning andassessment occur? It will certainly occur inthe home and for the foreseeable future inschools, commercial learning establish-ments, libraries, businesses, and communitycenters. To the extent that the Internetbecomes universal, some of these institu-tions may transform to virtual entities. Elec-tronic networks will likely reduce the needfor physical libraries (Lesk, 1997) and, per-haps in the long term, for schools as weknow them.

Dedicated test centers also may be onthe endangered list. To the extent that someform of embedded assessment works effec-tively as a basis for summative decisionmaking, the large-scale, one-time, test-center-delivered examination may diminish.The latter model may be weakened furtheras miniaturized, wireless broadcast

technologies are used to break test security(Colton, 1997). These technologies will posesuch new threats as the electronic pilfering ofitems from the radio frequency signals emit-ted by screen displays and the real-time coach-ing of test takers by expert compatriots whoremotely read screens and immediately broad-

cast the correct answers back to examinees.6The almost continuous (and often unobtru-sive) nature of assessment in the curriculum-embedded model should make suchhigh-tech gamesmanship considerablyless practical.

What does a curriculum-embedded,distance assessment model assume? Certainly,

it assumes a common set of content standardsfor educational attainment. Such standardshave been promulgated in most subject areas(e.g., NCTM, 1989). Second, it requires set-ting performance standards and developingmeasurements that are largely coterminouswith instruction. The Advanced Placementprogram, Pacesetter, and the New StandardsProject represent early, paper-based movesin this direction; the University of the Stateof New York's Regents College, a virtual uni-versity that certifies knowledge and grantscourse credits and degrees through distanceassessment is a post-secondary-level example;

and work on electronic learning tools that canincidentally provide global proficiency esti-mates is a complementary development.Finally, it demands an electronic infrastruc-ture for delivery and management, the foun-dations of which are already emerging.

6 The basic technology for CRT emanation monitoring already exists. van Eck (1985) wrote the classic scientific paper inthe field and Behar (1997, pp. 66, 70) gives a more recent popular description.

14 o REINVENTING ASSESSMENT

2 0

Page 21: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

CONCLUSION

This paper presented one scenario for thefuture of large-scale educational assessment.The scenario divides into three generations(summarized in Table 1). In recent years,large-scale assessment has changed relatively

little in purpose, administration mode, useof technology, and scientific grounding. This

situation is about to alter because competi-tion will force test makers to satisfy newmarket needs through innovation, improveproductivity, enhance customer-service, andaddress population diversity In response,large-scale assessment will come to serveboth summative and formative purposes, becurriculum-embedded and performance-based, occur at a distance, and measure newcompetencies as the skills valued by societychange. New technology along withadvances in cognitive and measurement

science will be the chief catalyst in reach-

ing these goals.Although it is attractive to emphasize

technology and the possibilities it engenders,

our focus should remain on large-scale edu-cational assessment and the needs it mustsatisfy Right now, large-scale assessmentruns the risk of falling out of synch on mul-tiple counts, including its relevance to edu-cational decision-making and its ability toaccommodate diversity Change, unfortu-nately, does not always come easily or asquickly as it should to well-establishedinstitutions. But large-scale educationalassessment must change in the most funda-mental ways, for nothing short of reinven-tion will prepare it to meet the dramaticallydifferent demands it will soon face.'

7 Some readers may find the scenario presented in this report implausible. But what may seem incomprehensibletoday can quite abruptly become reality. As an example, take global geo-politics, which is almost certainly far more complexthan large-scale testing. Had one asserted 15 years ago that the Berlin Wall would fall, the Soviet Union collapse, Germanyreunite, and the Cold War end, the reaction would have been disbelieving, to say the least!

2. REINVENTING ASSESSMENT 0 15

Page 22: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

TABLE 1: THREE GENERATIONS OF LARGE-SCALE EDUCATIONAL ASSESSMENT

Generation Key Characteristics .

First-Generation Computer-Based Tests(Infrastructure Building)

1. Primarily serve institutional needs

2. Measure traditional skills and use test designs and item formatsclosely resembling paper-based tests, with the exception that testsare given adaptively

3. Administered in dedicated test centers as a "one-time"measurement

4. Take limited advantage of technology

Next-Generation Electronic Tests(Qualitative Change)

1. Primarily serve institutional needs

2. Use new item formats (including multimedia and constructedresponse), automatic item generation, automatic scoring, andelectronic networks to make performance assessment an integralprogram component; measure some new constructs

3. Administered in dedicated test centers as a "one-time"measurement

4. Allow customers to interact with testing companies entirelyelectronically

Generation "R" Tests (Reinvention) 1. Serve both institutional and individual purposes

2. Integrated with instruction via electronic tools so thatperformance is sampled repeatedly over time; designed according tocognitive principles

3. Use complex simulations, including virtual reality, that model realenvironments and allow more natural interaction with computers

4. Administered at a distance

5. Assess new skills

16 o REINVENTING ASSESSMENT 22

Page 23: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

EFERENCES

Behar, R. (1997, February 3). Who's read-ing your e-mail? Fortune, 135(2), 56-70.

Bejar, I. I. (1993). A generative approach topsychological and educational measure-ment. In N. Frederiksen, R. J. Mislevy & I.Bejar (Eds.), Test theory for a new generationof tests (pp. 323-357). Hillsdale, NJ:Erlbaum.

Bejar, I. I. (1996). Generative response mod-eling: Leveraging the computer as a test deliv-ery medium (RR-96-13). Princeton, NJ:Educational Testing Service.

Bennett, R. E. (1993). On the meanings ofconstructed response. In R. E. Bennett &W C. Ward (Eds.), Construction vs. choice incognitive measurement: Issues in constructedresponse, performance testing, and portfolioassessment (pp. 1-27). Hillsdale, NJ:Erlbaum.

Bennett, R. E. (in press-a). An electronic in-frastructure for a future generation of tests.Journal of Learning and Evaluation.

Bennett, R. E. (in press-b). Computer-basedtesting for examinees with disabilities: Onthe road to generalized accommodation. InS. Messick (Ed.), Assessment in higher educa-tion: Issues of access, student development, andpublic policy. Hillsdale, NJ: Erlbaum.

Bennett, R. E., Goodman, M., Hessinger, J.,Ligget, J., Marshall, G., Kahn, H., & Zack, J.(in press). Using multimedia in large-scalecomputer-based testing programs. Computersin Human Behavior.

Bennett, R. E., & Sebrechts, M. M. (1997).Measuring the representational component ofquantitative proficiency Journal of EducationalMeasurement, 34, 62-75.

Bennett, R. E., Steffen, M., Singley, M. K.,Morley, M., & Jacquemin, D. (1997). Evalu-ating an automatically scorable, open-endedresponse type for measuring mathematicalreasoning in computer-adaptive tests. Journalof Educational Measurement, 34, 163-177.

Bunderson, C. V, Inouye, D. K., Olsen, J. B.(1989). The four generations of computerizededucational measurement. In R. L. Linn, Edu-cational measurement (3rd Ed.). New York:ACE/MacMillan.

Burstein, J. C., Braden-Harder, L., Chodorow,M., Hua, S., Kaplan, B., Kukich, K., Lu, C.,Nolan, J., Rock, D., & Wolff, S. (1998). Com-puter analysis of essay content for automatedscore prediction (RR-98-15). Princeton, NJ:Educational Testing Service.

Colton, G. D. (1997, March). High-techapproaches to breaching examination security.Paper presented at the annual meeting ofthe National Council on Measurement inEducation, Chicago.

23REINVENTING ASSESSMENT 17

Page 24: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

Everson, H. T. (in press). A theory-basedframework for future college admissionstests. In S. Messick (Ed.), Assessment in highereducation: Issues of access, student development,and public policy. Hillsdale, NJ: Erlbaum.

Gates, B. (1996). The road ahead. New York:Penguin.

Gilbert, D. (undated). IBM intelligentagents lOn-line]. Available: http:/www.networking.ibm.com/iag/iagwpl.html.

Glaser, R. (1991). Expertise and assessment.In M. C. Wittrock & E. L. Baker (Eds.), Test-ing and cognition (pp. 17-30). EnglewoodCliffs, NJ: Prentice-Hall.

Gordin, D. N., Gomez, L. M., Pea, R. D., &Fishman, B. J. (1996). Using the World WideWeb to build learning communities ink-12. Journal of Computer-Mediated Commu-nications [On-line], 12. Available: http://www.usc.edu/dept/annenberg/vol2/issue3/gordin.html.

Kirsch, I. S., & Mosenthal P. B. (1990).Exploring document literacy: Variablesunderlying the performance of young adults.Reading Research Quarterly, 15, 5-30.

Lesk, M. (1997). Practical digital libraries. SanFrancisco: Morgan Kaufmann.

Lidz, C. S. (1987). Dynamic assessment: Aninteractional approach to evaluating learningpotential. New York: The Guilford Press.

Mislevy, R. J. (1993a). Foundations of a newtest theory. In N. Frederiksen, R. J. Mislevy,and I. Bejar (Eds.), Test theory for a new gen-eration of tests. Hillsdale, NJ: Erlbaum.

Mislevy, R. J. (1993b). Linking educationalassessments: Concepts, issues, methods, and pros-pects. Princeton, NJ: Policy Information Cen-ter, Educational Testing Service. (ERIC # ED-353-302)

Mislevy, R. J. (1996). Test theory reconceived.Journal of Educational Measurement, 33, 379-416.

Mislevy, R. J., & Gitomer, D. H. (1996). Therole of probability-based inference in anintelligent tutoring system. User-Modeling andUser-Adapted Interaction, 5, 253-282.

Mislevy, R. J., Sheehan, K. M., & Wingersky,M. S. (1993). How to equate tests with littleor no data. Journal of Educational Measurement,30, 55-78.

Myford, C. M., & Mislevy, R. J. (1995). Moni-toring and improving a portfolio assessmentsystem (Center for Performance AssessmentResearch Report). Princeton, NJ: EducationalTesting Service.

National Assessment Governing Board.(1996). Policy statement on redesigning theNational Assessment of Educational Progress.Washington, D.C.: National AssessmentGoverning Board.

National Council of Teachers of Mathematics(NCTM). (1989). Curriculum and evaluationstandards for school mathematics. Reston,VA: National Council of Teachers ofMathematics.

Office of Technology Assessment. (1992).Testing in America's schools: Asking the rightquestions. Washington, D.C.: Office of Tech-nology Assessment, Congress of the UnitedStates.

18 0 REINVENTING ASSESSMENT

2 4

Page 25: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

Owston, R. D. (1997). The World WideWeb: A technology to enhance teaching andlearning? Educational Researcher, 26(2),27-33.

Page, E. B., & Petersen, N. S. (1995). Thecomputer moves into essay grading. PhiDelta Kappan, 76, 561-565.

Pellegrino, J. W (1992). Commentary:Understanding what we measure and mea-suring what we understand. In B. R. Gifford& M. C. O'Connor (Eds.), Changing assess-ments: Alternative views of aptitude, achieve-ment and instruction (pp. 275-300). Boston:Kluwer.

Resnick, L. B., & Resnick, D. P (1990). Testsas standards of achievement in sch000ls. InJ. Pfleiderer (Ed.), Proceedings of the 1989 ETSInvitational Conference: The uses of standard-ized tests in American education (pp. 63-80).Princeton, NJ: Educational Testing Service.

Resnick, L. B., & Resnick, D. P (1992).Assessing the thinking curriculum: Newtools for educational reform. In B. R. GiffOrd& M. C. O'Connor (Eds.), Changing assess-ments: Alternative views of aptitude, achieve-ment and instruction (pp. 37-75). Boston:Kluwer.

Schank, R. C., & Cleary, C. (1995). Enginesfor education. Hillsdale, NJ: Erlbaum.

Sebrechts, M. M., Bennett, R. E., & Rock,D. A. (1991). Agreement between expertsystem and human raters' scores on com-plex constructed-response quantitativeitems. Journal of Applied Psychology,76, 856-862.

Sebrechts, M. M., Enright, M., Bennett, R.E., & Martin, K. (1997). Using algebra wordproblems to assess quantitative ability:Attributes, strategies, and errors. Cognitionand Instruction, 14, 285-343.

Sheehan, K. & Mislevy, R. J. (1994). A tree-based analysis of items from an assessment ofbasic mathematics shills . (RR-94-14).Princeton, NJ: Educational Testing Service.

Shepard, L. A. (1992). Commentary: Whatpolicy makers who mandate tests shouldknow about the new psychology of intellec-tual ability and learning. In B. R. Gifford &M. C. O'Connor (Eds.), Changing assess-ments: Alternative views of aptitude, achieve-ment and instruction (pp. 301-328). Boston:Kluwer.

Sherman, H. (1995). The new worldeconomy In The Conference Board, (Ed.),Business and economic outlook: How will thecentury close (Report No. 1104-95-CH) (pp.9-11). New York: The Conference Board.

Shute, V J., & Glaser, R. (1990). A large-scale evaluation of an intelligent discoveryworld: Smithtown. Interactive LearningEnvironments, 1, 51-77.

Singley, M. K., & Bennett, R. E. (1995).Toward computer-based performanceassessment in mathematics (RR-95-34).Princeton, NJ: Educational Testing Service.

Sternberg, R. J. (1991). Toward better intel-ligence tests. In M. C. Wittrock & E. L. Baker(Eds.), Testing and cognition (pp. 31-39).Englewood Cliffs, NJ: Prentice-Hall.

25 REINVENTING ASSESSMENT ° 19

Page 26: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

Sternberg, R. J. (1992). Ability tests, mea-surements, and markets. Journal of Educa-tional Psychology, 192, 84, 134-140.

Treacy, M., & Wiersema, F. (1995). The dis-cipline of market leaders. Reading, MA:Addison-Wesley

van Eck, W (1985). Electromagnetic radia-tion from video display units: An eavesdrop-ping risk? Computers & Security, 4,269-286.

Wenger, E. (1987). Artificial intelligence andtutoring systems. Los Altos, CA: MorganKaufmann.

Willingham, W W, & Cole, N. S. (Eds.).(1997). Gender and fair assessment. Hillsdale,NJ: Erlbaum.

20 ° REINVENTING ASSESSMENT 26

Page 27: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

a_

04202 -14800 S58M6 204905 Printed in U.SA.

2 1

Page 28: DOCUMENT RESUME ED 424 254 · DOCUMENT RESUME. ED 424 254 TM 029 126. AUTHOR Bennett, Randy Elliot TITLE Reinventing Assessment. Speculations on the Future of. Large-Scale Educational

062)

U.S. DEPARTMENT OF EDUCATION()inc. of Educational fieboarch and Improvement (OEM)

fchocIlanal Hobourcos Inlolf1111110/1 cnhor (info

NOTICE

REPRODUCTION BASIS

lepuclTM029126

This document is covered by a signed "Reproduction Release(Blanket)" form (on file within the ERIC system). encompassing allor classes of documents from its source organization and, therefore,does not require a "Specific Document" R.elease form.

This document is Federally-funded, or carries its own permission toreproduce, or is otherwise in the public domain and, therefore, maybc reproduced by ERIC without a signed Reproduction Releaseform (either "Specific Document" Of "Blankre).