A Century of Testing: Ideas on Solving Enduring Accountability and Assessment Problems UCLA, Los...

Post on 18-Jan-2018

1

The 2005 CRESST Conference:
Celebrating 20 years of Research on Educational Measurement

A Century of Testing: Ideas on Solving Enduring Accountability and Assessment Problems

UCLA, Los Angeles, 8-9 September 2005

Barry McGaw
Director for Education
Organisation for Economic Co-operation and Development

2

Where to focus…

3

…so much has happened…
Advancing the link to teaching and learning
Refining system monitoring
  effectiveness (quality)
  efficiency (value for money)
  equity
Taking an international perspective
  IEA surveys such as TIMSS, PIRLS
  OECD Programme for International Student Assessment (PISA)
    – different national achievement on social background slopes
    – Google on PISA
Somewhere else? (given what else was on the programme)

4

One key problem to be resolved

5

Point of reference for judging individuals
Abandoning hope of an external measure
  Psychophysics
    – comparing judgements (such as brightness of light) with measure
    – requiring judgements of differences, not absolute values
  Psychological phenomena
    – developed in the context of differential psychology
    – individual performance judged in relation to others’ performance
    – in particular, in relation to average performance of others
    – norm-referenced (Want to look better? Choose other company.)
In search of an external criterion
  Separating scale construction and measurement
    – Thurstone
    – criterion-referenced measurement
  Simultaneous scale construction and measurement
    – item-response models (person-response-to-item models)
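
The contrast on this slide can be illustrated with a minimal Python sketch. It sets a norm-referenced score (a z-score against the cohort average) beside a one-parameter (Rasch) item-response model, in which person ability and item difficulty sit on one common scale. The slide does not name a specific item-response model; the Rasch form is used here only as the simplest example, and the function names and cohort data are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(score, cohort):
    """Norm-referenced result: position relative to the cohort average."""
    return (score - mean(cohort)) / pstdev(cohort)


def rasch_probability(ability, difficulty):
    """Rasch model: chance of success depends only on ability minus difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical cohorts: the same raw score of 60 looks strong in one
# company and weak in another ("Want to look better? Choose other company.").
weaker_cohort = [40, 50, 60]
stronger_cohort = [60, 70, 80]
z_in_weaker = z_score(60, weaker_cohort)      # above the cohort mean
z_in_stronger = z_score(60, stronger_cohort)  # below the cohort mean

# Under the Rasch model the reference point is the item, not the cohort:
# a person whose ability equals the item difficulty succeeds half the time.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)
```

In the z-score case the result changes whenever the comparison group changes; in the item-response case the person's location is expressed relative to item difficulties, which is what allows scale construction and measurement to proceed together.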

6

Application in a high-stakes arena

7

Public examinations
High-stakes assessments based on curriculum
  secondary certification and university entrance
  selection for highly competitive courses (top 1½ per cent)
  need a common curriculum across schools
The comparability-over-time problem…
  Grade distributions used to monitor standards
    – failure rate used as a measure of ‘standards’
    – claim that if participation rates grow, grades should decline to ensure that an ‘A’ is still an ‘A’, etc.
    – do enough students fail?
  Criterion (standards) and norm (cohort)-referencing
    – ‘standards’ were never absent (in curriculum, examination)
    – ‘standards’ were ignored in the norm-based award of results
    – cannot use link items over time, whole test must become public
    – marrying criterion and norm-referencing with judgments

8

Marrying criterion and norm-referencing
England
  use of criteria defined for some grade boundaries
  review of previous years’ scripts at grade boundaries
  reference to prior grade distributions
  reference to evidence of change in student cohort to justify shifts in grade distributions between years
Australia (New South Wales)
  development of band descriptors
  ‘consistent’ definition of bands over years
  reporting with norm and criterion-referencing

9

The Suite of Documents

10

All HSC courses listed with Assessment Mark, Examination Mark, HSC Mark and Performance Band

All Preliminary courses listed

11

Descriptions in bands: summary of what students know and can do

Minimum standard expected (50)

Graph of distribution of results to show how all students performed

Student’s HSC Mark

Mark Range 0–100

Examination Mark

School Assessment Mark

Number of candidates

12

How they got there…
Review and recommendations for change
  New NSW Higher School Certificate
    – McGaw, B. (1997). Shaping their future: Recommendations for reform of the Higher School Certificate. Sydney: Department of Training and Education Co-ordination.
  Scaling process
    – standards-referencing to curriculum and over time
    – Bennett, J. (2001). Standards-setting and the NSW Higher School Certificate. www.boardofstudies.nsw.edu.au/manuals/pdf_doc/bennett.pdf
Developing grade descriptors
  Used past examinations
    – experienced examiners for each subject
    – reviewed examination papers and students’ marked papers
  Developing band descriptors
    – described performance for Bands 6 to 2; low Band 1 not described

13

Using grade descriptors
Stage 1
  examiners independently form ‘image of band’
  set cut mark for each band boundary on each question
Stage 2
  examiners work together to reach agreement on boundary locations for bands on each question
  boundary locations for total scores also established
Stage 3
  Student work at boundaries on total scores reviewed
  Cut points reviewed and determined
  Boundaries located on mark scale
    – 5/6 boundary set to 90
    – 4/5 boundary set to 80
    – …
    – 1/2 boundary set to 50
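
The last step of Stage 3, locating the agreed cut points on the reporting scale, can be sketched in Python as follows. Several things here are assumptions and not statements from the slide: that marks between boundaries are interpolated linearly, that raw 0 maps to 0 and the raw maximum to 100, and that the elided boundaries follow the same pattern (3/4 at 70, 2/3 at 60). The raw cut marks and function names are hypothetical.

```python
from bisect import bisect_right

# Reported marks for boundaries 1/2, 2/3, 3/4, 4/5, 5/6.
# 50, 80 and 90 come from the slide; 60 and 70 are assumed from the pattern.
REPORTED_BOUNDARIES = [50.0, 60.0, 70.0, 80.0, 90.0]


def to_reported_mark(raw, raw_cuts, raw_max=100.0):
    """Map a raw total to the 0-100 reporting scale (assumed piecewise linear)."""
    raw_knots = [0.0, *raw_cuts, raw_max]
    reported_knots = [0.0, *REPORTED_BOUNDARIES, 100.0]
    if len(raw_cuts) != len(REPORTED_BOUNDARIES):
        raise ValueError("one raw cut mark is needed per band boundary")
    if any(a >= b for a, b in zip(raw_knots, raw_knots[1:])):
        raise ValueError("cut marks must increase strictly between 0 and raw_max")
    if not 0.0 <= raw <= raw_max:
        raise ValueError("raw mark out of range")
    # Index of the upper knot of the segment that contains the raw mark.
    upper = min(bisect_right(raw_knots, raw), len(raw_knots) - 1)
    x0, x1 = raw_knots[upper - 1], raw_knots[upper]
    y0, y1 = reported_knots[upper - 1], reported_knots[upper]
    return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)


def performance_band(reported_mark):
    """Band 1 lies below 50; each boundary reached raises the band by one."""
    return 1 + bisect_right(REPORTED_BOUNDARIES, reported_mark)


# Hypothetical cut marks agreed by examiners for one examination.
cuts = [35.0, 48.0, 60.0, 72.0, 85.0]
mark = to_reported_mark(66.0, cuts)
band = performance_band(mark)
```

Because the boundaries are fixed at the same reported values each year, a given reported mark is intended to carry the same band meaning across years even when the raw cut marks differ between examinations.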

14

but, it does not always change debate…

15

Debate isn’t always changed
Federal Minister
  found an English paper awarded a pass despite some inadequate expression within it
  concluded too few students were being failed
Nature of debate
  became again a debate about desirable failure rates
  important for such debates to be reconstructed as a debate about nature of performance judged inadequate

16

OECD education website: www.oecd.org/edu

Contact: Barry.McGaw@oecd.org

Thank you