Issues relating to Large-scale Assessments


Margaret Wu, Victoria University


International large-scale assessments
Main problem: interpretations of the results
Focus on country rankings
An example:

In August 2012, Julia Gillard, then Prime Minister of Australia, declared that Australia would strive to be ranked in the ‘top five’ in international education assessments by 2025.

So strong is this ambition that it has been inscribed into the Australian Education Act of 2013 as its very first objective, which reads: ‘Australia to be placed, by 2025, in the top 5 highest performing countries based on the performance of school students in reading, mathematics and science’ (Australian Education Act, 2013, p. 3).


Does a high ranking mean a good education system?
A Vietnamese researcher queried why Vietnam did well in PISA despite a poor education system in Vietnam.

Gorur & Wu, 2014 (former OECD official, interview transcript):
“What’s the good of [the rankings]? What is the benefit to the US to be told that it is number seven or number 10? It’s useless, meaningless, except for a media beat up and political huffing and puffing. It’s very important for the US to know, having defined certain goals like improving participation rates for impoverished students from suburbs in large cities – whether in fact that is happening, and if it is, why it is happening and if not, why not. And it is irrelevant whether Chile or Russia or France is doing better or worse – that doesn’t help one bit – in fact it probably hinders. Makes people feel uncertain, unsure, nervous, and they rush over there and find out why they are doing better.”


And rushed there, they did…
The (Australian) Grattan Institute’s report Catching Up: Learning from the Best School Systems in East Asia (Jensen et al., 2012):
“… researchers from Grattan Institute visited the four education systems [Hong Kong, Shanghai, Korea and Singapore] studied in this report. They met educators, government officials, school principals, teachers and researchers. They collected extensive documentation at central, District and school levels. Grattan Institute has used this field research and the lessons taken from the Roundtable to write this report” (p. 6)


Suggested factors for high ranking (performance)
One observation made by the Grattan Institute: “Shanghai, for example, has larger class sizes to give teachers more time for school-based research to improve learning and teaching.” (p. 2)
(Observation also made by OECD PISA, 2010.)

The New Zealand government proposed to increase class sizes to free up money to fund initiatives to raise the quality of teaching (NZ Treasury briefing paper, March 2012).


Discussion points
These “policies” are often said to be “evidence-based”, with large-scale assessments frequently quoted as the source of evidence.

Why should we be concerned with these policies?

Consider:
Validity issues
Reliability issues


Validity issues
Linking factors to performance
Korea and China perform well, and have large class sizes. Can we conclude that large class sizes lead to good performance?

Making inferences:
The number of storks is positively correlated with the number of babies born.
The crime rate is positively correlated with ice cream sales.
People who take care of their teeth have better general health.
Third variables (confounders) are at play.
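The storks-and-babies pattern can be reproduced with a small simulation. This is a sketch under assumed conditions. A hypothetical common cause `z` drives both `x` and `y`, and neither of those two affects the other. The raw correlation between `x` and `y` is strong. Once `z` is regressed out of both (a partial correlation), almost nothing is left.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def residuals(y, x):
    """Residuals of an ordinary least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def simulate(n=5000, seed=1):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]      # hypothetical common cause
    x = [zi + rng.gauss(0, 0.5) for zi in z]     # e.g. "number of storks"
    y = [zi + rng.gauss(0, 0.5) for zi in z]     # e.g. "number of babies"
    raw = pearson(x, y)                          # theoretical value 1 / 1.25 = 0.8
    partial = pearson(residuals(x, z), residuals(y, z))  # theoretical value 0
    return raw, partial


raw_r, partial_r = simulate()
print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

Here the third variable is known and measured. In country-level data such as PISA it usually is not, which is why the correlation alone cannot settle the class-size question.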


Linking PISA to policies
PISA tells us about student performance and the backgrounds of students, schools and countries.
Linking background to performance is done by people, not proven by statistics.
Any interpretation is an inference.
PISA cannot substantiate the validity of the inferences.
Other in-depth studies are needed.


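A common misunderstanding about statistical analysis
Regression equation: Y = a + bX
X is termed the explanatory variable; Y is termed the dependent variable.
Does X explain Y? Try regressing X on Y instead, X = c + dY: the standardized coefficient, t statistic and significance level are exactly the same. Only the intercept and the unstandardized slope change.
Regression does not test for causal inference. Regression only reflects correlation.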


Regress Reading scores on GDP (and GDP on Reading)

Coefficients(a)
                Unstandardized coefficients    Standardized coefficients
Model 1         B          Std. Error          Beta          t         Sig.
(Constant)      479.828    10.310                            46.542    .000
GDP             .427       .301                .243          1.416     .167
a. Dependent Variable: Reading

Coefficients(a)
                Unstandardized coefficients    Standardized coefficients
Model 1         B          Std. Error          Beta          t         Sig.
(Constant)      -36.464    48.202                            -.756     .455
Reading         .138       .098                .243          1.416     .167
a. Dependent Variable: GDP
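The symmetry in the two tables can be checked directly. The sketch below uses small made-up data, not the slide's Reading and GDP figures. It fits ordinary least squares in both directions. The slope t statistic and the standardized coefficient come out identical both ways. The unstandardized slopes differ, and their product equals r². The slide's tables agree: 0.427 × 0.138 ≈ 0.059 ≈ 0.243².

```python
import math


def ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, t, beta): intercept, slope, t statistic of the slope,
    and the standardized slope.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    beta = b * math.sqrt(sxx / syy)   # equals Pearson's r for a single predictor
    return a, b, b / se_b, beta


# Made-up illustrative data, not the values behind the slide.
gdp = [12.0, 25.0, 31.0, 40.0, 44.0, 52.0, 58.0, 63.0]
reading = [470.0, 492.0, 480.0, 505.0, 498.0, 520.0, 501.0, 515.0]

a1, b1, t1, beta1 = ols(gdp, reading)   # Reading = a + b * GDP
a2, b2, t2, beta2 = ols(reading, gdp)   # GDP = c + d * Reading

print(f"Reading on GDP: b = {b1:.3f}, beta = {beta1:.3f}, t = {t1:.3f}")
print(f"GDP on Reading: d = {b2:.3f}, beta = {beta2:.3f}, t = {t2:.3f}")
print(f"b * d = {b1 * b2:.4f}, beta squared = {beta1 ** 2:.4f}")
```

The slope t statistic reduces to r·√(n−2)/√(1−r²), which does not depend on which variable is called "explanatory". That is why neither direction of the regression can say anything about cause.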


Reliability issues

How strong is the relationship between two variables?
P value = 0.11, n.s. at the 95% level

Number of countries that performed higher than the OECD average in reading / total number of countries:

                                          Small class size and/or   Large class size and      Total
                                          low teachers’ salaries    high teachers’ salaries
Low cumulative expenditure on education   3/31                      3/12                      6/43
High cumulative expenditure on education  8/20                      2/2                       10/22
Total                                     11/51                     5/14                      16/65
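With so few countries per cell, a single country crossing the OECD average changes the picture. Fisher's exact test is one way to gauge a 2×2 comparison with small counts. The slide does not say which test produced p = 0.11, and this sketch is not claimed to reproduce that value. It applies a two-sided Fisher test to the column totals only: 11 of 51 versus 5 of 14 countries above the OECD average.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):  # probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return sum(q for q in (prob(k) for k in range(lo, hi + 1))
               if q <= observed * (1 + 1e-9))


# Column totals from the table: countries above / not above the OECD average.
small_above, small_total = 11, 51   # small class size and/or low salaries
large_above, large_total = 5, 14    # large class size and high salaries

p = fisher_exact_two_sided(
    small_above, small_total - small_above,
    large_above, large_total - large_above,
)
print(f"proportions above average: {small_above / small_total:.2f} vs "
      f"{large_above / large_total:.2f}")
print(f"two-sided Fisher exact p = {p:.2f}")
```

A p-value this large means the counts are compatible with no association. With only 14 countries in one column, it also means there is little power to detect a modest one.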


Top five in what?
Interview transcript of a senior OECD official (Gorur & Wu):
OECD Official: Well, Australia is doing pretty well!
RG: It’s doing well, right? But you know what we want to do now? Our Prime Minister says we want to be in the top five in PISA!
OECD Official: Top five in what?
RG: In PISA.
OECD Official: Yes, but for which students? The average student in Canada, in Korea, Finland, Shanghai, China – that’s one thing. If you then look at high performing students or how low performing students do, then we may get a completely different picture. And that’s where policy efforts are most interesting for me.


Australian 2009 PISA Reading results, by state

State       Mean score   Confidence interval   Note
ACT         531          520–543               In top 5 already
WA          522          510–534
QLD         519          505–532
NSW         516          505–527
VIC         513          504–523
SA          506          497–516
TAS         483          472–495               Below OECD average
NT          481          469–492               Below OECD average
Australia   515          510–519
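The confidence intervals matter as much as the means. The sketch below takes the state figures from the table and does two things: it checks each interval against a benchmark, and it lists pairs of states whose intervals overlap. The benchmark of 493 is the PISA 2009 OECD reading average, which the slide does not show. Two simplifications are assumed. First, the benchmark is treated as fixed. Second, overlap of 95% intervals is used only as a rough, conservative screen: non-overlap suggests a real difference, but overlap does not prove there is none.

```python
OECD_AVERAGE = 493  # PISA 2009 OECD reading average (not on the slide)

# (state, mean, lower CI bound, upper CI bound), from the state table
STATES = [
    ("ACT", 531, 520, 543),
    ("WA", 522, 510, 534),
    ("QLD", 519, 505, 532),
    ("NSW", 516, 505, 527),
    ("VIC", 513, 504, 523),
    ("SA", 506, 497, 516),
    ("TAS", 483, 472, 495),
    ("NT", 481, 469, 492),
]


def compare_to_benchmark(low, high, benchmark):
    """Classify a confidence interval relative to a fixed benchmark."""
    if high < benchmark:
        return "below"
    if low > benchmark:
        return "above"
    return "not distinguishable"


def overlapping_pairs(rows):
    """Pairs of states whose confidence intervals overlap."""
    pairs = []
    for i, (s1, _, lo1, hi1) in enumerate(rows):
        for s2, _, lo2, hi2 in rows[i + 1:]:
            if lo1 <= hi2 and lo2 <= hi1:
                pairs.append((s1, s2))
    return pairs


status = {s: compare_to_benchmark(lo, hi, OECD_AVERAGE) for s, _, lo, hi in STATES}
print(status)
print(overlapping_pairs(STATES))
```

Under these assumptions, TAS's interval (472–495) still contains 493, so only NT's interval lies wholly below the benchmark. ACT's interval overlaps those of WA, QLD, NSW and VIC.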


Ranking by item content

Item M408Q01TR                    Item M420Q01TR
Country            Score          Country            Score
Hong Kong-China    0.60           New Zealand        0.66
Finland            0.56           Australia          0.64
Australia          0.56           Canada             0.64
Chinese Taipei     0.55           Ireland            0.62
United Kingdom     0.55           Shanghai-China     0.62
New Zealand        0.55           United Kingdom     0.60
Macao-China        0.53           United States      0.59
Iceland            0.52           Chinese Taipei     0.58
Ireland            0.51           Singapore          0.57
Singapore          0.50           Denmark            0.57


Differential Item Functioning (DIF)
Australia performed extremely well on Items M408Q01TR and M420Q01TR, ranking third and second respectively internationally. For Item M408Q01TR, Shanghai-China ranked 20th, despite the fact that Shanghai took the top spot internationally in mathematical literacy, with a mean score much higher than that of the second-placed country, Singapore. For Item M420Q01TR, Australia outperformed all top-ranking countries.

In contrast, for Item M462Q01DR, Australia ranked 43rd internationally, with an average score of only 0.1 out of a maximum of two, while Shanghai had an average score of 1.5 out of a maximum of two.


Implications of DIF
The average score (and ranking) hides DIF.
The existence of DIF threatens comparisons across countries, as the achievement results depend on which items are in the test.
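Here is a toy illustration of how an average can hide DIF. Two hypothetical countries, X and Y, have made-up per-item scores. Country X does well on three items but poorly on one, while Y is fairly uniform. Which country ranks higher depends on whether that one item is in the test.

```python
# Hypothetical average item scores (proportion correct) for two countries.
SCORES = {
    "X": [0.60, 0.55, 0.70, 0.30],
    "Y": [0.50, 0.50, 0.60, 0.60],
}


def ranking(scores, items):
    """Countries ordered by mean score on the chosen items (best first)."""
    means = {c: sum(s[i] for i in items) / len(items) for c, s in scores.items()}
    return sorted(means, key=means.get, reverse=True)


all_items = [0, 1, 2, 3]
without_item_4 = [0, 1, 2]

print(ranking(SCORES, all_items))        # Y ahead: item 4 strongly favours Y
print(ranking(SCORES, without_item_4))   # X ahead once item 4 is dropped
```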


An example: Japan
PISA reading
2000: 522; 2003: 498, a 24-point drop, about 6 months of growth!
Triggered huge reactions in Japan
Blame placed on a reform started two years before
New reforms and policies


How PISA trends are established
Select some items from 2000 as “anchoring items”.
Place them in the 2003 test,
so that 2003 results can be placed on the 2000 scale.
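Mean-mean linking is one common way to carry a scale forward through anchoring items under a Rasch-type model. It compares the anchors' difficulties from the old calibration with those from the new one, then shifts the new scale by the average difference. The sketch below uses hypothetical logit values and is not claimed to be PISA's exact procedure.

```python
# Hypothetical anchor-item difficulties (logits) from two separate calibrations.
anchors_2000 = [-0.8, -0.2, 0.3, 0.9]   # on the 2000 scale
anchors_2003 = [-0.6, 0.0, 0.5, 1.1]    # same items, 2003 calibration


def linking_shift(old, new):
    """Mean-mean shift that maps the new calibration onto the old scale."""
    return sum(o - n for o, n in zip(old, new)) / len(old)


shift = linking_shift(anchors_2000, anchors_2003)
ability_2003_scale = 0.1                 # hypothetical country mean, 2003 scale
ability_2000_scale = ability_2003_scale + shift
print(f"shift = {shift:+.2f} logits; country mean on 2000 scale = {ability_2000_scale:+.2f}")
```

The shift is estimated from a handful of items, so the choice of anchors feeds directly into every trend estimate.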


Item bias
Items don’t work in the same way in all countries. One item may be relatively more difficult for one country than for other countries.
Differential Item Functioning (DIF)


Differential Item Functioning
Hypothetical example:

Item   Country A (% correct)   Country B (% correct)
1      65                      76
2      74                      83
3      42                      51
4      79                      85
5      73                      64                      Biased against B / favours A
6      72                      91                      Biased against A / favours B
7      46                      54
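The hypothetical table can be screened for DIF in a simple way. This is an assumed approach, not the method used in PISA. Each percentage correct is converted to a logit, and the B-minus-A difference is taken for each item. Items whose difference departs from the average difference by more than an arbitrary threshold (0.5 logits here) are flagged. Items 5 and 6 are the ones flagged.

```python
import math

# Percentage correct per item, from the hypothetical table.
COUNTRY_A = [65, 74, 42, 79, 73, 72, 46]
COUNTRY_B = [76, 83, 51, 85, 64, 91, 54]
THRESHOLD = 0.5  # arbitrary flagging threshold in logits (an assumption)


def logit(pct):
    p = pct / 100
    return math.log(p / (1 - p))


def flag_dif(a, b, threshold=THRESHOLD):
    """Return {item number: 'favours A' or 'favours B'} for flagged items."""
    diffs = [logit(pb) - logit(pa) for pa, pb in zip(a, b)]
    mean_diff = sum(diffs) / len(diffs)   # overall B-over-A advantage
    flags = {}
    for item, d in enumerate(diffs, start=1):
        if d - mean_diff > threshold:
            flags[item] = "favours B"
        elif mean_diff - d > threshold:
            flags[item] = "favours A"
    return flags


print(flag_dif(COUNTRY_A, COUNTRY_B))
```

The biased items themselves pull on the overall mean difference. Operational analyses typically compare IRT item parameters instead, as in the Japan comparison, and use more careful criteria.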


Japan vs International Item Parameters
[Figure: comparison of international item parameters and national parameters for Japan; x-axis: international difficulty (logit scale); y-axis: difficulty for Japan (logit scale).]


Anchoring items in Reading 2003
Many anchoring items were biased against Japan.
Japan’s mean score would increase by 10 score points if one particular reading unit were removed from the set of eight anchoring units (Monseur & Berezner, 2007).
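The Monseur & Berezner finding can be illustrated in miniature. The sketch below is a deliberately crude model, not their analysis. It uses a Rasch-type item response function with a single hypothetical country ability, five hypothetical anchor items, and one item made harder for this country (DIF). The country's ability is recovered item by item as logit(p) + difficulty and then averaged. Dropping the biased item removes the downward pull.

```python
import math

TRUE_ABILITY = 0.3                          # hypothetical country ability (logits)
DIFFICULTY = [-1.0, -0.5, 0.0, 0.5, 1.0]    # hypothetical international difficulties
DIF = [0.0, 0.0, 0.6, 0.0, 0.0]             # item 3 is harder for this country


def p_correct(theta, delta):
    """Rasch-type probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - delta)))


def estimate_ability(items):
    """Average of per-item estimates logit(p) + difficulty over chosen items."""
    estimates = []
    for i in items:
        # The observed proportion reflects the DIF...
        p = p_correct(TRUE_ABILITY, DIFFICULTY[i] + DIF[i])
        # ...but the analyst only knows the international difficulty.
        estimates.append(math.log(p / (1 - p)) + DIFFICULTY[i])
    return sum(estimates) / len(estimates)


with_all = estimate_ability(range(5))
without_biased = estimate_ability([0, 1, 3, 4])
print(f"all anchors: {with_all:.2f}; biased item removed: {without_biased:.2f}")
```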


Fluctuation of country results
Country results fluctuate owing to the items selected for a test, for reasons such as:
cultural differences
language differences
curriculum differences


2000–2009 trends
It has often been claimed that Australia is slipping in reading.


-13 points
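Whether a 13-point change is meaningful depends on its standard error. For a trend, that error combines the sampling error of each year with a linking error reflecting which anchoring items were chosen. The sketch below shows the arithmetic with hypothetical standard errors, not published PISA values. It assumes the three error components are independent.

```python
import math


def trend_z(change, se_year1, se_year2, link_error=0.0):
    """z statistic for a change in means, assuming independent error sources."""
    se = math.sqrt(se_year1 ** 2 + se_year2 ** 2 + link_error ** 2)
    return change / se


CHANGE = -13                   # points, from the slide
SE_2000, SE_2009 = 3.0, 4.0    # hypothetical sampling standard errors
LINK_ERROR = 6.0               # hypothetical linking error

z_sampling_only = trend_z(CHANGE, SE_2000, SE_2009)
z_with_link = trend_z(CHANGE, SE_2000, SE_2009, LINK_ERROR)
for label, z in [("sampling only", z_sampling_only), ("with linking error", z_with_link)]:
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{label}: z = {z:.2f} ({verdict} at the 5% level)")
```

With these made-up inputs, the same 13-point drop is "significant" or not depending on whether the linking error is counted. This echoes the anchoring-item issue raised for Japan.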


What PISA tells us

Big picture:
Australia is doing pretty well.
Australia and New Zealand lead the English-speaking countries.
(Confucian-culture) Asian countries lead in academic performance.
Finland does very well among non-Asian countries.
May suggest something for further investigation.


Limitations of large-scale assessments

Not able to collect data on all factors related to education.

For example, private spending on education has not been captured, nor have students’ lives outside school.
Look beyond international ranks.
Focus on within-country comparisons.
Don’t jump to conclusions on policy implications.