Issues relating to Large-scale Assessments
1
Issues relating to Large-scale Assessments
Margaret Wu
Victoria University
2
International large-scale assessments
Main problem: interpretations of the results
Focus on country rankings
An example:
In August 2012, Julia Gillard, then Prime Minister of Australia, declared that Australia would strive to be ranked in the ‘top five’ in international education assessments by 2025.
So strong is this ambition that it has been inscribed into the Australian Education Act of 2013 as its very first objective, which reads: ‘Australia to be placed, by 2025, in the top 5 highest performing countries based on the performance of school students in reading, mathematics and science’ (Australian Education Act, 2013, p. 3)
3
Does a high ranking mean a good education system?
A Vietnamese researcher queried why Vietnam did well in PISA despite the poor education system in Vietnam
Gorur & Wu, 2014 (former OECD official, interview transcript):
What’s the good of [the rankings]? what is the benefit to the US to be told that it is number seven or number 10? It’s useless, meaningless, except for a media beat up and political huffing and puffing. It’s very important for the US to know, having defined certain goals like improving participation rates for impoverished students from suburbs in large cities – whether in fact that is happening, and if it is, why it is happening and if not, why not. And it is irrelevant whether Chile or Russia or France is doing better or worse – that doesn’t help one bit – in fact it probably hinders. Makes people feel uncertain, unsure, nervous, and they rush over there and find out why they are doing better.
4
And rushed there, they did…
(Australian) Grattan Institute’s Catching Up: Learning from the Best School Systems in East Asia (Jensen et al., 2012)
… researchers from Grattan Institute visited the four education systems [Hong Kong, Shanghai, Korea and Singapore] studied in this report. They met educators, government officials, school principals, teachers and researchers. They collected extensive documentation at central, District and school levels. Grattan Institute has used this field research and the lessons taken from the Roundtable to write this report (p. 6)
5
Suggested factors for high ranking (performance)
One observation made by the Grattan Institute…
“Shanghai, for example, has larger class sizes to give teachers more time for school-based research to improve learning and teaching.” (p. 2)
(Observation also made by OECD PISA, 2010)
New Zealand government proposed to increase class size to free up money to fund initiatives to raise the quality of teaching (NZ Treasury briefing paper, March 2012)
6
Discussion points
These “policies” are often said to be “evidence-based”, where large-scale assessments are frequently quoted as the sources of evidence.
Why should we be concerned with these policies?
Consider
Validity - issues
Reliability - issues
7
Validity issues
Linking factors to performance
Korea and China perform well, and have large class sizes.
Can we conclude large class size leads to good performance?
Making inferences:
No. of storks positively correlated with no. of babies born
Crime rate positively correlated with ice cream sales
People who take care of their teeth have better general health
Mediating variables at play
8
Linking PISA to Policies
PISA tells us about student performance, and background of students/schools/countries
Linking background to performance is done by people, not proven by statistics.
Any interpretation is an inference.
PISA cannot substantiate the validity of the inferences.
Need other in-depth studies.
9
A common misunderstanding about statistical analysis
Regression equation Y = a + bX
X is termed the explanatory variable
Y is termed the dependent variable
Does X explain Y?
Try X = a + bY
Exactly the same results (same standardized coefficient, same t, same significance)
Regression does not test for causal inference.
Regression only reflects correlation.
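The symmetry claimed above can be checked on made-up data. The sketch below is illustrative only: the data are simulated, and the helper name `ols` is an assumption, not something from the slides. It fits the regression in both directions and compares the t statistics of the two slopes.

```python
import math
import random

def ols(x, y):
    """Least-squares fit y = a + b*x. Returns (a, b, t), t being the t statistic of b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return a, b, b / se_b

# Simulated data (hypothetical, for illustration only).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(30)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

_, b_yx, t_yx = ols(x, y)   # Y = a + bX
_, b_xy, t_xy = ols(y, x)   # X = a + bY

# Pearson correlation, computed directly.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
r = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)))

print(f"t (Y on X) = {t_yx:.6f}, t (X on Y) = {t_xy:.6f}")
print(f"product of slopes = {b_yx * b_xy:.6f}, r squared = {r * r:.6f}")
```

The two t statistics agree, and the product of the two slopes equals the squared correlation, so swapping the "explanatory" and "dependent" labels changes nothing about the strength of evidence.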
10
Regress Reading on GDP scores

Coefficients (a)
Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    479.828            10.310                           46.542   .000
GDP           .427               .301         .243                1.416    .167
a. Dependent Variable: Reading

Coefficients (a)
Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    -36.464            48.202                           -.756    .455
Reading       .138               .098         .243                1.416    .167
a. Dependent Variable: GDP
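A quick arithmetic check on the two outputs above, using only the rounded coefficients shown on the slide. In simple regression the product of the two unstandardized slopes equals the squared correlation, and the standardized Beta is the correlation itself, so the two tables should agree up to rounding.

```python
# Coefficients copied from the two regression outputs (rounded as reported).
b_reading_on_gdp = 0.427   # slope when Reading is the dependent variable
b_gdp_on_reading = 0.138   # slope when GDP is the dependent variable
beta = 0.243               # standardized coefficient, identical in both outputs

product_of_slopes = b_reading_on_gdp * b_gdp_on_reading
r_squared = beta ** 2

print(f"product of slopes = {product_of_slopes:.4f}")
print(f"Beta squared      = {r_squared:.4f}")
```

Both values round to 0.059: the two regressions describe one and the same correlation.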
11
Reliability Issues
How strong is the relationship between two variables?
P value = 0.11, n.s. at 95% level

Number of countries that performed higher than the OECD average in reading / total number of countries

                                           Small class size and/or   Large class size and      Total
                                           low teachers’ salaries    high teachers’ salaries
Low cumulative expenditure on education    3/31                      3/12                      6/43
High cumulative expenditure on education   8/20                      2/2                       10/22
Total                                      11/51                     5/14                      16/65
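With counts this small, a significance test is worth running before reading anything into the table. The slide does not say which comparison or which test produced P = 0.11, so the sketch below is not expected to reproduce that figure. It only shows, under the assumption that Fisher's exact test is an acceptable choice, how the class-size comparison could be tested after pooling the two expenditure rows. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))

# Cell counts from the table: (above OECD average, total countries).
small_low, small_high = (3, 31), (8, 20)    # small class / low salaries
large_low, large_high = (3, 12), (2, 2)     # large class / high salaries

small_above = small_low[0] + small_high[0]
small_total = small_low[1] + small_high[1]
large_above = large_low[0] + large_high[0]
large_total = large_low[1] + large_high[1]

p_value = fisher_exact_two_sided(
    small_above, small_total - small_above,
    large_above, large_total - large_above,
)
print(f"small class: {small_above}/{small_total}, "
      f"large class: {large_above}/{large_total}, p = {p_value:.3f}")
```

The "2 out of 2" cell is a reminder of how little data sits behind some of these comparisons.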
12
Top five in what?
Interview transcript of a senior OECD official (Gorur & Wu):
OECD Official: Well, Australia is doing pretty well!
RG: It’s doing well, right? But you know what we want to do now? Our Prime Minister says we want to be in the top five in PISA!
OECD Official: Top five in what?
RG: In PISA.
OECD Official: Yes, but for which students? The average student in Canada, in Korea, Finland, Shanghai, China – that’s one thing. If you then look at high performing students or how low performing students do, then we may get a completely different picture. And that’s where policy efforts are most interesting for me.
13
Australian 2009 PISA Reading results, by state

State       Mean score   Confidence interval
ACT         531          520–543
WA          522          510–534
QLD         519          505–532
NSW         516          505–527
VIC         513          504–523
SA          506          497–516
TAS         483          472–495
NT          481          469–492
Australia   515          510–519

Annotations on the slide: “In top 5 already”; “Below OECD average”
14
Ranking by item content

Country           Item M408Q01TR   Country          Item M420Q01TR
Hong Kong-China   0.60             New Zealand      0.66
Finland           0.56             Australia        0.64
Australia         0.56             Canada           0.64
Chinese Taipei    0.55             Ireland          0.62
United Kingdom    0.55             Shanghai-China   0.62
New Zealand       0.55             United Kingdom   0.60
Macao-China       0.53             United States    0.59
Iceland           0.52             Chinese Taipei   0.58
Ireland           0.51             Singapore        0.57
Singapore         0.50             Denmark          0.57
15
Differential Item Functioning (DIF)
Australia performed extremely well on Items M408Q01TR and M420Q01TR, ranking third and second respectively internationally. For Item M408Q01TR, Shanghai-China ranked 20th, despite the fact that Shanghai took the top spot internationally in mathematics literacy, with a mean score much higher than the second place country, Singapore. For Item M420Q01TR, Australia outperformed all top ranking countries.
In contrast, for Item M462Q01DR, Australia ranked 43rd internationally, with an average score of only 0.1 out of a maximum of two, while Shanghai had an average score of 1.5 out of a maximum of two.
16
Implications of DIF
Average score (and ranking) hides DIF.
Existence of DIF threatens comparisons across countries, as the achievement results depend on which items are in the test.
17
An Example - Japan
PISA reading
2000: 522
2003: 498
A 24 point drop, about 6 months of growth!
Triggered huge reactions in Japan
Blame placed on a reform started two years before
New reforms and policies
18
How PISA trends are established
Select some items from 2000 as “anchoring items”
Place in 2003 test
So 2003 results can be placed on the 2000 scale
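The sketch below illustrates one simple way anchoring can work: a mean-mean shift on a logit scale. This is an assumption for illustration; the slides do not say which linking method PISA used. All item labels and difficulty values are hypothetical. It also shows why the choice of anchoring items matters: dropping a single anchor changes the shift applied to every 2003 result.

```python
from statistics import mean

# Hypothetical anchor-item difficulties (logits), calibrated separately per cycle.
anchors_2000 = {"A1": -0.50, "A2": 0.10, "A3": 0.80, "A4": 1.20}
anchors_2003 = {"A1": -0.30, "A2": 0.25, "A3": 1.05, "A4": 1.90}

def linking_shift(old, new, items=None):
    """Constant to add to new-cycle values to express them on the old scale."""
    items = list(items) if items is not None else list(old)
    return mean(old[i] for i in items) - mean(new[i] for i in items)

shift_all = linking_shift(anchors_2000, anchors_2003)
shift_without_a4 = linking_shift(anchors_2000, anchors_2003, ["A1", "A2", "A3"])

print(f"shift using all anchors: {shift_all:+.3f} logits")
print(f"shift without item A4:   {shift_without_a4:+.3f} logits")
```

In this made-up case the shift moves from -0.325 to -0.200 logits when item A4 is left out, so the reported trend depends on which items serve as anchors.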
19
Item Bias
Items don’t work in the same way in all countries.
One item may be relatively more difficult for one country than for other countries.
Differential Item Functioning (DIF)
20
Differential Item Functioning
Hypothetical example:

Item   Country A (% correct)   Country B (% correct)
1      65                      76
2      74                      83
3      42                      51
4      79                      85
5      73                      64                      Biased against B / Favours A
6      72                      91                      Biased against A / Favours B
7      46                      54
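The flagged items in the hypothetical example can be found mechanically: compare each item's gap between the two countries with the average gap across all items. The sketch below is a simplification on the percent-correct scale; operational DIF analyses work on item parameters instead. The 10-point threshold is an arbitrary assumption for illustration.

```python
from statistics import mean

# Percent correct per item: (Country A, Country B), from the hypothetical example.
percent_correct = {
    1: (65, 76), 2: (74, 83), 3: (42, 51), 4: (79, 85),
    5: (73, 64), 6: (72, 91), 7: (46, 54),
}
THRESHOLD = 10  # assumed cut-off in percentage points, not from the slides

gaps = {item: b - a for item, (a, b) in percent_correct.items()}
overall_gap = mean(list(gaps.values()))  # B's average advantage over A

flags = {}
for item, gap in gaps.items():
    deviation = gap - overall_gap
    if deviation > THRESHOLD:
        flags[item] = "favours B"
    elif deviation < -THRESHOLD:
        flags[item] = "favours A"

print(f"average gap (B - A): {overall_gap:.1f} points")
print(flags)
```

Country B leads by about 7.6 points on average. Item 5 reverses that pattern and item 6 exaggerates it, which is what the slide's annotations mark.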
21
Japan vs International Item Parameters
[Figure: Comparison of International Item Parameter and National parameters for Japan. Horizontal axis: International difficulty (logit scale). Vertical axis: Difficulty for Japan (logit scale).]
22
Anchoring items in Reading 2003
Many anchoring items were biased against Japan
Japan’s mean score would increase by 10 score points if one particular reading unit was removed from the set of eight anchoring units (Monseur & Berezner, 2007).
23
Fluctuation of Country Results
Owing to items selected for a test for reasons such as
Cultural differences
Language differences
Curriculum differences
24
2000 – 2009 trends
It has often been claimed that Australia is slipping in Reading.
25
-13 points
26
What PISA tells us
Big picture
Australia is doing pretty well
Australia and New Zealand lead the English-speaking countries
(Confucius culture) Asian countries lead in academic performance
Finland does very well among non-Asian countries
May suggest something for further investigation
27
Limitations of large-scale assessments
Not able to collect data on all factors related to education.
For example, private spending on education has not been captured
Students’ lives outside schools.
Look beyond international ranks
Focus on within country comparisons
Don’t jump to conclusions on policy implications