Strengthening Year 12 learning and certification through school-based and external assessment
mcgaw group pty ltd, ABN 34 117 491 228
Strengthening Year 12 learning and certification through school-based and external assessment
Perth, 4 March 2008
Barry McGaw
Director, University of Melbourne Education Research Institute
Former Director for Education, OECD
Principals’ Forum
WA Curriculum Council
Australia’s ranking in OECD/PISA Reading
Reading ranks:
– PISA 2000: 4th, but tied for 2nd
– PISA 2003: 4th, but tied for 2nd
– PISA 2006: 7th, but tied for 6th
[Figure: countries ahead of, the same as, and behind Australia in PISA 2000, 2003 and 2006; countries shown: Finland, Korea, Canada, New Zealand, Hong Kong.]
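A rank such as "4th but tied for 2nd" arises because differences in country means that are not statistically significant are treated as ties. The sketch below assumes a simple rule (a country is significantly higher when the gap exceeds 1.96 standard errors of the difference, treating the two samples as independent) and uses hypothetical means and standard errors, not actual PISA results; PISA's own multiple-comparison procedure is more elaborate.

```python
import math


def is_significantly_higher(mean_a, se_a, mean_b, se_b, z=1.96):
    """True if A's mean exceeds B's by more than z standard errors of the difference.

    Assumes the two country samples are independent.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) > z * se_diff


def nominal_rank(target, results):
    """Rank on the point estimate alone (1 = highest mean)."""
    m_t = results[target][0]
    return 1 + sum(1 for m, _ in results.values() if m > m_t)


def rank_range(target, results, z=1.96):
    """(best, worst) rank once non-significant differences are treated as ties."""
    m_t, se_t = results[target]
    others = [v for c, v in results.items() if c != target]
    above = sum(1 for m, se in others if is_significantly_higher(m, se, m_t, se_t, z))
    below = sum(1 for m, se in others if is_significantly_higher(m_t, se_t, m, se, z))
    return 1 + above, len(results) - below


# Hypothetical (mean, standard error) pairs, for illustration only.
results = {
    "A": (546, 2.5),
    "B": (532, 1.6),
    "C": (529, 3.0),
    "D": (528, 2.1),
    "E": (507, 2.4),
}

print(nominal_rank("D", results), rank_range("D", results))  # 4 (2, 4)
```

Country D is 4th on its point estimate but, because it is not significantly below B or C, it could be as high as 2nd.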
Mean performances in OECD/PISA reading
[Figure: mean reading performance of Australia, Finland, Hong Kong-China, Canada, New Zealand and Korea in PISA 2000, 2003 and 2006 (score scale 500 to 560), annotated "Higher performers improved", "Higher performers declined" and "Lower performers improved".]
OECD (2007), PISA 2006: science competencies for tomorrow’s world, Vol. 1 - analysis, Fig. 6.21, p.319.
Changes for Finland, Canada & New Zealand are not significant.
Trends in Australian reading performances
[Figure: Australian reading performance at the 5th, 10th, 25th, 75th, 90th and 95th percentiles and the mean, PISA 2000, 2003 and 2006 (score scale 300 to 700).]
OECD (2007), PISA 2006: science competencies for tomorrow’s world, Vol. 1 - analysis, Fig. 6.21, p.319.
Observations
We have just mixed ranks and performance levels in educational assessment.
This is mixing norm-referenced and criterion-referenced assessment.
Question
Could we do this with Year 12 assessment?
Allowing for differences in question difficulty
In examinations offering a choice of questions, examiners may take question difficulty into account in marking. How well do they do this?
The 1996 New South Wales Grade 12 Geography examination gave candidates a choice of 32 combinations of three questions.
Taking statistical account of question difficulty, students with the same overall performance would have received around 4 fewer marks from the examiners if they had chosen the most difficult 3-question combination rather than the easiest 3-question combination.
McGaw (1997), Shaping their future: Recommendations for reform of the Higher School Certificate. Sydney: Department of Training and Education Co-ordination, Appendix C.
[Figure: expected score (0 to 65) against standardised achievement level (-2 to 2) for the hardest and easiest 3-question combinations, comparing the "result accounting for difficulty statistically" with the "results when examiners allow for difficulty differences".]
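One way to take "statistical account of question difficulty" is to model each mark as student ability plus question easiness and estimate both from the overlapping choices students make. The sketch below assumes that simple additive model (no measurement error) fitted by alternating least squares on hypothetical data; it is an illustration of the idea, not the analysis reported in Appendix C of McGaw (1997).

```python
from statistics import mean


def fit_additive_model(responses, n_iter=200):
    """Fit mark = ability[student] + easiness[question] by alternating least squares.

    responses: list of (student, question, mark) for the questions each student chose.
    Returns (ability, easiness); easiness is centred so it averages zero.
    """
    students = sorted({s for s, _, _ in responses})
    questions = sorted({q for _, q, _ in responses})
    ability = {s: 0.0 for s in students}
    easiness = {q: 0.0 for q in questions}
    for _ in range(n_iter):
        for s in students:
            ability[s] = mean(m - easiness[q] for s2, q, m in responses if s2 == s)
        for q in questions:
            easiness[q] = mean(m - ability[s] for s, q2, m in responses if q2 == q)
    # Only differences are identified: move the common shift into ability.
    shift = mean(easiness.values())
    easiness = {q: e - shift for q, e in easiness.items()}
    ability = {s: a + shift for s, a in ability.items()}
    return ability, easiness


# Hypothetical data: Q1 is 3 marks "easier" than average, Q4 is 3 marks "harder".
true_easiness = {"Q1": 3, "Q2": 1, "Q3": -1, "Q4": -3}
true_ability = {"s1": 10, "s2": 10, "s3": 12, "s4": 8, "s5": 9, "s6": 11}
choices = {
    "s1": ("Q1", "Q2"), "s2": ("Q3", "Q4"), "s3": ("Q1", "Q3"),
    "s4": ("Q2", "Q4"), "s5": ("Q1", "Q4"), "s6": ("Q2", "Q3"),
}
responses = [(s, q, true_ability[s] + true_easiness[q])
             for s, qs in choices.items() for q in qs]

ability, easiness = fit_additive_model(responses)


def raw_total(student):
    return sum(m for s, _, m in responses if s == student)


def adjusted_total(student):
    """Total after removing the estimated easiness of the questions chosen."""
    return sum(m - easiness[q] for s, q, m in responses if s == student)


print(raw_total("s1"), raw_total("s2"))  # 24 16: same ability, different choices
print(round(adjusted_total("s1"), 6), round(adjusted_total("s2"), 6))
```

Students s1 and s2 have the same ability, but the one who chose the easiest pair gains 8 raw marks; the adjusted totals remove that advantage. The method only works because choices overlap, so every question is linked to every other through shared students.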
Observation
We make some strange assumptions in examinations and examination marking when there is a choice of questions to answer.
We assume either that all questions are of equal difficulty or that markers can somehow adjust for the differences.
Question
Could we do better than this with Year 12 assessments?
Could we do so with school-based assessments?
Potential power of assessments…
Potential power of assessment
Assessment is a powerful educational tool
– Helps students see their own progress
– Enables teachers to monitor students and themselves
– Expresses what systems take to be important
– Can drive reform
When stakes are high, it can be counterproductive
– Driving attention to the narrow and measurable
– Causing others to ignore the important but unmeasurable
– Causing others to ignore the longer term
– Discouraging risk-taking
Assessment must be focused on what is important
– Debate about whether assessment is properly focused is… a debate about validity
Conceptions of validity
Content validity
– Face validity
– Curricular validity
Criterion-related validity
– Predictive: note problems of attenuation with selection
– Concurrent: e.g. paper and pencil as proxy for performance
Construct validity
– Convergent: similar to other tests of the same construct
– Discriminant: different from tests of other constructs
Consequential validity
– Not any adverse consequence
– Only adverse consequences due to test invalidity
– Does not mean unreasonable expectations must be satisfied, such as:
    having to tell teachers only what they don’t know
    providing the mechanism for improvement
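The "attenuation with selection" problem for predictive validity can be seen in a small simulation: if only the top scorers are admitted, the correlation between the predictor and later performance is computed on a restricted range and comes out lower than in the whole cohort. The sketch assumes a bivariate normal relationship with a hypothetical correlation of 0.6 and admission of the top 20%.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho=0.6, n=20000, selected_share=0.2, seed=1):
    """Correlation between predictor and outcome, before and after selection.

    Predictor x (e.g. a Year 12 score) and outcome y (e.g. later performance) are
    standard bivariate normal with correlation rho. Only the top `selected_share`
    on x are admitted, so in practice y would be observed only for them.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    cutoff = sorted(xs)[int((1 - selected_share) * n)]
    admitted = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    r_all = pearson(xs, ys)
    r_admitted = pearson([x for x, _ in admitted], [y for _, y in admitted])
    return r_all, r_admitted


r_all, r_admitted = simulate()
print(f"whole cohort r = {r_all:.2f}, admitted group r = {r_admitted:.2f}")
```

The predictor is no less valid for the admitted group; its correlation is simply depressed because the admitted students differ little on it.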
Distinguishing purposes of assessment
Purpose of educational assessment
E. F. Lindquist (Ed.) (1951). Educational Measurement
“the functions of educational measurement are concerned…with the facilitation of learning” (Cook, 1951).
“educational measurement is conceived, not as a process quite apart from instruction, but rather as an integral part of it” (Tyler, 1951).
Formative vs summative assessment
Distinguishing purposes of assessment
Formative
Frequent, interactive assessments of a student to identify learning needs and shape teaching
Barriers to widespread use:
– tension between classroom-based formative assessment and high-visibility summative assessments
– lack of connection between systemic, school and classroom approaches to assessment and evaluation
Summative
To provide summary assessments of a student’s achievement at a particular stage. The stakes attached to these assessments can vary:
– Annual reports to parents from schools: low stakes
– System-level cohort assessment: low stakes for students but high stakes for schools
– Public examinations: high stakes
Benefits of formative assessment
Paul Black & Dylan Wiliam (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy and Practice, 5, 7-74.
“Assessment becomes ‘formative assessment’ when evidence is actually used to adapt the teaching work to meet student needs.”
Formative assessment experiments produce effect sizes of 0.40 to 0.70, larger than those found for most educational interventions.
Many studies show that improved formative assessments help low achievers most.
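An effect size here is a standardised mean difference: the gain of the treated group over the comparison group, divided by a standard deviation. Studies in the Black and Wiliam review may standardise in different ways; this sketch assumes Cohen's d with a pooled sample standard deviation and uses hypothetical scores.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d(treatment, control):
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical end-of-unit scores, for illustration only.
control = [50, 55, 60, 65, 70]
treatment = [54, 59, 64, 69, 74]

d = cohens_d(treatment, control)
# Under a normal model: share of the control group scoring below the average treated student.
share_below = NormalDist().cdf(d)
print(f"d = {d:.2f}; average treated student outscores {share_below:.0%} of controls")
```

An effect size of about 0.5 means the average student in the formative-assessment condition would sit near the 69th percentile of the comparison group, assuming normally distributed scores.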
Aligning formative and summative assessment
Scope of alignment
– Basic: ensure policies do not conflict
– Sophisticated: formative and summative assessment reinforce each other
Strategies
– Ensure summative assessments measure key skills on which development is expected to occur
– Convince teachers that use of formative assessment will lead to better summative assessment results
– Encourage risk-taking in teachers as they explore better ways of assessing and teaching
– Broaden the basis for judging teachers to include, for example, students’ capacity to judge their own progress and (possibly) the progress of others, student motivation...
OECD (2005), Formative assessment: improving learning in secondary classrooms. Paris: Author.
Norm-referenced vs criterion-referenced assessment
Point of reference in measurement: external criteria to norms…
Point of reference for judging individuals
Using an independent external measure (psychophysics)
– Judgements of a phenomenon (e.g. brightness of light): requiring judgements of differences, not absolute values; comparing judgements with direct measures of the phenomenon; developing a scale of human judgement of the phenomenon
– Interest in the nature of human judgement, not the phenomenon
Using performance of others to judge individuals
– Psychological phenomena without an external measure, developed in the context of studies of individual differences
– Individual performances judged in relation to the performance of others, in particular the average performance of others (the norm)
Want to look better? Choose different company!
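"Choose different company" can be made concrete: under norm-referencing, the same performance earns a very different percentile rank depending on the comparison group. The groups below are hypothetical, and the percentile-rank definition (ties counted as half) is one common convention among several.

```python
def percentile_rank(score, group):
    """Percentage of the group scoring below `score`, counting ties as half."""
    below = sum(1 for s in group if s < score)
    equal = sum(1 for s in group if s == score)
    return 100 * (below + 0.5 * equal) / len(group)


# Two hypothetical comparison groups; the student's performance is identical.
strong_group = [55, 60, 65, 70, 75, 80, 85, 90]
weak_group = [30, 35, 40, 45, 50, 55, 60, 65]
score = 60

print(percentile_rank(score, strong_group))  # 18.75
print(percentile_rank(score, weak_group))    # 81.25
```

Nothing about the student's knowledge or skill changed between the two lines; only the norm group did.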
In search of external criteria…
In search of an external criterion
Criterion-referenced measurement
– Specify learning required (Glaser, 1963)
– Judge students against requirements (not each other)
– Criteria disaggregated, often to level of items
Glaser, R. (1963), Instructional technology and the measurement of learning outcomes: some questions. American Psychologist, 18, 519-521.
In search of well-defined external criteria…
In search of an external criterion
Criterion-referenced measurement
– Specify learning required (Glaser, 1963)
– Judge students against requirements (not each other)
– Criteria disaggregated, often to level of items
New psychometric models
– Simultaneous scale construction and measurement: locate tasks on a scale by difficulty; locate individuals on the same scale by performance; interpret performance with reference to tasks
Glaser, R. (1963), Instructional technology and the measurement of learning outcomes: some questions. American Psychologist, 18, 519-521.
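The Rasch model is one psychometric model that places tasks and individuals on the same scale: the chance of success depends only on the gap between a person's ability and a task's difficulty. The sketch assumes task difficulties are already known (hypothetical values in logits), estimates a person's ability by maximum likelihood with Newton-Raphson, and then interprets the result by listing tasks the person is more likely than not to succeed on; the 0.5 response-probability cut-off is an assumption, and assessment programs may use other conventions.

```python
import math


def rasch_p(theta, b):
    """Probability of success on a task of difficulty b for a person at theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability from 0/1 responses and known task difficulties.

    Newton-Raphson on the Rasch log-likelihood. Zero and perfect scores have
    no finite estimate, so they are rejected.
    """
    score = sum(responses)
    if score in (0, len(responses)):
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = score - sum(probs)                   # observed minus expected score
        information = sum(p * (1 - p) for p in probs)   # negative second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def tasks_likely_solved(theta, tasks, rp=0.5):
    """Tasks on which the success probability is at least rp."""
    return [name for name, b in tasks.items() if rasch_p(theta, b) >= rp]


# Hypothetical task difficulties (logits) on a common scale.
tasks = {"T1": -2.0, "T2": -1.0, "T3": 0.0, "T4": 1.0, "T5": 2.0}
theta = estimate_ability([1, 1, 1, 0, 0], list(tasks.values()))
print(round(theta, 2), tasks_likely_solved(theta, tasks))
```

The person's location is read against the tasks themselves ("likely to succeed on T1 to T3"), which is the criterion-referenced interpretation the slide describes.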
PISA 2003 mathematics item
Mei-Ling from Singapore was preparing to go to South Africa for 3 months as an exchange student. She needed to change some Singapore dollars (SGD) into South African rand (ZAR).
Q1: Mei-Ling found out that the exchange rate between Singapore dollars and South African rand was: 1 SGD = 4.2 ZAR. Mei-Ling changed 3000 Singapore dollars into South African rand at this exchange rate. How much money in South African rand did Mei-Ling get?
Q2: On returning to Singapore after 3 months, Mei-Ling had 3900 ZAR left. She changed this back to Singapore dollars, noting that the exchange rate had changed to: 1 SGD = 4.0 ZAR. How much money in Singapore dollars did Mei-Ling get?
Q3: During these 3 months the exchange rate had changed from 4.2 to 4.0 ZAR per SGD. Was it in Mei-Ling’s favour that the exchange rate now was 4.0 ZAR instead of 4.2 ZAR, when she changed her South African rand back to Singapore dollars? Give an explanation to support your answer.
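The arithmetic behind the three Exchange rate questions can be checked directly. The function names are illustrative; for Q3 the sketch converts the same 3900 ZAR at both rates to show why the change favoured Mei-Ling (each rand buys more Singapore dollars when fewer rand are needed per dollar).

```python
def sgd_to_zar(sgd, zar_per_sgd):
    """Singapore dollars -> South African rand."""
    return sgd * zar_per_sgd


def zar_to_sgd(zar, zar_per_sgd):
    """South African rand -> Singapore dollars."""
    return zar / zar_per_sgd


q1 = sgd_to_zar(3000, 4.2)          # rand received for 3000 SGD
q2 = zar_to_sgd(3900, 4.0)          # SGD received for 3900 ZAR at the new rate
at_old_rate = zar_to_sgd(3900, 4.2) # what the same rand would have fetched at 4.2
favourable = q2 > at_old_rate       # fewer rand per dollar means more dollars back

print(round(q1, 2), round(q2, 2), round(at_old_rate, 2), favourable)
```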
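[Figure: PISA mathematics proficiency scale with Levels 1 to 6 and below Level 1, level boundaries at 358, 420, 482, 544, 607 and 669 score points; item locations shown at 406, 439 and 586.]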
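Reading an item or student location against the level boundaries on this slide (358, 420, 482, 544, 607 and 669, as rounded there) is a simple lookup. The sketch assumes a score exactly on a boundary belongs to the higher level.

```python
import bisect

# Lower boundaries of Levels 1 to 6, as rounded on the slide.
CUT_POINTS = [358, 420, 482, 544, 607, 669]


def proficiency_level(score):
    """0 means below Level 1; a score exactly on a boundary goes to the higher level."""
    return bisect.bisect_right(CUT_POINTS, score)


for location in (406, 439, 586):
    print(location, proficiency_level(location))
```

The three item locations fall at Levels 1, 2 and 4, so the questions span much of the scale even though they share one context.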
Tapping science beliefs
Doig, B.A. & Adams, R.J. (1993), Tapping students' science beliefs: a resource for teaching and learning. Hawthorn Vic: Australian Council for Educational Research
Classification of responses
Content of response: score
condensation when air cools: 4
from atmosphere - no mechanism: 3
condensation - no atmosphere: 2
liquid on outside comes from inside: 1
liquid passed through sides of jug: 0
uninterpretable responses: 0
Percent of responses by type (Grade 5 / Grade 9)
condensation when air cools: 2 / 16
from atmosphere - no mechanism: 7 / 30
condensation - no atmosphere: 36 / 23
liquid on outside comes from inside: 23 / 15
liquid passed through sides of jug: 6 / 3
uninterpretable responses: 25 / 13
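Combining the response scores with the Grade 5 and Grade 9 percentages gives a rough summary of the shift between grades. This is a descriptive sketch only: it assumes the 0 to 4 scores can be averaged as if equally spaced, whereas the source goes on to locate the categories on a measured scale. The Grade 5 percentages sum to 99 (presumably rounding), so each mean divides by its column total.

```python
# Category scores and percentages (Grade 5, Grade 9) from the response tables.
SCORES = {
    "condensation when air cools": 4,
    "from atmosphere - no mechanism": 3,
    "condensation - no atmosphere": 2,
    "liquid on outside comes from inside": 1,
    "liquid passed through sides of jug": 0,
    "uninterpretable responses": 0,
}
PERCENT = {
    "condensation when air cools": (2, 16),
    "from atmosphere - no mechanism": (7, 30),
    "condensation - no atmosphere": (36, 23),
    "liquid on outside comes from inside": (23, 15),
    "liquid passed through sides of jug": (6, 3),
    "uninterpretable responses": (25, 13),
}


def mean_score(grade_index):
    """Average category score; grade_index 0 = Grade 5, 1 = Grade 9."""
    total = sum(p[grade_index] for p in PERCENT.values())
    weighted = sum(SCORES[c] * p[grade_index] for c, p in PERCENT.items())
    return weighted / total


print(f"Grade 5: {mean_score(0):.2f}, Grade 9: {mean_score(1):.2f}")
```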
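Category boundaries
[Figure: boundaries between adjacent response categories for the condensation item (uninterpretable responses; liquid passed through sides of jug; liquid on outside comes from inside; condensation - no atmosphere; from atmosphere - no mechanism; condensation when air cools), with boundary values shown at 58, 60, 61 and 63.]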
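For items scored in ordered categories, a category boundary can be defined as the point on the scale where two adjacent scores are equally likely. A partial credit model expresses this; whether it is the model Doig and Adams used is an assumption here, and the thresholds below are hypothetical logits, not the values shown on the slide.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Partial credit model: P(score = k), k = 0..len(thresholds), for ability theta.

    thresholds[j] is the boundary between scores j and j+1: at theta equal to it,
    the two adjacent scores are equally likely.
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds (logits) for a 0-4 item such as the condensation question.
thresholds = [-1.5, -0.5, 0.5, 1.5]
probs = pcm_probabilities(0.0, thresholds)
most_likely = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], "most likely score:", most_likely)
```

A student located between two boundaries is most likely to give the response in the category between them, which is what lets the categories be read as steps on one scale.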
Structure of matter scale (using further items)
[Figure: levels on the structure of matter scale, described as: particulate model of matter; knows processes not easily observed; identifies only observed components; simple observations and definitions; examples or 'magic' as explanation; notion of chemical reactions. Scale values shown: 56, 58, 60, 62 and 64.]
Map of selected PISA 2003 mathematics tasks
OECD (2004), Learning for tomorrow’s world: First results from PISA 2003, p.48.
[Figure: map of selected PISA 2003 mathematics tasks: Walking (score 1, score 2, score 3); Skateboard (score 1, score 2); Exchange rate Q1, Q2 and Q3 (score 1).]
Using school-based and external assessments…
Use of school achievement data
School-based assessments
– Can measure things not readily measured in examinations
– Can measure work requiring sustained effort
– Comparability across schools difficult to assure
Examinations
– Comparability across schools is assured
– Can measure only a limited range of things, given constraints of form and constraints of time
Combining the two
– Need to be measures of the same construct
– Bring to the same scale before adding
– Use normative or criterion information to ‘scale’ the two forms of assessment
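One way to "bring to the same scale before adding" is to use normative information from the examination: rescale a school's assessment marks so they keep the school's rank order but take the mean and spread of the same students' examination marks. The sketch below is a generic linear (mean and standard deviation) scaling with hypothetical marks and an assumed equal weighting of the two components; it is not necessarily the procedure any particular board uses.

```python
from statistics import mean, pstdev


def scale_to_exam(school_marks, exam_marks):
    """Linearly rescale school marks to the mean and spread of the same students' exam marks.

    The school's rank order is kept; its location and spread come from the exam.
    """
    sd_school = pstdev(school_marks)
    if sd_school == 0:
        raise ValueError("school marks have no spread to rescale")
    ratio = pstdev(exam_marks) / sd_school
    m_school, m_exam = mean(school_marks), mean(exam_marks)
    return [m_exam + (x - m_school) * ratio for x in school_marks]


# Hypothetical marks for four students in one school (same student order in both lists).
school = [90, 80, 70, 60]
exam = [60, 70, 50, 40]
scaled = scale_to_exam(school, exam)
combined = [(s + e) / 2 for s, e in zip(scaled, exam)]   # equal weighting assumed
print(scaled, combined)
```

The school's generous marks (mean 75) are brought down to the level its students demonstrated in the common examination (mean 55), while the school's judgement of who ranks above whom is preserved.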
Using criterion-referencing in public examinations…
Public examinations
High-stakes assessments based on the curriculum
– secondary certification and university entrance
– selection for highly competitive courses (top 1½%)
The comparability-over-time problem…
– Grade distributions used to monitor standards: failure rate used as a measure of ‘standards’; claim that if participation rates grow, grades should decline to ensure that an ‘A’ is still an ‘A’, etc.; do enough students fail?
Criterion (standard) and norm (cohort)-referencing
– ‘standards’ are never absent (in curriculum, examination)
– ‘standards’ are ignored in the norm-based award of results
– link items cannot be used over time because the whole test is made public
– marrying criterion and norm-referencing with judgments
Marrying criterion and norm-referencing
England
– use of criteria defined for some grade boundaries
– review of previous years’ scripts at grade boundaries
– reference to prior grade distributions
– reference to evidence of change in student cohort to justify shifts in grade distributions between years
Australia (New South Wales)
– development of band descriptors
– ‘consistent’ definition of bands over years
– reporting with norm and criterion-referencing
[Figure: sample NSW HSC course result. Standards referenced: bands describing what students know and can do, with the minimum standard expected at 50. Norm referenced: distribution of results (number of candidates) for all students over the mark range 0–100, marking the student’s overall mark, examination mark and school assessment mark.]
All HSC courses listed with: (School) Assessment Mark, Examination Mark, (Overall) HSC Mark, Performance Band
All Preliminary courses listed
How New South Wales got there…
Review and recommendations for change: New NSW Higher School Certificate
– McGaw (1997), Shaping their future: Recommendations for reform of the Higher School Certificate. Sydney: Department of Training and Education Co-ordination
Scaling process
– standards-referencing to curriculum and over time
– Bennett, J. (2001), Standards-setting and the NSW Higher School Certificate. www.boardofstudies.nsw.edu.au/manuals/pdf_doc/bennett.pdf
Developing grade descriptors
Used past examinations
– experienced examiners for each subject
– reviewed examination papers and students’ marked papers
Developed band descriptors
– described performance for Bands 6 to 2; the lowest band, Band 1, was not described
Using grade descriptors
Stage 1
– Examiners independently form an ‘image of the band’
– Set a cut mark for each band boundary on each question
Stage 2
– Examiners work together to reach agreement on boundary locations for bands on each question
– Boundary locations for total scores also established
Stage 3
– Student work at boundaries on total scores reviewed
– Cut points reviewed and determined
– Boundaries located on the mark scale: 5/6 boundary set to 90; 4/5 boundary set to 80; …; 1/2 boundary set to 50
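Once examiners have fixed the raw total marks that sit at each band boundary, those cut marks can be pinned to the reported scale (5/6 at 90 down to 1/2 at 50) and marks in between placed by interpolation. This sketch assumes linear interpolation between cut marks, assumes the elided 2/3 and 3/4 boundaries continue the 10-mark spacing (60 and 70), and uses hypothetical raw cut marks; the source does not say how marks between boundaries are placed.

```python
import bisect

# Reported-scale values at: raw 0, the 1/2, 2/3, 3/4, 4/5 and 5/6 boundaries, raw maximum.
# 60 and 70 are ASSUMED to continue the 10-mark spacing shown on the slide.
REPORTED_POINTS = [0, 50, 60, 70, 80, 90, 100]


def make_aligner(raw_cuts, raw_max):
    """Piecewise-linear map from raw marks to the reported 0-100 scale.

    raw_cuts: raw total marks (ascending) judged by examiners to sit at the
    1/2, 2/3, 3/4, 4/5 and 5/6 band boundaries (hypothetical values below).
    """
    raw_points = [0] + list(raw_cuts) + [raw_max]

    def align(raw):
        i = bisect.bisect_right(raw_points, raw) - 1
        i = min(max(i, 0), len(raw_points) - 2)   # clamp to a valid segment
        x0, x1 = raw_points[i], raw_points[i + 1]
        y0, y1 = REPORTED_POINTS[i], REPORTED_POINTS[i + 1]
        return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)

    return align


def band(reported):
    """Band 1 below 50, then one band per 10 marks, Band 6 from 90."""
    if reported < 50:
        return 1
    return min(6, 2 + int((reported - 50) // 10))


align = make_aligner(raw_cuts=[30, 42, 55, 68, 80], raw_max=100)
for raw in (25, 30, 61.5, 80, 95):
    print(raw, round(align(raw), 1), band(align(raw)))
```

Because the cut marks are set by judgement against the band descriptors each year, a reported 80 keeps the same standards-referenced meaning even if the raw mark needed to reach it changes with the difficulty of the paper.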
Back to the validity…
Conceptions of validity
Content validity
– Face validity
– Curricular validity
Criterion-related validity
– Predictive
– Concurrent: school-based and external (within schools)
Construct validity
– Convergent: similar to other tests of the same construct
– Discriminant: different from tests of other constructs
Consequential validity
And to the benefits…
Benefits of good assessment – for students
Makes goals clear to learners
Makes improvement clear
– Normative assessment offers only improvement in rank (thus improvement at the expense of others)
– Criterion (or standards) referenced assessment shows improvement in terms of knowledge and skills
Can teach learners how to monitor their own learning
– Key meta-cognitive capacity
– Builds the base for lifelong learning
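The contrast between rank improvement and improvement in knowledge and skills can be shown with a small hypothetical class in which every student gains 5 marks: ranks cannot move, while the count meeting a fixed standard does. The standard of 60 is an illustrative assumption.

```python
def ranks(scores):
    """Rank 1 = highest score (assumes no tied scores)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


STANDARD = 60   # hypothetical fixed standard

before = [48, 55, 62, 70, 81]
after = [s + 5 for s in before]   # every student improves by 5 marks

print(ranks(before), ranks(after))   # identical ranks
print(sum(s >= STANDARD for s in before), sum(s >= STANDARD for s in after))  # 3 4
```

Under norm-referencing a student can only move up by overtaking someone else; against a standard, everyone's progress is visible.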
Benefits of good assessment – for schools
Makes clear what is possible for learners like theirs
– Shows schools where they stand in a broad picture
– Shows where they stand with fair comparisons
Makes improvement clear
– Makes the school’s movement on a criterion scale clear
Buy-in will depend on:
– Data being relevant
– Information being helpful (and novel)
– Comparisons being fair
– Results being used productively (not necessarily protectively)
Benefits of good assessment – for systems
Establishes current status
Makes improvement clear
Permits richer reflection on the state of the system…