Peter Lenz, IBE Seminar, Warsaw, 20/10/2011
Transcript of the Peter Lenz IBE Seminar, Warsaw, 20/10/2011
A Language Assessment Kit – Relating to the CEFR –for French and English
Overview of the presentation
1. Context
2. Development
3. Product / Use
4. Looking back and forward / some thoughts
2001 – EYL: Launch of CEFR & ELP 15+ in CH
In 2001 the Swiss Conference of Cantonal Ministers of Education recommended to the cantons:
- to consider the CEFR in curricula (objectives and levels) and in the recognition of diplomas
- to facilitate wide use of the ELP 15+: make the ELP accessible to learners; help teachers to integrate the ELP in their teaching
- to develop ELPs for younger learners
Common European Framework of Reference… (CEFR)
A common reference for many foreign-language professionals: course providers, curriculum/syllabus developers, materials authors, teacher trainers, examination providers, etc.
A basis for the description of objectives, contents and methods. The CEFR isn't prescriptive but asks the right questions and favors certain answers…
An action-oriented approach and Reference levels
Means of description:
- Descriptors of communicative language activities
- Descriptors of "competences" (or "language resources", or qualitative aspects of language use)
Basic User: A1, A2; Independent User: B1, B2; Proficient User: C1, C2
The CEFR favors an action-oriented approach (language use in context). Main objectives relate to communicative language proficiency.
CEFR describes 6 reference levels: A1 through C2
Core elements of CEFR & ELP: scaled descriptors
I can deal with most situations likely to arise whilst travelling in an area where the language is spoken. I can enter unprepared into conversation on topics that are familiar, of personal interest or pertinent to everyday life (e.g. family, hobbies, work, travel and current events).
Proficiency or can-do descriptors
Core elements of CEFR: scaled descriptors
Consistently maintains a high degree of grammatical accuracy; errors are rare, difficult to spot and generally corrected when they do occur.
Descriptors of competences or qualitative aspects
The Concept of Illustrative Descriptors
Illustrative descriptors may be considered as spotlights illuminating small areas of competence/proficiency while other areas remain in the dark.
Descriptors outline and illustrate competence/proficiency levels but never define them exhaustively.
[Diagram: descriptors D1, D2, D3, D4 … D17 shown as spotlights across the skills Listening, Reading, Spoken Interaction, Spoken Production and Writing; example descriptor: "Can briefly give reasons and explanations for opinions, plans and actions."]
European Language Portfolios
In the hands of the learners: 3 parts, 2 main functions:
Parts: Language Passport, Language Biography, Dossier
Functions: documentation; facilitation of learning
From the ELP 15+ to an ELP for learners aged 11 to 15?
Teachers' wish list:
- More descriptors tailored to young learners' needs
- Less abstract formulations
- Self-assessment grid and checklists with finer levels
- Tools facilitating "hard" assessment (beyond an ELP's reach):
  - Test tasks relating to descriptors
  - Marked and assessed learner texts
  - Assessed spoken learner performances on video
  - Assessment criteria for Speaking (and Writing) relating to finer levels
The initiators
German-speaking cantons of Switzerland
Principality of Liechtenstein
The authorities‘ rationale
CEFR as a basis:
- further elaboration of reference levels
- assessment and self-assessment instruments building upon descriptors
- teacher-training material and early involvement of teachers, to prepare dissemination and introduction of the instruments in the school context
Promotion of the quality and effectiveness of school-based foreign-language teaching and learning by improving the quality, coherence and transparency of assessment
Overview of the presentation
1. Context
2. Development
3. Product / Use
4. Looking back and forward / some thoughts
Overview of expected products
Bank of validated test tasks (5 “skills”; C-tests)
Benchmark performances (Speaking, Writing)
Bank of target-group-specific descriptors (levels A1.1-B2.1)
Ready-made "diagnostic" test sets
Assessment criteria (Speaking, Writing)
(Self-)assessment grid & checklists
ELP 11-15
Developing a Descriptor Bank
Bank of target-group-specific descriptors (levels A1.1-B2.1)
Reduced but subdivided range of levels
How were the new can-do descriptors developed?
1) Collect from written sources (ELPs, textbooks, other sources)
Teachers decide on relevance for target learners and on suitability for assessment
Teachers complement collection
2) Validate, complement the collection in teacher workshops
3) Fine-tuning and selecting descriptors: make formulations unambiguous and accessible; add examples. Select descriptors that cover the whole range of levels A1.1-B2.1 and represent a wide range of skills and tasks.
~330 descriptors for the empirical phase
Development of the descriptors
Data collection: teachers assess their pupils, following Schneider & North's methodology for the CEFR
Development of the descriptors
Scaling: link and anchor assessment questionnaires of 50 descriptors each, for different levels:
- 2 parallel sets of descriptors of similar difficulty per assumed level
- identical descriptors as links (and sometimes CEFR anchors)
Too few learners at B2.
Development of the descriptors
Statistical analysis and scale-building (A1.1 - B1.2)
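The linked-and-anchored questionnaire design can be sketched in miniature. The toy below is illustrative only, not the project's actual analysis (which used Rasch scaling following Schneider & North's methodology): shared "link" descriptors appear on both questionnaire forms, so a mean shift over the links places the difficulty estimates from form B onto form A's scale. All descriptor names and endorsement rates are invented.

```python
from math import log

def logit(p):
    return log(p / (1 - p))

# Proportion of pupils for whom each descriptor was endorsed, per form
form_a = {"D1": 0.85, "D2": 0.60, "LINK1": 0.50, "LINK2": 0.30}
form_b = {"LINK1": 0.75, "LINK2": 0.55, "D3": 0.40, "D4": 0.20}

# Naive difficulty estimate: the less often a descriptor is endorsed,
# the more difficult it is (in logits)
diff_a = {d: -logit(p) for d, p in form_a.items()}
diff_b = {d: -logit(p) for d, p in form_b.items()}

# The shared link descriptors carry the scale: a mean shift over the
# links places form B's difficulties on form A's scale
links = sorted(set(diff_a) & set(diff_b))
shift = sum(diff_a[d] - diff_b[d] for d in links) / len(links)
diff_b_on_a = {d: b + shift for d, b in diff_b.items()}
```

After the shift, the two estimates for each link descriptor agree up to measurement noise, and all descriptors from both forms sit on one common difficulty scale.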
Self-assessment Grid and Checklists
Bank of target-group-specific descriptors (levels A1.1-B2.1)
(Self-)assessment grid & checklists
ELP 11-15
Reformulations: I can ...
1) Some Can-do's are transformed into I-can's. Classes use the descriptors for self-assessment and give feedback: can learners understand them?
2) The whole bank of Can-do's is transformed into I-can statements.
Self-assessment tools for the ELP
Overview of products
Bank of validated test tasks (5 “skills”; C-tests)
Bank of target-group-specific descriptors (levels A1.1-B2.1)
(Self-)assessment grid & checklists
ELP 11-15
Test Tasks
1) Test tasks relating to communicative language proficiency: Speaking tasks (production and interaction), Writing tasks, Listening tasks, Reading tasks. Test tasks correspond to (or operationalize) one or more descriptors.
2) C-Tests (integrative tests): C-Tests are a special type of cloze test. C-Tests are said to provide reliable information on a learner's linguistic resources, and they are quick.
All test tasks were field-tested and attributed to CEFR levels using pupils' self-assessment or teacher assessment (common-person equating).
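Common-person equating can be illustrated with a deliberately simplified sketch (a PROX-style approximation, not the project's actual procedure). Pupils whose abilities are already located on the CEFR-linked scale also answer the new task; because the persons are shared, the task's estimated difficulty lands on that same scale. All abilities, responses and level cut-offs below are invented.

```python
from math import log

def logit(p):
    return log(p / (1 - p))

# Each pupil has an ability already located on the (self-assessment-
# based) CEFR-linked scale, plus a right/wrong answer on the new task
pupils = [
    {"ability": -1.0, "correct": 0},
    {"ability": -0.2, "correct": 0},
    {"ability":  0.1, "correct": 1},
    {"ability":  0.6, "correct": 1},
    {"ability":  1.3, "correct": 1},
]

mean_ability = sum(p["ability"] for p in pupils) / len(pupils)
p_correct = sum(p["correct"] for p in pupils) / len(pupils)

# PROX-style approximation: the common persons carry the scale, so the
# task's difficulty comes out in the same CEFR-linked logits
task_difficulty = mean_ability - logit(p_correct)

# Map the difficulty onto (assumed, purely illustrative) level cut-offs
cutoffs = [(-1.0, "A1.1"), (0.0, "A2.1"), (1.0, "B1.1")]
level = "below A1.1"
for cut, lab in cutoffs:
    if task_difficulty >= cut:
        level = lab
```

With more pupils and several tasks per pupil, the same idea extends to a joint calibration; the toy keeps only the core logic of the equating step.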
Criteria and Benchmark Performances
Bank of validated test tasks (5 “skills”; C-tests)
Benchmark performances (Speaking, Writing)
Bank of target-group-specific descriptors (levels A1.1-B2.1)
Assessment criteria (Speaking, Writing)
(Self-)assessment grid & checklists
ELP 11-15
CEFR Table 3 – the point of departure
Consistently maintains a high degree of grammatical accuracy; errors are rare, difficult to spot and generally corrected when they do occur.
Descriptors of qualitative aspects of performance
Assessment criteria for Speaking
Where did the new qualitative criteria come from? – Steps taken:
1) Collect criteria: from various sources (CEFR, examination schemes, ...)
2) Generate & select criteria: teachers assess spoken performances
- Teachers bring video recordings
- Teachers describe differences between learner performances they can watch on video; criteria emerge
- Teachers select and apply descriptors from the existing collection
- Teachers agree on essential categories (e.g. Vocabulary Range, Pronunciation/Intonation) and agree on a scale for each analytical category
3) Prepare empirical validation (experts)
- Decide on categories of criteria to be retained
- Revise and complete the proposed scales of analytical criteria
- ... and produce performances to apply the criteria to
Phase IV: producing video recordings of spoken performances
One learner, different tasks in various settings; 10 learners of English, 11 learners of French
Validation of criteria for Speaking
Methodology
A total of 35 teachers (14 Fr, 21 En) apply:
- 58 analytical criteria (some from the CEFR) belonging to 5 categories
- 28 task-based can-do descriptors (matching the tasks performed)
to 10 or 11 video-taped learners per language, each performing 3-4 spoken tasks.
Analytical criteria categories: Interaction, Vocabulary range, Grammar, Fluency, Pronunciation & Intonation
Scaling the criteria for Speaking
Criteria and questionnaires – a linked and anchored design
Three assessment questionnaires for three different learner levels
“Statement applies to this pupil but s/he can do clearly better”
“Statement generally applies to this pupil”
“Statement doesn‘t apply to this pupil”
Links between questionnaires CEFR Anchors
Criteria for Speaking - analysis
Teacher severity and consistency
Consistency: 5 out of 35 raters were removed from the analysis due to misfit (infit mean square up to 2.39).
Severity: some extreme raters (severe or lenient) show a strong need for rater training, although every criterion makes a meaningful (but somewhat abstract) statement about mostly observable aspects of competence.
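To make "severity" and "consistency" concrete, the toy below computes a rater severity (deviation from the grand mean) and a crude fit statistic (each rater's mean squared residual, scaled by the overall residual variance). The real analysis used many-facet Rasch measurement with infit mean squares; this simplified stand-in and its ratings are invented.

```python
ratings = {            # rater -> one rating per learner (same order)
    "R1": [4, 3, 5, 2],
    "R2": [3, 2, 4, 1],   # uniformly harsher: severe but well-fitting
    "R3": [5, 1, 2, 5],   # erratic: should show a high fit statistic
}

n_learners = 4
learner_means = [sum(r[i] for r in ratings.values()) / len(ratings)
                 for i in range(n_learners)]
grand_mean = sum(learner_means) / n_learners

# Positive severity = rates below the grand mean (severe rater)
severity = {k: grand_mean - sum(v) / n_learners
            for k, v in ratings.items()}

def residuals(k):
    # what remains after removing learner level and rater severity
    return [ratings[k][i] - learner_means[i] + severity[k]
            for i in range(n_learners)]

all_sq = [e * e for k in ratings for e in residuals(k)]
overall_msq = sum(all_sq) / len(all_sq)
fit = {k: (sum(e * e for e in residuals(k)) / n_learners) / overall_msq
       for k in ratings}
```

A fit value near 1 means the rater's ratings vary about as much as expected; values well above 1 flag inconsistent (misfitting) raters, which is the pattern behind the removal of 5 raters mentioned above.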
Map for English
Criteria for Speaking – outcomes
Statistical analysis indicates that we have good-quality criteria which may be used to assess learners from A1.1 to B2.
Statistical analysis also indicates:
- which of the video-taped learners are the least or most able
- which raters (teachers) were severe or lenient
- which raters rated consistently or inconsistently
Useful findings for teacher training on the basis of these videos.
The assessment criteria for written performances were developed using a very similar methodology
Ready-made sets of test tasks
Bank of validated test tasks (5 “skills”; C-tests)
Benchmark performances (Speaking, Writing)
Bank of target-group-specific descriptors (levels A1.1-B2.1)
Ready-made "diagnostic" test sets
Assessment criteria (Speaking, Writing)
(Self-)assessment grid & checklists
ELP 11-15
Ready-made sets of test tasks
Ready-made, class-specific bundles of test tasks for Listening, Reading, Speaking and Writing
Information and advice for teachers regarding preparation, use and scoring/score interpretation
Overview of the presentation
1. Context
2. Development
3. Product / Use
4. Looking back and forward / some thoughts
The Kit: ring-binder and database
Limited, non-personal licence
Elements: Overview
Elements: Descriptors
Elements: Test tasks
Test tasks building upon descriptors
C-Tests
Elements: Benchmark performances
Example: Listening tasks
Example: Listening task
Example: Listening task
Instructions in German, the local L1
Example: Listening task
Interpretation of scores in relation to CEFR levels.
Answer key
Example: Spoken interaction task
For use by teachers and also by learners
Example: Spoken interaction task
For learner A
Example: Spoken interaction task
For learner B
Instructions for learner B
Example: Assessment of Spoken interaction
Profile and levels
Type 1 descriptors: quality of language use
Type 2 descriptors: can-do descriptors; resulting profile
Example: C-test
C-test texts are constructed according to a set of rules. A C-test consists of 4 or 5 texts of 20-25 blanks each.
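The construction rule can be sketched in code. The classic C-test recipe (after Klein-Braley and Raatz) leaves the first sentence intact and then deletes the second half of every second word; exact conventions for odd-length and one-letter words vary between projects, so the choices below are one plausible variant, not necessarily the kit's.

```python
def c_test(text, n_blanks=20):
    """Turn a short text into a C-test: first sentence intact, then
    the second half of every second word replaced by underscores."""
    sentences = text.split(". ")
    out = [sentences[0]]            # lead-in sentence stays intact
    count, blanks = 0, 0
    for sent in sentences[1:]:
        words = []
        for w in sent.split():
            count += 1
            if count % 2 == 0 and len(w) > 1 and blanks < n_blanks:
                keep = (len(w) + 1) // 2   # keep the first "half"
                words.append(w[:keep] + "_" * (len(w) - keep))
                blanks += 1
            else:
                words.append(w)
        out.append(" ".join(words))
    return ". ".join(out)

damaged = c_test("This is the intact lead-in. "
                 "Learners then restore every second word here.")
# -> "This is the intact lead-in. Learners th__ restore eve__ second wo__ here."
```

In practice four or five such texts of 20-25 blanks each are combined, which is what makes the C-test quick to administer yet informative about a learner's linguistic resources.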
Applications
What can instruments be used for?
Among other things …
- Illustrate expected language proficiency and competences (e.g. for pupils and parents)
- Help develop a sense of the (adapted) CEFR reference levels
- Develop self-assessment and planning skills
- Assemble level-related (proficiency) tests (or use ready-made sets)
- Establish learners' proficiency profile (self-assessment; tests)
- Check learners' readiness for external examinations
- Diagnose strengths and weaknesses with regard to different skills and competences, in order to focus on individual goals for a term
- …
Overview of the presentation
1. Context
2. Development
3. Product / Use
4. Looking back and forward / some thoughts
If I could start again…
Some food for thought and discussion
What reference framework would I use?
How close should it be to classroom teaching and learning?
• far: CEFR/theory-related?
• intermediate: curriculum- or syllabus-related?
• close: textbook-related?
If I could start again…
Some food for thought and discussion
What objectives would I focus on?
• language proficiency (can-do)?
• linguistic resources (vocabulary, grammar, phonology…)?
• ability to communicate across the language program / the curriculum as a whole?
• language awareness?
• (inter-)cultural skills and knowledge?
• …
If I could start again…
Some food for thought and discussion
What purposes would I try to meet?
• summative assessment? Including certification?
• formative assessment?
• diagnostic assessment? How fine-grained? Would explicit feedback be provided? If yes, to whom? Would repeated assessments lead to an individual roadmap or profile of learning progression?
• …
If I could start again…
Some food for thought and discussion
What roles would computers and the Internet play?
• Would pupils work online?
• What contributions could teachers make?
• Would assessment results be fed back into the system? If yes, by the teachers?
• Would the system provide diagnostics, profiling and feedback?
If you want to improve a product or monitor its quality, you need data. Answers entered online are a unique (and cheap) data source.
If I could start again…
Some food for thought and discussion
What would I try to improve with regard to craftsmanship and technical quality?
• What role should the L1 play in task construction?
• What effort needs to be made to have more validity evidence and a better understanding of the assessment instruments? A principled assessment design program? Combine assessment delivery and assessment research?
• …