Session Objectives

• Explain the principles of good assessment

• Evaluate the standards for good assessment

• Analyse the components of a test item

• Conduct qualitative item analysis

• Compare and contrast a range of assessment methods

• Evaluate a range of test items

• Compare and contrast marking systems

• Compare and contrast marking system for open response/performance-based items

• Identify common pitfalls in assessing student work

The need for good assessment

“In assessing students, we are making claims to know them in certain important ways”

Derek Rowntree

Our view of self – our self-concept – is greatly influenced by the feedback we receive from others. The grades we give students provide data for their interpretation of how intelligent or capable they are.

The Purposes of Assessment

There are many purposes of assessment:

• Selection and grading

• Maintaining standards

• Diagnosing learning difficulties

• Providing feedback to support the learning process

• As a source of information for evaluating the effectiveness of the teaching/learning strategy

You may note that these purposes are not necessarily complementary. For example, while grades and standards may be of keen interest to employers, they are unlikely to help students learn more effectively.

Summative and Formative Assessment

Summative Assessment
This refers to any assessment where final marks or grades are allocated to a learner’s performance. Typically, this is related to end-of-course examinations. However, all assessments that contribute to the overall assessment mark/grade are at some stage summatively assessed, in that the assessment decision is final – at least for that course.

Formative Assessment
This refers to assessment that is focused on supporting the learning process and providing clear and supportive feedback to learners – both in terms of identifying competency gaps and providing guidance for future learning.

To understand these differences in assessment focus, contrast having a driving lesson with taking the final test. During the lesson, the instructor will be assessing your performance and helping you to improve – there is no pass or fail – this is Formative Assessment. However, in the actual test, you either pass or fail – this is Summative Assessment.

Good Assessment

Unlike teaching, good assessment is a far less contested set of practices. While assessment may not be an ‘exact science’, there are well-constituted processes and procedures to ensure that the assessment system is as ‘good as it can be’:

• Valid

• Reliable

• Sufficient

• Authentic

• Fair

• Flexible

• Current

• Efficient

Principles of Good Assessment

These Principles of Assessment are key criteria to apply in the design and conduct of the assessment process, as well as in the development of assessment items and instruments.

Valid

This refers to a test’s capability to measure accurately what we intend to measure.

For example, a valid driving test is one in which driving skills (Performance) are measured in typical traffic (Conditions) against the criteria established by the Motoring Authority (Standard).

Reliable

This refers to the capability of a test to produce the same scores with different examiners (the persons scoring the test).

Examiner 1: Grade A | Examiner 2: Grade A | Examiner 3: Grade A
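To make this concrete, here is a minimal, hypothetical sketch of checking whether different examiners award the same grades to the same scripts. The examiner names, grades and simple agreement measure are illustrative assumptions, not a prescribed reliability statistic.

```python
# Hypothetical sketch: proportion of scripts on which all examiners agree.
# Grades and examiner labels are illustrative only.
grades = {
    "Examiner 1": ["A", "B", "A", "C", "A"],
    "Examiner 2": ["A", "B", "B", "C", "A"],
    "Examiner 3": ["A", "B", "A", "C", "A"],
}

def agreement_rate(grades_by_examiner):
    """Return the fraction of scripts given the same grade by every examiner."""
    columns = zip(*grades_by_examiner.values())  # one tuple of grades per script
    agreements = [len(set(script_grades)) == 1 for script_grades in columns]
    return sum(agreements) / len(agreements)

print(f"Exact agreement: {agreement_rate(grades):.0%}")  # e.g. 80%
```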

Sufficient

This refers to the important question: ‘How much assessment evidence do we need in order to feel confident that a student is competent in the area assessed?’

 

Authentic

Quite simply, this refers to how sure we are that the work produced has been done by the student

In an examination, we can be more confident of authenticity

However, with assignments done by students in their own time, authenticity becomes a concern

Fairness

Fairness relates to a number of considerations in assessment. However, they are all concerned with ensuring that learners, when being assessed, are provided with appropriate access to the assessment activities and are not unfairly discriminated against in the assessment process.

Unfair discrimination typically means discrimination based on criteria unrelated to the assessment activity itself, for example, gender or racial characteristics.

Fairness is a general concern throughout assessment, relating as much to providing learners with sufficient knowledge and time for assessment as to non-discriminatory processes in marking their work.

Flexible

Flexibility is concerned with the process of assessment, not the standard of the assessment.

Learners can display their learning in a range of ways (e.g., orally, in writing, by demonstration), provided the evidence is validly demonstrated.

Flexibility typically becomes a consideration for learners with special needs (e.g., visual/auditory impairment, second language) or atypical situations (e.g., sickness on exam day). The arrangements for flexibility are usually specified by exam boards.

Current

This refers to how recently the evidence was generated and whether it fully relates to the most up-to-date knowledge, skills and practices for the work function being assessed.

This consideration mainly comes into play when prior learning or achievement is part of the assessment evidence. It may need to be checked against the industry standard and any specific policy guidelines stated.

Efficient

Assessment can be time consuming.

It is important to:

• use methods (where possible) that enable assessment of a wider range of learning outcomes
• avoid over-assessing
• produce marking systems that reduce time
• encourage peer assessment – at the formative level


Assessment standards and criteria relate to three interrelated areas:

– A well constituted scheme of assessment (incorporating the principles of good assessment)

– An effective and efficient approach to conducting assessment (to ensure accurate judgement of learner performance)

– A means of providing feedback on assessment decisions (to support future learning)

Standards for Good Assessment

Produce and review a scheme of assessment (Assessment Plan)

• The scheme specifies the assessment methods to be used, their purpose, the marks to be allocated, and the timing of assessments

• The selected assessment methods are designed to incorporate the principles of good assessment (Validity, Reliability, Sufficiency, Authenticity, Fairness, etc)

• The assessment methods are well constructed and sufficiently varied

• The key aspects of the assessment scheme are explained to learners

• Opportunities are provided for learners to seek clarification on assessment requirements

• The scheme is reviewed at agreed times and updated as necessary.

Conduct Assessment (Judge and make decisions relating to the assessment evidence presented by learners)

• Learners are provided with clear access to assessment

• The assessment evidence is judged accurately against the agreed assessment criteria

• Only the criteria specified for the assessment are used to judge assessment evidence

• The assessment decisions are based on all relevant assessment evidence available

• Inconsistencies in assessment evidence are clarified and resolved

• The requirements to ensure authenticity are maintained

Providing feedback on assessment decisions

• The assessment decisions are promptly communicated to learners

• Feedback to learners is clear, constructive and seeks to promote future learning

• Learners are encouraged to seek clarification and advice

• The assessment decisions are appropriately recorded to meet verification requirements

• Records are legible, accurate, stored securely and promptly passed to the next stage of the recording/certification process.

Planning the overall assessment framework

There are certain key decisions that need to be borne in mind when planning the assessment framework:

• What is to be assessed and what marks weighting is to be allocated?

• What assessment methods are to be used?

• When is assessment to be conducted?

• Where and what resources are needed?

• How are assessment decisions communicated to students?

What is to be assessed and what marks weighting is to be allocated?

What is assessed, and the marks weighting, must directly reflect the learning outcomes for the module or unit.

A Table of Specifications is often used to document the main subject areas, general learning objectives and the weighting attached. From the table it is possible to directly identify what to assess and the weighting to be allocated.

Preparing A Table of Specifications

A Table of Specifications is a two-way chart that identifies the subject content, the types of learning outcomes and their relative weighting in the module. Preparing a Table of Specifications involves the following steps:

• Identify the learning outcomes and content areas to be measured
• Weight the learning outcomes and content areas in terms of their relative importance
• Build the table in accordance with these relative weights by distributing the test items proportionately among the relevant cells of the table.

The completed table provides a framework for systematically planning the amount and types of items to use
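As an illustration of the final step, the hedged sketch below distributes a fixed number of test items across topics in proportion to their weightings. The topic names, weights and item total are assumed values, not taken from any particular module.

```python
# Hypothetical sketch: distribute a fixed number of test items across
# content areas in proportion to their weights (all values are illustrative).
weights = {"Planning": 10, "Table of Specifications": 10,
           "Test items": 55, "Marking schemes": 25}   # percentage weightings
total_items = 40

allocation = {}
for topic, weight in weights.items():
    allocation[topic] = round(total_items * weight / 100)

# Crude correction so the rounded allocations still sum to total_items.
difference = total_items - sum(allocation.values())
last_topic = list(allocation)[-1]
allocation[last_topic] += difference

print(allocation)
# e.g. {'Planning': 4, 'Table of Specifications': 4, 'Test items': 22, 'Marking schemes': 10}
```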

What goes into an objective?

As an objective describes some Performance aspect of learning, it must contain both knowledge and cognitive processes.

For example, ‘State the year that England won the soccer World Cup’ contains specific factual knowledge – 1966 – and the cognitive process of memory.

A successful Performance would be recall (e.g., in written or oral form) of the date

What goes into a test item?

Any type of test item must include:

1. The subject content on which the item is based:

– Facts, Concepts, Principles and Procedures

2. The type of cognitive behaviour needed to respond appropriately:

– Memory

– Types of Thinking (e.g., analysis, comparison & contrast, inference, evaluation, metacognition)

It may also include:

3. Other generic skills and affective components:
– Communication, Teamwork, Attitudes

Using a Taxonomy for writing objectives at different levels of cognitive complexity

Many educational institutions use Bloom’s taxonomy.

Potential uses:

Clarifying what the desired learning entails

Understanding the integral relationship between the knowledge and cognitive processes inherent in the objectives

Planning teaching strategies and assessments calibrated to the objectives

Designing the Objective

The completed learning outcome will combine...

Knowledge Dimension: Facts, Concepts, Principles, Procedures

Cognitive Processes: Memorize, Analyze, Compare & contrast, Infer and interpret, Evaluate, Create

For example: Compare and contrast values & ethics

Characteristics of useful specific learning outcomes

1. Performance (what the learner is to be able to do)

2. Conditions (important conditions under which the performance is expected)

3. Criterion (the quality or level of performance that will be considered acceptable)

Though it is not always necessary to include the second characteristic and not always practical to include the third, the more you can say about them, the better the objective will communicate its intent

Performance

“The most important and indispensable characteristic of a useful objective is that it describes the kind of performance that will be accepted as evidence that the learner has mastered the objective” (p. 24)

R. F. Mager, 1984, ‘Preparing Instructional Objectives’

Note: A performance may be either Overt or Covert.

Overt refers to any kind of performance that can be observed directly (e.g., operating a software programme to produce 3D drawings from a set of given specifications).

Covert performances cannot be observed directly as they are cognitive, internal and invisible (e.g., thinking, adding). It is important to be able to identify the behavioural indicators that enable a valid inference of measuring this performance (e.g., thinking can be inferred from what a learner writes, speaks or does in a specific problem-solving activity).

Conditions

Some key questions in identifying conditions:

• What will the learner be allowed to use?
• What will the learner be denied?
• Under what conditions will you expect the desired performance to occur?
• Are there any skills that you are NOT trying to develop? Does the objective exclude such skills?

NB. Don’t add conditions if you don’t need them (e.g., the desired performance is perfectly clear)

Criterion

• The criterion is the standard by which performance is evaluated, the yardstick by which achievement is assessed

For example: Initiate a fire alarm at SP according to the standard operating procedures in less than 1 minute.

• NB – Occasionally conditions and criterion blend together, but this is not a problem providing the intent is clear. Also, as with conditions, if the criterion is not useful, don’t include it.

Adding Conditions & Criteria to Performance

For:
• Adds clarity and precision to the intended learning
• Enables more accuracy and reliability in assessment

Against:
• Can become cumbersome
• May lead to a focus on basic knowledge and skills that are easy to measure
• Reduces flexibility – without the conditions and standard, an objective can be used in many contexts

What range of student performances confirms that outcomes have been met?

Typically these performances demonstrate that the student can actually do what is clearly identified in the outcomes. This may refer to:

• Accurately recalling specific knowledge that has been acquired (e.g., effectively memorized)

• Displaying understanding of concepts, principles, and procedures by being able to explain their connectedness and applications in a range of situations (e.g., transfer). This typically results from the application of good thinking to the various knowledge components involved.

• Showing competence in specific skills that apply knowledge bases and skill sets in real-world applications (e.g., testing a circuit, producing a report, displaying teamwork).

Table of Specifications for this workshop

Topics (Abilities: M / U / D / Total)

A. Key planning considerations in conducting assessment – 5 / 5 / 0 / 10
B. Preparing a Table of Specifications – 3 / 7 / 0 / 10
C. Types of test items – 5 / 20 / 30 / 55
D. Preparing a marking scheme – 5 / 15 / 5 / 25

Total – 18 / 47 / 35 / 100

Qualitative Item Analysis

A non-numerical method for analysing test items that checks for:

• Content Validity (the degree to which a test item measures an intended content area)

• Construct Validity (the degree to which a test measures intended mental operations, e.g., recall, types of thinking, application)

• Item Design Quality (the degree to which a test has technically well designed items)

 

What assessment methods are to be used?

Assessment is not an exact science. All methods have limitations in terms of the measurement of human capability rendered. The following are key questions to ask in designing and using methods of assessment:

• Do they accurately measure identified learning outcomes?
• Is a sufficient range employed to encourage learner motivation and enable learners to display competence in different ways?
• Are they fostering (wherever possible) an understanding of the key concepts, principles and procedures of the subject matter?
• Do they make cost-effective use of time in generating sufficiency of evidence to infer competence?
• Do they provide fair assessment situations for learners?
• Are they systematically organized into an effective and balanced assessment scheme?

When is assessment to be conducted?

The major considerations typically revolve around how much assessment should be conducted at the end of the programme (terminal assessment) and how much over the duration of the programme (continuous assessment). Continuous assessment captures a more representative picture of student performance and spreads the assessment load. Terminal assessment creates more assessment pressure and perhaps pushes the student to learn more at one given point in time.

Other important questions are:

• Have students had sufficient opportunity to internalise this learning and to demonstrate the necessary competence?

• What other commitments do students have at the time of assessment – do these create unnecessary or unrealistic burdens?

Where and what resources are needed for conducting assessment?

These are ‘nuts and bolts’ planning decisions, but very important ones. It is important to book the necessary rooms and ensure that appropriate resources are available for the type of assessment to be conducted. This is especially the case when laboratory equipment needs to be prepared, etc.

It may also be necessary to arrange sufficient supervising staff to ensure the smooth running of the assessment.

How are assessment decisions communicated to students?

Assessment decisions are not simply grades on a piece of paper, but represent judgements of worth. Many students are likely to internalise the assessments we make of them.

In giving feedback to learners, it is important that learners are provided with:

• A clear explanation for the assessment decision made. Students need positive reinforcement for what has been achieved, but they also need to know what they have not demonstrated in the assessment and why it is important

• Constructive guidance on what learning needs to be developed, and how this might be achieved, in order to develop necessary competencies presently lacking or not sufficiently established

Types of Assessment Item

Assessment items are the ‘nuts and bolts’ of any assessment strategy. They are what we get learners to do in order for them to show us that they are competent in the areas assessed. Basically, assessment items can be seen in terms of two broad categories:

• Fixed response (Objective Tests): where the student chooses an answer from the limited options provided (True-False Items; Multiple-Choice Items; Matching Items; Completion Items; Interpretive Exercises)

• Open response (Essay-Type): where the student, to varying degrees, constructs and supplies the answer

Selecting Items

In selecting the type of assessment items, the guiding principle is: use the item types that provide the most direct measures of student performance specified by the intended learning outcomes.

From this principle, the following rules typically apply:

• Skills are best tested by performance tests (the learner performs the tasks described in the objective under real or simulated conditions)
• Knowledge is best tested by written or oral tests
• Attitudes are best tested by observations of performance in a range of situations and over time.

True-False Items

The true-false question consists of a statement with which the student can register agreement or disagreement.

Examples:

1. Formative assessment is primarily concerned with allocating grades T F

2. Objective tests are more reliable than essay-type questions T F

3. Assessment is an exact science T F

4. True-false test items can validly assess types of thinking T F

5. Moderation increases the reliability of assessment T F

Uses and Limitations of True-False Items

Uses
• Easy to construct, administer and score
• Learner response is simple – requiring only the identification of the statements as true or false
• Validly assesses learning outcomes at the level of knowledge acquisition and basic comprehension

Limitations
• Possibility of learners getting one-half of the questions correct by chance
• Not valid for assessing deeper understanding of the content or application
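The guessing limitation can be quantified with a short, illustrative calculation: assuming purely random answers on two-option items, the expected score is half the items, and the binomial tail gives the chance of reaching a pass mark. The test length and pass mark below are assumptions for the sketch, not values from the slides.

```python
# Hypothetical sketch: expected score and chance of passing a true-false
# test by random guessing (test length and pass mark are illustrative).
from math import comb

n_items = 20
p_correct = 0.5          # two options, so a 50% chance per item when guessing
pass_mark = 12           # e.g. a 60% pass mark

expected_score = n_items * p_correct                     # 10 items
p_pass = sum(comb(n_items, k) * p_correct**k * (1 - p_correct)**(n_items - k)
             for k in range(pass_mark, n_items + 1))     # binomial tail

print(f"Expected score by guessing: {expected_score:.0f}/{n_items}")
print(f"Probability of reaching {pass_mark}+ by guessing: {p_pass:.1%}")
```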

Multiple-Choice Items (MCQs)

MCQs often vary in format and structure, but essentially provide the student with a question presented in the stem and a choice of response answers (one correct answer – the key – and typically 3 wrong answers, or distracters). The learner must select his/her answer from the options provided. There are 5 basic formats for the construction of MCQs:

1. A premise (p) presented in the stem is followed by a particular consequence (c) in one of the options
2. Two or more premises in combination lead to a particular conclusion
3. Two propositions are presented in the stem. The learner must decide whether: both are true; neither is true; (a) but not (b) is true; (b) but not (a) is true
4. Classification of terms, names, statements
5. A set of information is presented as a stimulus (e.g., a written scenario, graph, table, article). MCQs are then based on the information presented.

1. If test scores for a group of students remain constant over time, we can conclude that:

a) validity is increasing
b) reliability is increasing
c) verification is decreasing
d) representativeness is decreasing.

2. If pass rates for a module have progressively deteriorated over the past 3 years, and there is no evidence of change in terms of student or staff cohort, we are most likely to conclude that:

a) student attitudes to work had deteriorated
b) lecturers are assessing more stringently
c) examinations had increased in difficulty
d) teaching quality had deteriorated.

3. Structured questions are best classified as:

a) student response items
b) objective test items
c) multiple choice items
d) criteria referenced items

4. The NVQ National Framework is composed of elements with performance criteria. Performance criteria are examples of (a) norm-referencing; (b) competency-based assessment?

a) (a) but not (b)
b) (b) but not (a)
c) both (a) and (b)
d) neither (a) nor (b)

5. Table 1 shows the number of student responses to each test item in an examination paper and the number scoring 60% or over.

Table 1

Question no.   No. of responses   No. scoring 60%+
1              11                 7
2              30                 8
3              16                 10
4              27                 18

From the data presented in Table 1, the most likely inference is:
a) students had done well overall
b) some questions were more confusing than others
c) certain topic areas had been studied in more detail
d) students had done poorly overall.
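A brief, hypothetical sketch of how the Table 1 data might be tabulated to support such an inference: compute each question’s success rate and flag unusually low ones (the 50% flag threshold is an arbitrary illustrative choice, not part of the exercise).

```python
# Hypothetical sketch: success rate per question from Table 1.
# The 50% flag threshold is an arbitrary illustrative choice.
table_1 = {1: (11, 7), 2: (30, 8), 3: (16, 10), 4: (27, 18)}  # q: (responses, scoring 60%+)

for question, (responses, scored_60_plus) in table_1.items():
    rate = scored_60_plus / responses
    flag = "  <- worth reviewing" if rate < 0.5 else ""
    print(f"Q{question}: {scored_60_plus}/{responses} scored 60%+ ({rate:.0%}){flag}")
```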

Uses and Limitations of MCQs

Uses
• If well designed, they provide an effective method for assessing a wide range of learning outcomes, from knowledge acquisition to types of thinking
• They are easy to administer and score
• They are more reliable than true-false items

Limitations
• They can be difficult and time-consuming to design well, particularly at application level
• They are not particularly valid for assessing:
– skill applications, whether technical or human
– communications
– complex activities that require a number of interrelated abilities and skills
– attitudes, dispositions
– creativity

Matching Items

Matching items could be considered as a group of multiple-choice items combined together, but having a single set of responses. Typically, a range of options is presented in one column, which need to be correctly matched with options in a second column. Example below:

Column A contains a list of characteristics of test items. On the line to the left of each statement, write the letter of the test item in Column B that best fits the statement. Each response in Column B may be used once, more than once, or not at all.

Column A
__ 1. Least useful for educational diagnosis.
__ 2. Measures greatest variety of learning outcomes.
__ 3. Most difficult to score objectively.
__ 4. Provides the highest score by guessing.

Column B
A. Multiple-choice items.
B. True-false items.
C. Short-answer items.

Uses and Limitations of Matching Items

Uses
• It is possible to measure a large amount of related factual material in a relatively short time

Limitations
• Restricted to the measurement of factual information based on rote learning
• Susceptible to the presence of irrelevant clues

Completion Items

This form of test is commonly used to measure familiarity with rules, procedures, formulas, etc., by requiring the learner to complete a statement in which a critical or unique element is missing.

Examples below:

Q.1 When a test item truly measures what it is intended to measure, we can say that the item is __________

Q.2 When assessment focuses on the development of learning, we are likely to refer to such assessment as __________

Q.3 The process for ensuring the quality of assessment practice is referred to as __________

Uses and Limitations of Completion Items

Uses
• Easy to construct
• Very effective for assessing recall of information, basic understanding and certain mathematical skills, such as formula application and computation
• Reduces the chances of guessing as compared to other objective tests

Limitations
• Can be difficult to mark. Answers may not be exactly the words expected, though they may be partially correct
• Not particularly valid for assessing application and types of thinking

Interpretive Exercises

An interpretive exercise consists of a series of objective items (MCQs and/or True-False) based on a common set of data (e.g., written materials, graphs, charts, tables, maps or pictures). This format allows for a greater range and flexibility in measuring more complex learning outcomes. These include the ability to:

• Analyse relationships between parts and systems

• Make inferences and interpretations from various sources of information

• Compare and contrast different options and scenarios

• Evaluate alternatives and make decisions

Choosing the fixed-response item to use

The multiple-choice item provides the most generally useful format for measuring achievement at various levels of learning. However, there are occasions when other formats are more suitable, for example:

• When there are only two possible alternatives, a shift can be made to a true-false item

• When there are a number of similar factors to be related, a shift can be made to a matching-item

• When the items are to measure types of thinking and other complex outcomes, a shift can be made to the interpretive exercise

Open-Response (essay type) items

The main feature of open-response items is that the student supplies, rather than selects, the correct answer. In broad terms there are two main types of open-response item:

Restricted Response Items
These are shorter and more focused essay items. They require the student to apply less content knowledge and are more specific in the cognitive abilities that are involved.

Extended Response Items
In extended response items the student has a high degree of freedom in the response to the essay item. This type of essay item is typically used to assess a range, as well as a high level, of cognitive abilities.

Restricted Response

1. List the major similarities and differences between multiple-choice and performance tests. Focus specifically on validity and reliability. Limit your answer to one page. (8 marks)

2. Explain what is meant by verification. Identify 3 ways in which it can promote quality in assessment practice. (10 marks)

Extended Response

1. Compare and contrast 3 different assessment methods that you think are appropriate for assessing thinking in your module. Explain and illustrate the advantages and limitations of each method. (20 marks)

2. There are many purposes for assessment. Evaluate the extent to which these different purposes support the learning process. (25 marks)

Distinctions between these two types of essay item are ones of degree, and their use depends on the main purpose of assessment in terms of the identified learning outcomes.

Uses and Limitations of Essay Type Items

Uses
Relatively easy to construct and provide a means to assess a wide range of higher-level cognitive abilities:
• analyze relationships
• compare and contrast options
• identify assumptions in a position taken
• explain cause-and-effect relations
• make predictions
• organize data to support a viewpoint
• point out advantages and disadvantages of an option
• integrate data from several sources
• evaluate the quality or worth of an item, product, or action

Limitations
There are four main limitations of essay-type items:
• It is difficult to establish marking criteria that can be applied consistently by markers. Subjectivity is a major concern with essay-type items – hence reliability is lessened
• Writing skill influences the scoring. Skilful bluffing can raise scores; errors in grammar and spelling typically lower scores
• Fewer essay items can be used as compared to objective tests – therefore there is more limited sampling across the range of learning outcomes
• They take a long time to mark.

General Design Considerations

• Clearly identify the key knowledge areas (concepts, principles and procedures) and cognitive abilities (e.g., analysis, inference and interpretation, evaluation) you want the learner to be able to demonstrate before you write the question

• Write the question in a way that clearly and unambiguously defines the task to the student.

• Cue the main cognitive abilities you want students to demonstrate in the wording of the essay. For example, compare and contrast, give reasons for, predict, etc.

Consider these guidelines in relation to the following examples:

Poor item: Discuss the value of performance-based tests

Better version: Evaluate the usefulness and limitations of using performance tests in a module you teach. Identify one topic area where you think such tests provide the most valid assessment of student learning. Explain and illustrate why this topic area is most validly assessed by performance tests.

What is Performance-Based Assessment?

Performance tests are the most authentic form of assessment as they measure direct competence in real world situations. A performance test is one that directly measures identified learning, focusing on the actual competence displayed in the performance. A driving test is a typical example of a performance test, where the examinee is tested on real driving performance in context, i.e. on the road.

Examples of Performance-Based Assessment

In educational testing, authenticity or realism is usually a matter of degree. However, there are many ways in which assessment tasks can achieve a reasonable degree of realism – for example:

• Real work projects and tasks
• Simulations
• Problem solving through case studies
• Presentations
• Any activity that largely models what would be done by professionals in the world of work

Example 1: Design A Food Package

Select a food product and design the packaging that you think will give it the best marketability. You must be able to identify the product attributes, protection and enhancement needed to satisfy the functional and marketing requirements, and use suitable packaging material(s) and package type. The work produced should reflect the quality of your thinking in the following areas:

• identify the criteria for evaluating the marketability of a product
• analyze the components of a product that constitute an effective design
• generate new ways of viewing a product design beyond existing standard forms
• predict potential clients’ responses to the product given the information you have
• monitor the development of the group’s progress and revise strategy where necessary

Example 2: Design and conduct a small experiment to test the Halo Effect

In groups of 3-4, design and conduct a small experiment to test the Halo Effect in person perception. You may choose the particular focus for the experiment, but it must:

• Clearly test the Halo Effect in person perception
• Be viable in terms of accessing relevant data
• Meet ethical standards in conducting experiments with persons
• Follow an established method and procedure
• Produce results that support or refute the hypothesis

Once completed, the experiment should be written up in an appropriate format of approximately 2000 words. It should document the important stages of the experiment and compare and contrast the data found with existing findings on the Halo Effect.

Steps in designing performance-based items

Step 1: Identify the knowledge, skills (and attitudes if relevant) to be incorporated into the task

For this step it is important to:

• Choose specific topic areas in your curriculum that contain knowledge essential for key understanding of the subject (e.g., key concepts, principles, etc)
• Identify the types of thinking that are important for promoting student understanding and subsequent competence in these topic areas (e.g., analysis, comparison and contrast, inference and interpretation, evaluation, generating possibilities, metacognition)
• Identify other attributes (e.g., attitudes/habits of mind) that may be relevant to effective learning in the task (e.g., perseverance, time-management, etc)

Steps in designing performance-based items

Step 2: Produce the learning task

It is important that the task:

• Clearly involves the application of the knowledge, skills and attitudes identified in Step 1
• Is sufficiently challenging, but realistically achievable in terms of students’ prior competence, access to resources, and the time frames allocated
• Allows more than one correct answer, or more than one correct way of achieving the correct answer
• Is accompanied by clear notes of guidance, which:
– Identify the products of the task and what formats of presentation are acceptable (e.g. written report, oral presentation, portfolio, etc)
– Specify the parameters of the activity (e.g. time, length, areas to incorporate, individual/collaborative, how much choice is permitted, support provided, etc)
– Cue the types of thinking and other desired process skills
– Spell out all aspects of the assessment process and criteria.

Performance-Based Assessment: Pluses and Minuses

PLUS
• Measures a range of complex skills and processes in real-world or authentically simulated contexts
• Enables assessment of both the effectiveness of the process and the product resulting from performance of a task
• Links clearly with learning and instruction in a planned developmental manner
• Motivates students through meaningful and challenging activities

MINUS
• More time consuming than paper-and-pencil type assessment
• Where courses focus on underpinning knowledge, there is less opportunity for performance-based assessment
• As these items often involve professional judgement, there is always the problem of subjectivity in marking

Evaluating test items

• Has each item’s subject content been verified (matched to learning outcomes)? Yes No
• Have you identified the cognitive response behaviour (types of thinking involved)? Yes No
• Has the correct answer been identified or an appropriate marking format created? Yes No
• Have you followed the item-writing advice for each type of item? Yes No
• Have you edited the items for clarity, bias or insensitivity? Yes No
• Have you piloted the items? Yes No

The importance of a valid marking scheme

Having a well-designed and accurate marking scheme and scoring system for the assessments that learners complete is essential to the assessment process.

For fixed response items this is a simple process of tabulating the number of correct scores on test items. These can then be converted into grades if necessary.

However, for essay-type/open response items, there needs to be a clear and well-constructed marking scheme – especially for extended response items.

Key planning considerations in producing a marking scheme

Decide on what exactly is to be assessed from the item – the Performance Areas. These must reflect the learning objectives for the module.

Decide on the Performance Criteria for each of the performance areas. These are the key operations or elements that underpin competence in each of the performance areas.

Decide on the Marks Weighting for each performance area. This must reflect the table of specifications in the module document

Decide on the sources of Performance Evidence to be used in assessing the item (e.g., written, oral, products, observation, questioning)

Decide on the Format for the marking scheme – typically a Checklist or Rating Scale/Scoring Rubric

Decide on the format on the basis of whether the item involves High or Low Inference:

• Low inference items are those where the performances being tested are clearly visible and there is a widely established correct answer (e.g., conducting a fire drill, setting up an experiment). Here a Checklist is most appropriate

• High inference items involve performances that are less directly visible and/or more open to subjective judgement (e.g., creative writing, managing a team). Here a rating scale/scoring rubric is most appropriate

A major challenge to test design is to produce tasks that require low inference scoring systems. Unfortunately, many worthwhile student outcomes reflecting higher order thinking lend themselves more to high inference scoring.

Developing a checklist

• Identify the important components – procedures, processes or operations – in an assessment activity
– for example, in conducting an experiment one important operation is likely to be the generation of a viable hypothesis
• For each component, write a statement that identifies competent performance for this procedure, process or operation
– in the above example, the following may be pertinent: ‘A clear viable hypothesis is described’
• Allocate a mark distribution for each component – if appropriate
– this is likely to reflect its importance or level of complexity

Note: Checklists are most useful for low inference items – where the performance evidence is clearly agreed and there is little disagreement relating to effective or ineffective performance (e.g., observable steps)
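As a rough illustration only, a checklist marking scheme might be held and totalled as below; the component statements and mark values are assumptions for the sketch, not prescribed weightings.

```python
# Hypothetical sketch: a checklist marking scheme as a list of
# (statement, marks) pairs; marks are awarded only for components judged met.
checklist = [
    ("A clear viable hypothesis is described", 3),
    ("The method/procedure is appropriate", 4),
    ("Findings are clearly collated and presented", 3),
]

def score(checklist, components_met):
    """Sum the marks for every checklist statement the assessor judged met."""
    return sum(marks for statement, marks in checklist if statement in components_met)

met = {"A clear viable hypothesis is described",
       "Findings are clearly collated and presented"}
print(score(checklist, met), "/", sum(m for _, m in checklist))  # 6 / 10
```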

Assessment checklist for Assignment 1: Design and conduct a small experiment to test the Halo Effect

Performance Areas:

1. The context of the experiment is accurately described
2. A clear viable hypothesis is presented
3. The method/procedure is appropriate
4. There is no infringement on persons
5. Findings are clearly collated and presented
6. Valid inferences and interpretations are drawn from the data and comparison is made with existing data
7. The write-up of the experiment meets required conventions

The allocation of marks for each performance area will reflect the weighting allocated in the Table of Specifications

Developing a rating scale/scoring rubric

• Define the performance areas for an assessment
– for example, ‘Valid inferences and interpretations are drawn from the data and comparison is made with existing data’
• Identify the key constructs/elements that underpin competence for each performance area
– using the above example: inference and interpretation; comparison and contrast
• Write a concise description of performance at a range of levels from very good to very poor
– for example, 5 = very good; 1 = very poor

Note: Rating Scales/Scoring Rubrics are most useful for high inference items – where the performance evidence requires considerable professional judgement in making an assessment decision

Scoring Rubric
Performance area: Valid inferences and interpretations are drawn from the data and comparison is made with existing data

Score  Description
5      All valid inferences have been derived from the data. Interpretations are consistently logical given the data obtained. All essential similarities and differences are identified between this data and existing data. The significance of these similarities and differences is fully emphasized.
4      Most of the valid inferences have been derived from the data. Interpretations are mainly logical given the data obtained. Most of the essential similarities and differences are identified between this data and existing data. The main significance of these similarities and differences is emphasized.
3      Some valid inferences have been derived from the data. Some logical interpretations are made from the data obtained. Some essential similarities and differences are identified between this data and existing data. The significance of these similarities and differences is only partly established.
2      Few valid inferences derived and limited interpretation of findings. Comparison and contrast with existing data is partial and the significance is not established.
1      Failure to make valid inferences and interpretations.
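One possible way to keep such a rubric consistent across markers is to store the level descriptors alongside the scores and convert the awarded level into a weighted mark. The sketch below encodes an abridged version of the rubric above; the 10-mark area weighting is an assumption for illustration only.

```python
# Hypothetical sketch: an abridged scoring rubric for one performance area,
# mapping a 1-5 level to a weighted mark (the weighting value is illustrative).
rubric = {
    5: "All valid inferences drawn; comparison with existing data fully emphasized",
    4: "Most valid inferences drawn; main similarities and differences emphasized",
    3: "Some valid inferences; significance only partly established",
    2: "Few valid inferences; comparison with existing data partial",
    1: "Failure to make valid inferences and interpretations",
}
area_weighting = 10  # marks available for this performance area (assumed)

def mark(level, weighting=area_weighting):
    """Convert a rubric level into marks, scaled to the area's weighting."""
    return round(level / max(rubric) * weighting)

level_awarded = 4
print(rubric[level_awarded])
print(f"Marks: {mark(level_awarded)}/{area_weighting}")  # Marks: 8/10
```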

Key checks for your scoring system

Irrespective of the format chosen, the system must:

• incorporate the most important performance criteria that underpin competence in each performance area identified
• allocate marks that reflect the cognitive activities and skills which the assessment activity requires the learner to demonstrate
• make adequate provision for acceptable alternative answers
• be sufficiently broken down and organized to allow the marking to be as objective as possible

What grading and recording system is to be employed?

Unless you are using a pass-fail system, it is likely that marks from various assessments will need to be collated and translated into a final mark and/or grade. Furthermore, summative grades need to be carefully recorded and secured.

Ensure that you carefully follow the grading and recording system employed by the institution/department.
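As a purely illustrative sketch (actual component weightings and grade boundaries are set by the institution/department), collating weighted marks and translating them into a grade might look like this:

```python
# Hypothetical sketch: collate weighted assessment marks and map them to a grade.
# All weights and grade boundaries are illustrative, not institutional policy.
components = {"coursework": (68, 0.4), "examination": (55, 0.6)}  # (mark, weight)

final_mark = sum(mark * weight for mark, weight in components.values())

boundaries = [(70, "A"), (60, "B"), (50, "C"), (40, "D")]  # descending cut-offs
grade = next((g for cutoff, g in boundaries if final_mark >= cutoff), "F")

print(f"Final mark: {final_mark:.1f}  Grade: {grade}")  # Final mark: 60.2  Grade: B
```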

Pitfalls in Assessment

This final set of slides focuses on some common pitfalls in conducting assessment – be careful to avoid them:

• The Halo Effect

• The Contrast Effect

• Assessing effort and progress rather than achievement

• Lack of clarity with the marking scheme

• Discriminatory practices

The Halo Effect

This is where our existing conceptions of a learner’s work affect subsequent marking. For example, if we are used to a high standard of work from a student, we may develop a tendency to over-mark future poorer work. The converse is also true in the case of students who are generally perceived as less able.

The Contrast Effect

This arises when the outcomes of an assessment are affected by comparing a particular learner with the preceding one, whether the work is good or bad. For example, if we have just assessed several weak assignments and are then presented with a quite well presented one, there is a danger of giving it more marks than it perhaps really deserves.

Assessing effort and progress rather than achievement

This occurs when an assessor is distracted by the efforts and progress a learner has made, rather than focusing on the actual attainments in relation to the learning outcomes and assessment criteria.

Remember, you must assess in relation to the performance areas identified in the marking scheme. Only if there is an allocation of marks for effort and progress can you then give marks accordingly.

Lack of clarity with the marking scheme

This is a common problem, resulting from not being sure about what to assess and how to allocate appropriate marks to parts of the assessment activity. Ensure that you are familiar with the learning outcomes, the marking scheme and what constitutes appropriate standards for different levels of performance.

Discriminatory practices

“I really think that people who are not purple And lack antenna cannot master engineering”

A Lien 4007

In assessment, as in other situations, this occurs when the assessor discriminates – either positively or negatively – in relation to a learner because of race, gender, creed, sexual preference or special needs. Care needs to be taken to ensure that learners receive fair and equal opportunities during their assessments.