In this theme we discuss the topic of evaluation in detail. This is a comprehensive and complex theme; therefore, in this session we discuss only the first part of the overall theme.
Slide 3
[Overview diagram: Evaluation, with three main themes and their subthemes — Definition (measuring, valuing, scoring), Quality criteria (validity, reliability, recency, authenticity), and Trends in evaluation: dimensions (aggregation level, functions of evaluation, when to evaluate, who is responsible, evaluation techniques).]
In relation to evaluation, we will discuss three main themes and related subthemes.
Slide 4
[Overview diagram, highlighting the first main theme.]
We start with a focus on the first main theme: the definition and the concept of evaluation.
Slide 5
Evaluation: the concept
Defining the concept of evaluation is difficult, since the term itself emphasizes only one aspect of what evaluation fully embraces: giving a value to what is being observed. As we will see, it also does not help to replace the concept with other popular concepts, such as assessment; again, only one particular aspect of the whole process is emphasized.
Slide 6
Evaluation: the concept
Read the following description of evaluation: evaluation is the entire process of collecting, analysing and interpreting information about potentially every aspect of an instructional activity, with the aim of drawing conclusions about its efficacy, efficiency and/or any other impact (Thorpe, 1988). You can observe that evaluation is a comprehensive process that can be related to potentially every element in our educational frame of reference.
Slide 7
Evaluation: the concept
In the literature, an important distinction is made between evaluation and assessment. Assessment or measuring refers to the process of collecting and analysing information (Burke, 1999 and Feden & Vogel, 2004). Evaluation refers, as stated earlier, to adding a value to what has been collected and analysed, in view of coming to a conclusion about the efficacy, efficiency or any other impact.
Slide 8
Evaluation: the concept
But in the literature, an even more detailed distinction is made between:
- Measuring/testing: collecting information
- Evaluating/valuing: what is this information worth?
- Scoring/grading: depending on the worth, what score do we give?
It is essential to distinguish these three approaches: one can measure without valuing or scoring, and one cannot score without first collecting and valuing information. The sketch below illustrates the distinction.
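A minimal sketch in Python of this three-step distinction; the answers, the 0.5 cut-off and the function names are hypothetical illustrations, not part of the source.

```python
# Hypothetical example: measuring collects information, valuing asks what
# that information is worth, and scoring turns the worth into a grade.
answers = [True, True, False, True]            # measuring: collected information

def value(collected):
    """Valuing: what is this information worth? Here, proportion correct."""
    return sum(collected) / len(collected)

def grade(worth):
    """Scoring: depending on the worth, what score do we give?"""
    return "pass" if worth >= 0.5 else "fail"  # hypothetical cut-off

worth = value(answers)                         # one can stop after measuring
print(worth, grade(worth))                     # prints: 0.75 pass
```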
Slide 9
[Overview diagram, highlighting the second main theme.]
We now move to the second main theme, which centers on quality criteria.
Slide 10
Evaluation: quality requirements
Prior to a discussion of recent developments in the field of evaluation, we first deal with some critical quality requirements that are central in discussions about evaluation:
- Validity
- Reliability
- Authenticity
- Recency
Slide 11
Validity
Validity refers to the extent to which the content of what is being measured, valued and scored is related to the initial evaluation objective. Typical questions raised in this context are:
- What if we only measure geometry, when we want to come to conclusions about mathematics performance in primary school?
- What if we only get questions from chapter 5 during an exam?
- What if we only ask memorization questions in a test, when we also worked in the laboratory and solved chemistry problems?
Slide 12
Reliability
Reliability refers to the extent to which our measurement is stable. Typical questions raised are:
- If I repeat the same test tomorrow, will I get the same results (stability)?
- Is there a large difference in the ability to solve the different questions about the same topic (internal consistency)?
- If someone else measured, valued and scored the test, would he/she end up with the same results?
The sketch below shows how the first two notions are commonly quantified.
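A minimal sketch, assuming invented item scores, of two standard reliability estimates: a test-retest correlation for stability and Cronbach's alpha for internal consistency.

```python
import numpy as np

# Hypothetical scores: rows = students, columns = items of the same test.
day1 = np.array([[4, 5, 3], [2, 3, 2], [5, 4, 5], [3, 3, 4]])
day2 = np.array([[4, 4, 3], [2, 2, 3], [5, 5, 5], [3, 4, 4]])

# Stability (test-retest): correlate total scores over the two occasions.
stability = np.corrcoef(day1.sum(axis=1), day2.sum(axis=1))[0, 1]

# Internal consistency: Cronbach's alpha over the items of one occasion,
# alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
k = day1.shape[1]
alpha = k / (k - 1) * (1 - day1.var(axis=0, ddof=1).sum()
                       / day1.sum(axis=1).var(ddof=1))

print(f"test-retest r = {stability:.2f}, Cronbach's alpha = {alpha:.2f}")
```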
Slide 13
Authenticity
Authenticity refers to the extent to which the information we gather mirrors reality in a relevant, adequate, and authentic way. Examples of related questions:
- Is it sufficient to ask student nurses to give injections to a doll in order to evaluate their injection skills?
- Is it adequate to give a flying licence to someone who was only tested in a flight simulator?
- Is it sufficient to say that one is able to teach after evaluating his/her capacities with small-group teaching?
Slide 14
Recency
Recency questions how recently the information has been collected, valued or scored in view of the evaluation:
- Can we accept credits obtained 5 years ago from someone who asks to be exempted from courses in a new study program?
- Can we hire a young stay-at-home mother who got her degree 10 years ago?
- Are the Basic Life Support skills mastered six months ago still relevant today for an active first-aid officer?
Slide 15
[Overview diagram, highlighting the third main theme.]
From here on, we move to the third main theme of this session: recent developments in evaluation. Five subthemes are discussed.
Slide 16
Recent developments in evaluation
Recent developments in evaluation can be clustered along five dimensions:
- At what aggregation level is the evaluation being set up?
- What are the functions/roles of the evaluation?
- Who carries out the evaluation?
- When is the evaluation being set up?
- What evaluation techniques are being adopted?
We discuss some examples in relation to each dimension.
Slide 17
Dimension 1: aggregation levels
Firstly, we observe that evolutions in evaluation are related to the aggregation levels in our educational frame of reference: the micro level, the meso level, and the macro level. In relation to each aggregation level, we look at particular new developments.
Slide 18
Dimension 1: aggregation levels
At each aggregation level, the same elements reappear; evaluation can be related to every element in the educational frame of reference: the person responsible for the instruction, the learner, the learning activities, the organisation, the context, and the instructional activities (objectives, learning content, media, didactical strategies, evaluation).
Slide 19
Micro level
Example 1: evaluation of the extent to which the learning objectives have been attained.
Example 2: evaluation of didactical strategies.
Slide 20
Micro level: evaluation of learning objectives
During evaluation we measure the behavior, we value the behavior, and we give a score. The question is: what is the basis for giving a certain value?
- Based on a criterion? Criterion-referenced assessment.
- Based on a norm, e.g., the group mean? Norm-referenced assessment.
- Based on earlier performance of the learner? Ipsative assessment or self-referenced assessment.
Slide 21
Micro level: evaluation of learning objectives
Example: in athletics, 15-year-olds have to run 100 metres.
- Criterion-referenced assessment: every performance is compared to an a priori stated criterion, e.g., less than 15 seconds.
- Norm-referenced assessment: every performance is compared to the classroom mean (imagine you are in a class with fast runners).
- Ipsative assessment or self-referenced assessment: every performance is compared to the earlier performance of the individual learner; the emphasis is on progress.
The sketch below contrasts the three approaches on the same measurements.
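A minimal sketch of the running example; the names, times and the 15-second criterion are hypothetical.

```python
# Hypothetical 100 m times (in seconds) and earlier personal bests.
times = {"An": 14.2, "Bo": 15.8, "Cas": 13.9}     # the measured behaviour
earlier = {"An": 14.9, "Bo": 15.7, "Cas": 13.8}   # each learner's prior run

CRITERION = 15.0                                  # a priori criterion (< 15 s)
group_mean = sum(times.values()) / len(times)     # the classroom norm

for name, t in times.items():
    print(name,
          "criterion-referenced:", t < CRITERION,   # compared to the criterion
          "norm-referenced:", t < group_mean,       # compared to the class mean
          "ipsative:", t < earlier[name])           # compared to own earlier run
```

Note how the same measurement can be valued differently: a runner who beats the criterion may still fall below the class mean, and vice versa.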
Slide 22
Micro level: evaluation of instructional strategies
In his meta-analysis, Hattie (2009) discusses instructional activities. These analyses examine whether different instructional strategies have a differential impact on learners: do they matter? In the following example you see that the didactical strategy 'homework' has an average effect size of d = .29, far below the benchmark of d = .40. The sketch below shows how such an effect size is computed.
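A minimal sketch, with invented test scores, of the standardized mean difference d (Cohen's d with a pooled standard deviation) that such meta-analyses aggregate.

```python
import numpy as np

# Hypothetical test scores for a class with and one without homework.
with_hw = np.array([62, 70, 65, 71, 68], dtype=float)
without_hw = np.array([60, 66, 64, 67, 63], dtype=float)

# Cohen's d: mean difference divided by the pooled standard deviation.
pooled_sd = np.sqrt((with_hw.var(ddof=1) + without_hw.var(ddof=1)) / 2)
d = (with_hw.mean() - without_hw.mean()) / pooled_sd

print(f"d = {d:.2f}")  # compare against the d = .40 benchmark
```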
Slide 23
Meso level: evaluation at school level
Slide 24
Meso level: evaluation at school level
Recent developments at the school level examine whether schools have added value, that is, an additional value that results in better learning performance. But can we simply compare schools with one another? Does this not lead to simplistic rankings, as depicted in this newspaper?
Slide 25
Meso level: evaluation at school level
One cannot simply compare schools. Calder (1994) puts forward, in this context, the CIPP model to consider everything in balance:
- Context evaluation: the geographical position of a school, the available budget, the legal base, etc.
- Input evaluation: what the school actually uses as resources, its program, its policies, the number and type of staff members, etc.
- Process evaluation: the way a program is implemented, the strategies being used, the evaluation approach, the professional development of the staff, etc.
- Product evaluation: the effects, such as goal attainment, throughput, return on investment, etc.
Slide 26
Meso level: evaluation at school level
Comparing schools with the CIPP model can thus imply that:
- a school with a lot of migrants outperforms a school with dominantly upper-class children;
- a school can be good in attaining certain goals, but less qualified in attaining other goals;
- a school can be criticized as to its policies;
- one will consider the geographical location of a school when discussing results (e.g., an unsafe neighbourhood);
- we will also look at what the learners do later, when they go to another school (e.g., success at university).
Schools are assessed by the inspection on the basis of the CIPP model.
Slide 27
Meso level: evaluation at school level
The inspection reports are public.
Slide 28
Macro level: school effectiveness
Slide 29
Macro level: school effectiveness
Read the following description: the aim of school effectiveness research is to describe and explain the differences between schools on the basis of specific criteria. This research explores the differences in performance on the basis of differences in those responsible for teaching, the learners, the classes, and the schools. You can see that, as in the CIPP model, explanations are sought at the level of all schools in the educational system.
Slide 30
Macro level: school effectiveness
This development started from very critical reports as to the added value of schools.
Coleman report (1966, chapter 1): schools have little effect on students' achievement that is independent of their family background and social context.
Plowden report (1967, p. 35): differences between parents will explain more of the variation in children than differences between schools. (...) Parental factors, in fact, accounted for 58% of the variance in student achievement in this study.
In contrast to these reports, schools want proof that they make a difference and contribute to learner performance.
Slide 31
Macro level: school effectiveness
A central critique of the Coleman and Plowden reports is that they neglect the complex interplay that helps to explain differences between schools; see the CIPP model. Instead of simply administering tests and comparing results, we have to look, next to product effects, at the processes and variables that are linked to these results. This is captured by the concept of performance indicators.
Slide 32
Macro level: performance indicators
Performance indicators are "statistical data, numbers, costs or any other information that measures and clarifies the outcomes of an institution in line with preset goals." You can notice that the emphasis in performance indicators is on the description and explanation of differences in performance. One of the best-known performance indicator studies is the three-yearly PISA study: the Programme for International Student Assessment. In PISA 2006, for example, the performance of schools in 54 countries was compared.
Slide 33
Macro level: performance indicators
The results of PISA 2006 show, for example, the high performance of Flemish schools in science, mathematics, and reading literacy.
Slide 34
Macro level: performance indicators
PISA results are not only described; they are also explained. In this graphic, one sees how the PISA results are associated with the socio-economic status (SES) of the learners: the higher the status, the higher the results. SES is determined by the educational level of the parents, their income, their possession of cultural goods (e.g., books), etc. The sketch below illustrates this kind of association.
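A minimal sketch, with invented numbers, of the association shown in the graphic: the correlation between an SES index and test scores, and the share of variance it explains (the same kind of figure as the 58% in the Plowden report).

```python
import numpy as np

# Hypothetical SES index values and PISA-like scores for six learners.
ses = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5])
score = np.array([430.0, 465.0, 490.0, 510.0, 540.0, 570.0])

r = np.corrcoef(ses, score)[0, 1]     # strength of the SES-score association
print(f"r = {r:.2f}, variance explained r^2 = {r ** 2:.0%}")
```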
Slide 35
Dimension 2: Functions of evaluation
Slide 36
Dimension 2: Functions of evaluation
Why do we evaluate? There might be different reasons:
- Formative evaluation: to see where one is in the learning process and how we can redirect the learning process.
- Summative evaluation: to determine the final attainment of the goals.
- Prediction function: to predict future performance (e.g., success in higher education).
- Selection function: to see whether one is fit for a job or task.
Slide 37
Dimension 2: Functions of evaluation
Abroad, there is a lot of attention for the selection function; see the emphasis on entrance exams. In this example, one sees a lucky candidate (and his mother) who succeeded in the entrance exam for a Chinese university.
Slide 38
Dimension 2: Functions of evaluation
Earlier, there was a major emphasis on summative evaluation. Nowadays this emphasis has shifted towards formative evaluation. Why?
- Does one learn from evaluative feedback (this is also called consequential validity)?
- Do the evaluation results imply that the teacher has to redirect the instruction, the support, the learning materials, etc.?
- Has a learner already reached a preliminary attainment level?
Slide 39
Dimension 3: Who is responsible?
Slide 40
Dimension 3: Who is responsible?
Traditionally, the teacher is responsible for the evaluation. But there are new developments:
- The learner him/herself carries out the evaluation: self-assessment.
- The learner and peers carry out the evaluation together: peer assessment.
- An external person carries out the evaluation (e.g., another teacher).
- An external company carries out the evaluation: assessment centres.
Slide 41
Dimension 3: Who is responsible?
New development: self-assessment. Self-assessment is seen as a type of evaluation that aims at fostering the learning process (assessment-as-learning): a formative evaluation function. Two main steps are to be taken:
- Initial training to develop criteria and tools, and to discuss the value of what is being measured.
- Next, usage of the tools/instruments and development of a personal opinion.
Scoring is not an issue here. A very useful technique: rubrics (see further).
Slide 42
Dimension 3: Who is responsible?
Assessment centres: an external company carries out the evaluation, mostly with a selection function. A standardized procedure is used to assess complex behavior on the basis of multiple information sources. The behavior is assessed in simulated contexts. Multiple persons evaluate and come to a shared vision; the different evaluators involved guarantee a 360° approach to the evaluation. This technique fulfils a selection function, e.g., when screening candidates for a job.
Slide 43
Dimension 4: When to evaluate?
Slide 44
Dimension 4: When to evaluate?
There is a shift in the moment the evaluation is being set up, towards prior to and during the learning process, serving a formative evaluation function:
- Prior: prior knowledge testing.
- During: progress testing, portfolio evaluation.
- After: final evaluation.
Slide 45
Dimension 5: What technique?
Slide 46
Dimension 5: What technique?
Next to traditional evaluation tests with multiple-choice questions, open-answer questions, fill-in questions, and sorting questions, we observe a series of new techniques. Examples:
- Rubrics: attention is paid to criteria and indicators.
- Portfolios: a file with letters, information, illustrations, products, etc., as the information base for the evaluation.
Slide 47
Dimension 5: Technique rubrics
Rubrics:
- Define clear criteria: the concrete elements of a complex learning objective that are being measured, valued and scored.
- Determine for each criterion a number of quality indicators: indicators exemplify the level at which a certain criterion is being met, answered, or attained.
Slide 48
Dimension 5: Technique rubrics
Example rubric: mixing colours. Criteria are listed in rows; performance indicators describe levels 1 to 4 (the indicators for levels 2 and 3 are still to be filled in). In next steps of the learning process, we can add criteria and/or performance indicators to the rubric.
- Criterion: amount of paint being used? Level 1: the learner does not consider the amount of the different colours being used. Level 4: the learner uses, right from the start, minimal amounts of paint to start mixing colours.
- Criterion: what colour is mixed first? Level 1: starts with the darkest colour to mix. Level 4: starts with the lightest colour to mix.
- Criterion: what order in mixing colours? (indicators still to be added)
A minimal data-structure sketch of this rubric follows below.
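A minimal sketch of the mixing-colours rubric as a data structure; the level numbers come from the slide, while the averaging rule and the observation are hypothetical choices for illustration.

```python
# The rubric maps each criterion to quality indicators per performance level.
rubric = {
    "amount of paint being used": {
        1: "does not consider the amount of the different colours used",
        4: "uses minimal amounts of paint right from the start",
    },
    "what colour is mixed first": {
        1: "starts with the darkest colour to mix",
        4: "starts with the lightest colour to mix",
    },
}

def value_observation(levels: dict[str, int]) -> float:
    """Valuing: average the observed level over all criteria (hypothetical rule)."""
    return sum(levels.values()) / len(levels)

# Hypothetical observation of one learner against the two criteria:
observed = {"amount of paint being used": 4, "what colour is mixed first": 1}
print(value_observation(observed))  # prints: 2.5
```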
Slide 49
Dimension 5: Technique rubrics
Example rubric: writing a historical fiction story.
Slide 50
Dimension 5: Technique portfolio
Read this description of a portfolio: a portfolio is a file with letters, information, illustrations, and products that is used as an information base for the evaluation.
Slide 51
Dimension 5: Technique portfolio
Types of portfolios:
- A document portfolio or product portfolio: documentation that helps to describe the activities in the training, internship, practical experience, etc. (measuring). In addition to this information, learners can add their reflections (valuing). Typically used with student doctors, nurses, and teachers.
- A process portfolio: a logbook. Documentation of the progress in the learning process, enriched with reflections. Typically used with student doctors, nurses, midwives, and teachers.
- A showcase portfolio: the 'best of'. A bundle of the best work of a student that helps to come to a conclusion about his/her performance. Typically used in the decorative arts, music, theatre, and architecture.
Slide 52
Dimension 5: Technique portfolio
Example of a process portfolio for student teachers.
Slide 53
[Overview diagram with the three main themes and their subthemes.]
We hope you have now developed a first comprehensive picture of evaluation.
Slide 54
End of this instruction package. Now take the final test. Go back to your Minerva workspace.