
Studies in Educational Evaluation 40 (2014) 50–62

Development and evaluation of a summative assessment program for senior teacher competence

Anouke Bakx a,*, Liesbeth Baartman b, Tamara van Schilt-Mol c

a Fontys University of Applied Sciences, FHKE, pabo Eindhoven, De Lismortel 25, 5612 AR Eindhoven, The Netherlands
b Utrecht University of Applied Sciences, Faculty of Education, Research Group Vocational Education, P.O. Box 14007, 3508 SB Utrecht, The Netherlands
c HAN University of Applied Sciences, Faculty of Education, Research Centre for Quality for Learning, P.O. Box 30011, 6503 HN Nijmegen, The Netherlands

A R T I C L E I N F O

Article history:

Received 13 March 2013

Received in revised form 24 November 2013

Accepted 25 November 2013

Available online 18 December 2013

Keywords:

Teacher evaluation

Evaluation methods

Secondary education

A B S T R A C T

The focus of this article is the development and evaluation of an assessment program for measuring senior teachers’ competences in secondary schools. The goals of the developed instrument were measuring senior teachers’ competences and providing the opportunity for self-reflection for the teachers assessed. This instrument was developed and evaluated in four steps: (1) the content of assessment was determined, defined in senior teacher competences; (2) criteria and standards were specified for the assessment of the competences; (3) the assessment methods were determined; and (4) the assessment program was evaluated by means of a pilot study. The target group consisted of eight potential senior teachers, who were assessed with the new instrument. In total, eleven teachers and 70 pupils evaluated the new assessment instrument. The assessment seems fit for the purpose. Pupils are positive about the assessment program, whereas the teachers are more sceptical about it.

© 2013 Elsevier Ltd. All rights reserved.


Introduction

For many years, the quality of education in general and of teachers in particular has been the object of discussion and research. Indeed, teacher quality is important because teachers play a crucial role in realizing the quality of the learning environment (Hattie, 2009) and determine to a great extent the school’s quality (Marzano, 2011). In this respect, Rasmussen and Friche (2011) state that schools experience a pressure to increase and demonstrate the quality of their education and teachers. In the Netherlands, this pressure to increase the quality of education in general and of teachers in particular has been addressed by the Teaching Advisory Board of the Dutch government. As a way to increase teacher quality, they advised to create more opportunities for career development and differentiation within the teaching profession. This should increase the attractiveness of the teaching profession and prevent good teachers from leaving schools and choosing other career paths (Teaching Advisory Board, 2007). The Dutch Ministry of Education decided that secondary schools should introduce integral personnel management in order to (1) stimulate teachers’ development; (2) offer opportunities for differentiation in the teacher profession; and (3) raise the quality of Dutch secondary education.

It was assumed that the introduction of integral personnel management in secondary education would lead to increased educational quality. It might help to put the best teachers on the most complex tasks and pupil groups, and offer the possibility to address weak teaching practices (Borko, Whitcomb, & Liston, 2009).

* Corresponding author. Tel.: +31 8778 75 993.
E-mail addresses: [email protected] (A. Bakx), [email protected] (L. Baartman), [email protected] (T. van Schilt-Mol).

0191-491X/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.stueduc.2013.11.004

To integrate an effective and fair integral personnel management system, instruments are needed that validly and reliably assess teacher quality (van der Schaaf, Stokking, & Verloop, 2005). At the moment, no specific standardized procedures or guidelines for teacher assessment are available, and Dutch secondary schools emphasize those aspects which are important for their particular schools. The common practice is that teachers receive a salary raise each year, simply for having worked one more year as a teacher. To formalize this, one annual dialogue between teacher and management takes place. This can hardly be regarded as an assessment method for teacher quality. The question then arises whether there are possibilities to assess teacher quality validly. Whereas the assessment and development of student teachers has quite often been studied (e.g. Hegender, 2010; Noell & Burns, 2006), summative assessment of teachers working in schools has been studied far less often. Therefore, the aim of the current study is to develop and evaluate a summative assessment program for senior teachers in secondary education. Besides this summative function, the assessment program should have a formative function to enable and stimulate teachers to reflect on their own competence development.


Indeed, the literature shows different perspectives on how teacher competence is defined and measured, many of these focusing on the effectiveness of teachers in accomplishing high student learning outcomes (e.g. Chen, Mason, Staniszweski, Upton, & Valley, 2011; Mangiante, 2011; Praslova, 2010; Seidel & Shavelson, 2007). These studies rely on the assumption that certain teacher behaviour (den Brok, Brekelmans, & Wubbels, 2004) and teachers’ (pedagogical) content knowledge (e.g. Baumert et al., 2010; Kleickmann et al., 2013; Shulman, 1986) have an influence on student achievement. Research results about teacher competences were used as input for the assessment development team, which would construct a school-specific assessment method; based on this, together with input from the teachers themselves, an assessment instrument was developed.

The focus of this instrument was on senior teachers, because of their important role in the school: they hold the most important (teaching) positions in schools and are responsible, for example, for coaching starting teachers. It is assumed that these senior teachers determine the quality of the school to a large extent. Next to this, there was additional funding from the government, meant for the best teachers, in order to motivate them additionally and keep them in school. For the integral personnel management of a school it is important to be able to ‘spot’ and assess these key teachers in a valid way, presumably in a way that is accepted by the school team. In the study described in this paper, senior teachers have already been effective teachers for many years, and for the new program to be developed, competences were needed that would have an additional value beyond ‘being a very effective teacher’.

Thus, the main focus of our study was to develop and test an assessment program for distinguishing average senior teachers from very good senior teachers. The assessment program should also offer the teachers an opportunity for self-reflection. Therefore, the central question of our study is: how can senior teachers’ competence in secondary education be assessed, while providing the opportunity for self-reflection by the senior teachers? The assessment program was developed in close collaboration with a large secondary school, and a pilot study was organized in which we carried out and evaluated the assessment program. In order to do so, the following steps were carried out: first, literature was explored on what ‘good teachers’ are, and the content of the teacher competence had to be determined. Second, criteria and standards were defined in order to validly assess the competences of senior teachers. Third, the program sections of the assessment were determined. The final step was to carry out a pilot with eight participating senior teachers, in which the new assessment program was evaluated. Below, these four steps are described in detail.

Theoretical background

Defining good teachers

The ability to distinguish average senior teachers from very good senior teachers depends on how senior teacher competence is defined and what assessment criteria and standards are set (Uhlenbeck, 2002). In general, all assessments require a clear notion of the construct to be assessed (Messick, 1995; Sadler, 1998). This is especially important for the development process described in this article, because the assessment program being developed in this study can be considered a ‘high stakes’ assessment: a positive assessment result would lead to a salary raise, while negative outcomes of the assessment program would lead to a ‘frozen salary’. Senior teachers, as we focus on in this study, ought to be the school’s best teachers. Defining good teachers is complex, and there is no consensus on this topic yet (e.g. Berliner, 2001; Fenstermacher & Richardson, 2005).

Contemporary educational research on good teachers is scattered across a variety of research traditions, showing a diversity of definitions, instruments and results related to the issue of good teaching. These traditions can be broadly categorized as: (1) perception studies of ideal teaching, including learning environment research (Allen & Fraser, 2007); (2) effectiveness research (e.g. den Brok et al., 2004; Seidel & Shavelson, 2007); (3) studies on teachers’ professional knowledge (e.g. Berliner, 2004; Darling-Hammond & Snyder, 2000; Verloop, 2005); and (4) research on teachers’ professional identity (e.g. Beijaard, Meijer, & Verloop, 2004; Day, Sammons, Stobart, Kingston, & Gu, 2007). Each of these four traditions has its own specific perspective on studying good teaching practices.

Perception studies of ideal teaching, the first perspective, show for example that students (aged 7–16 years) consider a nice personality and teaching ability very important (e.g. Beishuizen, Hof, van Putten, Bouwmeester, & Asscher, 2001), as well as competent instructing focused on the transfer of knowledge and skills. Kutnick and Vena (1993) mentioned physical presentation, teachers’ care for students, and trustworthiness as being important for good teachers, whereas Hamacheck (1969) adds being helpful with schoolwork, clear explanation and humour.

The second tradition, effectiveness research, mainly focuses on the results of teachers’ actions on students’ learning processes, achievement or attitude towards learning (Seidel & Shavelson, 2007). In their meta-analysis of teacher effectiveness studies, Seidel and Shavelson (2007) used an interesting framework of teacher effectiveness based on cognitive models of teaching (and student learning). One of their conclusions was that domain-specific components of teaching resulted in the largest effects on students’ learning. Studies within this perspective show that the combination of teaching skills with communicative competence is important for achieving positive results with students (e.g. Hattie, 2009; Marzano, 2003; Ryan & Deci, 2000; Scheerens, 2007). Further, Brophy and Good (1986) stated that instruction and classroom management techniques are very important teacher behaviours. This is in line with findings from learning environment research as described above. More specifically, effectiveness studies show that in order to achieve high student outcomes, teachers should be able to realize an appropriate level of difficulty for the instruction, continuous progress at a high success rate, effective diagnosis of learning needs, prescription of learning activities, and monitoring of progress and continuous practice, integrating new learning (Brophy & Good, 1986; Marzano, 2003). This also fits the perception perspective, in which students likewise state that teaching ability is important and that they prefer to be taught by competent instructors who can transfer knowledge and skills (Beishuizen et al., 2001).

The third tradition concerns the (practical and theoretical) professional knowledge required for good teaching. Teachers’ domain-specific knowledge is important for explaining properly and for asking stimulating, specific, subject-related questions (Darling-Hammond, 1999). In order to instruct well, (professional) knowledge of teachers is considered a requirement (Clausen, Reusser, & Klieme, 2003; Wise & Okey, 1983). More specifically, teachers’ subject matter knowledge and pedagogical content knowledge have been argued to be essential for realizing quality of education (e.g. Hill, Rowan, & Loewenberg-Ball, 2005; Shulman, 1986). Teachers’ pedagogical and subject-related knowledge are often linked to their quality of instruction (Elbaz, 1991; Shulman, 1987).

Finally, the fourth tradition concerns research on teachers’ professional identity, taking the teacher as a person as the focus for research, stating that the teacher’s personality is ‘omnipresent’ in his way of teaching and professional learning (Beijaard et al., 2004). The identity perspective claims that teachers perceive


teaching as a combination of different roles regarding the teaching job, with a certain hierarchy concerning these roles. Teachers view themselves as subject matter experts, learning experts, and pedagogical experts (Beijaard, Verloop, & Vermunt, 2000). The perceived hierarchy in roles determines teachers’ professional identity and the behaviour they show. Taking results from the research traditions described above together, an ‘integrated expertise’ emerges, combining professional (pedagogical and subject-related) knowledge and instructional skills (Beijaard et al., 2000; Darling-Hammond, 1999; Stronge, 2007). From the perception studies and the studies on teachers’ professional identity, personality-related characteristics could be added, but up to now little empirical evidence has been found that these lead towards better student outcomes.

Summarizing, good teaching and good teachers have been studied from different perspectives, leading to different foci, such as instructional quality and classroom management, professional knowledge and teachers’ personality as possible main factors for good teaching. As Gage (1964) stated, it might be the case that teaching and teacher quality cannot be described by a single theory at all. Vanderlinde and Van Braak (2010, p. 303) stated that ‘‘the traditional top-down model of the development and dissemination of educational innovations should be replaced by a model where teachers share a primary role with educational researchers in the development of innovative practices (Englert & Tarrant, 1995)’’. Vanderlinde and Van Braak (2010) refer to a research-practice gap. With this, they emphasize that teachers’ use of and reflection upon academic research are usually less than optimal. Teachers and other educational professionals often do not see the additional value of educational research, or they are unable to use results from educational research in their practice. Vanderlinde and Van Braak (2010) found in their study a possible solution to bridge this gap: realizing more cooperation between researchers and educational practitioners. In our study, for example, we have translated their insights into a more bottom-up approach, by giving the information from the literature as input to the development team.

Instead of sticking to one perspective of teacher quality, we chose a more eclectic approach, using multiple perspectives. This was done in order to increase the school team’s commitment, leading towards an underlying competence profile for senior teachers that would be recognized and accepted by the school team, partly because it would be developed by the school team itself instead of being driven by one single theoretical perspective. Teachers’ position towards an educational innovation is more positive when they are given ownership, agency and logical sense-making (Ketelaar, Beijaard, Boshuizen, & den Brok, 2012). This is why we decided to use our knowledge of good teachers as a starting point for the development of the assessment instrument, while not letting these theoretical findings dominate the discussions with the development team (see below). This bottom-up – instead of top-down – way of working on the development of a high stakes assessment program, as described in this paper, contributes to the body of knowledge on assessment development as well as educational innovation, when striving for optimal commitment of the school team.

Development and evaluation of the teacher assessment program

Step 1: determining the content: teacher competence

As this study was carried out in a large secondary school in the Netherlands, the competence profile had to be locally valid. At the national level, the Dutch government has adopted the ‘‘Professions in Education Act’’ (2005), which specifies seven teacher competences as the minimum quality for certified teachers. Teacher-training colleges have to use these competences to assess their student teachers, and a logical consequence is to use these same competences to assess working teachers in a personnel management system. These seven teacher competences are (Snoek et al., 2009): (1) interpersonal competence; (2) pedagogic competence; (3) subject knowledge and methodological competence; (4) organizational competence; (5) competence for collaboration with colleagues; (6) competence for collaboration with the working environment; and (7) competence for reflection and development. These seven competences refer to a ‘basic level’ or a starting point for junior teachers, for which quick scans and internet self-assessment tools are available. However, more than this starting level is expected from senior teachers, who are already the more effective teachers in school. Furthermore, instruments to assess teachers’ behaviour in the classroom already exist, like the Questionnaire on Teacher Interaction (den Brok et al., 2003). Unfortunately, in Dutch secondary schools, personnel management is a rather underdeveloped area and has not received much attention from school management so far (Seezink & Poell, 2009). There are no systematically documented experiences from other Dutch schools yet which could be used to develop the assessment program aimed for. Therefore, a competence profile for senior teachers – extending the basic and effective level – has been developed in this study.

School development team

In this study, a school-specific competence profile was developed for senior teachers. The (school) context was explicitly taken into account because of the specific demands of the school environment for senior teachers (Berliner, 2005) and because of the commitment of the school team to this new assessment program. Next to this, the theoretical perspectives described above were brought into the development team by the authors of this paper, as well as specific trends within teacher education, for example teacher research as a competence teachers need for their ongoing professional development (van der Linden, Bakx, Ros, & Beijaard, 2012).

A development team consisting of ten teachers from the secondary school (10% of the entire school team) and the management was brought together with the assignment to specify the competences needed by their senior teachers. Four of the eight senior teachers to be assessed participated voluntarily in the development team. A large team was chosen in order to create a valid profile as well as commitment to the use of the new competence profile for senior teachers as the underlying basis for the assessment program. For the acceptance of this new assessment program, the team of teachers should be confident that it is a valid and fair way of judgement (Baartman, Prins, Kirschner, & Van der Vleuten, 2007). During one year, the development team worked on the competence profile. They started with two open brainstorm sessions and investigated literature on teacher quality and recent educational developments. The teachers themselves mainly used literature which they frequently consulted for their own professional development. This was followed by discussion sessions on first, second and third drafts of the competence profile for senior teachers. Eventually, based on consensus, the competence profile for senior teachers was accepted by all members of the development team (see Table 1).

The competence profile for senior teachers consists of two parts. First, the development team agreed that the seven competences determined by the Dutch government as described above relate to senior teachers as well as to starting teachers. These competences mainly focus on effective classroom activities, but also include cooperation with colleagues and stakeholders in the


Table 1
Competence profile for senior teachers, description of (levels of) competences (level 1 = lowest level, level 3 = highest level).

1. Flexibility/anticipating
Flexibility/anticipating concerns the application of new alternatives:
- improvise; change methods easily when the existing method is not effective
- propose new ideas
- move quickly and effectively between one’s own task and someone else’s task; adjust quickly
- see the need for change, make suggestions and take initiatives
Level 1: change ideas and methods to changing circumstances
Level 2: change robust patterns
Level 3: apply new alternatives

2. Innovating
Innovating concerns the proposition and creation of alternatives:
- introduce new activities
- take risks and do not be afraid to fail
- see and use opportunities
- create a stimulating learning environment for pupils
Level 1: develop new, original methods and applications
Level 2: propose and create alternatives for existing routines

3. Learning
Learning concerns sharing learning experiences with others and acting as a role model:
- use different learning styles and apply these while learning
- maintain a reflective attitude in new situations
- ask questions and show one’s own insecurities
- be a role model in admitting mistakes and learning from them
- talk about ideas about one’s own professional development and ask for feedback
Level 1: reflect upon one’s own qualities and translate this into behavioural changes
Level 2: look for/create situations to learn
Level 3: share learning experiences with others and act as a role model

4. Dealing with stress
Dealing with stress concerns guarding one’s own limits and talking about this in the teachers’ team:
- in times of pressure, make sure the team works efficiently (priorities)
- stick to one’s own ideas in times of pressure
- talk about resistance to change by analysing the process together with others
Level 1: take one’s own limits into account
Level 2: take things ‘easy’
Level 3: guard one’s own limits and talk about this in the teachers’ team

5. Coaching
Coaching concerns the stimulation of others and increasing their self-confidence:
- stimulate others to ask questions about their drives and motives
- put aside one’s own judgement to increase other people’s confidence
- notice and mention individual contributions to . . .
- represent trust and security
- stimulate others towards self-reflection and self-judgement
Level 1: think in line with other people and talk about it
Level 2: motivate and stimulate others to learn
Level 3: stimulate others and increase their self-confidence

6. Problem solving
Problem solving concerns helping others to solve their problems:
- anticipate problems within or outside the team or school
- analyse problems: what is the real question?
- help others when they are not able to solve their own problems
Level 1: detect problems
Level 2: solve problems
Level 3: help others to solve their problems

7. Cooperation
Cooperation concerns the sustainment of teambuilding:
- create conditions for a cooperative (learning) environment for pupils
- create an open atmosphere
- be trustful
- take differences between people into account and talk about ‘difficult issues’ in the team
- stimulate others to learn from one another and help them with this
Level 1: contribute to team goals
Level 2: be responsible for team goals
Level 3: look for cooperation with others
Level 4: sustain teambuilding

8. Results-based acting
Results-based acting concerns making sure plans can be realized (SMART; Drucker, 1954):
- assess the progress of activities and lead the team when necessary
- help others define SMART goals
- assess the quality of new educational products by means of systematic evaluation
Level 1: define and realize goals
Level 2: stimulate and lead others
Level 3: make sure plans can be realized (use of SMART goals^a)

^a SMART goals are specific, measurable, attainable, relevant and time-bound.


educational environment, even though this is a relatively small part of the competence profile. In order to assess the classroom activities as well, the school management decided to use an existing questionnaire on teacher behaviour, the QTI (den Brok et al., 2003). Second, senior teachers take on many outside-the-classroom activities, such as innovation projects and coaching younger teachers. Therefore, the development team formulated a competence profile for senior teachers with eight competences, specifically ones having an added value above the seven national competences for teachers.

In total, the competence profile for senior teachers thus consisted of the seven teacher competences developed by the government and the eight competences developed by the school development team. The competence profile for senior teachers includes competences like cooperation, dealing with stress, problem solving and coaching. These were chosen because of the fit with the specific school situation (Berliner, 2005) and because the literature shows that these competences add to educational quality (e.g. Brophy & Good, 1986). Next to this, competences like innovating, learning, anticipating and results-based acting were put into the competence profile because of innovative developments within the educational field, like teacher-researchers creating the opportunity to realize a critical, reflective attitude towards their practice (Zeichner & Noffke, 2001). Personality-related characteristics were not included in the competence profile for senior teachers because of their speculative relation with teaching ability (Damon, 2007). Finally, the meaning of the eight competences was described by specifying a number of ascending levels distinguished within each competence, in order to describe these as specifically as possible. For senior teachers, the highest levels are relevant.
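The structure described above – eight school-specific competences, each with a small number of ascending levels of which the highest are relevant for senior teachers – can be sketched as a simple data model. The competence names and level counts below are taken from Table 1, but the representation and the pass rule are our illustrative assumptions, not part of the instrument developed in the study.

```python
# Illustrative sketch only: the senior-teacher competence profile as a data
# structure. Names and top levels follow Table 1; the aggregation rule is a
# hypothetical example, since the paper does not specify one at this point.
SENIOR_COMPETENCES = {
    "flexibility/anticipating": 3,
    "innovating": 2,                # Table 1 lists only two levels here
    "learning": 3,
    "dealing with stress": 3,
    "coaching": 3,
    "problem solving": 3,
    "cooperation": 4,               # the one competence with four levels
    "results-based acting": 3,
}

def meets_senior_standard(observed_levels: dict) -> bool:
    """Hypothetical pass rule: the candidate reaches the highest level on
    every competence. `observed_levels` maps competence name -> awarded level."""
    return all(
        observed_levels.get(name, 0) >= top
        for name, top in SENIOR_COMPETENCES.items()
    )
```

A candidate awarded the top level on every competence would pass under this rule; lowering any single competence below its highest level would fail, mirroring the "highest levels are relevant" criterion stated above.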

Step 2: specification of criteria and standards

For the seven basic competences for (junior) teachers, quick scans and internet self-assessment tools for teachers are available. The focus of our paper is on the newly developed competences for senior teachers. Next, criteria were needed in order to validly assess the additional competences of senior teachers (Uhlenbeck, 2002). Assessment is a comparative process which requires a frame of reference, with unambiguous definitions of assessment criteria and standards (Damon, 2007). Therefore, assessment criteria and standards had to be developed, matching the eight competences for senior teachers. Standard-setting studies show that standards are often contingent on the local situation (Price, 2005) and are always subjective to some extent, for example if they are determined by a group of experts in the domain and thus rely on human judgement (Norcini & Shea, 1997). This is the case in our study. A way of specifying standards as objectively as possible is the use of exemplars and verbal descriptions (Sadler, 1998). Exemplars are key examples that


describe the desired level of proficiency and are mostly used for product evaluations. Verbal descriptions or qualitative rubrics (Scriven, 1980) describe the properties characterizing the desired level of proficiency. These standards are context-specific and are often the most feasible to use, especially when multiple criteria are used (Sadler, 1998). A rubric is a scoring tool for qualitative rating of authentic work; it includes criteria for rating important dimensions of performance. It describes levels of performance on a particular task and thereby denotes what is considered important to both assessors and assessees. For assessors, it helps determine what to look for when assessing (Jonsson & Svingby, 2007; Tigelaar, Van Tartwijk, Janssen, Veldman, & Verloop, 2009). A review of studies investigating the use of scoring rubrics shows that rubrics can enhance the reliable scoring of performance assessments, especially if they are analytic and topic-specific (Heldsinger & Humphry, 2013; Jonsson & Svingby, 2007; Panadero & Jonsson, 2013). Consistent with Sadler's (1998) ideas, this research also shows the benefits of exemplars and adds the importance of rater training. Rubrics, on the other hand, do not automatically enhance the validity of performance assessments. This requires not only that the content of the rubric adequately represents the content of the construct to be assessed (in this case, senior teachers' competences), but also that, for example, the mental processes used during the assessment are incorporated. According to Jonsson and Svingby, very few studies on the use of rubrics provide this kind of validity evidence, which implies that the effect of rubrics on the validity of performance assessments is not clear at the moment. In this study, we decided to describe a set of rubrics for each competence, as research seems to mainly show advantages of the use of rubrics.
The teacher development team described a set of rubrics for each competence defined in the competence profile for senior teachers, following the same approach used to develop the competence profile itself. Eventually, based on consensus, the rubrics were fixed for each level (see Table 1 for the eight competences and the rubrics).

Step 3: determination of the assessment program parts

The competence profile and rubrics were the starting point for the further development of the assessment program for senior teachers. The first two steps described the development of the competence profile and the rubrics, defining the content ('what is assessed'). The third step concerns how this content could be assessed. The choice of assessment methods largely determines the validity of the assessment process, as the methods should adequately measure the construct at stake (Messick, 1995). A single assessment would probably not be sufficient to validly assess senior teachers' competences. A mix of methods should be used instead (Baartman et al., 2007; van der Vleuten & Schuwirth, 2005), because it reveals additional insights in comparison with one single assessment method, gaining input from qualitative as

Table 2
Mix of methods for the assessment of senior teachers' competences.

Method | Target group | When | Conditions
Observation questionnaire a | Pupils | Twice, during a period of half a year | Group of pupils (≥25); more than once (>1)
Observation questionnaire a | Colleagues | Twice, during a period of half a year | Group of colleagues (≥4); more than once (>1)
Questionnaire | Senior teacher | Once, at the start of their portfolio development | –
Portfolio development | Senior teacher, colleagues | During a period of half a year | Standardized instruction; portfolio guidelines
Portfolio assessment | External experts | At the end of the assessment period/three weeks after receiving data and portfolio | Independent experts
Interview | External experts | At the end of the assessment period, after data-analysis of data and portfolio | Independent experts

a Standardized questionnaire (www.ivlos.uu.nl).

well as quantitative data (e.g. Spillane, Pareja, Dorner, Barnes, & May, 2009). Others propose a longitudinal process involving various methods in order to gain a rich picture of teachers' knowledge and performance (Berliner, 2005; Darling-Hammond & Snyder, 2000). Ideally, measurements like observations, interviews and questionnaires should be repeated over time. The reliability of assessment can also increase when different information sources are used, like peers, the teachers themselves, management and experts (Uhlenbeck, 2002). In this study a combination of assessment methods was chosen in order to combine the strong aspects of the different assessment methods. This study uses an assessment program (Baartman et al., 2007), consisting of a mix of methods: (1) observations and questionnaires; (2) interviews; and (3) portfolio assessment. Different groups of stakeholders (management, pupils, colleagues, the senior teachers themselves and two experts) were involved in the judgement of the senior teachers' competences in a pilot study of the assessment program. Table 2 summarizes the different methods used in the assessment program. These assessment methods and the rationales for choosing each method are described in more depth below.

Observations and questionnaires: background and rationales

Observations are a powerful method to assess teachers' quality because authentic teacher behaviours can be judged in the in vivo context (Chen et al., 2011; Landy & Conte, 2013). However, observations by trained assessors are not very practical and quite expensive (Bakx, van der Sanden, Sijtsma, & Taconis, 2002). Pupils and colleagues can also play a role in observing their teachers. Questionnaires can be used in order to judge competences of the potential senior teachers. The use of questionnaires with transparent, clear and uni-dimensional items can be a rich and 'standardized' method for assessing teachers' competences (Landy & Conte, 2013). Validated questionnaires are also often used for the judgement of teachers' behaviour in class, like the QTI (questionnaire on teacher interaction) (den Brok, Brekelmans, & Mainhard, 2010; Levy, den Brok, Wubbels, & Brekelmans, 2003; Telli & den Brok, 2012). Using questionnaires helps the observers tune their perspectives towards certain aspects of the senior teachers' competences. Having pupils observe their teachers using a standardized questionnaire can lead to richer insights into teachers' classroom behaviour and can reveal aspects which might otherwise remain implicit (Burden, 2010). In order to use pupils' observations as part of the assessment of teachers' competence, a large group of pupils is needed to prevent bias (Damon, 2007). The other competences, like cooperation within the teaching team, can be validly judged by observations by colleagues and managers. The observation of teachers by their colleagues can be seen as 'peer assessment'. Different studies show benefits of peer assessment with regard to professional development and reflection by the peer-assessor as well as the assessee (Sadler & Good, 2006), even though observations done by peers always have the problem of



Table 3
Number of items and a typical item for the QTI-scales (den Brok et al., 2010).

Scale | Number of items | Typical item
Leadership | 10 | S/he is a good leader
Helpful/friendly | 10 | S/he is someone we can depend on
Understanding | 10 | If we have something to say s/he will listen
Student responsibility/freedom | | S/he gives us a lot of free time in class
Uncertain | 9 | S/he seems uncertain
Dissatisfied | 9 | S/he is suspicious
Admonishing | 9 | S/he gets angry
Strict | 9 | S/he is strict


sympathy towards the observed person. That is why at least four peers, working together with the teacher on a regular basis, should be involved in order to gain reliable and valid ratings (e.g. Sluijsmans, Brand-Gruwel, & Van Merriënboer, 2002).

Practice of observations by pupils in the pilot study

For the observation by pupils, a standard questionnaire was used, measuring pupils' perceptions of interpersonal teacher behaviour (Wubbels, Brekelmans, & Hooymayers, 1991). This questionnaire assesses the teacher's interpersonal behaviour on eight scales. Table 3 presents the scales, the number of items and a typical item for each QTI-scale (den Brok et al., 2003). These aspects are related to the seven basic competences (especially the interpersonal and pedagogic competences) as well as to the newly defined competences (especially flexibility/anticipating, dealing with stress and problem solving). For each senior teacher being assessed in the new assessment program (eight teachers in total), 20 pupils carried out the observations and filled out the QTI-questionnaire. The QTI-questionnaire (Wubbels et al., 1991) could be filled out by the pupils anonymously. The pupils observed their teachers during twelve weeks. After this period they completed the questionnaire online for the first time, in the computer classroom under the guidance of an ICT assistant. This was repeated half a year later.

Practice of observations by colleagues and management in the pilot study

The competence profile for senior teachers, together with the rubrics, was transformed into questionnaires by the researchers. The questionnaire used a 4-point rating scale, varying from (1) 'my senior colleague does this almost never' to (4) 'very often'. The teacher colleagues and school management used this questionnaire to rate the competences of the senior teacher whom they observed. Four different colleagues and one manager observed each senior teacher. The colleagues completed the questionnaire twice, with an interval of six months.

Interviews: background and rationales

Next to observations and questionnaires, interviews are a valid method for the judgement of competence, especially in combination with other methods and information (Landy & Conte, 2013; Schmidt & Hunter, 1998). It is important that at least two experts, not attached to the school in any other way, conduct the interviews (van der Schaaf et al., 2005). The inclusion of two experts instead of one increases reliability (Murphy & Davidshofer, 1994) and is sufficient to produce acceptable levels of inter-rater agreement (Marzano, 2003).

Practice of interviews in the pilot study

In the pilot study two experts from the teacher-training college were hired as external assessors. Based on an analysis of all available data, namely the questionnaires of pupils, colleagues and management and the senior teachers' portfolios (as described below), these experts held interviews with each individual senior teacher in order to judge their level of competence. The individual interviews took about 1.5 h each. The interviews were explicitly directed at (1) the confirmation of evidence for 'already proven competences' from the materials analysed; and (2) the further exploration of competences which were 'unclear' or not yet proven sufficiently.

Portfolio assessment: background and rationales

By constructing a portfolio, the senior teachers themselves could be actively involved in the assessment. In this portfolio they described and reflected on their own strengths and weaknesses (e.g., van der Schaaf et al., 2005). The process of working on this portfolio asks for reflection and introspection. Research shows that the use of rubrics leads to learning (Boud, 1995; Jonsson & Svingby, 2007) and possibly even to professional growth (Hatton & Smith, 1995). For this to happen, the criteria, format and guidelines for the portfolio should be transparent and clear (Linn, Baker, & Dunbar, 1991; van der Schaaf & Stokking, 2008). From other studies it is known that asking colleagues for feedback on competences can lead to learning experiences as well (Hattie & Timperley, 2007).

Practice of portfolio assessment in the pilot study

For the pilot study, a portfolio manual was written by the external experts, containing guidelines, fill-in factsheets, and the competence profile for senior teachers with rubrics, the latter being the criteria against which the teachers should judge themselves. The teachers had to prove their competences in at least two different situations. These two situations should be additional to what 'already could be known' about the teachers from the other measurements. Next, these situations should be described in detail, so the experts would be able to visualize the situation and ask specific (check) questions about it during the interview, and colleagues would be able to write feedback (or a specific addition) regarding the situations described. Written feedback by at least two colleagues for each situation described was added to the portfolio. As mentioned earlier, the entire assessment program should offer learning opportunities for the people involved, because it is an expensive and time-consuming process altogether. Indeed, the complete assessment should not only result in a judgement, but it should also have developmental possibilities for the teacher during and after the assessment. For the portfolio, the teachers completed the 'pupil questionnaire' on interpersonal teacher behaviour at the start of the pilot study, assessing the way they viewed their own behaviour with pupils in the class. Next, the senior teachers proved the eight competences by completing an 'evidence form' (what is your evidence for this competence, and why is it convincing). This evidence could be a video of a good practice, a series of lessons, a manual they developed and so on. Together with this evidence, written feedback from colleagues was added. For writing this feedback, the teacher's colleague(s) used the rubrics from the competence profile.

Step 4: pilot study

As introduced above, a pilot study was organized to evaluate how the assessment program worked in practice and its acceptance within the school team. Validity and reliability are the most widely used quality criteria for assessment, but these two criteria alone are not sufficient when it comes to assessing competence (see also Baartman et al., 2007). Several authors have proposed other or complementary quality criteria, focusing for example on the meaningfulness of the assessment for learning or the quality of the feedback it provides (e.g., Baartman

Page 7: Jurnal (Studies and Evaluation)

Table 4
Twelve quality criteria for competence assessment programs (Baartman et al., 2007, p. 261).

1. Acceptability: all stakeholders should approve of the assessment methods, criteria and standards.
2. Authenticity: the degree of resemblance of the assessment to the (future) workplace.
3. Cognitive complexity: the assessment should reflect the presence of the cognitive skills needed and should enable the judgement of thinking processes.
4. Comparability: the assessment should be conducted in a consistent and responsible way. The tasks, criteria and working conditions should be consistent with regard to key features of interest.
5. Costs and efficiency: the time and resources needed to develop and carry out the assessment, compared to the benefits.
6. Educational consequences: the degree to which the assessment yields positive effects on learning and instruction and the degree to which negative effects are minimized.
7. Fairness: teachers should get a fair chance to demonstrate their competences, by letting them express themselves in different ways and making sure the assessors do not show biases.
8. Fitness for purpose: the assessment methods, criteria and standards should be compatible with the construct to be measured.
9. Fitness for self-assessment: the assessment should stimulate self-regulated learning by fostering self-assessment and the formulation of learning goals.
10. Meaningfulness: the assessment should be a learning opportunity and provide valuable feedback for further learning.
11. Reproducibility of decisions: decisions made based on the results of the assessment should be based on multiple situations and assessors. Decisions should not depend on one assessor or a specific situation.
12. Transparency: the assessment, criteria and standards should be clear and understandable to all stakeholders.


et al., 2007; Linn et al., 1991). Quantitative measures of quality are often not available for these kinds of assessment programs, necessitating other operationalizations of validity and reliability (Baartman et al., 2007). The assessment program described in this paper was evaluated using 12 quality criteria for determining the quality of competence assessment programs. Of course, the assessment parts should be valid, reliable and objective (traditional criteria). The rationale behind using the 12 new criteria is that competence assessment consists of both more traditional and new forms of assessment, and as a consequence, both traditional and new quality criteria are needed to evaluate the quality of the assessment. Table 4 presents a description of the quality criteria used in this study (Baartman et al., 2007, p. 261).

Method

The study presented in this paper describes a pilot study of the assessment program, including an evaluation of the assessment program itself.

Instruments

The competence profile for senior teachers was transformed into a questionnaire, as described above, and was used by the peer teachers, the management and the teachers themselves. Next, a standardized questionnaire on teacher behaviour, the QTI (questionnaire on teacher interaction), was used by the pupils and the senior teachers themselves. This questionnaire is a validated and reliable instrument that has already been used in many other (international) studies (den Brok et al., 2010; Levy et al., 2003; Telli & den Brok, 2012). The scales and numbers of items of the QTI are presented in Table 3.

Evaluation instruments of the (perceived) quality of the assessment program

To evaluate the quality of the entire assessment program, the 12 quality criteria of Baartman et al. (2007) were used (Table 4 presents the categories). In a previous study, these quality criteria were specified into 4–6 indicators per quality criterion (Baartman et al., 2007), which were used as questions in a questionnaire in this study. The participating senior teachers and their colleagues judged the quality of the assessment program on a 10-point Likert scale. The pupils rated four of the twelve quality criteria: (1) fitness for purpose; (2) transparency; (3) fairness; and (4) (costs and) efficiency. These four were chosen because they are the most visible to the pupils, for example whether the criteria were clear to them and whether they thought the criteria represented their opinion of a good teacher. Next, four questions were added to the pupils' questionnaire in order to receive information about the pupils' perspective on the usefulness of the assessment program.

Participants

Eight senior teachers participated as 'assessees' in the pilot study of the assessment program: six men and two women. The age of the teachers varied between about 30 and 63 years. Seven teachers taught a subject like maths or languages; one teacher taught physical education and had managerial tasks next to her teaching tasks. All teachers had gained at least five years of teaching experience. In the evaluative part of the pilot study, which was not obligatory, seven out of the eight senior teachers completed the evaluation questionnaire on the quality of the assessment program.

For each participating senior teacher, pupils of two classes rated their teachers. They carried out observations and filled out the QTI-questionnaire. In total 170 pupils participated, varying in age between 14 and 17 years. Participation of the pupil groups was obligatory, so the response rate was close to 100%. Seventy out of the 170 pupils also completed the evaluation questionnaire on a voluntary basis.

Four different colleagues observed each senior teacher; in total 32 teachers participated in the new assessment program as observers and 16 other teachers helped provide written feedback. In total 48 teachers were involved in the assessment program of their eight colleagues. Only 4 out of the 48 peer teachers completed the evaluation of the quality of the assessment program. This difference in participation between the pilot study and its evaluative part might be due to the period of the year (at the end of the second semester, just before the summer holidays) and the fact that peers and pupils were invited to participate on a voluntary basis.

Data analyses

For the assessment of the senior teachers' competence, the available data were (1) the results of pupils on the QTI-questionnaire, together with the scores of the senior teachers themselves on the QTI-scales; (2) the scores on the questionnaires on the competence profile for senior teachers, completed by the colleagues; and (3) the portfolios of the senior teachers, including the feedback by peer colleagues. The questionnaires from the colleagues were analysed by computing mean scores per competence (varying between 1 and 4) that were subsequently converted into percentages, indicating on a 100% scale how often the teacher showed a


specific competence. In order to make a final judgement on the senior teachers' competences, the two experts used the following criteria: (1) results on the pupils' questionnaire should be positive, i.e. in line with the (national) norms of the QTI, showing no large negative differences from the national average scores; (2) the results from the peer teachers' questionnaires should reach at least 60% for each competence, which counts as a positive result; and (3) the evidence for each competence as included in the portfolio should be valid, reliable and convincing according to the two experts. For a positive judgement, teachers should score positively on all three parts.
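The scoring and decision rule for the peer-questionnaire part (criterion 2) can be sketched as follows. This is a minimal illustration, not the authors' actual analysis code: it assumes a linear rescaling of the 1–4 means onto the 0–100% scale (the paper only states that means were "computed into percentages"), and the rating data are invented.

```python
def mean_to_percentage(mean_score):
    """Rescale a 1-4 questionnaire mean onto a 0-100% scale.

    Assumption: linear rescaling of the 1-4 range; the exact
    conversion is not specified in the paper.
    """
    return (mean_score - 1.0) / 3.0 * 100.0

def competence_percentages(ratings):
    """ratings: competence name -> list of 1-4 ratings given by the
    observing colleagues (four per senior teacher in the pilot)."""
    return {comp: mean_to_percentage(sum(r) / len(r))
            for comp, r in ratings.items()}

def peer_part_positive(percentages, threshold=60.0):
    """Decision rule (2): every competence must reach at least 60%."""
    return all(p >= threshold for p in percentages.values())

# Invented ratings for illustration (competence names from the profile).
ratings = {"cooperation": [4, 3, 4, 3],
           "coaching":    [3, 3, 4, 4],
           "innovating":  [2, 3, 3, 3]}
pct = competence_percentages(ratings)
print(pct)
print(peer_part_positive(pct))   # False: 'innovating' stays below 60%
```

Under this reading, a single competence below the 60% mark makes the whole peer-questionnaire part negative, which matches the paper's requirement that teachers score positively on all three parts.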

For the assessment of the quality of the assessment program, the available data were the results of the pupils' evaluation questionnaires and the results on the evaluation questionnaire filled in by the senior teachers themselves and their peers. Quantitative data from the evaluation questionnaire were available from 70 pupils, seven of the eight senior teachers and four colleagues. Means and standard deviations were calculated.
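The per-criterion means and standard deviations reported later (Tables 6 and 7) can be computed along these lines. This is a sketch with invented response values; the paper does not state the aggregation order, so averaging each respondent's item answers first is an assumption.

```python
import statistics

# Hypothetical responses on a 10-point scale: for each quality
# criterion, one inner list per respondent holding that respondent's
# answers to the criterion's items. Names follow Table 4.
responses = {
    "Acceptability": [[6, 5, 7], [4, 5, 6], [7, 6, 5]],   # 3 items
    "Transparency":  [[7, 6, 7], [6, 7, 6], [7, 7, 8]],   # 3 items
}

def criterion_stats(responses):
    """Mean and standard deviation per quality criterion, computed
    over each respondent's average item score (one plausible reading
    of the analysis described in the paper)."""
    stats = {}
    for criterion, per_respondent in responses.items():
        scores = [sum(items) / len(items) for items in per_respondent]
        stats[criterion] = (statistics.mean(scores),
                            statistics.stdev(scores))
    return stats

for criterion, (mean, sd) in criterion_stats(responses).items():
    print(f"{criterion}: M = {mean:.2f}, SD = {sd:.2f}")
```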

Evaluation of the assessment: results

The first results concern the pilot study of the assessment program, describing the assessment of the competences as well as the provision of the opportunity for self-reflection. Second, the evaluation of the assessment program is presented.

Pilot study of the assessment program

Senior teachers’ competences

The separate parts of the assessment program, the observations, questionnaires, portfolios and interviews, all pointed in the same direction: a senior teacher does or does not show the competences as formulated in the competence profile for senior teachers. In none of the cases did the results of the different parts of the assessment program contradict each other. Table 5 presents the overall results of the teachers on the separate parts of the assessment.

Five teachers proved to be competent for the senior role as described in the competence profile for senior teachers. The two assessors judged this independently of each other based on the portfolio assessment. This positive judgement was confirmed by the interviews with these teachers. In contrast, two other teachers had not been able to prove their competences by means of their portfolios; the judgement by the experts was 'doubtful'. However, these two teachers were able to add extra material during the assessment interview by providing extra information on the evidence provided and by adding 'critical incidents' during the interview. Eventually, the interview turned their judgements into

Table 5
Results on different parts of the assessment of senior teachers' competences.

Method | Target group | Overall results of the senior teachers (N = 8)
Observation questionnaire T1 a | Pupils | Positive results for all 8 teachers
Observation questionnaire T2 b | Pupils | Positive results for 7 teachers, showing even better results than in T1; negative results for 1 teacher, compared to T1
Observation questionnaire T1 a | Colleagues | Positive results for 7 teachers; negative results for 1 teacher
Observation questionnaire T2 b | Colleagues | Positive results for 7 teachers, showing even better results than in T1; negative results for 1 teacher, the same as in T1
Questionnaire (self) | Senior teacher | Positive judgement of self by all 8 teachers
Portfolio assessment | External experts | Convincing evidence for 5 teachers; additional evidence on two competences needed for two teachers; insufficient evidence for 1 teacher
Interview | External experts | Positive on all competences for 7 teachers; one interview was postponed, as additional evidence was needed first (judgement: insufficient)

a T1 = first measurement. b T2 = second measurement (6 months after first measurement).

positive ones. For the last teacher, the portfolio was insufficient, and even an interview with additional materials could not have led to a positive judgement. Therefore, the assessment interview was cancelled and this teacher was requested to construct a new portfolio. In an evaluation interview, all eight teachers, even the one without a positive judgement, stated that they recognized the advice and judgement.

Opportunity for reflection

Working with portfolios for professional development is considered valuable when there is a dialogical context. This dialogical context was created by having the senior teachers ask their peers for written feedback. Next, an interview was held with the senior teacher and two experts. All teachers stated that the entire process helped them reflect on their profession, their behaviour and the actions they had undertaken. Seven out of the eight senior teachers told the experts that working on the portfolio was a developmental process for them, because of gathering evidence proving the competences, reflecting on the competences and writing this down, asking peers for feedback and discussing this with the experts. One senior teacher, who did not receive a positive judgement, did not agree with the other teachers on this. He stated that the assessment program also judged the way one could build up a portfolio and use one's writing skills, and not only the senior teachers' competences.

Evaluation of the assessment program

Mean scores of the evaluation of the quality of the assessment program as judged by the teachers and peer teachers on a 1–10 scale are presented in Table 6. The criterion 'acceptability' (i.e. "all stakeholders should approve of the assessment methods, criteria and standards") showed the lowest score. The teachers who had been assessed, as well as the teachers who participated in the peer assessment, did not completely support the assessment program used (teachers M = 5.67, colleagues M = 4.75). Especially the teachers who participated in the peer assessment reported low scores on the acceptance of this method. The criterion 'fairness' also showed low scores within both groups (teachers M = 5.62, colleagues M = 5.51). This criterion comprises questions like "do you think the assessment is fair" and "are the assessors unprejudiced". The assessed teachers also reported low scores on the criterion 'educational consequences'. They stated that this assessment program did not really influence their professional behaviour (M = 5.82). The peer assessors, on the other hand, stated that participation in the assessment program did influence the teachers' professional behaviour (M = 8.88). Next, the assessed teachers reported that the assessment program was suitable for self-reflection (M = 7.43), which was part of the aim of the



Table 6
Mean scores and standard deviations (SD) on the 12 quality criteria for assessed teachers and peer assessors (1–10 scale).

Criteria | Teachers (N = 7) Mean | SD | Peer assessors (N = 4) Mean | SD | Number of items
Acceptability | 5.67 | 2.61 | 4.75 | 2.99 | 3
Authenticity | 7.14 | 1.80 | 6.25 | 1.50 | 2
Cognitive complexity | 6.69 | 1.79 | 6.73 | 1.72 | 5
Comparability | 6.92 | 2.50 | 6.88 | 3.01 | 4
Costs and efficiency | 5.87 | 1.88 | – | – | 6
Educational consequences | 5.82 | 2.93 | 8.88 | 0.88 | 4
Fairness | 5.62 | 0.83 | 5.51 | 2.28 | 6
Fitness for purpose | 7.30 | 1.06 | 6.65 | 2.29 | 6
Fitness for self-assessment | 7.43 | 1.47 | 6.00 | 1.75 | 4
Meaningfulness | 6.32 | 2.50 | 6.25 | 2.54 | 4
Reproducibility of decisions | 7.38 | 1.16 | 7.75 | 1.77 | 6
Transparency | 6.57 | 1.17 | 6.75 | 1.52 | 3

Table 7
Pupils' mean scores and standard deviations (SD) on four quality criteria (N = 70, 1–10 scale).

Criterion | Mean | SD | Number of items
Fitness for purpose | 7.32 | 2.21 | 5
Transparency | 7.75 | 1.62 | 1
Fairness | 8.16 | 1.48 | 2
(Costs and) efficiency | 8.04 | 1.52 | 2
Additional questions
A good assessment with pupils' participation | 8.21 | 1.63 | 3
Adequate questions about my teacher | 7.18 | 1.61 | 1
Questionnaire is a way of giving feedback | 7.11 | 1.89 | 1
This assessment leads to a change in behaviour of my teacher | 3.35 | 2.86 | 1


assessment program. Especially the portfolio was designed to stimulate the teachers to reflect on their own competence. All eleven teachers (7 senior teachers and 4 peer teachers) reported that the assessment program led to reproducible judgements and decisions (M = 7.38 and M = 7.75, respectively), which is a measure of reliability. Another measure of reliability is 'comparability', the use of comparable methods, criteria and standards for all assessees. According to the assessed teachers and the peer teachers, the assessment program was indeed comparable (M = 6.92 and M = 6.88, respectively). The (peer) teachers also reported that the assessment program was suitable for the aim set ('fitness for purpose': teachers M = 7.30, peers M = 6.65), which was judging whether a teacher was a real senior teacher possessing all the competences described.

Table 7 reports the scores of the evaluation of the assessment program by the pupils. In total, 70 pupils filled out the evaluation questionnaire, which used a 10-point scale. All four criteria measured showed high scores (see Table 7). The pupils reported that they understood the questionnaire about their teachers' interpersonal behaviour and that they understood its goal, namely to judge the 'best' teachers in school for senior positions, also giving the pupils a voice in this. The assessment program (as far as the pupils participated in it) was fair and transparent according to the pupils.

The pupils reported that they appreciated the fact that they could participate in the assessment and give their judgement of the teacher (M = 8.21). They stated this is a way of giving feedback to their teachers (M = 7.11). Pupils did not perceive changes in teachers' behaviour (M = 3.35).

Conclusion and discussion

The study presented in this paper focused on the development of an assessment program for senior teachers, while providing the opportunity for self-reflection by the senior teachers. The development process contained four steps. The first step concerned determining the content of the competences to be assessed. In the second step, criteria and standards were specified, and in the third step methods were chosen for carrying out the assessment program. The assessment program was implemented in a pilot study, assessing eight senior teachers.

Theoretical frameworks on good teachers do not present one specific view of good teachers (Berliner, 2001; Fenstermacher & Richardson, 2005). Therefore, three theoretical perspectives on good teaching can be recognized in the final competence profile for senior teachers. The profile included aspects from (1) perception studies of ideal teaching, including learning environment research (Allen & Fraser, 2007); (2) effectiveness research (e.g. Seidel & Shavelson, 2007); and (3) studies on teachers' professional knowledge (e.g. Berliner, 2004; Darling-Hammond & Snyder, 2000; Verloop, 2005). The literature on good teachers was presented to the development team, and the team also used their own (literature) resources, for example from professional development programs. A specific aim was that the school team would recognize the new assessment program and that there would be a strong commitment towards using it. As a consequence, a school-specific competence profile was developed by the school's development team. This is a rather eclectic approach, using competences fitting the specific school context, mostly chosen bottom-up. Berliner (2005) described the importance of taking the specific demands of the school environment into account. The school management agreed to this rather eclectic approach in choosing the competences, in order to create a larger commitment of the team.

The assessment program had two goals: judgement of senior teachers' competences and creating an opportunity for the participating senior teachers to reflect on their competences. The senior teachers stated that the assessment program did not really influence their professional behaviour, but they recognized its possible influence on their professional behaviour as teachers. They mentioned the possibility to reflect on their own competence development while working on their portfolio. This forced them to make their competences explicit. With this reflection function, the assessment program, although having a specific summative goal, had a formative purpose as well (Hickey, Zuiker, Taasoobshirazi, Schafer, & Michael, 2006).

Opportunity for reflection

A portfolio can play an important role in the professional development of teachers, and not only in the case of an (external) judgement. Working with portfolios for professional development is considered valuable when there is a dialogical context. If so, a portfolio can be considered an 'assessment tool for learning'. However, when there are no reflective discussions regarding the portfolio, it is less valuable (Mittendorff, Jochems, Meijers, & den Brok, 2008). This might be the case in our assessment program: only one interview was conducted, in which the teacher could talk about and explain his or her contribution in the portfolio. It is not unlikely that extrinsic motivation played a role in the portfolio assignment: the teachers completed their portfolios in order to gain another position and salary in the school. The participating teachers stated that the process of working on their portfolios is a very good way of self-reflection (see also Boud, 1995). However, their colleagues were not convinced of this; they stated that working on a portfolio might contribute to self-reflection, but that this does not have to be the case for everyone. In order to achieve an actual and long-lasting effect on teacher behaviour and professionalization, teacher assessment should be integrated into a larger personnel evaluation system. This could be done by having all teachers work on a professional portfolio, describing their professional development and reflecting on their professional identity as a teacher (Beijaard et al., 2004). This portfolio can then be the basis of the annual dialogue between the teacher and the management.

Indeed, a pilot study of the assessment program showed that a good view of senior teachers' competences, as described in this study, seemed to be gained in this way. Seven out of eight senior teachers who were subjected to the assessment indeed demonstrated the specified senior teachers' competences. The results of the different methods within the assessment program all pointed in the same direction. Multiple methods, assessors and pieces of evidence were used to demonstrate teacher competence, assuring triangulation of methods and assessors. The fact that observations, questionnaires, interviews and portfolios all pointed in the same direction (a positive or negative judgement of senior competence) is a first starting point for the determination of construct validity (Murphy & Davidshofer, 1994). However, in the pilot study a rather small, selective group of teachers participated. Half of them participated in the development team of the competence profile as well, which might have influenced their views on the assessment program in a positive way. The fact that the eight 'best' teachers were selected by the school management for participation in the pilot study might also have influenced the (positive) findings. In order to validate the assessment program further, it could be implemented in other, comparable schools that did not participate in the development process, but that also need a system for personnel management and high-stakes assessments as required by the Dutch government.
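The triangulation logic described here, with multiple methods all pointing in the same direction, can be expressed as a simple consistency check. The sketch below is a minimal illustration under assumed data: the teacher names, method names and verdicts are hypothetical, since the paper reports only that seven of the eight teachers met the standard.

```python
# Illustrative triangulation check: for each assessed teacher, verify that all
# assessment methods yield the same positive/negative verdict.
judgements = {
    "teacher_A": {"observation": True, "pupil_questionnaire": True,
                  "portfolio": True, "interview": True},
    "teacher_B": {"observation": False, "pupil_questionnaire": False,
                  "portfolio": False, "interview": False},
}

def methods_agree(verdicts):
    """True when every method points in the same direction."""
    return len(set(verdicts.values())) == 1

for teacher, verdicts in judgements.items():
    status = "consistent" if methods_agree(verdicts) else "conflicting"
    print(f"{teacher}: {status}")
```

A conflicting profile (methods disagreeing about the same teacher) would be the signal to scrutinize the judgement further before drawing a summative conclusion.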

It remains an interesting question whether these kinds of assessment tools should be theoretically driven, practically driven, or a combination of both, as in our study. We assume that this combination works best in order to gain a valid competence profile as well as an increased commitment of the team. Working in a mainly theoretically driven way might lead towards a 'not-invented-here' problem, ending up with a rejection of the assessment program by the school team. By having the best teachers participate in the development team, combined with the input of literature, a mere 'gathering of competences' should have been prevented, but this remains a possible weakness of our study. This way of working might be interesting for others to try out and could be an interesting subject for further research as well. It might also be interesting for future research to compare our approach to more theoretically driven approaches to assessing teachers, keeping in mind the acceptance of and commitment towards the assessment program by the school team.

Evaluation of the assessment program

The judgements of the eight senior teachers were recognized and accepted by the teachers assessed as well as by their peer teachers. The development process was carried out in four steps by the school development team, in order to create a large commitment of the school team towards this new assessment program. Subsequently, the pilot study with the first implementation of the assessment program was evaluated. This evaluation needs to be interpreted with some caution, because only four peer assessors (of the 48 peer assessors participating in the assessment program) and 70 pupils (of 170 in total) participated in the evaluative part of this study. This limited participation might be due to the period of the year (the end of the second semester, just before the summer holidays) and the fact that peers and pupils were invited to participate on a voluntary basis. School management reported that it was too busy at that time of year to gain a higher response.

The participating teachers (teachers assessed and the peers) reported that the assessment program was suitable for providing a competence judgement, but that the acceptance of the assessment methods was rather low and that the commitment towards this way of competence assessment was not very high. This could be partly due to the fact that the assessment methods used were quite time-consuming. Teachers who judged their colleagues invested time and effort, but did not perceive their participation as useful for themselves. Indeed, peer assessment can be valuable for both parties (teacher and peer), but dialogue and exchange are important conditions for learning from each other in school teams (Doppenberg, den Brok, & Bakx, 2012). However, in order to reach this dual learning effect for teachers assessed as well as their peers, specific goals should be set, explained and guided. This was not done in the pilot study, but it can be a valuable suggestion for others who would use a comparable assessment program. Even though the findings of the study do not support some of the outcomes hoped for (in particular the development of a summative assessment program for senior teachers), it does seem to offer a sound methodology for the development of such a program within each school context. If all teachers in a school utilized the method for developing the competences, then the outcomes of the evaluation might be more readily accepted by the teachers.

For additional research, it is interesting to gain more insight into the possible psychological rationales, in order to find out why the teachers resisted supporting the assessment program. For this purpose, another questionnaire could be developed, investigating the possible psychological causes of resistance. Especially open-ended questions could be useful here. This may facilitate an understanding of why the teachers did not quite accept this assessment method, and might help to refine the program for the future, if the underlying causes for their resistance are identified. From other studies on educational innovation, it is known that teachers are more positive when given ownership, agency and logical sense-making (Ketelaar et al., 2012). In order to create acceptance of the assessment program, a large group of teachers and the management were brought together, both to create a valid profile and to create commitment to the use of this competence profile for senior teachers as the underlying basis for the assessment program. However, not all teachers were involved in the development process, and the teachers had not been involved in choosing the assessment methods. Especially these methods (portfolio, observation, questionnaires and interview) were time-consuming and involved many peers and pupils in order to establish one valid judgement of a senior teacher's competences. The low acceptance could be due to the large amount of time, effort and money spent on the assessment of relatively few people (Wiliam, Lee, Harrison, & Black, 2004). The pupils were more positive in this respect. They understood and recognized their part of the assessment program and appreciated the possibility to give feedback to their teachers. They appreciated the fact that their opinion was asked for and perceived the assessment program as fair, transparent and suitable for the purpose of assessing senior teachers' competence. The difference between teachers' and pupils' judgements may be due to the fact that the pupils' voices were heard and that they had a serious role in the assessment of their teachers, which is not a role that pupils commonly get. Because of the anonymity of the methods, the pupils could be honest about their opinions of their teachers, without fearing negative consequences.

A question that remains difficult to answer is whether colleagues are capable of judging each other objectively when it concerns a summative assessment. The participating peers were in no way dependent on each other in the teaching team, nor connected in a hierarchical relation. However, it is possible that colleagues who like each other judge each other more positively. Observations done by peers suffer from the problem of 'sympathy for the observed person'. This is especially important when it concerns a summative assessment with salary consequences, as in this study. The same question can be asked of pupils: are they capable of judging their teachers? In order to reduce possible bias by colleagues and pupils, three actions were undertaken: (1) judgements were carried out anonymously; (2) relatively large groups of pupils and peers were asked to participate in the assessment (Damon, 2007); and (3) external assessors were included in the assessment program. The assessed teachers were asked to add 'evidence' of all eight competences to their portfolios. This evidence was analysed by the external assessors and, together with the results from the pupil questionnaire, the questionnaires of the colleagues and the portfolio, formed the basis for the final individual interview. The expert judgement and the peer judgement produced comparable results, which is a first indication that peer teachers and pupils could play a role in the summative assessment of their colleagues. However, more research is needed in this respect, for example by comparing the judgements of colleagues who like or dislike each other. For formative purposes, these reliability issues are less of a concern. In the case of assessments for formative reasons, colleagues could judge one another as a starting point for professional development and intervision. The organization could create a culture in which collegial consultation and reflection on professional behaviour are generally accepted and appreciated, and in which learning from each other is a central goal (Doppenberg et al., 2012).
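One standard way to quantify the agreement between expert and peer judgements reported above is Cohen's kappa, which corrects raw agreement for chance. The sketch below uses hypothetical pass/fail verdicts for eight teachers, since the paper reports only that the two sources produced comparable results; it is not the study's actual analysis.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving binary pass/fail verdicts (1/0)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal 'pass' rate.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both raters are perfectly uniform
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts for eight teachers (1 = senior competence demonstrated).
expert = [1, 1, 1, 1, 1, 1, 1, 0]
peer   = [1, 1, 1, 1, 1, 1, 0, 0]
print(round(cohens_kappa(expert, peer), 2))  # prints 0.6
```

Comparing kappa between pairs of colleagues who like versus dislike each other, as suggested above, would directly test the 'sympathy' concern.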

Summarizing, the assessment program for senior teachers showed that the competences of working teachers can be assessed during their teaching career with this program. First indications of validity and reliability were positive, but the acceptance of the program by the school team was rather low and should be investigated further. Future steps to improve the assessment program should include further validation of the competence profile, increasing acceptance of the assessment program, and lowering costs and time investments. A further validation of the competence profile could be done by external experts and teachers, combining a theoretical approach and practical input. This could improve and specify the competence profile further. Next, creating conditions in the school for learning from each other, and using portfolios as a means for professional development, could contribute to a decrease of time and effort, because a portfolio would then be a growing document that all teachers already have. This portfolio would then be the basis of the annual dialogue between teacher and school management. If a teacher were selected to participate in the assessment program, only an addition to the portfolio with peer feedback would be needed. Indeed, this requires a change in school culture, directed at a learning organization. Our study provides a first direction for the development of an assessment program that would fit in a learning organization.

References

Allen, D., & Fraser, B. J. (2007). Parent and student perceptions of classroom learning environment and its association with student outcomes. Learning Environments Research, 10(1), 67–82.

Baartman, L. K. J., Prins, F. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2007). Determining the quality of competence assessment programs: A self-evaluation procedure. Studies in Educational Evaluation, 33, 258–281.

Bakx, A. W. E. A., van der Sanden, J. M. M., Sijtsma, K., & Taconis, R. (2002). Development and evaluation of a student-centred multimedia self-assessment instrument for social-communicative competence. Instructional Science, 30, 335–359.

Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., et al. (2010). Teachers' mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180.

Beijaard, D., Verloop, N., & Vermunt, J. D. (2000). Teachers' perceptions of professional identity: An exploratory study from a personal knowledge perspective. Teaching and Teacher Education, 16(7), 749–764.

Beijaard, D., Meijer, P. C., & Verloop, N. (2004). Reconsidering research on teachers' professional identity. Teaching and Teacher Education, 20, 107–128.

Beishuizen, J. J., Hof, E., van Putten, C. M., Bouwmeester, S., & Asscher, J. J. (2001). Students' and teachers' cognitions about good teachers. British Journal of Educational Psychology, 71(2), 185–201.

Berliner, D. (2001). Learning about and learning from expert teachers. International Journal of Educational Research, 35, 463–482.

Berliner, D. (2004). Describing the behaviour and documenting the accomplishments of expert teachers. Bulletin of Science Technology and Society, 25, 1–13.

Berliner, D. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56(3), 205–213.

Borko, H., Whitcomb, J., & Liston, D. (2009). Wicked problems and other thoughts on issues of technology in teacher learning. Journal of Teacher Education, 60(1), 3–7.

Boud, D. (1995). Enhancing learning through self assessment. London/New York: Routledge Falmer Taylor & Francis Group.

Brophy, J., & Good, T. (1986). Teacher behaviour and student achievement. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 328–375). New York: Macmillan.

Burden, P. (2010). Creating confusion or creative evaluation? The use of student evaluation of teaching surveys in Japanese tertiary education. Educational Assessment, Evaluation and Accountability, 22(2), 97–117.

Chen, W., Mason, S., Staniszewski, C., Upton, A., & Valley, M. (2011). Assessing the quality of teachers' teaching practices. Educational Assessment, Evaluation and Accountability, 24(1), 25–41.

Clausen, M., Reusser, K., & Klieme, E. (2003). Unterrichtsqualität auf der Basis hoch-inferenter Unterrichtsbeurteilungen: Ein Vergleich zwischen Deutschland und der deutschsprachigen Schweiz [Quality of instruction based on high-inference analysis of lessons: A comparison between Germany and German-speaking Switzerland]. Unterrichtswissenschaft, 31, 122–141.

Damon, W. (2007). Dispositions and teacher assessment: The need for more rigorous definition. Journal of Teacher Education, 58(5), 365–369.

Darling-Hammond, L. (1999). Teacher quality and student achievement: A review of state policy evidence. Seattle: University of Washington, Center for Teaching and Policy.

Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment in teaching in context. Teaching and Teacher Education, 16, 523–545.

Day, C., Sammons, P., Stobart, G., Kington, A., & Gu, Q. (2007). Teachers matter: Connecting lives, work and effectiveness. Berkshire: Open University Press.

den Brok, P. J., Fisher, D., Brekelmans, J. M. G., Rickards, T., Wubbels, Th., Levy, J., et al. (2003). Students' perceptions of secondary science teachers' interpersonal style in six countries: A study on the cross-national validity of the Questionnaire on Teacher Interaction. NARST annual meeting, 2003 March 23–26, Philadelphia. Philadelphia: NARST.

den Brok, P. J., Brekelmans, J. M. G., & Wubbels, Th. (2004). Interpersonal teacher behaviour and student outcomes. School Effectiveness and School Improvement, 15(3), 407–442.


den Brok, P. J., Brekelmans, J. M. G., & Mainhard, T. (2010). The effect of students' perceptions of their teachers' interpersonal behaviour on their educational outcomes: A meta-analysis of research with the Questionnaire on Teacher Interaction (QTI). In Th. Wubbels, P. den Brok, J. van Tartwijk, J. Levy, & B. Fraser (Eds.), International conference on interpersonal relationships in education (pp. 21–). Eindhoven: TUe-UU-LU.

Doppenberg, J., Den Brok, P., & Bakx, A. (2012). Collaborative teacher learning across foci of collaboration: Perceived activities and outcomes. Teaching and Teacher Education, 28(6), 899–910.

Drucker, P. F. (1954). The practice of management. New York: Harper.

Elbaz, F. (1991). Research on teacher's knowledge: The evolution of a discourse. Journal of Curriculum Studies, 29(1), 1–19.

Englert, C. S., & Tarrant, K. L. (1995). Creating collaborative cultures for educational change. Remedial and Special Education, 16(6), 325–336.

Fenstermacher, G. D., & Richardson, V. (2005). On making determinations of quality in teaching. Revision of paper presented for the Board of International Comparative Studies in Education of the National Academies of Science and the National Research Council, Washington, DC. Teachers College Record, 107(1), 186–213.

Gage, N. L. (1964). Theories of teaching. In E. R. Hilgard (Ed.), Theories of learning and instruction: Sixty-third yearbook, Part I: National Society for the Study of Education (pp. 268–285). Chicago: University of Chicago Press.

Hamachek, D. (1969). Characteristics of good teachers and implications for teacher education. The Phi Delta Kappan, 50(6), 341–345.

Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. London: Taylor & Francis.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.

Hatton, N., & Smith, D. (1995). Reflection in teacher education: Towards a definition and implementation. Teaching and Teacher Education, 11(1), 33–49.

Hegender, H. (2010). The assessment of student teachers' academic and professional knowledge in school-based teacher education. Scandinavian Journal of Educational Research, 54(2), 151–171.

Heldsinger, S. A., & Humphry, S. M. (2013). Using calibrated exemplars in the teacher-assessment of writing: An empirical study. Educational Research, 55(3), 219–235.

Hickey, D. T., Zuiker, S. J., Taasoobshirazi, G., Schafer, N. J., & Michael, M. A. (2006). Balancing varied assessment functions to attain systemic validity: Three is the magic number. Studies in Educational Evaluation, 32(3), 180–201.

Hill, H. C., Rowan, B., & Loewenberg Ball, D. (2005). Effects of teachers' mathematical knowledge for teaching on student achievement. American Educational Research Journal, 42(2), 371–406.

Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130–144.

Ketelaar, E., Beijaard, D., Boshuizen, H., & den Brok, P. J. (2012). Teachers' positioning towards an educational innovation in the light of ownership, sense-making and agency. Teaching and Teacher Education, 28(2), 273–282.

Kleickmann, T., Richter, D., Kunter, M., Elsner, J., Besser, M., Krauss, S., et al. (2013). Teachers' content knowledge and pedagogical content knowledge: The role of structural differences in teacher education. Journal of Teacher Education, 64(1), 90–106.

Kutnick, P., & Vena, J. (1993). Students' perceptions of a good teacher: A developmental perspective from Trinidad and Tobago. British Journal of Educational Psychology, 63(3), 400–413.

Landy, F. J., & Conte, J. M. (2013). Work in the 21st century: An introduction to industrial and organizational psychology (4th ed.). Hoboken, NJ: Wiley.

Levy, J., den Brok, P., Wubbels, T., & Brekelmans, M. (2003). Students' perceptions of interpersonal aspects of the learning environment. Learning Environments Research, 6(1), 5–36.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.

Mangiante, E. M. S. (2011). Teachers matter: Measures of teacher effectiveness in low-income minority schools. Educational Assessment, Evaluation and Accountability, 23(1), 41–63.

Marzano, R. J. (2003). What works in schools: Translating research into action. Alexandria, VA: Association for Supervision and Curriculum Development.

Marzano, R. J. (2011). De kunst en wetenschap van het lesgeven. Een evidence-based denkkader voor goed, opbrengstgericht onderwijs [The art and science of teaching: An evidence-based perspective for good, data-driven education]. Vlissingen: Bazalt.

Messick, S. (1995). Validity of psychological assessment. American Psychologist, 50, 741–749.

Mittendorff, K., Jochems, W., Meijers, F., & den Brok, P. (2008). Differences and similarities in the use of the portfolio and personal development plan for career guidance in various vocational schools in The Netherlands. Journal of Vocational Education and Training, 60(1), 75–91.

Murphy, K. R., & Davidshofer, C. O. (1994). Psychological testing: Principles and applications. New Jersey: Prentice-Hall.

Noell, G. H., & Burns, J. L. (2006). Value-added assessment of teacher preparation: An illustration of emerging technology. Journal of Teacher Education, 57(1), 37–50.

Norcini, J. J., & Shea, J. A. (1997). The credibility and comparability of standards. Applied Measurement in Education, 10(1), 39–59.

Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review, 9, 129–144.

Praslova, L. (2010). Adaptation of Kirkpatrick's four level model of training criteria to assessment of learning outcomes and program evaluation in higher education. Educational Assessment, Evaluation and Accountability, 22(3), 215–225.

Price, M. (2005). Assessment standards: The role of communities of practice and the scholarship of assessment. Assessment and Evaluation in Higher Education, 3, 215–230.

Rasmussen, A., & Friche, N. (2011). Roles of assessment in secondary education: Participant perspectives. Educational Assessment, Evaluation and Accountability, 23(2), 113–129.

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78.

Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education, 5(1), 77–84.

Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment, 11(1), 1–31.

Scheerens, J. (2007). Een overzichtsstudie naar school- en instructie-effectiviteit [An overview study of school effectiveness and instruction effectiveness]. Enschede: Universiteit Twente.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Scriven, M. (1980). The logic of evaluation. Inverness, CA: Edge Press.

Seezink, A., & Poell, R. F. (2009). Teachers' individual action theories about competence-based education: The value of the cognitive apprenticeship model. Journal of Vocational Education and Training, 61(2), 203–215.

Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77(4), 454–499.

Shulman, L. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15, 4–14.

Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1–22.

Sluijsmans, D. M. A., Brand-Gruwel, S., & Van Merrienboer, J. J. G. (2002). Peer assessment training in teacher education: Effects on performance and perceptions. Assessment and Evaluation in Higher Education, 27(5), 443–454.

Snoek, M., Clouder, C., De Ganck, J., Klonari, K., Lorist, P., Lukasova, H., et al. (2009). Teacher quality in Europe: Comparing formal descriptions. Paper presented at the ATEE conference 2009, Mallorca.

Spillane, J. P., Pareja, A. S., Dorner, L., Barnes, C., & May, H. (2009). Mixing methods in randomized controlled trials (RCTs): Validation, contextualization, triangulation, and control. Educational Assessment, Evaluation and Accountability, 22(1), 5–28.

Stronge, J. H. (2007). Qualities of effective teachers. Alexandria, VA: Association for Supervision and Curriculum Development.

Teaching Advisory Board [Commissie Leraren]. (2007). LeerKracht! Advies van de Commissie Leraren [Power of teachers: Advice of the teaching advisory board]. Den Haag: Ministry of Education.

Telli, S., & den Brok, P. J. (2012). The questionnaire on teacher interaction from the primary to the higher education context in Turkey. In T. Wubbels, P. J. den Brok, J. van Tartwijk, & J. Levy (Eds.), Interpersonal relationships in education: An overview of contemporary research (pp. 187–206). Rotterdam: Sense Publishers.

Tigelaar, D. E. H., Van Tartwijk, J., Janssen, F., Veldman, I., & Verloop, N. (2009). A program for the assessment of competence in teacher education: An exploration of teacher educators' assessment activities. Paper presented at the 13th biannual conference of the European Association for Research on Learning and Instruction, August 29, 2009, Amsterdam, The Netherlands.

Uhlenbeck, A. M. (2002). The development of an assessment procedure for beginning teachers of English as a foreign language (unpublished doctoral dissertation). Leiden, The Netherlands: University of Leiden, ICLON Graduate School of Education.

van der Linden, W., Bakx, A., Ros, A., Beijaard, D., & Vermeulen, M. (2012). Students' perceived development of a positive attitude towards research and research knowledge and skills in primary teacher education. European Journal of Teacher Education, 35(4), 401–419.

van der Schaaf, M. F., & Stokking, K. M. (2008). Developing and validating a design for teacher portfolio assessment. Assessment and Evaluation in Higher Education, 33(3), 245–262.

van der Schaaf, M. F., Stokking, K. M., & Verloop, N. (2005). Cognitive representations in raters' assessment of teacher portfolios. Studies in Educational Evaluation, 31, 27–55.

van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessing professional competence: From methods to programs. Medical Education, 39, 309–317.

Vanderlinde, R., & Van Braak, J. (2010). The gap between educational research and practice: Views of teachers, school leaders, intermediaries and researchers. British Educational Research Journal, 36(2), 299–316.

Verloop, N. (2005). De leraar [The teacher]. In N. Verloop & J. Lowyck (Eds.), Onderwijskunde (pp. 195–234). Groningen: Noordhoff Uitgevers.

Wiliam, D., Lee, C., Harrison, C., & Black, P. J. (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles, Policy and Practice, 11(1), 49–65.

Wise, K. C., & Okey, J. R. (1983). A meta-analysis of the effects of various science teaching strategies on achievement. Journal of Research in Science Teaching, 20, 419–435.

Wubbels, T., Brekelmans, M., & Hooymayers, H. (1991). Interpersonal teacher behaviour in the classroom. In B. Fraser & H. Walberg (Eds.), Educational environments. Oxford: Pergamon.


Zeichner, K. M., & Noffke, S. E. (2001). Practitioner research. In V. Richardson (Ed.), Handbook of research on teaching (pp. 298–330). Washington, DC: American Educational Research Association.

Anouke Bakx, PhD, is an associate professor and academic director of the master program 'Learning and Innovation' for teachers at Fontys University of Applied Sciences, The Netherlands. Her research focuses on teacher quality in primary schools, professional learning of teachers and outcome-based education.

Liesbeth Baartman, PhD, is a senior researcher and lecturer at the Faculty of Education of Utrecht University of Applied Sciences, The Netherlands. She is part of the Research Group Vocational Education. Her research focuses on assessment quality in (higher) vocational education and students' learning processes between school and work in vocational education.

Tamara van Schilt-Mol, PhD, is an associate professor of testing and assessment at the HAN University of Applied Sciences, The Netherlands. She is part of the Research Centre Quality for Learning. Her research focuses both on the function of testing and assessment in the development of students and teachers/lecturers, and on the function of testing and assessment in (improving) the quality of education.