Sheffield Assessment Instrument for Letters (SAIL)

Letters to the Editor

Facing the challenges of competency-based assessment of postgraduate dental training

Editor – I read with interest the recent

paper on longitudinal evaluation of per-

formance in competency-based

assessment.1 To develop and introduce a

valid and reliable system of assessment,

which accurately measures the all-round

competence of trainees, is indeed a

daunting task and the authors have

obviously put an enormous amount of

effort into producing their evaluation

form. The emphasis on increasing for-

mative and minimizing summative

assessment is certainly to be applauded.

I am, however, concerned that such

assessments are being developed with-

out having preset criteria against which

judgements of performance are made.

How can these judgements be reliable

and objective without clear criteria? To

imply that the reliability (and validity)

of such assessments is improved if a

large number of assessments are car-

ried out is dubious. Reputations are

easily made and hard to change and,

once an opinion is formed on a train-

ee, word spreads quickly around a

department. Assessments are thus

easily influenced, consciously or sub-

consciously, and any errors compoun-

ded. The ‘halo’ effect (and the oppo-

sing ‘horns’ effect) are best countered

by establishing objective criteria in

advance2 – not by the reinforcement of

subjective judgement.

I do accept that criteria for such wide-

ranging assessments may have to be

broad for reasons of feasibility, but

please let us not do away with them

altogether. If we do, we are in danger of

spending an enormous amount of time

and effort on assessments that are no

more reliable than the intuitive judge-

ments made in the past.

A W Evans

London, UK

References

1 Prescott LE, Norcini JJ, McKinlay P,

Rennie JS. Facing the challenges of

competency-based assessment of post-

graduate dental training. Longitudinal

evaluation of performance (LEP). Med

Educ 2002;36:92–7.

2 Fletcher S. Competence-Based Assessment

Techniques. London: Kogan Page; 2000.

Students benefit from experience of hospitalization

Editor – We all learn in a variety of

ways. Although textbooks, lectures and

discussion groups are important con-

stituents of medical student learning, we

wouldn’t want anyone performing sur-

gery based only on a written description

of the procedure. Similarly, true

empathy towards patients involves more

than acquired knowledge and skills; it

also requires understanding something

about what patients experience.

Thus, unlike Professor Downie, we

believe there can be great value in hav-

ing students go through the hospital-

ization experience we describe in this

issue – despite the fact that it cannot

precisely replicate the exact circum-

stances faced by patients with acute

and/or chronic disease. Students who

participated in this project were able to

personally experience many of the

dehumanizing aspects of being in a

hospital, from wearing a flimsy hospital

gown and having to undress in front of

others, to being treated as an object

rather than as a human being.

Professor Downie also feels this edu-

cational project was unethical, for three

reasons. Firstly, he worries that it could

have resulted in harm to the students.

As we pointed out, the safeguards we

instituted meant that there was only

minimal risk. Although the institution of

safeguards itself creates additional differ-

ences between the students’ experience

and that which ‘real patients’ undergo,

it does not eliminate many important

aspects of the experience of hospital-

ization, and is a reasonable and indeed

necessary compromise. Furthermore, of

the many student volunteers who gave

fully informed consent, only a few were

able to participate in the project.

Correspondence: AW Evans, Department of

Oral & Maxillofacial Surgery, Eastman

Dental Institute for Oral Health Care

Sciences, 256 Grays Inn Road, London

WC1X 8LD, UK. E-mail: a.evans@eastman.ucl.ac.uk

Correspondence: Dr Michael Wilkes, 39630

Larkspur Place, Davis, California 95616,

USA. E-mail: [email protected]

© Blackwell Science Ltd MEDICAL EDUCATION 2002;36:586–590

Secondly, Professor Downie worries

about inappropriate use of resources.

No patients were denied care because a

handful of students took up otherwise

empty beds at a time when we knew our

hospital would not be full. The

increased demand on the time of physicians

and nurses was minimal, and the

financial cost of the exercise was trivial.

As we note in the manuscript, the cost

of the exercise was dwarfed by the

amount spent on other medical educa-

tional activities that, we would argue,

are of far less value. We would never

advocate taking up hospital beds with

healthy students if those beds were

needed for real patients. In the UK or

elsewhere this concern might mean that

an alternative experience would need to

be provided. However, this was cer-

tainly not the case at our teaching hos-

pital.

We agree with Professor Downie’s

comment that ‘one of the saddest parts

of the experience’ related to one of the

students being turned away for lack of

health insurance. We are vocal critics of

this aspect of American health care.

Although this aspect of the experience

was a powerful and educational one for

the student involved, it does not miti-

gate the ethical shortcomings of a health

care system that treats people unequally

based on their ability to pay. However,

this unconscionable aspect of American

medicine was not the subject of our

paper.

Finally, Professor Downie raises

concerns about the intrinsic deception

involved. He opines that the only

justification for deception might be

‘important research’. We do not

understand this concept. Deception in

and of itself is of course undesirable,

but we believe that most ethical issues

are complex, and, typically, involve the

balancing of competing values. The

benefits of a project like this – whether

it is carried out for research, for edu-

cation, or for some other purpose – can

outweigh the harm associated with this

degree of deception. It goes without

saying that the project could not have

been accomplished in any meaningful

sense had the caregivers known what

was taking place. Given the specific

nature of the project, we are comfort-

able that its educational value, and

potential for positively changing future

behaviour, justified the degree and type

of deception involved. (Frankly we are

astonished by the assertion that our

willingness to conduct this project

proves that we ‘lack humane qualit-

ies’).

Professor Downie concludes that we

have addressed the wrong issue because

‘what is needed is to give permission for

the deployment of humane qualities that

students already possess’. He believes

this might be best accomplished by

offering courses in the humanities and

‘encouraging a broader perspective on

life’. We have no objection to this pro-

posal and believe it reflects one positive

aspect of American medical education,

in that our medical students have first

obtained an undergraduate degree,

where they are far more likely than their

UK counterparts to have been exposed

to broad perspectives.

Moreover, we believe the experiences

our students had during this exercise

confirm our prior observations that,

despite the many other ways in which

we attempt to introduce humanism into

the curriculum at our medical school, a

great deal more needs to be done. We

do not discourage the use of other tools

as well, but suggest that this particular

tool can add greatly to students’

understanding, not only of the patient’s

experience of hospitalization, but also of

the critical importance of sensitivity and

humanism, or their absence, on the part

of physicians.

Michael Wilkes

J Hoffman

Davis, California, USA

Training of Doctors project

Editor – I read with interest the valuable

discussion paper by Bleakley1 in your

Journal. I have, however, a number of

comments concerning our group’s

quoted work.2,3,4

The Training of Doctors project was

a funded research and development

project, which was multidisciplinary in

nature and practical in approach. As an

action research developmental project,

it aimed to illuminate important questions

and themes in an understudied

area, rather than to primarily develop

new theory. The mixture of doctors,

educationalists and a trained anthro-

pologist facilitated this approach. The

emphasis was on producing outcomes

and possible solutions which would

enhance the educational experience or

‘training’ of junior doctors, not just

pre-registration house officers. These

outcomes were evaluated and found to

work in a variety of hospital settings and

departments.

Using a language that would allow

us to communicate with doctors was

essential, and a psychological turn of

phrase was almost inevitable as this

vernacular is predominant in the

medical paradigm. In addition, this

psychological approach facilitated ac-

cess to those well-described aspects of

the junior doctor’s life that concern

stress and emotion. This does not

mean that the authors do not believe

in the social constructivist worldview,

socialization or the recent seminal

work on communities of practice.5,6

But a practice-based ‘how to do it on

the shop floor’ approach was what

appeared to be most valued by train-

ees.7 This may be because being a

doctor is a practical job. This situated

approach then led to further theory

development, particularly of commu-

nities of practice, which extended our

preliminary work.7

There is undoubtedly more ‘cultural

complexity’ to be unraveled, but the

actual educational value of many junior

doctors’ work experience and their

knowledge of optimal learning strategies

is still often only moderate, and so more

sophisticated approaches may well be

helpful in the future. For example, a

critical theory approach may have been

useful, but access to some institutions

was sometimes vulnerable where sensi-

tivities and confidentiality could easily

be disturbed, so some theoretical com-

promise was necessary in order to

maintain access. Similarly, although the

statement that ‘there is no generic

pedagogic formula’ is probably true,

particularly in the present unbounded,

contested and hybrid postmodern

world, some straightforward starting

point was pragmatically necessary for

the project. If, as with Bleakley’s paper,

this work also stimulates a discourse, it

can only help junior doctors and their

training in the future.

S J Ward

London, UK

Correspondence: Dr S J Ward, 32 Dovercourt

Road, London SE22 8ST, UK. E-mail:

[email protected]


References

1 Bleakley A. Pre-registration house

officers and ward-based learning: a ‘new

apprenticeship’ model. Med Educ

2002;36:9–15.

2 Hargreaves DH, Bowditch MG, Griffin

DR. On-the-Job Training for Surgeons: a

Practical Guide. Edinburgh: The Royal

Society of Medicine Press 1997.

3 Hargreaves DH, Southworth GW,

Stanley P, Ward SJ. On-the-Job Training

for Physicians: a Practical Guide. Edin-

burgh: The Royal Society of Medicine

Press 1997.

4 Stanley P. Structuring ward rounds for

learning: can opportunities be created?

Med Educ 1998;32:239–43.

5 Lave J, Wenger E. Situated Learning:

Legitimate Peripheral Participation. Cam-

bridge: Cambridge University Press 1991.

6 Wenger E. Communities of Practice:

Learning, Meaning and Identity. Cam-

bridge: Cambridge University Press 1998.

7 Ward SJ. Enhancing the capacity of

junior doctors’ training. Unpublished

Thesis. Cambridge School of Education

1997.

Sheffield Assessment Instrument for Letters (SAIL)

Editor – We were interested and

impressed by the simplicity of the

Sheffield Assessment Instrument for

Letters (SAIL).1 Specialist Registrars

(SpRs) in Paediatrics in the South-west

Region are recommended to include

example anonymous clinic letters and

discharge summaries within their port-

folios. These are then used to inform the

Record of In Training Assessment

(RITA). Our experience is that due to

time pressures within the allocated 1

hour for the RITA interview, the

opportunity for more than a superficial

review is severely limited. Thus, a

validated objective scoring system

would seem to be the ideal way forward.

However we were surprised that the

authors felt that SAIL was ‘highly feasible

to carry out’ as the time incurred by

the ‘judges’ would be enormous if this

scoring system was to be conducted for

each and every SpR. As an example,

using the mathematics from within the

paper, for a reliability coefficient of 0.80,

6 judges would need to score 10 letters

from each SpR. In the South-west

region there are more than 70 SpRs,

resulting in 700 clinic letters to be ana-

lysed. If each letter took the 6 minutes,

as suggested by the authors, then this

would amount to 70 hours for each of

the 6 judges. We are not sure that this is

a productive use of time for Regional

Advisors or RITA assessors.
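The workload arithmetic in this paragraph can be checked with a short sketch. The function name and its parameters are illustrative assumptions and are not part of SAIL; the figures (70 SpRs, 10 letters each, 6 minutes per letter) are the ones quoted in the letter.

```python
def marking_hours_per_judge(trainees: int, letters_per_trainee: int,
                            minutes_per_letter: float) -> float:
    """Hours one judge needs to mark every submitted letter once."""
    total_letters = trainees * letters_per_trainee
    return total_letters * minutes_per_letter / 60


# Figures quoted in the letter: 70 SpRs, 10 letters each, 6 minutes per letter.
hours = marking_hours_per_judge(trainees=70, letters_per_trainee=10,
                                minutes_per_letter=6)
print(hours)  # 70.0 hours for each judge
```

Because every judge marks every letter, the total time across 6 judges is six times this figure.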

Perhaps the SAIL system would be

more practical if used as a formal com-

ponent of the assessment undertaken at

the end of the first 2 core years of

training or when assessing poor per-

formance of trainees in difficulty.

Sarah J Bridges

Huw Thomas

Bristol, UK

Reference

1 Crossley JGM, Howe A, Newble D,

Jolly B, Davies HA. Sheffield Assess-

ment Instrument for Letters (SAIL):

performance assessment using outpa-

tient letters. Med Educ 2001;35:1115–

24.

Authors’ reply

Editor – We are grateful to Dr Bridges

and Dr Thomas for their thoughtful

response to our paper. In particular it is

reassuring that others have recognized

the face validity, feasibility and at-

tractiveness of using clinic letters to

inform RITA and other assessment

processes.

They raise a very important point in

relation to the balance between reliability

and feasibility in assessing letters. Based

on our data they have calculated (cor-

rectly) that it would take the slowest

judge 70 hours to achieve a set of results

with a reliability coefficient of 0.8 on all

the SpRs in a large Higher Specialist

Training Programme at one point in

time. Even the quicker judges would still

take 35 hours. There are 3 important

points to make about this conclusion that

will illustrate the richness of generaliz-

ability data and some important princi-

ples of performance assessment. Unlike

other reliability techniques generaliz-

ability allows modelling of reliability for a

range of assessment strategies from

which the one best suited to a given pur-

pose and circumstances can be chosen.

A reliability coefficient of 0.8 is quoted

because it is the accepted threshold

for high-stakes assessment such as

revalidation. There is no commonly

held threshold for in-training assess-

ment processes, but most authors agree

that validity and feedback potential are

more important in this setting and that

the threshold for reliability is much

lower.1 The data show that a threshold

of 0.7 (still better than an hour-long

MCQ2 or a 3-hour OSCE)3 would be

reached if 6 judges each marked 5 letters

or 3 judges each marked 8 letters. For a

training programme of 30 SpRs this

would take each judge 7.5–24 hours

depending upon marking speed and

how many judges took part.
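A minimal sketch of how the 7.5 to 24 hour range can be reproduced. It assumes marking speeds of 3 and 6 minutes per letter, which are inferred from the 35 and 70 hour figures quoted earlier for 700 letters; the function and variable names are hypothetical.

```python
def hours_per_judge(trainees: int, letters_each: int,
                    minutes_per_letter: float) -> float:
    """Hours one judge spends marking all letters in the programme."""
    return trainees * letters_each * minutes_per_letter / 60


TRAINEES = 30
# Strategies said to reach a reliability threshold of 0.7.
letters_by_strategy = {"6 judges x 5 letters": 5, "3 judges x 8 letters": 8}
# Assumed speeds (minutes per letter), inferred from the 35 and 70 hour figures.
minutes_by_speed = {"quick": 3, "slow": 6}

results = {
    (strategy, speed): hours_per_judge(TRAINEES, letters, minutes)
    for strategy, letters in letters_by_strategy.items()
    for speed, minutes in minutes_by_speed.items()
}
for (strategy, speed), hours in results.items():
    print(f"{strategy}, {speed} judge: {hours} h")
```

The shortest case (a quick judge marking 5 letters per trainee) and the longest (a slow judge marking 8) bound the quoted range.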

It is rarely necessary to produce a

high-reliability result on every doctor at

regular fixed time points. The marking

that contributes to a regular assessment

process could be distributed throughout

the year. Bridges and Thomas them-

selves suggest that high-reliability, high-

investment assessment could be

reserved for specific points in training

but will have been preceded by lower

reliability, high feasibility formative

assessment to inform the development

of letter writing skills. Similarly a less

discriminating assessment strategy

could be used to ‘screen’ for borderline

trainees who would then be subjected to

a more rigorous and resource expensive

strategy before any definitive decision

about their subsequent progress was

made. We have developed a simple

global rating scale for this purpose that

correlates well with SAIL but has a

slightly lower reliability. It takes only

1–2 minutes per letter but cannot produce

such rich formative feedback.

Correspondence: Sarah Bridges, Paediatric

Unit, Southmead Hospital, Westbury-on-

Trym, Bristol, UK. E-mail: sarahbridges

[email protected]

Correspondence: Helena Davies, Consultant in

Medical Education, Sheffield Children’s

Hospital NHS Trust, Western Bank,

Sheffield S10 2TH, UK. Tel.: (44) 114 271

7108; Fax: (44) 114 271 7185; E-mail:

[email protected]

Bridges and Thomas are rightly

concerned that the busiest and most

expensive clinicians should not be

spending their time in lengthy assessment

procedures. We included a consultant, a

GP and a trainee in the original study

since they are the main stakeholders in

good letter writing. This has enabled us to

show that the differences between them

are not significantly related to their des-

ignation. It follows that 3 trainees acting

as judges will produce a similar result that

is equally reliable when their marking is

guided by SAIL. Whilst most programme

directors would probably feel more

comfortable with a mix of judges for

higher-stakes assessment there is no rea-

son why trainees couldn’t mark most of

the letters most of the time. This in itself

has provided valuable instruction in letter

writing for markers of every grade in our

experience.

Using the results appropriately it is

easy to re-evaluate the reliability in any

of these circumstances to check that the

assessment tool had performed as pre-

dicted.

Helena Davies

Jim Crossley

Amanda Howe

Brian Jolly

David Newble

Sheffield, UK

References

1 van der Vleuten C. The assessment of

professional competence: developments,

research and practical implications. Adv

Health Sci Education 1996;1:41–67.

2 Norcini JJ, Swanson DB, Grosso LJ,

Webster GD. Reliability, validity and

efficiency of multiple choice question

and patient management problem item

formats in assessment of clinical com-

petence. Med Educ 1985;19:238–47.

3 Newble DI, Swanson DB. Psychometric

characteristics of the objective struc-

tured clinical examination. Med Educ

1988;22:325–34.

Holding on to the philosophy and keeping the faith

Editor – Norman1 corrects Norman’s2

attribution of an argument to do with

the existence of God, and refers this to

‘Norman’s inadequate educational preparation

in the liberal arts.’ However

this correction itself needs correcting.

Moreover this reveals a further deficit in

Norman’s educational preparation – an

understanding of philosophy that is

‘inadequate’ to PBL and the article he is

writing about.3

Norman’s correction is misleading on

two counts: first he is wrong about the

argument itself, and secondly he is

mistaken about the very notion of the

attribution of arguments. It is the latter

that is revealing.

This is the argument as given: ‘if you

believe in God and there is none, you

have lost nothing, if you don’t believe

and there is one, you have lost every-

thing’. Norman attributes this first to

Galton (1822–1911) and then to Spi-

noza (1632–1677). However the argu-

ment is better known as the Wager of

Pascal (1623–1662).4

It is not important that Norman’s

correction calls this a ‘logical proof of

the existence of God’ (although it isn’t –

it’s an argument for believing in God’s

existence). Nor is it important that

Norman’s reference to an online

encyclopedia provides no evidence that

Spinoza ever used this argument. Nor is

it really so important that versions of it

can be found, no doubt, that predate

Pascal. All these are relatively trivial

points.

Norman’s real mistake, I would suggest,

is to look for someone to whom to

‘credit’ the argument at all. Historians of

philosophy can, and do, argue over

issues of priority and attribution – who

said what and when – just as historians

of the discovery of, say, oxygen, or

America, do. But doing the history of

philosophy is not the same thing as

doing philosophy. Philosophers, by

contrast, are interested in the arguments

themselves, for these are all any philo-

sopher has. An argument in philosophy,

in short, ‘belongs’ to whoever asserts it.

Or, to put it another way, we are all

responsible for what we think. Try

running Pascal’s Wager yourself, as

paraphrased above: why are you not

persuaded?

The reasons for this responsibility

are the same as those which motivate

PBL: Galton can’t do your thinking,

nor your teacher your learning, for

you. This, in turn, suggests that putting

‘the responsibility for learning in

the hands of the learner, not the teacher’

is more than simply an ‘assumption’,

more than an ‘unfounded belief’,

but an idea that has substantial philosophical

warrant.

Norman’s original commentary was

sceptical of – or at least ‘agnostic’

about – Dolmans’ paper, portraying it

as an attempt at ‘keeping the faith’ in

PBL. Norman’s correction of his

commentary suggests that there may

be good philosophical reasons for us to

be sceptical of his agnosticism, and

indeed to continue ‘holding on to the

philosophy.’

Simon Harrison

University of Bristol

References

1 Norman GR. Erratum. Med Educ

2002;36:102.

2 Norman GR. Holding on to the philos-

ophy and keeping the faith. Med Educ

2001;35:820–1.

3 Dolmans D, Wolfhagen I, van der Vle-

uten C, Wijnen W. Solving problems

with group work in problem based

learning: hold on to the philosophy. Med

Educ 2001;35:884–9.

4 Pascal B. Pensées. Translated by AJ

Krailsheimer. London: Penguin; 1963: pp.

149–52.

The assessment tool is only as good as the assessors

Editor – We read with interest the article

on videoconferencing to assess

neonatal resuscitation skills1 and were

impressed by the low levels of interob-

server variability found between the

two instructors in the 18 megacodes

evaluated.

Correspondence: Simon Harrison, 2 Rodney

Place, Clifton, Bristol BS8 4HY, UK. E-mail:

[email protected]

Correspondence: Gavin D Perkins, Research

Fellow Intensive Care Medicine,

Birmingham Heartlands Hospital, Bordesley

Green East, Birmingham B9 5SS, UK. Tel.:

(44) 121 424 3562; Fax: (44) 121 424 1108;

E-mail: [email protected]



We have recently assessed inter-

observer variability in an adult resusci-

tation skills course (The Resuscitation

Council UK Advanced Life Support

Provider Program).2 We used video-

recorded scenarios to test a group of 25

examiners from 15 different assessment

centres in order to assess the levels of

interobserver variability. Our study

differed from the present study by

using a diverse group of examiners and

by using scenarios that were staged to

include a number of commonly ob-

served errors. Unlike Cronin’s study we

observed significant interobserver vari-

ability ranging from agreement of only

50% to a maximum of 100% for a

single scenario where the candidate

made multiple mistakes. Intraobserver

variability was also tested by showing

the instructors one of the videos twice

and found to be similarly poor (kappa

0.43). The marked difference in interobserver

consistency between our

findings and those of Cronin et al.

suggests that extrapolation of their

findings per se to a larger group of in-

structors would not guarantee a similar

level of consistency as that demonstra-

ted in their study.

However, as Cronin identifies, the

development of this model for training

new and re-certifying instructors may

be a valuable tool to improve consis-

tency in marking and continuing edu-

cation.

Gavin D Perkins

Birmingham, UK

Michael J Tweed

Leicester, UK

References

1 Cronin C, Cheang S, Hlynka D, Adair

E, Roberts S. Videoconferencing can be

used to assess neonatal resuscitation

skills. Med Educ 2001;35:1013–23.

2 Perkins GD, Hulme J, Tweed MJ.

Variability in the assessment of

advanced life support skills. Resuscitation

2001;50:281–6.
