BY F. H. Adler, M.D.




MULTIPLE CHOICE QUESTIONS IN THE WRITTEN EXAMINATION OF THE AMERICAN BOARD OF OPHTHALMOLOGY


IN 1952 Dr. Walter Atkinson suggested that the American Board investigate giving a multiple choice type of question (MCQ) in the written examination in place of the essay type. He called attention to the increasing use of this new type of examination by numerous examining boards throughout the country, including the College Entrance Board. As a result, the American Board of Ophthalmology has now had six years' experience with this type of examination. The objectives of this examination, how it makes possible a better appraisal of the candidate, and the ways in which it may still be improved are the subject of this report.

Educators admit that no one type of examination is perfect. Ideally, an examination should be made up of three parts: essay type, MCQ, and oral. The objections to the essay type are that it takes too much time to cover the extensive material; the candidate can often conceal lack of knowledge by padding the answers or by skirting around the edge of the subject; grading is very difficult, and grading the same paper by different examiners often yields widely different results; finally, the tedious job of reading a large number of handwritten papers brings in the factor of fatigue, and bias because of poor handwriting on the part of the examinee. The chief objection to the MCQ type of question is that it does not test the candidate's ability to organize his material; in addition, it is very difficult to make up good questions. This subject will be dealt with in more detail later. The objection that candidates are not familiar with this type of examination is no longer valid. Almost all candidates have been examined in this way before they come up for the Board examination.

The objections to the oral type of examination are the nervous tension under which it puts some candidates; the great variability between different examiners in posing questions to bring out weaknesses; and the large part which personality plays, that is, whether there is rapport between the candidate and the examiner right at the start. The great advantage of the oral examination for the American Board of Ophthalmology is that it allows the candidate to be tested in the way he does things as well as in the way he thinks. It would seem obvious that the real examination of the American Board of Ophthalmology should be an oral examination in which the candidate examines patients in front of the examiner. Prior to this, a written examination should be used to screen out those candidates who are not ready to take the oral.

Before 1952 the essay type was given in ten different disciplines, such as anatomy, physiology, optics, medical ophthalmology, etc. In 1952 the Committee on Examinations substituted MCQ-type questions in three of these subjects, retaining the essay type for the others. The following year MCQ was used in seven subjects, and by 1954 the whole examination was MCQ in type. In order to acquaint itself with the technics of MCQ examinations, the Committee was in close consultation with the Educational Testing Service of Princeton, New Jersey, during the years 1955 and 1956. On the advice of the Service, in 1956 the examination by separate disciplines was dropped, and a comprehensive examination was substituted. The questions were made up from the various disciplines covered by ophthalmology, but were so chosen that they also tested for certain abilities in the candidate, and the whole examination was designed to serve a screening purpose, which was to eliminate those candidates who were not prepared to present themselves for the oral examination. This was a considerable change in policy, and it still remains to be seen whether it will be permanently adopted or whether examinations in the different disciplines will be reinstituted.

The arguments of the professional educators for a comprehensive examination are that the purpose of the written examination is merely to screen out candidates not adequately prepared for the oral, and not to test their knowledge in particular subjects. This is the function of the oral. In order to determine fitness for the oral, the candidate should be examined in all subjects which pertain to ophthalmology as a whole. In the early days of the written the candidate was examined only in the basic sciences, while the oral was reserved for the clinical subjects and was referred to as the "practical." This differentiation was soon lost, however, as subjects such as medical and neurologic ophthalmology and therapeutics were included in the written.

Another argument in favor of the comprehensive written is that under the method of examining in separate subjects the poorer candidates often managed to qualify after several attempts by passing a few subjects each year. These candidates achieved their goal by nibbling. Statistics showed that of the men who failed in the oral, the majority came from the group who had reached the oral by nibbling. Under the comprehensive plan a candidate either passes the whole examination or has to take the whole examination over. He cannot nibble.

For the past three years the written examination has been a comprehensive MCQ type composed of 200 questions covering all fields of ophthalmology. It is the purpose of this examination to screen out candidates who have not had adequate training to take the oral. This is made doubly important by the fact that the American Board of Ophthalmology has consistently taken the stand that it will not of itself maintain any jurisdiction over the residency training programs offered throughout the country. The written examination is given in January of each year, simultaneously in approximately twenty cities, to an average of 225 candidates.

MCQ questions. The Committee on Written Examinations meets twice a year and reviews all questions selected for the written of the following January. Each member brings fifty new questions, which are submitted to the Committee as a whole for criticism. Each is challenged with the idea of finding some flaw in it. It is surprising how often a question which looked straightforward to the member who made it up is found to have flaws which eliminate it. Frequently the members do not all agree on the same answer, for, as will be explained, these questions are not all factual. About 60 percent of the questions submitted are accepted after having been revised. These are then coded and placed in the question pool. The coding is done in two categories: (i) the discipline the question best represents, and (ii) the ability it tests in the candidate. The various abilities the Committee has chosen to test are as follows:

1. Ability to demonstrate an understanding of basic knowledge.
2. Ability to apply basic knowledge.
3. Ability to recognize and differentiate clinical findings.
4. Ability to reason with regard to cause and effect.
5. Ability to evaluate diagnostic and therapeutic procedures.
6. Ability to reason with regard to the need for surgery or other treatment.
7. Ability to anticipate complications and sequels.
8. Factual recall.

A grid is made for each examination, and the 200 questions selected are chosen to ensure a good spread of both the disciplines and the abilities to be tested. The grid for the 1958 examination is shown in Figure 1. How many questions should be chosen from each discipline is a question the Committee has to decide each year. Obviously some subjects should be given greater importance than others, and the Committee has to weigh the number chosen from each. This depends partly upon the makeup of the Committee. Every pathologist on the Committee will fight to the death to include a maximum number of questions in pathology, and each member will do likewise in his own field. Compromises have to be made, and common sense finally prevails.
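The grid described above amounts to a two-way tally of the codes given to each selected question. A minimal Python sketch, assuming each question carries a (discipline, ability) code as in the coding scheme above; the function names and the sample codes are hypothetical, for illustration only:

```python
from collections import Counter

# The eight abilities listed above, numbered 1 to 8.
ABILITIES = range(1, 9)

def item_grid(codes):
    """Tally (discipline, ability) codes into cell, row and column counts."""
    cells = Counter(codes)                  # questions in each grid cell
    rows = Counter(d for d, _ in codes)     # questions per discipline
    cols = Counter(a for _, a in codes)     # questions per ability
    return cells, rows, cols

def missing_abilities(cols):
    """Abilities that no selected question tests."""
    return [a for a in ABILITIES if cols[a] == 0]

# Hypothetical codes for a handful of selected questions.
codes = [("S", 3), ("S", 6), ("PA", 3), ("PH", 1), ("N", 4)]
cells, rows, cols = item_grid(codes)
print(rows["S"], cols[3], missing_abilities(cols))  # prints: 2 2 [2, 5, 7, 8]
```

Comparing the row totals with the number of questions agreed for each discipline shows at once where a draft selection is thin.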

AMERICAN BOARD OF OPHTHALMOLOGY: Item Distribution, Year 1958

[Grid of the 200 questions by discipline (rows) and by ability tested (columns). Disciplines: anatomy and histology; biochemistry, nutrition and metabolism; embryology and congenital anomalies; microbiology and external diseases; optics and refraction; medical ophthalmology (systemic diseases); motility; neuro-ophthalmology; pharmacology and therapeutics; pathology and ocular diseases; physiology; surgery. Column totals: 22, 14, 51, 49, 30, 12, 18, 4; total 200.]

FIGURE 1

Several different types of MCQ have been tried during the last six years, and different boards have employed many different types. The simplest and most widely used is the five-choice item, only one choice of which is considered correct. The statement or question is called the stem, and following it are five choices from which the candidate must select the one that is correct. The incorrect choices are called distractors. Figure 2 gives an example of this item type.

One would suspect multiple sclerosis in a patient rather than an intracranial neoplasm if one found

* (1) ataxic (dissociated) nystagmus

(2) Ophthalmoplegia externa

(3) 6th nerve paralysis

(4) divergence paralysis

(5) isolated ptosis.

FIGURE 2

A second type (Figure 3) is similar to the first except that there may be more than one correct answer. The number of choices offered the candidate depends upon how many plausible ones can be invented. Generally four or five choices are offered; at least one is correct, but all four or five may be correct. Another item type is the so-called "true-false" item.

Remnants of the congenital pupillary membrane are frequently attached to

* 1. the collarette

2. the pupillary edge

* 3. the lens capsule

4. iris crypts

FIGURE 3

(The asterisks mark the correct answers.)

More complicated item types have been devised, such as the matching type. A statement or question forms the stem, and the candidate is asked to pick from a given list the one item which is correct or best fits the stem. Item types have also been devised which require decisions from the candidate on matters other than the subject in which he is being examined. The Committee has consistently avoided the use of such items.

In all MCQ examinations it is possible to inject into the test the factor of speed by giving so many questions that only the candidates who can think fast are able to complete the test. Some educators feel this is valid on the ground that speed correlates well with ability, and that the good candidates are always the fast ones. They purposely make the examination so long that only the best men can complete it, and use this, plus the grade on the questions completed, to judge the candidate. We have rejected this principle in the belief that there are some good candidates who are relatively slow, and these should not be eliminated. Accordingly, the examination is made of such a length that the fast workers will finish well ahead of time, while the slow workers will not be hard pressed to finish the complete test. A hundred questions are given in the morning session and a hundred in the afternoon session, each session lasting three and one-half hours. Very few candidates have as yet failed to finish the examination because of lack of time, and the majority finish in less than three hours.

At each session, on entering the examining room after identification, the candidate is given a sealed envelope bearing his name and his number. At the command of the proctor each candidate opens his envelope, which contains his own set of questions and his own answer sheets, marked with his number only. The answer sheet is a standard form developed by IBM for its machines. The candidate marks his answers on the answer sheet with a special pencil which is provided, and at the end of each session puts his set of questions and answer sheet in an envelope, seals it, and hands it to the proctor, who mails the envelopes immediately at the end of the session. The individual answer sheets are graded by machine, and the candidates' raw scores on the morning and afternoon sessions are added together.

For the past few years the type used has been a stem with four or five choices, one or more of which were correct answers. The Committee must agree unanimously on the answers. This is necessary, as many of the questions are not factual ones whose answers can be found directly in books, but involve the choice of the "best" answers to a situation. They are matters of opinion, but only those questions are used on which the Committee is unanimous. Where possible five choices are given, but the number depends on whether or not good distractors can be invented. Occasionally it is difficult to think up good distractors, and a distractor which fails to separate the good from the poor candidates is a waste of time.

What constitutes a good question? A good question is one which tests the desired abilities in candidates and spreads the candidates out according to how they rate in these abilities. A cut-off point can then be determined below which one can say the candidates are not yet qualified. The questions must be neither too hard nor too easy, for if either all candidates know the answer or no candidate knows the answer to a question, that question does not separate the sheep from the goats.

Interpretation of the examination results. The scores obtained on an MCQ examination differ from those on the essay type, in which a perfect paper would be marked 100 and each question would have the same value. In MCQ examinations a raw score is obtained by each candidate. If each question has only one correct answer, the value of that question may be taken as 1, and a perfect score on an examination of 200 questions would be 200. If each question has more than one correct answer, two possible courses are open to the examiner. Either he may mark the question wrong if the candidate makes any wrong choice, in which case each question still has a value of 1 and a perfect score would be 200, or each question may be valued by the number of possible choices. A question of five choices would have a value of five; one with four would be valued at four. This seems fairer to the candidate, for it gives him credit for the correct answers he does know in each question. He is given credit for those he chooses correctly and penalized for those he chooses wrongly. If there is a total of 200 questions, each one of which has five choices, the perfect score is 1,000. Since the examinations given in the past few years have been composed of questions having five or four choices, and since we have elected to count each choice, the perfect score for the examinations has run somewhere between 800 and 1,000. It is more difficult to mark the answer sheets using this method, for it means that they must be run through the IBM machines twice, once to pick up the errors of commission and once for the errors of omission. For example, in the following question (Figure 4), there are four choices.

A retrobulbar injection of 4% procaine chloride has been made 20 minutes previously. The surgeon wishes to constrict the pupil immediately. Which of the following drugs will be effective when applied topically?

(1) eserine salicylate 0.25%

* (2) pilocarpine nitrate 1%

(3) prostigmine 4%

* (4) carcholin 1.5%

FIGURE 4

Numbers 2 and 4 are correct, and 1 and 3 are incorrect. If a candidate marks 2 only on his answer sheet, he has made one error (by omitting to mark 4 also). Since the question is valued at four points, he would receive a grade of three. If he marks 2, 3 and 4, he has made one mistake (3 is an incorrect answer), and his grade would also be three. If he should mark 1 and 3 only, he has made four mistakes (omitting 2 and 4, and errors of commission by choosing 1 and 3), and he would receive a zero for his grade.
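The two ways of valuing a question with more than one correct answer can be written out directly. A minimal Python sketch (the function names are illustrative, not from the text), checked against the markings of the Figure 4 question discussed above:

```python
def per_choice_score(marked, correct, n_choices):
    """Value a question at one point per choice, as the Board elected to do.

    Every correct choice left unmarked (omission) and every incorrect
    choice marked (commission) costs one point.
    """
    marked, correct = set(marked), set(correct)
    commission = len(marked - correct)
    omission = len(correct - marked)
    return n_choices - commission - omission

def all_or_nothing_score(marked, correct):
    """The alternative: the question is worth 1 and any wrong choice loses it."""
    return 1 if set(marked) == set(correct) else 0

# Figure 4: four choices, of which 2 and 4 are correct.
key = {2, 4}
for marks in ([2], [2, 3, 4], [1, 3], [2, 4]):
    print(marks, per_choice_score(marks, key, 4), all_or_nothing_score(marks, key))
```

Under the per-choice rule the perfect score for a whole examination is simply the total number of choices on the paper, which is why it ran between 800 and 1,000 for papers of 200 four- and five-choice questions.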

For the 1957 examination the perfect score was 896.

Even if a candidate is entirely ignorant of ophthalmology he can make some score by merely marking the answer sheet at random. By pure chance some of his answers would be correct. The first thing to be done, therefore, is to give the answer sheets to several people who are totally ignorant of ophthalmology, or even to people who are not shown the questions at all, and ask them to mark the answer sheet by mere guesswork. The only fact they know is that at least one of the choices in each question is correct. For the 1957 examination this so-called chance score was around 400. This means that every candidate who took that examination went in with 400 points to his credit without any knowledge at all. Therefore, 400 equals 0 percent and 896 equals 100 percent.
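Treating the chance score as 0 percent and the perfect score as 100 percent is a linear rescaling of the raw score. A minimal Python sketch; the defaults are the 1957 figures quoted above (the chance score was only approximately 400), and the function names are hypothetical:

```python
def to_percent(raw, chance=400, perfect=896):
    """Percentage of the score that can be earned above chance."""
    return 100 * (raw - chance) / (perfect - chance)

def raw_for_percent(percent, chance=400, perfect=896):
    """Raw score corresponding to a given percentage above chance."""
    return chance + percent / 100 * (perfect - chance)

print(to_percent(400), to_percent(896))   # 0.0 100.0
print(round(raw_for_percent(70)))         # 747
```

Applied to a 70 percent grade, this reproduces the raw score of 747 worked out in the next paragraph.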

In the essay type of examination it is customary to regard a grade of 70 percent as passing. In this examination the score that can be earned above chance is 496 (the perfect score of 896 less the chance score of 400), and 70 percent of 496 equals 347. Therefore, a passing grade would be 347 plus 400, which equals 747 (Figure 5). Experience shows that for MCQ examinations 70 percent of the possible earned score generally is too high; too large a proportion of candidates would fail. Some other method must be found to secure a proper cut-off point. Several methods for obtaining this may be adopted. The simplest is to fail the same percentage of candidates as failed the essay-type written examination over past years. There are obvious objections to this system. It assumes that the candidates are always of the same calibre year after year. This is hardly fair. Another method is to make a population curve of the candidates' grades, as shown in Figure 6, and to establish a cut-off point by finding where the nearly horizontal portion meets the ascending or vertical portion.

Total no. of choices          896
Chance score                  400
Perfect score above chance    496

Points      0      347     496
Percent     0%     70%     100%

347 + 400 = 747
Passing score     52%     652

FIGURE 5

[Population curve of the candidates' raw scores, 1957; horizontal axis: raw score, 520 to 800.]

FIGURE 6

A better way is to compute what is known as the minus one standard deviation. The method is as follows: The mean grade is determined by adding all the scores together and dividing by the total number of candidates. This mean score is subtracted from each of the candidates' grades, and each result is squared. The squares are added together and divided by the total number of candidates. The square root of this figure is the standard deviation, and the score lying one standard deviation below the mean is the minus one standard deviation. The formula for the standard deviation, where n is the number of candidates, is as follows:

$$\text{deviation} = \sqrt{\frac{\sum (\text{score} - \text{mean})^2}{n}}$$
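The computation just described can be sketched in a few lines of Python. The sketch follows the formula above in dividing by n (the population standard deviation) and sets the cut-off one standard deviation below the mean; the candidate scores are hypothetical:

```python
import math

def minus_one_sd_cutoff(scores):
    """Return (mean, standard deviation, mean minus one standard deviation)."""
    n = len(scores)
    mean = sum(scores) / n
    # Square each score's departure from the mean, average, take the root.
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return mean, sd, mean - sd

scores = [600, 650, 700, 750, 800]   # hypothetical raw scores
mean, sd, cutoff = minus_one_sd_cutoff(scores)
passed = [s for s in scores if s >= cutoff]
print(mean, round(sd, 1), round(cutoff, 1), passed)
```

Modern statistical software often divides by n - 1 instead of n; for a group of some 225 candidates the difference is slight.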


The minus one standard deviation may be taken as the appropriate cut-off point for passing. In our examinations the cut-off point determined by the minus one standard deviation has coincided remarkably well with the point given by the percentage of candidates formerly failed on the essay type, and with that given by the graph method.

The great advantage of the MCQ type of examination over the essay type is that it lends itself to statistical analysis. By this means the examination can be improved from year to year as experience is gained with it.

After each examination is given, the questions are submitted to what is called an item analysis. This is done by IBM machines; otherwise, the mere arithmetic involved would take an enormous amount of time. The following factors are determined:

(i) The difficulty of each question. This is determined by the percentage of candidates who made a perfect score on it. If only 10 percent of the candidates made a perfect score on a question, it is probably too difficult. On the other hand, if 90 percent answered it correctly, the question is too easy. Questions which on analysis fall below 20 percent or above 90 percent are discarded. Once a question has been subjected to item analysis, a record of its difficulty is kept, so that each year, when a new examination is made up, about the same number of difficult and easy questions can be picked. This ensures keeping the examination at the same level of difficulty from year to year.
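The difficulty rule in (i) can be sketched as follows. A minimal Python version, assuming each candidate's result on a question is recorded simply as perfect or not perfect; the 20 and 90 percent limits are the ones given above, and the data are hypothetical:

```python
def difficulty(perfect_flags):
    """Fraction of candidates who made a perfect score on the question."""
    return sum(perfect_flags) / len(perfect_flags)

def keep_question(p, low=0.20, high=0.90):
    """Discard questions that fall below 20 percent or above 90 percent."""
    return low <= p <= high

# Hypothetical: 71 of 100 candidates answered the question perfectly.
p = difficulty([True] * 71 + [False] * 29)
print(p, keep_question(p), keep_question(0.10), keep_question(0.95))
```

Storing p with each question in the pool is what allows roughly the same mix of hard and easy items to be drawn in later years.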

(ii) Analysis of each possible answer. For this, the item analysis shows how many candidates in the whole group chose each answer, together with how many in the top 25 percent and in the bottom 25 percent of the group did so. Where there is only one correct answer, the results are as shown by the item analysis of question MO 100.

Question MO 100 Grid MO 3

Neovascularization on the disc is frequently found in which one of the following conditions?

(1) a brain tumor with beginning papilledema

(2) malignant hypertension, Grade IV

(3) tuberculous meningitis with beginning papilledema

* (4) obstructive vascular disease with thrombosis of the central retinal vein

(5) obstructive vascular disease with closure of the central retinal artery


Item analysis

Choice    Total no. answering    Top 25%    Bottom 25%
1                  6                 0            3
2                 22                 2           17
3                  2                 0            1
*4               166                54           26
5                 36                 1           13
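The comparison made in (a) and (b) below, between the top and bottom groups for each choice, can be sketched in Python using the counts in this table. The rule for calling a distractor "poor bait" (fewer than two candidates from the two groups combined) is a hypothetical threshold, not one stated in the text:

```python
from dataclasses import dataclass

@dataclass
class ChoiceCounts:
    choice: int
    total: int      # all candidates choosing it
    top: int        # candidates in the top 25 percent choosing it
    bottom: int     # candidates in the bottom 25 percent choosing it
    correct: bool = False

# Item analysis of question MO 100, from the table above.
MO_100 = [
    ChoiceCounts(1, 6, 0, 3),
    ChoiceCounts(2, 22, 2, 17),
    ChoiceCounts(3, 2, 0, 1),
    ChoiceCounts(4, 166, 54, 26, correct=True),
    ChoiceCounts(5, 36, 1, 13),
]

def review(counts, min_takers=2):
    """Return (choice, top minus bottom, verdict) for each choice."""
    report = []
    for c in counts:
        diff = c.top - c.bottom
        if c.correct:
            # The correct answer should attract more good than poor candidates.
            verdict = "discriminates" if diff > 0 else "check"
        elif c.top + c.bottom < min_takers:
            verdict = "replace"   # poor bait: almost nobody took it
        else:
            # A distractor should attract more poor than good candidates.
            verdict = "works" if diff < 0 else "check"
        report.append((c.choice, diff, verdict))
    return report

for row in review(MO_100):
    print(row)
```

On these counts only choice 3 is flagged for replacement, in line with the judgment in (b).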

(a) The factor of difficulty is 0.71. That is, it was answered correctly by 71 percent of the total group of candidates. From this aspect it is a good question, neither too easy nor too hard.

(b) None of the top 25 percent (56 candidates) chose distractor 1, while three of the bottom 25 percent (56 candidates) did. Similarly, only two of the top group compared with seventeen of the bottom group chose distractor 2. None of the top and only one of the bottom group selected distractor 3. Evidently distractor 3 is of little help in separating the good from the poor men. It is poor bait, and caught only one fish. It should be changed to some other distractor before this question is given in another examination. Of the top group, 54 chose choice 4, the correct answer, while only 26 of the bottom group did. Finally, distractor 5 was chosen by only one of the top group and 13 of the bottom. This question, therefore, apparently fulfills the requirements of a good screen, separating good from poor candidates. Strictly, proving this point would require a much more rigorous analysis. In order to state with certainty that there is a real and significant difference, it would be necessary to determine what is called the biserial r, which requires a long, tedious computation. The cost of further analyzing these questions has to be considered, and from a practical point of view the Committee feels that there is not enough to be gained by this expenditure.

As a result of these analyses the questions can be improved after they have once been given. The ultimate objective is to acquire a pool of some two to three thousand questions which have been item analyzed. It would be perfectly feasible to publish these questions for the candidates to see, since such a small percentage would be used in any one year. Further, the majority of the questions are of such a character that the answers could not be found without considerable study. Until this pool of questions is large enough, however, they must be kept confidential.
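The biserial r mentioned in (b) relates success on a single question to the candidates' total scores. A minimal Python sketch using the standard textbook formula r = (M_p - M_q) / σ × pq / y, where M_p and M_q are the mean total scores of those who did and did not answer the question correctly, σ is the standard deviation of all total scores, p and q are the two proportions, and y is the height of the normal curve at the point dividing them. This is an illustration, not the Board's procedure, and the data are hypothetical:

```python
from statistics import NormalDist, fmean, pstdev

def biserial_r(answered_right, totals):
    """Biserial correlation between one question and total score."""
    right = [t for ok, t in zip(answered_right, totals) if ok]
    wrong = [t for ok, t in zip(answered_right, totals) if not ok]
    if not right or not wrong:
        raise ValueError("needs candidates who passed and who failed the item")
    p = len(right) / len(totals)
    q = 1 - p
    unit = NormalDist()
    y = unit.pdf(unit.inv_cdf(p))   # normal ordinate at the p/q division
    return (fmean(right) - fmean(wrong)) / pstdev(totals) * p * q / y

# Hypothetical totals for eight candidates and their result on one question.
totals = [820, 790, 760, 740, 700, 680, 640, 600]
right = [True, True, True, False, True, False, False, False]
print(round(biserial_r(right, totals), 2))
```

Every candidate's total score enters the calculation, which is the source of the labor the text mentions when the work is done by hand.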


The final test of this examination is to correlate it with the results of the oral examination. Failure of correlation does not necessarily mean that the written MCQ is inadequate, for it may well be that the oral examination allows errors to occur in which good men are failed and poor men passed. In general the correlation has been fair, as shown in Figure 7 for the 1957 MCQ and oral examinations. Both written and oral grades are expressed in percentage, the oral grade being the average of the candidate's grades in the different disciplines in which he was examined, and the written grade being the raw score of each candidate expressed in percentage as previously explained. The candidates who passed the written and therefore were allowed to take the oral show some tendency toward correlation, in that those who got better written grades got better oral grades, but this correlation is by no means good. It may indicate that the oral examination needs to be checked, since there is too much variability in it. When steps are taken to make the oral more uniform, the correlation should be better. One point is striking. All of the men who failed the oral had just passing grades in the written. Had we made our cut-off point two points higher in the written examination, there would have been no failures in the oral. One man who made a high grade on the oral, 83, would have been eliminated from taking it. Also, we do not know how many men who failed the written would have made good grades if allowed to take the oral.

[Plot of oral grade against written grade, 1957, both in percent; horizontal axis: written grade; the written passing point is marked PASS.]

FIGURE 7

The Committee believes the written examination, as it is now administered, adequately determines who should be allowed to take the oral. Cutting out the misfits means that the Board can concentrate its attention on determining the status of those who have shown a comprehensive or over-all competency, so that weaknesses in the various disciplines can more easily be ferreted out, and such men conditioned. There should be very few failures in future oral examinations. The written now acts as a barrier for those who used to get through by nibbling. Finally, it will help in showing up weaknesses in the oral, and thus in making a better and fairer examination for those who want this diploma and prize it.
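The comparison of written and oral grades discussed above, and the effect of raising the written cut-off, can be sketched as follows. A minimal Python illustration using the ordinary (Pearson) correlation coefficient, which the text does not name; the grade pairs and function names are hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def screened_out(pairs, new_cutoff):
    """(written, oral) pairs that would not reach the oral at a higher cut-off."""
    return [(w, o) for w, o in pairs if w < new_cutoff]

# Hypothetical (written %, oral %) pairs for candidates who passed the written.
pairs = [(60, 68), (61, 72), (66, 75), (70, 74), (75, 80), (80, 84)]
written, oral = zip(*pairs)
print(round(pearson_r(written, oral), 2))
print(screened_out(pairs, 62))
```

As the text cautions, such a coefficient covers only candidates who passed the written, and a weak value may reflect variability in the oral as much as in the written.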

DISCUSSION

DR. GORDON M. BRUCE. Dr. Adler has covered his subject completely, and what he did not have time to tell you about here will be found in the Transactions. The most that a discusser can do is to emphasize the fact that the orals, with all their faults, are still the key examinations of the American Board of Ophthalmology and that the MCQ is only a screening test.

As Chairman of the American Board of Ophthalmology I can testify that this type of examination is most effective. As an individual who has served several years on Dr. Adler's Examination Committee, I can testify personally to the energy and devotion that he has brought to his complicated task. Once more Frank Adler has placed all ophthalmologists in his debt. On the star-studded list of his accomplishments history will place very high his achievements in raising and maintaining the standards of ophthalmology.
