Writing Multiple Choice Test Questions



  • 7/31/2019 Writing Multiple Choice Test Questions


    SUMMARY

This is a tutorial on creating multiple choice questions, framed by Haladyna's heuristics for test design and Anderson & Krathwohl's update to Bloom's taxonomy. My interest in computer-gradable test questions is to support teaching and learning rather than high-stakes examination. Some of the design heuristics are probably different for this case. For example, which is the more desirable attribute for a test question:

a. defensibility (you can defend its fairness and appropriateness to a critic) or
b. potential to help a student gain insight?

In high-stakes exams, (a) [defensibility] is clearly more important, but as a support for learning, I'd rather have (b) [support for insight].

This tutorial's examples are from software engineering, but from my perspective as someone who has also taught psychology and law, I think the ideas are applicable across many disciplines.

The tutorial's advice and examples specifically target three projects:

In the Black Box Software Testing Course [some course materials here], students take the multiple choice tests while they watch the video lectures or work through the assigned readings [research description here].

We are following the same structure for learning units for graduate student instruction in software engineering ethics.

In the Open Certification Project for Software Testing we are creating a public database of questions, with peer commentary/criticism. Anyone can review the questions, including people preparing for the exam. For the rationale behind this approach, see this paper by Kaner and Tim Coulter.

    CONTENTS

Standards specific to the BBST and Open Certification Questions
Definitions and Examples
Item Writing Heuristics
o Content Heuristics
o Style and Format Heuristics
o Writing the Stem
o Writing the Options
References


    STANDARDS SPECIFIC TO THE BBST AND OPEN CERTIFICATION

    QUESTIONS

    1. Consider a question with the following structure:

    Choose the answer:

a. First option
b. Second option

    The typical way we will present this question is:

    Choose the answer:

a. First option
b. Second option

c. Both (a) and (b)
d. Neither (a) nor (b)

If the correct answer is (c), then the examinee will receive 25% credit for selecting only (a) or only (b).

2. Consider a question with the following structure:

    Choose the answer:

    a. First option

b. Second option
c. Third option

    The typical way we will present this question is:

    Choose the answer:

a. First option
b. Second option
c. Third option
d. (a) and (b)

e. (a) and (c)
f. (b) and (c)
g. (a) and (b) and (c)

If the correct answer is (d), the examinee will receive 25% credit for selecting only (a) or only (b). Similarly for (e) and (f).

If the correct answer is (g) (all of the above), the examinee will receive 25% credit for selecting (d) or (e) or (f), but nothing for the other choices.


3. Consider a question with the following structure:

    Choose the answer:

    a. First option

b. Second option
c. Third option
d. Fourth option

    The typical ways we might present this question are:

    Choose the answer:

a. First option
b. Second option
c. Third option

    d. Fourth option

    OR

    Choose the answer:

a. First option
b. Second option
c. Third option
d. Fourth option
e. (a) and (c)

f. (a) and (b) and (d)
g. (a) and (b) and (c) and (d)

    There will be a maximum of 7 choices.

The three combination choices can be any combination of two, three or four of the first four answers.

If the correct answer is like (e) (a pair), the examinee will receive 25% credit for selecting only (a) or only (c), and nothing for selecting a combination that includes (a) and (c) but also includes an incorrect choice.

If the correct answer is (f) (three of the four), the examinee will receive 25% credit for selecting a correct pair (if (a) and (b) and (d) are all correct, then any two of them get 25%), but nothing for selecting only one of the three or selecting a choice that includes two or three correct but also includes an incorrect choice.

If the correct answer is (g) (all correct), the examinee will receive a 25% credit for selecting a correct triple.
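The partial-credit rules above reduce to one condition: full credit for an exact match, 25% when everything selected is correct but exactly one member of the correct combination is missing, and nothing when the selection includes any incorrect option. A minimal sketch (representing each choice as the set of simple options it asserts is my own device, not from the course materials):

```python
def score(selected, correct):
    """Partial-credit scoring for combination-style multiple choice questions.

    Each choice is modeled as the set of simple options it asserts:
    e.g. choice (d) "(a) and (b)" is {"a", "b"}.
    """
    selected, correct = set(selected), set(correct)
    if selected == correct:
        return 1.0    # exact match: full credit
    # 25% credit: every selected option is correct, and only one member
    # of the correct combination is missing.
    if selected < correct and len(selected) == len(correct) - 1:
        return 0.25
    return 0.0        # includes an incorrect option, or misses too much
```

For instance, if the correct answer is (a) and (b) and (c), then `score({"a", "b"}, {"a", "b", "c"})` returns 0.25 while `score({"a"}, {"a", "b", "c"})` returns 0.0, matching the rules stated above.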

    DEFINITIONS AND EXAMPLES


    Definitions

Here are a few terms commonly used when discussing the design of multiple choice questions. See the Reference Examples, below.

Test: In this article, the word "test" is ambiguous. Sometimes we mean a software test (an experiment that can expose problems in a computer program) and sometimes an academic test (a question that can expose problems in someone's knowledge). In these definitions, "test" means academic test.

Test item: a test item is a single test question. It might be a multiple choice test question or an essay test question (or whatever).

Content item: a content item is a single piece of content, such as a fact or a rule, something you can test on.

Stem: The opening part of the question is called the stem. For example, "Which is the best definition of the testing strategy in a testing project?" is Reference Example B's stem.

Distractor: An incorrect answer. In Reference Example B, (b) and (c) are distractors.

Correct choice: The correct answer for Reference Example B is (a) The plan for applying resources and selecting techniques to achieve the testing mission.

The Question format: The stem is a complete sentence and asks a question that is answered by the correct choice and the distractors. Reference Example A has this format.

The Best Answer format: The stem asks a complete question. Most or all of the distractors and the correct choice are correct to some degree, but one of them is stronger than the others. In Reference Example B, all three answers are plausible, but in the BBST course, given the BBST lectures, (a) is the best.

The Incomplete Stem format: The stem is an incomplete sentence that the correct choice and distractors complete. Reference Example C has this format.

Complex formats: In a complex-format question, the alternatives include simple answers and combinations of these answers. In Reference Example A, the examinee can choose (a) We can never be certain that the program is bug free, or (d), which says that both (a) and (b) are true, or (g), which says that all of the simple answers (a, b and c) are true.

Learning unit: A learning unit typically includes a limited set of content that shares a common theme or purpose, plus learning support materials such as a study guide, test items, an explicit set of learning objectives, a lesson plan, readings, lecture notes or video, etc.

High-stakes test: A test is high-stakes if there are significant benefits for passing the test or significant costs of failing it.

    The Reference Examples

    For each of the following, choose one answer.


    A. What are some important consequences of the impossibility of complete testing?

a. We can never be certain that the program is bug free.
b. We have no definite stopping point for testing, which makes it easier for some managers to argue for very little testing.
c. We have no easy answer for what testing tasks should always be required, because every task takes time that could be spent on other high importance tasks.
d. (a) and (b)
e. (a) and (c)
f. (b) and (c)
g. All of the above

    B. Which is the best definition of the testing strategy in a testing project?

a. The plan for applying resources and selecting techniques to achieve the testing mission.
b. The plan for applying resources and selecting techniques to assure quality.
c. The guiding plan for finding bugs.

    C. Complete statement coverage means

a. That you have tested every statement in the program.
b. That you have tested every statement and every branch in the program.
c. That you have tested every IF statement in the program.
d. That you have tested every combination of values of IF statements in the program.

    D. The key difference between black box testing and behavioral testing is that:

a. The test designer can use knowledge of the program's internals to develop a black box test, but cannot use that knowledge in the design of a behavioral test because the behavioral test is concerned with behavior, not internals.
b. The test designer can use knowledge of the program's internals to develop a behavioral test, but cannot use that knowledge in the design of a black box test because the designer cannot rely on knowledge of the internals of the black box (the program).
c. The behavioral test is focused on program behavior whereas the black box test is concerned with system capability.
d. (a) and (b)
e. (a) and (c)
f. (b) and (c)
g. (a) and (b) and (c)

    E. What is the significance of the difference between black box and glass box tests?

a. Black box tests cannot be as powerful as glass box tests because the tester doesn't know what issues in the code to look for.


b. Black box tests are typically better suited to measure the software against the expectations of the user, whereas glass box tests measure the program against the expectations of the programmer who wrote it.

c. Glass box tests focus on the internals of the program whereas black box tests focus on the externally visible behavior.

    ITEM-WRITING HEURISTICS

Several papers on the web organize their discussion of multiple choice tests around a researched set of advice from Haladyna, Downing & Rodriguez, or the updated list from Haladyna (2004). I'll do that too, tying their advice back to our needs for software testing.

    Content Guidelines

1. Every item should reflect specific content and a single specific cognitive process, as called for in the test specifications (table of specifications, two-way grid, test blueprint).
2. Base each item on important content to learn; avoid trivial content.
3. Use novel material to measure understanding and the application of knowledge and skills.
4. Keep the content of an item independent from content of other items on the test.
5. Avoid overspecific and overgeneral content.
6. Avoid opinion-based items.
7. Avoid trick items.
8. Format items vertically instead of horizontally.

    Style and Format Concerns

9. Edit items for clarity.
10. Edit items for correct grammar, punctuation, capitalization and spelling.
11. Simplify vocabulary so that reading comprehension does not interfere with testing the content intended.
12. Minimize reading time. Avoid excessive verbiage.
13. Proofread each item.

    Writing the Stem

14. Make the directions as clear as possible.
15. Make the stem as brief as possible.
16. Place the main idea of the item in the stem, not in the choices.
17. Avoid irrelevant information (window dressing).
18. Avoid negative words in the stem.

    Writing Options


19. Develop as many effective options as you can, but two or three may be sufficient.
20. Vary the location of the right answer according to the number of options. Assign the position of the right answer randomly.
21. Place options in logical or numerical order.
22. Keep options independent; choices should not be overlapping.
23. Keep the options homogeneous in content and grammatical structure.
24. Keep the length of options about the same.
25. "None of the above" should be used sparingly.
26. Avoid using "all of the above".
27. Avoid negative words such as "not" or "except".
28. Avoid options that give clues to the right answer.
29. Make all distractors plausible.
30. Use typical errors of students when you write distractors.
31. Use humor if it is compatible with the teacher; avoid humor in a high-stakes test.
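Guideline 20 (assign the position of the right answer randomly) is easy to automate when an exam is generated from a question bank. A minimal sketch in Python; the function name and data shapes here are my own illustration, not from any of the projects this tutorial describes:

```python
import random

def place_options(correct, distractors, rng=random):
    """Return all options in a random order, plus the index of the correct one.

    Randomizing placement keeps the answer-key position from becoming
    an unintended clue across a test (item-writing guideline 20).
    """
    options = [correct] + list(distractors)
    rng.shuffle(options)                    # in-place random permutation
    return options, options.index(correct)
```

Passing a seeded `random.Random` instance as `rng` makes a generated exam reproducible, which helps when reviewing or regrading it later.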

    Now to apply those to our situation.

    CONTENT GUIDELINES

1. Every item should reflect specific content and a single specific cognitive process, as called for in the test specifications (table of specifications, two-way grid, test blueprint).

Here are the learning objectives from the AST Foundations course. Note the grid (the table), which lists the level of knowledge and skills in the course content and defines the level of knowledge we hope the learner will achieve. For discussions of level of knowledge, see my blog entries on Bloom's taxonomy [1][2][3]:

Learning Objectives of the AST Foundations Course (Anderson/Krathwohl level in parentheses)

1. Familiar with basic terminology and how it will be used in the BBST courses (Understand)
2. Aware of honest and rational controversy over definitions of common concepts and terms in the field (Understand)
3. Understand there are legitimately different missions for a testing effort. Understand the argument that selection of mission depends on contextual factors. Able to evaluate relatively simple situations that exhibit strongly different contexts in terms of their implication for testing strategies. (Understand, Simple evaluation)
4. Understand the concept of oracles well enough to apply multiple oracle heuristics to their own work and explain what they are doing and why (Understand and apply)
5. Understand that complete testing is impossible. Improve ability to estimate and explain the size of a testing problem. (Understand, rudimentary application)
6. Familiarize students with the concept of measurement dysfunction (Understand)
7. Improve students' ability to adjust their focus from narrow technical problems (such as analysis of a single function or parameter) through broader, context-rich problems (Analyze)
8. Improve online study skills, such as learning more from video lectures and associated readings (Apply)
9. Improve online course participation skills, including online discussion and working together online in groups (Apply)
10. Increase student comfort with formative assessment (assessment done to help students take their own inventory, think and learn rather than to pass or fail the students) (Apply)

For each of these objectives, we could list the items that we want students to learn. For example:

list the terms that students should be able to define
list the divergent definitions that students should be aware of
list the online course participation skills that students should develop or improve.

We could create multiple choice tests for some of these:

We could check whether students could recognize a term's definition.
We could check whether students could recognize some aspect of an online study skill.

But there are elements in the list that aren't easy to assess with a multiple choice test. For example, how can you tell whether someone works well with other students by asking them multiple choice questions? To assess that, you should watch how they work in groups, not read multiple-choice answers.

Now, back to Haladyna's first guideline:

Use an appropriate type of test for each content item. Multiple choice is good for some, but not all.


If you use a multiple choice test, each test item (each question) should focus on a single content item. That might be a complex item, such as a rule or a relationship or a model, but it should be something that you and the student would consider to be one thing. A question spread across multiple issues is confusing in ways that have little to do with the content being tested.

Design the test item to assess the material at the right level (see the grid, above). For example, if you are trying to learn whether someone can use a model to evaluate a situation, you should ask a question that requires the examinee to apply the model, not one that just asks whether she can remember the model.

When we work with a self-contained learning unit, such as the individual AST BBST courses and the engineering ethics units, it should be possible to list most of the items that students should learn and the associated cognitive level.

However, for the Open Certification exam, the listing task is much more difficult because it is fair game to ask about any of the field's definitions, facts, concepts, models, skills, etc.

None of the Body of Knowledge lists are complete, but we might use them as a start for brainstorming about what would be useful questions for the exam.

The Open Certification (OC) exam is different from other high-stakes exams because the OC question database serves as a study guide. Questions that might be too hard in a surprise test (a test with questions you've never seen before) might be instructive in a test database that prepares you for an exam derived from the database questions, especially when the test database includes discussion of the questions and answers, not just the barebones questions themselves.

    2. Base each item on important content to learn; avoid trivial content.

The heuristic for Open Certification is: Don't ask the question unless you think a hiring manager would actually care whether this person knew the answer to it.

3. Use novel material to measure understanding and the application of knowledge and skills.

That is, reword the idea you are asking about rather than using the same words as the lecture or assigned readings. This is important advice for a traditional surprise test because people are good matchers:

If I show you exactly the same thing that you saw before, you might recognize it as familiar even if you don't know what it means.

If I want to be a nasty trickster, I can put exact-match (but irrelevant) text in a distractor. You'll be more likely to guess this answer (if you're not sure of the correct answer) because this one is familiar.


This is important advice for BBST because the student can match the words to the readings (in this open book test) without understanding them. In the open book exam, this doesn't even require recall.

On the other hand, especially in the open book exams, I like to put exact matches in the stem. The stem is asking a question like, "What does this mean?" or "What can you do with this?" If you use textbook phrases to identify the "this," then you are helping the student figure out where to look for possible answers. In the open book exam, the multiple choice test is a study aid. It is helpful to orient the student to something you want him to think about and read further about.

    4. Keep the content of an item independent from content of other items on the test.

Suppose that you define a term in one question and then ask how to apply the concept in the next. The student who doesn't remember the definition will probably be able to figure it out after reading the next question (the application).

It's a common mistake to write an exam that builds forward without realizing that the student can read the questions and answer them in any order.

    5. Avoid overspecific and overgeneral content.

    The concern with questions that are overly specific is that they are usually trivial. Does itreally matter what year Boris Beizer wrote his famous Software Testing Techniques? Isntit more important to know what techniques he was writing about and why?

    There are some simple facts that we might expect all testers to know.

For example, what's the largest ASCII code in the lower ASCII character set, and what character does it signify?

The boundary cases for ASCII might be core testing knowledge, and thus fair game.
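For reference (this is standard ASCII, not something stated in the tutorial): the lower ASCII set spans codes 0 through 127, and the boundary value 127 is the DEL control character. A quick check in Python:

```python
# Lower (7-bit) ASCII spans codes 0..127; the boundary value 127 is DEL,
# a non-printable control character -- a classic boundary test case.
largest = 127
assert chr(largest) == "\x7f"          # DEL
assert not chr(largest).isprintable()  # a control character, not a glyph
```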

However, in most cases, facts are easy to look up in books or with an electronic search. Before asking for a memorized fact, ask why you would care whether the tester had memorized that fact or not.

The concern with questions that are overly general is that they are also usually trivial, or wrong, or both.

    6. Avoid opinion-based items.

This is obvious, right? A question is unfair if it asks for an answer that some experts would consider correct and rejects an answer that other experts would consider correct.

    But we have this problem in testing.


There are several mutually exclusive definitions of "test case." There are strong professional differences about the value of a test script or the utility of the V-model, or even whether the V-model was implicit in the waterfall model (read the early papers) or a more recent innovation.

Most of the interesting definitions in our field convey opinions, and the Standards that assert the supposedly-correct definitions get that way by ignoring the controversies.

    What tactics can we use to deal with this?

    a. The qualified opinion.

    For example, consider this question:

    The definition of exploratory testing is

    and this answer:

a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.

    Is the answer correct or not?

Some people think that exploratory testing is bound tightly to test execution; they would reject the definition.

    On the other hand, if we changed the question to,

    According to Cem Kaner, the definition of exploratory testing is

    that long definition would be the right answer.

Qualification is easy in the BBST course because you can use the qualifier, "According to the lecture." This is what the student is studying right now and the exam is open book, so the student can check the fact easily.

Qualification is more problematic for closed-book exams like the certification exam. In this general case, can we fairly expect students to know who prefers which definition?

The problem is that qualified opinions contain an often-trivial fact. Should we really expect students or certification-examinees to remember definitions in terms of who said what? Most of the time, I don't think so.

    b. Drawing implications


    For example, consider asking a question in one of these ways:

If A means X, then if you do A, you should expect the following results.

Imagine two definitions of A: X and Y. Which bugs would you be more likely to expose if you followed X in your testing, and which if you followed Y?

Which definition of X is most consistent with theory Y?

    7. Avoid trick items.

Haladyna (2004, p. 104) reports work by Roberts that identified several types of (intentional or unintentional) tricks in questions:

1. The item writer's intention appeared to deceive, confuse, or mislead test takers.
2. Trivial content was represented (which violates one of our item-writing guidelines).
3. The discrimination among options was too fine.
4. Items had window dressing that was irrelevant to the problem.
5. Multiple correct answers were possible.
6. Principles were presented in ways that were not learned, thus deceiving students.
7. Items were so highly ambiguous that even the best students had no idea about the right answer.

    Some other tricks that undermine accurate assessment:

8. Put text in a distractor that is irrelevant to the question but exactly matches something from the assigned readings or the lecture.
9. Use complex logic (such as "not (A and B)" or a double negative) unless the learning being tested involves complex logic.
10. Accurately qualify a widely discredited view: "According to famous-person, the definition of X is Y," where Y is a definition no one accepts any more, but famous-person did in fact publish it.
11. In the set of items for a question, leave grammatical errors in all but the second-best choice. (Many people will guess that the grammatically-correct answer is the one intended to be graded as correct.)

Items that require careful reading are not necessarily trick items. This varies from field to field. For example, my experience with exams for lawyers and law students is that they often require very precise reading. Testers are supposed to be able to do very fine-grained specification analysis.

    Consider Example D:

    D. The key difference between black box testing and behavioral testing is that:


The options include several differences that students find plausible. Every time I give this question, some students choose a combination answer (such as (a) and (b)). This is a mistake, because the question calls for "the key difference," and that cannot be a collection of two or more differences.

    Consider Example E:

    E. What is the significance of the difference between black box and glass box tests?

    A very common mistake is to choose this answer:

Glass box tests focus on the internals of the program whereas black box tests focus on the externally visible behavior.

The answer is an accurate description of the difference, but it says nothing about the significance of the difference. Why would someone care about the difference? What is the consequence of the difference?

Over time, students learn to read questions like this more carefully. My underlying assumption is that they are also learning or applying, in the course of this, skills they need to read technical documents more carefully. Those are important skills for both software testing and legal analysis, and so they are relevant to the courses that are motivating this tutorial. However, for other courses, questions like these might be less suitable.

On a high-stakes exam, with students who had not had a lot of exam-preparation training, I would not ask these questions because I would not expect students to be prepared for them. On the high-stakes exam, the ambiguity of a wrong answer (might not know the content vs. might not have parsed the question carefully) could lead to the wrong conclusion about the student's understanding of the material.

In contrast, in an instructional context in which we are trying to teach students to parse what they read with care, there is value in subjecting students to low-risk reminders to read with care.

    STYLE AND FORMAT CONCERNS

    8. Format items vertically instead of horizontally.

If the options are brief, you could format them as a list of items, one beside the next. However, these lists are often harder to read, and it is much harder to keep formatting consistent across a series of questions.

    9. Edit items for clarity.

    I improve the clarity of my test items in several ways:


- I ask colleagues to review the items.
- I co-teach with other instructors or with teaching assistants. They take the test and discuss the items with me.
- I encourage students to comment on test items. I use course management systems, so it is easy to set up a question-discussion forum for students to query, challenge, or complain about test items.

In my experience, it is remarkable how many times an item can go through review (and improvement) and still be confusing.

    10. Edit items for correct grammar, punctuation, capitalization and spelling.

It is common for instructors to write the stem and the correct choice together when they first write the question. The instructor words the distractors later, often less carefully and in some way that is inconsistent with the correct choice. These differences become undesirable clues about the right and wrong choices.

11. Simplify vocabulary so that reading comprehension does not interfere with testing the content intended.

There's not much point asking a question that the examinee doesn't understand. If the examinee doesn't understand the technical terms (the words or concepts being tested), that's one thing. But if the examinee doesn't understand the other terms, the question simply won't reach the examinee's knowledge.

    12. Minimize reading time. Avoid excessive verbiage.

    Students whose first language is not English often have trouble with long questions.

    13. Proofread each item.

Despite editorial care, remarkably many simple mistakes survive review or are introduced by mechanical error (e.g., cutting and pasting from a master list to the test itself).

    WRITING THE STEM

    14. Make the directions as clear as possible.

    Consider the following confusingly-written question:

A program will accept a string of letters and digits into a password field. After it accepts the string, it asks for a comparison string, and on accepting a new input from the customer, it compares the first string against the second and rejects the password entry if the strings do not match.

    a. There are 218340105584896 possible tests of 8-character passwords.


b. This method of password verification is subject to the risk of input-buffer overflow from an excessively long password entry.

c. This specification is seriously ambiguous because it doesn't tell us whether the program accepts or rejects/filters non-alphanumeric characters into the second password entry.

Let us pretend that each of these answers could be correct. Which is correct for this question? Is the stem calling for an analysis of the number of possible tests, the risks of the method, the quality of the specification, or something else?

The stem should make clear whether the question is looking for the best single answer or potentially more than one, and whether the question is asking for facts, opinion, examples, reasoning, a calculation, or something else.

The reader should never have to read the set of possible answers to understand what the question is asking.
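As an aside, the figure in option (a) is consistent with a simple combinatorial count. If "letters and digits" means the 26 lowercase letters, 26 uppercase letters, and 10 digits (an assumption; the stem itself does not pin the alphabet down), there are 62 choices per character and therefore 62^8 possible 8-character passwords:

```python
# Count the 8-character passwords over an assumed 62-character alphabet
# (26 lowercase + 26 uppercase + 10 digits). The stem does not specify
# the alphabet, so this is one plausible reading, not the only one.
alphabet_size = 26 + 26 + 10   # 62
password_length = 8

num_tests = alphabet_size ** password_length
print(num_tests)  # 218340105584896
```

This matches the number in option (a), which is one way a test-writer can build a "correct calculation plus miscalculations" question.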

    15. Make the stem as brief as possible.

This is part of the same recommendation as Heuristic #12 above. If the entire question should be as short as possible (#12), the stem should be as short as possible.

However, "as short as possible" does not necessarily mean "short."

    Here are some examples:

- The stem describes some aspect of the program in enough detail that it is possible to compute the number of possible software test cases. The choices include the correct answer and three miscalculations.
- The stem describes a software development project in enough detail that the reader can see the possibility of doing a variety of tasks and the benefits they might offer to the project, and then asks the reader to prioritize some of the tasks. The choices are of the form, "X is more urgent than Y."
- The stem describes a potential error in the code, the types of visible symptoms that this error could cause, and then calls for selection of the best test technique for exposing this type of bug.
- The stem quotes part of a product specification and then asks the reader to identify an ambiguity or to identify the most serious impact on test design that an ambiguity like this might cause.
- The stem describes a test, a failure exposed by the test, and a stakeholder (who has certain concerns) who receives failure reports and is involved in decisions about the budget for the testing effort, and asks which description of the failure would be most likely to be perceived as significant by that stakeholder. An even more interesting question (faced frequently by testers in the real world) is which description would be perceived as significant (credible, worth reading and worth fixing) by Stakeholder 1 and which other description would be more persuasive for Stakeholder 2. (Someone concerned with next month's sales might assess risk very differently from someone concerned with the engineering/maintenance cost of a product line over a 5-year period. Both concerns are valid, but a good tester might raise different consequences of the same bug for the marketer than for the maintenance manager.)

Another trend for writing test questions that address higher-level learning is to write a very long and detailed stem followed by several multiple choice questions based on the same scenario.

Long questions like these are fair game (normal cases) in exams for lawyers, such as the Multistate Bar Exam. They are looked on with less favor in disciplines that don't demand the same level of skill in quickly reading/understanding complex blocks of text. Therefore, for many engineering exams (for example), questions like these are probably less popular.

- They discriminate against people whose first language is not English and who are therefore slower readers of complex English text, or more generally against anyone who is a slow reader, because the exam is time-pressed.
- They discriminate against people who understand the underlying material and who can reach an application of that material to real-life-complexity circumstances if they can work with a genuine situation or a realistic model (something they can appreciate in a hands-on way), but who are not so good at working from hypotheticals that abstract out all information that the examiner considers inessential.
- They can cause a cascading failure. If the exam includes 10 questions based on one hypothetical and the examinee misunderstands that one hypothetical, she might blow all 10 questions.
- They can demoralize an examinee who lacks confidence/skill with this type of question, resulting in a bad score because the examinee stops trying to do well on the test.

However, in a low-stakes exam without time limits, those concerns are less important. The exam becomes practice for this type of analysis, rather than punishment for not being good at it.

In software testing, we are constantly trying to simplify a complex product into testable lines of attack. We ignore most aspects of the product and design tests for a few aspects, considered on their own or in combination with each other. We build explicit or implicit mental models of the product under test, and work from those to the tests, and from the tests back to the models (to help us decide what the results should be). Therefore, drawing out the implications of a complex system is a survival skill for testers, and questions of this style are entirely fair game in a low-stakes exam, designed to help the student learn, rather than a high-stakes exam designed to create consequences based on an estimate of what the student knows.

    16. Place the main idea of the item in the stem, not in the choices.


Some instructors adopt an intentional style in which the stem is extremely short and the question is largely defined in the choices.

The confusingly-written question in Heuristic #14 was an example of a case in which the reader can't tell what the question is asking until he reads the choices. In #14, there were two problems:

- the stem didn't state what question it was asking
- the choices themselves were fundamentally different, asking about different dimensions of the situation described in the stem rather than exploring one dimension with a correct answer and distracting mistakes. The reader had to guess/decide which dimension was of interest as well as deciding which answer might be correct.

Suppose we fix the second problem but still have a stem so short that you don't know what the question is asking for until you read the options. That's the issue addressed here (Heuristic #16).

For example, here is a better-written question that doesn't pass muster under Heuristic #16:

    A software oracle:

a. is defined this way
b. is defined this other way
c. is defined this other way

    The better question under this heuristic would be:

    What is the definition of a software oracle?

a. this definition
b. this other definition
c. this other other definition

As long as the options are strictly parallel (they are alternative answers to the same implied question), I don't think this is a serious problem.

    17. Avoid irrelevant information (window dressing).

Imagine a question that includes several types of information in its description of some aspect of a computer program:

- details about how the program was written
- details about how the program will be used
- details about the stakeholders who are funding or authorizing the project
- details about ways in which products like this have failed before


All of these details might be relevant to the question, but probably most of them are not relevant to any particular question. For example, calculating the theoretically possible number of tests of part of the program doesn't require any knowledge of the stakeholders.

Information:

- is irrelevant if you don't need it to determine which option is the correct answer,
- unless the reader's ability to wade through irrelevant information of this type in order to get to the right underlying formula (or, more generally, the right approach to the problem) is part of the skill being tested.

    18. Avoid negative words in the stem.

    Here are some examples of stems with negative structure:

Which of the following is NOT a common definition of software testing?

Do NOT assign a priority to a bug report EXCEPT under what condition(s)?

You should generally compute code coverage statistics UNLESS:

For many people, these are harder than questions that ask for the same information in a positively-phrased way.

There is some evidence that there are cross-cultural variations. That is, these questions are harder for some people than others, probably because of their original language training in childhood. Therefore, a bad result on this question might have more to do with the person's heritage than with their knowledge or skill in software testing.

However, the ability to parse complex logical expressions is an important skill for a tester. Programmers make lots of bugs when they write code to implement things like:

    NOT (A OR B) AND C

So testers have to be able to design tests that anticipate the bug and check whether the programmer made it.
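As a sketch of what "anticipating the bug" can mean here: the expression has only three inputs, so a tester can enumerate all eight combinations and compare the intended expression against a plausible mis-implementation. The function names and the particular precedence bug below are illustrative assumptions, not from the text:

```python
from itertools import product

def intended(a, b, c):
    """The expression as specified: NOT (A OR B) AND C."""
    return (not (a or b)) and c

def buggy(a, b, c):
    """A plausible bug: the programmer drops the parentheses, and the
    language's precedence (NOT binds tighter than OR) changes the meaning."""
    return not a or b and c

# Exhaustive truth table: 2**3 = 8 cases, cheap to test completely.
failures = [(a, b, c) for a, b, c in product([False, True], repeat=3)
            if intended(a, b, c) != buggy(a, b, c)]
print(failures)
# [(False, False, False), (False, True, False), (False, True, True), (True, True, True)]
```

Four of the eight input combinations expose this particular bug; a test suite that happened to use only the other four would miss it entirely, which is the tester's problem in miniature.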

It is not unfair to ask a tester to handle some complex negation, if your intent is to test whether the tester can work with complex logical expressions. But if you think you are testing something else, and your question demands careful logic processing, you won't know from a bad answer whether the problem was the content you thought you were testing or the logic that you didn't consider.

Another problem is that many people read negative sentences as positive. Their eyes glaze over when they see the NOT and they answer the question as if it were positive ("Which of the following IS a common definition of software testing?"). Unless you are testing for glazed eyes, you should make the negation as visible as possible. I use ITALICIZED, ALL-CAPS BOLDFACE in the examples above.


    WRITING THE CHOICES (THE OPTIONS)

    19. Develop as many effective options as you can, but two or three may be sufficient.

Imagine an exam with 100 questions. All of them have two options. Someone who is randomly guessing should get 50% correct.

Now imagine an exam with 100 questions that all have four options. Under random guessing, the examinee should get 25%.
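The arithmetic behind these figures: with k equally plausible options per question, blind guessing is right 1/k of the time, so the expected number correct on an n-question exam is n/k. A minimal illustration (the function name is mine, not from the text):

```python
def expected_guess_score(num_questions, num_options):
    """Expected number of correct answers under uniform random guessing:
    each question is answered correctly with probability 1/num_options."""
    return num_questions / num_options

print(expected_guess_score(100, 2))  # 50.0 -- two options per question
print(expected_guess_score(100, 4))  # 25.0 -- four options per question
```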

The issue of effectiveness is important because an answer that is not credible (not effective) won't gain any guesses. For example, imagine that you saw this question on a quiz in a software testing course:

    Green-box testing is:

a. common at box manufacturers when they start preparing for the Energy Star rating
b. a rarely-taught style of software testing
c. a nickname used by automobile manufacturers for tests of hybrid cars
d. the name of Glen Myers' favorite book

I suspect that most students would pick choice (b) because (a) and (c) are irrelevant to the course and (d) is ridiculous (if it were a proper name, for example, "Green-box testing" would be capitalized). So even though there appear to be four choices, there is really only one effective one.

The number of choices is important, as is the correction-for-guessing penalty, if you are using multiple choice test results to assign a grade or assess the student's knowledge in a way that carries consequences for the student.

The number of choices, and the final score, matter much less if the quiz is for learning support rather than for assessment.

The Open Certification exam is for assessment and has a final score, but it is different from other exams in that examinees can review the questions and consider the answers in advance. Statistical theories of scoring just don't apply well under those conditions.

20. Vary the location of the right answer according to the number of options. Assign the position of the right answer randomly.

There's an old rule of thumb: if you don't know the answer, choose the second one in the list. Some inexperienced exam-writers tend to put the correct answer in the same location more often than if they varied location randomly. Experienced exam-writers use a randomization method to eliminate this bias.

    21. Place options in logical or numerical order.


The example that Haladyna gives is numeric. If you're going to ask the examinee to choose the right number from a list of choices, then present them in order (like $5, $10, $20, $175) rather than randomly (like $20, $5, $175, $10).

In general, the idea underlying this heuristic is that the reader is less likely to make an accidental error (one unrelated to their knowledge of the subject under test) if the choices are ordered and formatted in the way that makes them as easy as possible to read quickly and understand correctly.

    22. Keep options independent; choices should not be overlapping.

Assuming standard productivity metrics, how long should it take to create and document 100 boundary tests of simple input fields?

a. 1 hour or less
b. 5 hours or less
c. between 3 and 7 hours
d. more than 6 hours

These choices overlap. If you think the correct answer is 4 hours, which one do you pick as the correct answer?

Here is a style of question that I sometimes use that might look overlapping at first glance, but is not:

    What is the best course of action in context C?

a. Do X because of RY (the reason you should do Y).
b. Do X because of RX (the reason you should do X, but a reason that the examinee is expected to know is impossible in context C).
c. Do Y because of RY (the correct answer).
d. Do Y because of RX.

Two options tell you to do Y (the right thing to do), but for different reasons. One reason is appropriate; the other is not. The test is checking not just whether the examinee can decide what to do but whether she can correctly identify why to do it. This can be a hard question, but if you expect a student to know why to do something, requiring them to pick the right reason as well as the right result is entirely fair.

    23. Keep the options homogeneous in content and grammatical structure.

Inexperienced exam writers often accidentally introduce variation between the correct answer and the others. For example, the correct answer:

- might be properly punctuated
- might start with a capital letter (or not start with one), unlike the others
- might end with a period or semicolon (unlike the others)
- might be present tense (the others in past tense)
- might be active voice (the others in passive voice), etc.

The most common reason for this is that some exam authors write a long list of stems and correct answers, then fill the rest of the questions in later.

The nasty, sneaky, tricky exam writer knows that test-wise students look for this type of variation and so introduces it deliberately:

    Which is the right answer?

a. this is the right answer
b. This is the better-formatted second-best answer.
c. this is a wrong answer
d. this is another wrong answer

The test-savvy guesser will be drawn to answer (b) (bwaa-haaa-haa!)

Tricks are one way to keep down the scores of skilled guessers, but when students realize that you're hitting them with trick questions, you can lose your credibility with them.

    24. Keep the length of options about the same.

    Which is the right answer?

a. this is the wrong answer
b. This is a really well-qualified and precisely-stated answer that is obviously more carefully considered than the others, so which one do you think is likely to be the right answer?
c. this is a wrong answer
d. this is another wrong answer

25. "None of the above" should be used carefully.

    As Haladyna points out, there is a fair bit of controversy over this heuristic:

- If you use it, make sure that you make it the correct answer sometimes and the incorrect answer sometimes.
- Use it when you are trying to make the student actually solve a problem and assess the reasonability of the possible solutions.

26. Avoid using "all of the above."

The main argument against "all of the above" is that if there is an obviously incorrect option, then "all of the above" is obviously incorrect too. Thus, test-wise examinees can reduce the number of plausible options easily. If you are trying to statistically model the difficulty of the exam, or create correction factors (a correction is a penalty for guessing the wrong answer), then including an option that is obviously easier than the others makes the modeling messier.
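The correction for guessing mentioned here is classically computed as R - W/(k - 1), where R is the number right, W the number wrong, and k the number of options per item; under blind guessing the expected penalty cancels the expected lucky hits. A sketch of that formula:

```python
def corrected_score(num_right, num_wrong, num_options):
    """Classical formula score: R - W/(k - 1). Under blind guessing on
    k-option items, the expected corrected score is zero, because the
    penalty on wrong answers offsets the expected lucky correct ones."""
    return num_right - num_wrong / (num_options - 1)

# 100 four-option questions answered by pure guessing: expect about
# 25 right and 75 wrong, which the correction drives back to zero.
print(corrected_score(25, 75, 4))  # 0.0
```

This is the standard "formula scoring" model; as the text notes, it assumes options are equally plausible, which an obviously easy "all of the above" option breaks.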

In our context, we aren't correcting for guessing or estimating the difficulty of the exam:

- In the BBST (open book) exam, the goal is to get the student to read the material carefully and think about it. Difficulty of the question is more a function of difficulty of the source material than of the question.
- In the Open Certification exam, every question appears on a public server, along with a justification of the intended-correct answer and public commentary. Any examinee can review these questions and discussions. Some will, some won't; some will remember what they read and some won't; some will understand what they read and some won't. How do you model the difficulty of questions this way? Whatever the models might be, the fact that the "all of the above" option is relatively easy for some students who have to guess is probably a minor factor.

Another argument is more general. Several authors, including Haladyna, Downing, & Rodriguez (2002), recommend against the complex question that allows more than one correct answer. This makes the question more difficult and more confusing for some students.

Even though some authors recommend against it, our question construction adopts a complex structure that allows selection of combinations (such as "(a) and (b)" as well as "all of the above") because other educational researchers consider this structure a useful vehicle for presenting difficult questions in a fair way. See, for example, Wongwiwatthananukit, Popovich & Bennett (2000) and their references.

Note that in the BBST / Open Certification structure, the fact that there is a combination choice or an "all of the above" choice is not informative, because most questions have these.

    There is a particular difficulty with this structure, however. Consider this question:

    Choose the answer:

a. This is the best choice
b. This is a bad choice
c. This is a reasonable answer, but (a) is far better; or this is really a subset of (a), weak on its own, but it would be the only correct one if (a) were not present.
d. (a) and (b)
e. (a) and (c)
f. (b) and (c)
g. (a) and (b) and (c)


In this case, the student will have an unfairly hard time choosing between (a) and (e). We have created questions like this accidentally, but when we recognize this problem, we fix it in one of these ways:

    Alternative 1. Choose the answer:

a. This is the best choice
b. This is a bad choice
c. This is a reasonable answer, but (a) is far better; or this is really a subset of (a), weak on its own, but it would be the only correct one if (a) were not present.
d. This is a bad choice
e. (a) and (b)
f. (b) and (c)
g. (a) and (b) and (c)

In this case, we make sure that "(a) and (c)" is not available for selection.

    Alternative 2. Choose the answer:

a. This is the best choice
b. This is a bad choice
c. This is a reasonable answer, but (a) is far better; or this is really a subset of (a), weak on its own, but it would be the only correct one if (a) were not present.
d. This is a bad choice

    In this case, no combinations are available for selection.

27. Avoid negative words such as "not" or "except."

This is the same advice, for the options, as we provided in Heuristic #18 for the stem, for the same reasons.

    28. Avoid options that give clues to the right answer.

    Some of the mistakes mentioned by Haladyna, Downing, & Rodriguez (2002) are:

- Broad assertions that are probably incorrect, such as "always," "never," "must," and "absolutely."
- Choices that sound like words in the stem, or words that sound like the correct answer.
- Grammatical inconsistencies, length inconsistencies, formatting inconsistencies, extra qualifiers, or other obvious inconsistencies that point to the correct choice.
- Pairs or triplet options that point to the correct choice. For example, if every combination option includes (a) (such as "(a) and (b)" and "(a) and (c)" and "all of the above"), then it is pretty obvious that (a) is probably correct and any answer that excludes (a) (such as (b)) is probably wrong.

    29. Make all distractors plausible.

    This is important for two reasons:

- If you are trying to do statistical modeling of the difficulty of the exam ("There are 4 choices in this question, therefore there is only a 25% chance of a correct answer from guessing"), then implausible distractors invalidate the model because few people will make this guess. However, in our tests, we aren't doing this modeling, so this doesn't matter.
- An implausible choice is a waste of space and time. If no one will make this choice, it is not really a choice. It is just extra text to read.

One reason that an implausible distractor is sometimes valuable is that sometimes students do pick obviously unreasonable distractors. In my experience, this happens when the student is:

- ill, and not able to concentrate
- falling asleep, and not able to concentrate
- on drugs or drunk, and not able to concentrate, or temporarily afflicted with a very strange sense of humor
- copying answers (in a typical classroom test, looking at someone else's exam a few feet away) and making a copying mistake.

I rarely design test questions with the intent of including a blatantly implausible option, but I am an inept enough test-writer that a few slip by anyway. These aren't very interesting in the BBST course, but I have found them very useful in traditional quizzes in the traditionally-taught university course.

    30. Use typical errors of students when you write distractors.

Suppose that you gave a fill-in-the-blank question to students. In this case, for example, you might ask the student to tell you the definition rather than giving students a list of definitions to choose from. If you gathered a large enough sample of fill-in-the-blank answers, you would know what the most common mistakes are. Then, when you create the multiple choice question, you can include these as distractors. The students who don't know the right answer are likely to fall into one of the frequently-used wrong answers.

I rarely have the opportunity to build questions this way, but the principle carries over. When I write a question, I ask: "If someone was going to make a mistake, what mistake would they make?"

    31. Use humor if it is compatible with the teacher; avoid humor in a high-stakes test.


Robert F. McMorris, Roger A. Boothroyd, & Debra J. Pietrangelo (1997) and Powers (2005) advocate for carefully controlled use of humor in tests and quizzes. I think this is reasonable in face-to-face instruction, once the students have come to know the instructor (or in a low-stakes test while students are getting to know the instructor). However, in a test that involves students from several cultures, who have varying degrees of experience with the English language, I think humor in a quiz can create more confusion and irritation than it is worth.

    References

These notes summarize lessons that came out of the last Workshop on Open Certification (WOC 2007) and from private discussions related to BBST.

There's a lot of excellent advice on writing multiple-choice test questions. Here are a few sources that I've found particularly helpful:

1. Lorin Anderson, David Krathwohl, & Benjamin Bloom, A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives, Complete Edition, Longman Publishing, 2000.

2. National Conference of Bar Examiners, Multistate Bar Examination Study Aids and Information Guides.

3. Steven J. Burton, Richard R. Sudweeks, Paul F. Merrill, & Bud Wood, How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty, Brigham Young University Testing Services, 1991.

4. Thomas M. Haladyna, Writing Test Items to Evaluate Higher Order Thinking, Allyn & Bacon, 1997.

5. Thomas M. Haladyna, Developing and Validating Multiple-Choice Test Items, 3rd Edition, Lawrence Erlbaum, 2004.

6. Thomas M. Haladyna, Steven M. Downing, & Michael C. Rodriguez, "A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment," Applied Measurement in Education, 15(3), 309-334, 2002.

7. Robert F. McMorris, Roger A. Boothroyd, & Debra J. Pietrangelo, "Humor in Educational Testing: A Review and Discussion," Applied Measurement in Education, 10(3), 269-297, 1997.

8. Ted Powers, "Engaging Students with Humor," Association for Psychological Science Observer, 18(12), December 2005.

9. The Royal College of Physicians and Surgeons of Canada, Developing Multiple Choice Questions for the RCPSC Certification Examinations.

10. Supakit Wongwiwatthananukit, Nicholas G. Popovich, & Deborah E. Bennett, "Assessing pharmacy student knowledge on multiple-choice examinations using partial-credit scoring of combined-response multiple-choice items," American Journal of Pharmaceutical Education, Spring 2000.

11. Bibliography and links on Multiple Choice Questions at http://ahe.cqu.edu.au/MCQ.htm
