
WHITE PAPER

What Makes a Great Test Great?

The Art of Evaluation

By Sandy Barker


Table of Contents

Creating Great Tests
  Abstract
  The Perfect Assessment
  How to Optimize
  Defining and Aligning Objectives
  How the ADDIE Model Applies to Assessments
  Assessment Analysis
  Summary

Best Practices for Test Item Construction (based on "Training Handbook for Air Force Instructors, AF MANUAL 50-62")
  Writing Assessments
  Best Practices for Constructing Any Assessment Item
  Best Practices for Constructing General Multiple-Choice Items
  Alternate Types of Successful Multiple-Choice Items
  Best Practices for Constructing Matching/Task-Step Ordering Items
  Best Practices for Constructing True-False Items
  Best Practices for Constructing Completion Items
  Best Practices for Constructing Short-Answer Items
  Conclusion


The Art of Evaluation

Abstract

A good training program can ultimately provide big benefits: efficiency, quality, and cost savings, to name a few; and an experienced, well-informed instructional designer can make or break a training program. The job of the instructional designer is to turn raw content into a learning experience that is efficient, effective, and appealing. It is an art form that requires content mastery, development resources, creativity, and formal training in learning theory. Course development should be based on time-proven, research-based theories and practices.

This paper provides an industry-centered overview of the course development process from start to finish, with a focus on only one product: the test. It explains why the best tests are created as an integral part of the course rather than as an afterthought, and it introduces best practices for ensuring that the right things are measured in an efficient, effective manner, down to the minute practices involved in test item construction.

Tests are used in the workplace for many reasons including, but not limited to, hiring, compatibility, membership, job certification, placement, and promotion. It is important that they measure what they are intended to measure in order to meet the ultimate goals of training.

“So what makes a good test? Ultimately, a rigorous assessment measures how well students, when faced with uncertainties, discrepancies, or seemingly irresolvable conundrums, can use what they have learned to develop sound solutions to problems. ...A rigorous assessment meets three criteria: (1) It measures thinking skills rather than factual recall, (2) it sustains or reinforces rigorous engagement by asking students to think in highly rigorous ways, and (3) it asks students to apply what they have learned to real-world or unpredictable situations.”

– Robyn R. Jackson, "How to Plan Rigorous Instruction"

About the Author

Sandy Barker is an Instructional Designer working for Flatirons Solutions. Originally from San Antonio, TX, Sandy grew up on and around the Air Force bases there, which fostered in her a lifelong intrigue with heavy things that can leave the earth. With a degree in Aerospace Engineering and a passion for training, she helped build the astronaut training program for the avionics onboard the International Space Station and trained the first four Expedition Crews. She also helped to restructure the logistics model for the US Coast Guard’s surface fleet (based on their aviation model) using the same requirements-gathering and process-documentation techniques used to build successful training programs.


Creating Great Tests

The Perfect Assessment

Whether you want to have students memorize and recall information, combine concepts to solve problems, or judge the effectiveness of their implementation, there's a recognized, proven method to ensure that your test items evaluate the ability of a student to do just what you intend at the level you require. In addition, the process optimizes exams by incorporating the following five factors:


• Reliability: yields consistent results.

• Validity: measures exactly what it was intended to measure.

• Objectivity: scorable with a relative absence of personal judgment or subjectivity.

• Comprehensiveness: liberally samples the skills representative of the stated instructional objectives.

• Differentiation: compares results to a standard or among each other.



How to Optimize

• Reliability: once validity is checked, follow best practices for test item construction. Evaluate student performance per test item during the Evaluate phase of the ADDIE model.

• Validity: align learning objectives, test items, and instructional content up front during development.

• Objectivity: to the greatest extent possible and using best practices, write clear, concise test items that have ONE clearly right answer (or best answer). When this is not feasible, set specific criteria for scoring and provide training to scorers to reduce bias.

• Comprehensiveness: be sure each test includes a sampling of valid test items that represent all course objectives.

• Differentiation (criterion-referenced): build tests and test items, using best practices, to distinguish different levels of objective mastery.
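Once assessment data exist, reliability can also be estimated numerically. The sketch below computes the Kuder-Richardson 20 (KR-20) internal-consistency coefficient for dichotomously scored (right/wrong) items; the handbook cited later does not prescribe a particular statistic, so treat this as one illustrative option with hypothetical data.

```python
# Illustrative sketch: KR-20 reliability for dichotomously scored items.
# Input: one list of 0/1 item scores per student. Data is hypothetical.

def kr20(responses):
    n = len(responses)       # number of students
    k = len(responses[0])    # number of items
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance

    # Sum p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for i in range(k):
        p = sum(r[i] for r in responses) / n
        pq += p * (1 - p)

    return (k / (k - 1)) * (1 - pq / var_total)

# Perfectly consistent scores yield the maximum coefficient of 1.0.
print(kr20([[1, 1], [1, 1], [0, 0], [0, 0]]))  # → 1.0
```

Values near 1.0 indicate that items measure consistently; low or negative values flag items worth revisiting during the Evaluate phase.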

Defining and Aligning Objectives

Every instructor is familiar with the saying, "Say what you're gonna say, say it, then say what you said." That same logic applies to instructional systems design (ISD), but from the student perspective:

Tell me what I need to know.

Teach me what I need to know.

Assess me on what you said I need to know.



The complete instructional systems design (ISD) flowchart below depicts the flow of "need to know" information from stakeholders to course developer and ensures alignment between objectives, assessment items, and course content.

[ISD flowchart: business documentation and best practices, training services requirements, field service requirements, and job descriptions/tasks (with SME input) feed the high-level objectives and learning outcomes. These are broken down into specific knowledge and skills (K&S), and a learning level is defined for each K&S item using a learning taxonomy; together these define and categorize "what I need to know." Measurable performance objectives then drive the test items, course content, training delivery resources, and instructor preview/practice that make up the complete course.]


How the ADDIE Model Applies to Assessments

The ADDIE Model provides a flexible guideline for developing effective training utilizing feedback for planned improvement. The following diagram focuses on how ADDIE applies to the development of assessments:

The ADDIE cycle: Analysis → Design → Development → Implementation → Evaluation.

Analysis
• Revisit learning goals and objectives
• Analyze feedback and assessment results. Are results skewed by:
  - misalignment with performance objectives?
  - course content misalignment?
  - poorly designed test items?
  - test administration?

Design
• Write measurable objectives and corresponding test items (initial design)
• Determine changes to objectives, instructional content, assessment items, and assessment administration/delivery based on analysis results

Development
• Incorporate changes into course materials
• SME review (if appropriate)
• Dry run (if time allows)

Implementation
• Teach new or updated course
• Administer revised assessment(s)

Evaluation
• Gather assessment results data
• Collect student feedback:
  - course evaluation
  - instructor evaluation
• Collect instructor feedback:
  - course content alignment and accuracy
  - assessment alignment


Assessment Analysis

The subject of assessment analysis is a broad one. The specific algorithms used for objective test evaluation depend on numerous factors, some of which are shown above.

Other factors are determined by the goals of the assessment, the students, the instructors, workplace expectations, and more. Are your results norm-referenced or criterion-referenced? Are there enough students to generate meaningful data? How motivated are the students? How are the assessment results used?
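As a concrete starting point, two of the most common item-level statistics are the difficulty index (proportion of students answering correctly) and the upper-lower discrimination index. The sketch below uses hypothetical data and the conventional 27% group split; which statistics actually apply depends on the factors above.

```python
# Illustrative sketch: classical item analysis for dichotomously scored items.
# responses: one list of 0/1 item scores per student (hypothetical data).

def item_analysis(responses):
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]

    # Rank students by total score and take the top/bottom 27% (at least 1).
    ranked = sorted(range(n_students), key=lambda s: totals[s], reverse=True)
    g = max(1, n_students * 27 // 100)
    upper, lower = ranked[:g], ranked[-g:]

    stats = []
    for i in range(n_items):
        correct = sum(r[i] for r in responses)
        difficulty = correct / n_students          # proportion answering correctly
        discrimination = (sum(responses[s][i] for s in upper)
                          - sum(responses[s][i] for s in lower)) / g
        stats.append({"item": i, "difficulty": difficulty,
                      "discrimination": discrimination})
    return stats

# A well-discriminating item: top students get it right, bottom students miss it.
results = item_analysis([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
print(results[1])  # → {'item': 1, 'difficulty': 0.5, 'discrimination': 1.0}
```

Items with very high or very low difficulty, or near-zero discrimination, are candidates for revision during the Analysis phase of the next ADDIE cycle.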

Summary

Evaluation is a concept that should be addressed during the initial phases of instructional design. Measurable performance objectives directly feed test items, which then feed course content. This ensures alignment with regard to knowledge and skills, a.k.a. "What I Need to Know." When student and/or instructor evaluation is deemed beneficial, up-front creation of an evaluation model is crucial to its success.

As an integral piece of the lesson/course curriculum, this model is a living, cyclic process. Every aspect of evaluation is addressed through the ADDIE Model. When evaluation is addressed this way and weighted equally with course/lesson content, improvement in instructor and student performance over time is ensured. Applied correctly in the right areas, this approach can ultimately lead to an organization that runs smoothly and efficiently, with significant cost savings.


Best Practices for Test Item Construction¹

Writing Assessments

As shown in the ISD flowchart, in an effort to “assess me on what you said I need to know”, ideally, test items should be constructed:

• based on performance objectives.

• prior to filling in the details of the course content.

• using proven best practices.

The first two areas are incorporated by aligning assessments to course objectives. The sections that follow detail best practices for constructing different types of assessment items.

Best Practices for Constructing Any Assessment Item

1. Keep wording simple and direct.

2. Avoid tricky or leading questions.

3. Keep items independent of other items.

4. Underline, capitalize, italicize or somehow emphasize crucial words or phrases. If possible, avoid negatives altogether.

5. When sketches, diagrams, or pictures present information more clearly than words, use them.

6. When assessing performance objectives with the potential to affect life, safety, or valuable equipment, written exams are not appropriate as a standalone solution.

Best Practices for Constructing General Multiple Choice Items

The “stem” is the preliminary sentence that presents the question or the situation. When constructing a stem, consider the following:

1. The stem sets the stage for alternatives. Make sure it clearly presents the central problem or idea.

Avoid Leading Questions

Leading: "Do you have any problems with your boss?"
Better: "Tell me about your relationship with your boss."

¹ Training Handbook for Air Force Instructors, AF MANUAL 50-62


2. Unless selection of material is part of the problem, place only relevant information in the stem.

3. Avoid providing clues to the correct answer in the stem.

4. If there is language common to all alternatives, include it in the stem.

5. Exclude extraneous information and wording not relevant to answer the question.

6. To avoid confusing stems, keep wording positive when possible.

7. If the stem ends in "a" or "an", consider rewording it so that the article does not grammatically rule out some alternatives, narrowing the possible right answers.

“Alternatives” are the possible answers that can be chosen by students. The “keyed response” is the correct answer and the “distractors” are the incorrect answers. When constructing alternatives, consider the following:

1. When developing distractors, consider other factors besides incorrectness:

a. plausible but incorrect distractors that relate to the situation

b. common misconceptions

c. true statements that do not satisfy central idea in the stem

d. statements that are either too broad or too narrow to satisfy the stem

2. Avoid clue words like “all”, “always”, “never”, “sometimes”, or “usually”. The first three are not likely to be correct while the latter two are more likely to be true.

3. Keep alternatives of similar length, maintaining a parallel structure when possible.

4. When alternatives are numbers, put them in ascending or descending order.

5. To the student who has not achieved the expected level of learning, every alternative should seem plausible.

6. Avoid a consistent pattern of correct answers.

7. Minimize the use of "all of the above" or "none of the above" alternatives. If "none of the above" is the correct answer, it is still impossible to know whether the student knows the correct answer, only that the student judges the listed alternatives to be incorrect.

8. When possible, word the stem as a question.

Avoid Clue Words

"…all…"  "…always…"  "…never…"


Alternate Types of Successful Multiple-Choice Items

1. Multiple Response (more than one answer is correct and students must mark all correct answers)

2. Definition (put the definition in the stem and make the alternatives each a single word or concept)

3. Stem supplemented by an illustration

4. Problem-situation (good for analytical and reflective thinking)

a. When possible, present actual (or at least realistic) problems.

b. Be as specific as possible regarding the situation and the problem to avoid confusion regarding details or requirements.

c. Consider the time allotted to answer the question and limit the problem accordingly.

d. Exploit as many aspects of a single situation as possible and use it to construct several test items.

e. If it is important that the student select the important facts from numerous details, include unimportant details.

f. Present situations that are new to the student.

Best Practices for Constructing Matching and Task-Step Ordering Items

1. Make instructions specific and complete. Students should know exactly what is expected.

2. Include only important information.

3. Keep choices closely related. If they can be sorted into obvious categories, the chance of correct guessing increases.

4. Make all alternatives plausible.

5. Arrange the alternatives in a logical order. (Alphabetical order is common.)

6. If each alternative is used only once, provide additional alternatives.


Make Alternatives Plausible

"Who was the president of the United States in 2014?"
A: Ozzy Osbourne
B: Barack Obama
C: Barbara Bush


Best Practices for Constructing True-False Items

1. Include only one idea per item.

2. Do not make one part of the item true while another part is false.

3. Keep the wording positive.

4. Make statements clear and well-defined. Use simple wording and sentence structure.

5. Use well-defined terms or at least terms that mean the same thing to all students.

6. Avoid “hedge” words like “all”, “only”, “every”, “no”, “some”, “any” and “generally”.

7. Avoid patterns in response sequence.

8. Make true statements and false statements the same length.

Best Practices for Constructing Completion Items

1. Include no more than one blank per sentence. Two or more may create ambiguity.

2. Place the blank at or near the end of the item.

3. Be sure there is only one correct or best answer. Both the student and the instructor should have a clear understanding of the criteria for acceptable responses.

4. When testing at a higher level of learning than knowledge (rote memorization), word items differently than originally presented.

5. Make all blanks a uniform length.

6. Include a separate series of blanks arranged in a vertical column and instruct students to record their responses there. This makes for easier and more accurate scoring.

Avoid Negative Wording

"Bread and grains are not at the top of the food pyramid."


Best Practices for Constructing Short-Answer Items

1. Include specific instructions so the student knows exactly what is expected.

2. Make sure each answer involves only one idea, concept, or fact.

3. Make sure students know the criteria for a complete response through the wording of the item and/or the amount of space provided for the response.

Conclusion

The job of the instructional designer is to turn raw content into a learning experience that is efficient, effective, and appealing. Given good resources and some formal training, a good instructional designer can build a training program to meet the goals of an organization by collecting training requirements and distilling them into course content. The only way to know if the program is successful, however, is to test those requirements.

Tests can be in paper or electronic form. They can be formally administered or issued as surveys or interviews. A test can be administered at different levels within an organization, including lesson participants, their management, customers, and others. Using the methodology described, the success of a program or product can be judged in ways that are reliable, valid, objective, comprehensive, and even differentiated from a standard or among others. Ultimately, the format matters less than a research-validated development approach.

The best tests are created as an integral part of the course rather than as an afterthought; and they can be constructed to ensure that the right things are measured in an efficient, effective method, down to the minute practices involved in test item construction.


www.flatironssolutions.com

[email protected]

ABOUT FLATIRONS SOLUTIONS

Flatirons Solutions® provides solutions for content lifecycle management for large asset industries like aviation, defense, rail, and marine. For more than 20 years, it has helped manufacturers, operators, and military forces maintain and operate complex assets more effectively. Its software and service solutions help organizations to deliver the right information, at the right time, to the right people.

ABOUT CORENA SUITE

The CORENA Suite™ from Flatirons Solutions® is the leading solution for content lifecycle management developed specifically for organizations that rely on mission-critical data to design, manufacture, operate, or maintain complex assets over product and service lifecycles as well as across their business networks.

FLATIRONS SOLUTIONS REGIONAL HEADQUARTERS

AMERICAS
Flatirons Solutions, Inc.
Boulder, CO
+1 303 544 0514

EUROPE
Flatirons A/S
Birkerød, Denmark
+45 4594 9400

ASIA
Flatirons Solutions India Private Limited
Chennai, India
+91 44 6693 6949

© 2019 Flatirons Solutions, Inc. All Rights Reserved