
1

Item Analysis - Outline

1. Types of test items

A. Selected response items

B. Constructed response items

2. Parts of test items

3. Guidelines for writing test items

2

Item Analysis - Outline

4. Item Analysis

A. Distracter measures

B. Item difficulty measures

C. Item discrimination measures

5. Item Response Theory

A. ICCs

B. Adaptive testing

3

1. Types of test items

A. Selected response

• Multiple choice

• Likert scale

• Category

• Q-sort

B. Constructed response

4

A. Selected response

• Multiple choice or forced choice

• Task is to choose among a set of given answers

• Advantage: ease of scoring

• Advantage: scoring requires little skill

• Disadvantage: may test memory rather than comprehension

5

A. Selected response

• Multiple choice or forced choice

• Correct response must be distinct

• Distracters should not be obvious or ambiguous

• If distracters are poor, adding more of them makes the test less reliable

• Use 3-4 distracters per item

6

A. Selected response

• Multiple choice or forced choice

• Likert format

• Test-taker chooses a point on a scale that expresses their attitude or belief

• Data lend themselves to factor analysis

7

Likert scale example item

Parking costs at the university are fair

1 = strongly agree
2 = agree
3 = neutral
4 = disagree
5 = strongly disagree

8

A. Selected response

• Multiple choice or forced choice

• Likert format

• Category

• Similar to Likert but with more choices

• More choices capture the test-taker’s degree of commitment more finely

• Reliability depends on good instructions and the number of categories (≤ 10)

• Scoring shows context effects

9

A. Selected response

• Multiple choice or forced choice

• Likert format

• Category

• Q-sort

• A large set of cards, each with a statement referring to a “target”

• Test-taker sorts the cards into piles according to how accurately the statements describe the target

• Generally 9 piles

10

1. Types of test items

A. Selected response

B. Constructed response

• Free response

• Fill-in-the-blank

• Essay tests

• Portfolios

• In-basket technique

11

B. Constructed response items

• Free response

• Test-taker responds without constraint

• Describes what is important to him/her

12

B. Constructed response items

• Free response

• Fill-in-the-blank

• Used to test for knowledge or to find out about beliefs and attitudes

13

B. Constructed response items

• Free response

• Fill-in-the-blank

• Essay tests

• Preferred when you want to assess the test-taker’s ability to think analytically, integrate ideas, and express himself or herself

14

B. Constructed response items

• Free response

• Fill-in-the-blank

• Essay tests

• Portfolios

• Not really a test

• Collections of things the person being evaluated has produced

• Let you evaluate things you can’t assess with a selected response test

15

B. Constructed response items

• Free response

• Fill-in-the-blank

• Essay tests

• Portfolios

• In-basket technique

• Used in business

• Job candidate gets a set of “everyday” problems and says how he or she would deal with those problems

• Requires expert raters to grade responses

16

B. Constructed response items

• Strengths

• Assess higher-order skills

• More useful feedback to test-taker

• Positive influence on study habits?

• Easier to create items

17

B. Constructed response items

• Weaknesses

• Time-consuming to use

• Possible subjectivity in scoring

18

2. Parts of test items

A. Stimulus or item stem

B. Response format or method

C. Conditions governing the response

D. Procedures for scoring the response

19

2. Parts of test items

A. Stimulus or item stem

• What the subject responds to

20

2. Parts of test items

B. Response format or method

• Typically multiple choice or constructed response

21

2. Parts of test items

C. Conditions governing the response

• e.g., time limits; allowing probes for ambiguous responses; how response is recorded...

22

2. Parts of test items

D. Procedures for scoring the response

• particularly important for constructed response items

23

2. Parts of test items

• To some extent, your choices on each of these parts will be dictated by:

• Precedent: What did you do last time?

• Experience: Did that work?

• Practical considerations: How many people have to be tested? How much time is available?

24

3. Writing test items – guidelines

A. Define clearly

B. Generate a pool of potential items

C. Monitor reading level

D. Use unitary items

E. Avoid long items

F. Break any response “set”

25

3. Writing test items – guidelines

A. Define clearly

• Why are you testing?

• What do you want to know?

26

3. Writing test items – guidelines

A. Define clearly

B. Generate a pool of potential items

• The larger the pool of items you select from, the better the test

• Selection from this pool based on item-analysis (see below)

27

3. Writing test items – guidelines

A. Define clearly

B. Generate a pool of potential items

C. Monitor reading level

• Level too low? More sophisticated test-takers may get bored

• Level too high? You’re testing reading skill as well as the domain you think you’re testing

28

3. Writing test items – guidelines

A. Define clearly

B. Generate a pool of potential items

C. Monitor reading level

D. Use unitary items

• Each item asks about just one thing, so the meaning of the response is clear

29

3. Writing test items – guidelines

A. Define clearly

B. Generate a pool of potential items

C. Monitor reading level

D. Use unitary items

E. Avoid long items

• Longer items are more likely to be misinterpreted by test-takers

• Short items are more likely to be unitary

30

3. Writing test items – guidelines

A. Define clearly

B. Generate a pool of potential items

C. Monitor reading level

D. Use unitary items

E. Avoid long items

F. Break any response “set”

• Use reverse-scored items to prevent test-takers from falling into a response set, such as answering “5” to every item on a Likert scale; a minimal sketch of the recoding follows
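To make the recoding concrete, here is a minimal Python sketch of reverse-scoring on a 1–5 Likert scale. The data and the "6 − response" rule for a 5-point scale are illustrative assumptions, not something the slides specify:

```python
# A minimal sketch of reverse-scoring (hypothetical data).
# On a 1-5 scale, a reverse-keyed item is recoded as 6 - response,
# so a test-taker answering "5" everywhere produces an inconsistent
# scored pattern, which exposes the response set.
responses = [5, 5, 5, 5]                # a suspicious all-"5" pattern
reverse_keyed = [False, True, False, True]

scored = [6 - r if rev else r for r, rev in zip(responses, reverse_keyed)]
print(scored)  # [5, 1, 5, 1] -- inconsistent, flagging a possible response set
```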

31

4. Item analysis

A. Multiple choice distracter analysis

B. Item difficulty measure P

C. Discrimination index D

D. Item-total correlation

32

A. Multiple choice – distracter measures

• How many people choose each distracter?

• Distracters should be equally attractive

• Correct choice should be based on knowledge

• Where knowledge is lacking, choice should be random
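A minimal Python sketch of the counting behind distracter analysis; the response data and item key are hypothetical. Distracters chosen far less (or far more) often than the others deserve review:

```python
# Count how many test-takers chose each option on each item.
# If distracters are equally attractive, the wrong answers should be
# spread roughly evenly; a rarely chosen distracter is doing no work.
from collections import Counter

responses = {  # item -> option chosen by each test-taker (hypothetical)
    "item1": ["A", "B", "A", "C", "A", "D", "B", "A"],
}
key = {"item1": "A"}  # keyed (correct) option per item

for item, choices in responses.items():
    counts = Counter(choices)
    n = len(choices)
    summary = []
    for option in sorted(set(choices) | {key[item]}):
        flag = " (key)" if option == key[item] else ""
        summary.append(f"{option}={counts[option]} ({counts[option]/n:.0%}){flag}")
    print(f"{item}: " + "  ".join(summary))
```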

33

B. Item Difficulty Measure P

• Difficulty determined by item and population tested

P(i) = (# who got item i correct) / (# taking the test)

34

B. Item Difficulty Measure P

• P = .50 is best

• P = 0 or P = 1: such items do not distinguish ability levels
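A minimal Python sketch of computing P(i) from a 0/1 scoring matrix (rows = test-takers, columns = items); the data are hypothetical:

```python
# Item difficulty P(i) is just the proportion of test-takers who
# answered item i correctly, computed per column of the score matrix.
import numpy as np

scores = np.array([   # 1 = correct, 0 = incorrect (hypothetical data)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
])

p = scores.mean(axis=0)  # P(i) = # correct / # taking the test
for i, p_i in enumerate(p, start=1):
    note = "  <- does not distinguish ability levels" if p_i in (0.0, 1.0) else ""
    print(f"item {i}: P = {p_i:.2f}{note}")
```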

35

C. Item Discrimination Measures

• Discrimination index D

• Item-total correlation

36

Discrimination Index D

• Extreme groups method

U = # getting item correct in ‘top’ group
L = # getting item correct in ‘bottom’ group
nU = # in top group
nL = # in bottom group

D = U/nU – L/nL
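A minimal Python sketch of the extreme-groups calculation. Defining the ‘top’ and ‘bottom’ groups as the upper and lower 27% of total scores is a common convention, assumed here because the slides don’t specify a cutoff:

```python
# Discrimination index D = U/nU - L/nL, computed per item using
# top and bottom groups formed from total test scores.
import numpy as np

scores = np.array([  # rows = test-takers, columns = items (hypothetical)
    [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1],
    [0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0],
])

total = scores.sum(axis=1)
cut = max(1, int(round(0.27 * len(total))))   # 27% rule (an assumption)
order = np.argsort(total)
bottom, top = order[:cut], order[-cut:]

U = scores[top].sum(axis=0)      # correct in top group, per item
L = scores[bottom].sum(axis=0)   # correct in bottom group, per item
D = U / len(top) - L / len(bottom)
print("D per item:", np.round(D, 2))
```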

37

Item-Total Correlation

• Good item: high correlation. People who get the item correct have high scores on the test; people who get the item wrong have low scores on the test

• Poor item: low correlation; look at the wording – the item may be testing reading skill
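A minimal Python sketch of an item-total correlation. It uses the corrected form, correlating each item with the total of the *other* items so an item is not correlated with itself; that refinement is an assumption beyond what the slide states:

```python
# Corrected item-total correlation: for each item, correlate its 0/1
# scores with the rest-of-test total (total minus that item).
import numpy as np

scores = np.array([  # rows = test-takers, columns = items (hypothetical)
    [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1],
    [0, 0, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0],
])

total = scores.sum(axis=1)
for i in range(scores.shape[1]):
    rest = total - scores[:, i]                    # total excluding item i
    r = np.corrcoef(scores[:, i], rest)[0, 1]
    print(f"item {i + 1}: item-total r = {r:.2f}")
```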

38

5. Item Response Theory

A. Item characteristic curves

B. Adaptive testing using computers

39

A. Item characteristic curves

• Most important idea: Item Characteristic Curves (ICCs)

• One curve for each test item

• X axis: test-taker ability (given by test score)

• Y axis: probability of choosing an answer

[Figure: item characteristic curves for Items 1–3 – x axis: test score; y axis: probability of correct response]

41

A. Item Characteristic Curves

• Slope: how quickly the curve rises.

• indicates how well item discriminates among persons of differing abilities

• analogous to the discrimination index D in Classical Test Theory

• but sample-invariant
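A minimal sketch of an ICC under the two-parameter logistic (2PL) model, P(correct | θ) = 1 / (1 + e^(−a(θ − b))), where a is the slope (discrimination) and b the difficulty. The slides don’t name a specific IRT model, so the 2PL form and the parameter values here are assumptions:

```python
# ICC under the 2PL model: the steeper item (larger a) rises faster
# around its difficulty b, so it discriminates better near that ability.
import math

def icc(theta, a, b):
    """Probability of a correct response at ability level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in [-2, -1, 0, 1, 2]:
    print(f"theta={theta:+d}: easy item {icc(theta, 1.0, -1.0):.2f}, "
          f"steep item {icc(theta, 2.0, 0.0):.2f}")
```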

42

Problems with Item Response Theory

• Obtaining stable estimates of IRT parameters requires rather large samples

• Computationally complex

• IRT model assumes that the trait being measured is one-dimensional. It may not be.

43

B. Adaptive Testing Using Computers

• computer selects harder or easier questions as test-taker gets each question right or wrong

• lets you tailor questions for each test-taker

• test-taker does not spend most of their time with questions that are too easy or too difficult
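A minimal Python sketch of the select-harder-or-easier loop just described. The item bank, the shrinking step-size update, and the simulated answers are all illustrative assumptions, not a production adaptive-testing algorithm:

```python
# Toy adaptive test: present the item whose difficulty is closest to the
# current ability estimate, then move the estimate up after a correct
# answer and down after a wrong one, by a shrinking step.
import random

bank = {f"q{i}": b for i, b in enumerate([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])}
ability = 0.0   # current ability estimate
step = 1.0      # shrinks each round so the estimate settles

for _ in range(5):
    item = min(bank, key=lambda q: abs(bank[q] - ability))  # closest difficulty
    correct = random.random() < 0.5        # stand-in for the real response
    ability += step if correct else -step  # harder next if right, easier if wrong
    step *= 0.6
    del bank[item]                         # don't repeat items
print(f"final ability estimate: {ability:+.2f}")
```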

44

B. Adaptive Testing Using Computers

• Facilitates testing of diverse ability groups

• Output = the level of difficulty the test-taker can deal with