Applying the principles of item and test analysis
Copyright © 1995-2012 Questionmark Corporation and/or Questionmark Computing Limited, known collectively as Questionmark. All rights reserved. Questionmark is a registered trademark of Questionmark Computing Limited. All other trademarks are acknowledged.
2012 Users Conference New Orleans March 20 - 23
Applying the Principles of Item and Test Analysis
Define various Classical Test Theory item and test statistics
Identify poorly performing items and tests using item and test analysis reports
Use item analysis reports to guide item revisions to improve your assessments
Session objectives
Slide 2
A global network of professional services firms. 150,000 staff members across the network. 35,000 are in the USA.
The development of our people is a top priority. PwC is in Training Magazine’s Hall of Fame and is the only company to have been awarded #1 in the Top 125 for 3 years in a row.
We have a highly mobile and virtual workforce.
Slide 3
Introduction to Test Theory
Slide 4
Mathematical concepts to help answer your questions about assessment quality:
Does the test measure one thing or multiple things?
How should the test be scored?
How precisely does the assessment measure the knowledge?
Are any items influenced by factors other than what you are trying to measure (a.k.a. “bias” or "irrelevant variance")?
Can we use alternative, equivalent items to test the same thing?
What is test theory?
Slide 5
Measurement is the assignment of numbers to an attribute according to a rule.
Most physical measurement scales we take for granted: Temperature Weight Volume
Assessments are measurement scales!
Slide 6
Assessment scales have a limited range. They have what is called a Floor (0%) & Ceiling (100%)
Assessments have a limit
Slide 7
[Figure: a knowledge continuum running from No Knowledge / New Hire / Novice to Expert Knowledge / Experienced Hire / Master, with the test's 0% to 100% range marked on it.]
Test questions measure only part of the possible range of knowledge.
Different metrics can be used for the same thing. We assign the meaning to the numbers. Think of the differences between Celsius and Fahrenheit.
Assessments can use different scales
Slide 8
Result             Number Correct   Square   Square Root   Log-odds Scale
Passed all items   10               100      3.16          ?
Passed 9 items     9                81       3.00          2.20
Passed 8 items     8                64       2.83          1.39
Passed 7 items     7                49       2.65          0.85
Passed 6 items     6                36       2.45          0.41
Passed 5 items     5                25       2.24          0
Passed 4 items     4                16       2.00          -0.41
Passed 3 items     3                9        1.73          -0.85
Passed 2 items     2                4        1.41          -1.39
Passed 1 item      1                1        1.00          -2.20
Failed all items   0                0        0.00          ?
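The scale table above can be reproduced with a few lines of Python. This is a minimal sketch: it assumes a 10-item test and that the log-odds column is the natural log of p / (1 - p), where p is the proportion correct. The function name `scale_row` is made up for illustration.

```python
import math

def scale_row(correct, total=10):
    """Return (number correct, square, square root, log-odds) for one raw score."""
    square = correct ** 2
    root = round(math.sqrt(correct), 2)
    if correct in (0, total):
        # ln(p / (1 - p)) is undefined at 0% and 100%, hence the "?" in the table
        log_odds = None
    else:
        p = correct / total
        log_odds = round(math.log(p / (1 - p)), 2)
    return correct, square, root, log_odds

# Print the rows from "passed all items" down to "failed all items"
for n in range(10, -1, -1):
    print(scale_row(n))
```

The same raw score gets a different number on each scale; only the rule that assigns the numbers changes.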
A norm-referenced assessment measures people against a defined population. How do you measure up against other people?
A criterion-referenced assessment measures people against a defined domain of knowledge. Have you mastered the material in a domain of knowledge?
Is your assessment a norm-referenced scale OR a criterion-referenced scale?
Slide 9
[Figure: two score distributions on a 0% to 100% axis: a normal distribution of scores and a negatively skewed distribution of scores.]
Based on the assumption that all the test items measure the same concept.
Concerned mainly with the overall test score. Used to improve the quality of the test score.
Classical Test Theory is…
Slide 10
Observed test score = true score + error
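In the usual Classical Test Theory notation, the observed score is the true score plus error, and reliability is the share of observed-score variance that comes from true scores. The symbols below (X, T, E and the variance terms) are the conventional ones, not taken from the slides, and the second identity assumes that true score and error are uncorrelated.

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2},
\qquad \text{where } \sigma_X^2 = \sigma_T^2 + \sigma_E^2 .
```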
Reliability: Are test scores consistent across factors which should not influence the score (time, versions, environment, etc.)? Are test items all measuring the same thing?
Validity: Are you measuring what you intended to measure? Is the test score being used appropriately?
Reliability & Validity
Slide 11
We will be addressing mainly reliability
In general:
  There are many of them.
  They usually take the form of a correlation, with a range from 0 to 1.
  Closer to 1 is better.
  Below .5 is unacceptable.
Some measure consistency across different factors:
  Test-retest reliability
  Alternative forms reliability
Others measure internal consistency (are you measuring a single topic well?):
  Split-half reliability = split the test in half and correlate the 2 half scores.
  Cronbach's alpha (α) = in effect, the average over all possible split-half combinations.
  Most appropriate for norm-referenced tests.
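The two internal-consistency measures above can be sketched in plain Python. This is an illustration under stated assumptions, not the calculation any particular reporting tool performs: the response matrix is made-up data (1 = correct, 0 = incorrect), the split is odd-numbered versus even-numbered items, and alpha uses the standard variance formula k / (k - 1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(scores):
    """Correlate odd-item and even-item half scores (one of many possible splits)."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    return pearson(odd, even)

def cronbach_alpha(scores):
    """Cronbach's alpha from a rows-by-items matrix of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 6 examinees (rows) by 4 items (columns)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print(round(split_half(responses), 3))
print(round(cronbach_alpha(responses), 3))
```

A split-half value depends on which split you pick, which is one reason alpha is usually the figure that gets reported.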
Reliability Measures
Slide 12
Test Analysis
Slide 13
Do you have enough people in your sample to have confidence in the statistics? You can get by with 25-30 people for pilot testing in low-moderate stakes assessments. Ideally you want 100 people for a solid analysis.
Do you have the expected distribution of scores for your testing program?
  Norm-referenced = normal distribution
  Criterion-referenced = negatively skewed distribution
Test Analysis: First examine your sample
Slide 14
Several statistics will help you understand your distribution:
Mean – The mathematical average of all scores. The mean can be misleading if your distribution is skewed.
Median – The middle value. You should use the median when you have a skewed distribution.
Mode – The most common value in the distribution.
Skew – Tells you how symmetrically scores are distributed around the mean.
  Negative skew – a long tail of low scores; more values are higher than the mean
  Positive skew – a long tail of high scores; more values are lower than the mean
  Zero skew – scores are evenly distributed around the mean
Kurtosis – A measure of the “peakedness” of the distribution
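The distribution statistics above can be computed with the Python standard library. This is a minimal sketch on made-up scores. It uses simple population-moment formulas for skew and excess kurtosis (a normal distribution scores 0 on both); reporting tools may apply sample-size corrections, so their figures can differ slightly.

```python
from statistics import mean, median, mode, pstdev

def skewness(scores):
    """Third standardized moment: negative when there is a long tail of low scores."""
    m, s = mean(scores), pstdev(scores)
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)

def excess_kurtosis(scores):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    m, s = mean(scores), pstdev(scores)
    return sum(((x - m) / s) ** 4 for x in scores) / len(scores) - 3

# Hypothetical raw scores with a few low outliers (a negatively skewed shape)
scores = [12, 18, 20, 21, 22, 22, 22, 23, 24, 25]

print("mean", mean(scores))
print("median", median(scores))
print("mode", mode(scores))
print("skew", round(skewness(scores), 3))
print("kurtosis", round(excess_kurtosis(scores), 3))
```

In this example the low outliers pull the mean below the median, which is the pattern to expect with negative skew.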
Test Analysis: Score distribution
Slide 15
Test Analysis: Histogram
Slide 16
If you don’t like all those numbers just look at the histogram!
Statistic             Value
Number of Examinees   193
Mean                  21
Median                22
Mode                  22
Skew                  -0.916
Kurtosis              2.246
Test Analysis: The Numbers
Slide 17
These statistics correspond to the previous slide histogram.
Next check your floor and ceiling to ensure the test is targeted to the population’s ability.
Is your test too hard? What was the minimum score? For a multiple choice test, how many people are scoring around 25% (roughly the chance level on four-option items)?
Is your test too easy? What was the maximum score? How many people are scoring 95% or higher?
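The floor and ceiling questions above can be turned into a quick check. This is a sketch only: the function name, the 5-point margin used to define "around 25%", and the sample scores are all assumptions made for illustration.

```python
def floor_ceiling_check(percent_scores, chance=25.0, ceiling=95.0, margin=5.0):
    """Summarize how many examinees sit near the chance level or at the ceiling.

    chance:  expected score from guessing (25% assumes four-option items)
    margin:  how far above chance still counts as "around" chance (an assumption)
    """
    n = len(percent_scores)
    near_chance = sum(1 for s in percent_scores if s <= chance + margin)
    at_ceiling = sum(1 for s in percent_scores if s >= ceiling)
    return {
        "min": min(percent_scores),
        "max": max(percent_scores),
        "share_near_chance": near_chance / n,
        "share_at_ceiling": at_ceiling / n,
    }

# Hypothetical percentage scores for 10 examinees
sample = [20, 30, 50, 75, 80, 95, 100, 60, 40, 90]
print(floor_ceiling_check(sample))
```

A large share near chance suggests the test is too hard for this population; a large share at the ceiling suggests it is too easy.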
Test Analysis: Overall Difficulty
Slide 18
Shows the average amount of measurement error around a test score.
Use to create confidence intervals around a test score. The true score is likely to fall within the range. SEM is best used with norm-referenced tests.
Test Analysis: Standard Error of Measurement (SEM)
Slide 19
SEM = 2: for an observed score of 14, you can be 68% confident that the true score is between 12 and 16.
[Figure: score scale from 1 to 20.]
Passing score = 14
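The SEM band can be sketched as follows. The slides do not give the formula, so this assumes the standard Classical Test Theory one, SEM = SD * sqrt(1 - reliability), and a band of plus or minus z SEMs around the observed score (z = 1 for roughly 68%, z = 1.96 for roughly 95%). The function names are made up for illustration.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement from the score SD and a reliability estimate."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(observed, sem_value, z=1.0):
    """Band of +/- z SEMs around an observed score (z=1 is about 68%)."""
    return observed - z * sem_value, observed + z * sem_value

# The slide's example: SEM = 2 around a score of 14
print(confidence_band(14, 2))

# SD 9.53% with alpha 0.796, the figures in the "good" test table later in the deck
print(round(sem(9.53, 0.796), 2))
```

When the band around a score straddles the passing score, as it does here, the pass/fail decision for that examinee is less certain than the single number suggests.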
Cronbach’s Alpha (α)   Internal consistency
α ≥ .9                 Excellent
.9 > α ≥ .8            Good
.8 > α ≥ .7            Acceptable
.7 > α ≥ .6            Questionable
.6 > α ≥ .5            Poor
α < .5                 Unacceptable
Cronbach’s Alpha is the most popular reliability measure.
A very high reliability may indicate you have redundant items.
Topic or sub-scores should also be reliable.
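If you review many tests or topic sub-scores at once, the alpha bands above are easy to apply in code. A small sketch; the function name is made up, and the cut-offs are the ones from the table.

```python
def interpret_alpha(alpha):
    """Map a Cronbach's alpha value to the rule-of-thumb label from the table."""
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Good"
    if alpha >= 0.7:
        return "Acceptable"
    if alpha >= 0.6:
        return "Questionable"
    if alpha >= 0.5:
        return "Poor"
    return "Unacceptable"

for value in (0.93, 0.796, 0.55, -0.704):
    print(value, interpret_alpha(value))
```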
Test Analysis: Internal Consistency
Slide 20
Have a consistent “controlled” testing environment & provide clear instructions.
Have a large group of examinees with a broad range of ability. If everyone is of equal ability, they will score about the same on the test and the reliability index will be low.
Use objectively scored test items (multiple choice items). Items such as essays, which require scoring by a teacher, tend to have lower reliability due to the additional error introduced by teacher judgment.
Improving Reliability
Slide 21
Increase the test length. In general, more items means less error for the test (Google “Spearman-Brown formula”).
Use only “quality” items. Develop items using best practices to reduce “lucky guessing”. Delete or edit items which are too easy, too difficult, or otherwise do not help to differentiate those with knowledge.
See Developing and Validating Multiple-Choice Test Items by Thomas M. Haladyna: http://books.google.com/books?id=kna46TApW14C&lpg=PP1&dq=Multiple%20choice&pg=PP1#v=twopage&q&f=false
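The Spearman-Brown formula mentioned above predicts the reliability of a lengthened (or shortened) test: new reliability = n * r / (1 + (n - 1) * r), where r is the current reliability and n is the factor by which the test length changes. The sketch below assumes the added items are of similar quality to the existing ones, which is the formula's own assumption; the function names are made up for illustration.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def length_factor_needed(reliability, target):
    """How many times longer the test must be to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))

# Doubling a test with reliability 0.796
print(round(spearman_brown(0.796, 2), 3))

# How much longer it would need to be to reach 0.9
print(round(length_factor_needed(0.796, 0.9), 2))
```

The gain from each added item shrinks as the test grows, so lengthening helps most when the test is short.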
Improving Reliability
Slide 22
Statistic Value
Number of examinees 90
Minimum achieved score 48%
Maximum achieved score 95%
Test reliability (Cronbach's Alpha) 0.796
Mean 80.72%
Median 81.25%
Mode 81.25%
Standard deviation 9.53%
Standard error of measurement 4.31%
Skew -0.886
Kurtosis 0.939
Does this look like a “good” test?
Slide 23
Statistic Value
Number of examinees 193
Minimum achieved score 10%
Maximum achieved score 31%
Test reliability (Cronbach's Alpha) -0.704
Mean 24.39%
Median 25.29%
Mode 25.29%
Standard deviation 3.18%
Standard error of measurement 4.16%
Skew -0.916
Kurtosis 2.246
Does this look like a “bad” test?
Slide 24
This test used an item banking approach with random selection of items. Beware – only use basic test analysis reports when all examinees get exactly the same items.
Item Analysis
Slide 25
Stem – The actual question being asked
Options – All choices
Key – Correct answer
Distractors – The wrong answer choices
Feedback – Explanatory remediation as to why that particular answer choice is incorrect.
Anatomy of an item
Stem: What Code section deals with the taxable gain to a corporation when a corporation distributes property to a shareholder?
A. 301 (distractor) Feedback: Incorrect. This section deals with the character of the amount received by a shareholder from a corporation.
B. 311 (key) Feedback: Correct.
C. 312 (distractor) Feedback: Incorrect. This section deals with the effect on earnings and profits of a transaction.
D. 334 (distractor) Feedback: Incorrect. This section deals with the basis of property received in a liquidation.
Number of examinees: The number of examinees answering the question. Aim to get at least 30 responses; 100 is ideal.
P-Value: The percentage of examinees who chose the correct answer.
  For norm-referenced tests you'll want a wide range of P-values.
  For criterion-referenced tests you'll want more P-values around your passing score.
Distractor Percentage: The percentage of examinees who chose a wrong answer.
  If 0% - 5% of examinees choose a distractor, consider replacing it with a more attractive distractor.
  All wrong answers should be common mistakes a novice would make.
Item-Total Correlation OR Discrimination: A correlation between picking an option and the total score on the test.
  The theory is that if you get the question right, you should be scoring higher on the test than those who got the question wrong.
  The correct answer should have a strong positive correlation with the total test score (if that question is measuring the same thing as the other questions on the test).
  Item-total correlation is influenced by P-value, so expect lower values on very hard or very easy items. A P-value of 0 or 1 (0% or 100%) will always give you a zero correlation.
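The item statistics defined above can be computed per answer option. This is a minimal sketch on made-up data, not the calculation any particular report performs: it correlates a 0/1 "picked this option" indicator with the total test score (the total here includes the item itself), and it returns 0.0 when an option was picked by nobody or by everybody, matching the zero-correlation convention described above.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def item_analysis(choices, key, total_scores, options="ABCD"):
    """Count, percentage and option-total correlation for each answer option."""
    n = len(choices)
    report = {}
    for opt in options:
        picked = [1 if c == opt else 0 for c in choices]
        report[opt] = {
            "count": sum(picked),
            "percent": 100 * sum(picked) / n,  # P-value for the key, distractor % otherwise
            "correlation": pearson(picked, total_scores),
            "is_key": opt == key,
        }
    return report

# Hypothetical data: each examinee's chosen option and total test score
choices = ["D", "D", "A", "D", "B", "D", "C", "D"]
totals = [18, 17, 9, 16, 8, 19, 10, 15]

for opt, stats in item_analysis(choices, "D", totals).items():
    print(opt, stats)
```

In this example the key (D) correlates positively with the total score and each distractor correlates negatively, which is the pattern a well-behaved item should show.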
Item statistics: Definitions
Slide 27
Items too hard or too easy?
P-Value:
  < 50%        Very Difficult
  51% - 64%    Difficult
  65% - 75%    Good
  76% - 94%    Easy
  95% - 100%   Very Easy
Items tricky or confusing?
Discrimination:
  < .20        Low
  .20 - .30    Moderate
  > .30        Good
Item Statistics: Some Guidelines
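These guidelines can be applied to a whole item bank with two small functions. A sketch under one stated assumption: the published bands leave small gaps (for example, exactly 50%, or values between 64% and 65%), so each band's lower bound is treated here as the cut-off. The function names are made up for illustration.

```python
def difficulty_label(p_value_percent):
    """Label an item's P-value (percent correct) using the guideline bands."""
    if p_value_percent >= 95:
        return "Very Easy"
    if p_value_percent >= 76:
        return "Easy"
    if p_value_percent >= 65:
        return "Good"
    if p_value_percent >= 51:
        return "Difficult"
    return "Very Difficult"

def discrimination_label(item_total_correlation):
    """Label the key's item-total correlation using the guideline bands."""
    if item_total_correlation > 0.30:
        return "Good"
    if item_total_correlation >= 0.20:
        return "Moderate"
    return "Low"

# Two made-up items: (P-value %, item-total correlation of the key)
for p, r in [(100, 0.00), (83, 0.46)]:
    print(p, r, difficulty_label(p), discrimination_label(r))
```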
Slide 28
Example #1 - What would you do with this item?
Slide 29
Stat                     A      B      C      D      Total
# Examinees              0      0      0      21     21
P-Value/Distractor %     0%     0%     0%     100%   100%
Item-total Correlation   0.00   0.00   0.00   0.00
Correct answer: D
Example #1 – Too Easy
Slide 30
Which of the following most accurately presents the reconciliation of Partners' Capital found in Schedule M-2 of Form 1065?
A
  Original: Beginning Capital Accounts minus Distributions equals ending capital accounts.
  Revised: BOY Capital plus Guaranteed Payments plus/minus CY Net Income minus Distributions equals EOY Capital.
B
  Original: Beginning Equity minus Distributions plus stock buy-backs equals ending equity.
  Revised: BOY Capital plus Distributions plus/minus CY Net Income equals EOY Capital.
C
  Original: Beginning Retained Earnings plus current year income minus dividends equals ending retained earnings.
  Revised: BOY Capital minus Capital Contributions plus/minus CY Income equals EOY Capital.
D
  Original: Beginning Capital plus current year net income plus capital contributions minus distributions equals ending capital.
  Revised: BOY Capital plus current year net income plus Capital Contributions minus Distributions equals EOY Capital.
Example #1 - Revised
Slide 31
Stat                     A       B       C       D      Total
# Examinees              6       5       7       85     103
P-Value/Distractor %     6%      5%      7%      83%    100%
Item-total Correlation   -0.32   -0.27   -0.17   0.46
Correct answer: D
Example # 2 - What could be wrong with this item?
Slide 32
Stat                     A       B      C       D      Total
# Examinees              16      1      5       1      23
P-Value/Distractor %     70%     4%     22%     4%     100%
Item-total Correlation   -0.04   0.00   -0.02   0.14
Correct answer
Example #2 – Confusing the Learner
Slide 33
Which of the following is the best definition of a Book Tax Return?
A
  Original: A business tax return that reflects taxable income according to financial statement rules instead of tax rules.
  Revised: A return that reflects taxable income according to financial statement rules instead of tax rules.
B
  Original: A business return prepared using federal tax rules in calculating taxable income.
  Revised: A return prepared using federal tax rules in calculating taxable income.
C
  Original: A business return prepared using tax rules to calculate taxable income but reflects a book basis balance sheet.
  Revised: A return prepared using tax rules to calculate taxable income and a GAAP balance sheet.
D
  Original: A business return prepared using international accounting standards.
  Revised: A return prepared using international accounting standards.
Example # 2 - Revised
Slide 34
Stat                     A      B       C       D      Total
# Examinees              60     10      34      1      105
P-Value/Distractor %     57%    10%     32%     1%     100%
Item-total Correlation   0.39   -0.27   -0.24   0.00
Correct answer: A
Example #3 - What could be wrong with this item?
Slide 35
Stat                     A      B      C       D      Total
# Examinees              18     2      3       0      23
P-Value/Distractor %     78%    9%     13%     0%     100%
Item-total Correlation   0.03   0.13   -0.14   0.00
Correct answer
Example #3 – Bad Item Format
Slide 36
Original question: Dalton Enterprises, Inc sold investment assets this year. Will Form 4797 be required?
Revised question: Dalton Enterprises, Inc sold investment assets this year. Will Form 4797 or Schedule D be required?
A
  Original: No, because the assets were capital assets and reported on Schedule D
  Revised: Because the assets were capital assets, Schedule D will be filed.
B
  Original: No, because the assets qualified as "involuntary conversion" instead of a sale
  Revised: If the assets were sold at a loss and used in the business Schedule D will be filed
C
  Original: Yes, because all asset sales must be recorded on this form
  Revised: If the assets were sold at a gain Form 4797 will be filed.
D
  Original: Yes, because the sale was a tax free sale
  Revised: Since investment assets are tax exempt Form 4797 is filed
Example #3 - Revised
Slide 37
Stat                     A      B       C       D       Total
# Examinees              86     5       4       9       104
P-Value/Distractor %     83%    5%      4%      9%      100%
Item-total Correlation   0.42   -0.15   -0.38   -0.19
Correct answer: A
National Council on Measurement in Education – free articles: http://ncme.org/publications/items/
  Understanding Reliability: http://ncme.org/linkservid/65F3B451-1320-5CAE-6E5A1C4257CFDA23/showMeta/0/
  Standard error of measurement: http://ncme.org/linkservid/6606715E-1320-5CAE-6E9DDC581EE47F88/showMeta/0/
Practical Assessment, Research & Evaluation: http://pareonline.net/
  Writing Multiple Choice Items: http://pareonline.net/getvn.asp?v=4&n=9
  Basic Item Analysis: http://pareonline.net/getvn.asp?v=4&n=10
Additional Resources
Slide 38
[email protected] 973-236-4327 You can find me on LinkedIn
Contact Information
Slide 39