Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and...

28
Scoring & Scoring & Decision Making Decision Making DeShon - 2005 DeShon - 2005

Transcript of Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and...

Page 1: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Scoring & Scoring & Decision MakingDecision Making

DeShon - 2005DeShon - 2005

Page 2: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Scoring OverviewScoring Overview

Once you have administered the test and Once you have administered the test and cleaned the data…cleaned the data…

What number is used to represent the What number is used to represent the person on the latent variable of interest?person on the latent variable of interest? What’s the right answerWhat’s the right answer

Empirical vs. Rational KeyingEmpirical vs. Rational Keying Summing responsesSumming responses

Number correct, number endorsed, number checkedNumber correct, number endorsed, number checked Weighted summing, non-unique summingWeighted summing, non-unique summing

Corrections for artifacts (lie scales)Corrections for artifacts (lie scales) Forming Composites of subscalesForming Composites of subscales

Page 3: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Correct/Incorrect MeasuresCorrect/Incorrect Measures

How do you determine correct?How do you determine correct? Rational KeyingRational Keying

Experts agree on the right answerExperts agree on the right answer Find right answers in authoritative texts on Find right answers in authoritative texts on

the topicthe topic Empirical KeyingEmpirical Keying

Compare correlation of item response Compare correlation of item response alternatives to a criterion of interestalternatives to a criterion of interest

Compare existing groups and find items that Compare existing groups and find items that discriminate between the groups - discriminate between the groups - Discriminant-groups validity modelDiscriminant-groups validity model

Page 4: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Correct/Incorrect MeasuresCorrect/Incorrect Measures

Scoring algorithms for scoring items Scoring algorithms for scoring items and constructing scales from item and constructing scales from item responses are often not disclosedresponses are often not disclosed Why?Why? Item scores matterItem scores matter Scale construct routines are largely Scale construct routines are largely

irrelevant unless you must base your irrelevant unless you must base your interpretation on existing normsinterpretation on existing norms

Page 5: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Interpretation and Decision MakingInterpretation and Decision Making

Once you have scores, how do you Once you have scores, how do you interpret test scores and use them interpret test scores and use them for decision making?for decision making? Ranking/Top-Down decision makingRanking/Top-Down decision making Banding Banding Cut scoresCut scores NormsNorms

Z-scores, T-scores, percentilesZ-scores, T-scores, percentiles

Page 6: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Interpretation and Decision MakingInterpretation and Decision Making

Top-Down/Ranking is very commonTop-Down/Ranking is very common Decisions based on relative standing in Decisions based on relative standing in

the distribution of test scoresthe distribution of test scores Higher scores mean more of the traitHigher scores mean more of the trait

Hard to demonstrate that higher Hard to demonstrate that higher scores mean higher standing on the scores mean higher standing on the latent trait if there is much error in latent trait if there is much error in the scoresthe scores

Page 7: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

BandingBanding

Set up ranges of the test scores that Set up ranges of the test scores that are distinguishable based on the are distinguishable based on the standard error of the differencestandard error of the difference

Then select candidates at random or Then select candidates at random or using some other criterion (senority) using some other criterion (senority) within the bandwithin the band Fixed bandsFixed bands Sliding bandsSliding bands

Page 8: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Criterion-Referenced MeasuresCriterion-Referenced Measures

Develop a cutoff and the meaning of scores Develop a cutoff and the meaning of scores is based on standing relative to the cut is based on standing relative to the cut scorescore Pass/failPass/fail

Usually used for knowledge and Usually used for knowledge and achievement testsachievement tests

Many methods available for computing cut Many methods available for computing cut scoresscores EbelEbel NedelskyNedelsky AngoffAngoff

Page 9: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Angoff MethodAngoff Method

Subject Matter Experts (SMEs) evaluate all items and estimate the probability that a minimally qualified person would get the item right

The average of the item scores is the The average of the item scores is the cut cut scorescore for the exam. for the exam.

A bit more complex than this….A bit more complex than this….

Page 10: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

NormsNorms

Raw scores of psychological tests usually have Raw scores of psychological tests usually have little inherent meaninglittle inherent meaning

For normative tests, meaning is derived by For normative tests, meaning is derived by comparing scores to other individuals (e.g., other comparing scores to other individuals (e.g., other members of a sample or a normative sample)members of a sample or a normative sample) Percentiles Percentiles Z scoresZ scores T scoresT scores

Representativeness of the norming sample is Representativeness of the norming sample is crucial!crucial!

Page 11: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Norms - PercentilesNorms - Percentiles

Percentile: relative position in the sample Percentile: relative position in the sample or reference groupor reference group

Percentile rank: percentage of people that Percentile rank: percentage of people that earned a raw score lower than the given earned a raw score lower than the given scorescore Percentage of Percentage of personspersons, not items, not items Example: GRE scoresExample: GRE scores

Page 12: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Norms – Z scoresNorms – Z scores

Expresses distance of score from the Expresses distance of score from the mean in SD unitsmean in SD units

Advantages of standard scoresAdvantages of standard scores Includes information about the person’s Includes information about the person’s

standing in the distribution (ie., percentile standing in the distribution (ie., percentile rank)rank)

Allows comparisons across tests that have Allows comparisons across tests that have different raw metricsdifferent raw metrics

Page 13: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Norms: Norms: T scoresT scores

T scores are linear transformations of Z T scores are linear transformations of Z scores scores T score = (Z score * 10) + 50T score = (Z score * 10) + 50 Mean = 50, SD = 10Mean = 50, SD = 10

If normal T-scores will be between 20 and If normal T-scores will be between 20 and 8080

Why? Easier for lay audiences to interpretWhy? Easier for lay audiences to interpret For Z scores, half the scores are negative.For Z scores, half the scores are negative.

Page 14: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Comparison of NormsComparison of Norms

Z score T score Percentile rank

3 80 99.9

2 70 97.5

1 60 84

0 50 50

-1 40 16

-2 30 2.5

-3 20 .1

Page 15: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI -2Example: MMPI -2

Designed for routine diagnostic assessmentsDesigned for routine diagnostic assessments Most frequently used personality test in the US for Most frequently used personality test in the US for

adults and adolescentsadults and adolescents Empirical keying approachEmpirical keying approach 567 true/false items567 true/false items

10 clinical scales plus validity scales10 clinical scales plus validity scales Original NormsOriginal Norms

724 Minnesota”normals” and 221 psychiatric 724 Minnesota”normals” and 221 psychiatric patientspatients

Revised NormsRevised Norms 2600 U.S. residents aged 18-90 (census derived)2600 U.S. residents aged 18-90 (census derived)

Page 16: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Empirical/Criterion keyingEmpirical/Criterion keying Identify a criterion group (e.g., people diagnosed with Identify a criterion group (e.g., people diagnosed with

schizophrenia)schizophrenia) Identify a comparison group (e.g., persons with no Identify a comparison group (e.g., persons with no

mental illness)mental illness) Administer many, many test items to both groupsAdminister many, many test items to both groups Identify a group of items that discriminates the two Identify a group of items that discriminates the two

groups, i.e., items endorsed more frequently by the groups, i.e., items endorsed more frequently by the criterion groupcriterion group

This group of items becomes the schizophrenia scaleThis group of items becomes the schizophrenia scale

Page 17: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Resulting scales are a “mixed bag” of Resulting scales are a “mixed bag” of items with generally undesirable items with generally undesirable measurement propertiesmeasurement properties Scales have heterogeneous item contentScales have heterogeneous item content Often multi-dimensionalOften multi-dimensional Item overlap across scalesItem overlap across scales Adds to complexity of interpretationAdds to complexity of interpretation

But still appears to have practical useBut still appears to have practical use

Page 18: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI -2Example: MMPI -2

Administered individually or in Administered individually or in groupsgroups

Administration time is approximately Administration time is approximately 1 to 1.5 hours1 to 1.5 hours

Scored by hand or computerScored by hand or computer Separate scoring keys by genderSeparate scoring keys by gender

Page 19: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Validity ScalesValidity Scales ? Scale (Cannot say)? Scale (Cannot say)

number of items left unanswerednumber of items left unanswered If 30 or more items are left unanswered the protocol is If 30 or more items are left unanswered the protocol is

invalidinvalid F scale (Infrequency)F scale (Infrequency)

66 items66 items atypical or deviant response styleatypical or deviant response style endorsed by less than 10% of the populationendorsed by less than 10% of the population general indicator of pathology or “faking bad.”general indicator of pathology or “faking bad.” Extreme elevations indicate invalid profile (100 or Extreme elevations indicate invalid profile (100 or

higher)higher) No exact cutoff for suspecting an invalid profileNo exact cutoff for suspecting an invalid profile

Page 20: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Example Items for F-scaleExample Items for F-scale My father is a good man. (F)My father is a good man. (F) My teachers have it in for me. (T)My teachers have it in for me. (T) I am troubled by attacks of nausea and vomiting. (T)I am troubled by attacks of nausea and vomiting. (T) Evil spirits posses me at times. (T)Evil spirits posses me at times. (T) My parents do not really love me. (T)My parents do not really love me. (T) I am liked by most people who know me. (F)I am liked by most people who know me. (F) There is something wrong with my mind. (T)There is something wrong with my mind. (T) I think school is a waste of time. (T)I think school is a waste of time. (T) I get anxious and upset when I have to make a short trip I get anxious and upset when I have to make a short trip

away from home. (T)away from home. (T) I have gotten many beatings. (T)I have gotten many beatings. (T)

Page 21: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Validity ScalesValidity Scales Lie (L) Scale (15 items)Lie (L) Scale (15 items)

extent to which client is “faking good” or extent to which client is “faking good” or describing self in an overly positive mannerdescribing self in an overly positive manner

Uneducated, lower SES will score higherUneducated, lower SES will score higher Average number of endorsed items is 3Average number of endorsed items is 3 T Scores of 65 or above are suspect and T Scores of 65 or above are suspect and

indicate profile should not be interpretedindicate profile should not be interpreted High scores may lead to lower scores on High scores may lead to lower scores on

clinical scalesclinical scales

Page 22: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Example Items for L scaleExample Items for L scale Once in a while I think about things too bad to Once in a while I think about things too bad to

talk about. talk about. At times I feel like swearing. At times I feel like swearing. I do not always tell the truth. I do not always tell the truth. I do not read every editorial in the newspaper I do not read every editorial in the newspaper

every day. every day. Once in a while I put off tomorrow what I ought Once in a while I put off tomorrow what I ought

to do today.to do today. My table manners are not quite as good at My table manners are not quite as good at

home as when I am out in company.home as when I am out in company.

Page 23: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Validity ScalesValidity Scales K scale (30 Items)K scale (30 Items)

More subtle and sophisticated index of “faking More subtle and sophisticated index of “faking good” or “faking bad”good” or “faking bad”

T scores above 65 or 70 are higher than T scores above 65 or 70 are higher than expectedexpected

Higher scores indicative of ego defensiveness Higher scores indicative of ego defensiveness and guardednessand guardedness

K correction is added to five of the clinical K correction is added to five of the clinical scalesscales

And many more…And many more…

Page 24: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Example Items for the K scaleExample Items for the K scale At times I feel like smashing things. (F)At times I feel like smashing things. (F) I think a great many people many exaggerate their I think a great many people many exaggerate their

misfortunes in order to gain sympathy and help of misfortunes in order to gain sympathy and help of others. (F)others. (F)

It takes a lot of argument to convince most people of the It takes a lot of argument to convince most people of the truth. (F)truth. (F)

I have very few quarrels with members of my family. (T)I have very few quarrels with members of my family. (T) Most people will use somewhat unfair means to get what Most people will use somewhat unfair means to get what

they want. (F)they want. (F) At times my thoughts have raced ahead faster than I At times my thoughts have raced ahead faster than I

could speak them. (F)could speak them. (F) I get mad easily then get over it soon. (F)I get mad easily then get over it soon. (F)

Page 25: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

InterpretationInterpretation Yields individual’s clinical profile compared with the Yields individual’s clinical profile compared with the

normative samplenormative sample Interpretation is configural in nature and not dependent on Interpretation is configural in nature and not dependent on

any one scaleany one scale T-score of 65 or higher is considered a clinically significant elevation for T-score of 65 or higher is considered a clinically significant elevation for

all clinical scalesall clinical scales Clinical scales do not measure the low end; don’t interpret low scores Clinical scales do not measure the low end; don’t interpret low scores

except for Mf & Siexcept for Mf & Si Interpreted by qualified professionalsInterpreted by qualified professionals Welsh CodingWelsh Coding

Record the 10 numbers of the clinical scales in order of T Record the 10 numbers of the clinical scales in order of T scores, from the highest on the left to the lowest on the rightscores, from the highest on the left to the lowest on the right

When adjacent scores are within one T score point, they are When adjacent scores are within one T score point, they are underlined. When they have the same T score they are placed underlined. When they have the same T score they are placed in the ordinal sequence found on the profile sheet and in the ordinal sequence found on the profile sheet and underlinedunderlined

Page 26: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

Page 27: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

30

45

60

75

90

105

120

VR

INF L K 1 H

s2 D

3 Hy

4 Pd5 M

f6 Pa7 Pt8 Sc

9 Ma

0 Si

Page 28: Scoring & Decision Making DeShon - 2005. Scoring Overview Once you have administered the test and cleaned the data… Once you have administered the test.

Example: MMPI-2Example: MMPI-2

30

45

60

75

90

105

120

VRINF L K 1 Hs

2 D 3 Hy4 Pd

5 Mf

6 Pa7 Pt

8 Sc9 M

a0 Si