Assessing Personality 75 Years After Likert: Thurstone Was Right! (And some implications for I/O)

Colleagues

Sasha Chernyshenko
Steve Stark

Thurstone

In a series of papers in the late 1920s, Thurstone asserted “Attitudes Can Be Measured” and provided several methods for their measurement

He assumed that a conscientious person would endorse a statement that reflected his/her attitude… but “as a result of imperfections, obscurities, or irrelevancies in the statement, and inaccuracy or carelessness of the subjects” not everyone will endorse a statement, even when it matches their attitude

Thurstone, Psych Review, 1929

For N1 people with attitude S1, all should endorse a statement with scale value S1 if they were conscientious and the item was perfect; but only n1 actually endorse the item

These people will endorse another statement with scale value S2 with a probability p that is a function of |S1-S2|

Figure from Thurstone’s paper:

Thurstone 1929

Thurstone 1928 Attitudes Can Be Measured

Gave an example of an attitude variable, militarism-pacifism, with six statements representing a range of attitudes:

Thurstone 1928

Thurstone 1928

A pacifist “would be willing to indorse all or most of the opinions in the range d to e and … he would reject as too extremely pacifistic most of the opinions to the left of d, and would also reject the whole range of militaristic opinions.”

“His attitude would then be indicated by the average or mean of the range that he indorses”

Implications

On Thurstone’s pacifism-militarism scale, three people might endorse two items each:
Person 1 endorses f and d, and is very pacifistic
Person 2 endorses e and b, and is neutral
Person 3 endorses c and a, and is very militaristic

Thus, it is crucial to know which items are endorsed!
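
A minimal sketch of this point. The scale values assigned to statements a to f below are hypothetical (the deck does not list them); they are chosen only so that f and d fall at the pacifistic end and c and a at the militaristic end. Counting endorsements gives all three people the same score, while Thurstone's mean of endorsed scale values separates them.

```python
# Hypothetical scale values (negative = pacifistic, positive = militaristic).
# These numbers are illustrative assumptions, not Thurstone's published values.
SCALE_VALUES = {"f": -3.0, "d": -2.0, "e": -1.0, "b": 1.0, "c": 2.0, "a": 3.0}


def thurstone_score(endorsed):
    """Mean scale value of the statements a person endorses."""
    return sum(SCALE_VALUES[s] for s in endorsed) / len(endorsed)


people = {
    "Person 1": ["f", "d"],
    "Person 2": ["e", "b"],
    "Person 3": ["c", "a"],
}

# Number of endorsed items: identical for everyone, so uninformative here.
counts = {name: len(items) for name, items in people.items()}
# Thurstone-style score: depends on WHICH items were endorsed.
scores = {name: thurstone_score(items) for name, items in people.items()}

for name in people:
    print(name, counts[name], scores[name])
```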

Likert 1932

Proposed a much simpler approach: A five-point response scale with options “Strongly Approve”, “Approve”, “Neutral”, “Disapprove”, and “Strongly Disapprove”.

The numerical values 1 to 5 were assigned to the different response options

And an individual’s score was the sum or mean of the numerical scores

Likert 1932

Likert evaluated his scales by
Split-half reliability
Item-total correlations

To make this work, he hit upon the idea of reverse scoring, e.g., statements like d and f from Thurstone needed to be scored in the opposite direction of statements like a and c.
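
A small sketch of summated scoring with reverse keying, assuming the five response options are coded 1 to 5 and that a reverse-keyed response v is recoded as 6 - v. The responses and the choice of which statements are reverse keyed are hypothetical.

```python
def likert_score(responses, reverse_keyed, n_points=5):
    """Sum the 1..n_points responses after recoding the reverse-keyed items."""
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= n_points:
            raise ValueError(f"response for {item!r} is outside 1..{n_points}")
        # Reverse keying maps 1<->5 and 2<->4 on a five-point scale.
        total += (n_points + 1 - value) if item in reverse_keyed else value
    return total


# Hypothetical respondent: agrees with militaristic statements a and c,
# disagrees with pacifistic statements d and f (which are reverse keyed).
responses = {"a": 5, "c": 4, "d": 1, "f": 2}
reverse_keyed = {"d", "f"}
score = likert_score(responses, reverse_keyed)
print(score)  # 5 + 4 + (6 - 1) + (6 - 2) = 18
```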

Likert 1932

When computing item-total correlations, “if a zero or very low correlation coefficient is obtained, it indicates that the statement fails to measure that which the rest of the statements measure.” (p. 48)

“Thus item analysis reveals the satisfactoriness of any statement so far as its inclusion in a given attitude scale is concerned”

Likert 1932

Likert discarded intermediate statements like “Compulsory military training in all countries should be reduced but not eliminated”

Such a statement is “double-barreled and of little value because it does not differentiate persons in terms of their attitudes” (p. 34)

Likert Scaling

Although Likert didn’t articulate a psychometric model for his procedure, his analysis implies what Coombs (1964) called a dominance response process.

Specifically, someone high on the trait or attitude measured by a scale is likely to “Strongly Agree” with a positively worded item and “Strongly Disagree” with a negatively worded item

Example of a Dominance Process

Person endorses item if her standing on the latent trait, theta, is more extreme than that of the item.

[Figure: probability of positive response (0.0 to 1.0) plotted against theta (-3 to 3), with the item and person locations marked]

Thurstone Scaling

Thurstone assumed people endorse items reflecting attitudes close to their own feelings

Coombs (1964) called this an ideal point process

Sometimes called an unfolding model

Example of an Ideal Point Process

Person endorses item if his standing on the latent trait is near that of the item.

“I enjoy chatting quietly with a friend at a cafe.”

Disagree either because:
Too introverted (uncomfortable in public places)
Too extraverted (chatting over coffee is boring)

[Figure: probability of endorsement (0.0 to 1.0) plotted against theta (-3.0 to 3.0), with the item location marked and the regions on either side labeled “Too Introverted” and “Too Extraverted”]

Important Point:

The item-total correlation of intermediate ideal point items will be close to zero!
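
A simulation sketch of why this happens. It assumes a simple bell-shaped endorsement curve (not the GGUM), nine hypothetical items (four low items that are reverse scored, one intermediate item, four high items), and standard normal trait values. Under these assumptions the corrected item-total correlation of the intermediate item comes out near zero while the extreme items correlate clearly with the rest of the scale.

```python
import math
import random


def p_endorse(theta, delta, width=1.0):
    """Illustrative ideal point curve: endorsement peaks when theta == delta."""
    return math.exp(-((theta - delta) ** 2) / (2 * width ** 2))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_people=4000, seed=1):
    rng = random.Random(seed)
    # Hypothetical item locations: four low, one intermediate, four high.
    locations = [-2.0] * 4 + [0.0] + [2.0] * 4
    reverse = [True] * 4 + [False] * 5  # low items are reverse scored
    scored = []
    for _ in range(n_people):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for delta, rev in zip(locations, reverse):
            u = 1 if rng.random() < p_endorse(theta, delta) else 0
            row.append(1 - u if rev else u)
        scored.append(row)
    totals = [sum(row) for row in scored]
    # Corrected item-total correlation: item vs. total of the remaining items.
    citc = []
    for j in range(len(locations)):
        item = [row[j] for row in scored]
        rest = [t - row[j] for t, row in zip(totals, scored)]
        citc.append(pearson(item, rest))
    return citc


citc = simulate()
print([round(r, 2) for r in citc])  # index 4 is the intermediate item
```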

Which Process is Appropriate for Temperament Assessment?

In a series of studies, we’ve
Examined appropriateness of dominance process by fitting models of increasing complexity to data from two personality inventories
Compared fits of dominance and ideal point models of similar complexity to 16PF data
Compared fits of dominance and ideal point models to sets of items not preselected to fit dominance models

Fitting Traditional Dominance Models to Personality Data

Data
16PF 5th Edition
• 13,059 examinees completed 16 noncognitive scales
Goldberg’s Big Five factor markers
• 1,594 examinees completed 5 noncognitive scales

Models examined
Parametric – 2PLM, 3PLM
Nonparametric – Levine’s Maximum Likelihood Formula Scoring (MFSM)

Three-Parameter Logistic Model

Two-Parameter Logistic Model
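
For reference, a sketch of the standard 3PL and 2PL item response functions. The parameter names (discrimination a, location b, lower asymptote c) and the scaling constant D = 1.7 follow common convention and are assumptions here rather than values taken from the slides. The 2PL is the 3PL with c fixed at zero.

```python
import math


def irf_3pl(theta, a, b, c=0.0, D=1.7):
    """P(positive response | theta) under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def irf_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic model: the 3PL with no lower asymptote."""
    return irf_3pl(theta, a, b, c=0.0, D=D)


# Both are dominance models: the probability only rises as theta increases.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irf_2pl(theta, a=1.0, b=0.0), 3),
          round(irf_3pl(theta, a=1.0, b=0.0, c=0.2), 3))
```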

Methods for Assessing Fit: Fit Plots

[Figure: fit plot of probability of positive response (0.0 to 1.0) against theta (-3.0 to 3.0), comparing the model item response function (IRF) with the empirical proportions (EMP)]

Methods for Assessing Fit: Chi-Squares

Chi-squares typically computed for single items

$$\chi^2_i = \sum_{k=1}^{s} \frac{[O_i(k) - E_i(k)]^2}{E_i(k)}, \qquad E_i(k) = N \int P(u_i = k \mid \theta)\, f(\theta)\, d\theta$$

Very important to examine item pairs and triplets

$$E_{ij}(k, k') = N \int P(u_i = k \mid \theta)\, P(u_j = k' \mid \theta)\, f(\theta)\, d\theta$$

May indicate violations of local independence or misspecified model
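
A rough sketch of how the single-item expected frequencies and chi-square could be computed: the expected count for response k is N times the integral of P(u = k | theta) against the trait density. The sketch assumes a 2PL item, a standard normal trait density, and a simple midpoint-rule integration; the observed counts are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density, used here as the assumed trait density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p_2pl(theta, a, b, D=1.7):
    """2PL probability of a positive response (assumed item model)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def expected_single(n, a, b, k, lo=-6.0, hi=6.0, steps=1200):
    """N times the integral of P(u = k | theta) * density, by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        theta = lo + (s + 0.5) * h
        p1 = p_2pl(theta, a, b)
        total += (p1 if k == 1 else 1.0 - p1) * normal_pdf(theta) * h
    return n * total


def chi_square(observed, expected):
    """Sum over response options of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [expected_single(1000, a=1.0, b=0.0, k=k) for k in (0, 1)]
observed = [520, 480]  # hypothetical observed counts for options 0 and 1
chi2 = chi_square(observed, expected)
print([round(e, 2) for e in expected], round(chi2, 3))
```

For pairs, the same integration would use the product of the two items' response probabilities inside the integral, which is where violations of local independence show up.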

Methods for Assessing Fit: Chi-Squares

To aid interpretation of chi-squares:
Adjust to sample size of 3,000
Compare groups of different size

The expected value of a non-central chi-square is equal to its df plus N times the noncentrality parameter,

$$E(\chi^2) = df + N\delta,$$

where N is the sample size. So an estimate of the noncentrality parameter is

$$\hat{\delta} = (\chi^2 - df)/N.$$

Adjusted Chi-square

To adjust to a sample size of, say, 250, use

$$\chi^2_{\text{Adjusted}} = df + 250\,(\chi^2 - df)/N$$

For IRT, we usually adjust to N = 3000, and divide by the df to get an adjusted chi-square/df ratio

Less than 2 is great, less than 3 is OK
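
The adjustment can be sketched in a few lines. The function names and the example numbers are illustrative assumptions; the arithmetic follows the rule stated above (estimate the noncentrality parameter as (chi-square - df)/N, then rescale to the target sample size).

```python
def adjusted_chi_square(chi2, df, n, n_target=3000):
    """Rescale a chi-square observed at sample size n to sample size n_target."""
    delta_hat = (chi2 - df) / n       # estimated noncentrality parameter
    return df + n_target * delta_hat  # expected chi-square at n_target


def adjusted_ratio(chi2, df, n, n_target=3000):
    """Adjusted chi-square divided by its degrees of freedom."""
    return adjusted_chi_square(chi2, df, n, n_target) / df


# Hypothetical example: chi-square of 50 on 10 df from 6,000 examinees.
ratio = adjusted_ratio(chi2=50.0, df=10, n=6000)
print(round(ratio, 3))  # (10 + 3000 * 40 / 6000) / 10 = 3.0
```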

Adjusted Chi-square/df for an Ability Test

FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS

           <1   1<2  2<3  3<4  4<5  5<7  >7   Mean   SD
Singlets   11   3    2    1    0    2    1    1.877  2.923
Doublets   77   44   31   16   12   9    1    1.829  1.734
Triplets   327  424  264  92   14   16   3    1.71   1.092

Adj. chi-square/df < 3

Results for 16 PF Sensitivity Scale: Mean Chi-sq/df Ratios

Model Singles Doubles Triples

2PL 0.98 4.05 5.45

3PL 0.87 3.89 5.23

SGR 0.99 7.76 7.12

MFS-dich 2.91 2.61 2.42

MFS-poly 1.55 2.68 2.58

What if Items Assessed Trait Values Along the Whole Continuum?

Items on existing personality scales have been pre-screened on item-total correlation

We speculate that items measuring intermediate trait values are systematically deleted

So, what happens if a scale includes some intermediate items?

TAPAS Well-being Scale

Tailored Adaptive Personality Assessment System

Assesses up to 22 facets of the Big Five

Well-being is a facet of emotional stability

We wrote items reflecting low, moderate, and high well-being

For example, TAPAS Well-Being Scale

WELL04, “I don’t have as many happy moments in my life as others have”

WELL17, “My life has had about an equal share of ups and downs”

WELL41, “Most days I feel extremely good about myself”

In total, 20 items: 5 negative, 9 positive, and 6 neutral

Traditional Analysis Results

#   Item_Name  Initial SME Location  Reverse  Mean  SD    Factor Loading  CITC (alpha=.76)
1   WELL02     negative              r        2.14  0.80  -0.40           0.35
2   WELL04     negative              r        2.08  0.87  -0.45           0.40
3   WELL06     negative              r        2.23  0.78  -0.55           0.45
4   WELL09     negative              r        2.22  0.76  -0.53           0.42
5   WELL13     negative              r        2.20  0.77  -0.54           0.45
6   WELL16     neutral                        2.48  0.85   0.08           0.08
7   WELL17     neutral                        2.82  0.73   0.13           0.15
8   WELL19     neutral               r        2.85  0.65  -0.09          -0.05
9   WELL20     neutral                        3.00  0.89   0.04           0.06
10  WELL23     neutral                        3.03  0.64   0.07           0.11
11  WELL26     neutral               r        2.80  0.78  -0.14           0.06
12  WELL29     positive                       2.89  0.74   0.36           0.48
13  WELL30     positive                       2.77  0.74   0.56           0.42
14  WELL34     positive                       3.13  0.70   0.46           0.35
15  WELL38     positive                       2.80  0.82   0.57           0.49
16  WELL40     positive                       2.53  0.75   0.56           0.48
17  WELL41     positive                       2.96  0.73   0.56           0.50
18  WELL43     positive                       3.13  0.66   0.63           0.55
19  WELL45     positive                       2.82  0.70   0.53           0.46
20  WELL46     positive                       2.89  0.72   0.47           0.41

Fit Plot for 2PL WELL17

[Figure: fit plot for item 7 (WELL17), showing probability of positive response (0.0 to 1.0) against theta (-3.0 to 3.0) for the 2PL item response function (IRF7) and the empirical proportions (EMP7)]

An Ideal Point Model: The Generalized Graded Unfolding Model (GGUM)

Roberts, Donoghue, & Laughlin (2000). Applied Psychological Measurement.

The model assumes that the probability of endorsement is higher the closer the item to the person

GGUM software provides maximum likelihood estimates of item parameters

GGUM

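The probability of disagree is:

$$P(Z_i = 0 \mid \theta_j) = \frac{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\}}{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\} + \exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}$$

and the probability of agree is

$$P(Z_i = 1 \mid \theta_j) = \frac{\exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\} + \exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}$$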
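
A sketch of the dichotomous GGUM response probabilities, assuming the standard parameter names from the GGUM literature (alpha for discrimination, delta for item location, tau for the threshold); the parameter values in the example are hypothetical. The probability of agreeing is symmetric around the item location and highest when the person is at the item.

```python
import math


def ggum_probs(theta, alpha, delta, tau):
    """Return (P(disagree), P(agree)) for the dichotomous GGUM."""
    d = theta - delta
    disagree = 1.0 + math.exp(3.0 * alpha * d)
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    total = disagree + agree
    return disagree / total, agree / total


# Hypothetical moderate item located at delta = 0.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p0, p1 = ggum_probs(theta, alpha=1.0, delta=0.0, tau=-1.0)
    print(theta, round(p1, 3))
```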

GGUM Estimated IRF for Moderate Item

[Figure: GGUM option response function (ORF) for option 2; probability of positive response (0.0 to 1.0) plotted against well-being (-3.0 to 3.0)]

IRF for Agree response to TAPAS Well-being item “My life has had about an equal share of ups and downs.”

TAPAS Well-being Scale

2PL Results:

FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS

           <1  1<2  2<3  3<4  4<5  5<7  >7  Mean   SD
Singlets   20  0    0    0    0    0    0   0      0
Doublets   17  1    0    0    1    2    3   2.955  6.439
Triplets   5   0    1    0    0    1    5   5.408  6.512

GGUM Results:

FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS

           <1  1<2  2<3  3<4  4<5  5<7  >7  Mean   SD
Singlets   20  0    0    0    0    0    0   0      0
Doublets   22  0    0    0    0    0    2   0.997  3.256
Triplets   9   0    0    1    1    1    0   1.081  2.001

Summary of Findings

2PLM and 3PLM fit scales developed by traditional methods OK, but if moderate items are included
• Chi-square doublets and triplets can be large, especially when moderate items are included
• Discrimination parameter estimates are uniformly small for moderate items (and item-total correlations are near zero).

GGUM fits all items, including moderate items
• Adj. chi-square to df ratios are small for doubles and triples
• GGUM discrimination parameter estimates are large for the moderate items!

So, for Well-Being

Fitting a dominance item response theory model (the 2-parameter logistic) produced an adjusted Chi-Square to df ratio of 2.955 for pairs

The ideal point model yielded an adjusted Chi-square/df ratio of 0.997 for pairs

Conclusion

Ideal point model seems more appropriate for temperament assessment

BUT there’s a “Fly in the ointment” for I/O

Correct specification of response process does not guarantee more accurate assessment, because …

Traditional items are easily FAKED

Examples of “Traditional” Items that are Easily Faked

I get along well with others. (A+)
I try to be the best at everything I do. (C+)
I insult people. (A-)
My peers call me “absent minded.” (C-)

Because these items consist of individual statements, they are commonly referred to as “single stimulus” items.

In each case, the positively keyed response is obvious.

Army Assessment of Individual Motivation (AIM)

Uses tetrads:
• I get along well with others. (A+)
• I set very high standards for myself. (C+)
• I worry a lot. (ES-)
• I like to sit on the couch and eat potato chips. (Physical condition-)

Respondent picks the statement that is Most Like Me and the statement that is Least Like Me

Army AIM has shown less score inflation

What psychometric model would describe this type of data????

So…

US Army researchers Len White and Mark Young (and others) found some fake resistance and criterion-related validity for the tetrad format

But modeling four-dimensional items was too hard for me!

How about two-dimensional items?

Multidimensional Pairwise Preference (MDPP) Format

Create items by pairing stimuli that are similar in desirability, but representing different dimensions

“Which is more like you?”

• I get along well with others. (A+)

• I always get my work done on time. (C+)

This led to my work on personality assessment over the past 10 years

And the result is:

Tailored Adaptive Personality Assessment System (TAPAS)

TAPAS is designed to overcome existing limitations of personality assessment for selection by incorporating recent advancements in:
Temperament/personality assessment
Item response theory (IRT)
Computerized adaptive testing (CAT)

Our goal is for TAPAS to be innovative in both how we assess (IRT, CAT) and what we assess (facets of personality)

TAPAS Vision

Fully customizable assessment to fit array of users’ needs

Users can select
• any dimension from a comprehensive superset
• a scale length to suit their needs
• a response format (depends on faking worries)
• adaptive or static

Resulting scores can be used to predict multiple criteria or as source of feedback

TAPAS Facet Dimensions

Based on factor analysis of each of the Big Five dimensions
E.g., Roberts, B., Chernyshenko, O.S., Stark, S., & Goldberg, L. (2005). The structure of conscientiousness. Personnel Psychology

Analyzed 7 major personality inventories

Currently 21 facets + additional “physical condition” facet for military jobs

TAPAS Facet Dimensions

Conscientiousness

Six facet hierarchical structure:

Industriousness: task- and goal-directed

Order: planful and organized

Self-control: delays gratification

Traditionalism: follows norms and rules

Social Responsibility: dependable and reliable

Virtue: ethical, honest, and moral

Factor Analysis Results

                                Factor
Scale Name                Industriousness  Order  Self-control  Responsibility  Traditionalism  Virtue
neo competence            .88              -.28   .14           .10             -.01            -.09
neo achievement striving  .76              .02    -.12          .10             .09             -.18
ab5c organization         .75              .11    .05           .11             -.10            -.17
ab5c purposefulness       .67              .18    -.04          -.02            -.11            .24
neo self-discipline       .65              .22    -.11          -.03            -.02            .16
ab5c efficiency           .63              .36    -.19          -.03            -.07            .21
ab5c rationality          .50              .16    .12           -.28            .16             -.01
neo dutifulness           .49              -.05   .14           -.02            .26             .09

For each facet, we have an empirical mapping of existing scales to our facets
• Provide basis for existing scale classification
• Validity of each facet can be investigated via meta-analysis

TAPAS Military Meta-Analysis

42 studies or technical reports 1988-2006
• Small number of police and fire-fighter studies were also included

22 TAPAS facets

8 criteria (e.g., task proficiency, contextual performance, leadership, attrition, fitness)

1494 empirical correlations

TAPAS Military Meta-Analysis

Industriousness Results

Criterion               N      kd  kc  Observed Validity  Corrected Validity
Job/Task Performance    38964  14  36  .05                .06
Contextual Performance  19423  9   18  .21                .26
Counterproductivity     17673  8   17  -.14               -.18
Attrition               17912  5   8   -.09               -.10
Leadership              9429   12  20  .15                .18
Training Performance    6156   8   27  .14                .17
Adaptability            1291   3   4   .17                .21
Physical Fitness        18044  5   17  .18                .23

Validity tables can be used to guide the choice of facets!

TAPAS Civilian Meta-Analysis

Studies or technical reports in the period 1988-2006

Same 8 criterion categories and 22 TAPAS facets

4755 validity coefficients (so, in total, we have over 6,000 validities in our database)

“How” TAPAS Measures

Our research on the item response process for personality stimuli (Stark et al., 2006; Chernyshenko et al., 2007) suggests that
• Response endorsement is driven by the similarity between the person and the behavior described by the stimulus (aka, an ideal point process)

Implications:
• Different models (not the 3PL or SGR) should be used for item administration and scoring: e.g., GGUM
• Multiple stimuli per item are possible (i.e., pairs)

“How” TAPAS Measures

The choice of 4 response formats will be available

Single statement dichotomous (Agree/Disagree)
Single statement polytomous (SA,A,D,SD)
Unidimensional pairwise preference (i.e., two-alternative forced choice)
Multidimensional pairwise preference (Stark, 2002)
• Used when faking is likely

Single Statement Scales

Generalized Graded Unfolding Model (GGUM; Roberts et al., 1998)

Reverse scoring is not needed

Basic idea: a person endorses an item if it accurately describes him/her

Thus, the probability of endorsement is higher the closer the item to the person

GGUM IRFs for two Personality Statements

[Figure: GGUM IRF for "I enjoy chatting quietly with a friend at a café." (Sociability); P(theta) from 0.0 to 1.0 plotted against theta from -3.0 to 3.0]

[Figure: GGUM IRF for "I am about as organized as most people." (Order); P(theta) from 0.0 to 1.0 plotted against theta from -3.0 to 3.0]

Multidimensional Pairwise Preference (MDPP) Format

Create items by pairing stimuli that are similar in desirability, but representing different dimensions

“Which is more like you?”

• I get along well with others. (A+)

• I set very high standards for myself. (C+)

MDPP Roots: Assessment of Individual Motivation (AIM)

AIM utilizes forced-choice tetrad format to reduce social desirability effects

Greater resistance to faking than ABLE (a single statement personality inventory developed by the Army researchers)

Low correlations (.00 to .25) with examinee race and gender and measures of cognitive ability

Predicts attrition and various job and training performance criteria in research and operational testing

MDPP Roots: Assessment of Individual Motivation (AIM)

But, due to quasi-ipsative scoring
• AIM items are difficult to create and
• Score accuracy cannot be checked against known scores, because no formal psychometric model for stimulus endorsement is available

CAT is not possible without a psychometric model

IRT Model for Scoring Multidimensional Pairwise Preference Items

(Stark, 2002; Stark, Chernyshenko, & Drasgow, 2005)

$$P_{(s>t)_i}(\theta_{d_s}, \theta_{d_t}) = \frac{P_{st}\{1,0\}}{P_{st}\{1,0\} + P_{st}\{0,1\}} \approx \frac{P_s\{1\}\,P_t\{0\}}{P_s\{1\}\,P_t\{0\} + P_s\{0\}\,P_t\{1\}}$$

1 = Agree
0 = Disagree

Respondent evaluates each stimulus (personality statement) separately and makes independent decisions about endorsement.

Stimuli may be on different dimensions.

Single stimulus response probabilities P{0} and P{1} computed using a unidimensional ideal point model for “traditional” items (GGUM)

Refer to new pairwise preference model as MDPP
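
A sketch of the pairwise preference probability: the chance of preferring statement s over statement t is P_s{1}P_t{0} divided by P_s{1}P_t{0} + P_s{0}P_t{1}, with the single-statement agree probabilities taken from the dichotomous GGUM. The function names, the (alpha, delta, tau) tuple layout, and all parameter values are illustrative assumptions.

```python
import math


def ggum_p_agree(theta, alpha, delta, tau):
    """P{1}: probability of agreeing with a single statement (dichotomous GGUM)."""
    d = theta - delta
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    return agree / (1.0 + math.exp(3.0 * alpha * d) + agree)


def mdpp_prefer_s(theta_s, theta_t, item_s, item_t):
    """Probability of preferring statement s over statement t.

    theta_s and theta_t are the person's standings on the dimensions that
    statements s and t measure; item_s and item_t are (alpha, delta, tau).
    """
    ps1 = ggum_p_agree(theta_s, *item_s)
    pt1 = ggum_p_agree(theta_t, *item_t)
    num = ps1 * (1.0 - pt1)            # agree with s, disagree with t
    den = num + (1.0 - ps1) * pt1      # plus: disagree with s, agree with t
    return num / den


# Hypothetical pair: both statements located at 0 on their own dimensions.
item = (1.0, 0.0, -1.0)
print(round(mdpp_prefer_s(0.0, 0.0, item, item), 3))  # equally close to both
print(round(mdpp_prefer_s(0.0, 2.0, item, item), 3))  # closer to statement s
```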

MDPP IRF for Item Measuring Sociability and Order

MDPP Model Performance

Stark & Drasgow (2002)
• .77 correlation between estimated and known scores in 2-D tests, 20 pairs, 10% unidimensional

Stark & Chernyshenko
• .88 for 5-D tests, 50 items, 5% unidimensional

All possible pairings of dimensions were not required for good parameter recovery

CAT vs. Nonadaptive

* CAT yielded similar correlations with only half as many items.
* 10-d CAT correlations > .9 with 100 items (only 5 unidim!).

Average Correlation Across Dimensions

                                 Nonadaptive           Adaptive
% Unidim.  Items Per Construct   3-d  5-d  7-d  10-d   3-d  5-d  7-d  10-d
5          5                     .73  .72  .76  .76    .87  .85  .86  .87
5          10                    .85  .87  .87  .86    .93  .93  .93  .93
5          20                    .93  .93  .93  .94    .96  .96  .96  .96
10         5                     .73  .74  .75  .75    .87  .87  .85  .88
10         10                    .85  .85  .86  .87    .92  .93  .93  .93
10         20                    .93  .93  .94  .94    .96  .96  .96  .96
20         5                     .74  .74  .74  .75    .87  .84  .86  .87
20         10                    .85  .85  .87  .86    .92  .90  .93  .93
20         20                    .92  .93  .93  .94    .96  .96  .96  .96

Summary of MDPP Model Studies

MDPP items are attractive for applied use:
• Faking is more difficult
• Can create huge pool with relatively few statements representing each dimension (20 stimuli = 190 items)

5% unidimensional pairings sufficient for accurate score recovery

As with SS models, MDPP CAT can reduce test length by about 50% while maintaining accuracy, which is important if many dimensions assessed.
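
The "20 stimuli = 190 items" figure is the number of distinct unordered pairs that can be formed from 20 statements. A quick check, using placeholder statement labels:

```python
from itertools import combinations
from math import comb

# Placeholder labels standing in for 20 personality statements.
statements = [f"S{i:02d}" for i in range(1, 21)]

# Every unordered pair of distinct statements is a candidate pairwise item.
pairs = list(combinations(statements, 2))
n_pairs = len(pairs)

print(n_pairs, comb(20, 2))  # 190 190
```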

Current Empirical TAPAS Studies

Comparing MDPP format to single statement (SS) format

Testing what makes forced-choice items resistant to faking

# of dimensions?

Matching on social desirability?

Matching on statement locations?

Study 1: Benchmark Study

4-D MDPP measure (41 pairs) designed using “conventional wisdom”
• Match stimuli on social desirability (average difference between SocD did not exceed 1.08 on 5-point scale)
• Match stimuli to have different locations on respective dimensions (average distance 4.3 units on Z-score metric)

4-D SS measure (40 items)

Both measures administered under faking and honest conditions (N = 510 and N = 574)

2-D SS measure (20 items) – all honest (n=1084)

Very Strong Faking Instructions!

Unlike in the previous sections where the instructions asked you to be as honest and accurate as possible, we now ask that you PRETEND you are not yet in the Army, but very much want to be. Imagine a recruiter asks you to take this questionnaire to determine if you are GOOD ARMY MATERIAL. If you score well, you will be let into the Army. If you don’t score well, you will not.

For the remaining sections, you are to answer the test questions by describing yourself in a way that will make you look like “good Army material” so you are sure to pass the test and get into the Army. Remember you are not yet in the Army, but very much want to be. In other words, create the best possible impression of yourself and convince the Army that you will make a good Soldier.

Study 1: Benchmark Study

Comparability of formats under Honest Conditions

            dom_MDPP  enr_MDPP  ord_MDPP  trad_MDPP
dom_MDPP    1.00      0.27      0.12      0.11
enr_MDPP    0.27      1.00      0.09      0.12
ord_MDPP    0.12      0.09      1.00      0.33
trad_MDPP   0.11      0.12      0.33      1.00
dom_GGUM    0.59      0.22      0.02      0.08
enr_GGUM    0.21      0.49      0.06      0.13
ord_GGUM    0.20      0.15      0.49      0.34
trad_GGUM   0.05      0.10      0.21      0.54
ord_GOLD    0.21      0.13      0.50      0.35
trad_GOLD   0.06      0.10      0.24      0.50

Study 1: Benchmark Study

MDPP scales created using conventional wisdom are as fakable as SS scales in strong faking conditions

In faking conditions, respondents chose items with “more positive” location (i.e., > 20% endorsement shift across conditions)

            Honest  Faking  Difference  Effect Size
dom_MDPP    0.10    0.32    0.21        0.32
enr_MDPP    0.17    0.95    0.78        0.97
ord_MDPP    -0.07   0.32    0.39        0.70
trad_MDPP   0.48    1.56    1.08        1.06
dom_GGUM    0.13    0.44    0.31        0.41
enr_GGUM    0.25    0.65    0.41        0.59
ord_GGUM    -0.19   0.36    0.54        0.71
trad_GGUM   0.65    1.25    0.60        0.77
TRAD_GOLD   31.43   31.31   -0.12       -0.03
ORD_GOLD    29.96   29.69   -0.26       -0.05

Study 2: Location Matching

11-D MDPP static measure with 117 items

• Match stimuli on similarity in locations (average distance 2.09 z-score units)

11-D SS measure (7 items each)

Both measures administered under faking and honest conditions (N = 286 and N = 358)

Again, very strong faking instructions

Study 2: Location Matching

Unlike benchmark study, only 20 out of 117 items showed inflated percent endorsement shifts
• Note that we matched only on locations, not Soc.D

Scored 97 pair 11-D MDPP measure

Similar correlations across formats as in benchmark study

But, less score inflation

Study 2: Location Matching

MDPP Scores    Honest (N= 358)  Faking (N=276)  Difference  Effect Size
ORD_MDPP97     -0.08            0.10            0.18        0.38
SOC_MDPP97     0.13             0.06            -0.07       -0.12
TRAD_MDPP97    -0.24            -0.01           0.23        0.30
ENR_MDPP97     -0.77            -0.57           0.20        0.28
DOM_MDPP97     -0.29            -0.33           -0.04       -0.06
IND_MDPP97     -0.72            -0.43           0.29        0.49
INTE_MDPP97    -0.17            -0.01           0.15        0.26
TRUST_MDPP97   -0.24            -0.18           0.07        0.07
CURI_MDPP97    0.01             0.13            0.12        0.20
WELL_MDPP97    -0.38            -0.26           0.12        0.20
PHYC_MDPP97    -0.54            -0.28           0.25        0.42

Compare to: SS scales in benchmarking study had .41 SD inflation for DOM, and .79 SD inflation for TRAD

Conclusions

MDPP model (Stark, 2002) can be used effectively to score real MDPP response patterns
• MDPP scores agree with SS scores under honest conditions

Fake resistance of forced-choice format should not be taken for granted
• E.g., must match on item locations, not just Soc.D

Our MDPP CAT algorithm has constraints on location difference and Soc.D difference
• Adaptive testing format may further decrease fakability (e.g., NCAPS results with UPP scales)

But, there is lots of R&D work to be done…

Current Work

TAPAS is being implemented by the US Army for enlistment screening
• June 8 for applicants without high school diplomas

Will it predict their attrition and counter-productive behaviors?

Current Work

We have about 50 statements for each of the 13 dimensions that are being used by the US Army

Are some statements overused? We don’t have an exposure control algorithm

In principle, each of the approximately 650 statements could be paired with any of the other 649…but there are lots of constraints on item selection…
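
To give a sense of scale before any constraints are applied, a back-of-the-envelope count, assuming exactly 13 dimensions with 50 statements each (the slide gives both figures as approximate):

```python
from math import comb

n_dims, per_dim = 13, 50
total_statements = n_dims * per_dim            # 650 statements

all_pairs = comb(total_statements, 2)          # every unordered pair of statements
unidimensional = n_dims * comb(per_dim, 2)     # both statements from the same dimension
multidimensional = all_pairs - unidimensional  # statements from two different dimensions

print(all_pairs, unidimensional, multidimensional)  # 210925 15925 195000
```

Constraints on location difference and Soc.D difference would remove many of these candidate pairs, so the usable pool is smaller than this upper bound.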

In Sum,

TAPAS designed to bring the latest in
Psychometric theory
Computer technology
Personality theory

Our goal is to produce an easily customizable assessment tool to meet the needs of diverse users and researchers