appliedpersonnelresearch.comappliedpersonnelresearch.com/papers/Wiesen_Master... · Web viewThis...

1MASTER TUTORIALTITLE Tools to Increase Diversity, Utility, and Validity in Hiring Police Officers

SHORTENED TITLETools to Increase Diversity, Utility, and Validity in Hiring Police Officers

ABSTRACTMany police managers are stymied in their attempts to hire black police officers due to the pervasive adverse impact that traditional employment tests have on black candidates. This tutorial presents 15 tools (most novel or little-used) to help police departments (PDs) hire ethnically diverse academy classes while maintaining and even enhancing expected job performance.

PRESS PARAGRAPH Many police managers are stymied in their attempts to hire black police officers due to the pervasive adverse impact that traditional employment tests have on black candidates. This tutorial presents 15 tools (most novel or little-used) to help police departments hire ethnically diverse academy classes while maintaining and even enhancing expected job performance. Real life examples are provided to illustrate the use of these tools. The tools are described in enough detail to allow other consultants to use them.

WORD COUNT 2575

LEARNING OBJECTIVES1. Describe the conditions under which a relatively low validity test (e.g., r=.15) is expected

to have higher utility than a higher validity test (e.g., r=.25).2. Describe at least 2 test areas and 2 test modes that generally have reverse impact on

minorities.3. Describe the pros and cons of using tests of g on a pass-fail (P/F) basis.4. Explain why ranking even in part on a traditional test of g generally results in adverse

impact on minority candidates.

INTENDED AUDIENCEThis is intended for an audience schooled and experienced in employee selection testing, but all doctorate-level I/O psychologists should be able to understand much of the material.

BIO FOR JOEL P. WIESENDr. Wiesen serves as Director of his own firm, Applied Personnel Research. He specializes in employee selection testing. In 1975, he was hired by Massachusetts to validate its civil service examinations. Since 1993, he has consulted on employee testing and developed written tests for government and business. He is approved by the American Psychological Association to sponsor continuing education for psychologists. He has served as an expert on testing matters for two state Attorneys General, the US Department of Justice, other governmental entities, private law

firms, and other organizations. He received a doctorate in psychology from Lehigh University and is licensed as a psychologist in three states.

Introduction

Many police managers are stymied in their attempts to hire black police officers due to

the pervasive adverse impact that traditional employment tests have on black candidates. This

tutorial presents 21 tools (most novel or little-used) to help police departments (PDs) hire

ethnically diverse academy classes while maintaining and even enhancing expected job

performance. This tutorial focuses mainly on the hiring of black police applicants.

The Main Cause for Adverse Impact on Minority Applicants

The main reason for the frequent failure to hire a diverse workforce is the adverse impact

on minority applicants due to non-zero weighting of traditional multiple choice (M/C) tests.

Whatever the causes of the mean black-white (B-W) and Hispanic-white differences in mean test

scores, the result is clear: ranking based on M/C tests of g almost always results in adverse

impact on black and Hispanic candidate groups.

Brief Description of Tools

The tools presented fall into five general categories:

� Administrative

� Use a traditional M/C test in a P/F (P/F) fashion

� Provide substitutes for the P/F M/C test

� Replace the traditional M/C test

� Rank based on job-related abilities with little, no, or reverse impact

#1: Empower the PD to Evaluate Proposed Selection Systems

PDs should ask their testing consultants to provide projections of the outcomes of each

proposed selection system, based on plausible assumptions of the numbers of minority and white

applicants and the number of openings to be filled. Five projections should be provided: (1)

expected number of minority candidates who will be eligible to be hired (within the “top group”

or the “appointable range”), (2) expected cost of testing, (3) expected level of job performance,

(4) expected level of adverse impact, and (5) expected level of validity. Decisions about possible

tradeoffs between levels of adverse impact and job performance may best be made by the hiring

officials rather than testing experts.

#2: Shorten the Application Period

Simply allowing fewer applicants will typically increase the proportion of minority

applicants hired. With large applicant groups and small proportions of applicants selected,

typically there will be severe adverse impact (e.g., Sackett & Ellingson, 1997, Table 1. and pg.

711, par. 2). This approach does not affect validity.

#3: Use Residency Points to Increase Hiring of Residents of the Jurisdiction

Residency points will be helpful where the jurisdiction is largely minority and the

suburbs are largely white. (Such preference is not uncommon for non-racial reasons.) Often the

suburban school systems are far better funded, so having students compete on academic

achievement may be seen as unfair.

#4: Traditional Multiple-Choice Test Used P/F

Use a traditional M/C, entry-level test on a P/F basis, with a realistic, job-related passing

point. Even weighting the traditional M/C test at much less than 50% is likely to result in severe

adverse impact in hiring (Sackett & Ellingson, 1997, Table 1, pg. 710). This is a major reason

why many PDs that have moved away from total reliance on the traditional M/C test still have

difficulty hiring a diverse workforce (e.g., Diversity on the Force, 2015). By setting a suitable

passing point on a traditional M/C test and ranking on test components that have little or no

adverse impact, we can expect the new hires to have sufficient intelligence and have the requisite

personal and interpersonal skills and other abilities to do the work, yet with less adverse impact.

This tool is contrary to some apparently obvious implications of meta-analysis

research yet there are many strong arguments for using P/F g, if at all. First, some PDs require a

college degree, vitiating the validity of g among the applicants. Second, the validity of g for

police may be lower than for other jobs (r=.24, Aamodt, 2004). Third, the meta-analytic findings

in favor of g are tempered by other research. For example, the correlation between leadership

and intelligence is dramatically higher for observational than for paper and pencil (M/C or short

answer) measures of intelligence, .60 vs .19 (Judge, Colbert & Ilies, 2004, Table 2). Fourth, the

B-W mean difference in job performance on many jobs is only .5 sd, while the difference in test

performance often is 1 s.d. The regression formula: y = r*x (where y is job performance, x is test

performance, would only explain the .5 versus 1 sd difference in job and test performance if

employers hired randomly or from the whole range of test performance, which is not the case.

This suggests that the job criteria may be biased. Research suggests such bias. Studies of salaries

show: tall people are paid more than short (Judge & Cable, 2004), men are paid more than

women (Hegewisch, Williams & Henderson, 2011), and physically attractive people of

both genders are paid more than unattractive (Marlowe, Schneider & Nelson,

1996). Further, claims of lack of differential validity have been challenged (e.g., Aguinis,

Culpepper & Pierce, 2010 and 2016).

#5: Allow Alternatives for a Pass-Fail Test of g

An honorable, three-year military discharge, a college degree, a sufficient HS rank, or a

high score on an oral exam might be serviceable substitutes for passing a traditional M/C test.

The military uses cognitive ability tests as a selection tool, so additional testing is redundant.

Honorable discharge indicates the ability to learn the procedures of a paramilitary organization

and a military work specialty, and a willingness to follow orders. The restriction of range in g

due to requiring a college degree causes an r of .27 to shrink to .15. Allowing a g test with

modest validity to drive severe adverse impact is hard to defend.

Replace the Traditional M/C Test with Another Measure

Rank on job-related measures other than a traditional M/C test. Some such measures are

briefly described as separate tools below.

As to measuring cognitive ability, the main area that traditional M/C tests measure, not

all measures are interchangeable (Hough, Oswald & Ployhart, 2001; Lang, Kersting & Lang,

2010; Schmitt, 2014). For example, the correlation between leadership and intelligence is

dramatically higher for observational than paper and pencil (M/C or short answer) measures of

intelligence, .60 vs .19 (Judge, Colbert & Ilies, 2004, Table 2). This supports the use of measures

of cognitive ability other than M/C tests. Importantly, different aspects of cognitive ability have

different size B-W differences (Wee, Neuman & Joseph, 2014), suggesting that we carefully

select the aspects of cognitive ability to measure.

#6: Rank Candidates Based on HS Rank

HS rank is a good measure of future academic performance (e.g., Schmitt, Keeney,

Oswald, 2009, Tables 4A and 4B) and job performance (Roth, BeVier, Switzer III &

Schippmann, 1996). High school grades and SAT scores are approximately equally good

predictors of college grades (e.g., r = .37 vs .33 for cumulative college GPA; Berry & Sackett,

2009, Table 1). The SAT is a well-funded, thoroughly researched test. A locally developed

employment test would be unlikely to measure the readiness to learn as accurately as the SAT.

Use of HS rank will tend to promote diversity, especially when there is at least one school with a

predominantly black student body.

#7: Rank Based on a Structured Oral Exam

The structured oral exam is the most valid selection technique (r = .57, Aamodt, 2016,

Table 5.2) and typically shows relatively little adverse impact (Levashina, Hartwell, Morgeson

& Campion, 2014, Table 3, page 254). Oral exams can measure intelligence with more realistic

and complex questions than are possible on a M/C test. Oral exams can measure other job-

related abilities as well, such as: oral communication, interpersonal relations, unstructured

problem solving, and creativity (characteristics not measurable with a traditional M/C test; e.g.,

Huffcutt, Conway, Roth & Stone, 2001, Table; Cascio & Aguinis, 2011, pg. 268, par. 4).

#8: Rank Based on Video Testing

Video-based tests generally result in less adverse impact on black applicants than

traditional written M/C tests (e.g., Chan & Schmitt, 1996; Lievens & Sackett, 2006). Video-

based exams can present questions and situations that are much richer in detail and are more

realistic (video versus written or oral descriptions) as compared to other testing modes and can

measure abilities that are not measurable with written M/C tests.

#9: Rank Based on Each Candidate’s Highest Score

From those applicants who achieve a job-related passing score on each of several

measures (e.g., interpersonal, intelligence), rank applicants based on their highest score on any

one ability or their two highest scores. This approach maintains validity and reduces adverse

impact (Wiesen & Aguinis, 2010).

#10: Rank Based on the Memory of Faces that Mirror the Community

Remembering and identifying minority faces is easier for members of that minority group

(e.g., Levin, 2000). So, we expect a measure of this ability to have reverse impact. Being able to

remember faces of the type found in the community is likely job-related.

#11: Rank Based on Physical Fitness

Measures of physical fitness sometimes (not always) have little or no impact on black

applicants. Police officers perform many physically demanding job tasks (e.g., Shetterly &

Krishnamoorthy, 2008).

#12: Rank Based on a Measure of Prejudice

Tests of freedom from prejudice tend to favor minority applicants (Axt, Ebersole, &

Nosek, 2014). This type of test has been recommended for predicting discrimination in police

stops and searches (Greenwald, Banaji & Nosek 2014, p. 557 col 1, last par).

#13: Rank Based on Other Relevant Personality Measures

Some personality measures have relatively little or no impact on black applicants and are

valid for a wide range of jobs (e.g., extroversion, emotional stability; Ployhart & Holtz, 2008,

Table 1). Aspects of conscientiousness are particularly predictive for certain jobs (Dudley,

Orvis, Lebiecki & Cortina, 2006).

#14: Rank Based on Creativity and Problem Solving

Some tests of creativity show small or no B-W differences (e.g., Kaufman, 2006, page

1066, par 2, Kaufman, 2010, abstract). Further, there are non-cognitive aspects of problem

solving. Openness and intelligence have equal correlations with quality of problem solving

(Jauk, Benedek, Dunst & Neubauer, 2013, Table 1). Thinking biases may be independent of

cognitive ability in general and be stronger in more intelligent people (Stanovich & West, 2008).

Additionally, some new measures of intelligence show smaller d's than traditional g tests (e.g,

Agnello, Ryan & Yusko, 2015, page 52, par 5).

#15: Focus on Utility Rather than Validity

A relatively low validity test (e.g., personality with r=.15) can have higher utility than a

more validity test (e.g., g with r=.25) in situations often encountered (Wiesen, 2017c).

Principles for Using Two or More Tools Together

Brief Evaluation of the Legality of the Tools

No room to describe these in this summary, but they will be covered.

Real Life Examples

Bridgeport, CT

A goal for Bridgeport’s 2014 Police Officer entrance exam was to “Reduce disparity levels

between protected classes while also increasing validity”, but the approach offered by the City’s

consultant would use compensatory scoring of cognitive and behavioral components, despite the

(unmentioned) severe adverse impact. I suggested the consultant project the expected adverse

impact in numeric terms. I suggested changes which were adopted: P/F use of the g test, more

residency points (the City chose 15), and ranking based on an oral exam (Bridgeportct.gov,

2015). The Mayor of Bridgeport was exuberant in his praise for the resulting hiring, saying:

“...61 percent are candidates in the protected class of minorities or women; 61 percent is a

number we have never had before. ... we couldn’t have been prouder of this process. This did not

happen on its own; we made significant changes to the process” (Finch, 2015).

Oklahoma City, OK

Some 30 years ago, after abandoning affirmative action hiring, two consecutive firefighter

entrance exams resulted in hiring some 200 white and zero black firefighters, despite

blacks comprising 20% of applicants. The City, Fire Department, and black fraternal

organization were extremely concerned. The Fire Department used some of the tools described

here. This reduced resulted in hiring a diverse class of recruits with no legal challenges. The

system included:

- P/F M/C cognitive ability test

- P/F written workstyle test

- Ranked oral exam

- P/F 40 hour, First Responder course

- P/F background check

- P/F physical ability test

- P/F medical

The mean academy grade and dropout rate were similar to previous classes. The mean

quarterly recruit evaluation was high. All the new firefighters became certified EMTs. This

selection system was replicated in 1996 (Wiesen, 1996, 1997, 1998).

US Army

This screening program would not have adverse impact on minorities at any cut score

(White, Young, Heggestad, Stark, Drasgow & Piskator, 2004, page 4, col 2, par 3).

NYC 2012 Firefighter Exam

A exam developed by nine I/O psychologists was projected to not have adverse impact

for any of the four years the test would be used (PSI Services, 2012).

Columbus, OH Police Officer Exam

The City’s police officer examination includes a M/C cognitive ability test, a writing

exercise, and a physical fitness test, and ranks candidates based on a video-based test of problem

solving and interpersonal relations (Columbus Civil Service Commission, 2016). This approach

has enabled the City to present a diverse candidate pool to the background investigation process.

References

Aamodt, M. G. (2016). Industrial-Organizational Psychology: An Applied Approach (8th ed.)

Boston: Cengage Learning.

Cascio, W. F. & Aguinis, H. (2011). Applied Psychology in Human Resource Management (7th

ed.) Upper Saddle River, NJ: Prentice Hall.

Chan & Schmitt (1997). Video-Based Versus Paper-and-Pencil Method of Assessment in

Situational Judgment Tests: Subgroup Differences in Test Performance and Face Validity

Perceptions. Journal of Applied Psychology, 82, 143-159.

Diversity on the Force: Where Police Don’t Mirror Communities; A Governing Special Report

(September, 2015). Washington: Governing.

Finch, B. (2015). Mayor Bill Finch announcing department will hire 100 new cops over next 18

to 24 months. Published on Jul 27, 2015. Downloaded 2/17/2015 from

https://www.youtube.com/watch?v=X3tvOPOtF-0&feature=em-share_video_user

Kaufman, J. C. (2006). Self-Reported Differences in Creativity by Ethnicity and Gender. Applied

Cognitive Psychology, 20, 1065–1082.

Kaufman, J. C. (2010). Using Creativity to Reduce Ethnic Bias in College Admissions. Review

of General Psychology, 14, 189–203.

McCarthy, J. M., van Iddekinge, C. H. & Campion, M. A. (2010). Are Highly Structured Job

Interviews Resistant to Demographic Similarity Effects? Personnel Psychology, 63,

325–359.

McFarland, L. A., Marie Ryan, A. M., Sacco, J. M. & Kriska, S. D. (2004) Examination of

Structured Interview Ratings Across Time: The Effects of Applicant Race, Rater Race,

and Panel Composition. Journal of Management, 30, 435-452.

McDaniel, M. A., Whetzel, D. L., Hartman, N. S., Nguyen, N. & Gnibb, W. L. (2006).

Situational judgement tests: Validity and an integrative model. In R. Ployhart & J.

Weekley (Eds.) Situational Judgment Tests: Theory Measurement, and Application,

Jossey-Bass.

Ployhart, R. E., & Holtz, B. C. (2008). The diversity-validity dilemma: Strategies for reducing

racioethnic and sex subgroup differences and adverse impact in selection. Personnel

Psychology, 61, 153-172.

Roth, P. L. & BeVier, C. A., Switzer III, F. S. & Schippmann, J. S. (1996). Meta-Analyzing the

Relationship Between Grades and Job Performance. Journal of Applied Psychology, 81,

548-556.

Sacket, P. R. & Ellingson, J. E. (1997). The Effects of Forming Multi-Predictor Composites on

Group Differences and Adverse Impact. Personnel Psychology, 50, 707-721.

Schmitt, N., Keeney, J. & Oswald, F. L. (2009). Prediction of 4-Year College Student

Performance Using Cognitive and Noncognitive Predictors and the Impact on

Demographic Status of Admitted Students. Journal of Applied Psychology, 94, 1479–

1497.

White, L. A., Young, M. C., Heggestad, E. D., Stark, S., Drasgow, F. & Piskator, G. (2004).

Development of a Non-High School Diploma Graduate Pre-enlistment Screening Model

to Enhance the Future Force. Arlington, VA: U.S. Army Research Institute for the

Behavioral and Social Sciences.

Wiesen, J. P. (2016, November). Tools to Increase Diversity and Validity in Hiring Police

Officers. The Personnel Testing Council of Metropolitan Washington Newsletter, XII (3),

4-11.

Wiesen, J. P. (2017a, March). Tools to Increase Diversity and Validity in Hiring Police Officers

– Part II. The Personnel Testing Council of Metropolitan Washington Newsletter, XII(4),

6-15.

Wiesen, J. P. (2017b, July). Tools to Increase Diversity and Validity in Hiring Police Officers –

Part III. The Personnel Testing Council of Metropolitan Washington Newsletter, XIII (1),

6-17.

Wiesen, J. P. (2017c). Quantitative Considerations in Balancing Validity, Utility, Fairness, and

Adverse Impact. Presentation at the 2017 Conference of the International Personnel

Assessment Council, Birmingham, AL, July 19, 2016.

Wiesen, J. P. & Aguinis, H. (2010). New Methods for Reducing Adverse Impact and Preserving Validity. Symposium presentation at the 25th Annual Convention Society for Industrial and Organizational Psychology, Atlanta, GA.

appliedpersonnelresearch.comappliedpersonnelresearch.com/papers/Wiesen_Master... · Web viewThis...

Documents

Transcript of appliedpersonnelresearch.comappliedpersonnelresearch.com/papers/Wiesen_Master... · Web viewThis...