Dep ed order 3, s 2016 ph hiring guidelines for shs teacher applicants for sy 2016-2017
appliedpersonnelresearch.comappliedpersonnelresearch.com/papers/Wiesen_Master... · Web viewThis...
Transcript of appliedpersonnelresearch.comappliedpersonnelresearch.com/papers/Wiesen_Master... · Web viewThis...
1MASTER TUTORIALTITLE Tools to Increase Diversity, Utility, and Validity in Hiring Police Officers
SHORTENED TITLETools to Increase Diversity, Utility, and Validity in Hiring Police Officers
ABSTRACTMany police managers are stymied in their attempts to hire black police officers due to the pervasive adverse impact that traditional employment tests have on black candidates. This tutorial presents 15 tools (most novel or little-used) to help police departments (PDs) hire ethnically diverse academy classes while maintaining and even enhancing expected job performance.
PRESS PARAGRAPH Many police managers are stymied in their attempts to hire black police officers due to the pervasive adverse impact that traditional employment tests have on black candidates. This tutorial presents 15 tools (most novel or little-used) to help police departments hire ethnically diverse academy classes while maintaining and even enhancing expected job performance. Real life examples are provided to illustrate the use of these tools. The tools are described in enough detail to allow other consultants to use them.
WORD COUNT 2575
LEARNING OBJECTIVES1. Describe the conditions under which a relatively low validity test (e.g., r=.15) is expected
to have higher utility than a higher validity test (e.g., r=.25).2. Describe at least 2 test areas and 2 test modes that generally have reverse impact on
minorities.3. Describe the pros and cons of using tests of g on a pass-fail (P/F) basis.4. Explain why ranking even in part on a traditional test of g generally results in adverse
impact on minority candidates.
INTENDED AUDIENCEThis is intended for an audience schooled and experienced in employee selection testing, but all doctorate-level I/O psychologists should be able to understand much of the material.
BIO FOR JOEL P. WIESENDr. Wiesen serves as Director of his own firm, Applied Personnel Research. He specializes in employee selection testing. In 1975, he was hired by Massachusetts to validate its civil service examinations. Since 1993, he has consulted on employee testing and developed written tests for government and business. He is approved by the American Psychological Association to sponsor continuing education for psychologists. He has served as an expert on testing matters for two state Attorneys General, the US Department of Justice, other governmental entities, private law
firms, and other organizations. He received a doctorate in psychology from Lehigh University and is licensed as a psychologist in three states.
Introduction
Many police managers are stymied in their attempts to hire black police officers due to
the pervasive adverse impact that traditional employment tests have on black candidates. This
tutorial presents 21 tools (most novel or little-used) to help police departments (PDs) hire
ethnically diverse academy classes while maintaining and even enhancing expected job
performance. This tutorial focuses mainly on the hiring of black police applicants.
The Main Cause for Adverse Impact on Minority Applicants
The main reason for the frequent failure to hire a diverse workforce is the adverse impact
on minority applicants due to non-zero weighting of traditional multiple choice (M/C) tests.
Whatever the causes of the mean black-white (B-W) and Hispanic-white differences in mean test
scores, the result is clear: ranking based on M/C tests of g almost always results in adverse
impact on black and Hispanic candidate groups.
Brief Description of Tools
The tools presented fall into five general categories:
� Administrative
� Use a traditional M/C test in a P/F (P/F) fashion
� Provide substitutes for the P/F M/C test
� Replace the traditional M/C test
� Rank based on job-related abilities with little, no, or reverse impact
#1: Empower the PD to Evaluate Proposed Selection Systems
PDs should ask their testing consultants to provide projections of the outcomes of each
proposed selection system, based on plausible assumptions of the numbers of minority and white
applicants and the number of openings to be filled. Five projections should be provided: (1)
expected number of minority candidates who will be eligible to be hired (within the “top group”
or the “appointable range”), (2) expected cost of testing, (3) expected level of job performance,
(4) expected level of adverse impact, and (5) expected level of validity. Decisions about possible
tradeoffs between levels of adverse impact and job performance may best be made by the hiring
officials rather than testing experts.
#2: Shorten the Application Period
Simply allowing fewer applicants will typically increase the proportion of minority
applicants hired. With large applicant groups and small proportions of applicants selected,
typically there will be severe adverse impact (e.g., Sackett & Ellingson, 1997, Table 1. and pg.
711, par. 2). This approach does not affect validity.
#3: Use Residency Points to Increase Hiring of Residents of the Jurisdiction
Residency points will be helpful where the jurisdiction is largely minority and the
suburbs are largely white. (Such preference is not uncommon for non-racial reasons.) Often the
suburban school systems are far better funded, so having students compete on academic
achievement may be seen as unfair.
#4: Traditional Multiple-Choice Test Used P/F
Use a traditional M/C, entry-level test on a P/F basis, with a realistic, job-related passing
point. Even weighting the traditional M/C test at much less than 50% is likely to result in severe
adverse impact in hiring (Sackett & Ellingson, 1997, Table 1, pg. 710). This is a major reason
why many PDs that have moved away from total reliance on the traditional M/C test still have
difficulty hiring a diverse workforce (e.g., Diversity on the Force, 2015). By setting a suitable
passing point on a traditional M/C test and ranking on test components that have little or no
adverse impact, we can expect the new hires to have sufficient intelligence and have the requisite
personal and interpersonal skills and other abilities to do the work, yet with less adverse impact.
This tool is contrary to some apparently obvious implications of meta-analysis
research yet there are many strong arguments for using P/F g, if at all. First, some PDs require a
college degree, vitiating the validity of g among the applicants. Second, the validity of g for
police may be lower than for other jobs (r=.24, Aamodt, 2004). Third, the meta-analytic findings
in favor of g are tempered by other research. For example, the correlation between leadership
and intelligence is dramatically higher for observational than for paper and pencil (M/C or short
answer) measures of intelligence, .60 vs .19 (Judge, Colbert & Ilies, 2004, Table 2). Fourth, the
B-W mean difference in job performance on many jobs is only .5 sd, while the difference in test
performance often is 1 s.d. The regression formula: y = r*x (where y is job performance, x is test
performance, would only explain the .5 versus 1 sd difference in job and test performance if
employers hired randomly or from the whole range of test performance, which is not the case.
This suggests that the job criteria may be biased. Research suggests such bias. Studies of salaries
show: tall people are paid more than short (Judge & Cable, 2004), men are paid more than
women (Hegewisch, Williams & Henderson, 2011), and physically attractive people of
both genders are paid more than unattractive (Marlowe, Schneider & Nelson,
1996). Further, claims of lack of differential validity have been challenged (e.g., Aguinis,
Culpepper & Pierce, 2010 and 2016).
#5: Allow Alternatives for a Pass-Fail Test of g
An honorable, three-year military discharge, a college degree, a sufficient HS rank, or a
high score on an oral exam might be serviceable substitutes for passing a traditional M/C test.
The military uses cognitive ability tests as a selection tool, so additional testing is redundant.
Honorable discharge indicates the ability to learn the procedures of a paramilitary organization
and a military work specialty, and a willingness to follow orders. The restriction of range in g
due to requiring a college degree causes an r of .27 to shrink to .15. Allowing a g test with
modest validity to drive severe adverse impact is hard to defend.
Replace the Traditional M/C Test with Another Measure
Rank on job-related measures other than a traditional M/C test. Some such measures are
briefly described as separate tools below.
As to measuring cognitive ability, the main area that traditional M/C tests measure, not
all measures are interchangeable (Hough, Oswald & Ployhart, 2001; Lang, Kersting & Lang,
2010; Schmitt, 2014). For example, the correlation between leadership and intelligence is
dramatically higher for observational than paper and pencil (M/C or short answer) measures of
intelligence, .60 vs .19 (Judge, Colbert & Ilies, 2004, Table 2). This supports the use of measures
of cognitive ability other than M/C tests. Importantly, different aspects of cognitive ability have
different size B-W differences (Wee, Neuman & Joseph, 2014), suggesting that we carefully
select the aspects of cognitive ability to measure.
#6: Rank Candidates Based on HS Rank
HS rank is a good measure of future academic performance (e.g., Schmitt, Keeney,
Oswald, 2009, Tables 4A and 4B) and job performance (Roth, BeVier, Switzer III &
Schippmann, 1996). High school grades and SAT scores are approximately equally good
predictors of college grades (e.g., r = .37 vs .33 for cumulative college GPA; Berry & Sackett,
2009, Table 1). The SAT is a well-funded, thoroughly researched test. A locally developed
employment test would be unlikely to measure the readiness to learn as accurately as the SAT.
Use of HS rank will tend to promote diversity, especially when there is at least one school with a
predominantly black student body.
#7: Rank Based on a Structured Oral Exam
The structured oral exam is the most valid selection technique (r = .57, Aamodt, 2016,
Table 5.2) and typically shows relatively little adverse impact (Levashina, Hartwell, Morgeson
& Campion, 2014, Table 3, page 254). Oral exams can measure intelligence with more realistic
and complex questions than are possible on a M/C test. Oral exams can measure other job-
related abilities as well, such as: oral communication, interpersonal relations, unstructured
problem solving, and creativity (characteristics not measurable with a traditional M/C test; e.g.,
Huffcutt, Conway, Roth & Stone, 2001, Table; Cascio & Aguinis, 2011, pg. 268, par. 4).
#8: Rank Based on Video Testing
Video-based tests generally result in less adverse impact on black applicants than
traditional written M/C tests (e.g., Chan & Schmitt, 1996; Lievens & Sackett, 2006). Video-
based exams can present questions and situations that are much richer in detail and are more
realistic (video versus written or oral descriptions) as compared to other testing modes and can
measure abilities that are not measurable with written M/C tests.
#9: Rank Based on Each Candidate’s Highest Score
From those applicants who achieve a job-related passing score on each of several
measures (e.g., interpersonal, intelligence), rank applicants based on their highest score on any
one ability or their two highest scores. This approach maintains validity and reduces adverse
impact (Wiesen & Aguinis, 2010).
#10: Rank Based on the Memory of Faces that Mirror the Community
Remembering and identifying minority faces is easier for members of that minority group
(e.g., Levin, 2000). So, we expect a measure of this ability to have reverse impact. Being able to
remember faces of the type found in the community is likely job-related.
#11: Rank Based on Physical Fitness
Measures of physical fitness sometimes (not always) have little or no impact on black
applicants. Police officers perform many physically demanding job tasks (e.g., Shetterly &
Krishnamoorthy, 2008).
#12: Rank Based on a Measure of Prejudice
Tests of freedom from prejudice tend to favor minority applicants (Axt, Ebersole, &
Nosek, 2014). This type of test has been recommended for predicting discrimination in police
stops and searches (Greenwald, Banaji & Nosek 2014, p. 557 col 1, last par).
#13: Rank Based on Other Relevant Personality Measures
Some personality measures have relatively little or no impact on black applicants and are
valid for a wide range of jobs (e.g., extroversion, emotional stability; Ployhart & Holtz, 2008,
Table 1). Aspects of conscientiousness are particularly predictive for certain jobs (Dudley,
Orvis, Lebiecki & Cortina, 2006).
#14: Rank Based on Creativity and Problem Solving
Some tests of creativity show small or no B-W differences (e.g., Kaufman, 2006, page
1066, par 2, Kaufman, 2010, abstract). Further, there are non-cognitive aspects of problem
solving. Openness and intelligence have equal correlations with quality of problem solving
(Jauk, Benedek, Dunst & Neubauer, 2013, Table 1). Thinking biases may be independent of
cognitive ability in general and be stronger in more intelligent people (Stanovich & West, 2008).
Additionally, some new measures of intelligence show smaller d's than traditional g tests (e.g,
Agnello, Ryan & Yusko, 2015, page 52, par 5).
#15: Focus on Utility Rather than Validity
A relatively low validity test (e.g., personality with r=.15) can have higher utility than a
more validity test (e.g., g with r=.25) in situations often encountered (Wiesen, 2017c).
Principles for Using Two or More Tools Together
Brief Evaluation of the Legality of the Tools
No room to describe these in this summary, but they will be covered.
Real Life Examples
Bridgeport, CT
A goal for Bridgeport’s 2014 Police Officer entrance exam was to “Reduce disparity levels
between protected classes while also increasing validity”, but the approach offered by the City’s
consultant would use compensatory scoring of cognitive and behavioral components, despite the
(unmentioned) severe adverse impact. I suggested the consultant project the expected adverse
impact in numeric terms. I suggested changes which were adopted: P/F use of the g test, more
residency points (the City chose 15), and ranking based on an oral exam (Bridgeportct.gov,
2015). The Mayor of Bridgeport was exuberant in his praise for the resulting hiring, saying:
“...61 percent are candidates in the protected class of minorities or women; 61 percent is a
number we have never had before. ... we couldn’t have been prouder of this process. This did not
happen on its own; we made significant changes to the process” (Finch, 2015).
Oklahoma City, OK
Some 30 years ago, after abandoning affirmative action hiring, two consecutive firefighter
entrance exams resulted in hiring some 200 white and zero black firefighters, despite
blacks comprising 20% of applicants. The City, Fire Department, and black fraternal
organization were extremely concerned. The Fire Department used some of the tools described
here. This reduced resulted in hiring a diverse class of recruits with no legal challenges. The
system included:
- P/F M/C cognitive ability test
- P/F written workstyle test
- Ranked oral exam
- P/F 40 hour, First Responder course
- P/F background check
- P/F physical ability test
- P/F medical
The mean academy grade and dropout rate were similar to previous classes. The mean
quarterly recruit evaluation was high. All the new firefighters became certified EMTs. This
selection system was replicated in 1996 (Wiesen, 1996, 1997, 1998).
US Army
This screening program would not have adverse impact on minorities at any cut score
(White, Young, Heggestad, Stark, Drasgow & Piskator, 2004, page 4, col 2, par 3).
NYC 2012 Firefighter Exam
A exam developed by nine I/O psychologists was projected to not have adverse impact
for any of the four years the test would be used (PSI Services, 2012).
Columbus, OH Police Officer Exam
The City’s police officer examination includes a M/C cognitive ability test, a writing
exercise, and a physical fitness test, and ranks candidates based on a video-based test of problem
solving and interpersonal relations (Columbus Civil Service Commission, 2016). This approach
has enabled the City to present a diverse candidate pool to the background investigation process.
References
Aamodt, M. G. (2016). Industrial-Organizational Psychology: An Applied Approach (8th ed.)
Boston: Cengage Learning.
Cascio, W. F. & Aguinis, H. (2011). Applied Psychology in Human Resource Management (7th
ed.) Upper Saddle River, NJ: Prentice Hall.
Chan & Schmitt (1997). Video-Based Versus Paper-and-Pencil Method of Assessment in
Situational Judgment Tests: Subgroup Differences in Test Performance and Face Validity
Perceptions. Journal of Applied Psychology, 82, 143-159.
Diversity on the Force: Where Police Don’t Mirror Communities; A Governing Special Report
(September, 2015). Washington: Governing.
Finch, B. (2015). Mayor Bill Finch announcing department will hire 100 new cops over next 18
to 24 months. Published on Jul 27, 2015. Downloaded 2/17/2015 from
https://www.youtube.com/watch?v=X3tvOPOtF-0&feature=em-share_video_user
Kaufman, J. C. (2006). Self-Reported Differences in Creativity by Ethnicity and Gender. Applied
Cognitive Psychology, 20, 1065–1082.
Kaufman, J. C. (2010). Using Creativity to Reduce Ethnic Bias in College Admissions. Review
of General Psychology, 14, 189–203.
McCarthy, J. M., van Iddekinge, C. H. & Campion, M. A. (2010). Are Highly Structured Job
Interviews Resistant to Demographic Similarity Effects? Personnel Psychology, 63,
325–359.
McFarland, L. A., Marie Ryan, A. M., Sacco, J. M. & Kriska, S. D. (2004) Examination of
Structured Interview Ratings Across Time: The Effects of Applicant Race, Rater Race,
and Panel Composition. Journal of Management, 30, 435-452.
McDaniel, M. A., Whetzel, D. L., Hartman, N. S., Nguyen, N. & Gnibb, W. L. (2006).
Situational judgement tests: Validity and an integrative model. In R. Ployhart & J.
Weekley (Eds.) Situational Judgment Tests: Theory Measurement, and Application,
Jossey-Bass.
Ployhart, R. E., & Holtz, B. C. (2008). The diversity-validity dilemma: Strategies for reducing
racioethnic and sex subgroup differences and adverse impact in selection. Personnel
Psychology, 61, 153-172.
Roth, P. L. & BeVier, C. A., Switzer III, F. S. & Schippmann, J. S. (1996). Meta-Analyzing the
Relationship Between Grades and Job Performance. Journal of Applied Psychology, 81,
548-556.
Sacket, P. R. & Ellingson, J. E. (1997). The Effects of Forming Multi-Predictor Composites on
Group Differences and Adverse Impact. Personnel Psychology, 50, 707-721.
Schmitt, N., Keeney, J. & Oswald, F. L. (2009). Prediction of 4-Year College Student
Performance Using Cognitive and Noncognitive Predictors and the Impact on
Demographic Status of Admitted Students. Journal of Applied Psychology, 94, 1479–
1497.
White, L. A., Young, M. C., Heggestad, E. D., Stark, S., Drasgow, F. & Piskator, G. (2004).
Development of a Non-High School Diploma Graduate Pre-enlistment Screening Model
to Enhance the Future Force. Arlington, VA: U.S. Army Research Institute for the
Behavioral and Social Sciences.
Wiesen, J. P. (2016, November). Tools to Increase Diversity and Validity in Hiring Police
Officers. The Personnel Testing Council of Metropolitan Washington Newsletter, XII (3),
4-11.
Wiesen, J. P. (2017a, March). Tools to Increase Diversity and Validity in Hiring Police Officers
– Part II. The Personnel Testing Council of Metropolitan Washington Newsletter, XII(4),
6-15.
Wiesen, J. P. (2017b, July). Tools to Increase Diversity and Validity in Hiring Police Officers –
Part III. The Personnel Testing Council of Metropolitan Washington Newsletter, XIII (1),
6-17.
Wiesen, J. P. (2017c). Quantitative Considerations in Balancing Validity, Utility, Fairness, and
Adverse Impact. Presentation at the 2017 Conference of the International Personnel
Assessment Council, Birmingham, AL, July 19, 2016.
Wiesen, J. P. & Aguinis, H. (2010). New Methods for Reducing Adverse Impact and Preserving Validity. Symposium presentation at the 25th Annual Convention Society for Industrial and Organizational Psychology, Atlanta, GA.