
PERSONNEL PSYCHOLOGY 2010, 63, 83–117

SITUATIONAL JUDGMENT TESTS: CONSTRUCTS ASSESSED AND A META-ANALYSIS OF THEIR CRITERION-RELATED VALIDITIES

MICHAEL S. CHRISTIAN
Eller College of Management, University of Arizona

BRYAN D. EDWARDS
William S. Spears School of Business, Oklahoma State University

JILL C. BRADLEY
Craig School of Business, California State University, Fresno

Situational judgment tests (SJTs) are a measurement method that may be designed to assess a variety of constructs. Nevertheless, many studies fail to report the constructs measured by the situational judgment tests in the extant literature. Consequently, a construct-level focus in the situational judgment test literature is lacking, and researchers and practitioners know little about the specific constructs typically measured. Our objective was to extend the efforts of previous researchers (e.g., McDaniel, Hartman, Whetzel, & Grubb, 2007; McDaniel & Nguyen, 2001; Schmitt & Chan, 2006) by highlighting the need for a construct focus in situational judgment test research. We identified and classified the construct domains assessed by situational judgment tests in the literature into a content-based typology. We then conducted a meta-analysis to determine the criterion-related validity of each construct domain and to test for moderators. We found that situational judgment tests most often assess leadership and interpersonal skills and that situational judgment tests measuring teamwork skills and leadership have relatively high validities for overall job performance. Although based on a small number of studies, we found evidence that (a) matching the predictor constructs with criterion facets improved criterion-related validity; and (b) video-based situational judgment tests tended to have stronger criterion-related validity than pencil-and-paper situational judgment tests, holding constructs constant. Implications for practice and research are discussed.

Authors’ Note. This paper is based in part on the master’s thesis of Michael S. Christian, which was chaired by Bryan D. Edwards. An earlier version of this paper was presented at the 2007 Annual Conference of the Society for Industrial and Organizational Psychology.

We are grateful to Winfred Arthur, Jr., Ronald Landis, Michael Burke, and Filip Lievens for reviewing previous drafts of this article and providing valuable suggestions. We also thank Michael McDaniel, Phillip Bobko, and Edgar Kausel for their helpful comments and suggestions on this project. Finally, we acknowledge the work of Jessica Siegel, Helen Terry, and Adela Garza.

Correspondence and requests for reprints should be addressed to Michael S. Christian, Eller College of Management, University of Arizona, Department of Management and Organizations, McClelland Hall, PO Box 210108, Tucson, AZ 85721-0108; [email protected].

© 2010 Wiley Periodicals, Inc.



Situational judgment tests (SJTs) have a long history of use for employee selection (e.g., File, 1945; File & Remmers, 1971; Motowidlo, Dunnette, & Carter, 1990; Weekley & Jones, 1997, 1999). An SJT is a measurement method typically composed of job-related situations or scenarios that describe a dilemma or problem requiring the application of relevant knowledge, skills, abilities, or other characteristics (KSAOs) to solve. SJT items may be presented in written, verbal, video-based, or computer-based formats (e.g., Clevenger, Pereira, Wiechmann, Schmitt, & Schmidt-Harvey, 2001; Motowidlo et al., 1990), and usually contain options representing alternative courses of action from which the test taker chooses the most appropriate response.

Despite the widespread use of SJTs, even a cursory review of the literature reveals that test developers and researchers often give little attention to the constructs measured by SJTs and instead tend to report results based on overall (or composite) SJT scores. Nevertheless, because SJTs are measurement methods, understanding the constructs a given SJT measures is vitally important for interpreting its psychometric properties such as reliability, validity, and subgroup differences (e.g., Arthur & Villado, 2008; Lievens & Sackett, 2007; McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001; Motowidlo et al., 1990; Schmitt & Chan, 2006; Smith & McDaniel, 1998). Hence, to understand how and why SJTs work in a selection context, there is a critical need for the identification of the constructs typically assessed using SJTs. This need has been highlighted recently by researchers who have called for an increased research focus on the constructs being measured by predictors of job performance (e.g., Arthur & Villado, 2008; Hough & Ones, 2001; McDaniel et al., 2001; McDaniel, Hartman, Whetzel, & Grubb, 2007; Ployhart, 2006; Roth, Bobko, McFarland, & Buster, 2008). Indeed, there has been “virtually no direct investigation of the relationships linking SJT scores and test content” (Schmitt & Chan, 2006, p. 147). This is a critical oversight because test content is an important consideration in establishing construct validity (AERA, APA, & NCME, 1999; Binning & Barrett, 1989; Schmitt & Chan, 2006) and helps explain why constructs measured by SJTs are related to performance.

Therefore, as noted by others (e.g., Lievens, Buyse, & Sackett, 2005; Ployhart & Ryan, 2000a; Schmitt & Chan, 2006), we feel that SJT research could benefit from developing a more theoretically driven construct-level framework. Hence, the primary objectives of this research were to (a) discuss the advantages of attending to and reporting SJT construct-level versus method-level results; (b) develop a typology of constructs that have been assessed by SJTs in the extant literature; and (c) undertake an initial examination of the criterion-related and incremental validity of the identified constructs and to investigate moderators of these validities.

Advantages of a Construct-Based Approach

The construct-based approach refers to the practice of evaluating and describing the properties of SJTs in terms of the constructs measured (i.e., test content) rather than in terms of the method of measurement. The lack of attention to SJT constructs is arguably the result of the way in which SJTs are typically developed, applied, and reported in the literature (Arthur & Villado, 2008; Schmitt & Chan, 2006). Many selection tests are construct centered in that they are designed to measure a specific construct (e.g., cognitive ability, conscientiousness, integrity) and are therefore labeled based on the constructs that they measure (e.g., cognitive ability test). In contrast, predictors such as interviews, work samples, and SJTs are often described in method-based terms and are frequently developed using a job-centered approach in which the tests are designed to simulate aspects of the work itself rather than measure a specific predictor construct (Roth et al., 2008). In most selection contexts, simulation-based tests are developed to closely match job performance, which is reflected in the “sample” approach to measurement, where tests are developed to have isomorphism with the performance domain (Binning & Barrett, 1989; Wernimont & Campbell, 1968). Therefore, SJTs often are designed by collecting critical incidents of job performance in a particular setting and in doing so tap a number of predictor constructs simultaneously (e.g., Chan & Schmitt, 1997; McDaniel et al., 2001; McDaniel & Nguyen, 2001; Motowidlo & Tippins, 1993). Furthermore, many studies either fail to report the constructs measured by SJTs (e.g., Chan, 2002; Cucina, Vasilopoulos, & Leaman, 2003; Dicken & Black, 1965; McDaniel, Yost, Ludwick, Hense, & Hartman, 2004; Pereira & Harvey, 1999) or simply report composite method-level scores rather than scores for the specific constructs (e.g., Chan & Schmitt, 2002; Motowidlo et al., 1990; Smith & McDaniel, 1998; Swander, 2000; Weekley & Jones, 1997, 1999).

Reporting results at the construct level offers theoretical and practical advantages. First, from a theoretical perspective, the goal should not just be to show that a measure predicts job performance but also why that measure or construct predicts job performance (Arthur & Villado, 2008; Messick, 1995). Hence, identifying the constructs measured by selection tests such as SJTs is important for theory testing and understanding why a given test is or is not related to the criterion of interest. Second, a focus on reporting constructs also allows researchers to make more precise comparisons between various selection methods. The lack of attention to constructs in the extant SJT literature leads to scores that are difficult to compare to scores derived from other selection methods or constructs (e.g., Arthur & Villado, 2008). For example, empirical investigations comparing the criterion-related validity, incremental validity (e.g., Clevenger et al., 2001), and subgroup differences¹ (e.g., Pulakos & Schmitt, 1996; Sackett, Schmitt, Ellingson, & Kabin, 2001) of SJTs to other predictor measures are difficult to interpret without specifying the construct(s) measured (Arthur, Day, McNelly, & Edens, 2003; Arthur & Villado, 2008). Likewise, comparisons of predictive validity between different SJT formats (e.g., pencil and paper, video-based) are more meaningful when constructs are held constant, as we detail later.

¹A recent article by Whetzel, McDaniel, and Nguyen (2008) provides an interesting alternative to examining the degree to which subgroup differences are affected by variance attributed to cognitive ability. Using vector analysis, they show that as an SJT’s correlation with cognitive ability increases, standardized mean race differences on the SJT increase. However, this approach is still conducted at the method level using composite scores for SJTs, whereas we focused more on a construct-based approach.

Third, specification of the KSAOs measured by SJTs helps to reduce contamination in test scores resulting from the measurement of unintended, non-job-relevant constructs. Fourth, in terms of job relevancy, a compelling argument must be made that the validity evidence available for the measure justifies its interpretation and use (Messick, 1995). Fifth, when practitioners and researchers are uncertain of the reason a test predicts a particular outcome, their ability to generalize findings across time and context is hindered. The construct-based approach allows for the development of SJTs that can be used to predict performance across many different jobs. In contrast, it would be difficult to transport SJTs across contexts if validity data are reported only at the composite or method level. Finally, by identifying the constructs measured by SJTs, practitioners can enhance predictive validity by theoretically matching predictor and criterion constructs (Bartram, 2005; Hogan & Holland, 2003; Mohammed, Mathieu, & Bartlett, 2002; Moon, 2001; Paunonen, Rothstein, & Jackson, 1999).

The Current State of Knowledge About the Constructs Typically Measured Using SJTs

In spite of (or perhaps because of) the lack of attention in many primary studies to constructs, some researchers have speculated about the constructs measured by SJTs. For instance, Sternberg and Wagner (1993) posited that SJTs measure tacit knowledge, whereas Schmidt and Hunter (1993) argued that they primarily measure job knowledge, and McDaniel and Nguyen (2001) suggested that some SJTs may predominantly measure cognitive ability and personality. More recently, Schmitt and Chan (2006) argued that SJTs measure constructs like adaptability and contextual knowledge. Nevertheless, with the exception of the work by McDaniel and colleagues, we found little compelling empirical evidence for these suppositions. In a series of meta-analytic studies, McDaniel and colleagues assessed the construct saturation of SJTs and indicated that SJTs measure cognitive ability (Mρ = .33–.46), Agreeableness (Mρ = .27–.31), Conscientiousness (Mρ = .25–.31), Emotional Stability (Mρ = .26–.30), Extraversion (Mρ = .30), and Openness (Mρ = .13; McDaniel et al., 2001, 2007; McDaniel & Nguyen, 2001). These studies provide critical information by identifying a set of constructs typically associated with SJTs from which to build a more comprehensive typology.

This research extends McDaniel and colleagues’ work by taking an alternative methodological approach. Specifically, we broadened the list of constructs generated by McDaniel and colleagues by using content analysis to investigate additional constructs such as leadership, social skills, and job knowledge. As noted by Schmitt and Chan (2006), this approach will help to reveal whether SJTs, as a measurement method, inherently tap certain constructs or whether SJT content can be modified to assess these constructs to a greater or lesser extent. Further, this approach allows for holding predictor constructs constant, which facilitates comparisons of criterion-related validity between SJTs and other predictor methods. Hence, as we explain, our approach was to identify primary studies that do report construct or content information in order to develop a typology of the constructs typically assessed by SJTs.

Identifying the Constructs Assessed by SJTs

Our approach to understanding which constructs are typically measured by SJTs is consistent with research investigating other job-centered selection methods such as interviews (Huffcutt, Conway, Roth, & Stone, 2001), assessment centers (Arthur et al., 2003), and work samples (Roth et al., 2008), which often share the same construct/method confound as SJTs. We followed the suggested steps of Huffcutt et al. (2001), which involved SJT construct identification, classification, and frequency assessment in addition to collecting and reporting criterion-related validity information for the constructs typically assessed by SJTs.

We reviewed the selection literature for existing typologies and found the work of Huffcutt et al. (2001) to be the most suitable construct classification framework for SJTs. The construct categories in their typology include mental capability, knowledge and skills, basic personality tendencies, applied social skills, interests and preferences, organizational fit, and physical attributes. We chose this framework for several reasons.


First, Huffcutt et al.’s typology provided an adequate summary of the primary psychological characteristics that could be measured using SJTs. For example, McDaniel and colleagues have shown that SJTs measure cognitive ability and personality (McDaniel et al., 2001, 2007; McDaniel & Nguyen, 2001; Whetzel et al., 2008), indicating that these constructs are often deliberately assessed using SJTs. Further, the SJT method enables the presentation of complicated social situations rich in contextual details. For this reason, we posited that SJTs would frequently be developed to measure applied social skills such as interpersonal skills, teamwork skills, and leadership. In addition, this typology has been used as a framework for classifying the constructs assessed by other job-centered, method-based predictors such as employment interviews (Huffcutt et al., 2001) and work sample tests (Roth et al., 2008).

Adding to our rationale for the typology, SJTs share basic similarities with interviews and work samples in that they are methods that can be designed to tap a variety of job-relevant constructs. Further, these methods often measure constructs embedded within “clusters” of KSAOs designed to sample particular work-related characteristics (Huffcutt et al., 2001; Roth et al., 2008). Given that SJTs are typically composed of job-related scenarios designed to simulate the job, the scenarios may often measure multiple constructs because most job behaviors require multiple KSAOs. Therefore, we followed the lead of Roth et al. (2008) and analyzed the SJT content in terms of its saturation with predominant higher-order construct domains.²

²Although Huffcutt et al. (2001) coded at the dimension level in their meta-analysis of interviews, we followed the lead of Roth et al. (2008) and coded studies at the test level. This was because virtually no studies of SJTs included scores at the dimension level.

The idea of construct saturation is useful because job-centered methods often do not “cleanly” assess one specific construct. Saturation refers to the extent to which a given construct influences (or saturates) complex measures like SJTs. Therefore, when the reported constructs for a given SJT were homogeneous relative to a particular construct domain, we considered the SJT “saturated” with this domain. For example, Weekley and Jones (1999) developed an SJT measuring coworker interaction skills, customer interaction skills, and loss-prevention behaviors in customer service, which are constructs related to interpersonal skills. Although the situational item content referenced incidental job-specific constructs, all items described interpersonal interactions; therefore, this SJT was clearly saturated with the construct domain interpersonal skills.
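To make the saturation judgment concrete, the following is a minimal sketch of how such a homogeneity check could be expressed, assuming a lookup from construct labels to domains. It is our illustration, not the authors’ actual coding procedure, and the lookup table is a hypothetical fragment rather than the study’s full coding scheme.

```python
# Illustrative sketch: an SJT is treated as "saturated" with a construct
# domain when every construct label reported for it maps into that single
# domain. The label-to-domain table below is a hypothetical fragment.

DOMAIN_OF = {
    "coworker interaction skills": "interpersonal skills",
    "customer interaction skills": "interpersonal skills",
    "loss-prevention behaviors": "interpersonal skills",
    "teamwork ksas": "teamwork skills",
    "handling employee problems": "leadership",
}

def saturated_domain(reported_constructs):
    """Return the single domain all reported constructs fall into, else None."""
    domains = {DOMAIN_OF.get(label.lower()) for label in reported_constructs}
    if len(domains) == 1 and None not in domains:
        return domains.pop()
    return None  # heterogeneous or unclassifiable composite

# The Weekley and Jones (1999) example from the text:
print(saturated_domain([
    "Coworker interaction skills",
    "Customer interaction skills",
    "Loss-prevention behaviors",
]))  # -> "interpersonal skills"
```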

Next, we conducted a comprehensive review of the SJT literature to identify the constructs and construct domains researchers reported measuring (see Table 1). We conceptualized construct categories as the highest-order classification, with construct domains falling under the categories and constructs falling under the construct domains. Following this conceptualization, we documented the extent to which researchers reported the constructs assessed and how commonly SJTs in the literature measured specific construct domains. Finally, we calculated initial meta-analytic estimates of the criterion-related validity of each construct domain³ and examined the impact of both a construct-level moderator (i.e., job performance facets) and a method-level moderator (i.e., SJT format) on these validities.

³In order to estimate incremental validity using meta-analysis, one must obtain primary studies that report (a) correlations between more than one SJT construct domain and the criterion, and (b) the intercorrelations among the SJT construct domains. Unfortunately, almost no studies in the extant literature provided the information necessary to perform such analyses, so we were unable to accomplish this goal.

TABLE 1
Typology of SJT Construct Domains

Construct category and domain / Construct                       k    % of total

Total                                                           136  100.00

Knowledge and skills
  Job knowledge and skills                                      4    2.94
    Knowledge of the interrelatedness of units                  1
    Pilot judgment (knowledge content)                          1
    Managing tasks                                              1
    Team role knowledge                                         1

Applied social skills
  Interpersonal skills                                          17   12.50
    Ability to size up personalities                            1
    Customer contact effectiveness                              1
    Customer service interactions                               3
    Guest relations                                             1
    Interactions                                                2
    Interpersonal skills                                        3
    Negotiations                                                1
    Service situations                                          1
    Social intelligence                                         2
    Working effectively with others                             1
    Not specified (interpersonal skills content)                1
  Teamwork skills                                               6    4.41
    Teamwork                                                    3
    Teamwork KSAs                                               3
  Leadership                                                    51   37.50
    Administrative judgment                                     1
    Conflict resolution for managers                            2
    Directing the activities of others                          2
    Handling people                                             3
    General management performance                              2
    Handling employee problems                                  2
    Leadership/supervision                                      4
    Managerial/supervisory skill or judgment                    5
    Managerial situations                                       1
    Supervisor actions dealing with people                      2
    Supervisor job knowledge                                    1
    Supervisor problems                                         1
    Supervisor Profile Questionnaire                            1
    Managing others                                             5
    Not specified (leadership content)                          19

Basic personality tendencies                                    13   9.56
  Personality composites
    Conscientiousness, Agreeableness, Neuroticism               3
    Adaptability, ownership, self-initiative, teamwork,
      integrity, work ethic                                     1
  Conscientiousness
    Conscientiousness                                           7
  Agreeableness
    Agreeableness                                               1
  Neuroticism
    Neuroticism                                                 1

Heterogeneous composites                                        45   33.09

Note. Frequencies represent the number of independent effects.


Moderator Analyses and Hypotheses

Job performance facets. Our critique of research that focuses on composite predictor scores also applies to job performance criteria. The typical practice when conducting meta-analyses of predictors is to calculate validities based on ratings of job performance, collapsing across specific performance dimensions (e.g., Arthur et al., 2003; McDaniel et al., 2001, 2007; Roth et al., 2008). Although the practice of meta-analyzing criterion-related validity using a broad and inclusive criterion is useful in some respects (e.g., Viswesvaran, 2001), partitioning the criterion domain into specific aspects of job performance (e.g., facets) provides clarity about how and why predictor constructs relate to criteria (e.g., Bartram, 2005; Campbell, 1990; Hogan & Holland, 2003; Hurtz & Donovan, 2000; Motowidlo & Van Scotter, 1994; Rotundo & Sackett, 2002; Tett, Jackson, Rothstein, & Reddon, 1999). For example, Lievens and colleagues (2005) found that an SJT measuring interpersonal skills predicted an interpersonal performance criterion, whereas it had no relationship with an academic performance criterion. Indeed, evidence suggests that matching predictor constructs with theoretically appropriate criterion facets may result in stronger validities (Bartram, 2005; Hogan & Holland, 2003; Mohammed et al., 2002; Moon, 2001; Paunonen et al., 1999). Therefore, we examined specific performance facets as a moderator of the criterion-related validity of the constructs measured by the SJTs in our dataset.

In order to partition the job performance criterion, we reviewed a number of potential performance categorization models from the extant literature (see Viswesvaran & Ones, 2000, for a review), and from these we chose as a starting point the higher-order classification scheme suggested by Borman and Motowidlo (1993) of task versus contextual performance. Task performance is a measure of the degree to which individuals can perform the substantive tasks and duties that are central to the job. Contextual performance is not task specific and relates to an individual’s propensity to behave in ways that facilitate the social and psychological context of an organization (Borman & Motowidlo, 1993). Contextual performance consists of interpersonal and motivational components (e.g., Borman & Motowidlo, 1997; Chan & Schmitt, 2002; Rotundo & Sackett, 2002). In addition, we included ratings of managerial performance as another job performance facet. Many SJTs are developed to predict managerial or supervisory performance, which arguably is distinct from task and contextual performance in that many elements of managerial performance involving behaviors typically considered to be contextual (e.g., interpersonal facilitation) are actually core management duties (e.g., Borman & Motowidlo, 1993; Conway, 1999; Scullen, Mount, & Judge, 2003; Witt & Ferris, 2003). Therefore, we defined managerial performance as behaviors central to the job of a manager or leader, including administrative and interpersonal behaviors. In sum, we divided the job performance criterion into three facets⁴: (a) task performance, (b) contextual performance, and (c) managerial performance.

⁴Although it could have been informative to provide information for studies that assessed supervisor ratings of overall global job performance (rather than assessing multiple dimensions and collapsing across them to create composite ratings), we did not find appropriate numbers of studies to facilitate such an analysis.

Partitioning the performance criterion allowed a set of hypotheses to be developed, based on prior research, for the expected magnitudes of the relationships between SJT predictor construct domains and specific criterion facets. As we detail next, we expected that the benefits of the construct-based approach to classifying SJTs would be realized in terms of stronger validities for construct domain-specific SJTs when appropriately matched to criterion facets, compared with weaker validities for heterogeneous composite SJTs correlated with the same facets.

Contextual performance includes a combination of social behaviors (e.g., teamwork, helping, cooperation, and conflict resolution) and motivational behaviors (e.g., effort, initiative, drive; Borman & Motowidlo, 1997; Van Scotter & Motowidlo, 1996). Applied social skills such as interpersonal skills, teamwork skills, and leadership skills should relate to contextual performance ratings to the degree that they reflect the ability to perceive and interpret social dynamics in a way that facilitates judgments regarding the timing and appropriateness of contextual behaviors (Ferris, Witt, & Hochwarter, 2001; Morgeson, Reider, & Campion, 2005; Witt & Ferris, 2003). Interpersonal skills should predict contextual performance because interpersonally oriented individuals will be more likely to perform behaviors involving helping and social facilitation. Likewise, the social awareness associated with teamwork skills should translate into a propensity to perform behaviors that maintain the psychological and social context of organizational teams (Morgeson et al., 2005; Stevens & Campion, 1999). Also, workers who are not managers but have strong leadership skills should have the ability and the motivation (Chan & Drasgow, 2001) to exhibit socially facilitative behaviors such as helping and motivating others. In addition, theoretical models and empirical evidence support a relationship between contextual performance and interpersonal skills (Ferris et al., 2001; Witt & Ferris, 2003), teamwork skills (Morgeson et al., 2005; Stevens & Campion, 1999), and leadership skills (Conway, 1999; Mumford, Zaccaro, Connelly, & Marks, 2000). Hence, we hypothesized:

Hypothesis 1: For contextual performance, SJTs measuring interpersonal skills will have stronger relationships than heterogeneous composite SJTs.

Hypothesis 2: For contextual performance, SJTs measuring teamwork skills will have stronger relationships than heterogeneous composite SJTs.

Hypothesis 3: For contextual performance, SJTs measuring leadership skills will have stronger relationships than heterogeneous composite SJTs.

To the extent that task performance requires an understanding of the skills needed to perform job-specific tasks, it should be related to job knowledge and skills (e.g., Borman, White, & Dorsey, 1995; Kanfer & Ackerman, 1989; Van Scotter & Motowidlo, 1996). Indeed, empirical evidence supports a relationship between task performance and job knowledge (Borman et al., 1995; Schmidt, Hunter, & Outerbridge, 1986). Therefore, we hypothesized:

Hypothesis 4: For task performance, SJTs measuring job knowledge and skills will have stronger relationships than heterogeneous composite SJTs.

Finally, to the extent that managerial performance requires interpersonally oriented behaviors, it should be related to applied social skills such as leadership and interpersonal skills. There is considerable conceptual overlap between SJTs assessing leadership and managerial performance ratings. Indeed, leadership skills such as motivating and managing others, handling people, and directing and structuring subordinate activities are core aspects of management performance (Borman & Brush, 1993; Mumford et al., 2000). In addition, interpersonal skills that are not specific to leadership per se should be related to managerial performance. Effective management involves interpersonal interactions including communication, conflict resolution, and negotiations (Borman & Brush, 1993; Campbell, McCloy, Oppler, & Sager, 1993; Conway, 1999; Katz, 1955; Mumford et al., 2000). As such, managers proficient in many of the interpersonal skills assessed by SJTs will likely be rated favorably on managerial performance. Consistently, empirical evidence supports the relationship between managerial performance and both leadership skills (Connelly et al., 2000; Conway, 1999) and interpersonal skills (Conway, 1999; Scullen et al., 2003). Hence, we hypothesized:

Hypothesis 5: For managerial performance, SJTs measuring leadership skills will have stronger relationships than heterogeneous composite SJTs.

Hypothesis 6: For managerial performance, SJTs measuring interpersonal skills will have stronger relationships than heterogeneous composite SJTs.

SJT format. Conceptually, SJTs can be developed to measure any construct. Realistically, however, some methods lend themselves to the measurement of certain constructs more readily than others. In addition, the nature of our arguments for maintaining the method/construct distinction in SJT research suggests that different SJT formats may be a potential moderator of SJT construct–performance relationships. Indeed, administration method can affect the equivalence of tests developed to measure the same construct (Edwards & Arthur, 2007; Ployhart, Weekley, Holtz, & Kemp, 2003; Sackett et al., 2001; Schmitt, Clause, & Pulakos, 1996).


Therefore, important questions in the SJT literature concern whether different constructs are measured with different test delivery formats (e.g., video-based, paper-and-pencil) and whether test format moderates the criterion-related validity of SJT construct domains (McDaniel, Whetzel, Hartman, Nguyen, & Grubb, 2006; Weekley & Jones, 1997).

Our literature review revealed that SJTs most commonly use paper-and-pencil and video-based formats. Video-based formats are arguably higher in physical and psychological fidelity than paper-and-pencil formats (cf. Bass & Barrett, 1972) because video-based SJTs are more likely to depict ambient contextual details and hence should more realistically reproduce the job performance content domain. Therefore, because higher-fidelity simulations more closely model actual work behaviors than paper-and-pencil tests, video-based SJTs should be more strongly related to actual job performance (McDaniel et al., 2006; Motowidlo et al., 1990; Weekley & Jones, 1997). In addition, video-based SJTs should enhance applicant test perceptions (e.g., face validity), which are positively related to performance (e.g., Chan & Schmitt, 1997; Edwards & Arthur, 2007; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993).

Further, a video-based format is likely to facilitate the measurement of constructs that rely on subtle social cues and complex contextual information, such as applied social skills. These cues can be replicated and transmitted using a video-based SJT more easily than a paper-and-pencil SJT. Therefore, we expect that validities for applied social skills constructs will be higher for video-based SJTs than for paper-and-pencil SJTs. Conversely, constructs such as job knowledge and skills do not typically require such a high level of contextual information, as they rely more on the depiction of task elements than social elements. Therefore, there is likely no difference in validities between video-based and paper-and-pencil SJTs that measure job knowledge and skills. As such, we hypothesized:

Hypothesis 7: For the domains of interpersonal skills, teamwork skills, and leadership skills, video-based SJTs will have stronger relationships with job performance than paper-and-pencil SJTs.

Method

Literature Search

An extensive search of computerized databases (PsycINFO, Social Sciences Citation Index, ABInform, and Google Scholar), along with the reference lists of previous meta-analyses of SJTs, was conducted to identify studies that reported the use of SJTs. In addition, unpublished manuscripts were requested from researchers identified as having presented a relevant paper at the annual conferences of the Academy of Management and the Society for Industrial and Organizational Psychology or having published research on SJTs in 2005–2008. Finally, authors of articles in which the descriptive statistics were not reported in the manuscript were contacted to obtain these data. Based on our search, we obtained 161 manuscripts and articles.

Inclusion Criteria

Studies were included if they used any format of delivery for the SJT (e.g., video based, Web based, computer based, paper and pencil, interview). We omitted studies that measured predictor or performance constructs relevant only to students (e.g., study skills, GPA). After implementing these criteria, we obtained 136 independent data points from 85 studies for the typology, of which 134 independent data points from 84 studies were usable (i.e., provided the data necessary) for the meta-analysis of criterion-related validity.

Coding of Constructs

With respect to recording the information for both the typology and the meta-analysis (e.g., correlations, artifacts, construct information), articles were independently coded by two of the study authors. In cases of disagreement, all three authors went back to the primary study to reach consensus. Initial agreement for the predictor construct coding was 95%. Constructs were recorded at the lowest level of specificity from studies reporting the constructs assessed or providing enough content information for us to determine the constructs measured. Some studies provided lists of the KSAOs measured in addition to broad constructs for which a score was provided (e.g., giving advice and demonstrating empathy were KSAOs bundled under the construct label working effectively with others in one study; O’Connell, McDaniel, Grubb, Hartman, & Lawrence, 2007). We classified SJTs into our typology based on the lowest-level construct for which a score was provided (i.e., score-level constructs). The remainder of this paper refers to the construct labels at the lowest level of specificity for which a score was provided (i.e., score level) as constructs. For example, although O’Connell et al. (2007) reported measuring giving advice and demonstrating empathy, scores were not provided at this level. Instead, a score was provided that was the composite of these two constructs. O’Connell et al. labeled the composite working effectively with others, which was the construct we coded. In the absence of a label, we looked for KSAOs or item-level content to determine the construct.
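As an illustration of the agreement statistic reported above, here is a minimal sketch (with made-up category labels) of how initial percent agreement between two independent coders can be computed before disagreements are resolved by consensus.

```python
# Minimal sketch of the inter-coder agreement check described above
# (hypothetical data; the paper reports 95% initial agreement for
# predictor construct coding).

def percent_agreement(coder_a, coder_b):
    """Proportion of items two coders assigned to the same category."""
    assert len(coder_a) == len(coder_b)
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

coder_a = ["leadership", "interpersonal", "teamwork", "leadership"]
coder_b = ["leadership", "interpersonal", "leadership", "leadership"]
print(f"{percent_agreement(coder_a, coder_b):.0%}")  # 75% -> resolve by consensus
```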

Construct Domains and Construct Typology

We sorted all of the SJT constructs into the construct domains in Huffcutt et al.’s (2001) typology. Of the 136 independent effects in the typology, 36 had different construct labels, which were sorted into the eight domains shown in Table 1. These domains were subsumed under the Huffcutt et al. (2001) construct categories of knowledge and skills, applied social skills, and basic personality tendencies. We found no studies measuring domains subsumed under Huffcutt et al.’s categories of mental ability, interests and preferences, organizational fit, and physical attributes. Finally, we created a fourth category for SJTs that were unclassifiable because they solely reported effect sizes based on composite scores. We labeled the fourth category heterogeneous composites. Although we argued that heterogeneous composites are not theoretically meaningful in a construct-based approach, we reported these data to illustrate the frequency with which composites are used and to compare the criterion-related validity of composites and specific construct domains across the performance dimensions. Our typology and the results of our construct sort are presented in Table 1.

Knowledge and skills. The knowledge and skills category was defined by Huffcutt et al. (2001) as consisting of three potential construct domains: job knowledge and skills, education and training, and work experience. We found one of these three construct domains, job knowledge and skills, to be representative of the constructs that we sorted into this category. The construct domain job knowledge and skills includes constructs that assessed declarative or procedural knowledge. Included within this construct domain, for example, are SJTs designed to assess knowledge of military procedures, knowledge of how units are interrelated in the military, or knowledge of how to prioritize job tasks.

Applied social skills. The applied social skills category contains constructs related to a respondent’s skill in functioning effectively in social situations and interpersonal contexts (Huffcutt et al., 2001). We found three construct domains to be subsumed within this category: interpersonal skills, teamwork skills, and leadership skills. Interpersonal skills were defined as social skills that relate to an individual’s skill in interacting with others. Within this construct domain, we included constructs that measured customer service skills, interaction skills, and negotiations. The second construct domain we identified within the applied social skills category was teamwork skills. Teamwork skills involve skills specific to team settings that may promote team effectiveness. For example, teamwork may involve a combination of collaborative problem solving, coordination, and team planning (e.g., Ellis, Bell, Ployhart, Hollenbeck, & Ilgen, 2005; Morgeson et al., 2005). Applied social skills constructs such as “working effectively with others” were not classified into the teamwork skills category unless they specifically referenced team settings. The third construct domain we identified within the applied social skills category was leadership skills. This domain included SJT constructs designed to assess general management skills, such as leadership, supervision, or administrative skills (e.g., Campbell et al., 1993), as well as more specific leadership skills such as resolving conflicts among subordinates, organizing and structuring work assignments, or handling employee problems.

Basic personality tendencies. A number of SJTs identified in the literature measured personality constructs. Consistent with current research on personality, we conceptualized personality as relatively enduring dispositional factors that relate to how employees act in the workplace. The only personality construct domain for which there was a reasonable number of effects to perform a meta-analysis was Conscientiousness, with a k of seven data points. Four other SJTs that measured personality represented a combination of different personality construct domains, including various composites of Conscientiousness, Agreeableness, Emotional Stability, adaptability, and integrity. We included these in our meta-analysis for illustrative purposes.

Heterogeneous composites. Many researchers provided detailed heterogeneous lists of the constructs measured by an SJT but did not provide construct-level scores. In many cases, a heterogeneous composite score for each SJT was reported, making it impossible to sort these SJTs into the three categories. For example, if an SJT appeared to measure communication skills, Conscientiousness, and leadership ability but only a composite score was reported, we considered the composite score uninterpretable for the purposes of construct-level information. We identified 45 effects that either did not specify the constructs measured by SJTs or collapsed across several constructs and reported a composite score. We included these in Table 1 to document the number of SJTs that did not identify the constructs measured or presented method-level composite scores that were uninterpretable for comparative purposes.

Criterion Types

Consistent with other meta-analyses of predictor methods, in our first set of analyses we combined all performance criteria into an omnibus analysis of the criterion-related validity for each SJT construct domain. We also separated task, contextual, and managerial performance facets into distinct categories using two decision rules. First, we sorted the performance facets using operationalizations consistent with the definitions for task and contextual performance provided by Motowidlo and Van Scotter (1994), Van Scotter and Motowidlo (1996), and Hurtz and Donovan (2000): (a) task performance, or the degree to which an individual can perform tasks that are central to the job (e.g., technical skill, sales performance, use of technical equipment, job duties, or core technical proficiency); (b) contextual performance, or behaviors not formally required by the job, including interpersonal facilitation (e.g., building and mending relationships, cooperation, helping, consideration, interpersonal relations) and job dedication (e.g., motivation, effort, drive, initiative); and (c) managerial performance, or ratings by peers or supervisors of an individual’s behaviors related to leadership, interpersonal management skills, management, or administrative duties that constitute core management responsibilities (Borman & Brush, 1993; Conway, 1999; Scullen et al., 2003).

For studies that did not report the specific performance label, we assessed the extent to which the job description or job title indicated that a particular facet of performance was a core task. As noted by others (e.g., Borman & Motowidlo, 1993; Conway, 1999; Witt & Ferris, 2003), the distinction between facets of performance may be blurred depending on the job context. Therefore, in some cases we used the job title to determine whether ratings were most appropriately labeled as task, contextual, or managerial performance. For example, when an interpersonal criterion was rated in a customer service job (e.g., customer contact skills), we classified it as task performance because we assumed that customer contact skills are core technical requirements of this job. Conversely, an interpersonal criterion assessed in a manufacturing job (e.g., ratings of cooperativeness) was considered contextual performance because we assumed that interpersonal skills are not formally required in many manufacturing jobs. Finally, when an interpersonal criterion that is a core part of a managerial job was used for a management position (e.g., communication skills), we classified it as managerial performance. Initial agreement in coding of the criterion facets (before reaching consensus) was 92%.
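The two decision rules can be summarized as a simple branching procedure. The sketch below is a hypothetical illustration of that logic (the function and its inputs are ours, not the authors’): use the reported performance label when it maps cleanly onto a facet; otherwise infer the facet from whether the rated behavior is a core requirement of the job.

```python
# Hypothetical illustration of the two coding rules described above; the
# labels and job requirements are invented for the example.

def classify_criterion(criterion, reported_facet=None, core_job_requirements=()):
    """Sort a performance rating into task, contextual, or managerial performance."""
    if reported_facet in {"task", "contextual", "managerial"}:
        return reported_facet  # Rule 1: the study reported a classifiable label
    # Rule 2: infer from the job itself
    if "management" in core_job_requirements:
        return "managerial"    # e.g., communication skills rated for managers
    if criterion in core_job_requirements:
        return "task"          # e.g., customer contact skills in customer service
    return "contextual"        # e.g., cooperativeness ratings in manufacturing

print(classify_criterion("customer contact skills",
                         core_job_requirements=("customer contact skills",)))  # task
print(classify_criterion("cooperativeness",
                         core_job_requirements=("machine operation",)))        # contextual
```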

Criterion-Related Validity Analyses

We used meta-analysis (e.g., Hunter & Schmidt, 2004; Raju, Burke, Normand, & Langlois, 1991) to calculate corrected mean population-level estimates of the criterion-related validity of each construct domain. This procedure allows for the correction of effect sizes for measurement error and other statistical artifacts, based on the idea that differences in the results of primary studies are due to statistical artifacts rather than actual differences within the population. For these analyses, we calculated approximately defined artifact distributions using sample-size weighted estimates taken from the sample of studies for each estimated effect, a practice that generates slightly more accurate estimates of the mean and variance of rho than population-level standard errors (Raju et al., 1991). We corrected for unreliability in the criterion (i.e., operational validity) using estimates of interrater reliability, which account for more sources of measurement error than internal consistency (Schmidt, Viswesvaran, & Ones, 2000). When a study reported an interrater reliability estimate, this was used in our corrections for that study; however, when a study did not report interrater reliability, we corrected using the assumed reliability estimate found in Table 2. These assumed values were computed using a sample-weighted mean estimate from the distribution of studies for each effect. No corrections for range restriction were made because these estimates were not available from most studies.

TABLE 2
Mean Sample-Based Reliability Estimates Used for Analyses

Analysis                    k    N      Estimate of reliability
Job performance             22   2,169  .58
Task performance            6    819    .59
Contextual performance      5    642    .51
Managerial performance      5    288    .67

Note. Each artifact distribution was calculated as a sample-size weighted average for all studies that reported interrater reliability information. Estimates of range restriction were not available in the primary studies.

In addition, we utilized a random effects model, which results in more accurate Type I error rates and more realistic confidence intervals than does a fixed effects model (e.g., Erez, Bloom, & Wells, 1996; Overton, 1998). Therefore, we placed a 95% confidence interval around each mean corrected effect, which represents the extent to which the corrected effect may vary if other studies from the population were included in the analysis (for elaboration, see Burke & Landis, 2003). We also calculated credibility intervals, which indicate the extent to which correlations varied across studies for a particular analysis distribution; that is, 80% of the values in the population are contained within the bounds of the interval (Hunter & Schmidt, 2004). Finally, in many cases studies provided multiple correlations from the same sample and the same predictor construct or criterion construct, which were nonindependent (e.g., Lipsey & Wilson, 2001). In such cases, we created a single effect to represent the range of nonindependent effects using sample-size-weighted composite correlations.
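For readers who want to see the mechanics, here is a bare-bones sketch of these computations in the Hunter–Schmidt style. It is a simplification, not the Raju et al. (1991) procedure the authors actually used: it corrects the weighted mean validity for criterion unreliability only (no range restriction, per the text) and forms the 95% confidence and 80% credibility intervals; the input values are invented.

```python
import math

def bare_bones_meta(rs, ns, ryy=0.58):
    """Hunter-Schmidt-style summary of observed validities rs with sample sizes ns.

    ryy is the (assumed) interrater reliability of the criterion; .58 is the
    overall job performance value from Table 2.
    """
    k, N = len(rs), sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / N               # M_r
    var_r = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / N
    var_e = (1 - mean_r ** 2) ** 2 / (N / k - 1)                  # sampling error variance
    m_rho = mean_r / math.sqrt(ryy)                               # operational validity
    sd_rho = math.sqrt(max(var_r - var_e, 0.0)) / math.sqrt(ryy)  # SD of rho
    se_rho = math.sqrt(var_r / k) / math.sqrt(ryy)                # SE of the mean effect
    return {
        "M_r": round(mean_r, 3),
        "M_rho": round(m_rho, 3),
        "CI95": (round(m_rho - 1.96 * se_rho, 3), round(m_rho + 1.96 * se_rho, 3)),
        "CV80": (round(m_rho - 1.28 * sd_rho, 3), round(m_rho + 1.28 * sd_rho, 3)),
        "SD_rho": round(sd_rho, 3),
    }

# Invented distribution of six validities:
print(bare_bones_meta(rs=[.25, .31, .22, .35, .28, .30],
                      ns=[120, 95, 80, 110, 88, 80]))
```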

TABLE 3
Omnibus Analysis of Criterion-Related Validities of SJT Construct Domains for Job Performance

                                                       95% CI          80% CV
Construct category and domain    k   N      Mr   Mρ     L    U   SEMρ   L    U   SDρ

Knowledge and skills
  Job knowledge and skills       4   695    .15  .19   .07  .32   .06  .08  .30  .09
Applied social skills
  Interpersonal skills           17  8,625  .19  .25   .20  .31   .03  .12  .39  .11
  Teamwork skills                6   573    .29  .38   .26  .52   .07  .24  .53  .11
  Leadership                     51  7,589  .21  .28   .24  .32   .02  .15  .40  .10
Basic personality tendencies
  Personality composites         4   423    .30  .43   .30  .57   .07  .35  .52  .14
  Conscientiousness              7   908    .19  .24   .13  .34   .05  .12  .35  .14
Heterogeneous composites         45  9,681  .21  .28   .24  .31   .02  .19  .36  .07

Note. k = the number of independent effect sizes included in each analysis; N = sample size; Mr = mean sample-weighted uncorrected correlation; Mρ = operational validity (corrected for criterion unreliability); SEMρ = standard error of Mρ; SDρ = standard deviation of rho.

Results

Classification Typology and Construct Frequency

The results of the construct classification are shown in Table 1. The column on the left indicates the construct categories and domains into which the constructs were sorted, along with the construct labels recorded from each study. The two columns on the right represent the number of effects at the construct level and the percentage of SJTs that measured a specific construct domain, respectively. Of these studies, the majority measured leadership (37.50%), followed by interpersonal skills (12.50%), basic personality tendencies (9.56%), teamwork skills (4.41%), and job knowledge and skills (2.94%). SJTs that were unclassifiable (i.e., they reported method-level composite effects) constituted 33.09% of the data points.
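The "% of total" column in Table 1 is straightforward tabulation; the short sketch below (domain counts copied from Table 1) reproduces it.

```python
# How the "% of total" column in Table 1 is derived: each domain's share of
# the 136 independent effects (counts taken from Table 1).

domain_counts = {
    "leadership": 51,
    "interpersonal skills": 17,
    "basic personality tendencies": 13,
    "teamwork skills": 6,
    "job knowledge and skills": 4,
    "heterogeneous composites": 45,
}
total = sum(domain_counts.values())  # 136
for domain, k in sorted(domain_counts.items(), key=lambda kv: -kv[1]):
    print(f"{domain:30s} k={k:3d}  {100 * k / total:6.2f}%")
# leadership -> 37.50%, interpersonal skills -> 12.50%, ...
```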

Validity of SJTs for Construct Domains

Table 2 presents the artifact distributions used in each analysis. Unless reported otherwise, for all mean effects reported below, the 95% confidence interval did not overlap zero. Specific information on the confidence intervals and other meta-analytic findings is reported in Tables 3, 4, and 5. We caution the reader that in some cases the estimates are based on low ks and should be interpreted accordingly.

TABLE 4
Criterion-Related Validities of SJT Construct Domains for Job Performance Facets

                                                       95% CI          80% CV
Construct category and domain    k   N      Mr   Mρ     L    U   SEMρ   L    U   SDρ

Contextual performance
  Knowledge and skills
    Job knowledge and skills     1   83     .27  -     -    -     -    -    -    -
  Applied social skills
    Interpersonal skills         3   1,364  .17  .21   .04  .39   .09  .02  .40  .15
    Teamwork skills              6   573    .27  .35   .23  .47   .06  .24  .46  .09
    Leadership                   5   3,034  .19  .24   .18  .31   .03  .17  .32  .06
  Heterogeneous composites       8   2,387  .14  .19   .12  .25   .03  .12  .25  .05

Task performance
  Knowledge and skills
    Job knowledge and skills     1   82     .39  -     -    -     -    -    -    -
  Applied social skills
    Interpersonal skills         6   1,818  .19  .25   .14  .36   .06  .10  .40  .12
    Teamwork skills              3   232    .39  .50   .32  .68   .09  .36  .64  .11
    Leadership                   9   4,039  .17  .21   .15  .28   .03  .10  .33  .09
  Basic personality tendencies
    Personality composites       3   316    .33  .45   .31  .60   .07  .37  .53  .13
    Conscientiousness            3   268    .30  .39   .28  .49   .05  -    -    .00
  Heterogeneous composites       19  5,416  .21  .27   .21  .33   .03  .13  .41  .11

Managerial performance
  Knowledge and skills
    Job knowledge and skills     2   931    .19  .23   .13  .34   .06  .16  .31  .06
  Applied social skills
    Interpersonal skills         2   297    .29  .36   .21  .51   .08  .29  .43  .06
    Leadership                   17  3,769  .24  .29   .24  .35   .03  .17  .41  .09
  Basic personality tendencies
    Conscientiousness            2   174    .05  .06   .00  .12   .03  -    -    .00
  Heterogeneous composites       5   1,282  .10  .12   .08  .16   .02  -    -    .00

Note. k = the number of independent effect sizes included in each analysis; N = sample size; Mr = mean sample-weighted uncorrected correlation; Mρ = operational validity (corrected for criterion unreliability); SEMρ = standard error of Mρ; SDρ = standard deviation of rho. Credibility values were not computed for effects that had zero estimates for SDρ. Job knowledge and skills estimates were not computed for task or contextual performance due to lack of data; however, single data points are presented.


Table 3 presents the results of the omnibus analysis of criterion-related validity for each of the SJT construct domains. As shown in Table 3, SJTs that measured teamwork skills had a mean validity of .38. SJTs that assessed leadership skills had a mean validity of .28. SJTs assessing interpersonal skills had a mean validity of .25, and SJTs assessing Conscientiousness had a mean validity of .24. Although based on only four studies each, SJTs measuring job knowledge and skills had a mean validity of .19, and personality composites had a mean validity of .43. Finally, for heterogeneous composite SJTs, we obtained a mean validity of .28.

TABLE 5
Effects of SJT Format on the Criterion-Related Validities of SJT Construct Domains (Across Job Performance Facets)

                                                       95% CI          80% CV
Construct category and domain    k   N      Mr   Mρ     L    U   SEMρ   L    U   SDρ

Applied social skills
  Interpersonal skills
    Paper-and-pencil             15  8,182  .20  .27   .22  .32   .03  .16  .38  .08
    Video-based                  2   437    .36  .47   .39  .55   .04  -    -    .00
  Leadership
    Paper-and-pencil             47  6,938  .21  .27   .23  .31   .02  .14  .41  .10
    Video-based                  4   651    .25  .33   .25  .40   .04  -    -    .00
Heterogeneous composites
  Paper-and-pencil               40  7,316  .20  .25   .22  .29   .02  .17  .33  .06
  Video-based                    5   2,365  .28  .36   .30  .42   .03  .31  .41  .04

Note. k = the number of independent effect sizes included in each analysis; N = sample size; Mr = mean sample-weighted uncorrected correlation; Mρ = operational validity (corrected for criterion unreliability); SEMρ = standard error of Mρ; SDρ = standard deviation of rho. Credibility values were not computed for effects that had zero estimates for SDρ.

Moderator Analyses

Criterion-related validity by criterion facet. Table 4 presents the results of the analyses of criterion-related validity for each construct domain, broken down by task, contextual, and managerial performance. SJT construct domains had criterion-related validities that were consistent with our hypotheses for trends in their magnitudes within each criterion type, although in many cases confidence intervals overlapped. First, for contextual performance, we predicted that SJTs assessing interpersonal skills (Hypothesis 1), teamwork skills (Hypothesis 2), and leadership skills (Hypothesis 3) would have higher validities than heterogeneous SJTs. The validities were in the expected direction for SJTs assessing interpersonal skills (Mρ = .21), teamwork skills (Mρ = .35), and leadership skills (Mρ = .24) when compared with heterogeneous composites (Mρ = .19), suggesting support for Hypotheses 1, 2, and 3, although the k for interpersonal skills was only 3. For task performance, we hypothesized that SJTs assessing job knowledge and skills would have higher validities than heterogeneous composite SJTs (Hypothesis 4); unfortunately, we could not estimate this relationship because there was only one primary study available for analysis (r = .39). Finally, for managerial performance, we hypothesized that SJTs assessing leadership (Hypothesis 5) and interpersonal skills (Hypothesis 6) would have higher validities than heterogeneous composite SJTs. SJTs assessing leadership (Mρ = .29) and interpersonal skills (Mρ = .36) were more strongly related to managerial performance than were heterogeneous composites (Mρ = .12), suggesting support for Hypotheses 5 and 6, although the k for interpersonal skills was only 2.

Criterion-related validity by test format. Table 5 presents the results of the moderator analyses of test format. Consistent with Hypothesis 7, video-based SJTs tended to have stronger relationships with job performance than paper-and-pencil SJTs. Further, although the ks for video-based SJTs were relatively small, for two out of the three dimension comparisons the confidence intervals for the video-based and paper-and-pencil formats did not overlap. Our estimates for video-based SJTs measuring interpersonal skills (Mρ = .47) were higher than those for paper-and-pencil SJTs measuring interpersonal skills (Mρ = .27); however, the k for the video-based estimate was limited to 2. We also obtained higher estimates for video-based tests measuring leadership (Mρ = .33) than for paper-and-pencil tests measuring leadership (Mρ = .27), although the confidence intervals overlapped and the video-based estimate was based on only four studies. Finally, video-based heterogeneous composites (Mρ = .36) had higher criterion-related validity than paper-and-pencil heterogeneous composites (Mρ = .25).

Discussion

Although SJTs are widely used in employee selection and commonly researched in academe, little is known about the specific constructs assessed by SJTs because researchers and authors typically report results and frame their data more in method terms than in construct terms. Therefore, the primary objectives of this study were to (a) discuss the advantages of attending to and reporting SJT construct-level versus method-level results; (b) develop a typology of constructs that have been assessed by SJTs in the extant literature; and (c) undertake an initial examination of the criterion-related and incremental validity of the identified constructs and investigate moderators of these validities. We view our efforts as contributing to an initial classification and description of the constructs assessed by the SJT method in the extant literature. We do, however, recognize that, as with any first attempt to provide structure to a nebulous body of literature (e.g., Arthur et al., 2003; Huffcutt et al., 2001), our undertaking will likely be refined and expanded as future research is conducted.

With regard to our first objective, we believe a fundamental issue limiting advancements in understanding and using SJTs for selection is the common failure to disentangle the effects of the measurement method (i.e., the SJT) from the constructs measured by the test. This has significant implications for the use of test scores. For instance, it limits the ability to compare different predictors that are confounded by method and/or construct variance (i.e., comparing apples and oranges). Understanding why a given test predicts performance is important to both researchers and practitioners for measurement, theory testing, or establishing the job relevance of the selection tool. Further, the failure to attend to constructs limits the generalizability of SJTs. For example, if one researcher reports a criterion-related validity coefficient of .19 for an SJT in a textile company, the only information gained is that the SJT predicts performance for that job in that company. Without any information about the constructs measured, it remains unclear why the particular test is valid or whether it would be valid in another job or industry. Identification of the construct(s) measured by the SJT, however, offers a point of comparison that would enable practitioners to transport SJTs across contexts.

As part of our construct-based approach, we identified the constructs measured by SJTs in the extant literature (relying on studies that reported construct information). The underlying rationale behind this objective was that a construct typology would provide a common and systematic framework for understanding and applying SJT constructs. We found that a substantial number (33%) of the SJTs in the literature did not report the constructs measured, did not provide enough information to determine the constructs measured, or provided only a composite score, which collapsed across multiple constructs. Nevertheless, our analyses revealed that SJTs are in some cases developed to assess specific constructs, most often leadership skills (38%) and interpersonal skills (13%). Less frequently, SJT studies reported assessing teamwork skills (4%), personality tendencies (10%), and job knowledge (3%). Plausibly, SJTs are often used to measure leadership skills and interpersonal skills because they offer a convenient method for sampling applicants' performance on complex tasks that are otherwise expensive, time-consuming, or difficult to assess. In particular, SJTs are well suited to measure behaviors elicited by complex interpersonal and administrative situations, as they often contain ambient details that create rich representations of contextual features. Moreover, although other simulation-based predictor methods offer similar benefits (i.e., work samples, assessment centers, situational interviews), SJTs typically have a much lower cost of administration and scoring (e.g., Motowidlo et al., 1990; Weekley & Jones, 1999).

Our final objective was to meta-analytically assess the criterion-related validity of each construct domain. Teamwork skills, leadership skills, and interpersonal skills all exhibited relatively high validities for job performance. The criterion-related validities we obtained for Conscientiousness and job knowledge were relatively lower. In addition, we performed two sets of moderator analyses to highlight the benefits of taking a construct-based approach in SJT research. Analyses of the relationships between each predictor construct domain and narrow job performance facets provided support for our typological framework by demonstrating a pattern of relationships (i.e., differential validities) that was consistent with expectations derived from content-based matching of predictor and criterion constructs. Nevertheless, we offer two caveats. First, several of our estimates may be unstable because they were based on small k (e.g., the relationship between teamwork skills and task performance was based on k = 3). Second, and likely as a result of the first caveat, a few results were not in the direction that might be predicted from the literature. For example, teamwork skills had higher validities for task performance than for contextual performance, yet teamwork skills logically seem more important for contextual performance. Overall, however, our findings are of practical significance in that appropriate matching between the content domains of predictors and criteria can strengthen the criterion-related validity of SJTs. Hence, the construct-based approach could also be advantageously applied to the performance domain, in which researchers historically report results using overall or composite job performance (cf. Campbell, 1990).

We have argued that the construct-based approach is a powerful tool for isolating variance due to constructs from method variance. Specifically, the second set of moderator analyses was intended to illustrate the role of method characteristics in determining which constructs might be measured by different formats. We expected that the type of SJT format would moderate SJT criterion-related validities, perhaps by influencing which constructs were measured and how well they were measured. In line with expectations, we found that for each construct domain, video-based SJTs were more strongly correlated with performance than paper-and-pencil SJTs. This information is a significant contribution to the literature because we were able to cross methods and constructs in similar fashion to the suggestions of Campbell and Fiske (1959) for multitrait-multimethod matrices.

Why SJT Research Has Been Method-Focused Rather Than Construct-Focused

There are several possible explanations for why researchers have neglected to specify the constructs measured by SJTs. First, although SJTs are commonly considered methods, they are often treated as if they are actually measuring a single construct (e.g., situational judgment, practical intelligence). Indeed, even when researchers identified the KSAOs being measured, many studies in our sample reported a single composite score labeled "situational judgment" (e.g., Chan & Schmitt, 2002; Motowidlo et al., 1990; Smith & McDaniel, 1998; Swander, 2000; Weekley & Jones, 1997, 1999). Second, it may simply be difficult to create SJTs that can be scored at the construct level, given current developmental paradigms (Ployhart & Ryan, 2000a). As Ployhart and Ryan suggested, refinements to typical critical incident-based development procedures may be effective; specifically, researchers could delineate the constructs to be assessed a priori, conceptualize how the constructs should manifest in work situations, and write response options that correspond to the range of a single behavior (i.e., high or low on "demonstrating effort") rather than multiple types of behaviors.

Finally, perhaps one of the most important reasons that SJT research has not focused on construct-level information is that I-O psychologists have only recently begun developing and implementing a construct-oriented paradigm for selection research. As noted by Schmitt and Chan (2006), it remains a problem that "Our field as a whole . . . is more apt to discuss the validity of methods rather than the validity of measurement of constructs" (p. 136). Indeed, although the idea of construct validity has been around for decades (e.g., Campbell & Fiske, 1959), the emphasis on constructs in personnel selection research has gained prominence only recently (e.g., Anastasi & Urbina, 1997; Arthur & Villado, 2008; Binning & Barrett, 1989; Huffcutt et al., 2001; Messick, 1995; Roth et al., 2008). As a result, the importance of constructs may have been less salient to researchers for much of the older SJT literature. On the other hand, recent studies are not immune to the problem; we identified several recently published studies that neglected to report construct-level information.

Where Do We Go From Here? Recommendations for SJT Research and Practice

Although other researchers have pointed out that SJTs are methods of measurement and that researchers should therefore attend to construct-level information (e.g., McDaniel et al., 2001; Schmitt & Chan, 2006), the results of this study suggest a need for further development of a construct-oriented paradigm in SJT research. Many of the limitations found in our meta-analysis illustrate the state of the literature at present and therefore highlight gaps that could benefit from a construct-based approach. As such, based on our observations, we next offer recommendations for research and for practice, with the recognition that the two are not mutually exclusive.

Research recommendations. First, SJT researchers should report detailed construct information. A relatively large number of studies (33%) failed to report the constructs measured or reported composite method-level information (in the heterogeneous category). As a result, one limitation of this research was small sample sizes for a number of construct domains. We urge researchers to maintain the distinction between methods (e.g., SJTs) and constructs (e.g., leadership skills) by reporting information about the specific constructs measured by SJTs as well as reliability estimates, means, standard deviations, group differences, and intercorrelations with other constructs. Providing this information would allow for more meaningful comparisons of subgroup differences (e.g., race, sex), validity (e.g., construct, criterion-related, incremental validity), or test-taker reactions to different measurement methods or different predictor constructs. In this vein, we believe that the typology developed in this study will facilitate the reporting of constructs by providing researchers and practitioners with a common framework to communicate research findings and validity evidence of constructs (cf. Fleishman & Quaintance, 1984; Hough & Ones, 2001).

Furthermore, refinements to construct validation procedures typically used for SJTs would be helpful. For instance, without evidence of convergent or discriminant validity, we were unable to determine the full extent to which the SJTs reported in the literature actually measured the constructs they were purported to measure. Of course, this is a criticism leveled at any meta-analysis of predictive validity in which meta-analysts rely on the primary authors' conclusion that the tests being analyzed actually measured the intended constructs. As primary researchers begin to provide more information about both the constructs measured and the extent to which the SJT displayed convergent or discriminant validity with other measures, this criticism can be addressed. In particular, future meta-analytic research could combine the approach taken in this study (i.e., coding constructs based on labels and content) with the approach taken by McDaniel and colleagues (i.e., correlating SJTs with measures of constructs) to obtain multiple sources of construct validity evidence.

In addition, researchers should utilize SJT construct information to hold constructs constant, in order to identify and investigate various features of SJT methodology that impact relevant outcomes. For example, one could compare different dimensions of stimulus material (e.g., paper-and-pencil, computerized), different response modalities (e.g., written, oral), or different scoring strategies (e.g., empirical keying, subject matter experts). Such comparisons are only meaningful if the construct is held constant across different methodological dimensions.

Moreover, predictor methods themselves could influence or constrain the constructs measured (e.g., Arthur & Villado, 2008; Messick, 1995). For example, applied social skills might be measured more easily with video-based testing than with paper-and-pencil testing because video-based tests provide more detail and contextual information (e.g., nonverbal behaviors, environmental cues) that are important to social skills. Likewise, SJTs are well suited to measure dimensions of contextual job knowledge, which is applied to practical problems that are ill defined, contain incomplete information, or have multiple acceptable solutions. This type of knowledge can be contrasted with job knowledge that relies on well-defined facts and procedures. In fact, Schmitt and Chan (2006) have already posited that SJTs place some constraints on the range of constructs measured, and it would be helpful to identify such boundary conditions. As part of this effort, we encourage researchers to examine the extent to which there is a strong method factor or higher-order construct measured by SJTs (e.g., judgment or practical intelligence; Schmitt & Chan, 2006).

In addition, whenever possible, SJT researchers should conduct predictive validation studies. Another limitation of this meta-analysis is that we were unable to correct for range restriction because most studies in our dataset were either conducted concurrently or failed to provide sufficient information to make this correction. As a result, our estimates are conservative. Furthermore, given that SJTs are job-centered tests often developed to assess job-relevant behaviors and validated by checking them against criteria of behaviors actually performed on the job, the concurrent validation studies we reviewed may be considered estimates of convergent validity. Concurrent, cross-sectional studies are suggestive but cannot adequately evaluate substantive theoretical links between SJT and criterion constructs. Therefore, we recommend the use of longitudinal, predictive criterion-validation designs.
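To make concrete what the missing correction would require, the following is a sketch of the standard correction for direct range restriction (Hunter & Schmidt, 2004), in which u denotes the ratio of the restricted (incumbent) to the unrestricted (applicant) predictor standard deviation:

\[
r_c = \frac{r/u}{\sqrt{1 + r^2\left(\frac{1}{u^2} - 1\right)}}, \qquad u = \frac{SD_{\mathrm{restricted}}}{SD_{\mathrm{unrestricted}}}.
\]

Because most primary studies did not report u, this correction could not be applied, which is why the operational validities reported here should be read as conservative (lower-bound) estimates.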

Practice recommendations. Practitioners can benefit from the construct-based approach by identifying the focal construct(s) of interest before choosing a selection methodology with which to measure the construct(s). In practice, job analysis determines which constructs are to be measured, but practitioners have some latitude in determining which method of measurement to use. Our meta-analytic validity estimates can be useful for seeking methods to measure specific KSAOs. For a given predictor construct, practitioners may consult our study to compare the expected validity with other methods such as interviews (Huffcutt et al., 2001) or higher-fidelity simulations (e.g., assessment centers; Arthur et al., 2003). For example, the results of this study indicated that the criterion-related validity for SJTs measuring teamwork skills (Mρ = .38) was slightly higher than the criterion-related validity for the consideration and awareness of others dimension of assessment centers (Mρ = .33) obtained by Arthur et al. (2003).

In addition, practitioners should consider the criterion carefully when choosing predictor construct(s) to measure using SJTs. Our results demonstrated the potential for nontrivial increases in validity when SJT predictor constructs were matched conceptually with narrowly defined, relevant criteria. Our data also suggested that one realm where SJTs often are appropriately matched between predictor construct and criterion is the prediction of managerial performance: our results showed that 17 of the 38 data points for managerial performance were SJTs measuring leadership. In most cases, SJTs measuring specific constructs had stronger validities than heterogeneous composites. Therefore, in the interest of maximizing predictive power, SJT test developers should consider the criterion of interest when selecting a specific construct to measure using a given SJT.

Conclusions

In conclusion, we have highlighted the importance of a construct-based focus in SJT research. We urge researchers to present results at the construct level when possible (Arthur & Villado, 2008). Such information, as noted by Huffcutt et al. (2001), Arthur et al. (2003), and Roth et al. (2008) in their similar requests with regard to interviews, assessment centers, and work samples, will provide future researchers and practitioners with a better conceptual, theoretical, and practical understanding of SJTs.

REFERENCES

∗ indicates studies included in the meta-analysis and typology.
† indicates studies included in the typology only.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Anastasi A, Urbina S. (1997). Psychological testing. Upper Saddle River, NJ: Prentice Hall.
∗Anonymous. (1954). Validity information exchange (No. 7-065). PERSONNEL PSYCHOLOGY, 7, 201.
Arthur W, Jr., Day EA, McNelly TL, Edens PS. (2003). A meta-analysis of the criterion-related validity of assessment center dimensions. PERSONNEL PSYCHOLOGY, 56, 125–154.
Arthur W, Jr., Villado AJ. (2008). The importance of distinguishing between constructs and methods when comparing predictors in personnel selection and research in practice. Journal of Applied Psychology, 93, 435–442.
∗Banki S, Latham GP. (2008, August). The validity of the situational interview and situational judgment test in Iran. Paper presented at the 67th annual meeting of the Academy of Management, Anaheim, CA.

Bartram D. (2005). The great eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90, 1185–1203.
Bass BM, Barrett GV. (1972). Man, work and organizations: An introduction to industrial and organizational psychology. Oxford, UK: Allyn & Bacon.
∗Bass BM, Karstendick B, McCullough G, Pruitt RC. (1954). Validity information exchange (No. 7-024). PERSONNEL PSYCHOLOGY, 7, 159–160.
∗Bergman ME, Donovan MA, Drasgow F, Overton RC. (2001a). Assessing contextual performance: Preliminary tests of a new framework. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
∗Bergman ME, Drasgow F, Donovan MA, Juraska SE. (2001b). Scoring situational judgment tests. Paper presented at the 18th Annual Conference of the Society for Industrial and Organizational Psychology, Orlando, FL.
Binning JF, Barrett GV. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74, 478–494.
Borman W, Brush D. (1993). More progress toward a taxonomy of managerial performance requirements. Human Performance, 6, 1–21.
∗Borman WC, Hanson MA, Oppler SH, Pulakos ED, White LA. (1993). Role of early supervisory experience in supervisor performance. Journal of Applied Psychology, 78, 443–449.
Borman WC, Motowidlo SJ. (1993). Expanding the criterion domain to include elements of contextual performance. In Schmitt N, Borman WC (Eds.), Personnel selection in organizations (pp. 71–98). San Francisco: Jossey-Bass.
Borman WC, Motowidlo SJ. (1997). Task performance and contextual performance: The meaning for personnel selection. Human Performance, 10, 99–109.
Borman WC, White LA, Dorsey DW. (1995). Effects of ratee task performance and interpersonal factors on supervisor and peer performance ratings. Journal of Applied Psychology, 80, 168–177.
∗Bruce MM. (1953). The prediction of effectiveness as a factory foreman. Psychological Monographs: General and Applied, 67, 1–17.
∗Bruce MM, Friesen EP. (1956). Validity information exchange (No. 9-35). PERSONNEL PSYCHOLOGY, 9, 380.
∗Bruce MM, Learner DB. (1958). A supervisory practices test. PERSONNEL PSYCHOLOGY, 11, 207–216.
Burke MJ, Landis RS. (2003). Methodological and conceptual challenges in conducting and interpreting meta-analyses. In Murphy KR (Ed.), Validity generalization: A critical review (pp. 287–309). Mahwah, NJ: Erlbaum.
Campbell JP. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In Dunnette MD, Hough LM (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 39–74). Palo Alto, CA: Consulting Psychologists Press.
Campbell DT, Fiske DW. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Campbell JP, McCloy RA, Oppler SH, Sager CE. (1993). A theory of performance. In Schmitt N, Borman WC (Eds.), Personnel selection in organizations (pp. 35–70). San Francisco: Jossey-Bass.
∗Canter RR. (1951). A human relations training program. Journal of Applied Psychology, 35, 38–45.
∗Carter G. (1952). Measurement of supervisory ability. Journal of Applied Psychology, 36, 393–395.
Chan D. (1997). Racial subgroup differences in predictive validity perceptions on personality and cognitive ability tests. Journal of Applied Psychology, 82, 311–320.

∗Chan D. (2002, April). Situational effectiveness x proactive personality interaction on job performance. Paper presented at the 17th Annual Conference of the Society for Industrial and Organizational Psychology, Toronto, Canada.
Chan KY, Drasgow F. (2001). Toward a theory of individual differences and leadership: Understanding the motivation to lead. Journal of Applied Psychology, 86, 481–498.
Chan D, Schmitt N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology, 82, 143–159.
∗Chan D, Schmitt N. (2002). Situational judgment and job performance. Human Performance, 15, 233–254.
∗Clevenger J, Pereira GM, Wiechmann D, Schmitt N, Schmidt-Harvey V. (2001). Incremental validity of situational judgment tests. Journal of Applied Psychology, 86, 410–417.
∗Clevenger J, Pereira GM, Wiechmann D, Schmitt N, Schmidt-Harvey V. The situational judgment inventory as a measure of contextual job knowledge. Unpublished manuscript.
∗Colonia-Willner R. (1998). Practical intelligence at work: Relationship between aging and cognitive efficiency among managers in a bank environment. Psychology and Aging, 13, 45–57.
Connelly MS, Gilbert JA, Zaccaro SJ, Threlfall KV, Marks MA, Mumford MD. (2000). Exploring the relationship of leadership skills and knowledge to leader performance. Leadership Quarterly, 11, 65–87.
Conway J. (1999). Distinguishing contextual performance from task performance for managerial jobs. Journal of Applied Psychology, 84, 3–13.
∗Cucina JM, Vasilopoulos NL, Leaman JA. (2003, April). The bandwidth-fidelity dilemma and situational judgment test validity. Poster presented at the 18th Annual Conference of the Society for Industrial and Organizational Psychology, Orlando, FL.
∗Decker RL. (1956). An item analysis of How Supervise? using both internal and external criteria. Journal of Applied Psychology, 40, 406–411.
∗Dicken CF, Black JD. (1965). Predictive validity of psychometric evaluations of supervisors. Journal of Applied Psychology, 49, 34–47.
∗Dulsky SG, Krout MH. (1950). Predicting promotion potential on the basis of psychological tests. PERSONNEL PSYCHOLOGY, 3, 345–351.
Edwards BD, Arthur W, Jr. (2007). An examination of factors contributing to a reduction in race-based subgroup differences on a constructed response paper-and-pencil test of scholastic achievement. Journal of Applied Psychology, 92, 794–801.
∗Edwards WR, Schleicher DJ. (2004). On selecting psychology graduate students: Validity evidence for a test of tacit knowledge. Journal of Educational Psychology, 96, 592–602.
∗Elias DA, Shoenfelt EL. (2001, April). Use of a situational judgment test to measure teamwork components and their relationship to overall teamwork performance. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Ellis APJ, Bell BS, Ployhart RE, Hollenbeck JR, Ilgen DR. (2005). An evaluation of generic teamwork skills training with action teams: Effects on cognitive and skill-based outcomes. PERSONNEL PSYCHOLOGY, 58, 641–672.
Erez A, Bloom MC, Wells MT. (1996). Using random rather than fixed effects models in meta-analysis: Implications for situational specificity and validity generalization. PERSONNEL PSYCHOLOGY, 49, 275–306.
Ferris GR, Witt LA, Hochwarter WA. (2001). Interaction of social skill and general mental ability on job performance and salary. Journal of Applied Psychology, 86, 1075–1082.

File QW. (1945). The measurement of supervisory quality in industry. Journal of Applied Psychology, 29, 381–387.
File QW, Remmers HH. (1971). How Supervise? Manual, 1971 revision. Cleveland, OH: Psychological Corporation.
Fleishman EA, Quaintance MK. (1984). Taxonomies of human performance: The description of human tasks. Orlando, FL: Academic Press.
∗Forehand GA, Guetzkow H. (1961). The administrative judgment test as related to descriptions of executive judgment behaviors. Journal of Applied Psychology, 45, 257–261.
∗Funke U, Schuler H. (1998). Validity of stimulus and response components in a video test of social competence. International Journal of Selection and Assessment, 6, 115–123.
∗Hanson MA. (1994). Development and construct validation of a situational judgment test of supervisory effectiveness for first-line supervisors in the U.S. Army (Doctoral dissertation, University of Minnesota, 1994). Dissertation Abstracts International, 56(2-B), 1138.
∗Hilton AC, Bolin SF, Parker JW, Jr., Taylor EK, Walker WB. (1955). The validity of personnel assessments by professional psychologists. Journal of Applied Psychology, 39, 287–293.
Hogan J, Holland B. (2003). Using theory to evaluate personality and job-performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88, 100–112.
∗Holmes FJ. (1950). Validity of tests for insurance office personnel. PERSONNEL PSYCHOLOGY, 3, 57–69.
Hough LM, Ones DS. (2001). The structure, measurement, validity, and use of personality variables in industrial, work, and organizational psychology. In Anderson N, Ones DS, Sinangil HK, Viswesvaran C (Eds.), Handbook of industrial, work, and organizational psychology (Vol. 1, pp. 233–377). London: Sage.
∗Howard A, Choi M. (2000). How do you assess a manager's decision-making abilities? The use of situational inventories. International Journal of Selection and Assessment, 8, 85–88.
Huffcutt AI, Conway JM, Roth PL, Stone NJ. (2001). Identification and meta-analytic assessment of psychological constructs measured in employment interviews. Journal of Applied Psychology, 86, 897–913.
∗Hunter DR. (2003). Measuring general aviation pilot judgment using a situational judgment technique. The International Journal of Aviation Psychology, 13, 373–386.
Hunter JE, Schmidt FL. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hurtz GM, Donovan JJ. (2000). Personality and job performance: The big five revisited. Journal of Applied Psychology, 85, 869–879.
∗Johnson RJ. (1954). Validity information exchange (No. 7-090). PERSONNEL PSYCHOLOGY, 7, 567.
∗Jones C, DeCotiis TA. (1986). Video-assisted selection of hospitality employees. The Cornell H.R.A. Quarterly, 27, 68–73.
∗Jurgensen CE. (1959). Supervisory practices test. In Buros OK (Ed.), The fifth mental measurements yearbook (pp. 946–947). Highland Park, NJ: Gryphon Press.
Kanfer R, Ackerman PL. (1989). Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition. Journal of Applied Psychology, 74, 657–690.
Katz RL. (1955). Skills of an effective administrator. Harvard Business Review, 33, 33–42.

∗Kerr MR. Tacit knowledge as a predictor of managerial success: A field study. Royal Canadian Mounted Police. Unpublished manuscript.
Lievens F, Buyse T, Sackett PR. (2005). The operational validity of a video-based situational judgment test for medical college admissions: Illustrating the importance of matching predictor and criterion construct domains. Journal of Applied Psychology, 90, 442–452.
Lievens F, Sackett PR. (2007). Situational judgment tests in high-stakes settings: Issues and strategies with generating alternate forms. Journal of Applied Psychology, 92, 1043–1055.
Lipsey MW, Wilson DB. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
∗Lobenz RE, Morris SB. (1999, April). Is tacit knowledge distinct from g, personality, and social knowledge? Paper presented at the 14th Annual Convention of the Society for Industrial and Organizational Psychology, Atlanta, GA.
∗MacLane CN, Barton MG, Holloway-Lundy AE, Nickels BJ. (2001, April). Keeping score: Empirical vs. expert weights on situational judgment responses. Paper presented at the 16th Annual Convention of the Society for Industrial and Organizational Psychology, San Diego, CA.
∗McClough AC, Rogelberg SG. (2003). Selection in teams: An exploration of the teamwork knowledge, skills, and ability test. International Journal of Selection and Assessment, 11, 56–66.
∗McCormick EJ, Middaugh RW. (1956). The development of a tailor-made scoring key for the How Supervise? test. PERSONNEL PSYCHOLOGY, 9, 27–37.
∗McCullough G, Pruitt RC. (1954). Validity information exchange (No. 7-024). PERSONNEL PSYCHOLOGY, 7, 159–160.
McDaniel MA, Hartman NS, Whetzel DL, Grubb WL. (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. PERSONNEL PSYCHOLOGY, 60, 63–91.
McDaniel MA, Morgeson FP, Finnegan EB, Campion MA, Braverman EP. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86, 730–740.
McDaniel MA, Nguyen NT. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9, 103–113.
McDaniel MA, Whetzel DL, Hartman NS, Nguyen NT, Grubb WL. (2006). Situational judgment tests: Validity and an integrative model. In Weekley J, Ployhart RE (Eds.), Situational judgment tests (pp. 183–203). Mahwah, NJ: Erlbaum.
∗McDaniel MA, Yost AP, Ludwick MH, Hense RL, Hartman NS. (2004, April). Incremental validity of a situational judgment test. Paper presented at the 19th Annual Conference of the Society for Industrial and Organizational Psychology, Chicago, IL.
∗McElreath J, Vasilopoulos NL. (2002, April). Situational judgment: Are most and least likely responses the same? Poster presented at the 17th Annual Conference of the Society for Industrial and Organizational Psychology, Toronto, Canada.
Messick S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
∗Meyer HH. (1951). Factors related to success in the human relations aspect of work-group leadership. In Conrad HS (Ed.), Psychological Monographs: General and Applied (Vol. 65, pp. 1–12).
∗Meyer HH. (1956). An evaluation of a supervisory selection program. PERSONNEL PSYCHOLOGY, 9, 499–513.

Mohammed S, Mathieu J, Bartlett A. (2002). Technical-administrative task performance, leadership task performance, and contextual performance: Considering the influence of team- and task-related composition variables. Journal of Organizational Behavior, 23, 795–814.
Moon H. (2001). The two faces of conscientiousness: Duty and achievement striving in escalation of commitment dilemmas. Journal of Applied Psychology, 86, 533–540.
∗Morgeson FP, Reider MH, Campion MA. (2005). Selecting individuals in team settings: The importance of social skills, personality characteristics, and teamwork knowledge. PERSONNEL PSYCHOLOGY, 58, 583–611.
∗Motowidlo SJ, Dunnette MD, Carter GW. (1990). An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 75, 640–647.
∗Motowidlo SJ, Tippins N. (1993). Further studies of the low-fidelity simulation in the form of the situational inventory. Journal of Occupational and Organizational Psychology, 66, 337–344.
Motowidlo S, Van Scotter J. (1994). Evidence that task performance should be distinguished from contextual performance. Journal of Applied Psychology, 79, 475–480.
∗Mowry HW. (1957). A measure of supervisor quality. Journal of Applied Psychology, 41, 405–408.
∗Mumford T, Van Iddekinge C, Morgeson F, Campion M. (2008). The team role test: Development and validation of a team role knowledge situational judgment test. Journal of Applied Psychology, 93, 250–267.
Mumford M, Zaccaro S, Connelly M, Marks M. (2000). Leadership skills: Conclusions and future directions. Leadership Quarterly, 11, 155–170.
∗Nguyen NT. (2004). Response instructions and construct validity of a situational judgment test. Proceedings of the 11th Annual Meeting of the American Society of Business and Behavioral Sciences, Las Vegas, NV.
∗O'Connell MS, Doverspike D, Norris-Watts C, Hattrup K. (2001). Predictors of organizational citizenship behavior among Mexican retail salespeople. The International Journal of Organizational Analysis, 9, 272–280.
∗O'Connell MS, McDaniel MA, Grubb WL, III, Hartman NS, Lawrence A. (2007). Incremental validity of situational judgment tests for task and contextual job performance. International Journal of Selection and Assessment, 15, 19–29.
∗Olson-Buchanan JB, Drasgow F, Moberg PJ, Mead AD, Keenan PA, Donovan MA. (1998). Interactive video assessment of conflict resolution skills. PERSONNEL PSYCHOLOGY, 51, 1–24.
Overton RC. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3, 354–379.
∗Parry ME. (1968). Ability of psychologists to estimate validities of personnel tests. PERSONNEL PSYCHOLOGY, 21, 139–147.
∗Paulin C, Hanson MA. (2001, April). Comparing the validity of rationally derived and empirically derived scoring keys for a situational judgment inventory. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Paunonen S, Rothstein M, Jackson D. (1999). Narrow reasoning about the use of broad personality measures for personnel selection. Journal of Organizational Behavior, 20, 389–405.
∗Pereira GM, Harvey VS. (1999, April). Situational judgment tests: Do they measure ability, personality, or both? Paper presented at the 14th Annual Conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
∗Phillips JF. (1992). Predicting sales skills. Journal of Business and Psychology, 7, 151–160.

∗Phillips JF. (1993). Predicting negotiation skills. Journal of Business and Psychology, 7, 403–411.
Ployhart R. (2006). Staffing in the 21st century: New challenges and strategic opportunities. Journal of Management, 32, 868–897.
∗Ployhart RE, Porr WB, Ryan AM. Developing situational judgment tests in a service context: Exploring an alternative methodology. Unpublished manuscript.
†Ployhart RE, Ryan AM. (2000a). A construct-oriented approach for developing situational judgment tests in a service context. Unpublished manuscript.
Ployhart RE, Ryan AM. (2000b). Integrating personality tests with situational judgment tests for the prediction of customer service performance. Symposium presented at the annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
∗Ployhart RE, Weekley JA, Holtz BC, Kemp C. (2003). Web-based and paper-and-pencil testing of applicants in a proctored setting: Are personality, biodata, and situational judgment tests comparable? PERSONNEL PSYCHOLOGY, 56, 733–752.
∗Porr WB, Heffner TS. The incremental validity of situational judgment tests for performance prediction. Unpublished manuscript.
∗Pulakos ED, Schmitt N. (1996). An evaluation of two strategies for reducing adverse impact and their effects on criterion-related validity. Human Performance, 9, 241–258.
∗Pulakos ED, Schmitt N, Chan D. (1996). Models of job performance ratings: An examination of ratee race, gender, and rater level effects. Human Performance, 9, 103–119.
Raju NS, Burke MJ, Normand J, Langlois GM. (1991). A new meta-analysis approach. Journal of Applied Psychology, 76, 432–446.
Roth P, Bobko P, McFarland L, Buster M. (2008). Work sample tests in personnel selection: A meta-analysis of black-white differences in overall and exercise scores. PERSONNEL PSYCHOLOGY, 61, 637–661.
Rotundo M, Sackett P. (2002). The relative importance of task, citizenship, and counterproductive performance to global ratings of job performance: A policy-capturing approach. Journal of Applied Psychology, 87, 66–80.
∗Rusmore JT. (1958). A note on the "test of practical judgment." PERSONNEL PSYCHOLOGY, 11, 37.
∗Sacco JM, Scheu CR, Ryan AM, Schmitt N, Schmidt DB, Rogg KL. (2000, April). Reading level and verbal test scores as predictors of subgroup differences and validities of situational judgment tests. Paper presented at the 15th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Sackett PR, Schmitt N, Ellingson JE, Kabin MB. (2001). High stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist, 56, 302–318.
∗Sartain AQ. (1946). Relation between scores on certain standard tests and supervisory success in an aircraft factory. Journal of Applied Psychology, 29, 328–332.
Schmidt F, Hunter J. (1993). Tacit knowledge, practical intelligence, general mental ability, and job knowledge. Current Directions in Psychological Science, 2, 8–9.
Schmidt FL, Hunter JE, Outerbridge AN. (1986). Impact of job experience and ability on job knowledge, work sample performance, and supervisory ratings of job performance. Journal of Applied Psychology, 71, 432–439.
Schmidt F, Viswesvaran C, Ones D. (2000). Reliability is not validity and validity is not reliability. PERSONNEL PSYCHOLOGY, 53, 901–912.
Schmitt N, Chan D. (2006). Situational judgment tests: Method or construct? In Weekley J, Ployhart RE (Eds.), Situational judgment tests (pp. 135–156). Mahwah, NJ: Erlbaum.

Schmitt N, Clause CS, Pulakos ED. (1996). Subgroup differences associated with different measures of some common job-relevant constructs. International Review of Industrial and Organizational Psychology, 77, 116–139.
Scullen SE, Mount MK, Judge TA. (2003). Evidence of the construct validity of developmental ratings of managerial performance. Journal of Applied Psychology, 88, 50–66.
∗Smirdele D, Perry BA, Cronshaw SF. (1994). Evaluation of video-based assessment in transit operator selection. Journal of Business and Psychology, 6, 156–193.
∗Smith KC, McDaniel MA. (1998, April). Criterion and construct validity evidence for a situational judgment measure. Paper presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Smither JW, Reilly RR, Millsap RE, Pearlman K, Stoffey RW. (1993). Applicant reactions to selection procedures. PERSONNEL PSYCHOLOGY, 46, 49–76.
∗Spitzer ME, McNamara WJ. (1964). A managerial selection study. PERSONNEL PSYCHOLOGY, 19–40.
Sternberg RJ, Wagner RK. (1993). The g-ocentric view of intelligence and job performance is wrong. Current Directions in Psychological Science, 2, 1–5.
∗Sternberg RJ, Wagner RK, Okagaki L. (1993). Practical intelligence: The nature and role of tacit knowledge in work and at school. In Puckett JM, Reese HW (Eds.), Mechanisms of everyday cognition (pp. 205–227). Hillsdale, NJ: Erlbaum.
∗Stevens MJ, Campion MA. (1999). Staffing work teams: Development and validation of a selection test for teamwork settings. Journal of Management, 25, 207–228.
∗Swander CJ. (2000). Video-based situational judgment test characteristics: Multidimensionality at the item level and impact of situational variables. Unpublished dissertation.
∗Swander CJ. (2001, April). Exploring the criterion validity of two alternate forms of a situational judgment test. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Tett R, Jackson D, Rothstein M, Reddon J. (1999). Meta-analysis of bidirectional relations in personality-job performance research. Human Performance, 12, 1–29.
∗Thumin FJ, Page DS. (1966). A comparative study of two tests of supervisory knowledge. Psychological Reports, 18, 535–538.
∗Timmreck CW. (1981). Moderating effect of tasks on the validity of selection tests (Doctoral dissertation, University of Houston, 1981). Dissertation Abstracts International, 42(3-B), 1221.
Van Scotter JR, Motowidlo SJ. (1996). Evidence for two factors of contextual performance: Job dedication and interpersonal facilitation. Journal of Applied Psychology, 81, 525–531.
Viswesvaran C. (2001). Learning from the past and mapping a new terrain: Assessment of individual performance. In Anderson N, Ones DS, Sinangil HK, Viswesvaran C (Eds.), Handbook of industrial, work and organizational psychology: Personnel psychology (Vol. 1). London: Sage.
Viswesvaran C, Ones DS. (2000). Perspectives on models of job performance. International Journal of Selection and Assessment, 8, 216–226.
∗Wagner RK. (1987). Tacit knowledge in everyday intelligent behavior. Journal of Personality and Social Psychology, 52, 1236–1247.
∗Wagner RK, Sternberg RJ. (1985). Practical intelligence in real-world pursuits: The role of tacit knowledge. Journal of Personality and Social Psychology, 49, 436–458.
∗Waugh G. (2002, April). NCO21 situational judgment test: Selecting response options and items for a situational judgment test. Paper presented at the 17th Annual Conference of the Society for Industrial and Organizational Psychology, Toronto, Canada.

∗Weekley JA, Jones C. (1997). Video-based situational testing. PERSONNEL PSYCHOLOGY, 50, 25–49.
∗Weekley JA, Jones C. (1999). Further studies of situational tests. PERSONNEL PSYCHOLOGY, 52, 679–700.
∗Weekley JA, Ployhart RE. (2005). Situational judgment: Antecedents and relationships with performance. Human Performance, 18, 81–104.
∗Weekley JA, Ployhart RE, Harold CM. (2004). Personality and situational judgment tests across applicant and incumbent settings: An examination of validity, measurement, and subgroup differences. Human Performance, 17, 433–461.
∗Weitz J, Nuckols RC. (1953). A validation study of How Supervise? Journal of Applied Psychology, 36, 301–303.
Wernimont PF, Campbell JP. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52, 372–376.
Whetzel D, McDaniel M, Nguyen N. (2008). Subgroup differences in situational judgment test performance: A meta-analysis. Human Performance, 21, 291–309.
∗Wiener DN. (1961). Evaluation of selection procedures for a management development program. Journal of Consulting Psychology, 8, 121–128.
Witt LA, Ferris GR. (2003). Social skill as a moderator of the conscientiousness-performance relationship: Convergent results across four studies. Journal of Applied Psychology, 88, 809–820.
∗Wolf MB, McClellan. (2008, April). Do respondents perceive a difference between SJT response instructions? Poster presented at the 23rd Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, CA.