2004073P

8/3/2019 2004073P

GENERALISABILITY OF THE PSYCHOMETRIC PROPERTIES

OF A PILOT SELECTION BATTERY

Agnès Kokorian

People Technologies

&

Colin Valsler

Psytech Ltd

The Pilot Aptitude (PILAPT) system

The development of the PILAPT computer-based system grew out of the meta-analysis reported by Hunter and Burke (1994), and its design principles have been described by Burke, Kitching and Valsler (1994). In summary, these design principles were as follows:

• The test designs should be based on clearly understood measures of individual differences that research has shown to be relevant to pilot performance, either in training or in operations. PILAPT therefore had to cover both handling skills (as required in ab initio training) and crew resource management (CRM) competencies (such as situational awareness and capacity).

• The test designs should assume no prior knowledge of flying, but should have links to key pilot performance factors that are intuitive to both candidates and users.

• The test designs should allow for practice, so that prior experience of video games does not influence scores and all candidates have a level playing field on which to demonstrate their potential.

• The overall battery should be efficient, avoiding redundancy and nugatory assessments.

Design work on PILAPT began in 1994 and has continued with new tests and new

scoring algorithms over the nine years since. Beginning with ab initio selection for the Royal

Air Force (RAF) University Air Squadrons, PILAPT has been evaluated through data

provided by air forces in Chile, Denmark, Portugal, Sweden, Norway and Italy, as well as civilian airlines and training schools in the UK, Europe and Asia.

PILAPT is a fully automated test delivery system built on the TEKS technology developed by Psytech Ltd. The system covers every stage of the testing process: candidate log-on (including the capture of biographical data), instructions, test administration, test scoring, analysis of candidate performance, reporting, and data transfer to other systems. It also provides crash recovery and networking capabilities.

The PILAPT battery of tests developed to date includes:

• Hands (10 minutes) - the ability to process oral (verbal) rules in order to execute a visual task quickly and accurately; related to absorbing and using oral information (e.g. radio communications) under pressure.

• Patterns (10 minutes) - the ability to ignore distracting information in order to make quick and accurate decisions under time pressure; related to maintaining focus on critical information when confronted with ambiguous situations and pressure.

• Concentration (8 minutes) - the ability to maintain focus on a primary task when the conditions for that task are constantly changing; related to maintaining situational awareness.

• Deviation Indicator (DI; 7 minutes) - the ability to compensate for deviations in flight parameters, using a look-and-feel based on the flight path deviation indicator (FPDI); related to basic handling skills.

• Trax (5 minutes) - a pursuit tracking task requiring the candidate to work in a three-dimensional environment; related to advanced aircraft control.

In addition to the tests above, which were designed primarily around ab initio requirements and take around 40 minutes in total, the PILAPT battery has been extended to include a mini-battery named Capacity, designed to assess performance under increasing workload. Capacity takes around 15 minutes to complete and comprises a primary handling task and two secondary tasks involving visual and auditory information. The tasks are administered and measured under a combination of single, dual and triple task load conditions, and the impact of increased workload on the candidate's performance is then analysed and reported using a display similar to that shown in Figure 1 below. The data in Figure 1 show average performance for Swedish fighter pilot applicants.

[Chart: mean performance (y-axis, 100 to 500) under SINGLE, DUAL and TRIPLE task load (series DI4SINGL, DI4DUAL and DI4TRIPL), annotated "Capacity under single task load", "Capacity under triple task load" and "How much capacity does the candidate retain as workload increases?"]

Figure 1: Overview of what the PILAPT Capacity mini-battery measures
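To make the idea of retained capacity concrete, the sketch below expresses dual- and triple-load performance as a proportion of single-load performance. It is a minimal illustration under stated assumptions: the paper does not say how the impact of workload is scored, so the `CapacityScores` class, the `retention` ratio and the example numbers are all hypothetical, and higher scores are assumed to mean better performance.

```python
from dataclasses import dataclass


@dataclass
class CapacityScores:
    """Hypothetical primary-task scores for one candidate under each load condition."""
    single: float
    dual: float
    triple: float


def retention(scores: CapacityScores) -> dict[str, float]:
    """Share of single-task performance retained under dual and triple load (hypothetical index)."""
    if scores.single <= 0:
        raise ValueError("single-task score must be positive")
    return {
        "dual": scores.dual / scores.single,
        "triple": scores.triple / scores.single,
    }


# Illustrative numbers only; not PILAPT data.
candidate = CapacityScores(single=400.0, dual=340.0, triple=300.0)
print(retention(candidate))  # {'dual': 0.85, 'triple': 0.75}
```

A ratio keeps the index independent of the candidate's absolute single-task level; a difference score would be an equally plausible alternative.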

Reliability and construct validity evidence supporting PILAPT

This section summarises the data collected on the PILAPT tests to date in military contexts. Because different tests are at different stages of the development cycle, the evidence available varies across PILAPT tests, reflecting the iterative cycle of development since 1994. The evidence is presented in three parts, in line with recommendations from professional bodies such as the American Psychological Association (APA), the British Psychological Society (BPS) and the International Test Commission (ITC). First, evidence of test reliability (the accuracy and stability of scores) is presented, followed by results from studies involving other marker tests of pilot aptitude (construct validity). Papers two (Calanna and Serusi) and three (Kokorian, Valsler and Cabrera) of this symposium present criterion validity data.

Reliability

The standard recommendation is that tests used in selection should have a reliability coefficient of at least 0.7. Under classical test theory, a coefficient of 0.7 means that 70% of the variance in observed test scores is true-score variance, with the remaining 30% attributable to measurement error. Table 1 summarises the results of reliability (internal consistency) analyses across various national and organisational sites, pooled using the Schmidt-Hunter meta-analysis model (Hunter and Schmidt, 1990). DI and Trax are not included in Table 1 because internal consistency estimates of reliability are not suitable for these tests; data on their test-retest reliability are given below. Table 1 covers two versions of Hands, a longer 40-item version and a shorter 25-item version.
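The variance interpretation of a reliability coefficient can be checked with a small classical-test-theory simulation. The sketch below is illustrative and not part of PILAPT: it generates two parallel administrations X = T + E with true-score variance 0.7 and error variance 0.3, and the correlation between the two administrations then comes out close to 0.7. The helper names `pearson_r` and `simulate_parallel_forms` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_parallel_forms(reliability, n=50_000, seed=1):
    """Correlate two simulated parallel administrations, X = T + E.

    True-score variance is set to `reliability` and error variance to
    1 - reliability, so observed-score variance is 1 and the expected
    correlation between the two forms equals `reliability`.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    error_sd = (1 - reliability) ** 0.5
    form_a, form_b = [], []
    for _ in range(n):
        t = rng.gauss(0, true_sd)                  # same true score on both forms
        form_a.append(t + rng.gauss(0, error_sd))  # independent measurement errors
        form_b.append(t + rng.gauss(0, error_sd))
    return pearson_r(form_a, form_b)


r = simulate_parallel_forms(0.7)
print(f"correlation between parallel forms: {r:.3f}")
```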

Source                   Local sample   Hands   Patterns   Concentration

Chile                         370        0.89     0.60

Denmark                     1,212        0.90     0.69

Italy                         108        0.87     0.72       0.76

Norway                        232        0.92     0.71

Portugal                    1,218        0.93     0.73

Sweden                        762        0.94     0.71       0.83 (N=430)

UK                            585        0.91

Total sample size (N)                   4,487    3,902       538

Sample-weighted mean                     0.92     0.70       0.82

90% credibility value                    0.89     0.66       0.79

Table 1: Reliability results for PILAPT tests across various national sites
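The summary rows of Table 1 can be recomputed from the site-level figures. The sketch below assumes a bare-bones Hunter-Schmidt analysis: a sample-weighted mean, observed variance minus sampling-error variance computed with the formula for correlations, (1 - mean_r^2)^2 / (mean_n - 1), and a 90% credibility value taken as the mean minus 1.28 residual standard deviations. The paper does not state these formulas, treating reliability coefficients like correlations is an assumption, the Chile Patterns value is read as 0.60, and `bare_bones_meta` is a hypothetical helper.

```python
import math

# Site-level results from Table 1 as (local sample size, coefficient) pairs.
TABLE_1 = {
    "Hands": [(370, 0.89), (1212, 0.90), (108, 0.87), (232, 0.92),
              (1218, 0.93), (762, 0.94), (585, 0.91)],
    "Patterns": [(370, 0.60), (1212, 0.69), (108, 0.72), (232, 0.71),
                 (1218, 0.73), (762, 0.71)],
    "Concentration": [(108, 0.76), (430, 0.83)],
}


def bare_bones_meta(studies, z=1.28):
    """Return (total N, sample-weighted mean, 90% credibility value)."""
    total_n = sum(n for n, _ in studies)
    mean_r = sum(n * r for n, r in studies) / total_n
    # Sample-weighted observed variance of the coefficients across sites.
    obs_var = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    # Expected sampling-error variance, using the formula for correlations (assumption).
    mean_n = total_n / len(studies)
    err_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Residual SD after removing sampling error, floored at zero.
    resid_sd = math.sqrt(max(obs_var - err_var, 0.0))
    # Lower 90% credibility value: 10th percentile of the residual distribution.
    return total_n, mean_r, mean_r - z * resid_sd


for test, studies in TABLE_1.items():
    n, mean_r, cred90 = bare_bones_meta(studies)
    print(f"{test:<13} N={n:>5}  mean={mean_r:.2f}  90% CV={cred90:.2f}")
```

Under these assumptions the script prints totals of 4487, 3902 and 538, means of 0.92, 0.70 and 0.82, and 90% credibility values of 0.89, 0.66 and 0.79 for Hands, Patterns and Concentration, matching the summary rows of Table 1.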

In addition to these results, a test-retest (stability) study was conducted in 1995 with the RAF University Air Squadrons (UAS; N=109). With a four-month interval between administrations, this study yielded test-retest reliabilities of 0.80 for DI, 0.84 for Trax and 0.77 for Hands, and 0.91 for the sum of these three PILAPT test scores. These data show that the PILAPT tests exceed the minimum reliability of 0.7 required for use in pilot selection. Used as an overall composite score for selection decisions, the PILAPT battery offers a reliability of 0.9 or above.
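The 0.91 figure for the sum of the three tests was measured directly. The classical-test-theory formula for the reliability of a composite (often attributed to Mosier) shows why a sum can be more reliable than any of its parts: error variances add, while positive intercorrelations inflate the variance of the sum. In the sketch below, `composite_reliability` is a hypothetical helper, and the 0.5 intercorrelations are assumed purely for illustration, since the paper does not report the correlations between DI, Trax and Hands.

```python
def composite_reliability(reliabilities, corr, sds=None):
    """Reliability of an unweighted sum of components under classical test theory.

    reliabilities: reliability coefficient of each component
    corr: observed correlation matrix of the components
    sds: component standard deviations (default 1.0, i.e. standardised scores)
    Assumes measurement errors are uncorrelated across components.
    """
    k = len(reliabilities)
    sds = sds if sds is not None else [1.0] * k
    # Variance of the sum: every entry of the covariance matrix, diagonal included.
    composite_var = sum(sds[i] * sds[j] * corr[i][j]
                        for i in range(k) for j in range(k))
    # Error variance of each component is var_i * (1 - r_ii).
    error_var = sum(sds[i] ** 2 * (1 - reliabilities[i]) for i in range(k))
    return 1 - error_var / composite_var


# Reported test-retest reliabilities for DI, Trax and Hands.
rel = [0.80, 0.84, 0.77]
# Hypothetical intercorrelation of 0.5 between every pair of tests.
rho = 0.5
corr = [[1.0, rho, rho], [rho, 1.0, rho], [rho, rho, 1.0]]
print(round(composite_reliability(rel, corr), 3))  # 0.902
```

With the assumed intercorrelation of 0.5 the composite reaches about 0.90, above every component; with zero intercorrelations it falls back to the average component reliability (about 0.80).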

Construct validity

This section presents the results of a study conducted in Denmark involving four PILAPT tests (DI, Hands, Patterns and Trax) and a 15-test battery used to assess both aircrew and air traffic control (ATC) aptitudes. Data were available on all 19 tests for a sample of 632 applicants. The 15 tests in the Danish battery were classified by content, following the classifications used by Hunter and Burke (1994) in their meta-analysis. This classification provides a direct test of the extent to which PILAPT measures pilot-relevant predictor constructs. The results are shown in Table 2.

Test group                    DI    Hands   Patterns   Trax   Overall

Mathematical Reasoning        .12    .31      .37       .06     0.44

Numerical Speed & Accuracy    .11    .29      .25       .03     0.35

Language                      .18    .14      .20       .08     0.29

General Reasoning             .18    .33      .51       .13     0.57

Spatial                       .24    .38      .38       .17     0.53

Mechanical                    .27    .35      .40       .29     0.55

Memory                        .05    .23      .13      -.09     0.27

Notes:

The Overall column gives the multiple correlation obtained by regressing each test group on the four PILAPT tests.

Correlations in bold and italics are significant at the 0.01 level.

Table 2: Results for 632 Danish military applicants
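With N = 632, the threshold for significance at the 0.01 level can be computed directly from the sample size, which also identifies the coefficients in Table 2 that fall short of it. The sketch below assumes a two-tailed test and uses the normal approximation r_crit = z / sqrt(N - 2 + z^2); the exact t-based threshold differs only in the fourth decimal place. The names `TABLE_2`, `r_crit` and `not_significant` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N = 632
ALPHA = 0.01

# Two-tailed critical |r| via the normal approximation r = z / sqrt(df + z^2).
z = NormalDist().inv_cdf(1 - ALPHA / 2)
r_crit = z / sqrt(N - 2 + z ** 2)

TESTS = ("DI", "Hands", "Patterns", "Trax")
TABLE_2 = {
    "Mathematical Reasoning": (0.12, 0.31, 0.37, 0.06),
    "Numerical Speed & Accuracy": (0.11, 0.29, 0.25, 0.03),
    "Language": (0.18, 0.14, 0.20, 0.08),
    "General Reasoning": (0.18, 0.33, 0.51, 0.13),
    "Spatial": (0.24, 0.38, 0.38, 0.17),
    "Mechanical": (0.27, 0.35, 0.40, 0.29),
    "Memory": (0.05, 0.23, 0.13, -0.09),
}

# Collect every correlation whose absolute value is below the threshold.
not_significant = [
    (group, test, r)
    for group, rs in TABLE_2.items()
    for test, r in zip(TESTS, rs)
    if abs(r) < r_crit
]

print(f"critical |r| at p < {ALPHA} (two-tailed, N = {N}): {r_crit:.3f}")
for group, test, r in not_significant:
    print(f"not significant: {group} x {test} (r = {r:.2f})")
```

On these assumptions the threshold is about 0.102, and every coefficient reaches significance except five: DI with Memory, and Trax with Mathematical Reasoning, Numerical Speed & Accuracy, Language and Memory.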

Hunter and Burke (1994) identified the following predictor constructs as the most consistent and substantial predictors of pilot training success: perceptual speed, mechanical reasoning, spatial reasoning, and psychomotor and simulation-based tests. The Danish data set did not contain psychomotor or simulation-based tests, but the results show that PILAPT taps the other predictor constructs that Hunter and Burke identified as critical to predicting pilot training success. Corrected for the average reliability of the Danish tests (0.8), the overall multiple correlations (the right-hand column of Table 2) would range from 0.34 to 0.71.
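The corrected range quoted above can be reproduced with Spearman's correction for attenuation, r / sqrt(rel_x * rel_y). The paper does not state the formula it used; dividing the Overall values by 0.8, which is what the correction gives when a reliability of 0.8 is assumed for both variables, reproduces the reported 0.34 to 0.71. The helper name `disattenuate` is hypothetical.

```python
from math import sqrt

# Overall column of Table 2.
OVERALL_R = {
    "Mathematical Reasoning": 0.44,
    "Numerical Speed & Accuracy": 0.35,
    "Language": 0.29,
    "General Reasoning": 0.57,
    "Spatial": 0.53,
    "Mechanical": 0.55,
    "Memory": 0.27,
}


def disattenuate(r, rel_x, rel_y):
    """Spearman's correction for attenuation: r / sqrt(rel_x * rel_y)."""
    return r / sqrt(rel_x * rel_y)


# Assumed reliability of 0.8 on both sides of each correlation.
corrected = {group: disattenuate(r, 0.8, 0.8) for group, r in OVERALL_R.items()}
print(f"{min(corrected.values()):.4f} to {max(corrected.values()):.4f}")
```

The corrected values run from 0.3375 (Memory) to 0.7125 (General Reasoning). Correcting for the Danish tests' unreliability alone (rel_y = 1.0) would give a narrower range of about 0.30 to 0.64.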

Conclusion

This paper has summarised the R&D objectives behind the PILAPT battery and the reliability and construct validity evidence supporting it. The consistency of the reliability estimates across national sites with different selection processes and different applicant populations suggests that PILAPT scores generalise across settings. Evidence of criterion validity and of the transportability of validity is presented in the second and third papers of this symposium.

References

Burke, E., Kitching, A., and Valsler, C. (1994). Computer-based assessment and the construction of valid aviator selection tests. In N. Johnston, R. Fuller, and N. McDonald (Eds.), Aviation Psychology: Training and Selection. Cambridge: Avery.

Hunter, D. R., and Burke, E. (1994). Predicting aircraft pilot-training success: A meta-analysis of published research. International Journal of Aviation Psychology, 4, 297-313.

Hunter, D. R., and Burke, E. (1995). Handbook of Pilot Selection. Cambridge: Ashgate.

Hunter, J. E., and Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
