2004073P

  • 8/3/2019 2004073P

    GENERALISABILITY OF THE PSYCHOMETRIC PROPERTIES OF A PILOT SELECTION BATTERY

    Agnès Kokorian

    People Technologies, [email protected]

    &

    Colin Valsler,

    Psytech Ltd, [email protected]

    The Pilot Aptitude (PILAPT) system

    The development of the PILAPT computer-based system grew out of the meta-analysis

    reported by Hunter and Burke (1994), and the design principles have been described by Burke, Kitching and Valsler (1994). In summary, these design principles were as follows:

    • that the test designs be based on clearly understood measures of individual differences that research has shown are relevant to pilot performance, either in training or in operations. As such, PILAPT had to cover both handling skills (as required in ab initio training) and CRM competencies (such as situational awareness and capacity).

    • that the test designs should assume no prior knowledge of flying, but should have links to key pilot performance factors that are intuitive to both candidates and users.

    • that the test designs should allow for practice to avoid the influence of prior experience of video games and give all candidates a level playing field to demonstrate their potential.

    • that the overall battery should be efficient and avoid redundancy and nugatory assessments.

    Design work on PILAPT began in 1994 and has continued with new tests and new

    scoring algorithms over the nine years since. Beginning with ab initio selection for the Royal

    Air Force (RAF) University Air Squadrons, PILAPT has been evaluated through data

    provided by air forces in Chile, Denmark, Portugal, Sweden, Norway and Italy as well as civilian airlines and training schools in the UK, Europe and Asia.

    PILAPT is a fully automated test delivery system built on the TEKS technology

    developed by Psytech Ltd. The system caters for all aspects of the testing process, from candidate log-on (including the capture of biographical data) through instructions, test administration, test scoring, analysis of candidate performance and reporting, to data transfer to other systems.

    The system has crash recovery and networking capabilities.

    The PILAPT battery of tests developed to date includes:

    Hands (10 minutes) - the ability to process oral (verbal) rules to execute a visual task quickly and accurately; related to absorbing and using oral information (e.g. radio) under pressure

    Patterns (10 minutes) - the ability to ignore distracting information in order to make quick and accurate decisions under time pressure; related to maintaining focus on critical information when confronted with ambiguous situations and pressure

    Concentration (8 minutes) - the ability to maintain focus on a primary task when the conditions for that task are constantly changing; related to maintaining situational awareness

    Deviation Indicator (7 minutes) - the ability to compensate for deviations in flight parameters, with a look-and-feel based on the flight path deviation indicator (FPDI); related to basic handling skills

    Trax (5 minutes) - a pursuit tracking task requiring the candidate to work in a 3-dimensional environment; related to advanced aircraft control

    In addition to the tests above, primarily driven in design by ab initio requirements and

    taking around 40 minutes in total, the PILAPT battery has been extended to include a mini-

    test battery named Capacity designed to assess performance under increasing workload.

    Capacity takes around 15 minutes to complete and comprises a primary handling task and two

    secondary tasks involving visual and auditory information. Tasks are administered and

    measured under a combination of single, dual and triple task load conditions, and the impact

    of increased workload on the candidate's performance is then analysed and reported using a

    display similar to that shown in Figure 1 below. The data shown are average performance scores

    for Swedish fighter pilot applicants.

    [Figure: chart of mean scores (vertical axis: Mean) for the variables DI4SINGL, DI4DUAL and DI4TRIPL, shown under SINGLE, DUAL and TRIPLE task load. Annotations: "Capacity under single task load", "Capacity under triple task load", "How much capacity does the candidate retain as workload increases?"]

    Figure 1: Overview of what the PILAPT Capacity mini-battery measures

    Reliability and construct validity evidence supporting PILAPT

    This section of the paper provides a summary of the data collected on the PILAPT

    tests to date in military contexts. Given that different tests are at different stages in the

    development cycle, the evidence provided varies across PILAPT tests reflecting the iterative

    cycle of development since 1994. The evidence is presented in three parts in line with

    recommendations from professional bodies such as the American Psychological Association

    (APA), British Psychological Society (BPS) and the International Test Commission (ITC).

    First, evidence of test reliability (associated with accuracy and stability of scores) is presented

    and followed by results from studies involving other marker tests of pilot aptitude (construct

    validity). Papers two (Calanna and Serusi) and three (Kokorian, Valsler and Cabrera) present criterion validity data.

    Reliability

    The standard recommendation for the level of reliability required for tests used in

    selection is a minimum coefficient of 0.7 (this in effect states that 70% of the variation in test

    scores is true variation as intended in the test's design). Table 1 summarises the results of

    reliability (internal consistency) analyses across various country and organisational sites using

    the Schmidt-Hunter meta-analysis model. DI and Trax are not included in Table 1 as internal

    consistency estimates of reliability are not suitable for these tests. Data on their test-retest

    reliability are given below. Table 1 contains two versions of Hands, a longer 40-item version and a shorter 25-item version.

    Source                  Local Sample   Hands   Patterns   Concentration
    Chile                   370            0.89    0.6
    Denmark                 1,212          0.90    0.69
    Italy                   108            0.87    0.72       0.76
    Norway                  232            0.92    0.71
    Portugal                1,218          0.93    0.73
    Sweden                  762            0.94    0.71       0.83 (N=430)
    UK                      585            0.91
    Total Sample Size (N)                  4,487   3,902      538
    Sample Weighted Mean                   0.92    0.70       0.82
    90% Credibility                        0.89    0.66       0.79

    Table 1: Reliability results for PILAPT tests across various national sites
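
    The sample-weighted means in Table 1 can be checked directly from the site-level figures. The sketch below is an illustration only: it assumes that each site's coefficient is weighted by its local sample size (with N=430 for the Swedish Concentration estimate), and the names TABLE_1 and sample_weighted_mean are introduced here rather than taken from the PILAPT software. It does not attempt to reproduce the 90% credibility values, which depend on further steps of the Hunter and Schmidt procedure.

```python
# Site-level (sample size, reliability) pairs transcribed from Table 1.
# The dictionary layout and function name are illustrative assumptions.
TABLE_1 = {
    "Hands": [(370, 0.89), (1212, 0.90), (108, 0.87), (232, 0.92),
              (1218, 0.93), (762, 0.94), (585, 0.91)],
    "Patterns": [(370, 0.6), (1212, 0.69), (108, 0.72), (232, 0.71),
                 (1218, 0.73), (762, 0.71)],
    "Concentration": [(108, 0.76), (430, 0.83)],
}


def sample_weighted_mean(estimates):
    """Return (total N, N-weighted mean reliability) for (n, r) pairs."""
    total_n = sum(n for n, _ in estimates)
    weighted = sum(n * r for n, r in estimates) / total_n
    return total_n, weighted


for test, estimates in TABLE_1.items():
    total_n, mean_r = sample_weighted_mean(estimates)
    print(f"{test}: N = {total_n}, weighted mean = {mean_r:.2f}")
```

    Under these assumptions the totals (4,487, 3,902 and 538) and the weighted means (0.92, 0.70 and 0.82) agree with the summary rows of Table 1.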

    In addition to these results, a test-retest (stability) study was conducted in 1995 for the

    RAF UAS (N=109). This study had a four month interval between test administrations and

    yielded reliabilities of 0.80 for DI, 0.84 for Trax and 0.77 for Hands, and an overall test-retest

    reliability of 0.91 for the sum of these three PILAPT test scores. Taken together, these data show that, averaged across sites, PILAPT tests meet or exceed the minimum requirement of 0.7 reliability for use in pilot selection. As

    an overall composite score for use in selection decisions, the PILAPT battery offers a

    reliability of 0.9 and above.

    Construct validity

    This section presents the results of a study conducted in Denmark involving four

    PILAPT tests (DI, Hands, Patterns and Trax) and a 15-test battery used to assess both

    aircrew and ATC aptitudes. Data were available across all 19 tests for a sample of 632

    applicants. The tests in the 15-test battery were classified by content in line

    with the classifications used by Hunter and Burke in their meta-analysis. This classification

    then provides a direct test of the extent to which PILAPT is measuring pilot relevant predictor

    constructs. The results are shown in Table 2.

    Test Group                   DI     Hands   Patterns   Trax    Overall
    Mathematical Reasoning       .12    .31     .37        .06     0.44
    Numerical Speed & Accuracy   .11    .29     .25        .03     0.35
    Language                     .18    .14     .20        .08     0.29
    General Reasoning            .18    .33     .51        .13     0.57
    Spatial                      .24    .38     .38        .17     0.53
    Mechanical                   .27    .35     .40        .29     0.55
    Memory                       .05    .23     .13        -.09    0.27

    Notes:

    The Overall column gives the multiple correlation from regressing each Test Group on the 4 PILAPT tests

    Correlations in bold and italicised are significant at the 0.01 level

    Table 2: Results for 632 Danish military applicants
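
    As a rough guide to the significance note under Table 2, the smallest correlation that reaches the 0.01 level for N = 632 can be estimated as follows. This is a sketch under stated assumptions, not the authors' procedure: it assumes a two-tailed test and uses the normal approximation to the t distribution, which is close for 630 degrees of freedom. The function name critical_r is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Approximate smallest |r| significant at level alpha (two-tailed).

    Inverts t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2, using the
    normal quantile in place of the t quantile (an approximation that is
    reasonable for large samples).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    df = n - 2
    return z / sqrt(z * z + df)


r_crit = critical_r(632)
print(f"Approximate critical |r| for N = 632: {r_crit:.3f}")
```

    Under these assumptions the threshold is about 0.10, so the smallest coefficients in Table 2 (for example, those between Trax and the numerical, language and memory groups) would fall below it.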

    Hunter and Burke identified the following predictor constructs as being the most

    consistent and substantial predictors of pilot training success: perceptual speed, mechanical

    reasoning, spatial reasoning, psychomotor and simulation based tests. The Danish data set did

    not contain psychomotor or simulation based tests, but the results clearly show that PILAPT

    is tapping the other predictor constructs identified by Hunter and Burke as critical to

    predicting pilot training success. Corrected for average reliabilities in the Danish tests (0.8),

    the overall regression values (right-most column of Table 2) would range from 0.34 to 0.71.
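
    The correction described above can be illustrated with the standard correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The paper does not state the exact formula used, so the sketch below is an assumption: the reported range of 0.34 to 0.71 is reproduced when a reliability of 0.8 is applied to both sides of the relationship (that is, when each Overall value is divided by 0.8). The names OVERALL and disattenuate are introduced here for illustration.

```python
from math import sqrt

# Overall values transcribed from the right-most column of Table 2.
OVERALL = {
    "Mathematical Reasoning": 0.44,
    "Numerical Speed & Accuracy": 0.35,
    "Language": 0.29,
    "General Reasoning": 0.57,
    "Spatial": 0.53,
    "Mechanical": 0.55,
    "Memory": 0.27,
}


def disattenuate(r_observed, rel_x, rel_y=1.0):
    """Classical correction for attenuation: r / sqrt(rel_x * rel_y)."""
    return r_observed / sqrt(rel_x * rel_y)


# Assumption: reliability of 0.8 applied to both variables.
corrected = {group: disattenuate(r, 0.8, 0.8) for group, r in OVERALL.items()}

for group, value in corrected.items():
    print(f"{group}: {OVERALL[group]:.2f} -> {value:.2f}")

print(f"Range: {min(corrected.values()):.2f} to {max(corrected.values()):.2f}")
```

    Correcting for a reliability of 0.8 on one side only (dividing by the square root of 0.8) would give a narrower range of roughly 0.30 to 0.64, which does not match the figures reported in the text.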

    Conclusion

    This paper has summarised the R&D objectives as well as reliability and construct

    validity evidence supporting the PILAPT battery. The consistency of the reliability estimates

    across different national sites with different selection processes and different applicant

    populations suggests that scores are generalisable across settings. Evidence of criterion

    validity and transportability of validity is presented in the second and third papers of this

    symposium.

    References

    Burke, E., Kitching, A., and Valsler, C. (1994). Computer-based assessment and the construction of

    valid aviator selection tests. In N. Johnston, R. Fuller, and N. McDonald (Eds.). Aviation

    Psychology: Training and Selection. Cambridge: Avery.

    Hunter, D. R., and Burke, E. (1994). Predicting aircraft pilot-training success: A meta-analysis of

    published research. International Journal of Aviation Psychology, 4, 297-313.

    Hunter, D. R., and Burke, E. (1995). Handbook of Pilot Selection. Cambridge: Ashgate.

    Hunter, J. E., and Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in

    research findings. Newbury Park, CA: Sage.