Using SHERPA to predict design-induced error on the flight deck

Aerospace Science and Technology 9 (2005) 525–532
www.elsevier.com/locate/aescte

Don Harris a,*, Neville A. Stanton b, Andrew Marshall c, Mark S. Young b, Jason Demagalski a, Paul Salmon b

a Cranfield University, Human Factors Group, Cranfield, Bedford, MK43 OAL, UK
b Brunel University, Department of Design, UK
c Marshall Ergonomics Ltd., Havant, UK

Received 6 October 2003; received in revised form 8 November 2004; accepted 8 April 2005
Available online 23 May 2005

* Corresponding author. Tel.: +44 (0)1234 750111 x5196; fax: +44 (0)1234 750192. E-mail address: [email protected] (D. Harris).

1270-9638/$ – see front matter © 2005 Elsevier SAS. All rights reserved. doi:10.1016/j.ast.2005.04.002

Abstract

Human factors certification criteria are being developed for large civil aircraft. The objective is to reduce the incidence of design-induced error on the flight deck. Many formal error identification techniques currently exist, however none of these have been validated for their use in an aviation context. This paper evaluates SHERPA (Systematic Human Error Reduction and Prediction Approach) as a means for predicting design-induced pilot error. Since SHERPA was developed for predicting human error in the petrochemical and nuclear industries, a series of validation studies have suggested that it is amongst the best human error prediction tools available. This study provides some evidence for the reliability and validity of SHERPA in a flight deck context and concludes that it may form the basis for a successful human error identification tool. © 2005 Elsevier SAS. All rights reserved.

Keywords: Design induced error; Error prediction; Flight deck design; Reliability; Validity

1. Introduction

For the past 50 years there has been a general decline in the commercial aircraft accident rate, however during the last decade the serious accident rate has remained relatively constant at approximately one per million departures [3]. With the projected increase in the demand for air travel, if this rate remains unchanged, by 2015 there will be one major hull loss per week. As the reliability and structural integrity of aircraft has improved the number of accidents directly resulting from such failures has reduced dramatically, hence so has the overall number of accidents and the accident rate. However, human reliability has not improved to the same degree during this period. Figures vary slightly but it can be estimated that up to 75% of all recent aircraft accidents now have a major human factors component. Human error is now the primary risk to flight safety [4].

The roots of human error are manifold and often have complex interrelationships with many aspects of the system of operating a modern airliner. However, during the last 10 years ‘design induced’ error has become of particular concern to the major airworthiness authorities. The Captain of a modern commercial aircraft is now a manager of flight crew and of complex, highly automated aircraft systems. These systems began to be introduced into airliners during the ‘glass cockpit’ revolution of the 1980s. The high levels of automation offered a considerable advance in safety over their ‘clockwork cockpit’ forebears, however new types of error began to emerge on these flight decks. This was exemplified by accidents such as the Nagoya Airbus A300-600, the Cali Boeing 757 accident and the Strasbourg A320 accident.

As a direct result of such accidents, the US Federal Aviation Administration (FAA) commissioned an exhaustive study of the pilot-aircraft interface on modern flight decks [8]. The report identified many major flight deck design deficiencies and shortcomings in the design process. There were criticisms of the flight deck interfaces, identifying problems in many systems, such as pilots’ autoflight mode awareness/indication; energy awareness; position/terrain awareness; confusing and unclear display symbology and nomenclature; a lack of consistency in FMS interfaces and conventions, and poor compatibility between flight deck systems. The report also heavily criticised the flight deck design process, identifying a lack of human factors expertise on design teams and placing too much emphasis on the physical ergonomics of the flight deck, and not enough on the cognitive ergonomics. Fifty-one specific recommendations came out of the report, including:

‘The FAA should require the evaluation of flight deck designs for susceptibility to design-induced flightcrew errors and the consequences of those errors as part of the type certification process’.

In July 1999 the US Department of Transportation assigned a task to the Aviation Rulemaking Advisory Committee to provide advice and recommendations to the FAA administrator to ‘review the existing material in FAR/JAR 25 and make recommendations about what regulatory standards and/or advisory material should be updated or developed to consistently address design-related flight crew performance vulnerabilities and prevention (detection, tolerance and recovery) of flight crew error’ [22]. The European Joint Airworthiness Authorities (JAA), as a part of the airworthiness regulatory harmonisation efforts, also subsequently adopted this task. The rules and advisory material being developed as part of this process will be applied to both the Type Certification and Supplemental Type Certification processes for large transport aircraft [20,21]. In the meantime, in 2001 the JAA issued an interim policy document [9] that will remain in force until the new harmonised human factors regulations encompassed in Part 25 come into force.

Compliance with an airworthiness requirement must be established through inspection, demonstration, evaluation, analysis and/or test. Some form of formal error analysis will probably be the most feasible way to evaluate the pilot interface formally and demonstrate that the likelihood of ‘design-induced error’ is as low as is reasonably practicable. Formal error analysis is not new; however, it is a novel approach as a means of demonstrating compliance with a certification requirement, and as such faces unique challenges. Any technique used for a formal approval process must be reliable and valid and, for the purposes of certification, the method should also be capable of being used by non-human factors experts within the certification authorities (e.g. the certification test pilots). As a corollary, any formal error prediction technique should also be capable of being used by the flight deck design teams to verify that, at the early stages of design, their flight deck interface is likely to comply with the human factors certification requirement. Furthermore, any error prediction methodology for the flight deck must be designed to encompass the specific demands of the environment.

The objective of the work reported here was to assess the sensitivity, predictive validity and reliability of one such formal technique, SHERPA (Systematic Human Error Reduction and Prediction Approach) [5] to evaluate its suitability for application in an aerospace context. This technique was selected on the basis of the results of a comparative study of six human error identification techniques [11] in which SHERPA achieved the highest overall rankings on a number of assessment criteria for its performance (comprehensiveness, accuracy, consistency, theoretical validity, usefulness and acceptability), and on the basis of a follow-up study which showed that the method also performed well in predicting subsequent actual errors [2]. Further empirical studies have shown that SHERPA also has acceptable test/re-test reliability [17].

There is great caution and scepticism in the aerospace industry concerning formal methods that purport to produce a probability of error associated with any aspect of crew performance. Furthermore, Advisory Circular AC25.1309-1A [7] also suggests that the reliable quantitative estimation of the probability of crew error is not possible. In this study emphasis is placed upon the identification of potential errors using formal methods, not their quantification. Unless all significant errors are identified the probability of error generated by any formal technique will be underestimated [9], so the first requirement must be to assess a technique’s potential to identify all possible errors resulting from poor human factors design.

To achieve the objectives described, this study progressed in three stages. Firstly, the SHERPA technique was applied to the task of conducting an approach and landing in a modern, highly automated, glass cockpit commercial airliner (Aircraft X) to produce predictions of the errors likely to occur. This was done on two occasions in order to assess the test/re-test reliability of analysts.

Secondly (and independently) using a questionnaire, low-level error data were collected from flight crew currently flying Aircraft X concerning the errors that they had made during the approach and landing flight phase. A fundamental problem when validating formal error identification techniques is obtaining ecologically-valid and reliable criterion data [14]. Accidents are very infrequent events and investigation reports do not contain sufficient detail to establish the design-induced errors that may have contributed to the sequence of events. Incident data are much more plentiful; however, these reports contain even fewer details about the pilots’ interactions with their equipment. As a result, a self-completion questionnaire had to be used for this task.

Once the above two data collection stages were completed, the final stage was to compare the error data with the predictions made by SHERPA, using a signal detection paradigm (q.v. Refs. [2,16]) to assess the predictive validity of the method.

2. Systematic Human Error Reduction and Prediction Approach (SHERPA)

SHERPA [5] (also known as PHEA, Predictive Human Error Analysis [6]) uses a Hierarchical Task Analysis (HTA) [1] in conjunction with an error taxonomy to identify credible errors associated with a sequence of human activity. SHERPA is from a family of human error identification tools that takes a psychologically-based approach [12] and hence is ideally suited for use on the flight deck, especially when it is considered that flight deck design has been criticised for its lack of due consideration of cognitive ergonomics [8]. In a post-hoc analysis of aviation accidents, SHERPA was identified as a technique that would have predicted many of the significant errors in the sequence of events leading to the crash, in contrast to many of the other methods examined [10]. The technique has also been used successfully in a variety of non-aviation, safety critical situations, for example, in the handling and transportation of hazardous chemicals [13] and offshore oil and gas exploration and exploitation [18]. Non-safety critical applications of SHERPA include the analysis of the usability of ticket vending machines [2] and of car radio/cassette players [19].

The technique operates by trained analysts making judgements about which error modes are credible for each task step based upon this analysis of the work activity. A SHERPA analysis proceeds through six basic steps:

• Hierarchical task analysis: A hierarchical task analysis is conducted for the task or scenario under analysis.

• Task classification: Each bottom level task in the HTA is taken in turn and classified into one of the five behaviours from the SHERPA behaviour taxonomy.

– Action (e.g. pressing a button or pulling a switch): Errors in this category are classified into: operation too long/short; operation mistimed; operation in wrong direction; operation too little/much; misaligned; right direction on wrong object; wrong operation on right object; operation omitted; operation incomplete; wrong operation on wrong object.

– Retrieval (e.g. getting information from a screen or manual): Errors in this category are classified into: information not obtained; wrong information obtained; information retrieval incomplete.

– Checking (e.g. conducting a procedural check): Errors in this category are classified into: check omitted; check incomplete; right check on wrong object; wrong check on right object; check mistimed; wrong check on wrong object.

– Selection (e.g. choosing one alternative over another): Errors in this category are classified into: selection omitted; wrong selection made.

– Information communication (e.g. talking to another party): Errors in this category are classified into: information not communicated; wrong information communicated; information communication incomplete.

• Human error identification: After each task is classified into a behaviour, the analyst considers the error modes associated with that behaviour. A credible error is one the analyst judges to be possible. For example, if the task step is classified as an ‘Action’ the analyst takes the associated ‘Action’ error modes and considers whether any are credible for that task. For any credible error types, the analyst describes the nature of the error, notes down the associated consequences, highlights whether it is recoverable or not and suggests possible remedial measures.

• Consequence analysis: The analyst considers the consequences of each identified error, which has implications for the criticality of the error.

• Recovery analysis: If there is a task step at which the error can be recovered this is entered next. If no recovery step is possible then ‘None’ is entered.

• Tabulation: The information gained from the SHERPA analysis method is converted into a tabular output. The ordinal probability (P) of an error is then categorised as low (hardly ever occurs); medium (has occurred once or twice) or high (occurs frequently). The same process is applied to the criticality (C) of the error.
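The task classification and error identification steps listed above amount to a lookup from a behaviour to its candidate error modes, which the analyst then filters by judgement. The following is a minimal sketch of that lookup, not the authors' tooling. The mode codes are generated from list position: for ‘Action’ this agrees with the A3 and A6 codes quoted in Table 1, but the prefixes R, C, S and I, the dictionary name and the function name are assumptions made for illustration.

```python
# Behaviour -> (code prefix, error modes in the order listed in the text).
# Prefixes other than "A" are hypothetical; only A3 and A6 appear in the paper.
SHERPA_TAXONOMY = {
    "Action": ("A", [
        "operation too long/short",
        "operation mistimed",
        "operation in wrong direction",
        "operation too little/much",
        "misaligned",
        "right direction on wrong object",
        "wrong operation on right object",
        "operation omitted",
        "operation incomplete",
        "wrong operation on wrong object",
    ]),
    "Retrieval": ("R", [
        "information not obtained",
        "wrong information obtained",
        "information retrieval incomplete",
    ]),
    "Checking": ("C", [
        "check omitted",
        "check incomplete",
        "right check on wrong object",
        "wrong check on right object",
        "check mistimed",
        "wrong check on wrong object",
    ]),
    "Selection": ("S", [
        "selection omitted",
        "wrong selection made",
    ]),
    "Information communication": ("I", [
        "information not communicated",
        "wrong information communicated",
        "information communication incomplete",
    ]),
}


def candidate_error_modes(behaviour):
    """Return (code, description) pairs the analyst would consider for a behaviour."""
    prefix, modes = SHERPA_TAXONOMY[behaviour]
    return [(f"{prefix}{number}", mode) for number, mode in enumerate(modes, start=1)]


# A task step classified as an 'Action' yields ten candidate modes to judge.
action_modes = dict(candidate_error_modes("Action"))
print(action_modes["A3"], "|", action_modes["A6"])
```

The lookup only enumerates candidates; deciding which modes are credible for a given task step remains the analyst's judgement.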

The main strengths of the SHERPA method are that it provides a structured and comprehensive approach to error prediction, gives an exhaustive and detailed analysis of potential errors, and has an error taxonomy that prompts the analyst for potential errors. However, SHERPA is somewhat repetitive and time consuming to perform.

3. SHERPA analysis

An HTA of a fully-coupled autoland approach to New Orleans airport undertaken in Aircraft X was performed. This consisted of some 22 subtasks under the main headings of setting up for approach, lining up for the runway, and preparing the aircraft for landing. The approach and landing considered was completely normal with no non-routine procedures included. This task analysis formed the basis of the following formal error prediction analysis. An extract of this HTA is included in Fig. 1.

Eight graduate engineering participants aged between 22 and 55 years took part in this study. All participants were trained in the SHERPA methodology. The training comprised an introduction to the key stages in the method and a demonstration of the approach using a non-aviation example, using an in-car task (see Ref. [19]). Participants were then required to apply the method to another non-aviation task (described in Ref. [17]) under the guidance of the instructors. The purpose of this was to ensure that they had understood the workings of the SHERPA method. A debriefing followed, where participants could share their understanding with each other. When the instructors were satisfied that the training was completed, the main task was introduced. This required participants to make predictions of the errors that pilots could make in the autoland task.

Fig. 1. Extract of HTA for landing aircraft (including an extract of plans) upon which error predictions were made.

To make their error predictions, participants were given an HTA of the autoland task developed by the authors; a demonstration of performing an autoland using Microsoft™ flight simulator; the SHERPA error taxonomy; and colour photographs of Aircraft X’s Flight Control Unit, flap levers, landing gear lever, speed brake, primary flight displays, and an overview of the flight deck. An example of the nature of the errors predicted by SHERPA for one task step in the HTA is provided in Table 1.

Participants were required to make predictions of the pilot errors on two separate occasions, separated by a period of four weeks. This enabled intra-analyst reliability statistics to be computed. The predictions made were compared with error data reported by pilots using autoland (as described in the following section). This enabled SHERPA’s validity coefficient to be computed.

4. Collection of error data

From the approach and landing HTA for Aircraft X a list was compiled of all the possible errors that could be made during the landing phase of flight using the Flight Control Unit (FCU) as the main controlling interface. A few additional systems were also included such as the speed brake and flaps. An additional list of possible errors was developed from observations made during a series of orientation flights on Aircraft X and comments from further interviews with type rated pilots. These data were used to develop a design induced error questionnaire specific to Aircraft X.

Table 1. Extract of SHERPA analysis. In the following extract, error mode A3 is an operation in the wrong direction; error mode A6 is an operation in the correct direction but on the wrong object. Estimates (low, medium or high) of the probability of occurrence and criticality of the error are given in columns labelled P and C, respectively.

| Task step | Error mode | Description | Consequence | Recovery | P | C | Remedial measures |
|---|---|---|---|---|---|---|---|
| 3.2.2 | A3 | Pilot turns the Speed/MACH selector knob the wrong way | The wrong airspeed is entered and the plane speeds up instead of slowing down | 3.2.1 | M | M | Clearer control labeling; auditory signal informing increase/decrease |
| 3.2.2 | A6 | The pilot dials in the desired airspeed using the wrong control knob, i.e. the heading knob | Before capture, the auto-pilot will attempt to switch course to the speed value entered, causing the plane to leave the glideslope | Immediate | M | H | Improved control labeling; improved separation of controls |

The objective of the questionnaire was to survey pilots to obtain a comprehensive picture of the low-level errors they knew had ever been made on the flight deck while flying a fully-coupled autoland approach and landing in the particular type of aircraft. To achieve this, respondents were not only asked if they had ever made the error themselves but also if they knew of a fellow pilot who had made the same error. A simple ‘yes/no’ response format was used. As it was highly probable that the list of questions was not exhaustive, space was provided to report additional errors or for further comments to be given.

Following an initial pilot administration of the instrument to a small sample of senior pilots to check for major errors and to refine the wording of the survey items further, the final questionnaire was sent to pilots flying Aircraft X in three UK Airlines.

The final instrument contained 70 questions specifically concerned with the flight deck interface. The survey was divided into 13 subsections, with items regarding speed brake setting (7 questions); flap selection (10 questions); lowering landing gear (1 question); airspeed (11 questions); checking ALT is engaged (1 question); altitude (8 questions); changing headings (4 questions); checking HDG is engaged (1 question); engaging the approach system (4 questions); checking APPR is engaged (1 question); tracking the localiser (7 questions); tracking the glideslope (2 questions); and other miscellaneous items (13 questions).

After the return of the questionnaires, a number of additional interviews were conducted to clarify the additional comments received on many of the survey instruments.

5. Error data results

5.1. Sample

Forty-six completed questionnaires were returned in time for analysis. Of the 46 respondents the largest group were Captains (45.7%). First Officers provided a further 37% of the sample, with the remainder being either Training Captains (13.3%) or respondents who failed to state their position (two respondents). The number of total flying hours ranged from less than 2,000 hours to over 16,000 with a mean of 6,832 hours and a standard deviation of 4,524 hours. With regard to type specific experience this ranged from less than 1,000 hours to over 5,000 hours. The mean time on type was 1,185 hours with a standard deviation of 1,360 hours.

5.2. Survey results

A total of 57 different types of error were reported, either as responses to the structured survey items or in the additional comments section of the questionnaire. For brevity, Table 2 presents only the data from the survey items where more than two respondents reported a particular error. In Table 2, the column ‘ME’ contains the frequency data where the respondent has made the error in question themselves; the column labelled ‘OTHER’ contains the data which indicates that they have seen someone else make the error or that they are aware of someone who has.

Table 2. Percentage of pilots reporting having made (or knowing about) common design induced errors during the approach and landing phase while performing an autoland in Aircraft X

| Category | Item | ME [%] | OTHER [%] |
|---|---|---|---|
| Speed brake | Moved the flap lever instead of the speed brake lever when intended to apply the speed brake | 0 | 6.5 |
| Flaps | Checked the flap position and misread it | 4.3 | 4.3 |
| Flaps | Moved the flap lever further or not as far as intended | 17.4 | 6.5 |
| Landing gear | Omitted to put the landing gear down until reminded | 19.6 | 37.0 |
| Airspeed | Initially, dialled in an incorrect airspeed on the Flight Control Unit by turning the knob in the wrong direction | 39.1 | 37.0 |
| Airspeed | Having entered the desired airspeed, pushed or pulled the switch in the opposite way to the one that you wanted | 26.1 | 26.1 |
| Airspeed | Adjusted the heading knob instead of the speed knob | 78.3 | 65.2 |
| Altitude | Entered the wrong altitude on the Flight Control Unit and activated it | 15.2 | 17.4 |
| Altitude | Entered an incorrect altitude because the 100/1000 feet knob wasn’t clicked over | 26.1 | 28.3 |
| Altitude | Believed you were descending in flight path angle and found that you were in fact in Vertical speed mode or vice versa | 8.7 | 13.0 |
| Altitude | Failed to check ALT (Altitude) mode was active | 8.7 | 6.5 |
| Heading | Entered a heading on the Flight Control Unit and failed to activate it at the appropriate time | 34.8 | 34.8 |
| Heading | Failed to check HDG (Heading) mode was active | 23.9 | 19.6 |
| Approach system | Tried to engage APPR (Approach) mode too late so that it failed to capture | 28.3 | 30.4 |
| Approach system | Pressed the wrong button when intending to engage APPR such as EXPED (Expedite) | 6.5 | 8.7 |
| Approach system | Failed to check APPR was active | 28.3 | 30.4 |
| Localiser | Incorrectly adjusted heading knob to regain localiser and activated the change | 4.3 | 4.3 |
| Glideslope | Failed to monitor the glide slope and found that the aircraft had not intercepted it | 39.1 | 52.2 |
| Other | Had an incorrect barometric air pressure set | 45.7 | 45.7 |
| Other | Set an altitude ‘out of the way’ and then out of habit pulled the altitude knob | 15.2 | 32.6 |

5.3. Predictive validity and reliability of SHERPA for predicting pilot error

Overall, the analysts using SHERPA predicted 56 errors. These predictions were compared with error data collected in the survey of pilots. This enabled validity statistics to be computed using a signal detection paradigm [15]. This approach provides a useful framework for testing the power of formal human error identification methods [2,17]. In addition to comparing correct predictions of error with actual errors (hits) it identifies type I analytical errors (a miss: when the error analyst predicts the error will not occur and it does) and type II analytical errors (a false alarm: when the error analyst predicts that there will be an error and there is not). The results of this analysis are presented in Table 3.

Table 3. Performance of SHERPA in predicting pilot error during approach and landing using pooled error data

| | Error observed: Yes | Error observed: No |
|---|---|---|
| Error predicted: Yes | Hits: 52 | False alarms: 4 |
| Error predicted: No | Misses: 5 | Correct rejections: 179 |

The overall hit rate was very high. Of the 57 errors reported by flight crew, 52 were predicted by one or more of the analysts. Four errors were predicted which were not yet reported by any of the pilots surveyed. The validity coefficient, expressed as the mean of the ‘hit’ rate and one minus the ‘false alarm’ rate (Eq. (1)), was circa 0.6. This value is toward the lower end of being acceptable. However, if the error predictions across the participants are pooled, the validity statistic rises to circa 0.9, which is exceptionally high.
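Pooling, as described above, can be read as taking the union of the analysts' predicted errors before comparing them with the observed errors. The sketch below illustrates that reading with set operations. The error labels (E1 to E8), the function name and the example data are hypothetical; the study's own coding of predictions is not reproduced here.

```python
def pooled_outcomes(predictions_by_analyst, observed, candidates):
    """Classify every candidate error after pooling analysts' predictions.

    predictions_by_analyst: list of sets, one set of predicted errors per analyst
    observed: set of errors reported by pilots
    candidates: set of all errors that could have been predicted or observed
    """
    pooled = set().union(*predictions_by_analyst)  # predicted by at least one analyst
    return {
        "hits": len(pooled & observed),
        "false_alarms": len(pooled - observed),
        "misses": len(observed - pooled),
        "correct_rejections": len(candidates - pooled - observed),
    }


# Hypothetical example: three analysts, eight candidate errors.
candidates = {f"E{i}" for i in range(1, 9)}
observed = {"E1", "E2", "E3", "E4"}
analysts = [{"E1", "E5"}, {"E2", "E3"}, {"E1", "E3"}]

print(pooled_outcomes(analysts, observed, candidates))
```

In this toy data no single analyst predicts more than two of the four observed errors, while the pooled set covers three of them, which is the mechanism by which pooling raises the hit rate.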

The test/re-test reliability of analysts between the first and second applications of the SHERPA methodology (assessed using Pearson’s r) was circa 0.7. This value is moderate, but it should be noted that this was the first time the participants had applied the SHERPA and they were not experts in either human factors or from the aerospace industry.
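For readers unfamiliar with the statistic, the sketch below computes Pearson's r between two occasions. It assumes, purely for illustration, that an analyst's predictions on each occasion are coded as a 0/1 vector over the same list of candidate errors; the paper does not state how the predictions were coded, and the data shown are invented.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical coding: 1 = error predicted, 0 = not predicted, same eight
# candidate errors on both occasions.
time_1 = [1, 1, 0, 0, 1, 0, 1, 0]
time_2 = [1, 0, 0, 0, 1, 1, 1, 0]
print(pearson_r(time_1, time_2))
```

With binary coding this statistic is equivalent to the phi coefficient, so a value near 1 would mean the analyst predicted largely the same errors on both occasions.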

The complete results for all analysts (Hits, False Alarms, Misses, Correct Rejections and Sensitivity calculations) are included in Table 4. The formula for calculation of the Sensitivity Index is included at Eq. (1):

\[
S_i = \frac{\dfrac{\mathrm{Hit}}{\mathrm{Hit} + \mathrm{Miss}} + 1 - \dfrac{\mathrm{False\ Alarm}}{\mathrm{False\ Alarm} + \mathrm{Correct\ Rejection}}}{2} \tag{1}
\]
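As a worked check of Eq. (1), the short sketch below evaluates the Sensitivity Index for the pooled counts in Table 3 and for the first analyst's time 1 counts in Table 4. The function name is an assumption for illustration, and the function does not guard against empty categories (a zero denominator).

```python
def sensitivity_index(hits, false_alarms, misses, correct_rejections):
    """Mean of the hit rate and one minus the false alarm rate, as in Eq. (1)."""
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return (hit_rate + (1 - false_alarm_rate)) / 2


# Pooled data from Table 3: 52 hits, 4 false alarms, 5 misses, 179 correct rejections.
pooled = sensitivity_index(52, 4, 5, 179)

# Analyst 1 at time 1 from Table 4: 71 hits, 66 false alarms, 32 misses, 203 correct rejections.
analyst_1_time_1 = sensitivity_index(71, 66, 32, 203)

print(round(pooled, 2), round(analyst_1_time_1, 2))
```

The pooled value is consistent with the "circa 0.9" quoted in the text, and the single-analyst value reproduces the 0.72 tabulated for that analyst.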

6. Discussion

The identification of errors made on the flight deck committed as a result of poor design is difficult. In many cases, even in accident and incident reports, there is a paucity of data on the low-level errors made that may have contributed to the accident sequence. As a result of the scarcity of these data the validation of formal error prediction methods in this context is problematic.

The questionnaire survey conducted in this study does, however, suggest that design-induced errors are an everyday issue for the pilots of modern airliners. Pilots who responded to the survey reported that between them, 57 errors related to the design of the pilot interface occurred in just the approach and landing phase of flight. Simply because many of these errors do not appear in accident and incident reports does not make them unimportant issues. At best, these aspects of flight deck design that encourage error represent a daily source of frustration for pilots, as they have to remain vigilant to their own mistakes which they then need to correct. At worst, it is possible that the types of errors identified may be significant factors in accidents and incidents. It is simply too difficult, however, to link them to the accident sequence with any degree of certainty.

Table 4. Hits, false alarms (FA), misses, correct rejections (CR) and sensitivity indices (SI) for time 1 (T1) and time 2 (T2)

| Sub | T1 Hits | T1 FA | T1 Misses | T1 CR | T1 SI | T2 Hits | T2 FA | T2 Misses | T2 CR | T2 SI |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 71 | 66 | 32 | 203 | 0.72 | 51 | 43 | 53 | 226 | 0.67 |
| 2 | 54 | 38 | 50 | 231 | 0.69 | 62 | 45 | 42 | 224 | 0.71 |
| 3 | 52 | 38 | 52 | 231 | 0.68 | 44 | 27 | 60 | 242 | 0.66 |
| 4 | 62 | 76 | 42 | 193 | 0.66 | 98 | 116 | 6 | 153 | 0.76 |
| 5 | 48 | 44 | 116 | 285 | 0.58 | 60 | 76 | 44 | 193 | 0.65 |
| 6 | 68 | 42 | 36 | 227 | 0.75 | 74 | 51 | 30 | 218 | 0.76 |
| 7 | 83 | 86 | 21 | 183 | 0.74 | 80 | 83 | 24 | 186 | 0.73 |
| 8 | 38 | 29 | 66 | 240 | 0.63 | 71 | 42 | 33 | 227 | 0.76 |
| Mean SI | | | | | 0.68 | | | | | 0.71 |

Many of the potential errors that were identified in the SHERPA analysis were the types of errors that most pilots were aware of and have simply had to accept on the flight deck during everyday operations. It is hoped that the development of human factors certification standards will help to ensure that many of the design induced errors identified are eradicated in future aircraft. The initial results using SHERPA for this type of task are promising. Whilst more studies are needed to investigate more flight tasks in different phases of flight, the current study shows that novice analysts were able to acquire the approach with relative ease and reach an acceptable level of performance within a relatively short period of time, which supports the results of previous studies [17]. This is essential given the nature of how, and by whom, the technique may be applied in both the flight deck design and certification context.

The validity coefficient for the non-pooled analyst data at first seems a little disappointing; however, this is unlikely to be a major problem within a certification programme. Firstly, analyst skills improve with familiarity with the technique [17]. A previous study has shown that there is very little change over time in the frequency of ‘hits’ and ‘misses’; however, the frequency of ‘false alarms’ falls over time and, consequently, the frequency of correct rejections increases. In the present study a slightly different pattern of results was noted. There was a trend for ‘hits’ to increase at the second attempt and, correspondingly, for ‘misses’ to decrease. In contrast to the previously reported study, though, one of the consequences of an increased ‘hit’ rate was also a moderate increase in ‘false alarms’ (see Table 4). However, it is argued that in a safety critical industry, an increase in the ‘false alarm’ rate is perfectly acceptable if it is accompanied by a concomitant increase in the ‘hit’ rate, as the latter is what really matters. Secondly, it is unlikely that any aspect of certification will be accepted on the basis of a single analyst. It has also been shown that the error identification rate goes up when multiple analysts are used, as in this case.
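The trade-off between ‘hits’ and ‘false alarms’ described above can be illustrated with a short calculation. The sketch below assumes that the sensitivity index (SI) is the mean of the hit rate and the correct rejection rate; this definition, the function name and the two sets of counts are assumptions made for illustration and are not the study data. It suggests how an analyst whose ‘hits’ and ‘false alarms’ both rise at a second attempt could still show a higher SI.

```python
def sensitivity_index(hits, false_alarms, misses, correct_rejections):
    """Return the mean of the hit rate and the correct rejection rate.

    Assumed definition of SI, used here for illustration only.
    """
    errors_present = hits + misses
    errors_absent = false_alarms + correct_rejections
    if errors_present == 0 or errors_absent == 0:
        raise ValueError("each outcome class needs at least one observation")
    hit_rate = hits / errors_present
    false_alarm_rate = false_alarms / errors_absent
    return (hit_rate + (1.0 - false_alarm_rate)) / 2.0


# Hypothetical counts for one analyst at two attempts (not taken from Table 4).
first_attempt = sensitivity_index(hits=50, false_alarms=40,
                                  misses=54, correct_rejections=229)
second_attempt = sensitivity_index(hits=70, false_alarms=60,
                                   misses=34, correct_rejections=209)

print(f"SI at first attempt:  {first_attempt:.2f}")
print(f"SI at second attempt: {second_attempt:.2f}")
```

Under this assumed definition, 20 additional ‘hits’ outweigh 20 additional ‘false alarms’ because the pool of possible ‘false alarms’ is much larger than the pool of actual errors.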

The results indicate that existing human error identification techniques developed for use in other domains (e.g. Nuclear Power) can be applied with some success in an aviation context. This study suggests that SHERPA seems particularly well suited in this respect. Furthermore, empirical evidence from comparisons of SHERPA with other formal human error prediction techniques [16] strongly suggests that SHERPA is the best of the currently available human error prediction techniques. SHERPA’s error taxonomy is well suited to the tasks carried out by a civil aircraft pilot, with ‘actions’ and ‘checks’ being the most prominent tasks involved.

This paper demonstrates that SHERPA can already be applied as a flight deck design evaluation tool, although it is acknowledged that the initial HTA may be time consuming. The analyst applying SHERPA in this context will require some knowledge of the skills and procedures required to fly the aeroplane under analysis, or at least experience of the systems used on the flight deck.

In an analysis of the ability of human reliability tools to predict real accidents, it has been suggested that SHERPA would have been able to predict significant errors that contributed to certain aviation accidents [10]. With further suitable development and validation, SHERPA (or a SHERPA-like approach) may be used in the future to predict design-induced error for certification purposes.

Acknowledgements

The authors wish to acknowledge that this research was made possible through funding from the UK Department of Trade and Industry as part of the European EUREKA! Programme. Our thanks go to Gillian Richards and John Brumwell at the DTI, and Richard Harrison (now at QinetiQ), who have provided invaluable support to this research effort. We are most grateful to our European colleagues in the programme, Sidney Dekker from Linköping University, Sweden and Thomas Waldmann of the University of Limerick, Ireland for their help and advice. Our thanks also go to Air2000, British Midland and JMC airlines for allowing us access to their pilots and to the pilots for taking the time to complete the questionnaire.

References

[1] J. Annett, K.D. Duncan, R.B. Stammers, M.J. Gray, Task Analysis, Training Information No. 6, HMSO, London, 1971.

[2] C. Baber, N.A. Stanton, Human error identification techniques applied to public technology: Predictions compared with observed use, Appl. Ergonom. 27 (1996) 119–131.

[3] Boeing Commercial Airplanes Group, Statistical Summary of Commercial Jet Airplane Accidents: Worldwide Operations 1959–1999, Boeing, Seattle, WA, 2000.

[4] Civil Aviation Authority, Global Fatal Accident Review 1980–96 (CAP 681), Civil Aviation Authority, London, 1998.

[5] D.E. Embrey, SHERPA: A systematic human error reduction and prediction approach, Paper presented at the International Meeting on Advances in Nuclear Power Systems, Knoxville, TN, 1986.

[6] D.E. Embrey, Quantitative and qualitative prediction of human error in safety assessments, Institute of Chemical Engineers Symposium Series 130 (1993) 329–350.

[7] Federal Aviation Administration, Advisory Circular: System Design and Analysis (AC 25.1309-1A), Federal Aviation Administration, Washington, DC, 1988.

[8] Federal Aviation Administration, Report on the Interfaces between Flightcrews and Modern Flight Deck Systems, Federal Aviation Administration, Washington, DC, 1996.

[9] Joint Airworthiness Authorities, Human Factors Aspects of Flight Deck Design: Interim Policy Paper INT/POL/25/14, Joint Airworthiness Authorities, Hoofdorp, 2001.

[10] R.J. Kennedy, Can human reliability assessment (HRA) predict real accidents? A case study analysis of HRA, in: A.I. Glendon, N.A. Stanton (Eds.), Proceedings of the Risk Assessment and Risk Reduction Conference, Aston University, Birmingham, UK, March 22, 1995.

[11] B. Kirwan, Human reliability assessment, in: J.R. Wilson, E.N. Corlett (Eds.), Evaluation of Human Work, Taylor and Francis, London, 1990, pp. 706–754.

[12] B. Kirwan, Human error identification in human reliability assessment. Part 2: detailed comparison of techniques, Appl. Ergonom. 23 (1992) 371–381.

[13] B. Kirwan, A Guide to Practical Human Reliability Assessment, Taylor and Francis, London, 1994.

[14] B. Kirwan, Validation of three human reliability quantification techniques – THERP, HEART and JHEDI: Part I – Technique descriptions and validation issues, Appl. Ergonom. 27 (1996) 359–374.

[15] N.A. Macmillan, C.D. Creelman, Signal Detection Theory: A User’s Guide, Cambridge University Press, Cambridge, 1991.

[16] P.M. Salmon, N.A. Stanton, M.S. Young, D. Harris, J.M. Demagalski, A. Marshall, T. Waldmann, S. Dekker, Predicting design induced pilot error: A comparison of SHERPA, human error HAZOP, HEIST and HET, a newly developed aviation specific HEI method, in: D. Harris, V. Duffy, M. Smith, C. Stephanidis (Eds.), Human Centred Computing, Lawrence Erlbaum Associates, Mahwah, NJ, 2003, pp. 567–571.

[17] N.A. Stanton, S.V. Stevenage, Learning to predict human error: Issues of reliability, validity and acceptability, Ergonomics 41 (1998) 1737–1756.

[18] N.A. Stanton, J.A. Wilson, Human factors: Step change improvements in effectiveness and safety, Drilling Contractor Jan/Feb (2000) 36–41.

[19] N.A. Stanton, M.S. Young, A Guide to Methodology in Ergonomics: Designing for Human Use, Taylor and Francis, London, 1999.

[20] UK Civil Aviation Authority, Joint Airworthiness Requirements (JAR 25 – Large Aeroplanes), Civil Aviation Authority, London, 1978.

[21] US Department of Transportation, Federal Aviation Regulations (Part 25 – Airworthiness Standards), Revised January 1, 2003, US Department of Transportation, Washington, DC, 1974.

[22] US Department of Transportation, Aviation Rulemaking Advisory Committee, Transport Airplane and Engine: Notice of New Task Assignment for the Aviation Rulemaking Advisory Committee (ARAC), Federal Register, vol. 64, no. 140, July 22, 1999.