Safety certification of airborne software: An empirical study
Ian Dodd a,1, Ibrahim Habli b,*

a Airservices Australia, Building 101 Da Vinci Business Park, Locked Bag 747, Eagle Farm, QLD 4009, Australia
b Department of Computer Science, University of York, York YO10 5GH, United Kingdom

Reliability Engineering and System Safety 98 (2012) 7-23
Contents lists available at SciVerse ScienceDirect: Reliability Engineering and System Safety. Journal homepage: www.elsevier.com/locate/ress

Article history: Received 8 February 2011; received in revised form 11 August 2011; accepted 24 September 2011; available online 1 October 2011.

Keywords: Software safety; Certification; Airborne software; DO178B; Safety standards; Safety requirements

Abstract

Many safety-critical aircraft functions are software-enabled. Airborne software must be audited and approved by the aerospace certification authorities prior to deployment. The auditing process is time-consuming, and its outcome is unpredictable, due to the criticality and complex nature of airborne software. To ensure that the engineering of airborne software is systematically regulated and is auditable, certification authorities mandate compliance with safety standards that detail industrial best practice. This paper reviews existing practices in software safety certification. It also explores how software safety audits are performed in the civil aerospace domain. The paper then proposes a statistical method for supporting software safety audits by collecting and analysing data about the software throughout its lifecycle. This method is then empirically evaluated through an industrial case study based on data collected from 9 aerospace projects covering 58 software releases. The results of this case study show that our proposed method can help the certification authorities and the software and safety engineers to gain confidence in the certification readiness of airborne software and predict the likely outcome of the audits. The results also highlight some confidentiality issues concerning the management and retention of sensitive data generated from safety-critical projects.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (I. Dodd), [email protected] (I. Habli).
1 Disclaimer: The research in this paper was completed before the author joined Airservices Australia and therefore does not necessarily represent the views of his current employer. None of the data used for illustration is related to any product or activity provided by Airservices Australia.

0951-8320/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2011.09.007

1. Introduction

Commercial airlines provide one of the safest forms of public transportation [1]. This has partly been achieved by placing high safety-integrity targets on all aspects of the industry from aircraft design and maintenance to crew training and aircraft operation. To ensure that aircraft systems are designed and manufactured to the required targets, different countries have commissioned various organisations that are responsible for auditing these critical systems. Flight approval or certification authorities include the European Aviation Safety Agency (EASA) in Europe [2] and the Federal Aviation Administration (FAA) in the USA [3]. When aircraft systems are first developed or upgraded, it is the responsibility of the flight certification authorities to approve the system design before it is cleared for flight. This process is known as Type Certification, where the authorities approve one sample of the proposed system type for flight use. Any exact copy of that type is also approved for flight use as long as it meets predefined design and operational constraints.

In modern avionics, it is the norm that the functionality is implemented using a microprocessor running complex computer software. In many cases, the avionics is a safety-critical item and therefore must be designed and built to the highest levels of safety integrity. The overall safety integrity of the avionics, comprising both software and hardware, is typically specified quantitatively, e.g. in terms of failure rates. However, for software, it is widely accepted that there is a limit on what can be quantitatively demonstrated [4,5], e.g. by means of statistical testing and operational experience. To address this limitation, many aerospace software standards appeal instead to the quality of the development process to assure the dependability of the software. In the civil aerospace domain, DO178B (Software Considerations in Airborne Systems and Equipment Certification) is the primary guidance for the approval of airborne software [6].

Throughout the software process, the certification authorities are required to audit the development, verification and support activities. The audits are known as Stage of Involvement (SOI) audits. Each audit is positioned at strategic points in the lifecycle to reduce the risk of failing the final certification audit. An early indication of a potential certification failure is vital to ensure that the software process is not heading in the wrong direction. An audit failure will normally require that an artefact must be reworked before the audit can be repeated. The typical time duration between audits is four to six months. It is therefore important for the software and safety engineers to have an indication of how well the software process is adhering to the certification requirements before a SOI audit is performed. In aerospace software projects, it is common practice to collect metrics about the defects found during the lifecycle of the project and relate these defects to a common denominator (e.g. the number of defects found per lines of code). By relating these defect metrics to the requirements of the certification authorities, indicators can be generated to determine the readiness of the software for its next SOI audit.

In this paper, we identify a set of issues concerning the auditing process for the approval and certification of airborne software. These issues were generated from interviews with experienced independent software auditors. We then propose, and empirically evaluate, a statistical method for supporting software certification audits based on collecting and analysing data about the software throughout its lifecycle. This collected data is first normalised and then weighted against certification factors such as the number and types of defects, which relate to system safety. The evaluation of our proposed method is based on an industrial case study covering data collected from 9 aerospace projects and comprising 58 software releases. In this work, we focus on two groups of stakeholders, namely certification authority auditors and development teams. Auditors could use the trend of the data over the history of a project lifecycle to identify software problems and possibly misleading information. The data could also be used by the development teams within aerospace companies to assess the readiness of a software project against the certification targets. As part of our evaluation, we present the advantages and limitations of our approach from the viewpoint of both the developers and auditors.

This paper is organised as follows. Section 2 reviews existing approaches to software certification and related work. Section 3 describes a set of auditing issues concerning the software certification process. Section 4 proposes a statistical method for addressing many of the auditing issues listed in Section 3 based on the concept of Statistical Process Monitoring (SPM). This method is empirically evaluated in Section 5 through an industrial case study based on a set of data collected from anonymous aerospace manufacturers responsible for the development of safety-critical airborne software. A detailed discussion of the case study is provided in Sections 6 and 7. The legal and ethical issues concerning our proposed method are discussed in Section 8, followed by conclusions in Section 9.

2. Background and related work

2.1. Software safety certification

Certification refers to the process of assuring that a product or process has certain stated properties, which are then recorded in a certificate [7]. Assurance can be defined as justified confidence in a property of interest [8]. Whereas the concept of safety and assurance cases [9-11] is heavily used in goal-based standards in critical domains such as defence [12,13], rail [14] and oil and gas [15], compliance with prescriptive standards tends to be the norm in the civil aerospace domain [16-18], particularly with regard to the approval and certification of airborne software [6,19]. In prescriptive certification, developers show that a software system is acceptably safe by appealing to the satisfaction of a set of process objectives that the safety standards require for compliance. The means for satisfying these objectives are often tightly defined within the prescriptive standards, leaving little room for developers to apply alternative means for compliance, which might better suit their software products and processes. One fundamental limitation of prescriptive software standards lies in the observation that good tools, techniques and methods do not necessarily lead to the achievement of a specific level of integrity. The correlation between the prescribed techniques and the failure rate of the system is infeasible to justify [16,20]. In goal-based certification, on the other hand, standards require the submission of an argument, which communicates how evidence, generated from testing, analysis and review, satisfies claims concerning the safety of the software functions. Despite the advantages of explicit safety arguments and evidence, there are some concerns regarding the adequacy of the guidance available for the creation of assurance arguments, which comply with the goals set within these standards (i.e. lack of sufficient worked examples of arguments or sample means for generating evidence). Many studies have considered and compared these two approaches to software safety assurance [21-23], highlighting advantages and limitations of each and how they might complement each other [24].

2.2. DO178B

In the civil aerospace domain, DO178B is the primary guidance for the approval of airborne software. The purpose of the DO178B document is "to provide guidelines for the production of software for airborne systems and equipment that performs its intended function with a level of confidence in safety that complies with airworthiness requirements" [6]. DO178B defines a consensus of the aerospace community concerning the approval of airborne software. To obtain certification credit, developers submit lifecycle plans and data that show that the production of the software has been performed as specified by the DO178B guidance. The DO178B guidance distinguishes between different levels of assurance based on the safety criticality of the software, i.e. how software components may contribute to system hazards. The safety criticality of software is determined at the system level during the system safety assessment process based on the failure conditions associated with software components. These failure conditions are grouped into five categories: Catastrophic, Hazardous/Severe-Major, Major, Minor and No Effect [25,26]. The DO178B guidance then defines five different assurance levels, which relate to the above categorisation of failure conditions (Levels A to E, where Level A is the highest and therefore requires the most rigorous processes). Each level of software assurance is associated with a set of objectives, mostly related to the underlying lifecycle process, e.g. planning, development and verification activities (Fig. 1). For example, to achieve software level C, where faulty software behaviour may contribute to a major failure condition, 57 objectives have to be satisfied. On the other hand, to achieve software level A, where faulty software behaviour may contribute to a catastrophic failure condition, nine additional objectives have to be satisfied, with some objectives achieved with independence [27].

To demonstrate compliance with DO178B, applicants are required to submit the following lifecycle data to the certification authorities:

- Plan for Software Aspects of Certification (PSAC)
- Software Configuration Index
- Software Accomplishment Summary (SAS)

They should also make all software lifecycle data, e.g. related to development, verification and planning, available for review by the certification authorities. In particular, the SAS should provide
evidence, which shows that compliance with the PSAC has been achieved. The SAS should provide an overview of the system (e.g. architecture and safety features) and software (e.g. functions and partitioning strategies). It should also provide a summary of the potential software contributions to system hazards, based on the system safety assessment, and how this relates to the allocated assurance level. The SAS then references the software lifecycle data produced to satisfy the objectives associated with the allocated assurance level.
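The failure-condition categorisation described earlier in this section fixes the software assurance level, which in turn fixes the set of objectives to satisfy. A minimal sketch of that lookup is shown below; the function and table names are our own, and the objective counts for Levels B and E are not quoted in this section, so they are left unset.

```python
# Illustrative sketch (not code from the paper): mapping DO178B
# failure-condition categories to software assurance levels. Objective
# counts are only those quoted in this section and in Fig. 1 (A: 66,
# C: 57, D: 28); counts for Levels B and E are left as None.

FAILURE_CONDITION_TO_LEVEL = {
    "Catastrophic": "A",
    "Hazardous/Severe-Major": "B",
    "Major": "C",
    "Minor": "D",
    "No Effect": "E",
}

OBJECTIVE_COUNT = {"A": 66, "B": None, "C": 57, "D": 28, "E": None}

def assurance_level(failure_condition: str) -> str:
    """Return the DO178B software level for a failure-condition category."""
    return FAILURE_CONDITION_TO_LEVEL[failure_condition]

if __name__ == "__main__":
    lvl = assurance_level("Catastrophic")
    # Level A adds nine objectives to Level C's 57, i.e. 66 in total.
    print(lvl, OBJECTIVE_COUNT[lvl])  # A 66
```

The table-driven form makes it easy for a reader to substitute the full objective counts from the standard itself.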
During the certification process, it is not uncommon for aerospace companies to have open issues with the software lifecycle data. This is acceptable as long as any remaining problems do not compromise aircraft safety. All of the known problems must be declared and explained to the certification authorities during the SOI audits. Each problem must be submitted with a categorisation that determines the potential safety risk to the aircraft [29]. Four different SOI audits are set throughout the software lifecycle. Each of these audits typically lasts up to five days and is scheduled at the following stages in the software lifecycle:

- SOI#1: Software development planning review
- SOI#2: Software design and development review
- SOI#3: Software verification review
- SOI#4: Final software certification approval review

[Fig. 1. Overview of DO178B assurance levels [28]. The figure charts Levels D to A with their objective counts (Level D: 28 objectives; Level C: 57 objectives; Level A: 66 objectives) and the activities each level adds, e.g. statement coverage, decision coverage, MC/DC coverage, data and control coverage, tool qualification and increased independence.]

2.3. Software metrics for certification

Software metrics provide powerful means for estimating and monitoring the cost, schedule and quality of the software product and processes [30]. Metrics are particularly important for the certification of safety-critical software, especially metrics related to problem reports and test coverage. Despite the availability of several software project monitoring and measurement processes (e.g. the Practical Software and Systems Measurement (PSM) [31], Goal Question Metric (GQM) [32], Six-Sigma [33] and COnstructive COst MOdel (COCOMO II) [34]), very few studies have been published on how these approaches can be applied to the software certification processes. Basili et al. discuss how the GQM approach can be used to provide early lifecycle visibility into the development of safety-critical software [35]. This is achieved through a set of readiness assessment questions, which should be answered against predefined measures and models. Habli and Kelly use a similar approach to the assessment of safety-critical software processes through linking the verification evidence in the safety case to the processes by which this evidence is generated [36]. Weaknesses in these processes are then used as indicators of the level of confidence that can be allocated to the safety evidence. Similarly, Murdoch et al. report some results on the application of PSM for the generation of measures for supporting the management of safety processes [37,38]. Some of these measures are related to safety certification, e.g. completion against certification data requirements. Nevertheless, none of these studies provide empirical results that validate the effectiveness of these measurement techniques against industrial data generated from the software safety certification processes.

3. Problem statement

The current approach to auditing airborne software poses a set of issues to the certification authorities and aerospace companies. These issues are summarised as follows:

Issue 1: Auditors only audit a snapshot of the software process. Authorities spend a short period of time at the audit compared to the time spent on the whole software engineering lifecycle. It is difficult for the auditors to identify complex problems in the time taken to carry out an audit.

Issue 2: Software safety standards are open to different interpretations. It is likely for software and safety engineers to over- or under-engineer the software in their effort to fulfil the requirements of the standards. To this end, they may be over-spending to achieve certification credit or under-spending and risking an audit failure. An aerospace company can seek advice from the certification authorities, but, to maintain their independence, the authorities are restricted in the advice they can offer.

Issue 3: Companies might be deceitful. This is very rare but nevertheless not unheard of. A software company might try to mislead the authorities at a SOI audit regarding the status of their software process. This might be due to financial or timescale factors, pressurising the company into making false declarations in order to avoid a delay in project schedules [39]. Typically, this would be manifested near the time of an audit, when the company could realise that the project is not ready for a SOI audit.

Issue 4: There is a lack of objective criteria for determining software status. In order to assess the achievement of the certification requirements, the certification authorities must understand the technical aspects of the software being developed, e.g. the safety issues raised during the development. This is normally achieved by reviewing the technical reports of the project, sometimes during the short duration of a SOI audit. Absorbing the technical subtleties of a software project and making an objective assessment is a challenging task for the auditors, particularly when they are responsible for the auditing of multiple software projects from different companies.

Issue 5: The same errors are made again and again. Many companies enter SOI audits and make the same mistakes they have made before, mainly due to poor understanding of the certification objectives that the authorities aim to assess. This can be due to lack of experience (perhaps due to turnover of staff) or lack of rigorous quality control within these companies. The introduction of new technologies could also contribute to this problem. Equally, this issue is a source of problems for companies developing airborne software for the first time.

The above issues are partly generated from interviews with experienced independent software auditors. The focus of the interviews was on the efficiency of the auditing process as well as the level of transparency in the relationship between the certification authorities and the development teams. Some insightful quotations from these auditors are documented in Table 1. Of course, the above issues form a subset of problems encountered during the auditing process. Development practices can vary across countries, companies and projects. For example, other issues include inappropriate reuse of certification evidence across different projects or immature deployment of novel technologies. However, the above five issues seem to be the most recurring difficulties facing the software safety auditors interviewed.
Table 1
Quotations from experienced software auditors.

- "It is not easy to assess that the whole project has been developed to the same standard by taking a small sample. Companies tend to show us only their best vertical trace samples and therefore we do not see what the real problems are unless we dig further ourselves."
- "The audits can be very difficult, especially with companies which are new to developing flight critical software. Many of these companies fail the audit because they are not experienced enough to determine themselves if the work they have submitted to the audit is of good quality or not."
- "It is very rare that a company will lie to you about their development progress/activities, but occasionally they do. If they choose to do this due to cost and schedule pressures, then you have to be a very good auditor and very alert to spot that they are doing this."
- "It is common practice that companies ask for a SOI audit to take place when they are not ready to perform the audit. In most cases the company knows it is not ready but they still request the audit and hope they will pass. This is a real waste of an auditor's time! I really believe the companies believe that we have nothing better to do than attend their audits!"
- "Time and time again the same problems are found at the audits. It is as though we never learn from the past. We should keep consistent metrics that allow us to compare the development activities across many projects; this would provide companies with the ability to compare their past failures/successes to their current projects."

To this end, if the certification authorities had metrics, based on predefined criteria, reflecting the status of the software project before an audit commences, they could be better prepared for a SOI audit, particularly from the technical perspective. If these metrics are provided at regular intervals as a project develops, the certification authorities could use the trend of the data over the history of a project lifecycle to identify any misleading information. For example, one indication might be in the form of sudden improvements in progress graphs near the time of the SOI audit. Further, if the metric data is normalised across different software projects, it can also be used to compare the quality and progress of a project against other projects which have previously successfully passed the SOI audits. Equally, quality assurance departments within aerospace companies could use these metrics to determine the readiness of the software against the certification targets. Based on these metrics, focused training may be provided to aerospace companies on ways to improve their in-house capability and the quality of their certification material.

4. Data collection and analysis for certification audits

In this section, we define a statistical method for collecting and analysing live data from aerospace software projects to assess the readiness of a project for a SOI audit by comparing the project's data against historical data collected from past projects, which successfully passed the SOI audits. We first discuss the types of data which should be collected and justify why they are important for the certification process. We then define weighting factors for categorising the collected data against their importance for the satisfaction of the certification objectives. We finally specify a detailed four-step process for statistically analysing a project's data against data collected from past successful projects. It is important to note at this stage that what we propose in this section is a method for assessing the readiness of a project for a certification audit and as such offers process evidence rather than product evidence. This process evidence merely relates to compliance with the DO178B guidance and does not directly relate to confidence in the safety claims concerning the software product.

4.1. Data collection

As part of the software certification process in the civil aerospace domain, different types of lifecycle data are produced, which relate to planning, development, verification and support. Many of these types of data are produced based on compliance with the aerospace software guidance DO178B and are summarised in Table 2.

In this study, based on the lifecycle data shown in Table 2, we defined 15 metrics, which should be collected as part of the software certification process. These metrics are refined and encoded using the Goal Question Metric (GQM) technique [32]. GQM is a top-down measurement framework for defining goals related to products and processes. A goal is interpreted using a set of questions whose answers are associated with objective or subjective metrics. A goal is specified relative to four elements: issue (e.g. reliability or safety), object (e.g. software, platform or process), viewpoint (e.g. independent or internal auditing) and purpose (e.g. failure rate reduction). Questions are derived from these elements and subsequently their answers estimate the achievement of the top goal using primitive metrics (e.g. faults detected, failure classifications or objectives satisfied). These primitive metrics provide measurable references against which the analysis mechanisms can be performed. Table 3 documents the GQM for the Software Requirements Specification (SRS). This GQM will be used throughout this paper to illustrate the various steps for data collection and analysis.
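The metric defined in the GQM (problem reports raised against an artefact, normalised by the number of requirements) can be sketched in a few lines. This is an illustrative sketch only; the function and field names are ours, not the paper's.

```python
# Hypothetical sketch of the Table 3 GQM metric: for each problem-report
# type, the count of reports raised against the SRS during development and
# review is normalised by the number of requirements in the SRS.
from collections import Counter

def normalised_defect_metrics(problem_report_types, num_requirements):
    """Return {problem-report type: reports per requirement} for one SRS."""
    if num_requirements <= 0:
        raise ValueError("SRS must contain at least one requirement")
    counts = Counter(problem_report_types)
    return {pr_type: n / num_requirements for pr_type, n in counts.items()}

# Example: 3 TYPE 0 and 12 TYPE 1A reports against a 600-requirement SRS.
metrics = normalised_defect_metrics(
    ["TYPE 0"] * 3 + ["TYPE 1A"] * 12, num_requirements=600)
print(metrics["TYPE 0"])  # 0.005
```

Normalising by requirement count is what allows the same metric to be compared across projects of very different sizes, as the paper proposes.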
As shown in the metrics of the GQM in Table 3, a key factor inthe assessment of the readiness of the software for certicationaudits is the number and types of defects associated with each
-
Pro
Soft
Soft
equi
l A
I. Dodd, I. Habli / Reliability Engineering and System Safety 98 (2012) 723 11Table 2DO178B Lifecycle Data.
Plan for software aspects of certication
Software development plan
Software verication plan
Software conguration management plan
Software quality assurance plan
Software requirements standards
Software design standards
Software code standards
Software accomplishment summary
Trace data
Software requirements data
Table 3GQM for SRS.
GoalPurpose: To monitor the readiness of DO178B level A Software R
Issue: Is the SRS quality adequate to pass a SOI audit?
Object: SRS preparation and review with respect to DO178B levelifecycle artefact (e.g. problem reports associated with SRS).However, feeding raw defect data is not enough to determine ifa project is ready for a SOI audit. Collected data must rst bepre-processed (ltered) to draw out the issues that are importantfor certication during the SOI audits and amplify these issuesduring data analysis. To amplify the relevant factors, collecteddata should be normalised and then weighted against certica-tion factors such as the number and types of defects, which relateto system safety.
When a defect is identied during the software process, informa-tion about the defect is stored in a problem report. Problem reportsact as a repository for the issues that should be corrected. They alsocontain a rating that denes the impact of the identied problemson the safety of the aircraft. The requirements for software problemreports are predened in certication guidance documents. One ofthese documents is provided by EASA and titled CerticationReview Item CRI T8 The Management of Software Open ProblemReports [29]. This document species ve categories of problemreports, which are listed in Table 4.
TYPE 0 problem reports are a key concern for certicationauthorities. These problem reports capture safety-related issues
Viewpoint: From the certication authorities point of view at a SOI audi
Questions 1. How many problems are raised against the completed SRS2. Are the SRS produced of acceptable quality to pass the SOI
MetricsMetric data: The number of TYPE 0 problem reports that were raised during the SRS developmNumber of requirements in the SRS
The formula used above will be repeated for TYPE 1A, TYPESubjective/objective
metric:
Objective view that will be analysed and compared using Stati
to pass the SOI gates
Table 4Problem report classications [29].
Category Description
TYPE 0 or CAT 0 A problem whose consequence is a failure, un
TYPE 1A or CAT 1A A failure with a signicant functional consequ
system and its
specic application.
TYPE 1B or CAT 1B A failure with no signicant functional conseq
TYPE 2 or CAT 2 A fault which does not result in a failure (i.e. n
operating conditions).
TYPE 3A or CAT 3A A signicant deviation whose effects could be
unintended
behaviour.
TYPE 3B or CAT 3B A none signicant deviation to the methodolorements Specication (SRS) in preparation for a SOI audit
softwareandstarepatt
thasoforreg
t
(pr
aud
ent a
1B,stica
der
enc
uen
o sy
to l
gy (ware conguration index
blem reports
ware conguration management records
ware quality assurance recordsCon
SoftDesign description
Source code
Executable object code
Software verication cases and procedures
Software verication results
Software life cycle environment
guration indexpotentially prevent the release of the software in its currentte to ight. In practice, the categorisation of the problemorts is complicated and can be error prone. This can beributed to some of the following factors:
Problem reports are often poorly written, leading to errorsduring the severity assessment and categorisation.The impact of some problems is often too complex for a singleengineer to understand. The full scope of the impact of aproblem report is not truly understood by the engineer makingthe assessment.The categorisation of problem reports changes as the devel-opment and verication activities progress, e.g. the resolutionof one problem report may inuence the categorisation ofother problem reports.
These issues cast some doubt over a simple weighting metrict only considers TYPE 0 problem reports. For example, if atware system had one TYPE 0 problem report and 50 TYPE 1ATYPE 1B problem reports, an auditor might raise a concernarding the number of the TYPE 1A and TYPE 1B problem
oduced and reviewed) that are ready for submittal to the SOI audit?
it?
nd review
TYPE 2 and TYPE 3 Problem Reportsl Process Control (standard deviation) against projects that have been known
certain conditions, of the system with a safety impact.
e; the meaning of signicant should be dened in the context of the related
ce.
stem functional consequence, fault not detectable by the crew in foreseeable
ower the assurance that the software behaves as intended and has no
plans) that does not affect the assurance obtained.
-
The classification of non-functional problem reports (TYPE 2, TYPE 3A and TYPE 3B) may also be incorrect, which may result in a functional problem hidden in a non-functional problem report.

Table 5 shows data collected from a single project at an aerospace company developing DO178B Level A avionics software. It shows the percentages of problem reports that were incorrectly categorised by engineers during the problem report categorisation process. The error percentages shown are small. Nevertheless, the impact of an incorrect assessment is severe. There are six TYPE 1 problem reports that should have been classified as TYPE 0 problem reports (i.e. with safety impact). Because categorisation errors are sometimes made, it is insufficient to consider only TYPE 0 problem reports. Therefore, our collected data includes all of the problem report types. Further, as shown in the next section, our proposed weighting criteria are calculated to account for the human errors in the problem report classification process.

Table 5
Data on errors with Problem Report classification on a large aerospace project. Problem Reports raised on the project before the first SOI#4: 6000.

Issue | Percentage of total number of Problem Reports (%) | Number of Problem Reports this percentage relates to | Comment
Wrongly categorised TYPE 0 Problem Reports that should have been TYPE 1A or TYPE 1B | 0.03 | 2 | Experienced engineer found these issues late in the programme as he was assessing very high-risk Problem Reports.
Wrongly categorised TYPE 1A and 1B Problem Reports that should have been TYPE 0 | 0.1 | 6 | Problem Reports missed by initial assessment but later found to be incorrect after accidental re-reviews of the Problem Reports.
Wrongly categorised TYPE 2, 3A and 3B Problem Reports that should have been a functional Problem Report | 4 | 240 | Inexperience of Problem Report reviewers led to many incorrect assessments.

4.2. Selection of weighting factors

Table 6 shows a definition of weighting values based on five different certification concerns. These concerns relate to the different types of problem reports discussed in the previous section. Here, the weighting values are generated from the certification concerns in order to be consistent with the types of data collected in the aerospace software domain, e.g. categorised problem reports (see [40] for a discussion of the limitations of existing classification schemes in the safety domain). The weighting values were derived from discussions with various aerospace auditors and from the experience gathered by the first author across various SOI audits. We have explicitly included this table so that different certification authorities or auditors can modify these concerns to suit their certification practices. Once defined, these concerns should be issued to aerospace software companies. Table 6 describes how the weighting factors are calculated for the certification concerns and their associated severity factors. Table 7 is used as a guide to select the weighting values. For each problem report type, a likelihood value, in the range of 1 to 5, is assigned against each of the five certification concerns. This value is then multiplied by the severity factor associated with each certification concern. The resulting figures for all of the certification concerns are then added together to calculate the final weighting factor for the problem report type. The weighting values shown in Table 6 are then converted to a fraction of the highest weighting value, as shown in Table 8.

4.3. Data analysis

Data analysis in our proposed method is centred on comparing problem report data from a live project with historic data captured from projects that successfully passed the SOI audit. As a prerequisite, normalised data from these projects should be collected in order to generate a normal distribution for them. The live project data can then be compared against this normal distribution. If the project is within the maximum allowable deviation on this normal distribution (discussed in more detail in Section 5.1), it can be deemed ready to be submitted to the SOI audit, and vice versa. If the project was deemed ready for the audit and passes the audit, its data should be added to the data set that generates the normal distribution for the successful projects. Consequently, the database generating the normal distribution increases in size and capability. However, if the project fails the SOI audit, an investigation must take place to understand why this failure had not been identified prior to the audit. Projects which have failed the audit when they had been expected to pass should be used to identify patterns in the data that show why a project may fail an audit. Once a pattern is identified, it should be fed back into the main weighting metrics.

4.3.1. Statistical Process Monitoring (SPM)

At the heart of the data analysis stage is the concept of Statistical Process Monitoring (SPM) using a normal distribution curve [41]. SPM provides the ability to statistically analyse similar procedures and produce results which are normally distributed around the mean value. This is directly applicable to the aims of this paper. Certification data items are produced from similar software engineering processes, which are driven by the same guidelines, i.e. the DO178B guidelines, and are within the same domain, i.e. civil aerospace. Although software engineering processes may differ between aerospace companies, certification audits should be carried out in a consistent manner using a common methodology and standard. If the data collected from different projects is normalised to account for the different size and complexity of the projects, SPM can effectively be used to analyse such data. In this paper, we pre-process the raw data before it is plotted on a normal distribution in order to amplify the particular issues that are of concern to the certification authorities, based on the weighting factors described in Section 4.2.
When SPM is used, variability (standard deviation) in the data collected from the projects that have successfully passed the SOI audits can be identified. On the one hand, if there is a large standard deviation, the variation between the SOI audit passes is considerable. This might reveal that the audit results are unacceptably subjective and that the auditors might not have a common theme, mandate or training. It may also be used to identify certain anomalies, such as overly strict or lenient auditors. On the other hand, if the standard deviation is small, it might show that the auditors are applying common practices and rules.

Table 6
Weighting factors calculation. For each problem report type, a likelihood value between 1 and 5 is rated against each of five certification concerns; each rating is multiplied by the concern's severity factor, and the products are summed to give the weighting.

Certification concerns (severity factor in brackets):
(5) The Problem Report has known operational safety risks.
(4) Likelihood the Problem Report could have undetermined safety-risk operational side effects.
(3) Likelihood of other Problem Reports having functional combinational effects with this Problem Report.
(2) Likelihood that this Problem Report should be classified as a TYPE 0.
(1) Likelihood a single Problem Report of this type will cause certification failure.

TYPE 0: likelihood ratings 5, 3, 3, 5, 4; weighted values 25, 12, 9, 10, 4; weighting 60.
Rationale: a TYPE 0 Problem Report is a safety impact; side effects are expected; TYPE 0 Problem Reports can be complex and not totally understood; the report already is a TYPE 0; TYPE 0 safety issues are likely to cause certification problems.
TYPE 1A: likelihood ratings 2, 2, 2, 3, 2; weighted values 10, 8, 6, 6, 2; weighting 32.
Rationale: major functional impact with a possible hidden risk to safety; major issue side effects possible but low risk for safety; complex Problem Reports not totally understood; reasonable chance of the Problem Report being classified wrongly; reasonable chance the Problem Report could cause certification failure.
TYPE 1B: likelihood ratings 1, 1, 2, 0, 0; weighted values 5, 4, 6, 0, 0; weighting 15.
Rationale: minor functional impact that could have hidden side effects; a functional issue but with less chance of a safety side effect; not complex Problem Reports, normally totally understood; the minor functional impact identified makes a wrong classification very unlikely.
TYPE 2: likelihood ratings 0, 0, 0, 0, 0; weighting 0. Rationale: no functional effect for TYPE 2 Problem Reports (documentation only); not applicable.
TYPE 3A: likelihood ratings 0, 0, 0, 0, 0; weighting 0. Rationale: no functional effect for TYPE 3 Problem Reports (documentation only); not applicable.

Table 7
Weighting factors assignment: likelihood of the described event happening.

Problem report criticality | Very high | High | Medium | Low | Very low
Type 0  | 5 | 4 | 3 | 2 | 1
Type 1A | 4 | 3 | 2 | 1 | 0
Type 1B | 3 | 2 | 1 | 0 | 0
Type 2  | 2 | 1 | 1 | 0 | 0
Type 3  | 2 | 1 | 0 | 0 | 0
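The arithmetic behind Tables 6 and 8, rating each problem report type against the five certification concerns, multiplying by the severity factors, summing, and converting to fractions of the highest weighting, can be reproduced in a few lines (a sketch; the variable names are ours):

```python
from statistics import median

SEVERITY_FACTORS = (5, 4, 3, 2, 1)  # one per certification concern (Table 6)

def weighting_factor(likelihood_ratings):
    """Sum of likelihood rating x severity factor across the five concerns."""
    return sum(l * s for l, s in zip(likelihood_ratings, SEVERITY_FACTORS))

# Likelihood ratings per problem report type, as rated in Table 6
ratings = {
    "TYPE 0":  (5, 3, 3, 5, 4),
    "TYPE 1A": (2, 2, 2, 3, 2),
    "TYPE 1B": (1, 1, 2, 0, 0),
    "TYPE 2":  (0, 0, 0, 0, 0),
    "TYPE 3":  (0, 0, 0, 0, 0),
}
weightings = {t: weighting_factor(r) for t, r in ratings.items()}

# Convert to fractions of the highest weighting (Table 8); the median of
# these fractions is used later when the tolerance envelope is weighted
highest = max(weightings.values())
fractions = {t: round(w / highest, 2) for t, w in weightings.items()}
median_weighting = median(fractions.values())
```

Running this reproduces the published weightings (60, 32, 15, 0, 0), their fractions (1.00, 0.53, 0.25, 0.00, 0.00) and the median weighting of 0.25.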
Table 8
Conversion of weighting values to fractions.

Category | Weighting | Weighting as a fraction | Weighting in use | Median of weighting values
TYPE 0 | 60 | 1.00 | 1.00 | 0.25
TYPE 1A | 32 | 0.53 | 0.53 |
TYPE 1B | 15 | 0.25 | 0.25 |
TYPE 2 | 0 | 0.00 | |
TYPE 3A and B | 0 | 0.00 | |
Total | 107 | 1.78 | 1.78 |

Comment: the median of the weightings is used because we want to know what the middle weighting value is, not the average; the average would falsely skew any calculation, as there is a non-linear difference between the majority of the weightings and the minority.

4.3.2. Data analysis steps

Driven by the concept of SPM and the weighting factors discussed in Section 4.2, Fig. 2 depicts an overview of our proposed data collection and analysis process, comprising the following four steps:

Step 1, normalising the data: Raw defect data for all certification artefacts in a software lifecycle step for a project is collected (e.g. all the SRSs in the project). This raw defect data details the number of problem reports for each problem report type (TYPE 0, TYPE 1A, TYPE 1B, TYPE 2, TYPE 3A, TYPE 3B). Next, the total number of problem reports for each of the six problem report types is calculated. The six resulting sums are then normalised using a selected normalisation factor (e.g. the total
number of requirements in all the SRSs for a project). All of these normalised problem report types are then summed to produce the lifecycle stage total normalised sum. This normalisation step is repeated for all of the projects in the data set (e.g. for all of the SRSs in the projects that are in the data set).

Step 2, weighting the data: For each project, the normalised raw defect data items for each of the six problem report types (calculated in the previous step) are multiplied by the weighting fraction defined in Table 8. This produces a set of six weighted normalised data items for each problem report type for a project (e.g. all the SRSs in the project). This is repeated for all of the projects in the data set. Next, all of the weighted normalised problem report types are summed to produce the lifecycle stage total weighted normalised sum, again for each of the projects in the data set. After the data has been multiplied by the weighting values, it reflects the authorities' importance criteria and can now be used to produce a normal distribution, from which the mean and the standard deviation (σ) can be identified.

Step 3, plotting the normal distribution: The normal distribution for all of the lifecycle stage total weighted normalised sums in the data set (e.g. for all of the SRSs in the projects that are in the data set) is calculated and then plotted.

Step 4, comparing the artefacts from one project against other projects: The normal distribution can now be used to compare a single project's software lifecycle certification artefacts (e.g. all SRSs from one project) against this normal distribution to determine if they are within the maximum allowable deviation limits. The maximum allowable deviation limits can either be assumed or predefined, such as in a Six Sigma scheme [33], or calculated as illustrated in the next section.

Fig. 2. Comparing a single project's defect data to data collected from past projects. For each artefact (Artefact 1 to Artefact N) in a lifecycle stage, the normalised problem report counts (PR TYPE 0, 1A, 1B, 2 and 3) are multiplied by the certification weightings and summed; SPM is then performed on the artefacts for a single project lifecycle stage, comparing the artefacts in a single project against one another.
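Steps 1 and 2 can be sketched as follows, using the weighting fractions of Table 8 (the function name is ours); the check at the end reproduces the first SRS item of Table 9, where 5 TYPE 1A and 31 TYPE 1B reports over 2284 requirements give a weighted normalised sum of about 0.0046:

```python
WEIGHT_FRACTIONS = {"TYPE 0": 1.00, "TYPE 1A": 0.53, "TYPE 1B": 0.25,
                    "TYPE 2": 0.00, "TYPE 3": 0.00}

def total_weighted_normalised_sum(report_counts, normalisation_factor):
    """Steps 1-2: normalise each problem report count by the chosen
    normalisation factor (e.g. the total number of requirements in the SRS),
    weight it per Table 8, and sum over the report types."""
    return sum((count / normalisation_factor) * WEIGHT_FRACTIONS[pr_type]
               for pr_type, count in report_counts.items())

# First SRS item of Table 9
srs_item = {"TYPE 0": 0, "TYPE 1A": 5, "TYPE 1B": 31, "TYPE 2": 12, "TYPE 3": 20}
value = total_weighted_normalised_sum(srs_item, 2284)  # ~0.0046
```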
5. Industrial case study

In this section, we evaluate the statistical analysis method defined in the previous section using a data set which was provided, in an anonymised format, by two large avionics manufacturers. These manufacturers are responsible for the development of avionics containing software certified in accordance with the DO178B Level A criteria. The data was generated within the last 10 years and was captured during the development of the Software Requirements Specification (SRS). It has been collected from 9 projects covering 58 software releases. Table 9 shows a snapshot of this data. Each project included between 1800 and 3800 software requirements, and each project had a budget of more than 5 million Euros. For each project, the total number of software requirements and the number of faults found are provided.
Table 9
Snapshot of the project data used in this experiment, showing the first 5 SRS items (all items passed the SOI#2 audit). The raw SRS data gives the number of Problem Reports of each type, the total number of Problem Reports open at the SOI audit and the number of requirements in the SRS; the normalised and weighted data gives the Problem Reports per requirement, the weighted number of Problem Reports of each type per requirement, and the sum of all weighted normalised Problem Reports.

Item | TYPE 0 | TYPE 1A | TYPE 1B | TYPE 2 | TYPE 3A & 3B | Total open at SOI audit | Requirements in SRS | PRs per requirement | Weighted TYPE 0 | Weighted TYPE 1A | Weighted TYPE 1B | Weighted TYPE 2 | Weighted TYPE 3A & 3B | SUM of all weighted normalised PRs
1 | 0 | 5 | 31 | 12 | 20 | 68 | 2284 | 0.03 | 0.0000 | 0.0012 | 0.0034 | 0.0000 | 0.0000 | 0.0046
2 | 0 | 5 | 32 | 6 | 20 | 63 | 2284 | 0.03 | 0.0000 | 0.0012 | 0.0035 | 0.0000 | 0.0000 | 0.0047
3 | 2 | 15 | 46 | 13 | 20 | 96 | 2284 | 0.04 | 0.0009 | 0.0035 | 0.0050 | 0.0000 | 0.0000 | 0.0094
4 | 1 | 27 | 55 | 30 | 71 | 184 | 2284 | 0.08 | 0.0004 | 0.0063 | 0.0060 | 0.0000 | 0.0000 | 0.0128
5 | 1 | 32 | 47 | 59 | 100 | 239 | 2284 | 0.10 | 0.0004 | 0.0075 | 0.0051 | 0.0000 | 0.0000 | 0.0131
The data was collected from artefacts submitted to a SOI#2 audit. For each of the projects, the outcome of the SOI#2 audit is documented so that our proposed method's predictions may be validated against the real audit results. The aim of this case study is to demonstrate how our proposed method can be used to make decisions concerning the likelihood that a project will successfully pass a SOI audit. The aim is also to identify the strengths and weaknesses of the method and to explain where and why the method might fail or succeed in predicting the outcome of a SOI audit. It is understood that the data set collected is relatively small and that the prediction capability of the data may therefore be limited. This data is extremely difficult to obtain due to its commercial sensitivity. Ideally, our approach would be evaluated more thoroughly using additional data from the certification authorities, as they have a wealth of data from projects which have been audited for certification.
5.1. Calculating the maximum allowable deviation from the mean
As a prerequisite for applying the four-step process proposed in the previous section, the maximum allowable deviation from the mean of the normal distribution should be defined. In order to calculate the maximum allowable deviation, factors such as Natural Variance (NV) and Assignable Variations (AVs) should be considered. This allows current projects to be judged against a calculated, rather than an assumed (3σ envelope), acceptable tolerance envelope. The NV observed in SPM is the variance in the recorded data due to natural fluctuations in how items are produced. For example, for findings identified during an SRS review, some of the results could be attributed to issues that are not readily controllable, such as reviewers having personal problems affecting their review judgements. On the other hand, variations that occur in a process due to conscious technical or managerial reasons are known as Assignable Variations (AVs). Generally, NV is difficult to reduce, as it is an inherent part of the process. Conversely, AV can be reduced through process improvement programmes. It is important to note that during the development of airborne software, there is an amount of AV that could be reduced but is agreed as being an acceptable deviation. This is a result of project management decisions to release the software to the SOI audit with known problems, defined in open problem reports. NV and AV are analysed and calculated in Sections 5.1.1 and 5.1.2 and are then used in Section 5.1.3 to determine the maximum acceptable variation, or acceptable tolerance envelope.
5.1.1. Calculating the Natural Variance (NV)
In this section, metrics are collected from the SRS developers and reviewers in order to calculate the NV. Table 10 shows a set of data collected during the SRS development and review at the two avionics manufacturers. The data was solicited using a questionnaire to extract the figures. The questions were derived from the analysis of data from previous SRS reviews that indicated where there may be weaknesses in the SRS production and review process. Of course, the data has been captured from only two companies, which limits the accuracy of the calculation. To improve the accuracy of the NV, more reference points would be required. However, this limited data set demonstrates the principle. The data was captured after both companies had already taken mitigation actions.
Table 10
Natural Variance calculation. Data gathered from SRS writing and review of DO178B Level A software over two projects. The COMPANY 1 and COMPANY 2 columns give the estimated variance in terms of the number of Problem Reports raised per 100 Problem Reports.

Item | Source | Description | COMPANY 1 | COMPANY 2 | Notes
1 | Reviewer | Reviewer unknowingly does not understand the parent requirement and raises a Problem Report due to his lack of knowledge | 2 | 2 |
2 | Reviewer | The Reviewer is trained but inexperienced and unknowingly misunderstands the review requirements, raising unnecessary Problem Reports | 1 | 1 |
3 | Reviewer | The Reviewer unknowingly does not understand the content of the SRS requirements being reviewed | 1 | 1 |
4 | Reviewer | The Reviewer concentrates too much on syntax rather than technical content and raises Problem Reports to cover typos | 2 | 1 |
5 | Reviewer | The Reviewer is not sure whether there is a problem, so raises a Problem Report | 1 | 0 |
6 | Author | The parent requirement is poorly defined and is unknowingly implemented incorrectly, causing a Problem Report to be raised | 1.5 | 1 |
7 | Author | The Author is trained but inexperienced and therefore unknowingly writes bad requirements, causing a Problem Report to be raised | 1 | 1 |
8 | Author | Conflicting requirements force a fault to be reported as a Problem Report | 2 | 1 | Requirements conflicting within the development activity, NOT due to integration with other systems
9 | Author | A Problem Report is raised twice for the same finding but not identified by the Problem Report originators | 2 | 1 |
10 | | Total number of Problem Reports raised for every 100 Problem Reports due to Natural Variance caused by a team reviewing the SRSs | 14 | 9 | Sum of items 1-9, giving the % of Problem Reports raised that are due to the NV
11 | | For a typical project that passed the SOI#2 audit, the average number of Problem Reports raised in total is | 908 | 908 | Average across sampled projects
12 | | For a typical project from this company, the average number of Problem Reports raised due to the Natural Variance is | 127 | 82 | (Item 10/100) x Item 11 (not yet normalised)
13 | | For a typical project that passed the SOI#2 audit, the average total number of requirements on the analysed projects is | 2631 | 2631 | Average across sampled projects
14 | | Natural Variance of Problem Reports raised for the SRS, normalised by requirements | 0.048 | 0.031 | Item 12/Item 13: Natural Variance (normalised)
15 | | AVERAGE NV of the two projects shown (NOT WEIGHTED) | 0.04 | | Average of item 14 for both projects
16 | | Median of weighting values | 0.25 | | Refer to Table 8
17 | | AVERAGE NV of the two projects shown (WEIGHTED) | 0.01 | | Item 15 x Item 16

Prerequisites to the analysis:
18 | Reviewer | The documents presented for review are the correct version and applicable to the review
19 | Reviewer | The Reviewers are trained to carry out DO178B reviews
20 | Author | The Authors are trained to prepare a DO178B document
21 | Author | The author and reviewer activity is localised to the one system and is not part of the global aircraft systems integration
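The arithmetic of Table 10 items 10 to 17 can be checked in a few lines (the function and variable names are ours):

```python
def normalised_nv(nv_percent, avg_reports, avg_requirements):
    """Natural Variance per requirement: the share of Problem Reports due
    to natural fluctuation (item 10), scaled to a typical project's report
    count (item 11) and normalised by its requirement count (item 13)."""
    nv_reports = (nv_percent / 100) * avg_reports   # item 12
    return nv_reports / avg_requirements            # item 14

company1 = normalised_nv(14, 908, 2631)   # ~0.048
company2 = normalised_nv(9, 908, 2631)    # ~0.031
average_nv = (company1 + company2) / 2    # ~0.04 (item 15)
weighted_nv = average_nv * 0.25           # x median weighting, ~0.01 (item 17)
```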
5.1.2. Calculating the Assignable Variance (AV)

The AV comprises two types of variation:

- variations that can be reduced through improving the engineering processes;
- variation due to project-based decisions to leave some problem reports open at the time of the SOI audit.

If we assume that the open problem reports capture the development process issues, then the AV can be attributed to the problem reports left open at the time of the SOI audit. Therefore, the average number of problem reports left open at the time of the SOI#2 audit across our sample projects in this case study may be used to calculate the AV. Examination of the data set used in this case study shows that the average number of SRS Problem Reports left open for our sample projects at the time of their SOI#2 audit was 137. The Assignable Variance is therefore equal to 137, which is normalised by dividing it by the average number of requirements. Table 11 shows how the AV is calculated using the average number of problem reports left open at the SOI#2 audit and the average total number of requirements.

Table 11
Calculation of the maximum allowable tolerance for the normal distribution (maximum acceptable tolerance from the MEAN).

Item | Description | Value | Notes
1 | For a typical project that passed a SOI#2 audit, the average number of Problem Reports left open for the SRS at the SOI#2 audit (a project decision to leave Problem Reports open) | 137 | Average across sampled projects
2 | For a typical project that passed the SOI#2 audit, the average total number of requirements on the analysed projects | 2631 | Average across sampled projects
3 | For a typical project that passed the SOI#2 audit, the average number of Problem Reports left open for the SRS at the SOI#2 audit (NORMALISED): Assignable Variance (AV) (NOT WEIGHTED) | 0.05 | Item 1/Item 2
4 | AVERAGE of the two projects shown: Natural Variance (NV) (NOT WEIGHTED) | 0.04 | See Table 10, item 15
5 | MAX acceptable TOLERANCE from the MEAN: NV + AV (NOT WEIGHTED) | 0.09 | Item 3 + Item 4
6 | Median of weighting values | 0.25 | The median of the weightings is used because we want the middle weighting value, not the average, which would skew the calculation as there is a large difference between the majority of the weightings and the minority (refer to Table 8)
7 | MAX acceptable TOLERANCE from the MEAN (WEIGHTED) | 0.02 | Item 5 x Item 6

5.1.3. Calculating the maximum tolerance envelope

The Maximum Variance is calculated by adding the AV to the NV. Next, we must weight this value using the MEDIAN of the weighting values (as calculated in Table 8). This value reflects the Maximum Variance for the weighted normal distribution (as shown in Table 11), which is the maximum number of normalised defects allowed in the data set, or the maximum allowable tolerance. The MEDIAN of the weighting values is used because the weighted data is biased towards the TYPE 0 problem report. Further, there is a non-linear relationship between each of the problem report types. Most of the problem reports that remain open for the SOI are actually TYPE 2 or TYPE 3. Therefore, if the mean of the weighted data values were used, it would unfairly bias the maximum tolerance weighting (due to the weighting placed on TYPE 0 Problem Reports), making it larger than it should actually be. The Maximum Variance will now be referred to as the maximum acceptable tolerance.

5.2. Preparing and plotting multiple project SRS defect data

In this section, we apply the four-step process proposed in Section 4.3.2. The data has been collected from 53 software releases. The defect data for each of the releases reflects the software status when it was presented to the SOI#2 audit. All of the 53 software releases passed the SOI#2 audit successfully. The data shows the number of problem reports that were raised during the SRS development, grouped against the six categorisation types. The following is a summary of the application of the four-step process:

1. For each project, the total number of SRS findings of each problem report categorisation type has been normalised by dividing it by the total number of requirements in the SRS. All of the normalised values for each problem report criticality type have then been summed (known as the lifecycle stage total normalised sum).
2. For all the projects, a normal distribution has been derived from the lifecycle stage total normalised sums; the data is plotted in Fig. 3.
3. The normalised data has then been weighted according to Table 6. After the data has been weighted, all the weighted normalised values for each problem report type have been summed (known as the lifecycle stage total weighted normalised sum).
4. For all the projects, a normal distribution has been calculated from all the lifecycle stage total weighted normalised sums; the data is plotted in Fig. 4.

5.2.1. Preliminary analysis

Fig. 3 shows the normal distribution plot for the un-weighted lifecycle stage total normalised sums (raw defect data) for the 53 software releases that successfully passed the SOI#2 audit. Further, the plot shows the acceptable tolerance envelope that was determined using the maximum variation calculated in Section 5.1.3. The plot also shows that 3 of the projects plotted actually breach the acceptable tolerance envelope. This indicates that the SRSs for those projects should have been reworked so that they fall within the target tolerance value. It must be noted here that these 3 projects successfully passed the SOI#2 audit.
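The tolerance-envelope arithmetic of Sections 5.1.1 to 5.1.3 (Tables 10 and 11) reduces to a few lines (the variable names are ours):

```python
# Averages across the sampled projects (Table 11)
open_reports = 137        # SRS Problem Reports left open at the SOI#2 audit
avg_requirements = 2631
nv = 0.04                 # Natural Variance, Table 10 item 15 (not weighted)
median_weighting = 0.25   # median of the weighting fractions, Table 8

av = open_reports / avg_requirements                   # Assignable Variance, ~0.05
max_tolerance = nv + av                                # ~0.09, not weighted
weighted_tolerance = max_tolerance * median_weighting  # ~0.02, as in Table 11
```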
On the other hand, Fig. 4 shows the normal distribution plot for the lifecycle stage total weighted normalised sums for the same 53 software releases that successfully passed the SOI#2 audit. This is the more important plot, as it shows the weighted data plotted with respect to the issues which matter to the certification authorities during the audit. The plot also shows the acceptable tolerance envelope that was calculated in Section 5.1.3. All of the projects are within the acceptable tolerance envelope, in contrast to the plot in Fig. 3.
This demonstrates the difference between collecting and analysing raw defect data in isolation and performing the analysis against weighted data that reflects the auditing needs of the certification authorities. If decisions had been made based on the raw defect plot (Fig. 3) without considering the weighting criteria, additional work might have been performed, which would have delayed the SOI#2 audit without having any effect on its outcome. If the project decision makers had been presented with Fig. 4, they could have made a balanced decision between taking the risk of failing the audit and passing it without reworking the SRSs. This can be useful, as it allows calculated risks to be taken, which is central to safety-critical projects in which delays can be costly and create damaging publicity.
5.3. Evaluation against five random projects

In this section, we evaluate the analysis method and the figures generated in the previous section against data from five
Fig. 3. Un-weighted normal distribution plot of SRSs which passed SOI#2.

different projects. Interviews were conducted with the project managers of these projects. The objective of these interviews was to understand the background behind the data in order to be able to interpret the outcome of the evaluation, particularly when the data is plotted against the normal distribution derived from the successful SOI#2 audit projects. It is important to note that none of these projects were included in the data used to derive the successful SOI#2 audit normal distribution shown in Figs. 3 and 4. As such, this provides a clean baseline against which these projects could be compared. Four of these projects failed their SOI#2 audit and only one project passed it.
5.3.1. Un-weighted data plotted against the successful SOI#2 audit normal distribution

Fig. 5 shows a plot of the un-weighted data of the selected five projects. Four of the test project data points lie outside of the acceptable tolerance envelope and therefore give an initial indication that there are problems with the SRS artefacts. In all four cases, a synopsis of the project history (extracted through interviews with the project managers) indicates that all of these projects were in need of recovery and that there were serious quality issues associated with the SRS development. There were clearly unacceptable numbers of problem reports raised against the SRS artefacts. This is clearly reflected in the un-weighted data
Fig. 4. Weighted normal distribution plot of SRSs which passed SOI#2. The weighted mean is 0.01 and the weighted standard deviation 0.01; the maximum acceptable tolerance from the mean (weighted, calculated in Table 11) is 0.02, giving mean + tolerance = 0.03 and mean - tolerance = -0.01.
-
Me
NOweiSta
WeiSta
Dev
I. Dodd, I. Habli / Reliability Engineering and System Safety 98 (2012) 723 193 * = (99.7%)
0.14Devplo
1.
2.
5.3
nor
pro3 atheinhavfaifol
1.
NOweiNatVar(NV
FigSOIghted an,
0.06 1 * = (68%)
0.05 Mean + 2 * = (95% )
0.15
T ghted ndard iation,
0.05 2 * = (95% )
0.10 Mean - 2 * = (95% )
-0.04NOTwei-1
0
1
2
3
4
5
6
-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
Pro
babi
lity
Den
sity
Acceptable areaDensityNOT weighted PRsTest Failed Project 1Test Failed Project 2
Test Failed Project 3Test Failed Project 4Passed Project 1Mean Line
7
8
9 GRAPH 3Stage 2 - NOT Weighted Data For CAT Classification for a group of projects that passed SOI#2 with FAILED PROJECT 2, 3, 4, 5 and passed project 1 MARKEDt shown in Fig. 5. It is important to note the following:
Passed Project 1: This project gives a clear indication that theSRS quality associated with the project was poor and yet theproject actually passed the SOI#2 audit. The reason for this isdiscussed in Section 5.3.2.Failed Project 1: This project gives a clear indication that the SRSquality was good and that there were no problems expected withthe SOI#2 audit yet this project actually failed the SOI#2 audit.Again, the reason for this is discussed in Section 5.3.2.
.2. Weighted data plotted against the successful SOI#2 audit
mal distribution
Fig. 6 shows a plot of the weighted data of the selected vejects. Three of the test project data points (failed projects 2,nd 4) lie outside of the acceptable tolerance envelope andrefore indicate that they will fail the SOI#2 audit. As discussedthe previous section, all of these three projects were known toe SRS quality problems and it was suspected that they wouldl a SOI and indeed they did fail. Based on the plot in Fig. 6, thelowing observations can be made:
The Failed Project 3 is clearly close to the maximum toleranceallowed for a SOI#2 audit pass. Although the plot is indicatingthat the project may fail the SOI#2 audit, the project manage-ment team may decide to take the risk that the SOI#2 auditwill fail and submit the project to the audit.
Fig. 5. Un-weighted randomly selected projects plotted against the successful SOI#2 audit normal distribution (un-weighted natural variance (NV) 0.04; maximum acceptable tolerance from the mean 0.09; mean + tolerance 0.15; mean − tolerance −0.04; tolerances calculated from Table 11).
Fig. 6. Weighted randomly selected projects plotted against the successful SOI#2 audit normal distribution (weighted natural variance (NV) 0.01; maximum acceptable tolerance from the mean 0.02; mean + tolerance 0.03; mean − tolerance −0.01; tolerances calculated from Table 11).

- The same decision-making approach discussed above with regard to Failed Project 3 could be taken with Failed Project 2. However, this clearly carries much more risk of failure, as it is further outside of the acceptable tolerance envelope. It is acknowledged that further work is needed to address borderline cases. These cases could be calibrated using further risk analysis that allows the level of risk to be understood and a more informed decision to be made.
- The Failed Project 4 is clearly outside of the acceptable tolerance envelope. The SRS problem reports indicate that the SRS artefacts have quality issues. This project is clearly not ready for a SOI#2 audit, and the decision to submit the project to a SOI#2 audit would be wrong.
- The Passed Project 1 has clearly moved into the acceptable tolerance envelope, indicating that the project is ready for a SOI#2 audit (whereas this project was outside of the acceptable tolerance envelope in Fig. 5). This again supports the assumption that the type of problem report is far more important than the number of problem reports raised, because the weighting criteria amplify the functional safety issues more than the documentation issues. However, the large number of SRS problem reports clearly shows that the SRS has quality issues. This will be noticed by the Quality Assurance departments and will be raised as a quality concern.
- The Failed Project 1 introduces a new issue to the analysis. The project appears to be within the acceptable tolerance envelope, yet the project failed the SOI#2 audit. The reason was that a Type 0 problem report was discovered at the actual SOI#2 audit. The problem report was deemed to pose an
unacceptable level of risk to the safety of the flight. As a result, the certification authorities did not allow the project to pass the SOI#2 audit. In short, the correct consequences of this problem report were not identified by the software engineers but were later identified by the authorities at the audit.

The Failed Project 1 meets the criteria of the statistical concept of the Black Swan [42]. Taleb stated that a Black Swan event has the following criteria [42]:

1. Rarity: it is a surprise;
2. Impact: it has an extreme impact;
3. Retrospective predictability: after the fact, it is rationalised, making it explainable and predictable.

By further examining the history of Failed Project 1, each of the above criteria is clearly satisfied:

1. The event was not identified by the software engineers and was a surprise to them. However, the event was identified by the certification authorities as being severe enough to fail a SOI#2 audit. The software engineers believed that the project was ready for the audit. It was not a project management decision to press on with the SOI#2 audit regardless of the existence of the problem report.
2. The event had a major impact. The audit did actually fail, which caused a major delay to the project and embarrassment to the software engineers.
3. After interviewing the project manager, he declared that, after the authorities had identified the issue, it was obvious that the problem report should have been identified as a major safety issue and should have been resolved before the SOI#2 audit was performed.

The hidden impact of this Black Swan event is the professional embarrassment factor, in that the certification authorities identified this problem and not the software engineers. Although, as with many findings of this nature, there were warning signs that there was an issue with the software, these warning signs were not acted upon.

6. Discussion

The case study in the previous section shows that the use of the data collection and analysis method proposed in Section 4 has some benefits, particularly for monitoring the SRS quality and the readiness of the software artefacts for SOI#2 audits. If the raw un-weighted SRS defect data is analysed in isolation, the data can easily give the false indication that a project is not in a suitable state to pass a SOI#2 audit and that the defective SRS artefacts must be reworked before the audit. As illustrated in the previous section, this is not always the case. When the data is weighted to account for the certification considerations, it reveals that some of the uncertainties concerning the quality of some SRS artefacts do not pose high risks to the success of the SOI#2 audit. The opposite is also true. The data has demonstrated situations where the SRS quality is believed to be acceptable but is actually unacceptable due to safety-related problem reports, which pose a high risk of a SOI#2 audit failure. This is because safety-related problem reports are a key consideration for the certification authorities. Of course, a weakness of the analysis method was revealed by the evaluation against a Black Swan project. This evaluation is a difficult test for most statistical methods to pass. However, as suggested by Taleb [42], Black Swan events could be addressed by factoring a number of robustness measures into the project. However, to effectively reduce the number of Black Swan projects, it is not only the analysis method that will need to change, but also the development processes that feed the problem report data to the method. In the rest of this section, we discuss a number of observations relating to the selected weighting criteria, the accuracy limitations of the case study and the SPM normal distribution.

6.1. Weighting criteria

The weighting criteria defined in this paper were generated from discussions with various aerospace auditors and the experience gathered by the first author from various SOI audits. The calculation of the weighting criteria presented demonstrates how the weighting factors can be calculated to reflect the needs of the certification authorities. However, to improve accuracy, the calculation of the weighting factors should be discussed at length, and agreed on, by experienced certification auditors and then discussed and shared with the aerospace companies. Concerning the precision of the weighting criteria, we have carried out a sensitivity check on the weighting factors. This check revealed that the precision of the weighting factors has little effect on the final conclusions of the analysis method (i.e. whether or not the project is ready for a SOI audit). What is more important is the relative weighting between the different problem report types.

6.2. Accuracy limitations of the case study

The data used in the case study is difficult to extract from aerospace companies due to confidentiality issues. The lack of more data sets might be a weakness. Specifically, there are two areas where this weakness is manifested:

- The calculation of the acceptable tolerance envelope is an area of uncertainty. In this paper, the calculation of the NV relies on only two sets of data from two companies. This is obviously too small to be reliable. This could be ignored if the NV were insignificant. However, at present, the NV is 43% of the acceptable tolerance envelope. As such, a shift in the calculated NV will influence the envelope significantly. To this end, data from more companies is needed to improve the accuracy of the calculated NV, ideally from different nations throughout the world.
- Another area of uncertainty is the amount of data used to produce the SOI#2 audit normal distribution graphs for the successful projects, as shown in Figs. 5 and 6. Although this is a reasonably large set of data (53 software releases), the ideal situation would be for such an important graph to be produced from thousands of data points collected from a variety of manufacturers and projects across the world. This would allow for a global view of the successful SOI audit projects criterion. Using a larger data set may change the shape of the normal distribution bell curve, which could change the number of projects that fall inside the acceptable tolerance envelope.

6.3. SPM normal distribution

When different projects and different lifecycle data are considered from around the world, there will undoubtedly be a difference between the mean and the standard deviation calculated for each individual project data set. This is because there are a number of factors that could influence the number of problem reports found on different projects, which would clearly skew the normal distribution. Examples of these issues are as follows:

- If two projects were compared where the code on one project was developed manually and the code on the other project was automatically generated, then there would most likely be less
code walkthrough problem reports raised against the automatically generated code than against the manually developed code. However, this does not imply that the automatically generated code is better. It simply means that the variability between the two coding methods is large, and therefore the defect data would look very different for these two projects.
- The conventional approach to applying a DO178B process is to write the SRS artefacts, review them and then move into the design and coding phase after the SRS artefacts are approved (i.e. a variant of the waterfall lifecycle model [43,44]). However, some companies might place less importance on an early review phase. They might develop the SRS, design and code quickly to allow functional testing to start as early as possible (i.e. more agile processes [45,46]). This is done to find the main code bugs early in the lifecycle. The bugs are then corrected along with the design documents, and the reviews are completed after the code and documents have been stabilised. As such, fewer problem reports will be raised during the SRS review activity and more problem reports will be raised during the testing phase. This is the inverse of what is normally expected and will skew the data tracked by the authorities compared to the majority of companies that follow the conventional approach to DO-178B processes.

However, the fact that the data is skewed should not affect the comparison results. This is because, regardless of the methods or work instructions used to implement a DO178B process, the software artefacts must meet certain levels of quality to pass the certification audits. The phase in which problem reports are raised in the development lifecycle should not significantly affect the certification results as long as they are resolved at the time of the SOI audit. As illustrated in the previous two sections, this is amplified by plotting the weighted data rather than the raw data.

7. Overall evaluation

Apart from requiring a basic understanding of statistical modelling, most engineers and auditors should be able to use the approach presented in this paper without the need for any further calibration of the models. It is important to note that our case study focuses on the review of SRS at SOI#2 audits. However, our approach is designed to be used continuously throughout the auditing process. As the software is developed, live defect data will be analysed in the SPM model and the overall state of the artefacts in the lifecycle stage can be evaluated. If, at the measurement point, an artefact is deemed to be unsuitable for certification, the artefact should be reworked to improve its quality and mitigate identified defects. This rework will then result in improved defect data being fed into the SPM model, which could indicate that the software has reached a suitable level of quality for submission to the certification authorities. This approach may be used live and on a continual basis if the model is integrated with the problem report database. In the rest of this section, we discuss the advantages and disadvantages of using our data collection and analysis method from the point of view of the two key stakeholders, namely the certification authorities and the aerospace companies.

7.1. Advantages

The data collection and analysis method offers the following advantages with regard to the auditing process:

- The history of the software development can be maintained and tracked over the lifecycle of a project. The engineers and auditors can assess how the quality of the product is improving as it progresses through its development phase into the in-service phase and then through its operational life. The amount of operational issues observed can then be used to gather statistics regarding the number of failures in service, with respect to the number of related problem reports that were raised during the development phases. This may also help in crash investigation if the causes of a crash are related to the software [47].
- If software project data is provided at regular intervals as a project develops, the certification authorities could use the trend of the data over the history of a project to acquire more exposure to the software process weaknesses. They no longer need to make important decisions based on a small snapshot of the software lifecycle data. Further, it will also be more difficult for companies to mislead the authorities at the audits regarding the quality of the software. That is because the authorities will have the opportunity to follow the recorded problem report history throughout the lifecycle of a project.
- The collected data will help the certification authorities in auditing their own auditors. The authorities can then see if there is a trend occurring, where an auditor may be too strict or too lenient. It will also allow the certification authorities to identify where their auditors may be lacking in experience and training. As such, the certification authorities can focus the training of their auditors on the parts of the lifecycle which can identify the extent to which a project is meeting the certification objectives.
- The metrics and the issues raised above allow the certification authorities to be audited by their governing bodies. Having data about the audits, the quality of the auditors and the quality of the projects can help these governing bodies in determining if the certification authorities are fulfilling their role as an independent certification agency.
- The uncertainty of whether a project will pass the SOI audit is better managed and the big bang approach to a certification audit is avoided. Confidence can be gained by the software management team that the project is heading in the right direction, and issues can therefore be corrected before the audit. This is very important considering the large development costs and multiple stakeholders involved in aircraft development. The early detection and correction of problems can reduce the costs significantly.

7.2. Disadvantages

The data collection and analysis method also has a number of disadvantages, including the following:

- DO178B is a guidance document, and if the certification authorities monitor software development at such a close level, they might be close to mandating the development activities that need to be completed and monitored. This also becomes an issue with respect to the required independence of the certification authorities.
- If the certification authorities overuse the metrics, they could easily be misled by a good set of data that might actually be hiding poor software processes. The analysis method is obviously open to abuse if the companies supply incorrect data. As such, the auditors should only use the data as a guide. They should still use their auditing skills to identify if the processes behind the development are of the required quality and are being followed.
- The certification authorities are not responsible for stating that the software is safe. This responsibility lies with the aerospace companies. If the certification authorities become too involved in the detail of the development, then the aerospace companies may claim in the event of an accident that the certification authorities were fully aware of the issues, thereby implicating the certification authorities by making them indirectly accountable for assuring the software's safety.
- As part of the data collection and analysis method, commercially sensitive data regarding a compan ... see which company declares late delivery first and hence pays the late penalty costs.

8. Legal and ethical issues

In an ideal application of the approach presented in this paper, the certification authorities will be responsible for holding the data for a large number of companies. As discussed in the previous section, this data is sensitive and may be damaging if it is leaked to competitors or interested parties. As such, companies will not release their data without assurance that the data is secured from other competitors or interested parties. Further, the certification authorities will normally have confidentiality agreements in place with the aerospace companies to cover this type of confidential information. However, it is important that independent security and integrity audits are carried out to ensure that the data is held securely and protected from industrial espionage. The certification authorities will be given the privilege of accessing more of a company's dirty laundry than they would normally view. They must respect this arrangement and not abuse this trust. This is a privilege that can also be abused by the aerospace companies. These companies have an insight into what the authorities are looking for at the audits and therefore they could falsify the data to improve how the project quality appears to the outside world. However, it is for the benefit of all the stakeholders that this data collection and analysis approach is successful, as it allows for clear and open communication channels between both parties.

The certification authorities must also be careful about their legal obligations. As previously discussed, they may become too involved and implicated if an accident occurs. The certification authorities must clearly define their roles and responsibilities with respect to the collected data and its use. Their independence must be maintained and it must be made clear that they are using the data in an oversight role only. If an accident occurs, then this data would clearly be admissible in a court of law, and therefore the aerospace companies may be even more reluctant to provide this data to the certification authorities, for fear that the data might be used against them.

9. Concluding remark

Although the safety categorisation of the problem reports is an important issue, safety-critical software must be maintained and be serviceable for a long period of time. From a maintenance and economic point of view, it would be unwise to undermine the impact of non-safety related issues purely for the sake of certification. The maintenance costs of a poorly documented software package will rise dramatically if the number of problem reports is not kept to a minimum. Problems that are not resolved early in the development lifecycle will have a knock-on effect in the downstream lifecycle stages, particularly in the maintenance phases. It is therefore important to use the method proposed in

References

[19] RTCA. Final report for clarification of DO-178B software considerations in airborne systems and equipment certification. RTCA; October 2001.
… certification of safety critical systems in the transportation sector. Reliability Engineering & System Safety 1999;63(1):47–66.