Safety certification of airborne software: An empirical study
Ian Dodd a,1, Ibrahim Habli b,*

a Airservices Australia, Building 101 Da Vinci Business Park, Locked Bag 747, Eagle Farm, QLD 4009, Australia
b Department of Computer Science, University of York, York YO10 5GH, United Kingdom

Reliability Engineering and System Safety 98 (2012) 7-23
Contents lists available at SciVerse ScienceDirect: Reliability Engineering and System Safety. Journal homepage: www.elsevier.com/locate/ress

Article history: Received 8 February 2011; received in revised form 11 August 2011; accepted 24 September 2011; available online 1 October 2011.

Keywords: Software safety; Certification; Airborne software; DO178B; Safety standards; Safety requirements

Abstract

Many safety-critical aircraft functions are software-enabled. Airborne software must be audited and approved by the aerospace certification authorities prior to deployment. The auditing process is time-consuming, and its outcome is unpredictable, due to the criticality and complex nature of airborne software. To ensure that the engineering of airborne software is systematically regulated and is auditable, certification authorities mandate compliance with safety standards that detail industrial best practice. This paper reviews existing practices in software safety certification. It also explores how software safety audits are performed in the civil aerospace domain. The paper then proposes a statistical method for supporting software safety audits by collecting and analysing data about the software throughout its lifecycle. This method is then empirically evaluated through an industrial case study based on data collected from 9 aerospace projects covering 58 software releases. The results of this case study show that our proposed method can help the certification authorities and the software and safety engineers to gain confidence in the certification readiness of airborne software and predict the likely outcome of the audits. The results also highlight some confidentiality issues concerning the management and retention of sensitive data generated from safety-critical projects.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (I. Dodd), [email protected] (I. Habli).
1 Disclaimer: The research in this paper was completed before the author joined Airservices Australia and therefore does not necessarily represent the views of his current employer. None of the data used for illustration is related to any product or activity provided by Airservices Australia.

0951-8320/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2011.09.007

1. Introduction

Commercial airlines provide one of the safest forms of public transportation [1]. This has partly been achieved by placing high safety-integrity targets on all aspects of the industry from aircraft design and maintenance to crew training and aircraft operation. To ensure that aircraft systems are designed and manufactured to the required targets, different countries have commissioned various organisations that are responsible for auditing these critical systems. Flight approval or certification authorities include the European Aviation Safety Agency (EASA) in Europe [2] and the Federal Aviation Administration (FAA) in the USA [3]. When aircraft systems are first developed or upgraded, it is the responsibility of the flight certification authorities to approve the system design before it is cleared for flight. This process is known as Type Certification, where the authorities approve one sample of the proposed system type for flight use. Any exact copy of that type is also approved for flight use as long as it meets predefined design and operational constraints.

In modern avionics, it is the norm that the functionality is implemented using a microprocessor running complex computer software. In many cases, the avionics is a safety-critical item and therefore must be designed and built to the highest levels of safety integrity. The overall safety integrity of the avionics, comprising both software and hardware, is typically specified quantitatively, e.g. in terms of failure rates. However, for software, it is widely accepted that there is a limit on what can be quantitatively demonstrated [4,5], e.g. by means of statistical testing and operational experience. To address this limitation, many aerospace software standards appeal instead to the quality of the development process to assure the dependability of the software. In the civil aerospace domain, DO178B (Software Considerations in Airborne Systems and Equipment Certification) is the primary guidance for the approval of airborne software [6].

Throughout the software process, the certification authorities are required to audit the development, verification and support activities. The audits are known as Stage of Involvement (SOI) audits. Each audit is positioned at strategic points in the lifecycle to reduce the risk of failing the final certification audit. An early indication of a potential certification failure is vital to ensure that the software process is not heading in the wrong direction. An audit failure will normally require that an artefact must be reworked before the audit can be repeated. The typical time duration between audits is four to six months. It is therefore important for the software and safety engineers to have an indication of how well the software process is adhering to the certification requirements before a SOI audit is performed. In aerospace software projects, it is common practice to collect metrics about the defects found during the lifecycle of the project and relate these defects to a common denominator (e.g. the number of defects found per lines of code). By relating these defect metrics to the requirements of the certification authorities, indicators can be generated to determine the readiness of the software for its next SOI audit.

In this paper, we identify a set of issues concerning the auditing process for the approval and certification of airborne software. These issues were generated from interviews with experienced independent software auditors. We then propose, and empirically evaluate, a statistical method for supporting software certification audits based on collecting and analysing data about the software throughout its lifecycle. This collected data is first normalised and then weighted against certification factors such as the number and types of defects, which relate to system safety. The evaluation of our proposed method is based on an industrial case study covering data collected from 9 aerospace projects and comprising 58 software releases. In this work, we focus on two groups of stakeholders, namely certification authority auditors and development teams. Auditors could use the trend of the data over the history of a project lifecycle to identify software problems and possibly misleading information. The data could also be used by the development teams within aerospace companies to assess the readiness of a software project against the certification targets. As part of our evaluation, we present the advantages and limitations of our approach from the viewpoint of both the developers and auditors.

This paper is organised as follows. Section 2 reviews existing approaches to software certification and related work. Section 3 describes a set of auditing issues concerning the software certification process. Section 4 proposes a statistical method for addressing many of the auditing issues listed in Section 3 based on the concept of Statistical Process Monitoring (SPM). This method is empirically evaluated in Section 5 through an industrial case study based on a set of data collected from anonymous aerospace manufacturers responsible for the development of safety-critical airborne software. A detailed discussion of the case study is provided in Sections 6 and 7. The legal and ethical issues concerning our proposed method are discussed in Section 8, followed by conclusions in Section 9.

2. Background and related work

2.1. Software safety certification

Certification refers to the process of assuring that a product or process has certain stated properties, which are then recorded in a certificate [7]. Assurance can be defined as justified confidence in a property of interest [8]. Whereas the concept of safety and assurance cases [9-11] is heavily used in goal-based standards in critical domains such as defence [12,13], rail [14] and oil and gas [15], compliance with prescriptive standards tends to be the norm in the civil aerospace domain [16-18], particularly with regard to the approval and certification of airborne software [6,19]. In prescriptive certification, developers show that a software system is acceptably safe by appealing to the satisfaction of a set of process objectives that the safety standards require for compliance. The means for satisfying these objectives are often tightly defined within the prescriptive standards, leaving little room for developers to apply alternative means for compliance, which might better suit their software products and processes. One fundamental limitation of prescriptive software standards lies in the observation that good tools, techniques and methods do not necessarily lead to the achievement of a specific level of integrity. The correlation between the prescribed techniques and the failure rate of the system is infeasible to justify [16,20]. In goal-based certification, on the other hand, standards require the submission of an argument, which communicates how evidence, generated from testing, analysis and review, satisfies claims concerning the safety of the software functions. Despite the advantages of explicit safety arguments and evidence, there are some concerns regarding the adequacy of the guidance available for the creation of assurance arguments, which comply with the goals set within these standards (i.e. lack of sufficient worked examples of arguments or sample means for generating evidence). Many studies have considered and compared these two approaches to software safety assurance [21-23], highlighting advantages and limitations of each and how they might complement each other [24].

2.2. DO178B

In the civil aerospace domain, DO178B is the primary guidance for the approval of airborne software. The purpose of the DO178B document is "to provide guidelines for the production of software for airborne systems and equipment that performs its intended function with a level of confidence in safety that complies with airworthiness requirements" [6]. DO178B defines a consensus of the aerospace community concerning the approval of airborne software. To obtain certification credit, developers submit lifecycle plans and data that show that the production of the software has been performed as specified by the DO178B guidance. The DO178B guidance distinguishes between different levels of assurance based on the safety criticality of the software, i.e. how software components may contribute to system hazards. The safety criticality of software is determined at the system level during the system safety assessment process based on the failure conditions associated with software components. These failure conditions are grouped into five categories: Catastrophic, Hazardous/Severe-Major, Major, Minor and No Effect [25,26]. The DO178B guidance then defines five different assurance levels, which relate to the above categorisation of failure conditions (Levels A to E, where Level A is the highest and therefore requires the most rigorous processes). Each level of software assurance is associated with a set of objectives, mostly related to the underlying lifecycle process, e.g. planning, development and verification activities (Fig. 1). For example, to achieve software level C, where faulty software behaviour may contribute to a major failure condition, 57 objectives have to be satisfied. On the other hand, to achieve software level A, where faulty software behaviour may contribute to a catastrophic failure condition, nine additional objectives have to be satisfied, with some objectives achieved with independence [27].

To demonstrate compliance with DO178B, applicants are required to submit the following lifecycle data to the certification authorities:

- Plan for Software Aspects of Certification (PSAC)
- Software Configuration Index
- Software Accomplishment Summary (SAS)

They should also make all software lifecycle data, e.g. related to development, verification and planning, available for review by the certification authorities. In particular, the SAS should provide
evidence, which shows that compliance with the PSAC has been achieved. The SAS should provide an overview of the system (e.g. architecture and safety features) and software (e.g. functions and partitioning strategies). It should also provide a summary of the potential software contributions to system hazards, based on the system safety assessment, and how this relates to the allocated assurance level. The SAS then references the software lifecycle data produced to satisfy the objectives associated with the allocated assurance level.
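The failure-condition categorisation described earlier in this section fixes the software assurance level, which in turn fixes the set of objectives to satisfy. A minimal sketch of that lookup is shown below; the function and table names are our own, and the objective counts for Levels B and E are not quoted in this section, so they are left unset.

```python
# Illustrative sketch (not code from the paper): mapping DO178B
# failure-condition categories to software assurance levels. Objective
# counts are only those quoted in this section and in Fig. 1 (A: 66,
# C: 57, D: 28); counts for Levels B and E are left as None.

FAILURE_CONDITION_TO_LEVEL = {
    "Catastrophic": "A",
    "Hazardous/Severe-Major": "B",
    "Major": "C",
    "Minor": "D",
    "No Effect": "E",
}

OBJECTIVE_COUNT = {"A": 66, "B": None, "C": 57, "D": 28, "E": None}

def assurance_level(failure_condition: str) -> str:
    """Return the DO178B software level for a failure-condition category."""
    return FAILURE_CONDITION_TO_LEVEL[failure_condition]

if __name__ == "__main__":
    lvl = assurance_level("Catastrophic")
    # Level A adds nine objectives to Level C's 57, i.e. 66 in total.
    print(lvl, OBJECTIVE_COUNT[lvl])  # A 66
```

The table-driven form makes it easy for a reader to substitute the full objective counts from the standard itself.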
During the certification process, it is not uncommon for aerospace companies to have open issues with the software lifecycle data. This is acceptable as long as any remaining problems do not compromise aircraft safety. All of the known problems must be declared and explained to the certification authorities during the SOI audits. Each problem must be submitted with a categorisation that determines the potential safety risk to the aircraft [29]. Four different SOI audits are set throughout the software lifecycle. Each of these audits typically lasts up to five days and is scheduled at the following stages in the software lifecycle:

- SOI#1: Software development planning review
- SOI#2: Software design and development review
- SOI#3: Software verification review
- SOI#4: Final software certification approval review

[Fig. 1. Overview of DO178B assurance levels [28]. The figure charts Levels D to A with their objective counts (Level D: 28 objectives; Level C: 57 objectives; Level A: 66 objectives) and the activities each level adds, e.g. statement coverage, decision coverage, MC/DC coverage, data and control coverage, tool qualification and increased independence.]

2.3. Software metrics for certification

Software metrics provide powerful means for estimating and monitoring the cost, schedule and quality of the software product and processes [30]. Metrics are particularly important for the certification of safety-critical software, especially metrics related to problem reports and test coverage. Despite the availability of several software project monitoring and measurement processes (e.g. the Practical Software and Systems Measurement (PSM) [31], Goal Question Metric (GQM) [32], Six-Sigma [33] and COnstructive COst MOdel (COCOMO II) [34]), very few studies have been published on how these approaches can be applied to the software certification processes. Basili et al. discuss how the GQM approach can be used to provide early lifecycle visibility into the development of safety-critical software [35]. This is achieved through a set of readiness assessment questions, which should be answered against predefined measures and models. Habli and Kelly use a similar approach to the assessment of safety-critical software processes through linking the verification evidence in the safety case to the processes by which this evidence is generated [36]. Weaknesses in these processes are then used as indicators of the level of confidence that can be allocated to the safety evidence. Similarly, Murdoch et al. report some results on the application of PSM for the generation of measures for supporting the management of safety processes [37,38]. Some of these measures are related to safety certification, e.g. completion against certification data requirements. Nevertheless, none of these studies provide empirical results that validate the effectiveness of these measurement techniques against industrial data generated from the software safety certification processes.

3. Problem statement

The current approach to auditing airborne software poses a set of issues to the certification authorities and aerospace companies. These issues are summarised as follows:

Issue 1: Auditors only audit a snapshot of the software process. Authorities spend a short period of time at the audit compared to the time spent on the whole software engineering lifecycle. It is difficult for the auditors to identify complex problems in the time taken to carry out an audit.

Issue 2: Software safety standards are open to different interpretations. It is likely for software and safety engineers to over- or under-engineer the software in their effort to fulfil the requirements of the standards. To this end, they may be over-spending to achieve certification credit or under-spending and risking an audit failure. An aerospace company can seek advice from the certification authorities, but, to maintain their independence, the authorities are restricted in the advice they can offer.

Issue 3: Companies might be deceitful. This is very rare but nevertheless not unheard of. A software company might try to mislead the authorities at a SOI audit regarding the status of their software process. This might be due to financial or timescale factors, pressurising the company into making false declarations in order to avoid a delay in project schedules [39]. Typically, this would be manifested near the time of an audit, when the company could realise that the project is not ready for a SOI audit.

Issue 4: There is a lack of objective criteria for determining software status. In order to assess the achievement of the certification requirements, the certification authorities must understand the technical aspects of the software being developed, e.g. the safety issues raised during the development. This is normally achieved by reviewing the technical reports of the project, sometimes during the short duration of a SOI audit. Absorbing the technical subtleties of a software project and making an objective assessment is a challenging task for the auditors, particularly when they are responsible for the auditing of multiple software projects from different companies.

Issue 5: The same errors are made again and again. Many companies enter SOI audits and make the same mistakes they have made before, mainly due to poor understanding of the certification objectives that the authorities aim to assess. This can be due to lack of experience (perhaps due to turnover of staff) or lack of rigorous quality control within these companies. The introduction of new technologies could also contribute to this problem. Equally, this issue is a source of problems for companies developing airborne software for the first time.

The above issues are partly generated from interviews with experienced independent software auditors. The focus of the interviews was on the efficiency of the auditing process as well as the level of transparency in the relationship between the certification authorities and the development teams. Some insightful quotations from these auditors are documented in Table 1. Of course, the above issues form a subset of problems encountered during the auditing process. Development practices can vary across countries, companies and projects. For example, other issues include inappropriate reuse of certification evidence across different projects or immature deployment of novel technologies. However, the above five issues seem to be the most recurring difficulties facing the software safety auditors interviewed.
Table 1
Quotations from experienced software auditors.

- "It is not easy to assess that the whole project has been developed to the same standard by taking a small sample. Companies tend to show us only their best vertical trace samples and therefore we do not see what the real problems are unless we dig further ourselves."
- "The audits can be very difficult, especially with companies which are new to developing flight critical software. Many of these companies fail the audit because they are not experienced enough to determine themselves if the work they have submitted to the audit is of good quality or not."
- "It is very rare that a company will lie to you about their development progress/activities, but occasionally they do. If they choose to do this due to cost and schedule pressures, then you have to be a very good auditor and very alert to spot that they are doing this."
- "It is common practice that companies ask for a SOI audit to take place when they are not ready to perform the audit. In most cases the company knows it is not ready but they still request the audit and hope they will pass. This is a real waste of an auditor's time! I really believe the companies believe that we have nothing better to do than attend their audits!"
- "Time and time again the same problems are found at the audits. It is as though we never learn from the past. We should keep consistent metrics that allow us to compare the development activities across many projects; this would provide companies with the ability to compare their past failures/successes to their current projects."

To this end, if the certification authorities had metrics, based on predefined criteria, reflecting the status of the software project before an audit commences, they could be better prepared for a SOI audit, particularly from the technical perspective. If these metrics are provided at regular intervals as a project develops, the certification authorities could use the trend of the data over the history of a project lifecycle to identify any misleading information. For example, one indication might be in the form of sudden improvements in progress graphs near the time of the SOI audit. Further, if the metric data is normalised across different software projects, it can also be used to compare the quality and progress of a project against other projects which have previously successfully passed the SOI audits. Equally, quality assurance departments within aerospace companies could use these metrics to determine the readiness of the software against the certification targets. Based on these metrics, focused training may be provided to aerospace companies on ways to improve their in-house capability and the quality of their certification material.

4. Data collection and analysis for certification audits

In this section, we define a statistical method for collecting and analysing live data from aerospace software projects to assess the readiness of a project for a SOI audit by comparing the project's data against historical data collected from past projects, which successfully passed the SOI audits. We first discuss the types of data which should be collected and justify why they are important for the certification process. We then define weighting factors for categorising the collected data against their importance for the satisfaction of the certification objectives. We finally specify a detailed four-step process for statistically analysing a project's data against data collected from past successful projects. It is important to note at this stage that what we propose in this section is a method for assessing the readiness of a project for a certification audit and as such offers process evidence rather than product evidence. This process evidence merely relates to compliance with the DO178B guidance and does not directly relate to confidence in the safety claims concerning the software product.

4.1. Data collection

As part of the software certification process in the civil aerospace domain, different types of lifecycle data are produced, which relate to planning, development, verification and support. Many of these types of data are produced based on compliance with the aerospace software guidance DO178B and are summarised in Table 2.

In this study, based on the lifecycle data shown in Table 2, we defined 15 metrics, which should be collected as part of the software certification process. These metrics are refined and encoded using the Goal Question Metric (GQM) technique [32]. GQM is a top-down measurement framework for defining goals related to products and processes. A goal is interpreted using a set of questions whose answers are associated with objective or subjective metrics. A goal is specified relative to four elements: issue (e.g. reliability or safety), object (e.g. software, platform or process), viewpoint (e.g. independent or internal auditing) and purpose (e.g. failure rate reduction). Questions are derived from these elements and subsequently their answers estimate the achievement of the top goal using primitive metrics (e.g. faults detected, failure classifications or objectives satisfied). These primitive metrics provide measurable references against which the analysis mechanisms can be performed. Table 3 documents the GQM for the Software Requirements Specification (SRS). This GQM will be used throughout this paper to illustrate the various steps for data collection and analysis.
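The metric defined in the GQM (problem reports raised against an artefact, normalised by the number of requirements) can be sketched in a few lines. This is an illustrative sketch only; the function and field names are ours, not the paper's.

```python
# Hypothetical sketch of the Table 3 GQM metric: for each problem-report
# type, the count of reports raised against the SRS during development and
# review is normalised by the number of requirements in the SRS.
from collections import Counter

def normalised_defect_metrics(problem_report_types, num_requirements):
    """Return {problem-report type: reports per requirement} for one SRS."""
    if num_requirements <= 0:
        raise ValueError("SRS must contain at least one requirement")
    counts = Counter(problem_report_types)
    return {pr_type: n / num_requirements for pr_type, n in counts.items()}

# Example: 3 TYPE 0 and 12 TYPE 1A reports against a 600-requirement SRS.
metrics = normalised_defect_metrics(
    ["TYPE 0"] * 3 + ["TYPE 1A"] * 12, num_requirements=600)
print(metrics["TYPE 0"])  # 0.005
```

Normalising by requirement count is what allows the same metric to be compared across projects of very different sizes, as the paper proposes.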
As shown in the metrics of the GQM in Table 3, a key factor inthe assessment of the readiness of the software for certicationaudits is the number and types of defects associated with each
-
Pro
Soft
Soft
equi
l A
I. Dodd, I. Habli / Reliability Engineering and System Safety 98 (2012) 723 11Table 2DO178B Lifecycle Data.
Plan for software aspects of certication
Software development plan
Software verication plan
Software conguration management plan
Software quality assurance plan
Software requirements standards
Software design standards
Software code standards
Software accomplishment summary
Trace data
Software requirements data
Table 3GQM for SRS.
GoalPurpose: To monitor the readiness of DO178B level A Software R
Issue: Is the SRS quality adequate to pass a SOI audit?
Object: SRS preparation and review with respect to DO178B levelifecycle artefact (e.g. problem reports associated with SRS).However, feeding raw defect data is not enough to determine ifa project is ready for a SOI audit. Collected data must rst bepre-processed (ltered) to draw out the issues that are importantfor certication during the SOI audits and amplify these issuesduring data analysis. To amplify the relevant factors, collecteddata should be normalised and then weighted against certica-tion factors such as the number and types of defects, which relateto system safety.
When a defect is identied during the software process, informa-tion about the defect is stored in a problem report. Problem reportsact as a repository for the issues that should be corrected. They alsocontain a rating that denes the impact of the identied problemson the safety of the aircraft. The requirements for software problemreports are predened in certication guidance documents. One ofthese documents is provided by EASA and titled CerticationReview Item CRI T8 The Management of Software Open ProblemReports [29]. This document species ve categories of problemreports, which are listed in Table 4.
TYPE 0 problem reports are a key concern for certicationauthorities. These problem reports capture safety-related issues
Viewpoint: From the certication authorities point of view at a SOI audi
Questions 1. How many problems are raised against the completed SRS2. Are the SRS produced of acceptable quality to pass the SOI
MetricsMetric data: The number of TYPE 0 problem reports that were raised during the SRS developmNumber of requirements in the SRS
The formula used above will be repeated for TYPE 1A, TYPESubjective/objective
metric:
Objective view that will be analysed and compared using Stati
to pass the SOI gates
Table 4Problem report classications [29].
Category Description
TYPE 0 or CAT 0 A problem whose consequence is a failure, un
TYPE 1A or CAT 1A A failure with a signicant functional consequ
system and its
specic application.
TYPE 1B or CAT 1B A failure with no signicant functional conseq
TYPE 2 or CAT 2 A fault which does not result in a failure (i.e. n
operating conditions).
TYPE 3A or CAT 3A A signicant deviation whose effects could be
unintended
behaviour.
TYPE 3B or CAT 3B A none signicant deviation to the methodolorements Specication (SRS) in preparation for a SOI audit
softwareandstarepatt
thasoforreg
t
(pr
aud
ent a
1B,stica
der
enc
uen
o sy
to l
gy (ware conguration index
blem reports
ware conguration management records
ware quality assurance recordsCon
SoftDesign description
Source code
Executable object code
Software verication cases and procedures
Software verication results
Software life cycle environment
guration indexpotentially prevent the release of the software in its currentte to ight. In practice, the categorisation of the problemorts is complicated and can be error prone. This can beributed to some of the following factors:
Problem reports are often poorly written, leading to errorsduring the severity assessment and categorisation.The impact of some problems is often too complex for a singleengineer to understand. The full scope of the impact of aproblem report is not truly understood by the engineer makingthe assessment.The categorisation of problem reports changes as the devel-opment and verication activities progress, e.g. the resolutionof one problem report may inuence the categorisation ofother problem reports.
These issues cast some doubt over a simple weighting metrict only considers TYPE 0 problem reports. For example, if atware system had one TYPE 0 problem report and 50 TYPE 1ATYPE 1B problem reports, an auditor might raise a concernarding the number of the TYPE 1A and TYPE 1B problem
oduced and reviewed) that are ready for submittal to the SOI audit?
it?
nd review
TYPE 2 and TYPE 3 Problem Reportsl Process Control (standard deviation) against projects that have been known
certain conditions, of the system with a safety impact.
e; the meaning of signicant should be dened in the context of the related
ce.
stem functional consequence, fault not detectable by the crew in foreseeable
ower the assurance that the software behaves as intended and has no
plans) that does not affect the assurance obtained.
-
The classification of non-functional problem reports (TYPE 2, TYPE 3A and TYPE 3B) may also be incorrect, which may result in a functional problem hidden in a non-functional problem report.

Table 5 shows data collected from a single project at an aerospace company developing DO178B Level A avionics software. It shows the percentages of problem reports that were incorrectly categorised by engineers during the problem report categorisation process. The error percentages shown are small. Nevertheless, the impact of an incorrect assessment is severe. There are six TYPE 1 problem reports that should have been classified as TYPE 0 problem reports (i.e. with safety impact). Because categorisation errors are sometimes made, it is insufficient to consider only TYPE 0 problem reports. Therefore, our collected data includes all of the problem report types. Further, as shown in the next section, our proposed weighting criteria are calculated to account for the human errors in the problem report classification process.

Table 5
Data on errors with Problem Report classification on a large aerospace project. Problem Reports raised on the project before the first SOI#4: 6000.

Issue | Percentage of total number of Problem Reports (%) | Number of Problem Reports this percentage relates to | Comment
Wrongly categorised TYPE 0 Problem Reports that should have been TYPE 1A or TYPE 1B | 0.03 | 2 | Experienced engineer found these issues late in the programme as he was assessing very high-risk Problem Reports.
Wrongly categorised TYPE 1A and 1B Problem Reports that should have been TYPE 0 | 0.1 | 6 | Problem Reports missed by initial assessment but later found to be incorrect after accidental re-reviews of the Problem Reports.
Wrongly categorised TYPE 2, 3A and 3B Problem Reports that should have been a functional Problem Report | 4 | 240 | Inexperience of Problem Report reviewers led to many incorrect assessments.

4.2. Selection of weighting factors

Table 6 shows a definition of weighting values based on five different certification concerns. These concerns relate to the different types of problem reports discussed in the previous section. Here, the weighting values are generated from the certification concerns in order to be consistent with the types of data collected in the aerospace software domain, e.g. categorised problem reports (see [40] for a discussion of the limitations of existing classification schemes in the safety domain). The weighting values were derived from discussions with various aerospace auditors and from the experience gathered by the first author across various SOI audits. We have explicitly included this table so that different certification authorities or auditors can modify these concerns to suit their certification practices. Once defined, these concerns should be issued to aerospace software companies. Table 6 describes how the weighting factors are calculated for the certification concerns and their associated severity factors. Table 7 is used as a guide to select the weighting values. For each problem report type, a likelihood value, in the range of 1 to 5, is assigned against each of the five certification concerns. This value is then multiplied by the severity factor associated with each certification concern. The resulting figures for all of the certification concerns are then added together to calculate the final weighting factor for the problem report type. The weighting values shown in Table 6 are then converted to a fraction of the highest weighting value, as shown in Table 8.

4.3. Data analysis

Data analysis in our proposed method is centred on comparing problem report data from a live project with historic data captured from projects that successfully passed the SOI audit. As a prerequisite, normalised data from these projects should be collected in order to generate a normal distribution for them. The live project data can then be compared against this normal distribution. If the project is within the maximum allowable deviation on this normal distribution (discussed in more detail in Section 5.1), it can be deemed ready to be submitted to the SOI audit, and vice versa. If the project was deemed ready for the audit and passes the audit, its data should be added to the data set that generates the normal distribution for the successful projects. Consequently, the database generating the normal distribution increases in size and capability. However, if the project fails the SOI audit, an investigation must take place to understand why this failure had not been identified prior to the audit. Projects which have failed the audit when they had been expected to pass should be used to identify patterns in the data that show why a project may fail an audit. Once a pattern is identified, it should be fed back into the main weighting metrics.

4.3.1. Statistical Process Monitoring (SPM)

At the heart of the data analysis stage is the concept of Statistical Process Monitoring (SPM) using a normal distribution curve [41]. SPM provides the ability to statistically analyse similar procedures and produce results which are normally distributed around the mean value. This is directly applicable to the aims of this paper. Certification data items are produced from similar software engineering processes, which are driven by the same guidelines, i.e. the DO178B guidelines, and are within the same domain, i.e. civil aerospace. Although software engineering processes may differ between aerospace companies, certification audits should be carried out in a consistent manner using a common methodology and standard. If the data collected from different projects is normalised to account for the different size and complexity of the projects, SPM can effectively be used to analyse such data. In this paper, we pre-process the raw data before it is plotted on a normal distribution in order to amplify the particular issues that are of concern to the certification authorities, based on the weighting factors described in Section 4.2.
When SPM is used, variability (standard deviation) in the data collected from the projects that have successfully passed the SOI audits can be identified. On the one hand, if there is a large standard deviation, the variation between the SOI audit passes is considerable. This might reveal that the audit results are unacceptably subjective and that the auditors might not have a common theme, mandate or training. It may also be used to identify certain anomalies, such as overly strict or lenient auditors. On the other hand, if the standard deviation is small, it might show that the auditors are applying common practices and rules.

Table 6
Weighting factors calculation. For each problem report type, a likelihood value between 1 and 5 is rated against each of five certification concerns; each rating is multiplied by the concern's severity factor, and the products are summed to give the weighting.

Certification concerns (severity factor in brackets):
(5) The Problem Report has known operational safety risks.
(4) Likelihood the Problem Report could have undetermined safety-risk operational side effects.
(3) Likelihood of other Problem Reports having functional combinational effects with this Problem Report.
(2) Likelihood that this Problem Report should be classified as a TYPE 0.
(1) Likelihood a single Problem Report of this type will cause certification failure.

TYPE 0: likelihood ratings 5, 3, 3, 5, 4; weighted values 25, 12, 9, 10, 4; weighting 60.
Rationale: a TYPE 0 Problem Report is a safety impact; side effects are expected; TYPE 0 Problem Reports can be complex and not totally understood; the report already is a TYPE 0; TYPE 0 safety issues are likely to cause certification problems.
TYPE 1A: likelihood ratings 2, 2, 2, 3, 2; weighted values 10, 8, 6, 6, 2; weighting 32.
Rationale: major functional impact with a possible hidden risk to safety; major issue side effects possible but low risk for safety; complex Problem Reports not totally understood; reasonable chance of the Problem Report being classified wrongly; reasonable chance the Problem Report could cause certification failure.
TYPE 1B: likelihood ratings 1, 1, 2, 0, 0; weighted values 5, 4, 6, 0, 0; weighting 15.
Rationale: minor functional impact that could have hidden side effects; a functional issue but with less chance of a safety side effect; not complex Problem Reports, normally totally understood; the minor functional impact identified makes a wrong classification very unlikely.
TYPE 2: likelihood ratings 0, 0, 0, 0, 0; weighting 0. Rationale: no functional effect for TYPE 2 Problem Reports (documentation only); not applicable.
TYPE 3A: likelihood ratings 0, 0, 0, 0, 0; weighting 0. Rationale: no functional effect for TYPE 3 Problem Reports (documentation only); not applicable.

Table 7
Weighting factors assignment: likelihood of the described event happening.

Problem report criticality | Very high | High | Medium | Low | Very low
Type 0  | 5 | 4 | 3 | 2 | 1
Type 1A | 4 | 3 | 2 | 1 | 0
Type 1B | 3 | 2 | 1 | 0 | 0
Type 2  | 2 | 1 | 1 | 0 | 0
Type 3  | 2 | 1 | 0 | 0 | 0
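The arithmetic behind Tables 6 and 8, rating each problem report type against the five certification concerns, multiplying by the severity factors, summing, and converting to fractions of the highest weighting, can be reproduced in a few lines (a sketch; the variable names are ours):

```python
from statistics import median

SEVERITY_FACTORS = (5, 4, 3, 2, 1)  # one per certification concern (Table 6)

def weighting_factor(likelihood_ratings):
    """Sum of likelihood rating x severity factor across the five concerns."""
    return sum(l * s for l, s in zip(likelihood_ratings, SEVERITY_FACTORS))

# Likelihood ratings per problem report type, as rated in Table 6
ratings = {
    "TYPE 0":  (5, 3, 3, 5, 4),
    "TYPE 1A": (2, 2, 2, 3, 2),
    "TYPE 1B": (1, 1, 2, 0, 0),
    "TYPE 2":  (0, 0, 0, 0, 0),
    "TYPE 3":  (0, 0, 0, 0, 0),
}
weightings = {t: weighting_factor(r) for t, r in ratings.items()}

# Convert to fractions of the highest weighting (Table 8); the median of
# these fractions is used later when the tolerance envelope is weighted
highest = max(weightings.values())
fractions = {t: round(w / highest, 2) for t, w in weightings.items()}
median_weighting = median(fractions.values())
```

Running this reproduces the published weightings (60, 32, 15, 0, 0), their fractions (1.00, 0.53, 0.25, 0.00, 0.00) and the median weighting of 0.25.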
Table 8
Conversion of weighting values to fractions.

Category | Weighting | Weighting as a fraction | Weighting in use | Median of weighting values
TYPE 0 | 60 | 1.00 | 1.00 | 0.25
TYPE 1A | 32 | 0.53 | 0.53 |
TYPE 1B | 15 | 0.25 | 0.25 |
TYPE 2 | 0 | 0.00 | |
TYPE 3A and B | 0 | 0.00 | |
Total | 107 | 1.78 | 1.78 |

Comment: the median of the weightings is used because we want to know what the middle weighting value is, not the average; the average would falsely skew any calculation, as there is a non-linear difference between the majority of the weightings and the minority.

4.3.2. Data analysis steps

Driven by the concept of SPM and the weighting factors discussed in Section 4.2, Fig. 2 depicts an overview of our proposed data collection and analysis process, comprising the following four steps:

Step 1, normalising the data: Raw defect data for all certification artefacts in a software lifecycle step for a project is collected (e.g. all the SRSs in the project). This raw defect data details the number of problem reports for each problem report type (TYPE 0, TYPE 1A, TYPE 1B, TYPE 2, TYPE 3A, TYPE 3B). Next, the total number of problem reports for each of the six problem report types is calculated. The six resulting sums are then normalised using a selected normalisation factor (e.g. the total
number of requirements in all the SRSs for a project). All of these normalised problem report types are then summed to produce the lifecycle stage total normalised sum. This normalisation step is repeated for all of the projects in the data set (e.g. for all of the SRSs in the projects that are in the data set).

Step 2, weighting the data: For each project, the normalised raw defect data items for each of the six problem report types (calculated in the previous step) are multiplied by the weighting fraction defined in Table 8. This produces a set of six weighted normalised data items for each problem report type for a project (e.g. all the SRSs in the project). This is repeated for all of the projects in the data set. Next, all of the weighted normalised problem report types are summed to produce the lifecycle stage total weighted normalised sum, again for each of the projects in the data set. After the data has been multiplied by the weighting values, it reflects the authorities' importance criteria and can now be used to produce a normal distribution, from which the mean and the standard deviation (σ) can be identified.

Step 3, plotting the normal distribution: The normal distribution for all of the lifecycle stage total weighted normalised sums in the data set (e.g. for all of the SRSs in the projects that are in the data set) is calculated and then plotted.

Step 4, comparing the artefacts from one project against other projects: The normal distribution can now be used to compare a single project's software lifecycle certification artefacts (e.g. all SRSs from one project) against this normal distribution to determine if they are within the maximum allowable deviation limits. The maximum allowable deviation limits can either be assumed or predefined, such as in a Six Sigma scheme [33], or calculated as illustrated in the next section.

Fig. 2. Comparing a single project's defect data to data collected from past projects. For each artefact (Artefact 1 to Artefact N) in a lifecycle stage, the normalised problem report counts (PR TYPE 0, 1A, 1B, 2 and 3) are multiplied by the certification weightings and summed; SPM is then performed on the artefacts for a single project lifecycle stage, comparing the artefacts in a single project against one another.
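Steps 1 and 2 can be sketched as follows, using the weighting fractions of Table 8 (the function name is ours); the check at the end reproduces the first SRS item of Table 9, where 5 TYPE 1A and 31 TYPE 1B reports over 2284 requirements give a weighted normalised sum of about 0.0046:

```python
WEIGHT_FRACTIONS = {"TYPE 0": 1.00, "TYPE 1A": 0.53, "TYPE 1B": 0.25,
                    "TYPE 2": 0.00, "TYPE 3": 0.00}

def total_weighted_normalised_sum(report_counts, normalisation_factor):
    """Steps 1-2: normalise each problem report count by the chosen
    normalisation factor (e.g. the total number of requirements in the SRS),
    weight it per Table 8, and sum over the report types."""
    return sum((count / normalisation_factor) * WEIGHT_FRACTIONS[pr_type]
               for pr_type, count in report_counts.items())

# First SRS item of Table 9
srs_item = {"TYPE 0": 0, "TYPE 1A": 5, "TYPE 1B": 31, "TYPE 2": 12, "TYPE 3": 20}
value = total_weighted_normalised_sum(srs_item, 2284)  # ~0.0046
```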
5. Industrial case study

In this section, we evaluate the statistical analysis method defined in the previous section using a data set which was provided, in an anonymised format, by two large avionics manufacturers. These manufacturers are responsible for the development of avionics containing software certified in accordance with the DO178B Level A criteria. The data was generated within the last 10 years and was captured during the development of the Software Requirements Specification (SRS). It has been collected from 9 projects covering 58 software releases. Table 9 shows a snapshot of this data. Each project included between 1800 and 3800 software requirements, and each project had a budget of more than 5 million Euros. For each project, the total number of software requirements and the number of faults found are provided.
Table 9
Snapshot of the project data used in this experiment, showing the first 5 SRS items (all items passed the SOI#2 audit). The raw SRS data gives the number of Problem Reports of each type, the total number of Problem Reports open at the SOI audit and the number of requirements in the SRS; the normalised and weighted data gives the Problem Reports per requirement, the weighted number of Problem Reports of each type per requirement, and the sum of all weighted normalised Problem Reports.

Item | TYPE 0 | TYPE 1A | TYPE 1B | TYPE 2 | TYPE 3A & 3B | Total open at SOI audit | Requirements in SRS | PRs per requirement | Weighted TYPE 0 | Weighted TYPE 1A | Weighted TYPE 1B | Weighted TYPE 2 | Weighted TYPE 3A & 3B | SUM of all weighted normalised PRs
1 | 0 | 5 | 31 | 12 | 20 | 68 | 2284 | 0.03 | 0.0000 | 0.0012 | 0.0034 | 0.0000 | 0.0000 | 0.0046
2 | 0 | 5 | 32 | 6 | 20 | 63 | 2284 | 0.03 | 0.0000 | 0.0012 | 0.0035 | 0.0000 | 0.0000 | 0.0047
3 | 2 | 15 | 46 | 13 | 20 | 96 | 2284 | 0.04 | 0.0009 | 0.0035 | 0.0050 | 0.0000 | 0.0000 | 0.0094
4 | 1 | 27 | 55 | 30 | 71 | 184 | 2284 | 0.08 | 0.0004 | 0.0063 | 0.0060 | 0.0000 | 0.0000 | 0.0128
5 | 1 | 32 | 47 | 59 | 100 | 239 | 2284 | 0.10 | 0.0004 | 0.0075 | 0.0051 | 0.0000 | 0.0000 | 0.0131
The data was collected from artefacts submitted to a SOI#2 audit. For each of the projects, the outcome of the SOI#2 audit is documented so that our proposed method's predictions may be validated against the real audit results. The aim of this case study is to demonstrate how our proposed method can be used to make decisions concerning the likelihood that a project will successfully pass a SOI audit. The aim is also to identify the strengths and weaknesses of the method and to explain where and why the method might fail or succeed in predicting the outcome of a SOI audit. It is understood that the data set collected is relatively small and that the prediction capability of the data may therefore be limited. This data is extremely difficult to obtain due to its commercial sensitivity. Ideally, our approach would be evaluated more thoroughly using additional data from the certification authorities, as they have a wealth of data from projects which have been audited for certification.
5.1. Calculating the maximum allowable deviation from the mean
As a prerequisite for applying the four-step process proposed in the previous section, the maximum allowable deviation from the mean of the normal distribution should be defined. In order to calculate the maximum allowable deviation, factors such as Natural Variance (NV) and Assignable Variations (AVs) should be considered. This allows current projects to be judged against a calculated, rather than an assumed (3σ envelope), acceptable tolerance envelope. The NV observed in SPM is the variance in the recorded data due to natural fluctuations in how items are produced. For example, for findings identified during an SRS review, some of the results could be attributed to issues that are not readily controllable, such as reviewers having personal problems affecting their review judgements. On the other hand, variations that occur in a process due to conscious technical or managerial reasons are known as Assignable Variations (AVs). Generally, NV is difficult to reduce, as it is an inherent part of the process. Conversely, AV can be reduced through process improvement programmes. It is important to note that during the development of airborne software, there is an amount of AV that could be reduced but is agreed as being an acceptable deviation. This is a result of project management decisions to release the software to the SOI audit with known problems, defined in open problem reports. NV and AV are analysed and calculated in Sections 5.1.1 and 5.1.2 and are then used in Section 5.1.3 to determine the maximum acceptable variation, or acceptable tolerance envelope.
5.1.1. Calculating the Natural Variance (NV)
In this section, metrics are collected from the SRS developers and reviewers in order to calculate the NV. Table 10 shows a set of data collected during the SRS development and review at the two avionics manufacturers. The data was solicited using a questionnaire to extract the figures. The questions were derived from the analysis of data from previous SRS reviews that indicated where there may be weaknesses in the SRS production and review process. Of course, the data has been captured from only two companies, which limits the accuracy of the calculation. To improve the accuracy of the NV, more reference points would be required. However, this limited data set demonstrates the principle. The data was captured after both companies had already taken mitigation actions.
Table 10
Natural Variance calculation. Data gathered from SRS writing and review of DO178B Level A software over two projects. The COMPANY 1 and COMPANY 2 columns give the estimated variance in terms of the number of Problem Reports raised per 100 Problem Reports.

Item | Source | Description | COMPANY 1 | COMPANY 2 | Notes
1 | Reviewer | Reviewer unknowingly does not understand the parent requirement and raises a Problem Report due to his lack of knowledge | 2 | 2 |
2 | Reviewer | The Reviewer is trained but inexperienced and unknowingly misunderstands the review requirements, raising unnecessary Problem Reports | 1 | 1 |
3 | Reviewer | The Reviewer unknowingly does not understand the content of the SRS requirements being reviewed | 1 | 1 |
4 | Reviewer | The Reviewer concentrates too much on syntax rather than technical content and raises Problem Reports to cover typos | 2 | 1 |
5 | Reviewer | The Reviewer is not sure whether there is a problem, so raises a Problem Report | 1 | 0 |
6 | Author | The parent requirement is poorly defined and is unknowingly implemented incorrectly, causing a Problem Report to be raised | 1.5 | 1 |
7 | Author | The Author is trained but inexperienced and therefore unknowingly writes bad requirements, causing a Problem Report to be raised | 1 | 1 |
8 | Author | Conflicting requirements force a fault to be reported as a Problem Report | 2 | 1 | Requirements conflicting within the development activity, NOT due to integration with other systems
9 | Author | A Problem Report is raised twice for the same finding but not identified by the Problem Report originators | 2 | 1 |
10 | | Total number of Problem Reports raised for every 100 Problem Reports due to Natural Variance caused by a team reviewing the SRSs | 14 | 9 | Sum of items 1-9, giving the % of Problem Reports raised that are due to the NV
11 | | For a typical project that passed the SOI#2 audit, the average number of Problem Reports raised in total is | 908 | 908 | Average across sampled projects
12 | | For a typical project from this company, the average number of Problem Reports raised due to the Natural Variance is | 127 | 82 | (Item 10/100) x Item 11 (not yet normalised)
13 | | For a typical project that passed the SOI#2 audit, the average total number of requirements on the analysed projects is | 2631 | 2631 | Average across sampled projects
14 | | Natural Variance of Problem Reports raised for the SRS, normalised by requirements | 0.048 | 0.031 | Item 12/Item 13: Natural Variance (normalised)
15 | | AVERAGE NV of the two projects shown (NOT WEIGHTED) | 0.04 | | Average of item 14 for both projects
16 | | Median of weighting values | 0.25 | | Refer to Table 8
17 | | AVERAGE NV of the two projects shown (WEIGHTED) | 0.01 | | Item 15 x Item 16

Prerequisites to the analysis:
18 | Reviewer | The documents presented for review are the correct version and applicable to the review
19 | Reviewer | The Reviewers are trained to carry out DO178B reviews
20 | Author | The Authors are trained to prepare a DO178B document
21 | Author | The author and reviewer activity is localised to the one system and is not part of the global aircraft systems integration
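The arithmetic of Table 10 items 10 to 17 can be checked in a few lines (the function and variable names are ours):

```python
def normalised_nv(nv_percent, avg_reports, avg_requirements):
    """Natural Variance per requirement: the share of Problem Reports due
    to natural fluctuation (item 10), scaled to a typical project's report
    count (item 11) and normalised by its requirement count (item 13)."""
    nv_reports = (nv_percent / 100) * avg_reports   # item 12
    return nv_reports / avg_requirements            # item 14

company1 = normalised_nv(14, 908, 2631)   # ~0.048
company2 = normalised_nv(9, 908, 2631)    # ~0.031
average_nv = (company1 + company2) / 2    # ~0.04 (item 15)
weighted_nv = average_nv * 0.25           # x median weighting, ~0.01 (item 17)
```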
5.1.2. Calculating the Assignable Variance (AV)

The AV comprises two types of variation:

- variations that can be reduced through improving the engineering processes;
- variation due to project-based decisions to leave some problem reports open at the time of the SOI audit.

If we assume that the open problem reports capture the development process issues, then the AV can be attributed to the problem reports left open at the time of the SOI audit. Therefore, the average number of problem reports left open at the time of the SOI#2 audit across our sample projects in this case study may be used to calculate the AV. Examination of the data set used in this case study shows that the average number of SRS Problem Reports left open for our sample projects at the time of their SOI#2 audit was 137. The Assignable Variance is therefore equal to 137, which is normalised by dividing it by the average number of requirements. Table 11 shows how the AV is calculated using the average number of problem reports left open at the SOI#2 audit and the average total number of requirements.

Table 11
Calculation of the maximum allowable tolerance for the normal distribution (maximum acceptable tolerance from the MEAN).

Item | Description | Value | Notes
1 | For a typical project that passed a SOI#2 audit, the average number of Problem Reports left open for the SRS at the SOI#2 audit (a project decision to leave Problem Reports open) | 137 | Average across sampled projects
2 | For a typical project that passed the SOI#2 audit, the average total number of requirements on the analysed projects | 2631 | Average across sampled projects
3 | For a typical project that passed the SOI#2 audit, the average number of Problem Reports left open for the SRS at the SOI#2 audit (NORMALISED): Assignable Variance (AV) (NOT WEIGHTED) | 0.05 | Item 1/Item 2
4 | AVERAGE of the two projects shown: Natural Variance (NV) (NOT WEIGHTED) | 0.04 | See Table 10, item 15
5 | MAX acceptable TOLERANCE from the MEAN: NV + AV (NOT WEIGHTED) | 0.09 | Item 3 + Item 4
6 | Median of weighting values | 0.25 | The median of the weightings is used because we want the middle weighting value, not the average, which would skew the calculation as there is a large difference between the majority of the weightings and the minority (refer to Table 8)
7 | MAX acceptable TOLERANCE from the MEAN (WEIGHTED) | 0.02 | Item 5 x Item 6

5.1.3. Calculating the maximum tolerance envelope

The Maximum Variance is calculated by adding the AV to the NV. Next, we must weight this value using the MEDIAN of the weighting values (as calculated in Table 8). This value reflects the Maximum Variance for the weighted normal distribution (as shown in Table 11), which is the maximum number of normalised defects allowed in the data set, or the maximum allowable tolerance. The MEDIAN of the weighting values is used because the weighted data is biased towards the TYPE 0 problem report. Further, there is a non-linear relationship between each of the problem report types. Most of the problem reports that remain open for the SOI are actually TYPE 2 or TYPE 3. Therefore, if the mean of the weighted data values were used, it would unfairly bias the maximum tolerance weighting (due to the weighting placed on TYPE 0 Problem Reports), making it larger than it should actually be. The Maximum Variance will now be referred to as the maximum acceptable tolerance.

5.2. Preparing and plotting multiple project SRS defect data

In this section, we apply the four-step process proposed in Section 4.3.2. The data has been collected from 53 software releases. The defect data for each of the releases reflects the software status when it was presented to the SOI#2 audit. All of the 53 software releases passed the SOI#2 audit successfully. The data shows the number of problem reports that were raised during the SRS development, grouped against the six categorisation types. The following is a summary of the application of the four-step process:

1. For each project, the total number of SRS findings of each problem report categorisation type has been normalised by dividing it by the total number of requirements in the SRS. All of the normalised values for each problem report criticality type have then been summed (known as the lifecycle stage total normalised sum).
2. For all the projects, a normal distribution has been derived from the lifecycle stage total normalised sums; the data is plotted in Fig. 3.
3. The normalised data has then been weighted according to Table 6. After the data has been weighted, all the weighted normalised values for each problem report type have been summed (known as the lifecycle stage total weighted normalised sum).
4. For all the projects, a normal distribution has been calculated from all the lifecycle stage total weighted normalised sums; the data is plotted in Fig. 4.

5.2.1. Preliminary analysis

Fig. 3 shows the normal distribution plot for the un-weighted lifecycle stage total normalised sums (raw defect data) for the 53 software releases that successfully passed the SOI#2 audit. Further, the plot shows the acceptable tolerance envelope that was determined using the maximum variation calculated in Section 5.1.3. The plot also shows that 3 of the projects plotted actually breach the acceptable tolerance envelope. This indicates that the SRSs for those projects should have been reworked so that they fall within the target tolerance value. It must be noted here that these 3 projects successfully passed the SOI#2 audit.
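The tolerance-envelope arithmetic of Sections 5.1.1 to 5.1.3 (Tables 10 and 11) reduces to a few lines (the variable names are ours):

```python
# Averages across the sampled projects (Table 11)
open_reports = 137        # SRS Problem Reports left open at the SOI#2 audit
avg_requirements = 2631
nv = 0.04                 # Natural Variance, Table 10 item 15 (not weighted)
median_weighting = 0.25   # median of the weighting fractions, Table 8

av = open_reports / avg_requirements                   # Assignable Variance, ~0.05
max_tolerance = nv + av                                # ~0.09, not weighted
weighted_tolerance = max_tolerance * median_weighting  # ~0.02, as in Table 11
```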
On the other hand, Fig. 4 shows the normal distribution plot for the lifecycle stage total weighted normalised sums for the same 53 software releases that successfully passed the SOI#2 audit. This is the more important plot, as it shows the weighted data plotted with respect to the issues which matter to the certification authorities during the audit. The plot also shows the acceptable tolerance envelope that was calculated in Section 5.1.3. All of the projects are within the acceptable tolerance envelope, in contrast to the plot in Fig. 3.
This demonstrates the difference between collecting and analysing raw defect data in isolation and performing the analysis against weighted data that reflects the auditing needs of the certification authorities. If decisions had been made based on the raw defect plot (Fig. 3) without considering the weighting criteria, additional work might have been performed, which would have delayed the SOI#2 audit without having any effect on its outcome. If the project decision makers had been presented with Fig. 4, they could have made a balanced decision between taking the risk of failing the audit and passing it without reworking the SRSs. This can be useful, as it allows calculated risks to be taken, which is central to safety-critical projects in which delays can be costly and create damaging publicity.
5.3. Evaluation against five random projects

In this section, we evaluate the analysis method and the figures generated in the previous section against data from five
Fig. 3. Un-weighted normal distribution plot of SRSs which passed SOI#2.

different projects. Interviews were conducted with the project managers of these projects. The objective of these interviews was to understand the background behind the data in order to be able to interpret the outcome of the evaluation, particularly when the data is plotted against the normal distribution derived from the successful SOI#2 audit projects. It is important to note that none of these projects were included in the data used to derive the successful SOI#2 audit normal distribution shown in Figs. 3 and 4. As such, this provides a clean baseline against which these projects could be compared. Four of these projects failed their SOI#2 audit and only one project passed it.
5.3.1. Un-weighted data plotted against the successful SOI#2 audit normal distribution

Fig. 5 shows a plot of the un-weighted data of the selected five projects. Four of the test project data points lie outside of the acceptable tolerance envelope and therefore give an initial indication that there are problems with the SRS artefacts. In all four cases, a synopsis of the project history (extracted through interviews with the project managers) indicates that all of these projects were in need of recovery and that there were serious quality issues associated with the SRS development. There were clearly unacceptable numbers of problem reports raised against the SRS artefacts. This is clearly reflected in the un-weighted data
Fig. 4. Weighted normal distribution plot of SRSs which passed SOI#2. The weighted mean is 0.01 and the weighted standard deviation 0.01; the maximum acceptable tolerance from the mean (weighted, calculated in Table 11) is 0.02, giving mean + tolerance = 0.03 and mean - tolerance = -0.01.
-
Me
NOweiSta
WeiSta
Dev
I. Dodd, I. Habli / Reliability Engineering and System Safety 98 (2012) 723 193 * = (99.7%)
0.14Devplo
1.
2.
5.3
nor
pro3 atheinhavfaifol
1.
NOweiNatVar(NV
FigSOIghted an,
0.06 1 * = (68%)
0.05 Mean + 2 * = (95% )
0.15
T ghted ndard iation,
0.05 2 * = (95% )
0.10 Mean - 2 * = (95% )
-0.04NOTwei-1
0
1
2
3
4
5
6
-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5
Pro
babi
lity
Den
sity
Acceptable areaDensityNOT weighted PRsTest Failed Project 1Test Failed Project 2
Test Failed Project 3Test Failed Project 4Passed Project 1Mean Line
7
8
9 GRAPH 3Stage 2 - NOT Weighted Data For CAT Classification for a group of projects that passed SOI#2 with FAILED PROJECT 2, 3, 4, 5 and passed project 1 MARKEDt shown in Fig. 5. It is important to note the following:
Passed Project 1: This project gives a clear indication that theSRS quality associated with the project was poor and yet theproject actually passed the SOI#2 audit. The reason for this isdiscussed in Section 5.3.2.Failed Project 1: This project gives a clear indication that the SRSquality was good and that there were no problems expected withthe SOI#2 audit yet this project actually failed the SOI#2 audit.Again, the reason for this is discussed in Section 5.3.2.
.2. Weighted data plotted against the successful SOI#2 audit
mal distribution
Fig. 6 shows a plot of the weighted data of the selected vejects. Three of the test project data points (failed projects 2,nd 4) lie outside of the acceptable tolerance envelope andrefore indicate that they will fail the SOI#2 audit. As discussedthe previous section, all of these three projects were known toe SRS quality problems and it was suspected that they wouldl a SOI and indeed they did fail. Based on the plot in Fig. 6, thelowing observations can be made:
The Failed Project 3 is clearly close to the maximum toleranceallowed for a SOI#2 audit pass. Although the plot is indicatingthat the project may fail the SOI#2 audit, the project manage-ment team may decide to take the risk that the SOI#2 auditwill fail and submit the project to the audit.
Fig. 5. Un-weighted randomly selected projects plotted against the successful SOI#2 audit normal distribution (un-weighted natural variance (NV) 0.04; maximum acceptable tolerance from the mean 0.09; mean + tolerance 0.15; mean − tolerance −0.04; tolerances calculated from Table 11).
Fig. 6. Weighted randomly selected projects plotted against the successful SOI#2 audit normal distribution (weighted natural variance (NV) 0.01; maximum acceptable tolerance from the mean 0.02; mean + tolerance 0.03; mean − tolerance −0.01; tolerances calculated from Table 11).

- The same decision-making approach discussed above with regard to Failed Project 3 could be taken with Failed Project 2. However, this clearly carries much more risk of failure, as it is further outside of the acceptable tolerance envelope. It is acknowledged that further work is needed to address borderline cases. These cases could be calibrated using further risk analysis that allows the level of risk to be understood and a more informed decision to be made.
- The Failed Project 4 is clearly outside of the acceptable tolerance envelope. The SRS problem reports indicate that the SRS artefacts have quality issues. This project is clearly not ready for a SOI#2 audit, and the decision to submit the project to a SOI#2 audit would be wrong.
- The Passed Project 1 has clearly moved into the acceptable tolerance envelope, indicating that the project is ready for a SOI#2 audit (whereas this project was outside of the acceptable tolerance envelope in Fig. 5). This again supports the assumption that the type of problem report is far more important than the number of problem reports raised, because the weighting criteria amplify the functional safety issues more than the documentation issues. However, the large number of SRS problem reports clearly shows that the SRS has quality issues. This will be noticed by the Quality Assurance departments and will be raised as a quality concern.
- The Failed Project 1 introduces a new issue to the analysis. The project appears to be within the acceptable tolerance envelope, yet the project failed the SOI#2 audit. The reason was that a Type 0 problem report was discovered at the actual SOI#2 audit. The problem report was deemed to pose an
unacceptable level of risk to the safety of the flight. As a result, the certification authorities did not allow the project to pass the SOI#2 audit. In short, the correct consequences of this problem report were not identified by the software engineers but were later identified by the authorities at the audit.

The Failed Project 1 meets the criteria of the statistical concept of the Black Swan [42]. Taleb stated that a Black Swan event has the following criteria [42]:

1. Rarity: it is a surprise;
2. Impact: it has an extreme impact;
3. Retrospective predictability: after the fact, it is rationalised, making it explainable and predictable.

By further examining the history of Failed Project 1, each of the above criteria is clearly satisfied:

1. The event was not identified by the software engineers and was a surprise to them. However, the event was identified by the certification authorities as being severe enough to fail a SOI#2 audit. The software engineers believed that the project was ready for the audit. It was not a project management decision to press on with the SOI#2 audit regardless of the existence of the problem report.
2. The event had a major impact. The audit did actually fail, which caused a major delay to the project and embarrassment to the software engineers.
3. After interviewing the project manager, he declared that, after the authorities had identified the issue, it was obvious that the problem report should have been identified as a major safety issue and should have been resolved before the SOI#2 audit was performed.

The hidden impact of this Black Swan event is the professional embarrassment factor, in that the certification authorities identified this problem and not the software engineers. Although, as with many findings of this nature, there were warning signs that there was an issue with the software, these warning signs were not acted upon.

6. Discussion

The case study in the previous section shows that the use of the data collection and analysis method proposed in Section 4 has some benefits, particularly for monitoring the SRS quality and the readiness of the software artefacts for SOI#2 audits. If the raw un-weighted SRS defect data is analysed in isolation, the data can easily give the false indication that a project is not in a suitable state to pass a SOI#2 audit and that the defective SRS artefacts must be reworked before the audit. As illustrated in the previous section, this is not always the case. When the data is weighted to account for the certification considerations, it reveals that some of the uncertainties concerning the quality of some SRS artefacts do not pose high risks to the success of the SOI#2 audit. The opposite is also true. The data has demonstrated situations where the SRS quality is believed to be acceptable but is actually unacceptable due to safety-related problem reports, which pose a high risk of a SOI#2 audit failure. This is because safety-related problem reports are a key consideration for the certification authorities. Of course, a weakness of the analysis method was revealed by the evaluation against a Black Swan project. This evaluation is a difficult test for most statistical methods to pass. However, as suggested by Taleb [42], Black Swan events could be addressed by factoring a number of robustness measures into the project. However, to effectively reduce the number of Black Swan projects, it is not only the analysis method that will need to change, but also the development processes that feed the problem report data to the method. In the rest of this section, we discuss a number of observations relating to the selected weighting criteria, the accuracy limitations of the case study and the SPM normal distribution.

6.1. Weighting criteria

The weighting criteria defined in this paper were generated from discussions with various aerospace auditors and the experience gathered by the first author from various SOI audits. The calculation of the weighting criteria presented demonstrates how the weighting factors can be calculated to reflect the needs of the certification authorities. However, to improve accuracy, the calculation of the weighting factors should be discussed at length, and agreed on, by experienced certification auditors and then discussed and shared with the aerospace companies. Concerning the precision of the weighting criteria, we have carried out a sensitivity check on the weighting factors. This check revealed that the precision of the weighting factors has little effect on the final conclusions of the analysis method (i.e. whether or not the project is ready for a SOI audit). What is more important is the relative weighting between the different problem report types.

6.2. Accuracy limitations of the case study

The data used in the case study is difficult to extract from aerospace companies due to confidentiality issues. The lack of more data sets might be a weakness. Specifically, there are two areas where this weakness is manifested:

- The calculation of the acceptable tolerance envelope is an area of uncertainty. In this paper, the calculation of the NV relies on only two sets of data from two companies. This is obviously too small to be reliable. This could be ignored if the NV were insignificant. However, at present, the NV is 43% of the acceptable tolerance envelope. As such, a shift in the calculated NV will influence the envelope significantly. To this end, data from more companies is needed to improve the accuracy of the calculated NV, ideally from different nations throughout the world.
- Another area of uncertainty is the amount of data used to produce the SOI#2 audit normal distribution graphs for the successful projects, as shown in Figs. 5 and 6. Although this is a reasonably large set of data (53 software releases), the ideal situation would be for such an important graph to be produced from thousands of data points collected from a variety of manufacturers and projects across the world. This would allow for a global view of the successful SOI audit projects criterion. Using a larger data set may change the shape of the normal distribution bell curve, which could change the number of projects that fall inside the acceptable tolerance envelope.

6.3. SPM normal distribution

When different projects and different lifecycle data are considered from around the world, there will undoubtedly be a difference between the mean and the standard deviation calculated for each individual project data set. This is because there are a number of factors that could influence the number of problem reports found on different projects, which would clearly skew the normal distribution. Examples of these issues are as follows:

- If two projects were compared where the code on one project was developed manually and the code on the other project was automatically generated, then there would most likely be less
code walkthrough problem reports raised against the automatically generated code than against the manually developed code. However, this does not imply that the automatically generated code is better. It simply means that the variability between the two coding methods is large, and therefore the defect data would look very different for these two projects.
- The conventional approach to applying a DO178B process is to write the SRS artefacts, review them and then move into the design and coding phase after the SRS artefacts are approved (i.e. a variant of the waterfall lifecycle model [43,44]). However, some companies might place less importance on an early review phase. They might develop the SRS, design and code quickly to allow functional testing to start as early as possible (i.e. more agile processes [45,46]). This is done to find the main code bugs early in the lifecycle. The bugs are then corrected along with the design documents, and the reviews are completed after the code and documents have been stabilised. As such, fewer problem reports will be raised during the SRS review activity and more problem reports will be raised during the testing phase. This is the inverse of what is normally expected and will skew the data tracked by the authorities compared to the majority of companies that follow the conventional approach to DO-178B processes.

However, the fact that the data is skewed should not affect the comparison results. This is because, regardless of the methods or work instructions used to implement a DO178B process, the software artefacts must meet certain levels of quality to pass the certification audits. The phase in which problem reports are raised in the development lifecycle should not significantly affect the certification results as long as they are resolved at the time of the SOI audit. As illustrated in the previous two sections, this is amplified by plotting the weighted data rather than the raw data.

7. Overall evaluation

Apart from requiring a basic understanding of statistical modelling, most engineers and auditors should be able to use the approach presented in this paper without the need for any further calibration of the models. It is important to note that our case study focuses on the review of SRS at SOI#2 audits. However, our approach is designed to be used continuously throughout the auditing process. As the software is developed, live defect data will be analysed in the SPM model and the overall state of the artefacts in the lifecycle stage can be evaluated. If, at the measurement point, an artefact is deemed to be unsuitable for certification, the artefact should be reworked to improve its quality and mitigate identified defects. This rework will then result in improved defect data being fed into the SPM model, which could indicate that the software has reached a suitable level of quality for submission to the certification authorities. This approach may be used live and on a continual basis if the model is integrated with the problem report database. In the rest of this section, we discuss the advantages and disadvantages of using our data collection and analysis method from the point of view of the two key stakeholders, namely the certification authorities and the aerospace companies.

7.1. Advantages

The data collection and analysis method offers the following advantages with regard to the auditing process:

- The history of the software development can be maintained and tracked over the lifecycle of a project. The engineers and auditors can assess how the quality of the product is improving as it progresses through its development phase into the in-service phase and then through its operational life. The amount of operational issues observed can then be used to gather statistics regarding the number of failures in service, with respect to the number of related problem reports that were raised during the development phases. This may also help in crash investigation if the causes of a crash are related to the software [47].
- If software project data is provided at regular intervals as a project develops, the certification authorities could use the trend of the data over the history of a project to acquire more exposure to the software process weaknesses. They no longer need to make important decisions based on a small snapshot of the software lifecycle data. Further, it will also be more difficult for companies to mislead the authorities at the audits regarding the quality of the software. That is because the authorities will have the opportunity to follow the recorded problem report history throughout the lifecycle of a project.
- The collected data will help the certification authorities in auditing their own auditors. The authorities can then see if there is a trend occurring, where an auditor may be too strict or too lenient. It will also allow the certification authorities to identify where their auditors may be lacking in experience and training. As such, the certification authorities can focus the training of their auditors on the parts of the lifecycle which can identify the extent to which a project is meeting the certification objectives.
- The metrics and the issues raised above allow the certification authorities to be audited by their governing bodies. Having data about the audits, the quality of the auditors and the quality of the projects can help these governing bodies in determining if the certification authorities are fulfilling their role as an independent certification agency.
- The uncertainty of whether a project will pass the SOI audit is better managed and the big bang approach to a certification audit is avoided. Confidence can be gained by the software management team that the project is heading in the right direction, and issues can therefore be corrected before the audit. This is very important considering the large development costs and multiple stakeholders involved in aircraft development. The early detection and correction of problems can reduce the costs significantly.

7.2. Disadvantages

The data collection and analysis method also has a number of disadvantages, including the following:

- DO178B is a guidance document, and if the certification authorities monitor software development at such a close level, they might be close to mandating the development activities that need to be completed and monitored. This also becomes an issue with respect to the required independence of the certification authorities.
- If the certification authorities overuse the metrics, they could easily be misled by a good set of data that might actually be hiding poor software processes. The analysis method is obviously open to abuse if the companies supply incorrect data. As such, the auditors should only use the data as a guide. They should still use their auditing skills to identify if the processes behind the development are of the required quality and are being followed.
- The certification authorities are not responsible for stating that the software is safe. This responsibility lies with the aerospace companies. If the certification authorities become too involved in the detail of the development, then the aerospace companies may claim in the event of an accident that the certification authorities were fully aware of the issues, thereby implicating the certification authorities by making them indirectly accountable for assuring the software's safety.
- As part of the data collection and analysis method, commercially sensitive data regarding a compan ... see which company declares late delivery first and hence pays the late penalty costs.

8. Legal and ethical issues

In an ideal application of the approach presented in this paper, the certification authorities will be responsible for holding the data for a large number of companies. As discussed in the previous section, this data is sensitive and may be damaging if it is leaked to competitors or interested parties. As such, companies will not release their data without assurance that the data is secured from other competitors or interested parties. Further, the certification authorities will normally have confidentiality agreements in place with the aerospace companies to cover this type of confidential information. However, it is important that independent security and integrity audits are carried out to ensure that the data is held securely and protected from industrial espionage. The certification authorities will be given the privilege of accessing more of a company's dirty laundry than they would normally view. They must respect this arrangement and not abuse this trust. This is a privilege that can also be abused by the aerospace companies. These companies have an insight into what the authorities are looking for at the audits and therefore they could falsify the data to improve how the project quality appears to the outside world. However, it is for the benefit of all the stakeholders that this data collection and analysis approach is successful, as it allows for clear and open communication channels between both parties.

The certification authorities must also be careful about their legal obligations. As previously discussed, they may become too involved and implicated if an accident occurs. The certification authorities must clearly define their roles and responsibilities with respect to the collected data and its use. Their independence must be maintained and it must be made clear that they are using the data in an oversight role only. If an accident occurs, then this data would clearly be admissible in a court of law, and therefore the aerospace companies may be even more reluctant to provide this data to the certification authorities, for fear that the data might be used against them.

9. Concluding remark

Although the safety categorisation of the problem reports is an important issue, safety-critical software must be maintained and be serviceable for a long period of time. From a maintenance and economic point of view, it would be unwise to undermine the impact of non-safety related issues purely for the sake of certification. The maintenance costs of a poorly documented software package will rise dramatically if the number of problem reports is not kept to a minimum. Problems that are not resolved early in the development lifecycle will have a knock-on effect in the downstream lifecycle stages, particularly in the maintenance phases. It is therefore important to use the method proposed in

References

[19] RTCA. Final report for clarification of DO-178B software considerations in airborne systems and equipment certification. RTCA; October 2001.
… certification of safety critical systems in the transportation sector. Reliability Engineering & System Safety 1999;63(1):47–66.