
Source: cs.virginia.edu/~jck/publications/greenwell_diss.pdf

A Dissertation
Presented to
the faculty of the School of Engineering and Applied Science
University of Virginia

In Partial Fulfillment

of the requirements for the Degree

Doctor of Philosophy, Computer Science

by

William S. Greenwell

May 2007

Pandora: An Approach to Analyzing

Safety-Related Digital-System Failures


Approval Sheet

The dissertation is submitted in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy, Computer Science

____________________________________
William S. Greenwell, Author

This dissertation has been read and approved by the examining committee:

____________________________________
John C. Knight, Dissertation Advisor

____________________________________

____________________________________

____________________________________

____________________________________

____________________________________

Accepted for the School of Engineering and Applied Science:

____________________________________
Dean, School of Engineering and Applied Science

May 2007


Abstract

Safety-related systems are those whose failure could result in loss of life, injury, or damage to property. The use of software and programmable electronic systems in safety-related domains, which include aerospace, commercial aviation, medicine, and nuclear power generation, is increasing. This increased reliance on digital systems to control potentially hazardous operations or to alert operators to dangerous conditions creates new failure modes and risks that might lead to accidents, and it poses new system development and safety assurance challenges.

Ensuring that digital systems will operate at least as dependably as the mechanical and analog systems they replace is essential, but achieving this level of dependability in a digital system can be exceptionally difficult. The design faults that plague digital systems are harder to identify and address than the physical faults that precede the bulk of mechanical and analog system failures. These design faults, coupled with the complex new designs that digital systems typically implement, complicate the safety assurance of digital systems. The increased reliance on digital systems to perform safety-related functions and the difficulty of ensuring that they will do so correctly increase the probability of accidents.

Analyzing safety-related failures of digital systems can yield lessons for improving development and assurance practices in order to reduce the risk of future accidents, but the same factors that complicate the safety assurance of these systems also affect failure analysis. Traditional techniques for investigating accidents assume that systems exhibit a common set of failure modes and that each failure mode leaves evidence that can be discovered from the accident scene. Such is not the case for digital systems, and so new techniques are needed to address the unique challenges that digital systems pose.

To address this problem, this dissertation introduces the Pandora approach to failure analysis. Pandora is a systematic but manual approach to analyzing safety-related failures of digital systems in which the analysis is framed around a system’s safety case. The safety case documents the complete argument that the system is acceptably safe to operate, and framing failure analysis around the safety case provides important benefits. Investigators applying Pandora to a failure examine the safety case for fallacies; the presence of a fallacy in the safety case suggests the existence of a fault in the system that might have contributed to the failure. Pandora guides investigators through the steps of developing theories of the failure, eliciting evidence, and developing lessons and recommendations that address the problems the investigators identify. While Pandora may be applied to a wide array of system accidents, this dissertation focuses on its application to those involving safety-related digital systems.

Pandora is accompanied by a taxonomy of safety-argument fallacies to assist investigators in applying the process. The taxonomy documents fallacious reasoning that might appear in safety arguments and was developed from separate surveys of fallacies in real-world safety arguments and of fallacies documented in the philosophical literature. It may be used with Pandora or separately to assist in the detection of safety-argument fallacies.

Pandora was applied to a series of commercial-aviation accidents involving a minimum safe altitude warning system, and the safety-argument fallacy taxonomy was evaluated through a controlled study involving twenty computer-science graduate students, engineers, and safety professionals. In the former study, the application of Pandora produced findings comparable to those of the official investigations into the accidents. The latter study, while statistically inconclusive, suggests that the fallacy taxonomy assists the detection of fallacies in safety arguments. While both studies have significant limitations, they show that the Pandora approach holds promise and justify further evaluation.


Acknowledgments

It is an honor to thank my adviser, John Knight, for the years of direction and support he has given me as I completed my graduate study in spite of the turmoil—both external and self-induced—that I have encountered along the way. Similarly, I thank Michael Holloway of NASA Langley Research Center for inviting me to explore accident analysis as a research area and for serving in an informal capacity as my co-adviser throughout my graduate study. I am grateful to John, Michael, and the other members of my dissertation committee—David Evans, Greg Humphreys, Chris Johnson, and Paul Reynolds—for the effort they invested in reviewing my dissertation, for their constructive criticism and insight into how I should improve my work, and for spotting a plethora of typographical errors that escaped my attention.

Completing a sizeable research project requires the assistance of colleagues. Michael Holloway and Jacob Pease collaborated with me in developing the safety-argument fallacy taxonomy, and I thank Michael for assisting me in developing and administering the controlled study to evaluate the safety-argument fallacy taxonomy. Elisabeth Strunk collaborated with me in developing the enhanced safety-case lifecycle and on earlier work that preceded my dissertation research. Performing a controlled study is only possible if one can find volunteers, and I appreciate the twenty anonymous participants who donated their time in furtherance of my research.

I am grateful to those who contributed to my work by providing peer reviews, emotional support, and other assistance. Elisabeth, in addition to serving as a collaborator and peer reviewer for much of my research, is also a close friend, and her friendship has benefitted me professionally and personally. Kelly Hayhurst of NASA Langley Research Center advised me in developing the methodology for the controlled study, peer reviewed my work, and worked various feats of magic behind the scenes to my benefit. Peter Ladkin of Bielefeld University provided insightful yet uncompromising peer reviews of my work. Barry Strauch of the National Transportation Safety Board served as a committee member for my dissertation proposal and informed me on the inner workings of the NTSB. Bev Littlewood and David Wright of City University assisted me with a difficult statistics question related to the controlled study. Patrick Graydon provided me with housing during the delicate two months in which I raced to complete my dissertation in time for my defense, and I am pleased to count Patrick and Tony Aiello as colleagues and friends.

Of all the emotional support I have received, there is none greater than the love given to me by my mother and father. I owe all that I accomplish to their parenting and to the opportunities they have made available to me. I love them both dearly.

Finally, I thank the Lord for the blessings He has given me, and I pray that His hand will continue to guide my work and my life.

This work was funded in part by NASA Langley Research Center under grant number NAG-1-02103.


Table of Contents

Chapter 1  Introduction
  1.1. Korean Air Flight 801
  1.2. The Investigation
  1.3. Software’s Role in Accidents
  1.4. Thesis Statement
  1.5. Contributions
  1.6. Dissertation Outline

Chapter 2  Software Safety Assurance
  2.1. Safety Assurance of Digital Systems
  2.2. The System Safety Case
  2.3. System Failure
  2.4. Techniques for Analyzing Digital-System Failures
  2.5. Chapter Summary

Chapter 3  Software Accident Investigation
  3.1. The National Transportation Safety Board
  3.2. NTSB Investigations of Major Aviation Accidents
  3.3. Treatment of Digital Systems
  3.4. American Airlines Flight 965
  3.5. Chapter Summary

Chapter 4  Pandora
  4.1. The Enhanced Safety-Case Lifecycle
  4.2. Pandora
  4.3. Discussion
  4.4. Chapter Summary

Chapter 5  Safety-Argument Fallacies
  5.1. Fallacies in System Safety Arguments
  5.2. A Taxonomy of Safety-Argument Fallacies
  5.3. Applications
  5.4. Chapter Summary

Chapter 6  MSAW Case Study of Pandora
  6.1. Overview of the MSAW System
  6.2. Methodology
  6.3. Lessons Learned From Pandora
  6.4. Discussion
  6.5. Threats to Validity
  6.6. Chapter Summary

Chapter 7  Fallacy Taxonomy Evaluation
  7.1. Overview
  7.2. Experimental Methodology
  7.3. Results
  7.4. Discussion
  7.5. Threats to Validity
  7.6. Chapter Summary

Chapter 8  Conclusions & Future Work
  8.1. Conclusions
  8.2. Contributions
  8.3. Future Work
  8.4. Chapter Summary
  8.5. Coda

Appendix A  Safety-Argument Fallacy Taxonomy
  A.1. Circular Reasoning
  A.2. Diversionary Arguments
  A.3. Fallacious Appeals
  A.4. Mathematical Fallacies
  A.5. Unsupported Assertions
  A.6. Anecdotal Arguments
  A.7. Omission of Key Evidence
  A.8. Linguistic Fallacies

Appendix B  MSAW Case Study Safety Arguments
  B.1. USAir 105
  B.2. TAESA Learjet 25D
  B.3. Beechcraft A36
  B.4. Piper PA-32-300
  B.5. Korean Air 801

Appendix C  Survey Instruments
  C.1. Experimental Procedure
  C.2. Survey Instruments

Appendix D  Taxonomy Evaluation Data & Plots
  D.1. Coded Yes/No Responses
  D.2. Sample Data
  D.3. Frequency Distributions

Bibliography


List of Figures

1.1  The MSAW Inhibit Zone at Guam
2.1  A Safety Argument in Goal Structuring Notation
2.2  The Functional Decomposition Safety-Case Pattern
2.3  AntiPattern of Fallacious Functional Decomposition
2.4  Example ECF Chart
2.5  Example WBA Graph
2.6  Typical Control Loop
2.7  PARCEL Investigation Schemes
3.1  Major Sections of the Final Report
3.2  Excerpt of the Systems Group Checklist
4.1  Kelly & McDermid’s Safety-Case Lifecycle
4.2  The Enhanced Safety-Case Lifecycle
4.3  The Pandora Failure-Analysis Process
5.1  Excerpt from the EUR RVSM Safety Case
5.2  Sample Taxonomy Entry
6.1  Pre-Failure Safety Argument for the USAir 105 Accident
6.2  Excerpt of the Safety Argument for the TAESA Learjet Accidents
6.3  Top-Level Safety Argument for the Korean Air 801 Accident
6.4  Inhibit-Zone Minimization Argument for the Korean Air 801 Accident
7.1  Mean Acceptance Rates
7.2  Mean Accuracy Rates
B.1  USAir 105 Safety Argument, Top Level
B.2  USAir 105 Safety Argument, Goal G02
B.3  USAir 105 Safety Argument, Goal G101
B.4  TAESA Learjet 25D Safety Argument, Top Level
B.5  TAESA Learjet 25D Safety Argument, Goal G02
B.6  TAESA Learjet 25D Safety Argument, Goal G17
B.7  Beechcraft A36 Safety Argument, Top Level
B.8  Beechcraft A36 Safety Argument, Goal G02
B.9  Beechcraft A36 Safety Argument, Goal G17
B.10 Beechcraft A36 Safety Argument, Goal G301
B.11 Piper PA-32-300 Safety Argument, Top Level
B.12 Piper PA-32-300 Safety Argument, Goal G02
B.13 Piper PA-32-300 Safety Argument, Goal G17
B.14 Piper PA-32-300 Safety Argument, Goal G401
B.15 Korean Air 801 Safety Argument, Top Level
B.16 Korean Air 801 Safety Argument, Goal G02
B.17 Korean Air 801 Safety Argument, Goal G10
B.18 Korean Air 801 Safety Argument, Goal G23
D.1  Frequency Distribution, Acceptance Rate, Argument 1
D.2  Frequency Distribution, Acceptance Rate, Argument 2
D.3  Frequency Distribution, Acceptance Rate, Argument 3
D.4  Frequency Distribution, Acceptance Rate, Argument 4
D.5  Frequency Distribution, Acceptance Rate, Argument 5
D.6  Frequency Distribution, Accuracy Rate, Argument 1
D.7  Frequency Distribution, Accuracy Rate, Argument 2
D.8  Frequency Distribution, Accuracy Rate, Argument 3
D.9  Frequency Distribution, Accuracy Rate, Argument 4
D.10 Frequency Distribution, Accuracy Rate, Argument 5


List of Tables

5.1  Tally of Fallacies Identified in the EUR RVSM Safety Case
5.2  Tally of Fallacies Identified in the Opalinus Clay Safety Case
5.3  Tally of Fallacies Identified in the EUR Whole Airspace Safety Case
5.4  Excluded Fallacies Grouped by Source
5.5  Examples of Excluded Fallacies
5.6  The Safety-Argument Fallacy Taxonomy
7.1  Sample Population Composition by Recruitment Site
7.2  Mean Acceptance and Accuracy Rates
7.3  Two-Sample F Tests for Variances: Acceptance Rate
7.4  Two-Sample T Tests: Acceptance Rate
7.5  Two-Sample, Two-Tailed F Tests for Variances: Accuracy Rate
7.6  Two-Sample T Tests: Accuracy Rate
7.7  Distribution of Participants by Academic Degree
7.8  Distribution of Participants by Occupation
7.9  Distribution of Participants by Training in Logic or Philosophy
7.10 Attrition in Control & Treatment Groups
7.11 Adjusted Mean Acceptance Rates
7.12 Adjusted Mean Accuracy Rates
A.1  The Safety-Argument Fallacy Taxonomy
D.1  Yes/No Responses, Control Group
D.2  Yes/No Responses, Control Group (Continued)
D.3  Yes/No Responses, Control Group (Concluded)
D.4  Yes/No Responses, Treatment Group
D.5  Yes/No Responses, Treatment Group (Continued)
D.6  Yes/No Responses, Treatment Group (Concluded)
D.7  Acceptance Rates, Control Group
D.8  Acceptance Rates, Treatment Group
D.9  Accuracy Rates, Control Group
D.10 Accuracy Rates, Treatment Group


The concept of failure is central to the design process, and it is by thinking in terms of failure that successful designs are achieved.

—Henry Petroski [1]


Chapter 1

Introduction

Digital systems power the economic and technological infrastructure that defines modern society. They are developed for a diverse array of business, consumer, and government applications, but their use in safety-related areas is of particular concern. In these contexts, which include commercial aviation, health care, and nuclear power generation, the consequences of failure can be severe [2]. This dissertation is about investigating safety-related incidents that arise from the operation of digital systems. To motivate the work, it begins with an account of one such incident.

1.1. Korean Air Flight 801

In the early morning hours of August 6, 1997, Korean Air flight 801 was nearing the end of its five-hour journey from Seoul to Guam. With approximately 270 miles left to travel above the western Pacific Ocean, the flight crew were preparing to begin their descent into the tropical U.S. territory. At 1:03 a.m., the first officer made initial contact with Guam air traffic control while the captain completed the descent checklist. About seven minutes later, air traffic control cleared Korean Air 801 to descend into Guam.


The weather in Guam that morning was typical. A weak low-pressure system making

its way through the Mariana Islands brought partly-cloudy skies and scattered showers to

the region. Light easterly winds prevailed, visibility was seven miles, and the surface tem-

perature was about 82° F. Conditions were generally favorable for a visual approach into

Guam, but since there was a chance of rain, the flight crew would have to prepare for an

instrument approach if visibility deteriorated. Moreover, since the glideslope1 at the Guam

airport was out of service, special procedures would be necessary should the need for an

instrument approach arise.

Before starting their descent, the captain briefed the first officer on the approach and

landing procedures for Guam. He told the first officer to expect weather conditions to be

favorable for a visual approach and noted the special procedures that would be necessary

since the glideslope was unavailable. At 1:13 a.m., the aircraft began to descend.

During the descent, the captain complained to the first officer about having to work

with little rest. The captain had originally been scheduled to fly to Dubai, but after arriving

in Seoul the day before on a delayed flight from Hong Kong, he did not have an adequate

amount of time to rest for the Dubai trip and was instead assigned to fly the shorter route

to Guam. “They make us work to maximum, up to maximum,” the captain said. “Probably

this way...hotel expenses will be saved for cabin crews, and maximize the flight hours.”

One minute later, he remarked, “Eh...really...sleepy.”

Ten minutes into the descent, Korean Air 801 encountered storm clouds that obstructed the flight crew’s view of the airport. After deviating 20 miles from their original descent course due to the weather, the first officer reported to air traffic control at 1:31 a.m. that they were clear of the clouds and requested radar vectors for the approach. The controller responded, and at 1:38 a.m., he instructed the flight crew to join the approach course. One minute later, the controller cleared Korean Air 801 for its approach and advised them that the glideslope at Guam was unusable. The first officer acknowledged the approach clearance but not the glideslope warning.

1. The glideslope is a radio navigation aid that provides vertical guidance to the flight crew during the landing approach.

Korean Air 801’s approach to Guam was marred by confusion among the flight crew regarding the status of the glideslope at the Guam airport. At about 1:40 a.m., the flight engineer asked whether the glideslope was working. The captain responded, “Yes, yes, it’s working,” but about five seconds later, the first officer stated, “Not usable.” Shortly thereafter, an altitude alert chimed, indicating that the aircraft had deviated from its selected altitude. At this time, the aircraft was approximately 10 miles southwest of the runway and descending through an altitude of 2,640 feet above sea level. Further discussion about the status of the glideslope ensued for the next 45 seconds until air traffic control handed the aircraft off to the Guam tower. As the first officer made initial contact with the tower, a second altitude chime sounded in the cockpit. At 1:41 a.m., while the aircraft was descending through an altitude of 1,800 feet, the tower cleared Korean Air 801 to land.

The flight crew began executing the landing checklist at 1:41:31 a.m. At 1:41:42, the automated voice of the ground-proximity warning system (GPWS) announced, “One-thousand,” indicating that one-thousand feet of distance separated the aircraft from the terrain immediately below. The captain asked, “Isn’t glideslope working?” At about 1:41:59, the first officer stated, “Not in sight?” This remark was followed one second later by a second GPWS annunciation: “Five hundred,” to which the flight engineer uttered, “Eh?”


Despite their confusion regarding the status of the glideslope at Guam and the GPWS annunciations, the flight crew proceeded with the landing checklist. At 1:42:14, the GPWS announced, “Minimums, minimums,” and then “sink rate,” which warned that the aircraft was descending too rapidly. At 1:42:20, the first officer stated, “Not in sight; missed approach.” Two seconds later, the captain stated, “Go around,” and at about the same time power was applied to the aircraft’s engines and the control column was gradually pulled back in order to initiate a climb. At 1:42:23, the autopilot disconnect warning began sounding in the cockpit. According to the cockpit voice recorder’s log, at 1:42:24 the GPWS emitted the final intelligible utterance in the cockpit:

“One-hundred... fifty... forty... thirty... twenty...”

About 1:42:26 local time, Korean Air flight 801, carrying 254 passengers and crew, impacted terrain near Nimitz Hill, Guam, approximately 3.6 miles southwest of the runway. 228 people aboard perished in the crash, and the rest sustained serious injuries. The aircraft was destroyed by the impact and a subsequent fire.

1.2. The Investigation

The preceding account of the Korean Air flight 801 crash is a summary of the flight history contained in the U.S. National Transportation Safety Board’s (NTSB) report on the accident [3]. During its investigation, the NTSB studied several factors that were involved in the accident, including the flight crew’s performance and that of air traffic control, flight-crew and controller training, fatigue, navigation aids, the GPWS, and meteorological conditions. The NTSB ultimately concluded that the probable cause of the accident was the captain’s failure to execute the landing approach properly and the failure of the first officer and flight engineer to monitor the approach carefully. Of interest to this dissertation, however, is an additional factor that the NTSB noted explicitly in its statement of probable cause:

“Contributing to the accident was the Federal Aviation Administration’s intentional inhibition of the minimum safe altitude warning system at Guam and the agency’s failure to adequately manage the system.” [3]

The minimum safe altitude warning (MSAW) system is a software system designed to alert air traffic controllers when aircraft operating in their airspace descend, or are predicted by the system to descend, below a predetermined minimum safe altitude. The U.S. Federal Aviation Administration (FAA) implemented MSAW in the early 1970s in order to help prevent accidents such as the Korean Air 801 crash, in which a flight crew becomes disoriented and inadvertently pilots a serviceable aircraft into terrain or obstacles. The MSAW system at Guam was installed in 1990 and provided service within a 60-mile radius of the air traffic control radar facility.

During its investigation, the NTSB learned that, in March 1993, FAA technicians developed a customized MSAW software package for installation at Guam that was designed to reduce the occurrence of nuisance warnings. As depicted in Figure 1.1, the new software inhibited MSAW processing within a 59-mile radius of the radar facility, effectively rendering the system inoperative. This change was requested and approved by the air traffic control facility at Guam. The new software became operational in February 1995 and remained in operation through the time of the accident. Through post-accident simulation, the Safety Board concluded that, had the system not been inhibited, it would have generated an alert for Korean Air 801 about one minute before impact, which would have been a sufficient amount of time for the controller to notify the flight crew and for the flight crew to take evasive action.

Figure 1.1: The MSAW Inhibit Zone at Guam. (The figure shows Guam and the surrounding Pacific Ocean, with the MSAW service area boundary at a 60-mi. radius and the inhibit zone at a 59-mi. radius; not to scale.)
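The effect of the inhibit zone can be sketched in a few lines of code. This is purely an illustrative model, not the FAA's actual MSAW implementation: the radii are those reported above, and reducing MSAW eligibility to a single range check is a deliberate simplification of real MSAW processing.

```python
# Illustrative model only; not the FAA's actual MSAW software.
SERVICE_RADIUS_MI = 60.0  # MSAW service area around the radar facility
INHIBIT_RADIUS_MI = 59.0  # customized inhibit zone installed at Guam

def msaw_active(distance_from_radar_mi: float) -> bool:
    """Return True if MSAW processing applies at this range from the radar."""
    inside_service_area = distance_from_radar_mi <= SERVICE_RADIUS_MI
    inside_inhibit_zone = distance_from_radar_mi <= INHIBIT_RADIUS_MI
    return inside_service_area and not inside_inhibit_zone

# The crash site was only a few miles from the airport, deep inside
# the inhibit zone, so no alert was possible there.
print(msaw_active(3.6))   # False: inhibited
print(msaw_active(59.5))  # True: within the residual 1-mile ring
print(msaw_active(65.0))  # False: outside the service area
```

The sketch makes the consequence of the customization plain: a 59-mile inhibit zone inside a 60-mile service area leaves MSAW effective only in a one-mile-wide ring at the edge of coverage.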

1.3. Software’s Role in Accidents

The Korean Air 801 accident exemplifies the subtle role that software may play in accidents as well as the challenges investigators face in attempting to analyze failed computer systems. With notable exceptions [4, 5, 6, 7, 8, 9], incidents in which system hazards resulted from software defects are historically rare [10]. Other safety barriers such as hardware interlocks, mechanical backups, and operator oversight may mitigate the effects of an error so that it is little more than an inconvenience to the operator. It is only when all of these barriers fail in concert that a hazard can arise, and consequently investigators might focus their efforts on strengthening the mitigating barriers rather than correcting the software defects that originated the hazard.

With an industry-wide shift underway in the commercial aviation sector toward fully-automated flight control and navigation systems, developers are increasingly relying upon software to perform safety-critical functions and reducing the use of costly hardware interlocks and mechanical backups. Accompanying the increased reliance upon software to function dependably is an increase in the potential severity of software defects. Because flaws in a software system’s design are shared by all instances of the software, it is imperative that failures of safety-related computer systems be thoroughly analyzed in order to diagnose the failures and prevent them from recurring.

Analyzing a software design in order to determine whether a computer-system failure was indeed due to a software defect poses new challenges to investigators. Since each software system that is developed may exhibit a radically new design, investigators often cannot benefit much from knowledge of previous software systems in their efforts to understand new systems or even newer versions of existing systems. Moreover, the failure semantics of software depend upon the environment in which the software is executed in addition to the design and implementation of the software itself. Consequently, software systems tend to possess few common failure modes, and the same error may display different symptoms when it is executed in separate environments. In contrast, the mechanisms associated with hardware failures, e.g., those of mechanical or electrical systems, are well-understood and often accompanied by physical evidence such as fatigue or corrosion [11]. Faced with the complications associated with investigating software-related failures, investigators might instead treat software as an impenetrable black box and either issue superficial recommendations that fail to address underlying errors and the development practices that introduced them, or ignore the software entirely and focus on other systems that are more amenable to failure analysis.

1.4. Thesis Statement

The thesis statement of this dissertation is:

The challenges associated with identifying and addressing design faults combined with the complexity and coupling of digital systems encumber the analysis of digital-system failures, which in turn increases the risk of an analysis being incomplete or ad hoc. Framing the analysis around the system safety argument—the complete rationale that a system is acceptably safe to operate—addresses these problems by providing a systematic method of diagnosing failures.

To this end, this dissertation presents Pandora as a novel approach to analyzing failures of digital systems based upon the concept of safety arguments. Applying Pandora to the safety argument of a failed system assists investigators in developing hypotheses concerning how the system might have failed, eliciting the evidence necessary to confirm or refute those hypotheses, documenting lessons, and developing recommendations to prevent the failure from recurring. Presented with Pandora is a taxonomy of safety-argument fallacies, which was developed as part of this research to assist investigators and others who review safety cases in detecting common types of fallacies in safety arguments. The taxonomy may be used as part of a Pandora analysis to help discover fallacious reasoning that might have contributed to a failure, or it may be used separately to review safety arguments prior to system deployment or as a training aid to assist developers of safety arguments in avoiding common pitfalls.

1.5. Contributions

The primary contributions of this dissertation are:

• The discovery of the enhanced safety-case lifecycle in which the safety case of a system guides iterative improvements in system safety through failure analysis;

• The development of Pandora: a safety-case-based approach to failure analysis; and

• The development of the safety-argument fallacy taxonomy, which supports the application of Pandora.


The extent to which these contributions were achieved was evaluated through a case-study evaluation of Pandora and a controlled experimental evaluation of the fallacy taxonomy.

1.6. Dissertation Outline

This dissertation is organized into eight chapters, including this introduction. Chapter 2 provides an overview of related work in safety assurance and failure analysis of software-based systems. Using the NTSB’s practices as an example, Chapter 3 examines the investigative process for complex systems accidents and why this process is not well-suited to treat failures involving digital systems. Chapter 4 describes the Pandora failure analysis process and the theory surrounding it, and Chapter 5 documents the safety-argument fallacy taxonomy and its derivation. Chapter 6 reports on a case study evaluation of Pandora using a series of accidents involving the MSAW system mentioned earlier. Chapter 7 describes a controlled experimental evaluation of the fallacy taxonomy using student and professional-engineer test subjects. Finally, Chapter 8 presents the conclusions of this work and explores areas for future research.


Chapter 2

Software Safety Assurance

Assuring the safety of digital systems is complicated by their complexity, the uniqueness of their designs, and the manner in which they fail. For the same reasons that they are hard to prevent, safety-related failures of digital systems can be equally challenging to diagnose. This chapter examines the unique aspects of digital systems that exacerbate safety assurance and failure analysis, and it reviews existing techniques developed specifically to treat failures of these systems.

2.1. Safety Assurance of Digital Systems

Safety may be viewed in an absolute sense or in a relative sense. In the absolute sense, safety as a system dependability property is typically defined as “the absence of catastrophic consequences on the user(s) and the environment” [12]; however, treating safety in this manner precludes a system from being judged to be safe even if its probability of catastrophic failure is extremely remote. In practice, the absolute notion of safety is regarded as an ideal “that can only be approached asymptotically” [9], and relative notions of safety are used instead. In this sense, Lowrance defines safety as “a judgement of the acceptability of risk, and risk, in turn, as a measure of the probability and severity of harm to human health” [13]. Although Leveson—and Lowrance himself—raise the questions of how acceptability is to be determined, who will determine it, and who will be affected by the risk of loss, these questions must be answered even with the absolute notion since the best one can do is to arrive at an acceptable approximation of safety.

The commercial aviation industry provides a popular example of what “acceptably safe” means for software aboard aircraft systems, where the Federal Aviation Administration (FAA) requires that “catastrophic failure conditions” (excluding software faults) have a probability of occurrence of less than 10⁻⁹ [14, 15]. Whereas mechanical engineers, within limits, can determine the reliability of a system as they design it, Littlewood and Strigini note several unique aspects of digital systems that make planning a system to exhibit this level of dependability difficult:

1. Software failures are due to design faults, which are difficult both to avoid and to tolerate;

2. Software is often used to implement radically new systems, which cannot benefit much from previous, successful designs; and

3. Digital systems in general implement discontinuous input-to-output mappings that are intractable by simple mathematical modeling [15].

Testing is a popular technique for validating typical software applications, but for ultra-dependable systems—those whose failure rates must be less than 10⁻⁷—testing alone is an infeasible method of demonstrating software reliability [16]. Given the infeasibility of sample-based testing and the impracticability of exhaustive testing, two approaches have emerged for assuring that a digital system is acceptably safe to operate:

• The process-based approach, in which safety is determined by adherence to prescribed or recommended development practices and methods; and

• The evidence-based approach, which relies upon explicit evidence of safety [17].
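The infeasibility of demonstrating ultra-dependability by testing alone can be illustrated with a back-of-envelope calculation. The sketch below assumes a simple exponential failure model and a 95% confidence bound; the statistical treatment in the literature [16] is more nuanced, so treat the numbers as order-of-magnitude only.

```python
import math

def failure_free_hours_needed(max_rate_per_hour: float,
                              confidence: float = 0.95) -> float:
    """Failure-free test hours needed to bound the failure rate.

    Assumes an exponential failure model: observing t hours with no
    failures supports the claim rate < -ln(1 - confidence) / t.
    """
    return -math.log(1.0 - confidence) / max_rate_per_hour

# Bounding the rate at 10^-9 per hour requires roughly 3e9 failure-free
# hours, i.e., hundreds of thousands of years on a single copy.
hours = failure_free_hours_needed(1e-9)
print(f"{hours:.2e} failure-free hours")
print(f"about {hours / (24 * 365):,.0f} years of continuous operation")
```

Even at the weaker 10⁻⁷-per-hour threshold for ultra-dependable systems, the required failure-free test time runs to decades, which is why sample-based testing alone cannot carry the argument.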

2.1.1. Process-Based Assurance

In the process-based approach, the development of a safety-related digital system is governed by one or more standards. In many cases, the standard assigns a safety integrity level (SIL) based on the potential consequences of system failure and then prescribes or recommends techniques to be employed in building the system. The standard might also require the developer to produce specific evidence of compliance such as test results or code review summaries [18]. To demonstrate to a regulator that the system is acceptably safe to operate, the developer must show that the chosen SIL is appropriate to the system, that all of the required development practices associated with the SIL have been followed, and that all of the required evidence has been produced. The developer and the regulator accept that, by following the development standard, the resulting system meets the required level of safety [17].
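The compliance bookkeeping that process-based assurance implies can be sketched as follows. The SIL names, practices, and evidence items here are invented for illustration; they are not drawn from DO-178B, IEC 61508, or any other actual standard.

```python
# Hypothetical sketch of process-based compliance checking. The SIL
# levels, practices, and evidence items are invented, not taken from
# any real standard.
REQUIRED = {
    "SIL-3": {"practices": {"unit testing", "independent code review"},
              "evidence": {"test results", "review summaries"}},
    "SIL-4": {"practices": {"unit testing", "independent code review",
                            "formal specification"},
              "evidence": {"test results", "review summaries",
                           "proof obligations discharged"}},
}

def compliant(sil: str, practices_followed, evidence_produced) -> bool:
    """True only if every required practice and evidence item is present."""
    req = REQUIRED[sil]
    return (req["practices"] <= set(practices_followed)
            and req["evidence"] <= set(evidence_produced))

print(compliant("SIL-3",
                {"unit testing", "independent code review"},
                {"test results", "review summaries"}))  # True
```

The sketch also exposes the approach's weakness discussed below: the check is entirely about what was done, not about whether the resulting system is actually safe.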

Numerous software development standards exist [19]. Examples of notable standards include DO-178B [18], which applies to software aboard commercial aircraft, Defence Standards (DS) 00-55 and 00-56 [20, 21], which govern software developed for the U.K. Ministry of Defence, and IEC 61508 [22], which is a cross-domain standard for software, electronic, and programmable electronic systems. Each industry chooses or creates its own standards, and each standard, in turn, specifies its own set of development practices.


2.1.2. Problems with Process-Based Assurance

Software development standards tend to over-emphasize the importance of following a particular process instead of assuring the safety of the product, and they often impose process requirements whose effectiveness has not been validated [19]. They assume that by developing a system according to a set of best practices the system will be safe, but there is little evidence to support this assumption [23]. For these reasons and others, Weaver, McDermid, and Kelly advocate the use of the safety case as an evidence-based approach to safety assurance for software and other digital systems in which “explicit evidence of safety, directly linked to the requirements of the system” is provided [23, 24]. This evidence should employ a combination of “testing, analysis, process, proof, and historical usage” [17].

2.2. The System Safety Case

The evidence-based approach does not impose specific process requirements; instead, the developer chooses the evidence and argument strategy for demonstrating that the system meets its safety requirements. System safety is argued through satisfaction of the requirements, which are then broken down further into more specific goals that can be satisfied directly by the evidence. This approach affords the developer greater flexibility in choosing which development techniques to apply to a system, and it can accommodate newer techniques, such as model-based development and formal verification, that might not be prescribed in software development standards.

At the center of the evidence-based approach is the system safety case, which is a “documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment” [19]. The safety case documents the safety requirements for a system, the evidence that the requirements have been met, and the argument linking the evidence to the requirements. Bishop and Bloomfield decompose the safety case into claims, evidence, argument, and inferences:

• Claims are propositions about properties of the system supported by evidence.

• Evidence may either be factual findings from prior research or scientific literature or sub-claims that are supported by lower-level arguments.

• The safety argument is the set of inferences between claims and evidence that leads the reader from the evidence forming the basis of the argument to the top-level claim—typically that the system is safe to operate [25].

A safety argument may be viewed as a directed acyclic graph in which the root node is the top-level claim, the internal nodes are sub-claims, and the leaves are the basis of the argument. Hence, the internal structure of an argument—that is, the inferences between claims and evidence—may be formalized and represented diagrammatically. Adelard’s Claims-Argument-Evidence (ASCAD) and Kelly’s Goal Structuring Notation (GSN) are two popular graphical safety-argument notations [26, 27], and tool support for constructing and analyzing safety arguments is available for each [28, 29]. This dissertation depicts safety arguments in GSN, and a brief tutorial of the notation follows.

2.2.1. Goal Structuring Notation

Figure 2.1 contains an example of a simple safety argument expressed in GSN. GSN augments Bishop and Bloomfield’s concept of claims, argument, and evidence to include contextual elements. The basic nodes of a GSN goal structure are explained below and illustrated in Figure 2.1.

• Goals state the safety claims of the argument;

• Sub-Goals are refined, specific goals that more closely relate to the evidence;

• Solutions state the evidence that the goals (or sub-goals) have been met;

• Strategies describe the methods used to decompose goals into sub-goals;

• Assumptions state aspects of the context that are taken to be true;

• Justifications explain why solutions satisfy goals (or sub-goals); and

• Context documents other relevant contextual information. [27]
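A goal structure of this kind can be modeled directly as the directed acyclic graph described earlier. The sketch below reconstructs the Figure 2.1 example by hand; the Node class and its field names are illustrative choices, not part of GSN itself.

```python
# Minimal sketch of a GSN goal structure as a directed acyclic graph.
# Node kinds and link types follow the description above; the class
# design itself is an illustrative choice, not part of GSN.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str   # "goal", "strategy", "solution", "assumption", ...
    text: str
    solved_by: list = field(default_factory=list)      # filled arrows
    in_context_of: list = field(default_factory=list)  # hollow arrows

top = Node("goal", "The system is acceptably safe to operate.")
strategy = Node("strategy", "Argument that credible hazards have been addressed.")
sub1 = Node("goal", "The hazard of spurious outputs has been addressed.")
sub2 = Node("goal", "The hazard of deadlock is eliminated by the chosen design.")

top.solved_by.append(strategy)
strategy.solved_by += [sub1, sub2]
strategy.in_context_of.append(
    Node("assumption", "All credible hazards have been identified."))
sub1.solved_by.append(Node("solution", "Model-checking results"))
sub2.solved_by.append(Node("solution", "System design document"))

def leaves(node: Node) -> list:
    """Evidence forming the basis of the argument (the DAG's leaves)."""
    if not node.solved_by:
        return [node.text]
    return [t for child in node.solved_by for t in leaves(child)]

print(leaves(top))  # ['Model-checking results', 'System design document']
```

Walking the solved-by links from the root recovers exactly the evidence on which the top-level claim ultimately rests, which is the traversal a reviewer performs when reading a goal structure top-down.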

Goal structures in GSN are read in a top-down fashion beginning with the top-level goal. Two types of arrows are used to indicate the relationships between nodes: filled arrows indicate that the source node is solved by the target node, and hollow arrows indicate that the source node is stated in the context of the target node. Often, a goal will be solved by multiple sub-goals. In a linked argument, each of the sub-goals is necessary to satisfy the parent goal; in a convergent argument, each of the sub-goals is believed to be sufficient to satisfy the parent independently, but multiple lines of reasoning are offered to improve confidence that the parent goal is true [30].

Figure 2.1: A Safety Argument in Goal Structuring Notation. (The top-level goal “The system is acceptably safe to operate” is solved by a strategy of arguing that credible hazards have been addressed, stated in the context of FMEA & HAZOP analyses and the assumption that all credible hazards have been identified. Sub-goals addressing the hazards of spurious outputs and deadlock are solved by model-checking results and the system design document, with the justification that model correspondence was verified by an independent auditor.)

2.2.2. Safety-Case Patterns & AntiPatterns

An additional advantage of representing the structure of safety arguments formally is that successful arguments may be reused fairly easily. This observation leads to the notion of safety-case patterns, which capture solutions to common problems in safety argumentation much as design patterns do for software development [31, 32]. As an example, Figure 2.2 presents the functional decomposition pattern in GSN, which could be instantiated as a top-level argument for the safety of a system based on the safety of its component functions [33]. The argument states that the system is safe because each of the n functions in the system is safe and either that interactions between functions are non-hazardous or that the functions are independent. To instantiate the argument, one would have to provide a separate safety argument for each function of the system as well as an argument showing that interactions are non-hazardous or that the functions are independent. Following a pattern can help a developer ensure that all the required evidence for an argument has been presented, which makes patterns useful for storing knowledge gained through experience, but pattern use alone does not validate a safety argument.

Figure 2.2: The Functional Decomposition Safety-Case Pattern. (The goal “{System X} is safe” is argued by claiming safety of all n system safety-related functions: “{Function i} is safe” for each function, together with either “Interactions between system functions are non-hazardous” or “All system functions are independent.”)
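Instantiating a pattern amounts to substituting concrete names for templated terms such as {System X} and {Function i}. The sketch below shows this mechanically for a hypothetical autopilot; the helper function and the system and function names are invented for illustration.

```python
# Illustrative sketch: expanding the templated goals of the functional
# decomposition pattern for a hypothetical system. Names are invented.

def instantiate_pattern(system: str, functions: list) -> list:
    """Expand the pattern's goals for a concrete system.

    Each instantiated goal still needs its own supporting argument;
    as noted above, following a pattern does not by itself validate it.
    """
    goals = [f"{system} is safe."]
    goals += [f"{fn} is safe." for fn in functions]
    goals.append("Interactions between system functions are non-hazardous, "
                 "or all system functions are independent.")
    return goals

for goal in instantiate_pattern("Autopilot", ["Altitude hold", "Heading hold"]):
    print(goal)
```

Tooling for GSN patterns performs essentially this substitution, leaving the developer with a checklist of goals that each require their own evidence.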

Similar to the concept of safety-case patterns, AntiPatterns “communicate weak and flawed arguments – such that they may be recognized and avoided in future developments” [34]. They are “the antithesis of patterns, but also include an approach for re-factoring the problem, which has negative consequences, into an acceptable solution” [17]. AntiPatterns can assist detection of fallacious inferences in system safety arguments, and analysis of failed systems may reveal new AntiPatterns. Figure 2.3 contrasts the functional decomposition pattern illustrated in Figure 2.2 with an AntiPattern showing how the decomposition might be performed incorrectly.

2.3. System Failure

Safety-related digital systems may generally be divided into two categories: those that control potentially hazardous operations and those that monitor for hazardous conditions [35]. The failure of a control system might cause the system to take actions that directly compromise safety; for example, the series of radiation overdoses administered by the Therac-25 treatment apparatus that caused the deaths of six patients [9]. The failure of a monitor system could cause the system to display false information or not take the appropriate actions when a hazardous condition arises; such was the case with the MSAW system discussed in Chapter 1. The severity of a failure depends upon the context in which it occurs and the range of hazards it exposes, but some failures might lead to hazards too quickly to be mitigated and others might go undetected until an accident is inevitable. Thus, as with any safety-related system, it is important to diagnose failures of digital systems in order to prevent them from recurring. Before discussing techniques for analyzing failures, however, it is worth reviewing the basic concepts of system failures and contrasting failures of digital systems with those of their analog counterparts.

2.3.1. Failures & Faults

Leveson defines the failure of a system to be “the nonperformance or inability of the system or component to perform its intended function for a specific time under specified environmental conditions” [9]. Because failures concern deviations from intended behavior, any system whose behavior is inconsistent with its goals may be considered to have failed, even if the system behaved as specified.

Figure 2.3: AntiPattern of Fallacious Functional Decomposition. (The argument claims {System X} has {Property P} by arguing over its individual functions/components {F1, F2, ..., Fn}: each Fi has {P}; interactions between components and external factors do not violate {P} for the system as a whole, or no such interactions exist in {X}; justified by the implication that if each Fi of {X} has {P}, then {X} has {P} as well.)


Generally, failures are caused by faults [12], which can be introduced into a system in several ways. Multiple schemes for classifying faults exist; two are presented below for comparison.

McCormick divides faults into three classes: primary faults, secondary faults, and command faults [36]. Primary faults occur when a system deviates from its intended behavior despite operating under nominal conditions; these faults are introduced during system development and will persist through the life of the system unless they are explicitly removed. Secondary faults occur when a system is operated outside of its design envelope, such as when it is damaged by an external factor, subjected to excessive stress or heat, or used beyond its intended lifetime. Finally, command faults arise when a system is operated inadvertently due to a control failure even though the system itself operated as specified. Generally, primary faults arise from intrinsic defects in a system whereas secondary and command faults are triggered externally. Under this classification, software errors can cause primary faults in a computer system; if the system then fails, its erroneous outputs may lead to secondary or command faults in other systems.

Avižienis, Laprie, Randell, and Landwehr organize faults into a taxonomy that considers the lifecycle phase in which they are created or occur, system boundaries, and several other factors, but three broad classes ultimately emerge from their taxonomy: design faults, physical faults, and interaction faults [12]. These faults are roughly analogous to McCormick’s primary, secondary, and command faults, respectively, but important distinctions exist between the two schemes. Like primary faults, design faults are intrinsic to a system, are introduced artificially during development, and persist throughout the lifecycle of the system. Design faults exclude production defects, however, which McCormick includes as primary faults. Avižienis et al. instead consider these defects to be physical faults. As with secondary faults, physical faults may stem from external factors, but they may also arise internally from production defects, deterioration, and hardware errata. Interaction faults encompass McCormick’s command faults as well as unintended component interactions, intrusions, and other attacks that occur during system operation.
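The rough correspondence between the two schemes, including their divergent treatment of production defects, can be sketched as follows. This is an illustrative simplification, not part of either cited work; the class and function names are invented.

```python
from enum import Enum

class McCormick(Enum):
    PRIMARY = "primary"      # intrinsic defect, manifests under nominal conditions
    SECONDARY = "secondary"  # operation outside the design envelope
    COMMAND = "command"      # inadvertent operation due to a control failure

class Avizienis(Enum):
    DESIGN = "design"
    PHYSICAL = "physical"
    INTERACTION = "interaction"

def translate(fault: McCormick, production_defect: bool = False) -> Avizienis:
    """Map a McCormick fault class onto the Avizienis et al. taxonomy.

    Roughly primary -> design, secondary -> physical, command -> interaction,
    except that production defects are primary faults to McCormick but
    physical faults to Avizienis et al.
    """
    if fault is McCormick.PRIMARY:
        return Avizienis.PHYSICAL if production_defect else Avizienis.DESIGN
    if fault is McCormick.SECONDARY:
        return Avizienis.PHYSICAL
    return Avizienis.INTERACTION
```

The exceptional branch for production defects is the key distinction noted above: the two schemes agree only when a fault’s origin is considered.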

2.3.2. Failures of Mechanical & Electrical Systems

In most engineering disciplines, the failure mechanisms associated with physical faults are well-understood. For example, failures of mechanical systems may be divided into three categories: (1) stress-induced failure mechanisms such as brittle fracture, buckling, and elastic deformation; (2) strength-induced failure mechanisms such as wear, cracking, and creep; and (3) stress-increased failure mechanisms such as fatigue, thermal shock, and fretting [11]. A similar classification exists for electrical failure mechanisms. Thus, failures of these systems can be analyzed by identifying the relevant failure mechanisms from the physical evidence, deducing the set of applicable physical faults, and then determining whether those faults were triggered by underlying design faults or by factors external to the system. The process of identifying underlying design faults is further eased by the fact that many structural, mechanical, and electrical systems are adaptations of existing high-level designs, which allows investigators to exploit knowledge of similar systems in their analysis or concentrate their expertise in a particular type of system [37].

2.3.3. Failures of Digital Systems

In contrast to mechanical and electrical systems, software is a design abstraction, and so failures of digital systems that arise from software errors can occur with no associated physical faults [9, 12, 15]. Analyzing such a failure thus requires one to work directly from the failure to the underlying design faults, “which are harder to visualize, classify, detect, and correct” than physical faults [38]. Further complicating matters is the fact that most digital systems exhibit interactive complexity and tight coupling, both internally and with other systems, which can lead to unexpected behavior stemming from these interactions [39]. Johnson uses the term “emergent properties” to refer to this behavior [40], and Perrow argues that systems with emergent properties are prone to accidents involving “the unanticipated interaction of multiple failures” of components, which he calls system accidents [41]. Not only are system accidents nearly inevitable, but the interactions between failures may hinder attempts to diagnose them.

To overcome some of this difficulty, Laprie and Kanoun classify software failure modes according to whether they pertain to values or timing, the way in which they are perceived by different users, and their consequences on the environment [12], but there is no correspondence between these failure modes and the underlying errors. Lacking an effective classification of software failure modes, investigators must instead learn the failure modes and their associated symptoms and causes for each system they analyze, which, without guidance, can be an exhaustive effort.
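The value/timing dimension of this classification can be illustrated with a small sketch. The function, thresholds, and values below are invented for illustration; as noted above, the full classification also considers how failures are perceived by different users and their consequences on the environment.

```python
# Classify a single service delivery against its specification on the two
# dimensions described above: value correctness and timing.
def classify_failure(expected_value, actual_value, deadline_ms, latency_ms):
    """Return the failure modes exhibited by one service delivery."""
    modes = []
    if actual_value != expected_value:
        modes.append("value failure")   # wrong result delivered
    if latency_ms > deadline_ms:
        modes.append("timing failure")  # result delivered too late
    return modes or ["correct service"]
```

Note that the two dimensions are independent: a delivery may fail in value, in timing, in both, or in neither, which is exactly why such surface classifications do not map back to the underlying errors.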

2.4. Techniques for Analyzing Digital-System Failures

The same aspects of software and digital systems identified by Littlewood and Strigini that make these systems difficult to design for dependability also make them difficult to analyze when they fail [15]. Since failures of these systems typically occur due to design faults, which are introduced during development, an attractive method of diagnosing them is to trace the failures to errors committed in the development lifecycle. Existing software engineering techniques are not well-suited to this task because they tend to treat components in isolation according to the principle of functional decomposition [42], but the complexity and coupling noted above can lead to accidents caused by interacting failures of multiple components. Debugging quickly becomes infeasible as system complexity grows [43]. More sophisticated analysis tools such as formal verification can be used to represent or confirm a theory of system failure, but they cannot attest to the completeness of that theory [44]. Model checking has been shown to be effective at detecting potentially hazardous states in complex systems, but it has not been applied to failure analysis [45].

Aside from the inapplicability of existing software engineering techniques, Johnson identifies the following additional factors that complicate the analysis of digital-system failures:

• The lack of an accepted stopping rule for framing the analysis;

• The assessment of the developer’s intentions;

• The influence of contextual factors on development; and

• The challenge of issuing relevant and practicable recommendations [44].

Framing an analysis refers to determining which aspects of a system’s development and maintenance history might have contributed to a failure as well as the evidence that would be required to confirm or exclude them as contributors. If the scope of an investigation is too shallow, it might only address superficial issues that are symptomatic of deeper, more fundamental problems. Investigators might also consider the intentions behind development decisions—why those decisions were believed to enhance or not adversely affect safety—as well as contextual factors that might have influenced the developer’s actions. Finally, investigators should be confident that the recommendations they issue address the problems discovered during the analysis and are feasible to implement.


Several techniques exist for analyzing system failures; well-known examples include Events and Causal Factors Analysis, Why-Because Analysis, STAMP, and PARCEL. Each of these techniques is summarized below.

2.4.1. Events and Causal Factors Analysis

Many traditional techniques for investigating accidents are based upon Events and Causal Factors Analysis (ECFA), which is a lightweight technique for constructing causal chains and event sequences that is intended to augment a more comprehensive analysis. ECFA decomposes accidents into events and conditions, which are represented diagrammatically as rectangles and ovals, respectively, on an ECF chart. The accident itself is represented as an event, typically the final event in the sequence. In applying ECFA, investigators work backward from the accident and identify the events and conditions that caused or influenced it, including “performance errors, changes, oversights, and omissions” [46]. These events and conditions are then added to the ECF chart and connected to the accident by arrows to indicate a causal relationship, and the process is repeated iteratively until the investigators are confident that they have identified the root causes of the accident. An excerpt of a simple ECF chart taken from the U.S. Department of Energy’s documentation on the technique is presented in Figure 2.4 [46].
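The backward, iterative construction process described above can be sketched as follows; the node labels come from the Figure 2.4 example, while the class itself is an invented illustration rather than part of the DOE documentation.

```python
class ECFChart:
    """Events and conditions linked by causal arrows, built backward
    from the accident (the final event in the sequence)."""

    def __init__(self, accident: str):
        self.events = {accident}   # drawn as rectangles on the chart
        self.conditions = set()    # drawn as ovals
        self.arrows = set()        # (cause, effect) causal links

    def add_cause(self, cause: str, effect: str, condition: bool = False):
        # Working backward: each new node must explain a node already on the chart.
        assert effect in self.events | self.conditions, "work backward from known nodes"
        (self.conditions if condition else self.events).add(cause)
        self.arrows.add((cause, effect))

chart = ECFChart("Boy suffered serious injuries")
chart.add_cause("Truck crashes into parked car", "Boy suffered serious injuries")
chart.add_cause("Truck rolled down hill", "Truck crashes into parked car")
chart.add_cause("Accidentally released brakes", "Truck rolled down hill", condition=True)
```

The assertion enforces the technique’s backward discipline: a cause may only be attached to an event or condition already identified, so the chart grows iteratively from the accident toward its root causes.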

Johnson surveyed a number of techniques for conducting accident analysis that are based on ECFA [42]. Multilinear Events Sequencing (MES) augments ECFA by introducing the dimensions of time as well as multiple actors, which allows it to capture linear interactions between events. Sequentially Timed Events Plotting (STEP) is a refinement of MES that does away with conditions, which its creators argued were being abused and biasing the analysis, and is intended to provide a highly pragmatic approach to accident analysis. Finally, Management Oversight and Risk Tree (MORT) is a technique intended to be used alongside ECFA that provides investigators with a hierarchical checklist of possible causal factors to consider in investigating an accident. As with ECFA itself, however, these techniques are generic and do not account for the special characteristics of digital systems described earlier that complicate failure analysis.

2.4.2. Why-Because Analysis

Why-Because Analysis (WBA), developed by Ladkin and Loer, is a method of developing rigorous, highly formal event-chain models of accidents and incidents based on Lewis’s counter-factual reasoning [43]. WBA begins with a semi-formal graphical reconstruction of an accident’s event sequence, which is somewhat similar to an ECF chart. Figure 2.5 provides an example of a WBA graph taken from Ladkin and Loer’s analysis of a 1993 commercial aviation accident in Warsaw, Poland [43, 7]. The WBA graph is then translated into temporal, counter-factual logic and verified for logical consistency and sufficiency. The result of this analysis is a highly rigorous causal explanation as to why an

Figure 2.4: Example ECF Chart (nodes include: accidentally released brakes; truck rolled down hill; boy stayed in truck; boy could not control truck; did not know how; afraid to jump free; truck crashes into parked car; boy suffered serious injuries)


event occurred. WBA does not address the issue of what evidence is needed to establish the event sequence, nor does it suggest where recommendations should be issued to prevent the causal chain from recurring (although one might infer some of this information from the graphical depiction of the event sequence). Thus, WBA is useful for those who have a hypothesis as to why an accident occurred and wish to verify its soundness.
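The counterfactual core of WBA can be illustrated with a simplified sketch: a factor is causally necessary for an event if, with the factor removed, the event can no longer be derived from what remains. This toy propositional model borrows node labels from the Figure 2.5 example but is not Ladkin and Loer’s temporal-logic formalization.

```python
def derivable(event, facts, causes):
    """An event holds if it is a known fact or all of its listed causes hold."""
    if event in facts:
        return True
    pre = causes.get(event)
    return pre is not None and all(derivable(c, facts, causes) for c in pre)

def necessary(factor, event, facts, causes):
    """Counterfactual test: remove the factor and ask whether the event still occurs."""
    return derivable(event, facts, causes) and not derivable(event, facts - {factor}, causes)

# Simplified fragment of the Warsaw example from Figure 2.5.
causes = {
    "AC hits earth bank": ["AC overruns RWY", "earth bank in overrun path"],
    "AC overruns RWY": ["braking delayed", "excessive speed on landing"],
}
facts = {"braking delayed", "excessive speed on landing", "earth bank in overrun path"}

assert necessary("braking delayed", "AC hits earth bank", facts, causes)
```

As the text notes, such a check can confirm the soundness of a hypothesized causal chain, but it cannot establish that the chain is complete or that the facts themselves are adequately evidenced.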

2.4.3. STAMP

ECFA and its related techniques and WBA each model accidents as event sequences stemming from root causes. Leveson argues that techniques that rely on a linear notion of causality “do not fit the complex, software-intensive systems we are attempting to build

Figure 2.5: Example WBA Graph (nodes include: 3.1 AC hits earth bank; 3.1.1 AC overruns RWY; 3.1.1.1 excessive speed on landing; 3.1.1.2 unstabilized approach; 3.1.1.2.1 CRW’s actions; 3.1.1.3 braking delayed; 3.1.1.3.1 wheel braking delayed; 3.1.2 earth bank in overrun path; 3.1.2.1 built by airport authority for radio equipment)


today which often involve human-machine systems-of-systems with distributed decision-making across both physical and organizational boundaries” [47]. Instead, she contends that accidents should be “viewed as the result of a lack of constraints imposed on the system design and on operations” and offers an accident model and process for doing so [39].

Leveson’s Systems-Theoretic Accident Model and Process (STAMP) is an accident model grounded in control theory in which accidents are viewed as dysfunctional interactions between system components resulting from inadequate constraints on system behavior or inadequate enforcement of those constraints [39]. STAMP models systems and accidents as control and informational feedback loops in order to depart from event-chain models and seek out management, organizational, governmental, and other socio-technical factors that might have influenced the event sequence preceding an accident. It classifies accidents as arising from either inadequate enforcement of constraints on system behavior, inadequate execution of control actions, or inadequate or missing feedback.

To illustrate how STAMP might be applied to an accident, Figure 2.6 presents a basic control loop taken from Leveson and Dulac involving an automated controller, a controlled process, and a human supervisor [47]. The human supervisor operates the automated controller, which in turn effects state changes in the controlled process via a set of actuators. Similarly, sensors monitoring the controlled process report information to the automated controller, which processes and relays this information to the human. Both the human and the automated controller have a model of the controlled process, but the human’s model is largely based on his or her interactions with the automated controller. Inconsistencies between the human’s model of how the automated controller commands the actuators or processes input from the sensors and how these functions are actually performed are a potential source of hazards because the human might assume that the controller is enforcing safety constraints on the process that it is not. It is by identifying these inconsistencies that STAMP discovers dysfunctional interactions between system components, but STAMP’s analysis extends beyond the system itself into the socio-technical control structure surrounding the system’s development and operation.
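A toy sketch of the hazard source just described: the supervisor’s model of the automation diverges from the automation’s actual behavior over part of the operating range, so the human assumes a safety constraint is being enforced when it is not. The valve scenario and thresholds are invented, not taken from Leveson and Dulac.

```python
def automated_controller(temperature: float) -> str:
    """Actual behavior: opens the relief valve only above 120 degrees."""
    return "open_relief_valve" if temperature > 120 else "idle"

def supervisor_model(temperature: float) -> str:
    """The human's model of the automation: believes the valve opens above 100."""
    return "open_relief_valve" if temperature > 100 else "idle"

# Compare the two models across the operating range to surface the
# dysfunctional interaction before it can contribute to an accident.
inconsistencies = [t for t in range(0, 150, 10)
                   if automated_controller(t) != supervisor_model(t)]
```

In the band where the two disagree, the supervisor believes the process is protected while the controller does nothing; it is exactly this kind of mismatch that a STAMP analysis seeks out, before extending outward into the surrounding control structure.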

2.4.4. PARCEL

Johnson and Bowell’s PARCEL (Programmable Electronic Systems Analysis for Root Causes and Experience-based Learning) is an accident analysis technique specifically developed for investigating failures of programmable systems that “traces the causes of adverse events back to the lifecycle phases and common requirements of the IEC 61508 standard” [48]. As Figure 2.7, taken from Johnson [48], illustrates, PARCEL is actually a pair of causal analysis techniques. The first is a lightweight flowcharting technique that can be applied relatively quickly to suggest typical potential causal factors; this approach might be applied to low-cost accidents or incidents. The second approach involves a more complex analysis based upon ECFA that could be reserved for incidents with a higher risk

Figure 2.6: Typical Control Loop (a human supervisor/controller with models of the process and of the automation; an automated controller with models of the process and of its interfaces; a controlled process linked to the automated controller via actuators and sensors; displays and controls between human and automation; measured and controlled variables; process inputs and outputs; disturbances)


of recurrence. Both of the techniques map possible causal factors in the accident or incident to lifecycle phases in system development as specified in the IEC 61508 standard. Causal factors are classified according to a taxonomy developed from the same standard. Although PARCEL is based upon the IEC 61508 standard, Johnson envisions adapting it to other popular standards such as DO-178B and DS 00-55/56.
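PARCEL’s classification step can be sketched as grouping causal factors by the lifecycle phase in which they originated. The phase names below paraphrase the IEC 61508 overall safety lifecycle, and the factor-to-phase table is invented purely for illustration.

```python
# Paraphrased subset of the IEC 61508 overall safety lifecycle phases.
IEC_61508_PHASES = (
    "hazard and risk analysis",
    "safety requirements specification",
    "realisation",
    "installation and commissioning",
    "operation and maintenance",
)

# Hypothetical taxonomy mapping causal factors to originating phases.
FACTOR_TO_PHASE = {
    "unidentified hazard": "hazard and risk analysis",
    "incomplete safety requirement": "safety requirements specification",
    "software design defect": "realisation",
    "incorrect configuration": "installation and commissioning",
    "inadequate maintenance procedure": "operation and maintenance",
}

def classify(factors):
    """Group causal factors by the lifecycle phase in which they originated."""
    by_phase = {phase: [] for phase in IEC_61508_PHASES}
    for factor in factors:
        by_phase[FACTOR_TO_PHASE[factor]].append(factor)
    return by_phase

report = classify(["software design defect", "unidentified hazard"])
```

Grouping by phase is what lets PARCEL feed experience-based learning back into development: recurring factors in one phase point to a systematic weakness in that phase rather than an isolated error.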

2.5. Chapter Summary

Unlike their analog counterparts, digital systems typically implement radically new designs that exhibit complex, nonlinear interactions between system components. The emergent properties that can arise from these interactions, combined with the difficulty of

Figure 2.7: PARCEL Investigation Schemes (A: information elicitation via standard report forms; B: causal analysis, either simplified flowcharting using preset questions leading to the IEC 61508 lifecycle and common requirements for simpler/lower-risk mishaps, or incident reconstruction via ECF modeling, counterfactual reasoning to distinguish causal factors, and root-cause classification using the IEC 61508 lifecycle and common requirements for more complex/higher-risk mishaps; C: generation of recommendations)


identifying and addressing design faults, complicate safety assurance of digital systems as well as failure analysis. Specialized techniques such as WBA, STAMP, and PARCEL have been developed to address these problems, but there are still the issues of identifying the safety-related aspects of a system’s design and development history, constructing theories of how these aspects might have contributed to a failure, and determining the evidence that should be elicited in order to confirm or refute those theories. The system safety case offers a solution to these problems by documenting the developer’s rationale for claiming that a system is acceptably safe to operate. The next chapter examines current investigative techniques in greater detail and explains why they are not well-suited to treat failures of digital systems, and Chapter 4 describes how framing failure analysis around a system’s safety case can systematically address each of the issues noted above.


Chapter 3

Software Accident Investigation

Chapter 2 concluded that the radically new designs and complex interactions that digital systems typically exhibit complicate the safety assurance and failure analysis of digital systems. Using the National Transportation Safety Board’s (NTSB) practices as an example, this chapter examines the techniques used to investigate major commercial aviation accidents, which invariably involve failures of complex systems, and it shows why these techniques are not well-suited to treat accidents involving failures of digital systems. It then presents an example of a commercial aviation accident involving a digital system to illustrate how such accidents are investigated.

3.1. The National Transportation Safety Board

Commercial aviation is a complex system consisting of thousands of aircraft, airports, and air traffic control facilities distributed globally. Each of these elements is itself a complex system employing structures, electrical systems, automation, and human operators. Thousands of civil-aviation accidents have occurred, and notable accidents have prompted the development of important safety improvements. Thus, commercial aviation provides a good example of a complex system that has been vetted through accident investigations and for which investigative techniques have been developed.

The National Transportation Safety Board (NTSB) is the federal agency in the United States charged with investigating “every civil aviation accident in the United States and significant accidents in the other modes of transportation—railroad, highway, marine, and pipeline” [49]. The Board has investigated approximately 124,000 aviation accidents and 10,000 surface transportation accidents since its inception in 1967, and it has issued more than 12,000 safety recommendations, 82% of which have been adopted [49]. It is regarded as “the most important independent safety investigative authority in the world” and “the international standard” with respect to the caliber of its investigations [50]. The Board also assists in investigations overseas involving U.S.-registered aircraft or equipment of U.S. manufacture, it maintains the federal government’s database of civil aviation accidents, and it conducts studies of transportation safety issues.

3.2. NTSB Investigations of Major Aviation Accidents

While the NTSB is charged with investigating all civil aviation accidents, the bulk of its resources are devoted to major accidents—generally those involving fatalities and damage to commercial aircraft. The basic elements of the Board’s investigation into a major accident are the “Go Team,” the party system, safety recommendations, the public hearing, and the final report [51]. Each of these elements is summarized below.

3.2.1. The Go Team

Within hours of notification of a major accident, the NTSB forms a Go Team that reports to the accident scene as quickly as possible to begin the investigation. The Go Team represents a variety of technical expertise; the NTSB lists the following eight disciplines as areas of expertise typically represented by the Go Team [51]:

• Operations: Analysis of the history of the flight and crew members’ duties.

• Structures: Documentation of the airframe wreckage and the accident scene.

• Powerplants: Examination of engines (and propellers) and engine accessories.

• Systems: Study of components of the plane’s hydraulic, electrical, pneumatic and associated systems, together with instruments and elements of the flight control system.

• Air Traffic Control: Reconstruction of the air traffic services given the plane, including acquisition of ATC radar data and transcripts of radio transmissions.

• Weather: Collection of weather data for a broad area around the accident scene.

• Human Performance: Study of crew performance and all before-the-accident factors that might be involved in human error.

• Survival Factors: Documentation of impact forces and injuries, evacuation, community emergency planning and all crash-fire-rescue efforts.

The expert associated with each discipline chairs a working group to investigate that aspect of the accident, and the overall investigation is headed by the Investigator-In-Charge (IIC). Upon completing their on-scene investigation, the working groups may move to a manufacturer’s facility for follow-up analyses or return to the NTSB’s headquarters in Washington, D.C. to complete their analysis. Each working group prepares a factual report covering that group’s aspect of the investigation. NTSB staff members then perform any subsequent analysis on the factual reports and prepare a proposed final report that is delivered to the Board for deliberation.


3.2.2. The Party System

The NTSB is staffed by about 400 employees and investigates approximately 2,000 aviation accidents each year [51]. To manage this burden, the Board names parties to its investigations who participate in the working groups and assist in developing and verifying the accuracy of the factual reports. Aside from the Federal Aviation Administration (FAA), which by law is always a party to an investigation, the Board exercises complete discretion over who may participate in an investigation. “Only those organizations or corporations that can provide expertise to the investigation are granted party status and only those persons who can provide the Board with needed technical or specialized expertise are permitted to serve on the investigation” [51].

3.2.3. Safety Recommendations

The NTSB issues safety recommendations during the course of an investigation to address safety deficiencies as they are discovered. Safety recommendations may be issued before an investigation is complete, and they need not correspond to what the Board determines to be the probable cause of an accident. To date, the NTSB has issued safety recommendations to over 2,200 recipients, and, while non-binding, most of the Board’s recommendations have been implemented [49].

3.2.4. The Public Hearing

The NTSB may convene a public hearing as part of its investigation into an accident to gather sworn testimony and to invite the public to observe the Board’s progress. The NTSB has the authority to subpoena witnesses to testify at the hearing.

3.2.5. The Final Report

The final report is the product of the NTSB’s investigation into an accident. It contains the contents of the factual reports prepared by the working groups, subsequent analyses performed by NTSB staff, findings, recommendations, and the Board’s determination of probable cause. Other parties to an investigation may participate in the factual analysis and verify the accuracy of the Board’s analyses, but only NTSB staff members perform the subsequent analysis and preparation of the final report. Once the report is completed, a draft is submitted to the full Board for review and approval. Final reports are typically published several weeks after receiving approval from the Board. Figure 3.1 lists the major sections of the final report, which correspond to the International Civil Aviation Organization’s (ICAO) formatting recommendations [52].

3.3. Treatment of Digital Systems

The NTSB’s investigative process decomposes accidents into the eight disciplines listed in section 3.2 and assigns a working group to each discipline. Digital systems are

• Factual Information
  - History of the flight
  - Injuries to persons
  - Damage to aircraft
  - Personnel information
  - Aircraft information
  - Meteorological information
  - Aids to navigation
  - Communications
  - Aerodrome information
  - Flight recorders
  - Wreckage and impact information
  - Medical and pathological information
  - Fire
  - Survival aspects
  - Tests and research
  - Organizational and management information
  - Additional information
  - Useful or effective investigation techniques

• Analysis

• Conclusions

• Safety Recommendations

• Appendices

Figure 3.1: Major Sections of the Final Report


primarily the domain of the Systems Group, which also investigates hydraulic, electrical, and pneumatic systems. As with each of the groups, the NTSB’s Aviation Investigation Manual provides a checklist of items to be examined by the Systems Group as part of its on-scene investigation. Figure 3.2 presents a portion of this checklist, which focuses on gathering factual information from the accident scene [53]. Checklist items that pertain to digital systems are concerned with recording control settings and indicator positions and recovering the contents of non-volatile memory. Based on this information, investigators attempt to discover system failure modes that might have contributed to an accident.

• Air Conditioning - air cycle equipment, valves, bearings, impellers, ducting connections, thermocouples, switches.

• Auto Flight - cockpit control settings, servomotors.

• Communications - operation and indications.

• Electrical Power - wire integrity (continuity, shorts, arcing), switches, circuit breakers, electric generators.

• Fire Protection - extinguisher bottles, discharge indications.

• Flight Controls - pre-impact position and integrity, travel of control surface, control cable continuity.

• Hydraulic Power - hydraulic fluid quantity and quality, valves, pumps, filters, tubing.

• Ice and Rain Protection - anti-icing ducting, wiper controls.

• Instruments - needle imprints, internal gears, non-volatile memory.

• Landing Gear - actuators, up/down locks.

• Lights - light bulb filaments, interior/exterior lights.

• Navigation - frequencies, control knob positions.

• Oxygen - crew/passenger oxygen bottles, lines, generators.

• Pneumatics - ducting, joints.

• Vacuum/Pressure - ...

Determine whether electronics/avionics may have recoverable memory. Recover electronics/avionics with 12-18 inches of wire harness, rather than simply unracking the avionics box. Only disconnect at the plug connection if airplane is salvageable and memory retrieval is not possible or necessary.

Figure 3.2: Excerpt of the Systems Group Checklist


The NTSB’s reliance on evidence from wreckage and the discipline-oriented structure of its investigations limit the Board’s ability to thoroughly investigate accidents involving digital systems. In 1999, the RAND Institute for Civil Justice performed an independent review of the NTSB’s readiness to investigate future commercial aviation accidents. In its report, RAND cited the hidden design and equipment defects software and electronic systems might exhibit as “areas of increasing concern” because these systems are replacing mechanical components [50]. The report noted that the NTSB’s reliance on evidence obtained from wreckage is problematic because, in the case of digital systems, “failure states can be ‘reactive,’ leaving no permanent state in the wreckage.” The report also predicted that, as system complexity increases, so too will the complexity of accidents, and the NTSB’s discipline-oriented structure might not be suited for accidents that present investigators with new failure modes that require a multidisciplinary approach.

3.4. American Airlines Flight 965

The American Airlines flight 965 accident near Cali, Colombia provides an example of how commercial aviation accidents involving digital systems are investigated. Although the accident was investigated by the Aeronautica Civil of the Republic of Colombia, the Colombian counterpart of the NTSB, the agency employs an investigative process similar to that of the NTSB, and the NTSB assisted in the investigation.

3.4.1. Accident Description

On December 20, 1995 at 21:42 local time, American Airlines flight 965, a Boeing 757-223, impacted mountainous terrain near Buga, Colombia while performing a descent to Alfonso Bonilla Aragon International Airport in Cali, Colombia. Of the 163 persons aboard the aircraft, four passengers survived, and the airplane was destroyed. Visual meteorological conditions prevailed at the time of the accident [8].

3.4.2. Flight History

American Airlines flight 965 was operated as a regularly scheduled passenger flight

from Miami International Airport under an instrument flight rules (IFR) flight plan. It

departed Miami at 18:35 with an estimated flight time of 3 hours, 12 minutes. At approxi-

mately 21:26, the flight began its descent to Cali, and at 21:34 the flight crew contacted

the Cali approach controller.

The typical approach procedure for the Cali airport was to land on runway 1, which for

a northerly arrival entailed flying past the airport and turning back [54]. The flight crew

had already programmed this procedure into the aircraft’s flight management system

(FMS) before arriving in the vicinity of the airport. Since the weather at Cali was clear,

however, the approach controller offered American Airlines flight 965 a visual approach

to runway 19, which the flight crew accepted. Landing on runway 19 would allow the

crew to execute a “straight-in” approach and avoid having to fly past the airport and turn

back to land. While preferable, this procedure afforded the crew less time to land and thus

required them to expedite their descent.

The flight crew of American Airlines flight 965 were unfamiliar with the arrival pro-

cedure for runway 19. The procedure involved flying to a navigational waypoint named

“Rozo,” which was abbreviated “R” on their aeronautical charts. As either the captain or

the first officer programmed the new arrival procedures into the FMS, he selected “R” as

one of the waypoints for the approach; however, this “R” referred to a different waypoint

named “Romeo” located about 132 miles northeast of Cali. When the FMS executed this

procedure, the aircraft turned away from the airport and began to fly an easterly heading,

but the flight crew did not notice this deviation for about 90 seconds, at which point they

turned back toward the runway. The aircraft continued to descend during this interval,

however, and, as a result of the deviation, had descended into a valley. The final turn back

toward the runway placed the aircraft in conflict with terrain.

At 21:41:15, the ground-proximity warning system announced, “Terrain, terrain,

whoop, whoop.” The flight crew attempted a full-power climb to escape the mountainous

terrain, but the maneuver was executed incorrectly as the spoilers, which had been

extended for landing, were not retracted. The escape maneuver was unsuccessful, and at

about 21:42 the aircraft impacted terrain approximately 35 miles northeast of Cali.

3.4.3. Findings of the Investigation

Aeronautica Civil attributed the probable cause of the American Airlines flight 965

accident to the flight crew’s performance, but it cited the behavior of the FMS as a contrib-

utory factor; specifically:

• “FMS logic that dropped all intermediate fixes from the display(s) in the event of

execution of a direct routing.

• FMS-generated navigational information that used a different naming convention

from that published in navigational charts” [8].

As a result of its findings, Aeronautica Civil issued safety recommendations to the

Federal Aviation Administration, the ICAO, and American Airlines. Ten of the 21 recom-

mendations pertain to Aeronautica Civil’s findings associated with the FMS.

3.4.4. Recovery of FMS Data

In its discussion of the wreckage found at the scene of the accident, Aeronautica

Civil’s report states:

“Numerous circuit cards and other parts that were considered likely to contain

non-volatile memory were retrieved from the site, packed in static free material,

and shipped to the United States for read out at the facilities of their manufacturers.

With the exception of one circuit card from the Honeywell-manufactured FMC

[flight management computer], the material either did not contain non-volatile

memory or was too severely damaged to permit data retrieval” [8].

The report goes on to discuss the analysis of the FMC circuit card at a Honeywell facility:

“After the components were cleaned for laboratory examination, it was found that

the FMC contained a printed circuit card that had two non-volatile memory inte-

grated circuits. Data recovered from the integrated circuits included a navigation

data base, guidance buffer, built in test equipment (BITE) history file, operational

program, and other reference information. A load test of the FMC memory showed

that the operational software and navigational data were current for the time of the

accident” [8].

The contents of the FMS memory revealed the programmed route at the time of the

accident, including the entry for “R.” Based on these data and with the assistance of Boe-

ing and Honeywell, investigators were able to reconstruct the flight crew’s cockpit dis-

plays at the time of the accident. From the reconstruction, investigators learned of the

flight crew’s apparent confusion concerning the waypoint to which “R” referred and the

discrepancy between the use of this abbreviation in the published approach procedures, in

which it stood for “Rozo,” and the FMS navigational database, in which it stood for

“Romeo.” Aeronautica Civil’s findings and recommendations concerning flight manage-

ment systems were based on its analysis of the FMS data recovered from the accident.

3.4.5. Discussion

The recovery of the FMS data from the wreckage was a major find, but it was also

quite fortuitous. As the report noted, several circuit boards were recovered, but most were

damaged too severely to be analyzed or contained no useful data. That one of the FMS circuit boards was recovered intact and contained data that proved invaluable to the investigation was not by design but rather by coincidence: the board could have been destroyed

or its memory contents corrupted in the crash. In the absence of other obvious clues that

the FMS contributed to the accident, investigators might have instead attributed the acci-

dent solely to the flight crew’s performance and overlooked the automation altogether.

Hence, while the American Airlines flight 965 accident highlights an instance in which

applying traditional techniques to failures involving digital systems was successful, it also

underscores the need for a systematic process for analyzing these failures in the future.

3.5. Chapter Summary

The National Transportation Safety Board is one of the world’s foremost investigative

agencies, and its process for investigating major aviation accidents provides a model of

how complex system accidents are analyzed. The NTSB’s process decomposes accidents

by discipline and relies heavily on facts gleaned from the scene of an accident to deter-

mine the probable cause and contributory factors. This approach is not well-suited to treat

accidents involving digital systems because the complexity of a digital system may

present investigators with new failure modes that undermine the structured, discipline-ori-

ented approach and because there is no guarantee that a digital-system failure will produce

evidence that can be recovered from the scene of an accident. The American Airlines

flight 965 accident illustrates this point: although half of the investigation’s recommenda-

tions pertained to automation issues, the discovery of these problems hinged on a single

circuit board that was recovered from the wreckage. The accident underscores the need for

a systematic approach to discovering failure modes and eliciting evidence.

Chapter 4

Pandora

An important feedback loop exists between system development and failure analysis.

Systems are developed and operated with unknown design faults, and those faults are dis-

covered as failures are observed. Lessons are derived from the failures, which are then

incorporated into development and operational practices in order to mitigate the introduc-

tion of design faults in future systems. The system safety case is a part of this feedback

loop because it must be updated after a failure has been analyzed to account for the design

faults or errors in the development lifecycle that were identified, but it can play a much

greater role by guiding the analysis itself. This chapter introduces Pandora as an approach

to analyzing failures of digital systems in which the analysis is framed around the safety

case, and it explains the benefits and limitations of doing so.

4.1. The Enhanced Safety-Case Lifecycle

Kelly and McDermid developed a process they refer to as the safety-case lifecycle for

updating the system safety case in response to a recognized challenge to its validity. Their

lifecycle is illustrated in Figure 4.1 [25, 55]. Challenges may arise in the form of changes


to regulatory requirements, system design changes, modified assumptions, and counter-

evidence obtained from operational experience including observed failures. They assume

that the challenges to a safety argument have been recognized prior to the application of

their process, and they do not address the problem of deriving challenges from an

observed failure. To complete their lifecycle, an additional step is needed in which failure

analysis is performed on the safety case to discover the fallacies in the safety argument

that contributed to a failure so that the argument may then be recovered. This refined pro-

cess is the enhanced safety-case lifecycle [56].

The enhanced safety-case lifecycle, illustrated in Figure 4.2, augments the core lifecy-

cle so that the safety case guides failure analysis and serves as the medium through which

lessons and recommendations from the analysis are communicated [56]. As a safety-

related system is developed, a safety argument is prepared that shows: (1) why confidence

should be placed in the system’s ability to meet its safety requirements; and (2) why those

requirements entail acceptably-safe operation.

[Figure 4.1: Kelly & McDermid’s Safety-Case Lifecycle. Damage phase: Step 1, recognize challenge to safety case; Step 2, express challenge in GSN terms; Step 3, use GSN to identify impact. Recovery phase: Step 4, decide upon recovery action; Step 5, recover identified damaged argument.]

Once the system and its safety argument are accepted, the system is put into operation, but faults may still exist in the system or in

its operational or maintenance procedures if the safety argument is incomplete. If triggered, these faults may lead to a failure, in which case an analysis will be conducted to discern the nature of the failure. The failure might be random—within the fault probabilities

specified by the safety requirements—or it might be systemic; that is, due to a design fault

that was introduced when the system was developed or serviced. Failures arising from sys-

temic faults must be thoroughly analyzed to prevent recurrences.

The pre-failure safety case—that is, the safety case for the system as it existed just

prior to the failure—can guide the analysis by leading investigators through the original

argument describing why the system was thought to be safe. Using the evidence obtained

from the failure, investigators correct flaws in the argument as they discover them, which are documented as lessons and recommendations, and ultimately produce a revised safety case for the system called the post-failure safety case.

[Figure 4.2: The Enhanced Safety-Case Lifecycle, showing the pre-failure safety case, an incident, failure analysis, lessons & recommendations, system & process revision, the post-failure safety case, and the return to operation.]

Implementing the post-failure safety

case might require developers to make changes to the system or its associated procedures

as well as changes to development practices so that future systems will not exhibit similar

failures. Through this iterative loop in which the safety case plays a central role, the sys-

tem development process changes over time to reflect the accumulated experience gained

from observing operational systems.

4.2. Pandora

Through the enhanced safety-case lifecycle, the safety case can guide the analysis of

system failures and provide a framework for structuring lessons and recommendations.

The lifecycle does not describe how the safety case should guide the analysis of failure,

however, and so a process for framing the analysis around the safety case is necessary to

ensure that the failure is treated systematically. The Pandora process was developed for

this purpose, and it implements the failure-analysis process specified by the enhanced

safety-case lifecycle.

Pandora is a manual process for analyzing digital system failures in which the analysis

is framed around the system safety case. It is based on the assumption that safety-related

failures result from systemic faults, which the process uncovers by searching for fallacies

in a system’s underlying safety argument. A Pandora analysis guides investigators through

the steps of developing theories of a failure, eliciting counter-evidence to confirm or refute

those theories, and developing lessons and recommendations to prevent recurrences.

Investigators applying Pandora to a system failure examine the system’s pre-failure

safety case by revisiting each of its claims. As investigators revisit each claim, they

attempt to elicit counter-evidence from the failure refuting the claim’s validity, and in

doing so they discover fallacies in the pre-failure safety case. As each fallacy is identified,

investigators develop lessons and recommendations for removing the fallacies from the

safety argument, and through this process a post-failure safety case is produced. The Pan-

dora approach provides a systematic analysis of safety-related system failures so that

greater confidence may be placed in both the quality of investigations and their findings.

4.2.1. Derivation

The need for a systematic process of analyzing a system’s safety case in response to an

observed failure was recognized from a prior analysis of the Korean Air flight 801 acci-

dent and the Minimum Safe Altitude Warning (MSAW) system failure that contributed to

it [56]. Although the analysis produced valuable lessons and recommendations, the ad hoc

manner in which it was performed raised questions concerning its completeness. Thus,

Pandora was developed with the original purpose of providing a systematic process for

discovering fallacious reasoning in safety arguments.

Initially, applying Pandora consisted of a simple traversal of the safety argument to

search for fallacious inferences between claims. A depth-first traversal was chosen so that

the validity of each claim would be assessed after the validity of its premises had already

been verified, which resulted in a bottom-up approach for affirming the validity of the

safety argument as a whole. For large arguments, this approach was inefficient because it

considered irrelevant claims, and so a top-down analysis was added to confirm the rele-

vance of a claim prior to examining its validity and that of its premises. The current form

of Pandora incorporates both analyses: a top-down analysis to confirm the relevance of the

claims in the argument and a bottom-up analysis to assess their validity.

The analysis of the individual claims in the argument was based initially on Damer’s

criteria of relevance, acceptability, and sufficiency for establishing a good argument [57].

Relevance was evaluated during the top-down portion of the analysis by comparing the

reasoning in the argument to Damer’s taxonomy of logical fallacies, and acceptability and

sufficiency were evaluated in the same fashion during the bottom-up portion. This

approach contained a number of flaws, however. First, the distinction between acceptabil-

ity and sufficiency was often subtle in the context of safety argumentation, and preserving

this distinction provided no value. Second, Damer’s taxonomy included types of fallacies

that did not pertain to safety argumentation such as emotional appeals and fallacies

describing acts of willful deception. Finally, basing the analysis solely on Damer’s taxon-

omy neglected evidence obtained from the failure that refuted the claim. To address these

flaws, a customized taxonomy of safety-argument fallacies was developed, which is dis-

cussed in Chapter 5, and a separate step of eliciting counter-evidence from the failure was

added to the approach. These modifications gave rise to the current form of Pandora,

which is described in the sections that follow.

4.2.2. Process Overview

Figure 4.3 presents the core Pandora process, which begins with a pre-failure safety

case and an observed system failure. For each claim in the safety argument and starting

with the top-level claim, investigators applying Pandora first consider the relevance of that

claim’s premises. Premises that do not support the claim are not considered further. This

pruning process is repeated for each of the remaining premises, leading to a depth-first

recursion through the safety argument. Once investigators have established the relevance

of the supporting premises, they then take the premises together and evaluate the suffi-

ciency of the argument supporting the claim. If premises are missing or if counter-evi-

dence elicited from the failure refutes the claim, then the argument is insufficient, and the

investigators invalidate the claim.

When investigators applying Pandora discover an insufficient argument, they repair

the argument by either amending the premises or, if the claim is unsupportable, by modi-

fying or deleting the claim and making repairs elsewhere. The investigators then resume

consideration of the consequent (parent) claim in the argument. In the final step of the process, investigators consider the sufficiency of the top-level claim of the safety argument after they have either affirmed or restored the logical validity of each of its premises.

[Figure 4.3: The Pandora Failure-Analysis Process. Beginning with the pre-failure safety case and the observed failure, the flowchart proceeds through: choose top-level claim; prune irrelevant or unacceptable premises from claim; recurse into each unvisited premise; elicit evidence refuting sufficiency of premises; if the premises are insufficient, document the fallacious argument and either add or broaden premises to achieve sufficiency or, if the claim is unsupportable, delete it (convert to true); upon returning to the top-level claim, publish results as lessons & recommendations and the post-failure safety case.]

Upon concluding their traversal through the safety argument, the investigators have

produced a set of lessons documenting the fallacies in the pre-failure safety case that

might have contributed to the failure, a set of recommendations for removing those falla-

cies from the argument, and the post-failure safety case: a prototype of the pre-failure case

into which the recommendations have been incorporated. With some revisions, the post-

failure safety case is intended to become the new operational safety case for the system.

Implementing the post-failure safety case may impose new requirements on the system or

otherwise entail changes to the system’s design or associated operational or maintenance

procedures.
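The traversal just described can be modeled as a recursive pass over a tree of claims. The sketch below is only an illustration: Pandora is a manual process, so the `Claim` structure and the investigator judgments recorded in its `relevant` and `refuted` fields are hypothetical stand-ins for human analysis, and argument repair is omitted (the sketch merely records a lesson when an argument is found insufficient).

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    premises: list["Claim"] = field(default_factory=list)
    relevant: bool = True   # investigator judgment: does this premise support its parent?
    refuted: bool = False   # counter-evidence from the failure refutes this claim

def analyze(claim: Claim, lessons: list[str]) -> bool:
    """One depth-first Pandora pass; returns True if the claim's argument is sufficient."""
    # Prune premises that neither increase nor decrease the probability of the
    # claim (non sequiturs); arguments beneath them support inconsequential conclusions.
    claim.premises = [p for p in claim.premises if p.relevant]
    # Recurse first, so each claim is assessed only after its premises (bottom-up).
    results = [analyze(p, lessons) for p in claim.premises]
    # Sufficiency: counter-evidence refutes the claim even if every premise held,
    # implying the premises together give inadequate support.
    if claim.refuted or not all(results):
        lessons.append(f"Insufficient argument for claim: {claim.text}")
        return False
    return True
```

A claim whose premise is refuted by counter-evidence propagates insufficiency upward, so lessons accumulate for the premise and for every consequent claim above it.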

The following sections explain how Pandora may be used to detect fallacious reason-

ing in safety arguments and repair the arguments when fallacies are discovered.

4.2.3. Detecting Fallacious Reasoning

A safety case might contain premises that are irrelevant to the claims it supports. Irrelevant premises are those which neither increase nor decrease the probability that the

claims they support are true; they are sometimes referred to as non sequiturs [57]. The

mere existence of an irrelevant premise in a safety argument cannot contribute to a failure;

rather, these premises might mislead a developer or reviewer into accepting an insufficient

argument, which, in turn, may contribute to a system failure. Hence, Pandora requires

investigators to prune irrelevant premises from the safety argument early in the process.

Even if the arguments supporting the pruned claims are further flawed, repairing them is

futile because they ultimately support inconsequential conclusions. Pruning irrelevant pre-

mises from the safety argument improves the efficiency of Pandora because it reduces the

amount of evidence investigators must collect.

While irrelevant premises might hinder one’s ability to evaluate an argument, insuffi-

cient arguments in a system’s safety case are likely to be true indicators of safety-related

flaws in the system. An argument is insufficient if it provides too little evidence in support

of its claim, if the evidence is biased, or if it fails to provide evidence that would be

expected for its particular claim [57]. To see why insufficient arguments indicate safety-related flaws, consider a

typical top-level claim of a safety case that a system is acceptably safe to operate in its

intended context. If the system fails, and the failure is assumed to be systemic, then clearly

the top-level claim has been violated. It is thus the case that at least one of the premises

supporting the top-level claim was not satisfied or that the premises together do not pro-

vide adequate support for the claim. The same can be said of each premise for which

counter-evidence is obtained.

In a Pandora analysis, the investigator considers the sufficiency of the argument sup-

porting a claim after establishing the relevance of the claim and either affirming or restor-

ing the validity of the arguments supporting the claim’s premises. Sufficiency is evaluated

in two ways: first by checking whether the argument fits a known pattern (or AntiPattern)

and has omitted expected evidence; and second by eliciting counter-evidence from the

failure suggesting that the claim was not satisfied despite the truth of its premises. Investi-

gators are asked to consider, for a given claim, whether there exists evidence from the fail-

ure that the claim was unsatisfied. If so, then the argument’s premises provide inadequate

support for the claim, and the argument is insufficient and must be repaired.

If the argument being considered fits a known pattern, then investigators can use the

pattern to determine whether the argument has provided the appropriate types and volume

of evidence to satisfy the claim. To support this step, as part of this research a taxonomy of

common safety-argument fallacies was developed based upon an assessment of existing

general-purpose fallacy taxonomies and an analysis of fallacies committed in real-world

safety arguments. The taxonomy organizes fallacies into categories according to the types

of arguments in which they typically appear, which allows investigators to quickly deter-

mine the set of fallacies that might pertain to the arguments they consider. Chapter 5 dis-

cusses the safety-argument fallacy taxonomy and its derivation in greater detail, and

Appendix A documents the taxonomy in its current form.
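The first sufficiency check can be pictured as comparing an argument against the evidence its pattern expects. The category names and taxonomy entries below are invented placeholders for illustration only; the actual taxonomy is the one presented in Chapter 5 and Appendix A.

```python
# Hypothetical taxonomy fragment: for each argument category, the evidence
# types a sufficient argument of that kind is expected to supply.
EXPECTED_EVIDENCE = {
    "testing argument": {"test results", "coverage analysis"},
    "process argument": {"process audit", "product evidence"},
}

def missing_evidence(category: str, supplied: set[str]) -> set[str]:
    """Return the expected evidence types the argument failed to provide.

    An empty result means this check raises no objection; the second check,
    eliciting counter-evidence from the failure, must still be performed."""
    expected = EXPECTED_EVIDENCE.get(category)
    if expected is None:
        return set()  # argument fits no known pattern; rely on counter-evidence alone
    return expected - supplied
```

For example, a process argument that supplies only a process audit would be flagged as omitting product evidence, a common pattern of appealing to process alone.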

4.2.4. Eliciting Counter-Evidence

Even if an argument properly instantiates a pattern or if it does not fit any known patterns, it is still possible that the argument is insufficient. To determine whether the argument is insufficient, investigators must search for counter-evidence indicating that the

claim of the argument was not satisfied even though its premises were. If counter-evidence

is discovered, then there must exist additional premises that were unsatisfied because a

true antecedent cannot imply a false consequent. The counter-evidence may be docu-

mented as an AntiPattern showing how the argument failed to satisfy its claim, which can

then be used to evaluate future arguments as part of the first sufficiency check.

This second method of sufficiency evaluation is essential to Pandora because it allows

investigators to discover new forms of fallacious reasoning in safety arguments and model

them as AntiPatterns to ease future detection. New fallacies may be added to the taxon-

omy and applied to future analyses, and they may be disseminated to developers to edu-

cate them about fallacies observed in operational safety arguments that they should avoid.

This step is critical in the enhanced safety-case lifecycle.

The role counter-evidence plays in a Pandora analysis differs from the role that evi-

dence plays in the traditional investigative techniques discussed in Chapter 3. Traditional

techniques discover possible failure modes from evidence that is obtained from the acci-

dent scene. Investigators might require several pieces of evidence in order to infer the

existence of a particular failure mode in a system. A piece of evidence that is overlooked

or unavailable could significantly affect the results of an investigation. In a Pandora anal-

ysis, investigators discover possible failure modes as they examine a system’s safety argu-

ment, and they are aware of those failure modes as they elicit counter-evidence. Their

search for counter-evidence is more robust because it is targeted at validating specific

safety claims rather than attempting to infer how the system might have failed.
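The contrast can be made concrete: in a Pandora analysis the search runs from claim to evidence rather than from evidence to failure mode. The sketch below assumes a hypothetical evidence catalogue in which each item is tagged with the safety claims it refutes; the tags and descriptions are illustrative, not drawn from any actual investigation record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    description: str
    refutes: frozenset  # claims this item of failure evidence refutes

def counter_evidence_for(claim: str, catalogue: list[Evidence]) -> list[Evidence]:
    """Claim-directed search: given a specific safety claim under validation,
    return the items of failure evidence that refute it."""
    return [e for e in catalogue if claim in e.refutes]
```

Because each query names the claim being validated, an item that bears on no claim under consideration is simply never requested, whereas in an evidence-driven reconstruction an overlooked item can silently change the inferred failure mode.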

4.2.5. Repairing the Safety Argument

Once a fallacious argument has been discovered in a safety case, the argument must be

repaired. Investigators applying Pandora repair the argument in two phases: they first doc-

ument the fallacious argument as a lesson, and then they develop a recommendation as to

how the argument should be amended in order to satisfy its claim. If the claim cannot be

supported as stated, then the recommendation would weaken the claim or strike it alto-

gether. Investigators using Pandora derive lessons and recommendations systematically

through the requirement that each lesson describe an identified fallacy in the safety case

and each recommendation address one or more lessons, reducing the number of superflu-

ous lessons and recommendations that are produced. If the lessons and recommendations

are sufficiently general, they may be applied to other Pandora analyses or even to similar

safety cases for systems that have not yet failed.

A lesson in Pandora is an AntiPattern describing how an insufficient argument in the

safety case either failed to provide expected evidence in support of its claim or failed to

address counter-evidence elicited from the system failure. Lessons are generated when-

ever insufficient arguments are discovered. If the fallacy was discovered by matching the

argument to an existing pattern or AntiPattern, then the lesson will show how the argu-

ment fails to meet that pattern or conforms to the AntiPattern. If it was discovered based

upon counter-evidence obtained from the failure, then the lesson will be a new AntiPattern

altogether. Documenting lessons as AntiPatterns helps investigators to describe precisely

the nature of the fallacy that was committed and where the fallacious reasoning appears in

the safety case.

From each lesson a recommendation is developed as to how the argument should be

revised so that it either conforms to the pattern or addresses the counter-evidence. Just as

lessons take the form of AntiPatterns, a recommendation is a pattern showing how the

argument might be modified to address the fallacies identified by the lesson and restore

validity. The recommendation is a suggested strategy for repairing the argument—it does

not have to encompass all possible solutions nor must the system developer implement the

particular strategy contained in the recommendation. A recommendation may address

multiple lessons, but it should correspond to at least one lesson to prevent superfluous rec-

ommendations from being issued. If investigators find that a claim cannot be supported by

adding additional premises to the argument, then they may recommend that it be weak-

ened or stricken from the argument, which could impact the sufficiency evaluation of con-

sequent (parent) claims.
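The discipline linking lessons to recommendations amounts to a simple traceability check. The record types below are illustrative only: in practice a lesson is an AntiPattern and a recommendation is an argument pattern expressed in the safety case's notation, not flat strings.

```python
from dataclasses import dataclass, field

@dataclass
class Lesson:
    antipattern: str   # how the argument failed to satisfy its claim
    location: str      # where in the safety case the fallacy appears

@dataclass
class Recommendation:
    repair_pattern: str                         # suggested strategy for repairing the argument
    addresses: list[Lesson] = field(default_factory=list)

def no_superfluous(recommendations: list[Recommendation]) -> bool:
    """Each recommendation must address at least one lesson; one with no
    corresponding lesson is superfluous and should not be issued."""
    return all(rec.addresses for rec in recommendations)
```

A recommendation may legitimately address several lessons, so the check is one-directional: every recommendation traces back to some lesson, while a single lesson may be covered by more than one recommendation.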

4.3. Discussion

Pandora’s strengths as an approach to analyzing failures of digital systems are:

• Its ability to reveal to investigators which aspects of a system’s design, develop-

ment, operation, and maintenance history are relevant to the analysis;

• Its ability to systematically suggest theories of a failure by revisiting the claims of

a system’s safety argument;

• Its ability to systematically determine the evidence that should be elicited from the

failure to confirm or refute those theories; and

• Its ability to develop lessons and recommendations that are stated in the context of

the original safety argument, which enhances their relevance and ensures that each

recommendation can be traced to one or more lessons.

Pandora is not a conventional technique for analyzing failures in that it does not pro-

duce an event sequence explaining how a failure occurred; rather, it is intended to assist

investigators in constructing the event sequence by suggesting theories and evidence that

should be considered as part of the investigation. Framing the investigation around the

safety case ensures that, at a minimum, the event sequence covers the breadth of topics

concerning a system’s development and operational history that were documented in its

safety case, and from this baseline investigators can decide which topics should be pur-

sued further. Thus, Pandora is compatible with many existing techniques for analyzing

failures of digital systems and may be used alongside them to improve the quality of the

investigation as a whole. Specific ways in which Pandora might be used to complement


the techniques discussed in Chapter 2 are outlined below along with limitations of the

approach that should be considered when applying it.

4.3.1. ECFA and Pandora

Events and Causal Factors Analysis (ECFA) is a technique for constructing event

chains that is employed by several approaches for investigating accidents, including PAR-

CEL. For systems with a preexisting safety case, Pandora may be used to develop the ECF

chart documenting the event chain and to provide a minimal stopping rule for conducting

the analysis. The root causes identified as part of ECFA could then be mapped back into

safety argument fallacies and addressed in the post-failure argument.

4.3.2. WBA and Pandora

Why-Because Analysis (WBA) is an event-based model like ECFA, but it defines a

rigorous process for modeling the accident sequence, and so it is useful for verifying the

consistency of an accident hypothesis. Pandora may be used to develop the hypothesis

based upon its analysis of fallacies in the pre-failure safety argument, and then WBA

could be applied to verify that the hypothesis is logically consistent and produce a WBA

graph of the accident.

4.3.3. STAMP and Pandora

By focusing on the system safety case, investigators applying Pandora are able to

examine the safety requirements, evidence of safety, and contextual details of a system’s

development while providing a systematic process for detecting faults and removing them.

Pandora and STAMP are complementary techniques, and both may be used in an investi-

gation. This compatibility exists because safety constraints from higher-level control sys-

tems in a socio-technical structure function as safety obligations that lower-level systems


must conform to by implementing their own lower-level constraints. Safety constraints

appear in a system’s safety argument either as goals or contextual information that frames

the argument, and so they may be analyzed with Pandora. Likewise, the lessons from a

Pandora investigation may be expressed in STAMP since they correspond to inadequate or

inadequately-enforced safety constraints.

4.3.4. PARCEL and Pandora

Although PARCEL is intended to be used as the driving analysis technique in an

investigation, Johnson notes that other techniques, such as STAMP or WBA, may be used to perform the causal analysis instead of ECFA [48]. To make the process compatible with

Pandora, the requirements imposed by IEC 61508 could be expressed as a safety-case

template from which the safety argument for the system under investigation is derived.

Pandora could then be applied to analyze the argument, and the lessons and recommenda-

tions from the application of Pandora could then be incorporated into PARCEL. Thus,

investigators could choose which causal analysis techniques they wished to apply accord-

ing to the nature of the system and the needs of the investigation: STAMP to analyze inter-

actions between subsystems, WBA to produce a rigorous proof of causal arguments, or

Pandora to detect fallacious reasoning in safety arguments.

4.3.5. Limitations

The Pandora approach assumes the existence of a pre-failure safety case, and so it cannot be applied directly to systems whose safety arguments are undefined or poorly documented. Most safety-related systems currently in operation do not have explicitly defined

safety cases, which constitutes a significant limitation of the approach. To make such a

system amenable to analysis, a safety case would have to be derived for it retroactively;


however, a systematic process for doing so remains unknown. This limitation is an area of

future work and is discussed further in Chapter 8.

The prevalence of software development standards as well as the existence of safety-

case patterns, both of which were discussed in Chapter 2, implies that many digital sys-

tems share similar safety rationale. When a fallacy is discovered in a system’s safety argu-

ment, it is desirable to identify systems that employ similar rationale in order to determine

whether they might be vulnerable to the same failure mode. Most techniques for analyzing

failures of digital systems do not include systems that are related to the failed system but

were not involved in the failure within the scope of their analyses. Pandora also suffers

this limitation, but a possible method of overcoming it is described in Chapter 8.

4.4. Chapter Summary

Failure analysis is an essential part of the design process [1]. The enhanced safety-case

lifecycle describes the manner in which the system safety case guides the analysis of fail-

ures and communicates the changes that must be made to a system’s design as well as to

development and operational practices in order to prevent a failure from recurring. Kelly

and McDermid developed a process for updating the safety case once the lessons from a

failure have been identified [55], and Pandora completes the enhanced lifecycle as an

approach for using the safety case to derive lessons from a failure.

In a Pandora analysis, investigators systematically revisit the claims of a system’s

safety argument in the context of a failure and elicit counter-evidence from the failure

refuting the claims’ validity. They then repair the argument by removing the fallacies that

were identified. The products of this analysis are a revised safety argument and a set of

lessons and recommendations documenting the fallacies that were discovered in the safety


argument and the strategies for addressing them. Pandora is compatible with existing tech-

niques for analyzing digital-system failures, but a system must have a preexisting safety

case for Pandora to be applicable.

Pandora relies upon a taxonomy of safety-argument fallacies to support its analysis;

Chapter 5 describes the taxonomy and its derivation. Chapter 6 reports on a case study in

which Pandora was applied to a series of accidents involving a digital system.


Chapter 5

Safety-Argument Fallacies

Safety cases have evolved as a valuable approach to arguing that a system is accept-

ably safe to operate, and they have been adopted in several application domains [25]. Prior

analyses of accidents involving safety-critical systems, including the minimum safe alti-

tude warning (MSAW) system discussed in Chapter 1, suggested that system safety argu-

ments sometimes invoke incomplete or inherently faulty reasoning [56]. These fallacies, if

undetected, could lead to overconfidence in a system and the false belief that the system’s

design has obviated or will tolerate certain faults. Chapter 4 introduced Pandora as a pro-

cess for identifying and removing safety-argument fallacies and alluded to the safety-

argument fallacy taxonomy as a tool to assist investigators in applying Pandora. This

chapter presents the taxonomy, its derivation, and the rationale for its design, and Chapter

7 reports on a controlled experiment to evaluate its effect on fallacy detection rates.

5.1. Fallacies in System Safety Arguments

From prior analyses of failed systems, including the MSAW system described in

Chapter 1, it was hypothesized that system safety arguments exhibit common types of fallacious reasoning [56]. To test this hypothesis, the author collaborated with Michael Holloway of NASA Langley Research Center to collect a sample of industrial safety

arguments and then review each of the arguments in the sample for fallacies [58]. To

obtain the sample, the reviewers conducted a survey of publicly available safety argu-

ments, which yielded eight safety arguments in the disciplines of air traffic management,

automotive engineering, commuter rail transit, electrical engineering, nuclear engineering,

and radioactive waste storage [59]. Of these arguments, the reviewers selected three safety

arguments for inclusion in the sample: the EUROCONTROL (EUR) Reduced Vertical

Separation Minimums (RVSM) Pre-Implementation Safety Case [60], the Opalinus Clay

geological waste repository safety case [61], and the EUR Whole Airspace Air Traffic

Management (ATM) System Safety Case [62]. The organization of these three arguments

made them amenable to analysis by individuals who did not possess expert knowledge of

the relevant application domains. The EUR RVSM and whole airspace safety cases are

preliminary and do not necessarily reflect final engineering standards; however, it is still

appropriate to examine the arguments for fallacies so that those fallacies can be addressed

prior to implementation.

The two reviewers read and independently evaluated each of the three safety cases

selected for the study. The purpose of this evaluation was to identify the types of fallacies

that were typically committed in safety arguments. Both of the reviewers had at least a

basic understanding of fallacious reasoning from classical philosophical literature and

drew from that knowledge in performing their evaluations. When a reviewer encountered

what he believed to be faulty reasoning in an argument, he highlighted the relevant section

and recorded a brief note explaining why the reasoning was problematic. Upon completing their evaluations, the reviewers compiled their results into a comprehensive set of fallacies for each of the safety cases and then achieved a consensus as to which of the

comments they made were indicative of fallacious reasoning in the arguments. The fol-

lowing sections summarize those results for each safety case using fallacy descriptions

taken from Damer [57]. Note that the goal of this study was to examine the form of the

safety arguments and not to evaluate the safety of the associated systems.

5.1.1. EUR RVSMThe EUR Reduced Vertical Separation Minimums (RVSM) Pre-Implementation

Safety Case concerns a proposed reduction in the minimum amount of vertical distance

that must be present between any two aircraft operating in EUR airspace. RVSM would

accommodate a greater density of air traffic and would thus enable EUR to meet an

expected increase in demand for air travel over the next several years. The RVSM safety

case is organized as a natural language document but provides a graphical version of the

safety argument in Goal Structuring Notation (GSN) in an appendix. To make the study

tractable, the reviewers limited their evaluation to the GSN portion of the safety case,

which consisted of 116 claims (GSN goals) spanning nine pages of GSN. Table 5.1 sum-

marizes the fallacies the reviewers identified in the EUR RVSM safety case with one col-

umn for each reviewer. Relevance fallacies were the most prevalent in the argument and

accounted for two-thirds of the fallacies identified.

Although the purpose of employing two reviewers in the case study was to assemble a

more complete set of fallacies and not to examine the consistency between the reviewers,

the disparity between the results of each reviewer is significant. Differences are present

both in the quantities and types of fallacies that each reviewer identified, the most notable


of which concerns the fallacy using the wrong reasons. All but one of the instances of this

fallacy concerned an inference that the argument invoked repeatedly. Reviewer A flagged

only the first few of these instances before choosing to ignore them while reviewer B

flagged them all. Moreover, both reviewers identified two instances of fallacious use of

language due to ambiguity in the argument; however, the specific instances they identified

did not overlap, suggesting that they had trouble agreeing upon which language was

ambiguous or misleading. Finally, reviewer A identified a greater variety of fallacies than

did reviewer B, which may be due to A’s greater experience with logic and argumentation.

A recurring problem in the RVSM safety case was its use of evidence that did not sup-

port the immediate claim of the argument. As an example, Figure 5.1 presents a portion of

the argument concerning the role of flight training in the RVSM safety requirements [60].

Of the four premises supporting the claim that “there is sufficient direct evidence of [flight

crew] training design validity,” (St.2.3.1) only one—G.2.3.1.4—pertains to the claim. The

other premises state that various aspects of the training program have been specified, but

this information does not aid the reader in assessing the validity of the training design.

Table 5.1: Tally of Fallacies Identified in the EUR RVSM Safety Case

Fallacy                          Reviewer A   Reviewer B   Total (a)
Drawing the Wrong Conclusion          3                        3
Fallacious Use of Language            2            2           4
Hasty Inductive Generalization        4                        4
Omission of Key Evidence              1                        1
Red Herring                                        1           1
Using the Wrong Reasons               5           15          16
Total                                15           18          29

a. Instances of a fallacy that were detected by both reviewers are reflected only once.
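As the footnote indicates, the “Total” column is the size of the union of the two reviewers’ findings, |A ∪ B| = |A| + |B| − |A ∩ B|, so an instance flagged by both reviewers counts once. The following sketch (not part of the dissertation) checks that the tallies are arithmetically consistent; the per-reviewer attribution of the single-reviewer rows is inferred here from the column totals and is a reconstruction, not a quotation of the original table.

```python
# Hypothetical reconstruction of Table 5.1. Each entry is
# (count by reviewer A, count by reviewer B, total distinct instances).
rvsm = {
    "Drawing the Wrong Conclusion":   (3, 0, 3),
    "Fallacious Use of Language":     (2, 2, 4),
    "Hasty Inductive Generalization": (4, 0, 4),
    "Omission of Key Evidence":       (1, 0, 1),
    "Red Herring":                    (0, 1, 1),
    "Using the Wrong Reasons":        (5, 15, 16),
}

# Union semantics: total = a + b - overlap, so the implied overlap
# must be non-negative and no larger than either reviewer's count.
for fallacy, (a, b, total) in rvsm.items():
    overlap = a + b - total
    assert 0 <= overlap <= min(a, b), fallacy

col_a = sum(a for a, _, _ in rvsm.values())
col_b = sum(b for _, b, _ in rvsm.values())
col_total = sum(t for _, _, t in rvsm.values())
print(col_a, col_b, col_total)  # 15 18 29
```

The implied overlap for "Using the Wrong Reasons" is 4 (5 + 15 − 16), consistent with the observation that reviewer A flagged only the first few instances of the repeated inference while reviewer B flagged them all.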


5.1.2. Opalinus Clay

The Opalinus Clay safety case concerns the feasibility of constructing a long-term

radioactive waste storage facility within the Opalinus Clay: a geological formation in the

Zürcher Weinland of Switzerland. The argument claims that the Clay is sufficiently stable

to enable the facility to meet its safety requirements for at least the next ten million

years [61]. The safety case contains approximately 43 claims spanning 22 pages and is

written in bulleted natural language with major safety claims enumerated as subsections

accompanied by their corresponding arguments. It includes a variety of arguments for the

feasibility and safety of the facility, including “...multiple arguments for safety that:

• Demonstrate safety and compliance with regulatory protection objectives;

• Use indicators of safety that are complementary to those of dose and risk and that show that radionuclide releases and concentrations due to the repository are well below those due to natural radionuclides in the environment;

• Indicate that the actual performance of the disposal system will, in reality, be more favorable than that evaluated in quantitative analyses; and

• No issues have been identified that have the potential to compromise safety” [61].

Figure 5.1: Excerpt from the EUR RVSM Safety Case (GSN nodes rendered as text)

G2.3      Flight crew training design complies with safety requirements.
St2.3.1   Argue that there is sufficient direct evidence of flight crew training design validity.
G2.3.1.1  FC RVSM & Transition Training specified.
G2.3.1.2  FC Aircraft Contingency training specified.
G2.3.1.3  Flight planning training specified.
G2.3.1.4  Hazards and risks controlled and mitigated.
S2.3.1.1  PISC 5.4.3 & 5.4.4
S2.3.1.4  PISC 5.4.6

Both reviewers agreed that the Opalinus Clay safety case was the most compelling of

the three arguments they reviewed. Indeed, reviewer B did not identify any fallacies in the

argument while reviewer A identified only three, one of which was later determined to be

valid reasoning. Table 5.2 shows the fallacies identified by reviewer A.

Table 5.2: Tally of Fallacies Identified in the Opalinus Clay Safety Case

Fallacy                    Reviewer A   Reviewer B   Total
Arguing from Ignorance        1 (a)                    0
Omission of Key Evidence      2                        2
Total                         3                        2

a. The reviewers later agreed that this reasoning was non-fallacious.

One of the arguments in the safety case discusses uncertainty in the characteristics of the chosen disposal system. It states that the choice of uncertainty scenarios to consider “remains a matter of expert judgment” and then describes the process in which scenarios were developed and considered using a panel of experts. A criticism of this approach would be that scenarios suggested by some experts were not selected for consideration but should have been. To avoid this criticism, the argument should mention some of the scenarios that were suggested but excluded from consideration by the panel along with its rationale for doing so. Elsewhere, the argument claims that “uncertainties [in the risk assessment cases] are treated using a pessimistic or conservative approach,” but no evidence is provided to support the claim of conservatism. Finally, in considering possible

human intrusions, the argument assumes that “...possible future human actions that may

affect the repository are constrained to those that are possible with present-day technology

or moderate developments thereof.” Although it is difficult to imagine how one would

avoid making this assumption, it is possible that unforeseen future innovations will render

the analysis moot.

5.1.3. EUR Whole Airspace

The EUR Whole Airspace Air Traffic Management (ATM) System Safety Case preliminary study was conducted to evaluate “the possibility of developing a whole airspace

ATM System Safety Case for airspace belonging to EUROCONTROL member

states” [62]. The study proposes arguments for preserving the current safety level of EUR

airspace under a unified air traffic organization instead of the patchwork of organizations

that comprise EUR today. The arguments presented in the EUR safety case are prelimi-

nary; nevertheless, it is appropriate to examine them because operational arguments will

likely be derived from them. Like the RVSM safety case, the report presents mostly natu-

ral language arguments but does make use of GSN in some places. Again, the reviewers

chose to limit their examination to the major GSN portions of the report, which contained

17 claims spanning two pages. The GSN portions included two separate arguments for the

safety of the whole airspace: one based on arguing over individual geographic areas and

one based on reasoning about the whole airspace ATM rules. Both arguments shared the

same top-level goal that “the airspace is safe.”


Table 5.3 contains the results of the reviewers’ evaluations of the EUR Whole Air-

space argument, which reflect the combined fallacies in both the argument over geo-

graphic regions and the argument for the safe implementation of whole airspace rules.

Neither argument considered possible interactions between geographic areas, such as

when an aircraft is handed off by one air traffic controller to another in an adjacent region.

Even if the safety rules are respected within each region, without special considerations

for interactions between regions and with external airspace, the rules might be violated in

the context of the broader whole airspace system. Both reviewers flagged these arguments

as instances of fallacious composition.

Table 5.3: Tally of Fallacies Identified in the EUR Whole Airspace Safety Case

Fallacy                      Reviewer A   Reviewer B   Total (a)
Fallacy of Composition           2            2           2
Fallacious Use of Language       2            6           6
Red Herring                      4                        4
Omission of Key Evidence         2                        2
Total                           10            8          14

a. Instances of a fallacy that were detected by both reviewers are reflected only once.

5.2. A Taxonomy of Safety-Argument Fallacies

All three of the case studies the reviewers considered exhibited common types of faulty reasoning, supporting the hypothesis stated in the previous section. To facilitate detection of these fallacies, a taxonomy of safety-argument fallacies was developed by adapting existing taxonomies described in the philosophical literature according to the observations from the survey. The specific objectives in developing the taxonomy were:

1. To cover a broad range of fallacies relevant to safety argumentation;

2. To categorize the taxonomy so that a user may determine which fallacies might

pertain to a given argument without searching all the fallacies in the taxonomy;

3. To define the fallacies so that they are accessible to safety professionals who have

not received formal training in logic or argumentation; and

4. To design the taxonomy for extensibility.

Jacob Pease of the University of Virginia Department of Philosophy and Michael Hol-

loway of NASA Langley Research Center assisted in developing the fallacy taxonomy.

5.2.1. Coverage

Several taxonomies of fallacies in general arguments exist; those of Damer, Curtis,

Dowden, Pirie, and Govier were considered to develop the safety-argument fallacy

taxonomy [57, 63, 64, 65, 30]. Since these taxonomies describe fallacies that might appear

in general arguments, they include emotional appeals, malicious fallacies that convey

attempts at willful deception, and formal, syllogistic, and causal fallacies. Thus, it was necessary to exclude several of these fallacies in order to achieve a balance between the spectrum of fallacies included in the taxonomy and their applicability to safety argumentation.

Four basic constraints were employed in deriving the set of fallacies to include in the

taxonomy. First, only fallacies that could be expected to appear in a documented safety

case were considered. This constraint was based on the assumption that the documented

safety case is the authoritative argument that a system is acceptably safe to operate. Based

on this constraint, emotional appeals were excluded from the taxonomy.

Second, only fallacies that could be invoked accidentally (that is, by someone who

was unaware that the reasoning was fallacious) were considered. This constraint is based

on the assumption that the developer of a safety argument does not possess malicious


intent. Based on this constraint, malicious fallacies such as appeal to force or threat and

attacking a straw man were excluded from the taxonomy.

Third, only informal fallacies were considered for the taxonomy based on the assump-

tion that safety arguments are generally inductive and, if a portion of a safety argument

were amenable to deductive analysis, it could be expressed formally and verified mechan-

ically. Thus, formal and syllogistic fallacies were excluded from the taxonomy.

Finally, it was assumed that safety arguments generally do not attempt to establish

causal relationships between events (with the exception of inferring a causal relationship

between correlated events), and so most types of causal fallacies were excluded. These

fallacies are reserved for arguments that attempt to explain a sequence of past events by

establishing causal relationships between them whereas safety arguments typically are

concerned with predicting the occurrence of future events.

Table 5.4 summarizes the fallacies that were excluded, and Table 5.5 provides exam-

ples of these fallacies. After excluding emotional, malicious, formal, syllogistic, and

causal fallacies, the author and Jacob Pease assessed the remaining fallacies individually

and excluded those that were unlikely to appear in safety arguments for various reasons.

These fallacies appear in the “Other Excluded Fallacies” categories in Table 5.4 and Table

5.5. For example, wishful thinking was excluded because it concerns arguments in which a

claim is asserted to be true on the basis of a personal desire or vested interest in it being

true. Such an argument is unlikely to appear explicitly in a safety argument. Hasty induc-

tive generalization, which occurs when an argument offers too little evidence in support of

its claim, was excluded because its broad definition encompasses most of the other falla-

cies. Other fallacies such as refuting the example were omitted because they pertain to refutations, which seldom appear in safety cases. Finally, fallacies whose definitions differed

only subtly from each other were collapsed into a single fallacy; these fallacies appear in

the row marked “Collapsed Fallacies.”

Table 5.4: Excluded Fallacies Grouped by Source

Category                         Damer   Curtis   Dowden   Pirie   Govier
Fallacies Defined by Source        60      65      107      75      32
Emotional Appeals                  -7      -3       -9      -4      -2
Malicious Fallacies                -9      -8      -15     -11      -4
Formal & Syllogistic Fallacies     -7     -19       -8     -11      -3
Causal Fallacies                   -4      -1       -3      -1      -3
Other Excluded Fallacies           -7     -11      -32     -16      -3
Collapsed Fallacies                -8      -3      -12      -9      -2
Fallacies Represented              18      20       28      23      15

Table 5.5: Examples of Excluded Fallacies

Emotional Appeals: Argument from Outrage; Misleading Accent; Scare Tactic; Style Over Substance

Malicious Fallacies: Appeal to Force; Ad Hominem; Poisoning the Well; Straw Man

Formal & Syllogistic Fallacies: Affirming the Consequent; Denying the Antecedent; Four-Term Fallacy; Undistributed Middle Term

Causal Fallacies: Post Hoc ergo Propter Hoc; Reversing Causation

Other Excluded Fallacies: Drawing the Wrong Conclusion; Complex Question; Fallacy of the Continuum; Hasty Inductive Generalization; Refuting the Example; Regression; Scope Fallacy; Special Pleading; Tu quoque; Wishful Thinking

There was a strong degree of overlap in the fallacies that remained from each of the five taxonomies that were surveyed. These fallacies were consolidated into a final set of 33 fallacies, which is presented in Table 5.6. Appendix A contains a complete description of these fallacies and categories. The process of excluding fallacies, while based on an

established set of criteria, was nevertheless subjective, and there is a risk that fallacies that

could appear in a safety argument were erroneously excluded. Thus, while the taxonomy

may be used to help detect fallacies in safety arguments, the absence of a fallacy fitting

one of the patterns described in the taxonomy is insufficient to conclude that no fallacies

are present.

Table 5.6: The Safety-Argument Fallacy Taxonomy

Circular Reasoning: Circular Argument; Circular Definition

Diversionary Arguments: Irrelevant Premise; Verbose Argument

Fallacious Appeals: Appeal to Common Practice; Appeal to Improper/Anonymous Authority; Appeal to Money; Appeal to Novelty; Association Fallacy; Genetic Fallacy

Mathematical Fallacies: Faith in Probability; Gambler’s Fallacy; Insufficient Sample Size; Pseudo-Precision; Unrepresentative Sample

Unsupported Assertions: Arguing from Ignorance; Unjustified Comparison; Unjustified Distinction

Anecdotal Arguments: Correlation Implies Causation; Damning the Alternatives; Destroying the Exception; Destroying the Rule; False Dichotomy

Omission of Key Evidence: Omission of Key Evidence; Fallacious Composition; Fallacious Division; Ignoring Available Counter-Evidence; Oversimplification

Linguistic Fallacies: Ambiguity; Equivocation; Suppressed Quantification; Vacuous Explanation; Vagueness

The fallacy of omission of key evidence appears in the taxonomy both as a category and as an entry because there are many special forms of this fallacy:

• Fallacious composition occurs when an argument attempts to infer the properties of a system from those of its components without considering interactions between the components that might violate those properties.

• Fallacious division, conversely, occurs when an argument attempts to infer the

properties of a component from those of the encompassing system.

• Ignoring available counter-evidence occurs when an argument makes a claim for

which there exists refuting evidence but fails to address that evidence.

• Oversimplification describes arguments that cite evidence obtained from models of

system behavior but fail to show that the models correspond to the actual system.

5.2.2. Categorization

Initial attempts at categorizing the fallacies in the taxonomy relied upon Damer’s categories of relevance, acceptability, and sufficiency [57]. Relevance fallacies concern the

use of premises that have no bearing on the truth of the claims they ostensibly support,

acceptability fallacies concern the use of inherently faulty inferences, and sufficiency fal-

lacies describe ways in which arguments can fail to provide enough evidence in support of

their claims.

Damer’s typology exhibited three problems that contradicted the goal of making the taxonomy accessible. First, the categories did not correspond to the types of arguments one might encounter in a safety case, and so they provided little help in determining the set of fallacies that might pertain to a given argument. Second, the categories required a user of the taxonomy to know a priori the type of fallacy committed, which does not aid users who apply the taxonomy in order to determine whether an argument is fallacious. Third, the domain of safety argumentation presented special challenges in assigning the fallacies to these categories unequivocally. Many of the fallacies Damer classified as relevance or acceptability fallacies, such as fallacious composition and division, could be remedied by supplying additional evidence, suggesting that in some cases they would be better classified as sufficiency fallacies. The typologies employed by Curtis, Dowden, Pirie, and Govier were also considered, but they suffered similar limitations.

Since a suitable typology to adapt to the taxonomy could not be found in the existing literature, one was developed by inferring relationships among the fallacies with respect to the types of arguments they address. This typology, shown in Table 5.6, groups fallacies into eight categories, which are summarized below:

• Circular reasoning occurs when an argument is structured so that it reasserts its claim as a premise or defines a key term in a way that makes its claim trivially true.

• Diversionary arguments contain excessive amounts of irrelevant material that could distract a reader from a weakly supported claim.

• Fallacious appeals invoke irrelevant authorities, concepts, or comparisons.

• Mathematical fallacies concern fallacious probabilistic and statistical inferences.

• Unsupported assertions are claims stated without evidence.

• Anecdotal arguments cite examples but fail to generalize the truth of their claims.

• Omission of key evidence occurs when an otherwise complete argument omits evidence that is necessary to establish its validity.

• Linguistic fallacies concern the use of misleading language that could suggest an unwarranted conclusion. These fallacies may appear in any informal argument.
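The categories and fallacy names in Table 5.6 lend themselves to a simple two-level representation. The sketch below is purely illustrative: the category and fallacy names come from the table, while the data structure and the `candidate_fallacies` helper are hypothetical conveniences, not part of the taxonomy itself.

```python
# Illustrative two-level encoding of the safety-argument fallacy taxonomy
# (category -> fallacies). Names are from Table 5.6; the representation and
# the helper function below are hypothetical.
TAXONOMY = {
    "Circular Reasoning": ["Circular Argument", "Circular Definition"],
    "Diversionary Arguments": ["Irrelevant Premise", "Verbose Argument"],
    "Fallacious Appeals": ["Appeal to Common Practice",
                           "Appeal to Improper/Anonymous Authority",
                           "Appeal to Money", "Appeal to Novelty",
                           "Association Fallacy", "Genetic Fallacy"],
    "Mathematical Fallacies": ["Faith in Probability", "Gambler's Fallacy",
                               "Insufficient Sample Size", "Pseudo-Precision",
                               "Unrepresentative Sample"],
    "Unsupported Assertions": ["Arguing from Ignorance",
                               "Unjustified Comparison",
                               "Unjustified Distinction"],
    "Anecdotal Arguments": ["Correlation Implies Causation",
                            "Damning the Alternatives",
                            "Destroying the Exception",
                            "Destroying the Rule", "False Dichotomy"],
    "Omission of Key Evidence": ["Omission of Key Evidence",
                                 "Fallacious Composition", "Fallacious Division",
                                 "Ignoring Available Counter-Evidence",
                                 "Oversimplification"],
    "Linguistic Fallacies": ["Ambiguity", "Equivocation",
                             "Suppressed Quantification",
                             "Vacuous Explanation", "Vagueness"],
}


def candidate_fallacies(categories):
    """Return the fallacies a reviewer should consider, given the
    categories judged to describe the argument under review."""
    return [f for c in categories for f in TAXONOMY[c]]
```

A reviewer who judges an argument to be, say, both circular and diversionary could then narrow the review to the fallacies returned for those two categories.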

A user of the taxonomy may compare the argument he or she reviews to each of the categories in the taxonomy to assess the argument’s validity. For example, the user might first examine the structure of the argument for circularity and then evaluate the relevance of its premises. A verbose argument might contain diversionary premises, and appeals to regulatory standards, practices, conventions, or authorities such as regulatory agencies


should be checked to ensure that they are relevant to the context of the argument. If the argument relies upon statistical evidence, then the user should examine the conclusions that it draws from that evidence for mathematical fallacies. Unsupported assertions and anecdotal evidence may suggest circumstances in which the argument’s claims do not hold. If the argument follows an accepted pattern of reasoning, the user should verify that it has properly instantiated the pattern and not omitted evidence. Finally, the user must be wary of vague or ambiguous terms in the argument because different parts of the argument might interpret these terms differently and lead the user to an unwarranted conclusion.

While somewhat subjective, organizing the fallacies by the types of arguments in which they appear instead of the manner in which they undermine arguments addresses the shortcomings associated with Damer’s typology. For a given argument, a user may assess the argument with respect to the categories that describe it and then determine the set of fallacies that might pertain to the argument. Thus, users must only be familiar with the categories in the taxonomy and not the entire set of fallacies in order to apply the taxonomy, and they are not required to know a priori that an argument is fallacious. This organization also improves the orthogonality of the typology because the type of argument in which a fallacy is likely to appear is relatively static, whereas the manner in which it undermines the argument depends upon the context in which the fallacy occurs.

5.2.3. Fallacy Definitions

Figure 5.2 provides a sample taxonomy entry, and Appendix A contains the complete taxonomy. The entries in the taxonomy were structured following a format similar to that used in each of the source taxonomies that were surveyed. Each entry in the taxonomy consists of a short name of the fallacy, a definition, safety-related examples of the fallacy,


and an exposition of the examples. The examples are intended to demonstrate real-world instances of the fallacies, and in many cases they have been adapted from actual safety arguments.

5.2.4. Completeness & Extensibility

Some of the fallacies excluded from the taxonomy might appear with sufficient frequency to warrant inclusion, and there might exist other fallacies that are relevant to safety argumentation but were not considered. To reduce this risk, five distinct fallacy taxonomies were surveyed in order to ensure coverage of a broad range of fallacies, and the strong degree of overlap between the taxonomies that were surveyed indicates that there is general agreement in the philosophical community as to which fallacies typically appear in arguments. Nevertheless, the taxonomy was designed so that it would be extensible. New fallacies may be added to each category, and new categories may also be added provided they respect the existing orthogonal typology. In addition, specific forms of the fallacies defined in the taxonomy, such as those that are relevant to a particular safety domain, may be added as child elements of those fallacies.

Arguing from Ignorance

An argument supports a claim by citing a lack of evidence that the claim is false. The argument does not exhibit the fallacy if it cites as evidence a sufficiently exhaustive search for counter-evidence that has turned up none.

Example: All of the credible hazards have been identified. Aside from the hazards noted earlier, no evidence exists that any other hazards pose a threat to the safe operation of the system.

This argument attempts to prove a negative (that there are no additional credible hazards to system operation) by noting a lack of evidence contradicting the negative. It does not cite any efforts that have been made to discover such evidence. A mere lack of evidence that a claim is false does not make it true.

Figure 5.2: Sample Taxonomy Entry


5.2.5. Overlap

Overlap between fallacies refers to scenarios in which an inference exhibits multiple fallacies. It can arise either when multiple aspects of an inference are fallacious or when the fallacies’ definitions are not mutually exclusive. In the latter sense, overlapping fallacies are problematic if the strategies for removing them are incompatible. In Damer’s classification, for example, an overlap between a relevance fallacy and a sufficiency fallacy would lead to the dilemma of either removing an inference because it was irrelevant or adding additional support to the argument in order to make it sufficient. Since the categories in Table 5.6 are largely orthogonal, overlap in this sense is unlikely to occur between fallacies in different categories, and fallacies that belong to the same category share similar repair strategies. Moreover, fallacies whose definitions contained only subtle differences were consolidated in order to reduce the likelihood of overlap in the taxonomy.

5.3. Applications

The intended application of the fallacy taxonomy is to facilitate the discovery of fallacies in a system’s safety argument as part of a Pandora analysis; however, the taxonomy may also be used separately to assist safety-case development and pre-acceptance review. Newly discovered fallacies may be added to the taxonomy, and so in addition to its applications as a preventative tool, the taxonomy is also a means by which lessons may be disseminated from failure analyses to those who develop and certify safety-related systems.

5.4. Chapter Summary

Ultimately, safety cases are informal arguments, and so they are subject to various forms of fallacious reasoning that might undermine their validity. The presence of fallacies in a system’s safety argument suggests that design faults might have been introduced


into the system, and so it is important to detect and remove fallacies before a safety-related system is put into operation and to discover them when a failure is observed. The safety-argument fallacy taxonomy introduced in this chapter is designed to assist safety professionals in detecting fallacious reasoning in safety arguments, and it may also be used by safety-case developers as a reminder of the types of reasoning they should avoid. While the taxonomy is intended to be used as part of a Pandora analysis, it may also be used separately, and new entries and categories may be added to the taxonomy based on lessons learned from failure analyses. The results of a controlled study to evaluate the taxonomy’s effect in improving fallacy detection rates are reported in Chapter 7.


Chapter 6

MSAW Case Study of Pandora

The author applied Pandora to a series of accidents involving the MSAW system described in Chapter 1 to compare its lessons and recommendations to those of the official investigations into the accidents. MSAW has been in use since the early 1970s, and it is suitable for this case study because it was initially developed, and has since been improved, in response to a series of accidents. This chapter provides a brief description of MSAW and the accidents surrounding its development and then describes the methodology of the case study, the lessons that were obtained by applying Pandora to the MSAW-related accidents, the results of the case study, and the threats to experimental validity.

6.1. Overview of the MSAW System

The Minimum Safe Altitude Warning system (MSAW) is a software system intended to help prevent controlled flight into terrain (CFIT), which typically occurs when a flight crew loses situational awareness and pilots a serviceable aircraft into terrain or an obstacle. MSAW is a function of the Automated Radar Terminal System (ARTS) family of air traffic management systems deployed by the U.S. Federal Aviation Administration (FAA) in


the early 1970s and the Standard Terminal Automation Replacement System (STARS), which is replacing ARTS. The FAA developed MSAW in response to a 1972 commercial aviation accident near Miami, Florida and began to deploy the system in 1977.

MSAW alerts air traffic controllers when an aircraft descends, or is predicted by the system to descend, below a predetermined minimum safe altitude. Upon receiving an MSAW alert, a controller is required to identify the aircraft that triggered the alert and issue an advisory to the flight crew or to the controller handling the aircraft. An MSAW alert consists of a visual indication beside the aircraft’s data block on a controller’s radar display and an aural alarm that sounds in the control room.

MSAW detects altitude violations via two mechanisms: general terrain monitoring and approach path monitoring. General terrain monitoring applies to most aircraft that are equipped with altitude-encoding transponders and detects violations by comparing the altitude reported by an aircraft’s transponder to the minimum safe altitude for the aircraft’s location. If the reported altitude is below the minimum safe altitude, then MSAW will alert the controller. Approach path monitoring applies only to aircraft on final approach (where the risk of CFIT is greatest) and predicts altitude violations by calculating an aircraft’s descent path and then comparing it to the standard descent path for the approach course. If MSAW determines the aircraft’s approach to be too steep, it alerts the controller.
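The two checks just described can be sketched roughly as follows. This is a simplified illustration only: the function names, units, and tolerance value are invented here, and the operational system involves radar tracking, trend prediction, and alert timing logic well beyond these comparisons.

```python
# Illustrative sketches of MSAW's two alerting checks.
# All names, units, and thresholds are hypothetical simplifications.

def terrain_monitor_alert(reported_altitude_ft: float, msa_ft: float) -> bool:
    """General terrain monitoring: alert when the transponder-reported
    altitude is below the minimum safe altitude for the aircraft's location."""
    return reported_altitude_ft < msa_ft


def approach_monitor_alert(descent_path_deg: float,
                           standard_path_deg: float,
                           tolerance_deg: float = 1.0) -> bool:
    """Approach path monitoring: alert when the computed descent path is
    steeper than the standard descent path by more than a tolerance."""
    return descent_path_deg > standard_path_deg + tolerance_deg
```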

Each MSAW installation is customized to the local facility at which it will operate. Site adaptation variables specify the locations and lengths of runways and the dimensions of approach capture boxes, which MSAW uses to determine whether an aircraft is flying a final approach course. MSAW computes minimum safe altitudes from a local terrain database that provides elevation data for the surrounding environment. If an MSAW installation


generates excessive false alarms, areas known as inhibit zones may be defined temporarily to exclude problematic regions of airspace from MSAW processing until adjustments are made to other site adaptation variables.
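The interaction between inhibit zones and alerting, which figures in several of the accidents discussed below, can be sketched as follows. The rectangular-zone representation and all names here are hypothetical simplifications of the actual site adaptation data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InhibitZone:
    """Hypothetical axis-aligned region of airspace excluded from MSAW
    processing; real inhibit zones are facility-defined site adaptation data."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def msaw_alerts(x: float, y: float, reported_alt_ft: float, msa_ft: float,
                inhibit_zones: List[InhibitZone]) -> bool:
    """Return True if MSAW would alert the controller. A violation occurring
    inside an inhibit zone produces no alert, the failure mode cited in the
    USAir 105 and Korean Air 801 accidents."""
    if any(zone.contains(x, y) for zone in inhibit_zones):
        return False  # MSAW processing is suppressed in this region
    return reported_alt_ft < msa_ft
```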

Since its initial development, several accidents have occurred in which the failure of MSAW to alert a controller to an altitude violation was cited as a contributory factor. The National Transportation Safety Board (NTSB) investigated each of these accidents, and in section 6.3 the NTSB’s findings are compared to those that were obtained by applying Pandora to the accidents. The accidents, beginning with the 1972 accident that motivated the development of MSAW, are summarized below:

• Eastern Air 401 (December 29, 1972) – Although the controller noticed the aircraft’s low altitude and queried the flight crew, he concluded that the flight was in no immediate danger after receiving a positive response. This accident motivated the development of the MSAW system.

• USAir 105 (September 8, 1989) – The aircraft began a premature descent, but MSAW failed to alert the controller because the descent occurred inside an inhibit zone. In response, the FAA revised its site adaptation guidance “to minimize the extent of MSAW inhibited areas” [3].

• TAESA Learjet 25D (June 18, 1994) – The aircraft generated no MSAW alerts because the site variables, including those specifying the runway layout for the airport, were incorrect. In response, the FAA conducted a national review of MSAW site adaptation variables at all its facilities.


• Beechcraft A36 (January 29, 1995) – MSAW issued four visual alerts concerning the accident aircraft, but the controller did not notice them. This accident prompted the FAA to begin installing aural alerts at all its MSAW-equipped facilities.

• Piper PA-32-300 (October 2, 1996) – An inspection of the ATC facility involved in the accident revealed that the MSAW aural alarm speaker had been muted. Consequently, the FAA began requiring facility supervisors to inspect the speakers at the beginning of each shift.

• Korean Air 801 (August 6, 1997) – MSAW did not generate any alerts for the aircraft because an inhibit zone had been defined that encompassed almost the entire radar service area. The FAA recertified all of its MSAW installations, revised its training and site adaptation procedures, and established a centralized configuration management program for MSAW.

• Gates Learjet 25B (January 13, 1998) – The aircraft crashed east of the runway, but MSAW did not generate any alerts because the site adaptation variables for the runway’s final approach course were incorrect. This accident occurred as the FAA began revising its configuration management program for MSAW in the wake of the Korean Air 801 accident.

6.2. Methodology

The objective of the case study was to apply Pandora to most of the accidents listed above, beginning with the USAir 105 accident on September 8, 1989 and ending with the Korean Air 801 accident on August 6, 1997, and then compare Pandora’s lessons to the findings of the NTSB’s official investigations. The Eastern Air 401 accident was excluded from the study because MSAW did not exist prior to this accident, and the Gates Learjet


accident was excluded because the findings of the investigation into the accident were encompassed by those from previous MSAW-related accidents.

For each accident, the author developed a pre-failure safety argument for MSAW based on the safety rationale for the system at the time of the accident. The author then applied Pandora using the evidence obtained from the official investigation to identify flaws in the safety argument that could have contributed to the accident, producing a post-failure safety argument in which the flaws that were identified by applying Pandora were corrected. Finally, the author compared the post-failure safety argument to the findings and recommendations from the official investigation. Hindsight bias (that is, prior knowledge of the outcomes of the NTSB’s investigations that might influence the application of Pandora) was a major concern for this case study, and the measures that were taken to mitigate it are discussed below and in section 6.5.1.

6.2.1. Constructing the Factual Basis

There does not appear to be a publicly available safety case for the MSAW system, so each of the pre-failure arguments was derived from the National Transportation Safety Board’s (NTSB) final report on the Korean Air 801 accident [3]. The report contains an extensive review of MSAW and the FAA’s management of the system, and it includes an overview of MSAW, a chronology of the system’s development and the accidents surrounding it, and the findings pertaining to MSAW from the Korean Air accident. The chronology lists the lessons and recommendations from the investigations into each accident as well as the corrective actions taken by the FAA.

Attempting to construct the pre-failure safety arguments directly from the NTSB’s final report on the Korean Air 801 accident would have introduced a high risk of hindsight


bias. To reduce this risk, the relevant sections of the report were partitioned into individual statements, which yielded 289 statements. Each statement was then categorized according to the earliest accident prior to which the information contained in the statement would have been known by the FAA. For example, the Beechcraft A36 accident in 1995 revealed the importance of an aural alert, so it was assumed that this information was unknown prior to the accident. In general, findings from one accident investigation were assumed to be known by the FAA prior to the subsequent accident, and dated statements (e.g., “In a May 31, 1977 letter, the FAA advised the Safety Board that…”) were assumed to be known prior to the earliest accident that occurred after the date. In some cases, however, dated statements were moved later in the timeline because their significance would not have been recognized until after a particular accident (e.g., the modifications that were made to the MSAW system at Guam prior to the Korean Air accident).
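The date-based rule above can be sketched as a simple assignment of each dated statement to the first accident occurring after its date. The accident list is taken from section 6.1; the function and data shapes are hypothetical reconstructions of the manual process, which also involved judgment calls such as the Guam example.

```python
from datetime import date
from typing import Optional

# MSAW-related accidents in chronological order (from section 6.1).
ACCIDENTS = [
    ("USAir 105", date(1989, 9, 8)),
    ("TAESA Learjet 25D", date(1994, 6, 18)),
    ("Beechcraft A36", date(1995, 1, 29)),
    ("Piper PA-32-300", date(1996, 10, 2)),
    ("Korean Air 801", date(1997, 8, 6)),
]


def known_prior_to(statement_date: date) -> Optional[str]:
    """Assign a dated statement to the earliest accident occurring after its
    date, i.e. the first accident before which the statement's information
    would have been known to the FAA."""
    for accident_name, accident_date in ACCIDENTS:
        if statement_date < accident_date:
            return accident_name
    return None  # statement postdates every accident in the study
```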

6.2.2. Deriving the Pre-Failure Safety Arguments

After categorizing the statements from the report, a pre-failure safety argument for the USAir 105 accident was constructed by extracting statements that were known prior to the accident and discarding those that did not pertain to the safety rationale for MSAW. This process yielded nine statements, including the high-level safety requirement that MSAW “visually and aurally [alerts] a controller whenever an IFR-tracked target with an altitude encoding transponder (Mode C) descends below, or is predicted by the software to descend below, a predetermined safe altitude” [3]. The other statements discussed the monitoring and visual alerting capabilities of MSAW, controller training, and the fact that MSAW may be inhibited to prevent nuisance alerts. These nine statements constituted the factual basis for the pre-failure safety argument.


As with each of the arguments produced for this case study, the pre-failure safety argument was expressed in Goal Structuring Notation (GSN) [27]. A portion of the argument is presented in Figure 6.1, and Appendix B contains the complete safety arguments for each accident annotated with post-failure revisions. To construct the argument, each piece of information from the factual basis was represented as a node in GSN, and then evidentiary relationships between the nodes were inferred. The complete argument contains 23 nodes, including goals, context, and solutions. Eighteen nodes correspond directly to information contained in the factual basis of the argument. Three of the remaining nodes are

Figure 6.1: Pre-Failure Safety Argument for the USAir 105 Accident

G01: Air traffic controllers will detect altitude deviations of transponder-equipped aircraft and take timely action to assist the flight crew.

G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude.

G03: The controller will relay an MSAW alert to the flight crew so that they can take remedial action.

G04: Local controllers at non-approach control towers devote the majority of their time to visually scanning the runways and local area.

G05: Controller issuance of a MSAW alert is a first-priority duty.

G06: Controllers are adequately trained to use MSAW.

G07: The frequency of nuisance MSAW alerts will be minimized.

G08: New controllers are trained to use MSAW in their initial schooling and on the job.

G09: Controllers who were employed when MSAW came into service were trained on the job to use MSAW.

G10: MSAW may be inhibited when its continued use would adversely impact operational priorities.

C01: An "altitude deviation" is an instance in which an aircraft has descended, or is predicted to descend, below the MSA for the region of airspace in which it is operating.

C02: "Assist the flight crew" refers to the issuance of a low-altitude advisory to the flight crew.

S01: FAA Order 7210.3

S101: FAA Order 7110.65


goals that were added to enhance the validity of the argument (nodes G01, G06, and G07 in Figure 6.1), and the other two are context nodes that define unfamiliar terms (nodes C01 and C02). The top-level claim of the argument (G01) was inferred from the claims that MSAW will detect altitude violations and alert a controller (G02) and that controllers will relay MSAW alerts to flight crews (G03).

Using the USAir 105 argument as a baseline, pre-failure arguments for each of the other accidents were constructed by adding the additional information that was known prior to each subsequent accident. By the time of the Korean Air accident in August 1997, the FAA had made significant changes to MSAW and its configuration management program for the system, so the pre-failure argument was rebuilt from the basis as was done for USAir 105. Since the basis for the pre-failure arguments might not reflect the actual safety rationale for MSAW at the time, the arguments are hypothetical; however, if additional safety-related information concerning MSAW was considered as part of the official investigations into the accidents, it does not appear to be publicly available [66].

6.2.3. Applying Pandora

The author applied Pandora to each of the pre-failure arguments to produce corresponding post-failure safety arguments. When counter-evidence refuting a claim was elicited, the set of evidence considered by the official investigations was used to determine whether such evidence existed. The fallacies that were discovered in each of the arguments were noted, and those fallacies were then corrected in the post-failure arguments. This process was also subject to hindsight bias since the author was aware of the actual recommendations of the official investigations and the corrective actions taken by the FAA, so


corrections in the post-failure arguments were constrained to what was required to address the counter-evidence that was elicited.

6.3. Lessons Learned From Pandora

This section compares the lessons that were obtained from the Pandora analyses of each of the accidents selected for the case study to the findings of the NTSB’s official investigations. The lessons derived from applying Pandora to each accident are summarized in the subsections below.

6.3.1. USAir 105

The first fallacy identified from applying Pandora to the USAir 105 pre-failure safety argument concerned goal G02 in Figure 6.1. Upon evaluating this claim, the Pandora analysis posed the question, “Is there evidence from the accident that MSAW did not visually alert a controller when an IFR-tracked target descended below a predetermined safe altitude?” MSAW did not generate any alerts regarding USAir 105 because the altitude violation occurred in a region of airspace in which MSAW processing had been inhibited. Therefore, goal G02 was invalidated. This fallacy was corrected in the post-failure safety argument by weakening the claim: “MSAW visually alerts a controller whenever an IFR-tracked target descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.” This change consequently invalidated the top-level claim of the argument, goal G01, because the possibility of MSAW inhibitions created instances in which a controller might not be alerted to an altitude violation. To remedy this problem, a new claim was added to the argument that the risk of an altitude violation occurring in MSAW-inhibited airspace has been sufficiently


reduced. How to support this claim would be left to the discretion of the FAA, but the post-failure argument suggested three convergent approaches:

• Showing that the creation of inhibit zones has been minimized to instances in which continued use of MSAW would adversely impact operational priorities;

• Showing that the creation of inhibit zones is a temporary measure; and

• Showing that alternate methods for detecting altitude violations in inhibited airspace have been implemented.

Due to the risk of hindsight bias, it would be inappropriate to conclude that these approaches would have been suggested if Pandora had been applied immediately following the accident.

The official investigation into the USAir 105 accident recommended that the FAA “provide site adaptation guidance to encourage modification of Minimum Safe Altitude Warning parameters, as appropriate, to minimize the extent of inhibit areas.” This recommendation is more specific than the risk-reduction claim that was added to the post-failure argument but consistent with the first of the suggested approaches for satisfying the claim.

6.3.2. TAESA Learjet

The Pandora analysis of the TAESA Learjet accident focused on the argument supporting goal G02 in Figure 6.1. This argument is shown in Figure 6.2 with post-failure revisions indicated by a bold outline. Goals G13 and G16 were invalidated based upon evidence from the accident that the MSAW site variables were incorrect, and goal G15 was invalidated based upon evidence that MSAW did not generate an alert despite receiving radar returns indicating that the aircraft had violated its minimum safe altitude. To restore validity to the argument, claims were added stating that MSAW site variables are


Figure 6.2: Excerpt of the Safety Argument for the TAESA Learjet Accident

G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.

G11: Approach path monitoring detects aircraft on final approach that have descended, or will descend, below the glideslope path unless the violation occurs in an inhibited region of airspace.

G12: General terrain monitoring detects aircraft that have descended below the MSA within the local airspace unless the violation occurs in an inhibited region of airspace.

G13: Approach capture boxes aligned with runway final approach courses specify the regions subject to approach path monitoring.

G14: MSAW uses a "pseudo-glideslope" underlying the glideslope path to detect altitude violations during final approach.

G15: MSAW raises an alert if an aircraft descends below the pseudo-glideslope path.

G16: MSAW uses a terrain database customized for the environment around each airport to detect MSA violations.

C03: FAA technical document NAS-MD-684

C04: An approach capture box is a rectangular area surrounding a runway and final approach course.

G201: The runway parameters specified for the capture box definitions have been verified to be correct.

G202: MSAW will issue an alert if a radar return for a tracked aircraft contains an altitude value below the MSA.

G203: The risk of MSAW discarding a genuine radar return indicating that an aircraft has descended below the MSA has been sufficiently reduced.

G204: The terrain data have been verified to correspond to the actual terrain surrounding the airport.


verified to be correct (goals G201 and G204) and recommending that the FAA show either

that MSAW will issue an alert when it receives a radar return indicating a violation (goal

G202) or that the risk of MSAW discarding a legitimate return indicating a violation has

been sufficiently reduced (goal G203). Supporting these claims would again be left to the

FAA.

The official investigation recommended that the FAA “conduct a complete national

review of all environments using MSAW systems. This review should address all user-

defined site variables for the MSAW programs that control general terrain warnings, as

well as runway capture boxes, to ensure compliance with prescribed procedures.”

6.3.3. Beechcraft A36 & Piper PA-32-300

In both the Beechcraft A36 and Piper PA-32-300 accidents, controllers testified that

they did not notice any MSAW alerts for the accident aircraft even though facility logs

indicated that alerts were generated. In the Beechcraft accident, the controller did not

observe the visual alerts because he was attending to other duties, and the ATC facility

was not equipped with an aural alarm. In the case of the Piper accident, NTSB investiga-

tors found that the aural alarm speakers had been muted due to frequent nuisance alerts.

This evidence was elicited upon considering the claim that “the controller will relay an

MSAW alert to the flight crew so that they can take remedial action” (goal G03 in Figure

6.1) in the pre-failure arguments for both accidents. In these accidents, the controllers did

not relay the alerts because they were unaware of them. Therefore, the claim was invali-

dated, and a new premise was added to repair the arguments: “The controller will recog-

nize an MSAW alert when one occurs.” Supporting this claim would require the FAA to

show that MSAW alerts are sufficiently conspicuous to attract a controller’s attention,


which could entail design changes. Moreover, because the alarm speaker had been muted

at the facility involved in the Piper accident due to frequent nuisance alerts, goal G07,

“The frequency of nuisance MSAW alerts will be minimized,” was invalidated. The argu-

ment supporting this claim was repaired by adding a new premise calling for the FAA to

routinely review nuisance alerts and make system design and configuration changes to

reduce them while minimizing the extent of inhibit zones.

The official investigations into the Beechcraft and Piper accidents recommended that

the FAA install aural alarms at all its MSAW-equipped facilities and that it conduct routine

inspections of the aural alarm speakers to ensure that they are not muted, respectively.

6.3.4. Korean Air Flight 801

The final accident, the Korean Air flight 801 crash at Guam on August 6, 1997, was

a major accident with 228 fatalities. By 1997, the FAA had made several design changes

to MSAW and its configuration management program for the system, and it had issued

more statements regarding the safety rationale behind MSAW. To account for this new

information, the pre-failure safety argument for MSAW was rebuilt from the factual basis

as was done for the USAir 105 argument. Figure 6.3 presents the revised top-level argu-

ment, and Figure 6.4 depicts the argument that MSAW inhibit zones have been mini-

mized.

The Pandora analysis of the pre-failure safety argument for the Korean Air 801 acci-

dent noted a minor fallacy in the argument supporting goal G02 in Figure 6.3, but this fal-

lacy was a precursor to more significant problems that would be discovered later in the

analysis. Part of the argument supporting this goal claims that MSAW will generate an

alert if an aircraft descends below the normal descent path for a final approach course.


This claim was invalidated because the MSAW system at Guam was inhibited at the time

of the accident and thus did not produce an alert when Korean Air 801 descended prema-

turely. The argument was repaired by adding a contextual note that MSAW would not gen-

erate an alert if the system is inhibited. This step in the analysis was important, however,

because it caused the analysis to elicit evidence pertaining to the inhibit zone at Guam.

By the time the analysis arrived at the argument supporting goal G23 in Figure 6.4,

which claims that “MSAW processing is inhibited only in those areas in which continued

use of MSAW functions would adversely impact operational priorities,” evidence of the

large inhibit zone at Guam had already been elicited. Upon considering goal G28, the Pandora analysis

Figure 6.3: Top-Level Safety Argument for the Korean Air 801 Accident

G01: A controller will provide timely notification to the flight crew whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude.
C01: A Mode C transponder is an altitude-encoding transponder.
G02: MSAW visually and aurally alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted by the software to descend below, a predetermined safe altitude.
G03: Controllers vigilantly monitor MSAW and provide timely notification to either another controller or a flight crew when an MSAW alert indicates the existence of an unsafe situation.
G04: Controllers have received guidance on the use of MSAW.
G05: Controllers issue an advisory if MSAW generates an alert.
G06: Controllers receive on-the-job MSAW training.
G07: Controllers employed after MSAW was introduced receive initial hardware training on MSAW.
G08: Controller issuance of an MSAW-based safety alert could be a first-priority duty equal to separation of aircraft.
G09: Controllers are briefed on how to respond to MSAW alerts.
G501: Controllers notice MSAW alerts when they occur.


Figure 6.4: Inhibit-Zone Minimization Argument for the Korean Air 801 Accident

G19: MSAW site variables have been reviewed to ensure compliance with prescribed procedures.
G23: MSAW processing is inhibited only in those areas in which continued use of MSAW functions would adversely impact operational priorities.
G26: MSAW functions can be temporarily inhibited when their continued use would adversely impact operational priorities.
G27: Site adaptation guidance to minimize the extent of MSAW inhibited areas has been provided.
G28: A brief written report is sent to the FAA air traffic directorate whenever MSAW functions are inhibited.
G29: Assurance of continued positive radar identification could place distracting and operationally inefficient requirements upon the local controller.
G503: The FAA periodically evaluates modifications to MSAW and to MSAW site variables in order to reduce the extent of inhibit zones.
G504: The air traffic directorate must approve the report before MSAW functions may be inhibited.
G505: Technological measures exist to prevent unauthorized modification of MSAW site variables, including inhibit zones.
G506: MSAW site variables are routinely audited to ensure compliance with site adaptation procedures.
S01: FAA Order 7210.3
S02: FAA Order 7110.65


called for evidence of the report that should have been sent to the FAA regarding the

inhibit zone. Since the FAA could not produce this report as part of the NTSB’s investiga-

tion, goal G28 was invalidated, and two new premises were added to strengthen assurance

that reports would be sent whenever MSAW functions are inhibited (goals G504 and

G505). Goal G19, which claims that “MSAW site variables have been reviewed to ensure

compliance with prescribed procedures,” was subsequently invalidated because the extent

of the inhibit zone at Guam contradicted the intent of the FAA’s program for minimizing

inhibit zones.

Fallacies were discovered in the arguments supporting two other claims in the safety

case. The first concerned the claim that MSAW visually and aurally alerts a controller

when it detects or predicts an altitude violation. This claim was invalidated because the

facility at which MSAW was installed lacked an aural alarm. The second fallacy was dis-

covered in goal G05 in the top-level argument in Figure 6.3 because MSAW generated a

visual alert, but the controller did not notice the alert.

The NTSB’s investigation into the Korean Air flight 801 accident found that the

MSAW system at Guam had been inhibited and that it would have generated an alert at

least one minute prior to the accident had it been functioning properly. Consequently, the

NTSB cited the inhibition of the MSAW system at Guam as a contributory factor in the

accident. Although the NTSB did not issue any recommendations concerning MSAW in

its report, the FAA took several corrective actions on its own initiative. The FAA’s actions

included a revised training program for the operation and maintenance of MSAW, the

development of uniform site adaptation parameters for MSAW, and centralized configuration

management of the system.


6.4. Discussion

The Pandora analysis of the accidents included in this case study was fairly consistent

with the NTSB's official investigations, but there were some important differences, which

are summarized below. Limitations of the case study methodology that might affect these

observations are discussed in section 6.5. The key findings of this case study were that:

• The Pandora analysis identified all the defects in the MSAW assurance case that

the NTSB identified in its investigations of the accidents.

• The recommendations obtained by applying Pandora tended to be more general

than those of the NTSB.

• The evidence elicited by applying Pandora spanned the breadth of MSAW-related

evidence considered by the NTSB, but in some cases the NTSB considered evi-

dence in more depth.

• The Pandora analysis unnecessarily considered claims that were known to have

been true.

• Potentially vulnerable related systems were not included in the analysis.

6.4.1. Identification of Safety-Case Defects

Caution must be used in interpreting these findings due to the risk of hindsight bias;

however, the Pandora analysis identified all the defects in the MSAW assurance case that

the NTSB cited in its final report. Specifically, it addressed the need to minimize MSAW

inhibit zones (USAir 105), the need to verify site adaptation variables to ensure correct

operation of MSAW (TAESA Learjet 25D), the failure of MSAW alerts to attract control-

lers’ attention (Beechcraft A36 and Piper PA-32-300), and the lack of centralized configu-

ration management of MSAW to prevent unauthorized modifications to site adaptation


variables and inhibit zones (Korean Air 801). Moreover, the Pandora analysis arrived at its

findings through a systematic traversal of the safety argument instead of an exhaustive

search for evidence.

6.4.2. Generality of Recommendations

The recommendations obtained by applying Pandora tended to be more general than

those issued by the NTSB. Some of the NTSB’s recommendations called for the FAA to

take a specific course of action to address a particular finding without presenting compel-

ling evidence that the course of action would prevent similar accidents from occurring.

For example, when the NTSB learned that the controller involved in the Beechcraft A36

accident was unaware of the visual alerts that MSAW generated for the aircraft, it recom-

mended that the FAA add an aural alert to the system. Without evidence that an aural alert

would have attracted the controller’s attention and prompted him to contact the flight

crew, one cannot be certain that this action would have prevented the accident. Moreover,

alternate corrective measures, such as enhancing the conspicuousness of visual alerts,

might also have addressed the problem adequately.

Through the Pandora analysis, the author arrived at a more general recommendation

that the FAA should provide greater assurance that controllers will recognize MSAW

alerts when they occur. This recommendation gives the FAA greater flexibility in choos-

ing corrective actions while requiring it to justify the sufficiency of whichever measures it

takes. The addition of an aural alert could still be suggested as a possible means of satisfy-

ing the recommendation.


6.4.3. Elicitation of Evidence

Pandora’s ability to elicit evidence depends upon the level of detail in the pre-failure

safety argument and the aggressiveness with which the process is applied. When the

NTSB discovered that the MSAW system at Guam had been inhibited, it probed the com-

plete modification history of the Guam system beginning with its initial installation in

1990. When Pandora was applied to this accident, the analysis elicited evidence of the

inhibit zone, but the author did not probe into the modification history because the analy-

sis did not explicitly call for such evidence. Thus, applying Pandora might not elicit all of

the evidence that is necessary to understand a digital system failure; rather, it identifies

which aspects of the system’s safety argument investigators should focus on in their search

for evidence. It is important for investigators to follow up on evidence they elicit through

Pandora in order to discover details that are not directly related to the pre-failure safety

argument.

6.4.4. Unnecessary Consideration of Claims

In applying Pandora, the author sometimes considered claims that were known to have

been true during the failure. These claims were either vacuously true or there was evi-

dence from the failure that affirmed them. In the pre-failure argument for USAir 105

shown in Figure 6.1, for example, goal G03, which claims that the controller will relay an

MSAW alert to the flight crew, was vacuously true because MSAW did not generate any

alerts. It was therefore unnecessary to evaluate the argument supporting it.

6.4.5. Related Systems

Finally, the Pandora analysis did not consider similar systems that might exhibit the

fallacies it identified in the MSAW safety arguments. During several of its investigations,

the NTSB expressed concern that the problems it discovered in the ARTS implementations of MSAW would also manifest in STARS, the replacement system for ARTS. It

issued several safety recommendations asking the FAA to ensure that MSAW functions in

STARS would afford the same level of protection as they did in ARTS and to exploit new

features in STARS, such as color displays, to further enhance MSAW. The Pandora analy-

sis did not consider STARS in its analysis because none of the MSAW installations

involved in the accidents were STARS-based. Broadening the scope of Pandora to include

related systems in its analysis is an area for future work.

6.4.6. Counter-Evidence in GSN

While performing this case study, the author also found that the Goal Structuring

Notation (GSN) does not support counter-evidence in safety arguments. GSN was devel-

oped to assist readers in understanding the structure of safety arguments and, in particular,

the evidence, context, justifications, and assumptions supporting each of an argument's

claims. Counter-evidence does not support a claim but rather weakens or refutes the claim.

In the case of MSAW, many of the safety features of the system were introduced to

address counter-evidence that was discovered from prior accidents. It is essential for a

reader to be familiar with this counter-evidence in order to evaluate the soundness of an

argument. The only facility GSN provides for expressing counter-evidence is the context

node, which assists the reader in evaluating a claim but does not directly support it. Con-

text nodes are often used to clarify unfamiliar terms, state facts that are not claims, or

make references to other documents, and so their meaning in a safety argument is ambigu-

ous. The use of a context node to express counter-evidence might confuse the reader

because nodes in GSN generally support the claims to which they are attached.


6.5. Threats to Validity

The threats to the internal validity of this case study are hindsight bias and the restriction

of evidence that was available to the Pandora analysis. The threats to external validity

arise from the use of a single system for the case study and the fact that all the analyses

were performed by the author. Each of these threats is discussed below.

6.5.1. Threats to Internal Validity

Hindsight bias poses the most significant threat to the validity of this case study. The

author, who performed the case study, was aware of the chronology of MSAW at the out-

set, and the accident history of MSAW was mixed with its safety rationale in the NTSB’s

final report on the Korean Air 801 accident. There is a risk that the author inadvertently

used information learned from later accidents when he constructed the pre-failure safety

arguments, applied Pandora, and produced the post-failure arguments.

To reduce the risk, the author split the information in the report into individual facts

and then categorized each fact according to the earliest accident by which it was known to

the FAA. He then constrained the factual basis for each of the pre-failure arguments to

what was known prior to the corresponding accident, and traceability was enforced from

the nodes in the pre-failure arguments to the basis. The author also constrained the evi-

dence that was available to the Pandora analysis to that which was uncovered during the

official investigation (creating an additional threat, which is described below).

Finally, the author intentionally chose a naïve approach to repairing the fallacies that

the Pandora analysis identified in the pre-failure arguments in order to avoid exploiting

insight gained from subsequent investigations or evidence that the Pandora analysis did

not explicitly elicit. Rather than making detailed revisions to the arguments, the author


simply added premises to the claims that were invalidated in order to address the specific

counter-evidence it elicited. Despite these measures, there is still a risk that this analysis

benefited from hindsight.

Because the author restricted the evidence that was available to Pandora to that which

was elicited during the official investigation of each accident, it is possible that the Pan-

dora analysis affirmed some claims for which it would have discovered counter-evidence

if that evidence had been available to it. Hence, the fallacies that the analysis identified

might be a subset of the fallacies it would have discovered in a real-world setting. More-

over, the author refrained from following up on the evidence that was elicited (for exam-

ple, by exploring the modification history of MSAW when the analysis discovered

evidence of the Guam inhibit zone) to preserve traceability from the evidence that was

elicited to the claims in the pre-failure safety arguments. A more aggressive approach to

gathering evidence might have revealed additional fallacies in the pre-failure arguments or

produced superior recommendations for addressing fallacies.

6.5.2. Threats to External Validity

The results obtained by applying Pandora to the MSAW system might not generalize

to accidents involving other systems. The MSAW system and the accidents surrounding it

were amenable to analysis because the author was able to derive pre-failure safety argu-

ments from the documented safety rationale and there was sufficient evidence available to

evaluate the arguments, which might not be true of systems or accidents in general. More-

over, the fallacies and recommendations obtained by applying Pandora to a given failure

may vary across investigators. Additional case studies are needed to address these issues

and improve the generality of these results.


Since all the analyses for this case study were performed by the author, repetitions of

the study performed by other researchers are needed to confirm the results from this study.

This threat and the related threat of hindsight bias are significant limitations to the reliabil-

ity of the results of this study.

6.6. Chapter Summary

The results of this case study, while qualified by the risk of hindsight bias, indicate that

Pandora shows promise as an approach to analyzing digital-system failures. Pandora was

applied to five accidents involving the MSAW system, and in each case the Pandora anal-

ysis identified all the defects in the safety rationale for MSAW that the NTSB listed in its

official findings. Since the case study was performed by the author and concerned a single

digital system, these factors must be considered in attempting to generalize the results.

The conclusions of the study and planned future evaluations are discussed in Chapter 8.


Chapter 7

Fallacy Taxonomy Evaluation

Chapter 5 presented the safety-argument fallacy taxonomy, which describes typical

fallacies that might appear in system safety arguments. The taxonomy contains entries for

33 fallacies grouped by the type of argument in which they are likely to appear. Each entry

consists of a fallacy name, definition, and one or more examples of the fallacy in a system-

safety context. The taxonomy is intended to be used as part of a Pandora analysis to assist

investigators in detecting safety-argument fallacies that might have contributed to a sys-

tem failure, but it may also be used separately to facilitate safety-argument review or to

train safety-case developers to recognize and avoid common pitfalls in safety argumentation.

This chapter reports on a controlled experiment to evaluate the taxonomy’s usefulness

as a training aid to help reviewers detect fallacious reasoning in safety arguments. The

experiment involved 20 professional engineers, managers, researchers, and computer-sci-

ence graduate students who were asked to assess the validity of five hypothetical safety

arguments. Participants were assigned randomly either to a control group or to a treatment

group; those assigned to the treatment group were provided with copies of the safety-argu-

ment fallacy taxonomy and asked to read the taxonomy before evaluating the arguments.


The results of the experiment indicate that, compared to the control group, the treatment

group exhibited higher accuracy in detecting fallacious reasoning as well as a lower

acceptance rate of the claims presented in the arguments; however, the difference in per-

formance was not statistically significant for the given sample size. Consequently, addi-

tional trials with larger sample sizes are needed to confirm these results.

7.1. Overview

The purpose of this experiment was to determine whether people familiar with the fallacy taxonomy would be able to detect fallacious reasoning in safety arguments more

accurately than would people who were unfamiliar with the taxonomy.

7.1.1. Population of Interest

Since the taxonomy is targeted at those who develop or review safety arguments, the

population of interest to this study, P, consists of people who develop or review safety

arguments for digital systems or whose occupations might require them to do so. Examples of members of P include:

• Engineers and other system developers;

• Certification officials, quality assurance specialists, and regulatory officials; and

• System accident investigators.

P may be partitioned into two sub-populations, P_c and P_t, which are distinguished solely

by their respective familiarity with the fallacy taxonomy and defined as follows:

P_c: members of P who are unfamiliar with the fallacy taxonomy; and

P_t: members of P who are familiar with the fallacy taxonomy.


Since the fallacy taxonomy is a novel technique, it was assumed that no members of P

were familiar with the taxonomy prior to the experiment; that is, P = P_c.

7.1.2. Population Parameters

Since the purpose of the experiment was to determine whether members of P_t more

accurately detected fallacious reasoning in safety arguments than did members of P_c, the

primary random variable of interest to this experiment was the accuracy rate. For a given

safety argument, a, the accuracy rate is defined to be the proportion of claims in the argument correctly determined by a population either to be supported by fallacious or non-fallacious sub-arguments. The accuracy rates of populations P_c and P_t for a are denoted

respectively by the random variables X_{a,c} and X_{a,t}, and their means are denoted by

μ_{x,a,c} and μ_{x,a,t}. Assuming that X_{a,c} and X_{a,t} are normally distributed, x̄_{a,c} and x̄_{a,t},

the sample means of X_{a,c} and X_{a,t}, are unbiased estimators of μ_{x,a,c} and μ_{x,a,t} [67]. In

practice, assessing the validity of an informal argument is somewhat subjective, and so

correctness for this experiment was measured by comparing the populations against a

baseline developed through expert consensus; the derivation of this consensus is discussed

in section 7.2.5.

In spite of efforts to mitigate the subjectivity associated with computing the accuracy

rate, a second random variable, the acceptance rate, was chosen to provide an objective

basis for comparison between P_c and P_t. For a given argument, a, the acceptance rate,

denoted respectively by Y_{a,c} and Y_{a,t} for P_c and P_t, is the proportion of claims considered that were accepted as valid and, thus, is strictly objective. The mean acceptance rates


of P_c and P_t are denoted by μ_{y,a,c} and μ_{y,a,t}, and assuming as before that the distributions are normal, the sample means, ȳ_{a,c} and ȳ_{a,t}, are unbiased estimators of

μ_{y,a,c} and μ_{y,a,t}, respectively.
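The two rates can be illustrated with a short sketch. The data below are invented for illustration (the claim identifiers, verdicts, and baseline are hypothetical, not taken from the experiment): the accuracy rate is the fraction of a participant's verdicts that match the expert-consensus baseline, and the acceptance rate is the fraction of claims the participant accepted as valid.

```python
# Sketch of the two population parameters (hypothetical data).
# A verdict of True means the reviewer accepted the claim as validly
# supported; the baseline holds the expert-consensus judgments.

def accuracy_rate(verdicts, baseline):
    """Proportion of claims judged the same way as the expert baseline."""
    assert verdicts.keys() == baseline.keys()
    matches = sum(verdicts[c] == baseline[c] for c in baseline)
    return matches / len(baseline)

def acceptance_rate(verdicts):
    """Proportion of claims the reviewer accepted as valid (no baseline needed)."""
    return sum(verdicts.values()) / len(verdicts)

# A hypothetical four-claim argument in which G2's support is fallacious.
baseline = {"G1": True, "G2": False, "G3": True, "G4": True}
reviewer = {"G1": True, "G2": True, "G3": True, "G4": True}

print(accuracy_rate(reviewer, baseline))  # 0.75 -- the reviewer missed the fallacy in G2
print(acceptance_rate(reviewer))          # 1.0  -- the reviewer accepted every claim
```

Note that the acceptance rate requires no baseline at all, which is what makes it a strictly objective complement to the accuracy rate.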

7.1.3. Hypothesis

The fallacy taxonomy was designed to represent fallacies that typically appear in

safety arguments, and it is targeted at system safety professionals, who were assumed to

be unfamiliar with informal fallacies in safety arguments. Thus, users of the taxonomy

were expected to be more familiar with the fallacies defined in the taxonomy than non-users, and so it was hypothesized that members of P_t would exhibit higher accuracy rates

and lower acceptance rates than would members of P_c. More formally, the hypothesis of

this experiment was that, for a given argument a:

H_x: μ_{x,a,c} < μ_{x,a,t}
H_y: μ_{y,a,c} > μ_{y,a,t}          (Eq. 7.1)
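Directional hypotheses of this form are conventionally evaluated with a one-tailed, two-sample comparison of means. The sketch below applies Welch's t statistic to invented per-participant accuracy rates purely as an illustration of such a comparison; it is not the analysis procedure used in the experiment, and the sample values are hypothetical.

```python
# Illustrative one-tailed comparison of mean accuracy rates using
# Welch's t statistic (the sample values below are invented).
from statistics import mean, variance

def welch_t(sample_c, sample_t):
    """Welch's t statistic for the difference in means (treatment - control)."""
    nc, nt = len(sample_c), len(sample_t)
    vc, vt = variance(sample_c), variance(sample_t)  # sample variances
    return (mean(sample_t) - mean(sample_c)) / (vc / nc + vt / nt) ** 0.5

# Hypothetical accuracy rates for control (P_c) and treatment (P_t) participants.
control = [0.55, 0.60, 0.50, 0.65, 0.58]
treatment = [0.70, 0.62, 0.75, 0.68, 0.66]

t = welch_t(control, treatment)
print(round(t, 2))  # a large positive t would favor Hx (treatment more accurate)
```

A one-tailed p-value would then be obtained from t and the Welch–Satterthwaite degrees of freedom; with only a handful of participants per group, even a sizable difference in sample means may fail to reach significance.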

7.2. Experimental Methodology

The experiment followed a double-blind, posttest-only control group design derived

from Campbell [68], and it was developed to comply with the guidelines of the University

of Virginia Institutional Review Board for the Social and Behavioral Sciences [69]. The

sample population, S, consisted of twenty volunteers who were either computer-science

graduate students, researchers, engineers, or project managers. Recruiting campaigns were

conducted within the School of Engineering at the University of Virginia, at the NASA

Langley Research Center in Hampton, Virginia, and at the 2006 National Software and

Complex Electronic Hardware (SCEH) Standardization Conference, hosted by the Federal


Aviation Administration (FAA), which is frequented by digital-safety professionals in the

commercial aviation industry. The author administered the campaign at the University of

Virginia, and Michael Holloway of NASA Langley Research Center administered the

campaigns at the Center and at the SCEH conference. Recruiting advertisements were disseminated via e-mail, the World Wide Web, and paper flyers. Volunteers who were interested

in participating in the experiment were encouraged to contact an experimenter via e-mail

to enroll. Participants in the experiment received no compensation. A breakdown of the

sample population’s composition by recruitment site is presented in Table 7.1.

University of Virginia. Eleven people from the University of Virginia School of
Engineering and Applied Science volunteered to participate, and nine completed the
experiment. Of the nine participants, six were computer-science graduate students, two
were computer-science professors, and one was a computer-science researcher. Each of the
participants had received a bachelor’s degree in computer science or mathematics, five had
received master’s degrees in computer science, and three had received the Ph.D. in
computer science. Two participants indicated that their research concerns the development or

Table 7.1: Sample Population Composition by Recruitment Site

  Site                           Enrolled^a   Completed^b   Attrition^c
  University of Virginia            11            9           2 (18%)
  NASA Langley Research Center      12            8           4 (33%)
  SCEH Conference                    5            3           2 (40%)
  Total                             28           20           8 (29%)

  a. The number of volunteers who received a survey packet.
  b. The number of volunteers who completed the experiment.
  c. The number of volunteers who did not complete the experiment.


certification of safety-critical systems, and seven reported having received formal training

in logic or philosophy.

NASA Langley Research Center. Twelve people from NASA Langley Research Center

volunteered to participate in the experiment; however, two volunteers later withdrew and

two others did not complete the experiment. Of the eight participants who completed the

experiment, six identified themselves as engineers or scientists, one as a project manager,

and one listed his or her job description as being affiliated with mission assurance. Five of

the participants had received bachelor’s degrees in computer science or mathematics; the

other three reported having received bachelor’s degrees in other engineering disciplines.

Each of the participants had received a master’s degree: five in computer science or
mathematics, two in other engineering disciplines, and one in business administration.
Three participants had received the Ph.D.: one in computer science, one in computational
and applied mathematics, and one in materials science and engineering. Five participants
indicated that their occupations concern the development or certification of safety-critical
systems, and six reported having received formal training in logic or philosophy.

SCEH Conference. Five people who responded to the SCEH Conference invitation
volunteered to participate in the study, and three completed the experiment. Of the three, two

were systems engineers, and the third was a Ph.D. student. The two systems engineers had

received bachelor’s degrees in electrical and mechanical engineering, respectively, and

one had received a master’s degree in business administration. The student participant had

received a bachelor’s degree in business and computing and a master’s degree in
information systems. All three participants reported that their occupations concerned the
development or certification of safety-critical systems, and two reported having received
formal training in logic or philosophy.

Since the recruitment effort targeted individuals who belonged to the population of
interest, $P$, it was assumed that $S \subset P$. Based upon the assumption that, prior to the
experiment, $P = P_c$, a posttest-only control group design was employed for the experiment
in which members of $S$ were assigned randomly to a control group, $S_c$, or to a treatment
group, $S_t$ [68]. Members of the treatment group were exposed to the fallacy taxonomy,
and then members of both groups completed a survey in which they were asked to evaluate
a set of five safety arguments. Participants were not informed of their group assignments,
and the survey materials they received did not mention the existence of separate control
and treatment groups.

7.2.1. Survey Instruments

Each participant was given a survey packet containing the following materials.

• A cover letter with instructions to the participant for completing the survey;

• A questionnaire on the participant’s educational and professional background;

• A one-page explanation of safety arguments;

• A sample exercise; and

• A set of five safety arguments to be evaluated by the participant.

Participants assigned to the treatment group, $S_t$, also received a copy of the
safety-argument fallacy taxonomy in their packets along with special instructions directing
them to read the taxonomy prior to evaluating the safety arguments. Appendix A contains a copy


of the taxonomy, and Appendix C documents the experimental procedure and contains

copies of the other survey instruments. The survey was administered anonymously.

7.2.2. Safety Argument Design

Five hypothetical safety arguments were included in the survey:

• Argument #1, which contained four claims, concerned an air traffic management
system and was based on the EUR Whole Airspace Air Traffic Management System
Safety Case [62].

• Argument #2, which also contained four claims, concerned a nuclear reactor trip
system and was created by Michael Holloway.

• Argument #3, which contained seven claims, concerned an electronic throttle for
an automobile and was adapted from Kelly’s conversion of the Jaguar XK8
Electronic Throttle safety case into Goal Structuring Notation (GSN) [70].

• Argument #4, which contained nine claims, concerned a control system for an
explosive chemical plant and was adapted from Storey’s model safety assessment
for such a plant [35].

• Argument #5, which contained twenty claims, also concerned a nuclear reactor
shutdown system and was adapted from Kelly’s conversion of Adelard’s nuclear
trip system safety case into GSN [34].

Each safety argument was presented as a sequence of numbered claims with the
evidence supporting each claim presented as a bulleted list beneath the claim. In order to
make the arguments realistic, the claims were organized into a tree structure in which a
piece of evidence supporting a claim was occasionally elaborated as its own claim
elsewhere in the argument. Since the participants were not expected to be familiar with the


application domains of the safety arguments, contextual information was provided at the

beginning of each argument that introduced the system in question and its safety concerns.

Two questions to the participants followed each of the claims and accompanying
supporting evidence. The first question asked, “Assume the evidence is true. Is the evidence
sufficient to convince you that the claim is also true?” Participants were asked to respond
“Yes” or “No” to this question; participants who responded in the negative were asked to
explain why the evidence failed to convince them of the truth of the claim.

Selecting the safety arguments to include in the survey was a major design challenge

due to the variety of factors that had to be considered. These factors included:

• Representativeness of the arguments: For the results of the experiment to be

meaningful, the safety arguments needed to be typical of real-world arguments.

• Tractability: The arguments needed to be short enough for participants to be able

to complete the entire survey in about three hours.

• Accessibility: The arguments needed to be amenable to analysis by persons with

little to no experience in safety argumentation and who were potentially unfamiliar

with the application domains of the systems in question.

• Taxonomy Coverage: It was desirable for the arguments to cover a broad
cross-section of the fallacies in the taxonomy.

Two methods were used to develop the safety arguments to be included in the survey.

In the first method, an initial draft of an argument was prepared either from scratch or

from an existing safety argument. Any fallacies in the draft were either expunged or noted,

and then new fallacies were injected into the argument. This method appealed to the
tractability and accessibility concerns since the length and level of technical detail in the
argument could be specially tailored to suit the purposes of the experiment. Furthermore, by

controlling the fallacies that were present in the arguments, good coverage of the fallacies

defined in the taxonomy could be ensured. Arguments #1 and #2, each of which contained
four claims, were developed with this method.

A drawback of the first method was that creating arguments specifically for the
experiment and injecting fallacies into them threatened to bias the arguments. Thus, a
second method, which provided greater confidence that the arguments would be
representative of real-world safety arguments, was employed to develop the remaining
three: arguments developed independently of the experiment by third parties were sampled
and incorporated into the survey with as few revisions as possible. Arguments #3, #4, and
#5 were developed using this method.

With the exception of the electronic throttle argument, the safety cases sampled for the

survey did not describe operational systems but were instead intended to be used as
models of simple, real-world safety arguments. At the time the survey was developed, very

few safety cases for operational systems were publicly available, and those that were

available were too voluminous to be included in the survey. Model arguments were chosen

instead because they were relatively short (the longest argument contained 20 claims),

making the survey tractable, and because they were designed to be accessible to readers

who were unfamiliar with the application domains of the systems they described.

Although the extent to which these arguments would exercise the fallacies in the
taxonomy was unknown, it was decided that these arguments provided an acceptable
compromise among the design considerations described earlier.


7.2.3. Trials

Three trials of the experiment were conducted, one at each of the recruitment sites,
between June and July 2006. The author administered the trial at the University of
Virginia, and Michael Holloway administered the trials at NASA Langley Research Center
and the SCEH conference. Prior to the commencement of each trial, an experimenter
generated a list of code numbers and randomly assigned half the code numbers to the
control group; the remaining code numbers were assigned to the treatment group. The
experimenter then published the survey packets and marked each packet with a unique
code number generated in the previous step. If the code number was assigned to the
treatment group, the experimenter added a copy of the fallacy taxonomy and the
corresponding instructions to the packet.
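The assignment step described above can be sketched as follows. This is an illustrative assumption, not the experimenters' actual procedure: the 1..n code-number scheme and the seeding are ours.

```python
import random

def assign_code_numbers(n_participants, seed=None):
    """Randomly split a list of unique code numbers into control and
    treatment halves, mirroring the trial-preparation step described
    above. The 1..n code-number scheme is an illustrative assumption."""
    rng = random.Random(seed)
    codes = list(range(1, n_participants + 1))
    rng.shuffle(codes)
    half = n_participants // 2
    # First half of the shuffled list -> control, second half -> treatment.
    return sorted(codes[:half]), sorted(codes[half:])

# A trial with twenty participants yields two disjoint groups of ten.
control, treatment = assign_code_numbers(20)
assert len(control) == 10 and len(treatment) == 10
assert set(control).isdisjoint(treatment)
```

Because the split happens before packets are sealed and shuffled, neither experimenter needs to know which participant ends up with which code number.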

The survey packets were distributed to the participants either in paper form or
electronically. In the case of paper packets, the experimenter sealed each packet and
shuffled the packets together so that the distribution of packets to participants would be
random and the experimenter would not know which packet a participant received or
whether the packet belonged to the control group or to the treatment group. In the case of
electronic packets, a second experimenter, who did not participate in the scoring,
distributed the packets to the participants via e-mail and withheld the assignment of packets
to participants from the first experimenter.

It was expected that the participants would require about two hours to complete the

survey and that those who received treatment-group packets would require an additional

hour to read the fallacy taxonomy. To encourage the participants to analyze the arguments

carefully, however, the experimenters did not want them to feel under pressure to
complete the study quickly, and so in each trial the participants were allotted one week to


return their completed surveys, and requests for time extensions were granted. The
participants were told to expect the survey to take about three hours to complete, and they
were permitted to complete the survey in multiple sittings and in their own environments.

University of Virginia Trial. Paper survey packets were used exclusively in the trial
conducted at the University of Virginia, and the author distributed the packets to
participants and scored their responses. In order to preserve his blindness during the
scoring, the author sealed the survey packets in manila envelopes and shuffled the
envelopes so that he would not know whether a participant received a control-group packet
or a treatment-group packet. Upon collecting the completed packets from the participants,
the author separated the participants’ responses from other material in the packets that
revealed the packets’ assignments. The author then scored the participants’ responses
before removing the blind.

NASA Langley Research Center Trial. Electronic survey packets were used
predominantly in the trial conducted at NASA Langley Research Center, although two
participants chose to receive paper survey packets. Michael Holloway, who did not
participate in the scoring, distributed the packets to the participants, collected their
responses, and forwarded the responses to the author for scoring after stripping them of
information that would compromise the author’s blindness.

SCEH Conference Trial. Electronic survey packets were used exclusively for the trial

involving SCEH Conference volunteers. Michael Holloway managed the distribution and

collection of the packets as in the trial conducted at NASA Langley Research Center.


7.2.4. Response Coding

The data for this experiment consist of the participants’ responses to the questionnaire
and to the questions regarding the safety-argument evaluations. Upon receiving the
completed survey packets from each trial, the author separated the packets into a stack of
completed questionnaires and a stack of completed evaluations. The remainder of the
material in the survey packets was discarded since it contained no experimental data but
did contain information that might reveal the packet’s group assignment.

The data from the paper and electronic survey packets were entered into a computer
database to make them amenable to analysis; this process was fairly straightforward.
Participants’ responses were entered as faithfully as possible with only minor
typographical and formatting changes made to accommodate the data entry requirements.
Responses to the yes-or-no question that followed each claim were coded as a numeric “1”
or “0” (zero), respectively. If no response was given to this question for a particular claim
or if both “Yes” and “No” were checked, no response to the question was coded.
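The coding rule above amounts to a three-valued function; a minimal sketch (the function name is ours, not the study's):

```python
def code_yes_no(yes_checked, no_checked):
    """Code a Yes/No survey response per the rule described above:
    "Yes" -> 1, "No" -> 0, and no box or both boxes -> uncodeable (None)."""
    if yes_checked and not no_checked:
        return 1
    if no_checked and not yes_checked:
        return 0
    return None  # no response given, or both "Yes" and "No" checked

assert code_yes_no(True, False) == 1
assert code_yes_no(False, True) == 0
assert code_yes_no(False, False) is None
assert code_yes_no(True, True) is None
```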

7.2.5. Scoring

Quantitative analyses were performed on the participants’ responses to the yes-or-no
question for each claim. Acceptance rate and accuracy rate values were computed for each
participant’s responses to each argument. The computations are described below.

Acceptance Rate. For an argument, $a$, consisting of $n$ claims, denote participant $p$’s
responses to the yes-or-no questions concerning each of these claims by $r_1, r_2, \ldots, r_n$,
respectively, where $r_i = 1$ if $p$ responded “Yes” to the question concerning the $i$-th claim
and $r_i = 0$ otherwise. Let $c_{a,p}$ be the total number of claims in $a$ for which $p$ entered a
“Yes” or “No” response. Then $y_{a,p}$, the acceptance rate of $p$ with respect to $a$, is:


$$y_{a,p} = \frac{r_1 + r_2 + \cdots + r_n}{c_{a,p}} \qquad \text{(Eq. 7.2)}$$

Thus, a participant’s acceptance rate for a given argument is the ratio of the number of
claims in the argument for which the participant indicated there was sufficient evidence to
convince him that the claims were true to the total number of claims for which the
participant provided a response that could be coded.

Accuracy Rate. The accuracy rate, the primary statistic of this experiment, was computed
by comparing the participants’ responses to the yes-or-no question for each claim to a
baseline sequence of responses. As with the acceptance rate, accuracy rates were
computed for each participant’s responses to each of the five arguments. A participant’s
response for a given claim was judged to be accurate if it matched the baseline response. If
the participant’s response differed from the baseline response or if no response was coded
for the participant, the participant’s response for the claim was judged to be inaccurate.
More formally, for a given participant, $p$, and argument, $a$, consisting of $n$ claims, denote
the baseline responses to $a$ by $b_1, b_2, \ldots, b_n$ and $p$’s responses by $r_1, r_2, \ldots, r_n$ as above.
Define:

$$f(b_i, r_i) = \begin{cases} 1, & b_i = r_i \\ 0, & b_i \neq r_i \end{cases} \qquad \text{(Eq. 7.3)}$$

Then $x_{a,p}$, the accuracy rate of $p$ with respect to $a$, is computed as:

$$x_{a,p} = \frac{\sum_{i=1}^{n} f(b_i, r_i)}{n} \qquad \text{(Eq. 7.4)}$$
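Under the coding just described, the two statistics can be sketched directly; the data below are hypothetical, with `None` marking an uncodeable response:

```python
def acceptance_rate(responses):
    """Eq. 7.2: "Yes" responses (coded 1) divided by the number of
    codeable responses, c_{a,p}; None (uncodeable) is excluded."""
    coded = [r for r in responses if r is not None]
    return sum(coded) / len(coded)

def accuracy_rate(baseline, responses):
    """Eq. 7.4: matches against the baseline divided by the number of
    claims, n; an uncodeable response counts as inaccurate."""
    matches = sum(1 for b, r in zip(baseline, responses)
                  if r is not None and b == r)
    return matches / len(baseline)

# Hypothetical four-claim argument.
baseline  = [0, 0, 1, 1]
responses = [1, 0, None, 1]
assert acceptance_rate(responses) == 2 / 3        # 2 "Yes" of 3 codeable
assert accuracy_rate(baseline, responses) == 0.5  # claims 2 and 4 match
```

Note the different denominators: acceptance uses only codeable responses ($c_{a,p}$), while accuracy divides by all $n$ claims, so a skipped question lowers accuracy but not acceptance.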


Establishing the baseline responses was a challenge due to the subjective nature of
evaluating informal arguments. To mitigate this subjectivity, a panel of three software
safety researchers consisting of the two experimenters and Kelly Hayhurst, a research
scientist at NASA Langley Research Center, responded to the yes-or-no questions in the
survey and then convened to achieve a consensus, which was used as the baseline.

7.3. Results

Table 7.2 presents the mean acceptance and accuracy rates for the control and
treatment groups as well as the difference in performance between the two groups; this
information is depicted graphically in Figures 7.1 and 7.2. The acceptance rates for the
treatment group were lower than those for the control group for each of the five
arguments, and the accuracy rates for the treatment group were higher than those for the
control group in each case as well, but for most of the arguments the differences were small.
Appendix D contains the quantitative data used to obtain these results.

7.3.1. Hypothesis Testing

Hypothesis tests were performed on the acceptance and accuracy rate data for each

argument to determine whether the differences reported in Table 7.2 between the control

Table 7.2: Mean Acceptance and Accuracy Rates

                    Acceptance Rate               Accuracy Rate
  Argument   Control  Treatment      Δ     Control  Treatment      Δ
  Arg. 1        0.35       0.33  -0.02        0.65       0.68  +0.03
  Arg. 2        0.20       0.13  -0.07        0.73       0.88  +0.15
  Arg. 3        0.25       0.21  -0.04        0.73       0.76  +0.03
  Arg. 4        0.41       0.28  -0.13        0.67       0.79  +0.12
  Arg. 5        0.49       0.32  -0.17        0.58       0.62  +0.04


Figure 7.1: Mean Acceptance Rates (bar chart of the control and treatment groups’ mean
acceptance rates for Arguments 1–5)

Figure 7.2: Mean Accuracy Rates (bar chart of the control and treatment groups’ mean
accuracy rates for Arguments 1–5)


group, $S_c$, and the treatment group, $S_t$, were statistically significant. The sample data for
these tests consisted of the participants’ acceptance and accuracy rates for each of the
arguments, and the sample size, $n$, for both $S_c$ and $S_t$ was 10. Since the sample size for
this experiment was small ($n < 30$) and the population variances underlying the sample
data were unknown, hypothesis tests based on the Student-t distribution were used to
account for these factors [67, 68]. Microsoft® Office Excel 2003 was used to perform the
hypothesis tests.

Before performing the hypothesis tests, histogram plots were prepared to confirm that
the statistical distributions underlying the sample data appeared to be normal. A histogram
was plotted for each of the experimental variables (acceptance rate and accuracy rate), for
each of the groups (control and treatment), and for each of the five arguments, yielding
twenty plots in total, which are contained in Appendix D.

7.3.2. Hypothesis Tests on the Acceptance Rate

In order to determine the appropriate T test to perform on the acceptance-rate data for
each argument, it was necessary to determine whether the variances in acceptance rates for
the underlying populations, $P_c$ and $P_t$, were equal. Thus, a two-tailed F test was
performed to compare the variances. For each argument, $a$, let $\sigma^2_{y,a,c}$ denote the
variance of $Y_{a,c}$, the acceptance rate for $P_c$, and let $\sigma^2_{y,a,t}$ denote the variance of
$Y_{a,t}$, the acceptance rate for $P_t$. For a two-tailed F test for variances, the null and
alternate hypotheses are as follows:

$$H_0\colon \sigma^2_{y,a,c} = \sigma^2_{y,a,t} \qquad H_1\colon \sigma^2_{y,a,c} \neq \sigma^2_{y,a,t} \qquad \text{(Eq. 7.5)}$$
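A two-tailed F test of this form can be sketched with SciPy; this is our illustration, not the study's tooling (the study used Excel), and the function name is hypothetical:

```python
from scipy import stats

def two_tailed_f_test(sample_c, sample_t, alpha=0.20):
    """Two-tailed F test for equality of two population variances,
    as described above. Returns (F, two-tailed p-value, reject_H0)."""
    n_c, n_t = len(sample_c), len(sample_t)
    F = stats.tvar(sample_c) / stats.tvar(sample_t)  # ratio of sample variances
    cdf = stats.f.cdf(F, n_c - 1, n_t - 1)           # P(F_{n_c-1, n_t-1} <= F)
    p = 2 * min(cdf, 1 - cdf)                        # two-tailed p-value
    return F, p, p < alpha

# Identical samples give F = 1 and the largest possible p-value.
F, p, reject = two_tailed_f_test([0.3, 0.4, 0.5, 0.6], [0.3, 0.4, 0.5, 0.6])
assert F == 1.0 and not reject
```

The low 80% confidence level discussed next corresponds to `alpha=0.20` here: a larger rejection region compensates for the F test's weak power.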


Since the F test is not very powerful and tends not to reject $H_0$ when the variances are
unequal, a relatively low confidence level of 80% ($\alpha = 0.20$) was used [67].

Table 7.3 contains the results of the variance tests for each argument. For Argument
#3, there was sufficient statistical evidence at the 80% confidence level to reject the null
hypothesis, and so the variances for Argument #3 were assumed to be unequal. For
Arguments #1, #2, #4, and #5, there was insufficient statistical evidence to reject the null
hypothesis, and so the variances for those arguments were assumed to be equal.

Based on the results of the variance tests, pooled T tests were performed for
Arguments #1, #2, #4, and #5, and a T test assuming unequal variances was performed for
Argument #3. The tests were performed at the 95% confidence level ($\alpha = 0.05$). Since
the treatment group was hypothesized to exhibit a lower acceptance rate than the control
group, a right-tailed test was used with the following null and alternate hypotheses:

Table 7.3: Two-Sample F Tests for Variances: Acceptance Rate

                  Arg. 1        Arg. 2        Arg. 3        Arg. 4        Arg. 5
                 C     T       C     T       C     T       C     T       C     T
  ȳ            0.35  0.33    0.20  0.13    0.25  0.21    0.41  0.28    0.49  0.32
  s²           0.06  0.10    0.03  0.03    0.03  0.07    0.05  0.03    0.06  0.06
  n              10    10      10    10      10    10      10    10      10    10
  γ               9     9       9     9       9     9       9     9       9     9
  F               0.60          0.80          0.39          1.72          0.94
  p(F ≤ f)        0.23          0.37          0.09          0.21          0.46
  F critical      0.56          0.56          0.56          1.79          0.56


$$H_0\colon \mu_{y,a,c} = \mu_{y,a,t} \qquad H_1\colon \mu_{y,a,c} > \mu_{y,a,t} \qquad \text{(Eq. 7.6)}$$

Table 7.4 presents the results of the T tests for each argument. There was insufficient
statistical evidence at the 95% confidence level to reject the null hypothesis for any of the
five arguments. The tests for Arguments #4 and #5 shared the lowest p-value, which was
0.07. Thus, the differences in the acceptance rates of the control group and the treatment
group were not statistically significant at the 95% confidence level.

Table 7.4: Two-Sample T Tests: Acceptance Rate

                    Arg. 1       Arg. 2       Arg. 3       Arg. 4       Arg. 5
  Test Type      σ1² = σ2²    σ1² = σ2²    σ1² ≠ σ2²    σ1² = σ2²    σ1² = σ2²
  s_p²                0.08         0.03          N/A         0.04         0.06
  γ                     18           18           15           18           18
  t stat               0.2            1         0.39         1.56         1.53
  p(T ≤ t)            0.42         0.17         0.35         0.07         0.07
  t critical          1.73         1.73         1.75         1.73         1.73

7.3.3. Hypothesis Tests on the Accuracy Rate

Hypothesis testing of the accuracy rate data was conducted following the same process
used to test the acceptance rate data. Histogram plots were prepared to verify that the
distribution underlying the data appeared to be normal, and variance testing was
performed to select the appropriate T test for each of the five arguments. Table 7.5
contains the results of the variance tests, which show sufficient statistical evidence at the
80% confidence level to reject the null hypothesis for Arguments #3 and #5. Thus, a
pooled T test was used


for Arguments #1, #2, and #4, and a T test assuming unequal variances was used for
Arguments #3 and #5.

Table 7.5: Two-Sample, Two-Tailed F Tests for Variances: Accuracy Rate

                  Arg. 1        Arg. 2        Arg. 3        Arg. 4        Arg. 5
                 C     T       C     T       C     T       C     T       C     T
  x̄            0.65  0.68    0.73  0.88    0.73  0.76    0.67  0.79    0.58  0.62
  s²           0.06  0.10    0.02  0.03    0.02  0.06    0.04  0.03    0.02  0.01
  n              10    10      10    10      10    10      10    10      10    10
  γ               9     9       9     9       9     9       9     9       9     9
  F               0.60          0.64          0.39          1.31          2.11
  p(F ≤ f)        0.23          0.26          0.09          0.35          0.14
  F critical      0.56          0.56          0.56          1.79          1.79

Since the treatment group was hypothesized to exhibit a higher accuracy rate than the
control group, left-tailed, two-sample T tests were used to determine whether the
differences in accuracy rates for each argument between the control and treatment groups
were statistically significant. The tests were performed at the 95% confidence level as
before ($\alpha = 0.05$) with the following null and alternate hypotheses:

$$H_0\colon \mu_{x,a,c} = \mu_{x,a,t} \qquad H_1\colon \mu_{x,a,c} < \mu_{x,a,t} \qquad \text{(Eq. 7.7)}$$

Table 7.6 shows the results of the T tests on the accuracy-rate data. At the 95%
confidence level, there was sufficient statistical evidence to reject the null hypothesis for
Argument #2, and the p-value for this argument was 0.03. There was insufficient evidence
to reject the null hypothesis for any of the remaining arguments, and the next highest p-value


was 0.08. Thus, the difference in the accuracy rates of the control and treatment groups

was statistically significant with 95% confidence for Argument #2 only.
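The one-tailed T tests described in this section can be sketched with SciPy; this is our illustration (the study itself used Excel), the function name and sample data are hypothetical, and `alternative='less'` requires SciPy ≥ 1.6:

```python
from scipy import stats

def accuracy_rate_t_test(control, treatment, equal_var=True, alpha=0.05):
    """Left-tailed two-sample T test of H1: mean(control) < mean(treatment),
    mirroring Eq. 7.7. equal_var=False selects Welch's test, playing the
    role of the unequal-variance T test used for Arguments #3 and #5.
    Returns (t statistic, one-tailed p-value, reject_H0)."""
    t_stat, p = stats.ttest_ind(control, treatment,
                                equal_var=equal_var, alternative='less')
    return t_stat, p, p < alpha

# Hypothetical accuracy rates for two groups of five participants each.
control   = [0.60, 0.65, 0.70, 0.60, 0.65]
treatment = [0.80, 0.85, 0.90, 0.80, 0.85]
t_stat, p, reject = accuracy_rate_t_test(control, treatment)
assert t_stat < 0 and reject  # treatment clearly higher -> reject H0
```

The acceptance-rate tests of section 7.3.2 are the mirror image: swapping `alternative='less'` for `alternative='greater'` yields the right-tailed test of Eq. 7.6.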

7.3.4. Summary of Results

Consistent with the experimental hypothesis stated in section 7.1.3, the treatment

group exhibited lower acceptance rates and higher accuracy rates than the control group

for each of the five safety arguments included in the survey; however, in all but one of the

cases these differences were not statistically significant. At the 95% confidence level, the

difference in accuracy rates between the control and treatment groups was significant only

for Argument #2, and the differences in acceptance rates were not statistically significant

for any of the arguments.

7.4. Discussion

As with any study, care must be taken in interpreting the results of this

experiment [71]. Rival hypotheses as well as questions concerning the applicability of the

results threaten both the internal and external validity of the results. Campbell and

Sjøberg et al. identify selection, history, maturation, attrition, instrumentation, testing,

Table 7.6: Two-Sample T Tests: Accuracy Rate

                    Arg. 1       Arg. 2       Arg. 3       Arg. 4       Arg. 5
  Test Type      σ1² = σ2²    σ1² = σ2²    σ1² ≠ σ2²    σ1² = σ2²    σ1² ≠ σ2²
  s_p²                0.08         0.03          N/A         0.04          N/A
  γ                     18           18           15           18           16
  t stat              -0.2        -2.09        -0.30        -1.43        -0.78
  p(T ≤ t)            0.42         0.03         0.38         0.08         0.22
  t critical          1.73         1.73         1.75         1.73         1.75


and regression as typical threats to internal validity associated with experimentation [68,

72]. The randomized, controlled study design employed in this experiment addresses

many of these threats [68]; however, special cases arise in most experiments, and so a
discussion of these threats is warranted nevertheless.

7.4.1. Selection

As a threat to the internal validity of an experiment, selection refers to the possibility

that the characteristics of the participants in the control and treatment groups differed,
which could explain the results that were observed. Although the results of this
experiment were statistically inconclusive, selection could have influenced the performance
of either group nevertheless. In general, random assignment of participants to the control
and treatment groups controls for this type of bias; however, due to the low sample size, it
is possible that participants in one group were predisposed to perform differently than
those in the other group. To examine this possibility, Tables 7.7, 7.8, and 7.9 report the
distribution of participants in the control and treatment groups based upon the background
information they provided on the questionnaire. Table 7.7 shows the composition of the
control and treatment groups by academic degree, Table 7.8 by occupation, and Table 7.9
by reported training in logic or philosophy.

All of the participants in the experiment reported that they had earned bachelor’s
degrees, and so the distribution of participants by this measure is even. As Table 7.7
illustrates, a roughly equal number of participants in each group held master’s degrees, and
slightly more participants in the control group reported having earned the Ph.D.
Decomposing the participants’ academic background further into the concentrations of their
degrees also reveals no major discrepancies.

Page 138: Pandora: An Approach to Analyzing Safety-Related …cs.virginia.edu/~jck/publications/greenwell_diss.pdfPandora is a systematic but manual approach to ... ated through a controlled

Fallacy Taxonomy Evaluation 122

Table 7.8, which examines the distribution of participants by reported occupation, also
does not indicate major differences between the control and treatment groups; however,
slightly more participants in the control group were engineers or scientists. A greater
proportion of participants in the control group reported that their work concerns
safety-related systems, which might have affected the performance of the control group.

Participants were asked on the questionnaire to “describe any formal training [they]

have received in logic or philosophy.” This question was open-ended, and the participants’ responses ranged from formal courses in Boolean logic to informal subjects including ethics, epistemology, and religious studies. Some participants gave responses such as “graduate study in both” that did not enumerate the specific subjects they studied. Consequently, their responses were grouped into the four broad categories listed in Table 7.9. Again, the differences between the control and treatment groups are minor, but slightly more participants in the control group reported some training in logic or philosophy.

Table 7.7: Distribution of Participants by Academic Degree

Academic Degree                  Control   Treatment   Total
Bachelor’s Degree                   10        10         20
  Computer Engineering/Science       5         6         11
  Engineering, Other [a]             3         2          5
  Mathematics                        3         2          5
  Physics                            2         –          2
  Other [b]                          1         2          3
Master’s Degree                      7         8         15
  Computer Engineering/Science       5         5         10
  Engineering, Other [c]             1         1          2
  Management [d]                     1         2          3
  Mathematics                        –         1          1
  Physics                            –         1          1
Ph.D.                                4         2          6
  Computer Science                   3         2          5
  Materials Science & Eng.           1         –          1

a. Includes aerospace, electrical, and mechanical engineering.
b. Includes business and computing, Slavic & Germanic studies, and religion.
c. Includes aerospace engineering and materials science and engineering.
d. Includes business administration and engineering management.

Table 7.8: Distribution of Participants by Occupation

Occupation               Control   Treatment   Total
Engineer [a]                 4         3          7
  Safety-related             4         2          6
Manager                      –         1          1
  Safety-related             –         –          –
Scientist [b]                3         2          5
  Safety-related             2         1          3
Student (Graduate)           3         4          7
  Safety-related             –         1          1
Total Safety-related         6         4         10

a. Includes aerospace, computer, research, and systems engineers.
b. Includes professors, researchers, and research scientists.

Table 7.9: Distribution of Participants by Training in Logic or Philosophy

Training               Control   Treatment   Total
None                       2         4          6
Logic Only                 1         2          3
Philosophy Only            –         –          –
Logic & Philosophy         7         4         11
Some Training              8         6         14


Examining each of the dimensions of education, occupation, and training in logic or

philosophy reveals only minor discrepancies between the control and treatment groups;

however, these discrepancies favored the control group in each case. Thus, it is possible

that the characteristics of the control group’s composition gave it a slight advantage over

the treatment group.

7.4.2. History

For this experiment, history refers to the possibility that events occurring concurrently

with the participant’s completion of the survey could have influenced the results, such as

subject interaction. The take-home format of the survey introduces history as a threat to

internal validity, but the cover letter included in the participants’ survey packets instructed

them to complete the survey without aid from others and without consulting external ref-

erences. Furthermore, the anonymity of the survey removed incentives for the participants

to violate these instructions.

A typical method of eliminating history as a threat is to require participants to com-

plete the study in a controlled environment. For this experiment, it was judged that adopting such a requirement would discomfort the participants and, consequently, either

discourage them from volunteering to participate in the experiment or impair their ability

to concentrate on the survey. Moreover, implementing such an environment in each of the

three trials would have introduced logistical challenges and precluded the use of electronic

survey forms.

7.4.3. Maturation

Maturation refers to “biological or psychological processes which systematically vary

with the passage of time” [68]. Boredom, fatigue, demotivation, and loss of enthusiasm


are common forms of maturation in software engineering studies, and they are relevant to

this study [72]. The survey spanned approximately 30 pages and asked participants to crit-

ically analyze 44 safety-argument claims, and it is possible that the participants became

exhausted or impatient with the survey as they neared the end. Due to the randomized,

controlled design employed in the experiment, one would expect these effects to be simi-

lar for the control and treatment groups, and so maturation arising from the survey itself is

unlikely to be a concern.

Maturation arising from the treatment is a concern, however, because it would affect

only members of the treatment group. In this experiment, members of the treatment group

were asked to read the fallacy taxonomy before completing the survey, which entailed an

additional ten pages of reading. These participants might have experienced some degree of

maturation while studying the taxonomy, which could have affected their comprehension

of the material in the taxonomy as well as their performance on the survey. In this respect,

maturation constitutes a threat to the internal validity of this study, and strategies for

reducing this threat in future studies are discussed in Chapter 8.

A second mechanism by which maturation might have affected the results of the study

arises from the low proportion of valid claims in the safety arguments included in the sur-

vey. Of the 44 claims that the participants were asked to consider, only eight were consid-

ered to be valid according to the baseline responses. As participants completed the survey,

they might have detected this trend and developed a predisposition to reject claims, but

one would expect this tendency to manifest itself in both the control and treatment groups.

Thus, although the low proportion of valid claims might comprise an external threat to the

validity of this experiment, it does not appear to be a concern for internal validity.


7.4.4. Attrition

Attrition, or alternatively, mortality, refers to “the differential drop-out of persons from

the two groups” [68], which “can produce artifactual effects if that loss is systematically

correlated with conditions” [72]. Table 7.1 presents the attrition rates by trial, and Table

7.10 shows how attrition was distributed among the control and treatment groups.

Although the control group experienced a higher rate of attrition than did the treatment

group, this difference is minor and does not suggest a systemic cause.
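The attrition figures above follow directly from the enrollment and completion counts; a minimal sketch of the arithmetic behind Table 7.10 (the function name is illustrative):

```python
# Reproducing the attrition arithmetic of Table 7.10: drop-outs and the
# attrition rate as a whole-number percentage of those enrolled.

def attrition(enrolled: int, completed: int) -> tuple[int, int]:
    """Return (number of drop-outs, attrition rate as a whole percentage)."""
    dropped = enrolled - completed
    return dropped, round(100 * dropped / enrolled)

for group, enrolled, completed in [("Control", 15, 10), ("Treatment", 13, 10)]:
    dropped, rate = attrition(enrolled, completed)
    print(f"{group}: {dropped} ({rate}%)")  # Control: 5 (33%); Treatment: 3 (23%)
```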

7.4.5. Instrumentation

Instrumentation refers to changes in the instruments used to collect the data, i.e., the

survey forms, during the course of the experiment. Upon concluding the trial conducted at

the University of Virginia, a defect was discovered in the survey that affected Arguments

#3 and #5. Four of the claims in each of these arguments included premises that were

stated explicitly as assumptions; for example, “It is assumed that all of the software safety

properties have been identified.” Although the participants were instructed to assume that

the evidence supporting each claim is true, six of the nine participants in the trial rejected

claims that were supported in this manner and cited the explicit statement of an assump-

tion as their justification for doing so. Four of these participants were assigned to the con-

trol group, and two were assigned to the treatment group. This defect was corrected in the

subsequent trials by removing the phrase “It is assumed that...” from the premises, but the

existence of the defect in the trial conducted at the University of Virginia might have influenced the results of the experiment.

Table 7.10: Attrition in Control & Treatment Groups

Group       Enrolled   Completed   Attrition
Control        15          10       5 (33%)
Treatment      13          10       3 (23%)

Simply altering the responses for the six participants who rejected the affected claims

is an inappropriate method of treating this defect because it is based upon the unfounded

counter-factual assumption that, had the defect not existed, the participants would have

accepted the claims. Moreover, other participants might have rejected the claims for the

same reasons but did not cite those reasons in their justifications. Thus, in order to quan-

tify the effect of the defect in an unbiased manner, the sample mean acceptance and accu-

racy rates for Arguments #3 and #5 were recomputed with the affected claims¹ discarded.

Tables 7.11 and 7.12 compare the adjusted mean acceptance and accuracy rates to the

original values reported in Table 7.2. The tables suggest a dramatic increase in the dispar-

ity of the results between the control and treatment groups for Argument #3, but with four

of its seven claims discarded, one must regard the practical significance of this difference

with skepticism. For Argument #5, the difference in mean acceptance and accuracy rates

between the original and adjusted values is slight.
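The recomputation just described can be sketched as follows. The participant responses in this example are hypothetical; only the procedure (discarding the affected claims before averaging) mirrors the analysis.

```python
# Sketch of the adjustment: recompute a group's mean acceptance rate with
# the claims affected by the survey defect discarded. The responses below
# are hypothetical and serve only to illustrate the procedure.

def mean_acceptance(responses, discard=frozenset()):
    """responses maps participant -> {claim_id: accepted?}. Returns the mean,
    over participants, of the fraction of retained claims they accepted."""
    rates = []
    for answers in responses.values():
        kept = {c: ok for c, ok in answers.items() if c not in discard}
        rates.append(sum(kept.values()) / len(kept))
    return sum(rates) / len(rates)

# Two hypothetical participants judging Argument #3's seven claims.
responses = {
    "P1": {1: True, 2: False, 3: False, 4: False, 5: True, 6: False, 7: False},
    "P2": {1: True, 2: False, 3: False, 4: False, 5: False, 6: False, 7: True},
}
original = mean_acceptance(responses)                        # all seven claims
adjusted = mean_acceptance(responses, discard={2, 3, 4, 6})  # defective claims out
```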

While the complications arising from the survey defect were undesirable, their impact

on the results of the experiment does not appear to be serious enough to constitute a credi-

ble threat of instrumentation bias. Only for one of the five arguments is there evidence to

suggest that instrumentation bias might have skewed the results, and even in this case the

evidence is questionable.

1. Argument #3, claims 2, 3, 4, and 6, and Argument #5, claims 3.1, 3.2, 3.3, and 4.1 were discarded for this analysis.


7.4.6. Additional Threats to Internal Validity

In addition to the threats discussed above, Campbell and Sjøberg et al. identify testing

and regression as typical threats to the internal validity of an experiment [68, 72]. Testing

bias, in which “exposure to a test can affect subsequent exposures to that test” [72], is not

relevant to this experiment because a single posttest (the survey) was administered. Simi-

larly, regression is not a factor in this experiment because no pretest was used to screen

participants.

7.4.7. Hawthorne Effect

The Hawthorne effect is a phenomenon in industrial psychology that is defined as “an

increase in worker productivity produced by the psychological stimulus of being singled

out and made to feel important” [73]. In experimentation, the effect “is apparently due to

the effect on participants of knowing themselves to be studied in connection with the out-

comes measured” [74]. The double-blind, control-group design of this experiment controls for the Hawthorne effect because its manifestations would be expected to be equal in both the control and the treatment groups.

Table 7.11: Adjusted Mean Acceptance Rates

                   Argument #3                  Argument #5
Group        Original [a]  Adjusted [b]   Original [a]  Adjusted [b]
Control          0.25          0.37           0.49          0.48
Treatment        0.21          0.12           0.32          0.32

a. As reported in Table 7.2
b. With selected claims discarded as noted in §7.4.5

Table 7.12: Adjusted Mean Accuracy Rates

                   Argument #3                  Argument #5
Group        Original [a]  Adjusted [b]   Original [a]  Adjusted [b]
Control          0.73          0.60           0.58          0.59
Treatment        0.76          0.83           0.62          0.62

a. As reported in Table 7.2
b. With selected claims discarded as noted in §7.4.5

7.4.8. Multiple Tests

Since multiple hypothesis tests were performed in this experiment, the number of

results that are likely to have occurred by chance must also be considered [71, 75]. Ten

two-sample T tests were performed to analyze the data obtained from the experiment, each

with a confidence level of 95%, which corresponds to a probability of Type I error of

. Thus, if the null hypothesis had been rejected in each of the ten tests, the

expected number of outcomes arising from Type I error would have been 0.5.
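The arithmetic is worth making explicit. A short sketch follows; the Bonferroni-corrected level at the end is a standard remedy for multiple comparisons, not a correction the experiment itself applied.

```python
# Expected number of Type I errors across ten tests at the 95% confidence
# level, plus two related quantities for context.

n_tests, alpha = 10, 0.05
expected_false_positives = n_tests * alpha       # 0.5, as stated in the text
prob_at_least_one = 1 - (1 - alpha) ** n_tests   # ~0.40 if all nulls are true
bonferroni_alpha = alpha / n_tests               # 0.005 per test for 0.05 overall
```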

7.4.9. Applicability of the Results

The factors affecting the applicability of the results of this experiment are the repre-

sentativeness of the sample population, the representativeness of the sample of arguments

included in the survey, and the realism associated with the type of assessment the partici-

pants were asked to perform.

The use of students in software-engineering experiments raises questions about the

applicability of their results. The opinion of the software engineering community is mixed

on this subject, but Kitchenham et al. argue that “students are the next generation of soft-

ware professionals and, so, are relatively close to the population of interest” [71]. The stu-

dent participants in this experiment were each enrolled in a graduate computer-science

program and would possess qualifications similar to those of a junior software engineer.

Second, even though three of the five arguments included in the survey were

adapted from real-world or model safety arguments, it is unknown whether the results

obtained from these arguments are representative of safety arguments in general. The low


number of valid claims in the arguments that were sampled might indicate that not all of

the essential context was preserved in adapting the arguments to the survey or that the

arguments themselves are faulty; if so, then whether these observations are typical of

safety arguments in general is unknown. The arguments that were included in this survey

were sampled from publicly available arguments, and since most safety arguments are not

publicly available, this sampling method might create a bias.

Finally, there are notable differences between the activity the participants were asked

to perform and what an actual safety-case review would entail. The participants were

asked to review a set of five arguments with minimal contextual knowledge of the systems

in question and mark, for each claim, whether the supporting evidence convinced them

that the claim was true. They were also instructed to assume that the evidence was true in

each case. In practice, safety arguments are evaluated as whole entities—not by treating

their claims in isolation—and reviewers are experts who are knowledgeable about the sys-

tems in question. Mirroring this environment in an experimental setting would have been

infeasible due to the extensive amount of safety-engineering and domain expertise that

would have been required of the participants.

7.5. Threats to Validity

The outstanding threats to the internal validity of this experiment are selection bias

arising from the outcome of the random assignment of participants to the control and treat-

ment groups, and maturation bias associated with the self-training approach that was used

to expose the treatment group to the taxonomy. The outstanding threats to external validity

are the selection of participants from the population of interest, the selection of safety

arguments to include in the survey, and the realism of the experimental methodology. These


threats were discussed in the previous section, and they comprise the limitations of this

experiment. The lack of statistical significance in the results of this study might be due to

the small sample size that was employed.

7.6. Chapter Summary

This chapter described a controlled experiment to evaluate the effectiveness of the

safety-argument fallacy taxonomy introduced in Chapter 5. A group of twenty individuals

consisting of computer-science graduate students and research scientists, professional

engineers, and engineering project managers was divided into a control group and a treat-

ment group, which were then asked to evaluate a set of safety arguments. Members of the

treatment group were provided with copies of the fallacy taxonomy and asked to read it

before reviewing the arguments. The results of the experiment indicate that, on average,

the treatment group assessed the validity of the claims in the arguments more accurately

than the control group; however, the difference in performance was not statistically signif-

icant at the 95% confidence level. As with any study, these results are qualified by the lim-

itations of the experiment—especially the small sample size—and so they must be

interpreted with care. Chapter 8 presents the conclusions of the study and discusses areas

of future work surrounding it.


Chapter 8

Conclusions & Future Work

This chapter presents the conclusions and contributions of this dissertation as well as

areas for future work. Section 8.1 summarizes the work and the conclusions regarding

Pandora and the fallacy taxonomy. Section 8.2 lists the research contributions of the work

as a whole. Finally, Section 8.3 explores areas for future work based in part on the results

of the evaluations reported in Chapters 6 and 7.

8.1. Conclusions

The analysis of safety-related failures involving software and programmable elec-

tronic systems is complicated by the high degree of complexity and coupling these sys-

tems typically exhibit and the fact that failures of these systems stem from design faults,

which are difficult to identify and address. Consequently, investigators who are unpre-

pared for the unique challenges associated with analyzing digital-system failures might

miss important evidence, overlook design faults or defects in the system development life-

cycle, or fail to communicate lessons and recommendations effectively. In order to assist

investigators in overcoming some of these challenges, this dissertation developed Pandora, an approach to analyzing failures of safety-related digital systems framed around the

system safety case, and the safety-argument fallacy taxonomy, which documents typical

fallacies in safety arguments.

8.1.1. Pandora

Failure analysis is part of an iterative process in which systems are developed with

unknown design faults and put into operation. For safety-related systems, design faults

that lead to failures—and the development errors that introduced the faults—indicate the

presence of fallacies in a system’s safety argument, and so, in addition to correcting the

faults, the argument must be repaired in order to restore validity to the system’s safety

claims. This process is the enhanced safety-case lifecycle, and it is the basis for Pandora.

In the context of a failure, investigators applying Pandora systematically review a sys-

tem’s safety argument for fallacies that are associated with the failure and, thus, might

indicate the presence of design faults in the system. They do so by eliciting counter-evi-

dence from the failure that refutes the argument’s claims. When the investigators discover

a fallacy, they modify the safety argument to address the counter-evidence and restore

validity to the argument. Upon completing the application of Pandora, the investigators

have produced a revised safety argument that is free of the fallacies that were identified.
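The review loop described in this paragraph can be pictured as a traversal of the argument's claim tree that flags claims refuted by elicited counter-evidence. This is not Pandora's actual notation: the data types, field names, and the matching of counter-evidence by exact claim text are simplifying assumptions for illustration.

```python
# Illustrative sketch (not Pandora's notation): a safety argument as a tree
# of claims. The review walks every claim and collects those contradicted
# by elicited counter-evidence, here matched naively by claim text.

from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    children: list["Claim"] = field(default_factory=list)

def find_refuted(root: Claim, counter_evidence: set[str]) -> list[str]:
    """Depth-first traversal collecting claims refuted by counter-evidence."""
    refuted, stack = [], [root]
    while stack:
        claim = stack.pop()
        if claim.text in counter_evidence:
            refuted.append(claim.text)
        stack.extend(claim.children)
    return refuted

argument = Claim("System is acceptably safe", [
    Claim("All hazards identified"),
    Claim("Each hazard mitigated", [Claim("Alert latency is bounded")]),
])
flags = find_refuted(argument, {"Alert latency is bounded"})
```

In a real application each flagged claim would then be repaired, restoring validity to the argument, as the text describes.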

Pandora is intended to complement existing techniques for analyzing digital-system

failures, and its strengths as an analysis approach derive from its focus on the system

safety case. First, Pandora may be used early in an investigation to frame the analysis by

assisting investigators in determining the safety-related aspects of a system’s design and

development history. Second, through its systematic traversal of the safety argument, Pan-

dora can drive the elicitation of evidence by suggesting specific types of counter-evidence


that investigators should search for. Finally, Pandora can assist investigators in document-

ing lessons and recommendations because each lesson is associated with one or more fal-

lacies in the original safety argument, and each recommendation is stated in the context of

the fallacy it is intended to address, ensuring that recommendations are justified and

improving their practicability.

As part of the research, a case study was performed in which Pandora was applied to a

series of five commercial-aviation accidents involving the minimum safe altitude warning

(MSAW) system, a low-altitude warning system developed to alert air traffic controllers of

low-flying aircraft. For each accident, a safety argument for the system was developed

from information available prior to the failure, and Pandora was applied to the argument

using evidence obtained from the official investigation (but only evidence that the Pan-

dora analysis elicited). Lessons obtained from the Pandora analysis were compared to

those of the official investigations, and, in each case, the Pandora analysis successfully

identified all the system defects that were cited by the official investigations. The results

of this case study are tempered by the risk of hindsight bias associated with the methodol-

ogy; however, the study indicates that Pandora is a promising approach to analyzing

safety-related failures involving digital systems.

There are several important limitations to Pandora. In order to apply Pandora to a dig-

ital system failure, one must obtain a safety case for the system involved; however, most

digital systems currently in operation do not have documented safety cases. For these sys-

tems, a safety argument must be derived retroactively before Pandora may be applied.

Second, Pandora’s ability to elicit evidence from a failure is dependent on the complete-

ness of the pre-failure safety case. If the safety case omits major aspects of the system’s


safety rationale, a Pandora analysis is unlikely to elicit detailed evidence in these areas;

although, it can point out which sections of the safety argument are incomplete. Finally,

despite being systematic, Pandora is a manual process, and so its application will not nec-

essarily be consistent across multiple investigators.

8.1.2. Safety-Argument Fallacy Taxonomy

As informal arguments, safety cases are prone to several forms of fallacious reasoning.

Fallacies in a system’s safety argument could undermine the system’s safety claims, which

in turn could contribute to a safety-related failure of the system. Avoiding these fallacies

during system development and detecting them during review is essential if failures are to

be prevented. A survey of safety arguments for industrial systems that was conducted as

part of the research revealed that the fallacies committed in these safety arguments are typ-

ical of arguments in general. Based upon this observation, a taxonomy of safety-argument

fallacies was developed to assist reviewers in detecting fallacies in safety arguments.

The taxonomy is designed to cover a broad range of fallacies that a safety argument

might exhibit, and it is organized so that safety professionals who might lack formal train-

ing in logic and argumentation may apply it. The fallacies in the taxonomy are grouped by

the types of safety arguments in which they are likely to appear using categories that are

generally orthogonal. Thus, a user may apply the taxonomy to an argument without know-

ing beforehand whether the argument is fallacious and without complete knowledge of the

taxonomy. To ensure coverage of a broad range of fallacies, the taxonomy was derived

from five existing fallacy taxonomies for general arguments. Deriving the taxonomy in

this manner does not guarantee that the taxonomy is exhaustive, but it is extensible so that

new fallacies may be added to it as they are discovered in safety arguments.
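The organization described here can be pictured as a small, extensible mapping from argument categories to fallacies. The category and fallacy names below are invented examples, not the taxonomy's actual contents.

```python
# Illustrative sketch of an extensible fallacy taxonomy: fallacies grouped
# by the kind of argument in which they tend to appear. All names here are
# invented examples, not the taxonomy's real entries.

taxonomy: dict[str, set[str]] = {
    "Arguments from testing": {"Hasty generalization"},
    "Arguments from process": {"Appeal to improper authority"},
}

def add_fallacy(category: str, fallacy: str) -> None:
    """Extend the taxonomy as new fallacies are discovered in safety arguments."""
    taxonomy.setdefault(category, set()).add(fallacy)

add_fallacy("Arguments from testing", "Fallacy of composition")
```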


To test the taxonomy’s effectiveness as an aid to detecting safety-argument fallacies, a

controlled study was conducted in which a group of computer-science graduate students

and researchers, professional engineers, and engineering managers were asked to read the

taxonomy and then evaluate a series of system safety arguments. Although the results of

the experiment were statistically inconclusive at the 95% confidence level, on average, the

treatment group, which had access to the taxonomy, detected fallacies in each of the argu-

ments more accurately than the control group, which did not have access to the taxonomy.

The lack of statistical significance might arise from the small sample size that the study

employed. The taxonomy shows promise as a technique for detecting fallacies in safety

arguments, but additional refinement to the taxonomy or the manner in which individuals

are trained to apply it might be necessary before it is suitable for commercial use.

8.1.3. Applicability Beyond Failure Analysis

Although Pandora and the fallacy taxonomy were intended to be used together as part

of a failure-analysis process, each may be used outside the context of a failure. For exam-

ple, Pandora might be applied to a safety argument prior to system certification to search

for known counter-evidence refuting the argument’s claims. Likewise, the fallacy taxon-

omy could be used to review safety arguments for typical fallacies as part of the certifica-

tion process or to educate developers about common types of fallacies they should avoid.

8.2. Contributions

The primary contributions of this research are the development of Pandora and the

safety-argument fallacy taxonomy. Specific contributions are listed below.

Discovery of the enhanced safety-case lifecycle. Kelly and McDermid’s safety-case life-

cycle is the process by which a system’s safety argument is updated in response to a recog-


nized challenge to its validity [25, 55]. The contribution of this research stems from the

observation that Kelly and McDermid’s lifecycle is part of an iterative process in which a

system’s safety case, as with the system itself, is developed and iteratively refined based

upon lessons learned from system failures, and so it may be used to guide the failure anal-

ysis process to determine what challenges to the safety argument exist.

Development of a taxonomy of safety-argument fallacies. Many taxonomies of infor-

mal fallacies exist; however, these taxonomies are too general to be immediately applica-

ble to safety arguments. Moreover, most taxonomies organize fallacies in such a way that

a reader attempting to apply them to an argument must know beforehand that the argument

is fallacious. A major component of this research was the development of a taxonomy of

fallacies that is specific to safety arguments and targeted at system-safety professionals.

The taxonomy was designed to cover a broad range of safety-argument fallacies and to

organize fallacies in a way that would be accessible to its target audience while recogniz-

ing the need for extensibility.

To evaluate the taxonomy’s effectiveness as an aid to detecting fallacies in safety argu-

ments, a controlled experiment was performed involving computer-science graduate stu-

dents, research scientists, and professional engineers as subjects. The results of the

controlled experimental evaluation of the taxonomy, while favorable to the taxonomy,

must be interpreted carefully due to the lack of statistical significance, the selection of the

sample population, and the realism of the survey administered as part of the study. Further

evaluations involving larger sample populations are needed to confirm the results.

Development of a safety-case-based approach to failure analysis. As the complete

account of a system’s safety rationale, the safety case is a valuable tool for analyzing fail-


ures of digital systems; however, safety cases have not been previously applied for this

purpose. The second major component of this research is the development of Pandora,

which implements the enhanced safety-case lifecycle by centering failure analysis on the

system safety case. Through its traversal of the safety argument, a Pandora analysis sys-

tematizes the elicitation of evidence and the development of lessons and recommenda-

tions to prevent the failure from recurring. Pandora may be used in concert with other

techniques such as STAMP, PARCEL, or WBA to enhance confidence in the quality of the

lessons and recommendations that are produced.

Pandora was applied to a series of accidents involving a low-altitude warning system

to evaluate the completeness of its lessons by comparing them to those of the official

investigations. The results of the evaluation, although subject to hindsight bias, show Pan-

dora to be a promising approach to analyzing failures of digital systems. The safety argu-

ments that were prepared for this evaluation could be reused in future case studies.

8.3. Future Work

Future work stemming from this dissertation concerns the refinement of Pandora to

support systems that lack documented safety arguments as well as the identification of

related systems that might share similar defects and further evaluations of both Pandora

and the safety-argument fallacy taxonomy.

8.3.1. Retroactive Derivation of the Safety Argument

Pandora presupposes the existence of a safety case for a system, but most safety-

related digital systems presently in operation do not have documented safety arguments.

In order to apply Pandora to such a system, one must derive a safety case for it retroac-

tively. The approach used to derive the safety arguments for the MSAW case study


described in Chapter 6 relied upon the safety rationale compiled by the NTSB as part of its

investigation and is thus impractical for live investigations. A general method for deriving

the safety case retroactively remains unknown, but a possible approach is discussed below.

Since most safety-related digital systems are developed following a software develop-

ment standard, one possible method would be to develop a safety-case pattern for the stan-

dard and then instantiate the pattern as a safety case for the specific system of interest.

Pandora could then be applied to the safety case to discover deviations from the standard

as well as potential fallacies in the standard itself. Although constructing the pattern would

impose significant overhead on the investigation, it could be reused for subsequent analyses

involving systems developed to the same standard. The advantages of this method are that

it would likely apply to a wide range of safety-related digital systems and that the over-

head associated with deriving the safety case could be amortized over the systems to

which it was applied. Since development standards rarely justify how the techniques they

prescribe entail an acceptable level of safety, however, a drawback of this method is that it

might identify problems in the standard that neither the developer nor the regulator is
prepared to act upon.

One of the challenges of conducting research in this area is attempting to justify the

accuracy of any proposed method for retroactively deriving a safety case. Case studies are

an obvious means of evaluating such a method, but a well-motivated case study would

concern a system for which there was no documented safety case, which potentially leaves

the researcher with no basis for comparison. Choosing a system for which a documented

safety case does exist introduces the risk of hindsight bias, however, and so the researcher

must devise a means of divorcing the safety rationale documented in the existing safety


case from that which can be obtained independently. Alternatively, a case study similar to
the MSAW study described in Chapter 6 might be performed with the proposed derivation
technique coupled to Pandora in order to assess the overall process, although hindsight
bias is a concern here as well.

8.3.2. Identification of Related Systems

A second limitation of Pandora is that it does not consider related systems as part of its

analysis. These systems, while not involved in a failure, share similar safety rationale with a

system under investigation and consequently might exhibit similar failure modes. The

MSAW case study of Pandora discussed in Chapter 6 highlighted this limitation. While

many failure-analysis techniques share this problem, Pandora’s foundation in the system

safety case offers a possible method of overcoming it.

A candidate solution follows from the concept of safety-case patterns, which are a

structured method of reusing safety arguments [31]. If two systems share similar safety

arguments and one of them fails due to a systemic fault, then it is reasonable to suspect

that the second system might also contain the fault. Discovering a fallacy in a patterned

portion of a safety case is therefore of particular concern and warrants a secondary analy-

sis to determine whether: (a) the fallacy is present in the pattern itself; or (b) the pattern

was improperly instantiated in the argument. In the first case, the pattern must be repaired,

and other systems whose safety arguments invoke the pattern should be reassessed to

determine the impact of the fallacy. In the second case, changes to the pattern might still

be warranted if there is reason to suspect that other arguments have also instantiated the

pattern improperly, and the arguments for these systems would likewise need to be reas-

sessed. Implementing this solution would pose the pragmatic challenge of cataloging the


use of patterns in each safety argument that is developed, and it would not address circum-

stances in which arguments share similar fallacious reasoning that does not arise from

their use of a common pattern.
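
The cataloging challenge noted above can be made concrete with a small inverted index from pattern identifiers to the systems whose safety arguments instantiate them. This is only a sketch of the bookkeeping; the class, method, pattern, and system names are hypothetical, not part of Pandora:

```python
from collections import defaultdict

class PatternCatalog:
    """Toy inverted index: safety-case pattern -> systems that instantiate it.

    Illustrative only; none of these names come from Pandora itself.
    """

    def __init__(self):
        self._uses = defaultdict(set)  # pattern id -> set of system ids

    def record_use(self, pattern_id, system_id):
        """Record that a system's safety argument instantiates a pattern."""
        self._uses[pattern_id].add(system_id)

    def related_systems(self, pattern_id, failed_system):
        """Other systems sharing the pattern with the failed system:
        candidates for reassessment if a fallacy is traced to the pattern."""
        return sorted(self._uses[pattern_id] - {failed_system})

catalog = PatternCatalog()
catalog.record_use("fault-tolerant-channel", "Mark I")
catalog.record_use("fault-tolerant-channel", "Mark II")
catalog.record_use("diverse-monitor", "Mark II")

# A fallacy found in the patterned portion of Mark II's argument flags Mark I:
print(catalog.related_systems("fault-tolerant-channel", "Mark II"))
```

A fallacy traced to the pattern itself would then trigger reassessment of every system the query returns; an improper instantiation would instead prompt a review of how other arguments instantiated the same pattern.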

8.3.3. Further Evaluation of Pandora

The results obtained from the MSAW case study of Pandora are promising, but further

evaluations are needed to reduce the risk of hindsight bias, to generalize the results to

other systems, and to examine the consistency among results obtained from multiple

investigators.

In order to completely eliminate the risk of hindsight bias from an evaluation of Pan-

dora, the ideal case study would possess the following attributes:

• It would concern a digital system failure whose investigation was still pending;

• A documented safety case would be available for the system in question that had

been developed prior to the failure; and

• The investigators involved in the case study would be able to elicit evidence from

the failure separately from the official investigation.

In practice, the presence of all three of these factors is very unlikely, and so some risk of

hindsight bias will be present. The experimenter might have to derive the safety argument

for the system in question or rely upon factual information gleaned from the official inves-

tigation, each of which introduces the possibility that the findings obtained by applying

Pandora were influenced by the findings of the investigation to which it is being com-

pared. The methodology described in Chapter 6 reduces this risk, but further reduction

could be obtained by employing multiple experimenters: one to sort the factual basis from

the evidence, one to derive the safety arguments from the basis, and one to apply Pandora.


Training the experimenters to perform these tasks is a challenge, however, and if it is not

done properly, the results of the case study could be biased against Pandora.

Although the case study described in Chapter 6 examined five accidents, all of the

accidents concerned the same system. The MSAW system was selected for the case study

because its development was guided by a series of major commercial-aviation accidents

and, for the same reason, ample documentation was available from accident reports to per-

form the study. Additional case studies are necessary to generalize the results from the

MSAW case study to other systems; however, the requirements of the methodology con-

strain the set of system failures to which it can be applied. For the same reasons discussed

in Chapter 2 that motivated this research, the official investigations into failures involving

digital systems often do not treat the systems in question in enough detail to construct a

plausible safety case, or they draw findings that are not founded in the evidence presented.

Selecting such a system for a case study biases the results against Pandora because Pan-

dora does not have access to the safety rationale and evidence that was available to the

official investigation.

Finally, case studies in which Pandora is applied to a failure by multiple investigators

working from the same safety argument and evidence are needed to examine the consis-

tency of the results between investigators. The methodology employed in Chapter 6 could

easily be extended to accommodate this change with a minor adaptation. In order to ensure

that each investigator worked from only the evidence that he or she elicited from the fail-

ure (and not the entire set of evidence that could potentially be discovered), evidence

would have to be furnished to investigators as they requested it. Furnishing evidence in

this way might be accomplished through a mock interview between the investigator and


an experimenter in which the investigator elicits evidence from the experimenter, and the

experimenter then confirms or denies the existence of such evidence.

8.3.4. Further Evaluation of the Fallacy Taxonomy

The experimental evaluation of the fallacy taxonomy described in Chapter 7 employed

a small sample size and relied upon members of the treatment group to train themselves to

use the fallacy taxonomy. Consequently, the composition of the control and treatment

groups and the possibility that members of the treatment group became frustrated or

fatigued by the survey might have biased the results of the experiment against the hypoth-

esis that the taxonomy would improve participants’ accuracy rates. Thus, additional evalu-

ations are needed to further control for these factors.

In general, random assignment is an effective means of ensuring that the control and

treatment groups share similar characteristics. When the sample size is small, however,

minor perturbations in the composition of the groups can potentially bias the results.

While there is no evidence to suggest that this form of bias arose in the evaluation that was

conducted as part of this research, there is some evidence that the control group might

have possessed a slight advantage over the treatment group with respect to academic back-

ground, experience with safety-related systems, and prior training in logic or philosophy.

Repeating the experiment with a larger sample would further mitigate the effects of such

perturbations and eliminate selection bias as a threat.

Although the survey instructed members of the treatment group to read the taxonomy

before completing the survey, this request was not enforced, and so it is possible that some

group members completed the survey without receiving the treatment. The use of more

aggressive training methods, such as a seminar in which participants receive classroom


training on the taxonomy and its application, might be more engaging to the participants

and provide a better understanding of the taxonomy than could be obtained from self-

guided training. To further reduce the risk of boredom or fatigue, a shorter format should

be used in future experiments consisting of one or two arguments with a more even ratio

of valid to invalid claims. With a sufficient sample size, participants could be asked to

evaluate the entire argument as a whole in order to better reflect a real-world assessment.

8.4. Chapter Summary

The challenges associated with identifying and addressing design faults, combined with
the complexity and coupling of digital systems, encumber the analysis of digital-system
failures. Pandora is a systematic but manual approach to analyzing safety-related failures
of digital systems that addresses these problems by framing the analysis around a

system’s safety case. As investigators apply Pandora to a failure, they are guided through

the steps of developing theories of the failure, eliciting evidence, and developing lessons

and recommendations. Pandora is accompanied by a taxonomy of typical safety-argument

fallacies in order to assist investigators in applying the process.

Pandora was applied to a series of commercial-aviation accidents involving a mini-

mum safe altitude warning system, and the safety-argument fallacy taxonomy was evalu-

ated through a controlled study involving twenty computer-science graduate students,

engineers, and safety professionals. In the former study, the application of Pandora pro-

duced findings comparable to those of the official investigations into the accidents. The

latter study, while statistically inconclusive, suggests that the fallacy taxonomy assists the

detection of fallacies in safety arguments. While both studies have significant limitations,

they show that the Pandora approach holds promise and justify further evaluation.


8.5. Coda

The account of the American Airlines flight 965 accident presented in Chapter 3 was

intended to illustrate the extent to which investigators rely on evidence available at the

scene of an accident to discover possible failure modes in digital systems. Any claim

about what a Pandora analysis of the accident would have revealed, in the absence of evi-

dence obtained from the wreckage, must be tempered by the limits of counter-factual rea-

soning and hindsight bias. Given this caveat and subject to the availability and

completeness of a safety case for the flight management system, a Pandora analysis is
potentially another method by which investigators might have considered human/automation
interaction as a contributory factor in the absence of the physical evidence obtained from
the wreckage.


Appendix A

Safety-Argument Fallacy Taxonomy

The taxonomy below defines a partial list of typical fallacies in safety arguments.

Table A.1: The Safety-Argument Fallacy Taxonomy

Circular Reasoning
• Circular Argument
• Circular Definition

Diversionary Arguments
• Irrelevant Premise
• Verbose Argument

Fallacious Appeals
• Appeal to Common Practice
• Appeal to Improper/Anonymous Authority
• Appeal to Money
• Appeal to Novelty
• Association Fallacy
• Genetic Fallacy

Mathematical Fallacies
• Faith in Probability
• Gambler’s Fallacy
• Insufficient Sample Size
• Pseudo-Precision
• Unrepresentative Sample

Unsupported Assertions
• Arguing from Ignorance
• Unjustified Comparison
• Unjustified Distinction

Anecdotal Arguments
• Correlation Implies Causation
• Damning the Alternatives
• Destroying the Exception
• Destroying the Rule
• False Dichotomy

Omission of Key Evidence
• Omission of Key Evidence
• Fallacious Composition
• Fallacious Division
• Ignoring Available Counter-Evidence
• Oversimplification

Linguistic Fallacies
• Ambiguity
• Equivocation
• Suppressed Quantification
• Vacuous Explanation
• Vagueness


A.1. Circular Reasoning

Circular reasoning, also known as begging the question, occurs when an argument
presupposes its claim in a premise. Circular premises are unlikely to appear explicitly in an

argument; however, an argument might be structured so that the use of a circular premise

is inevitable even though the premise is unstated. Circularity may span several arguments

within a safety case (e.g., A ⇒ B ⇒ C ⇒ A), which further complicates its detection.
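
Viewed as a directed graph in which each claim points to the claims offered as its premises, a support chain such as A ⇒ B ⇒ C ⇒ A is simply a cycle, and a standard depth-first search can flag it. A minimal sketch (the dictionary encoding of the argument is illustrative, not a Pandora artifact):

```python
def has_circularity(supports):
    """Detect a cycle in a claim-support graph.

    `supports` maps each claim to the claims offered as its premises,
    e.g. {"A": ["B"], "B": ["C"], "C": ["A"]} encodes A => B => C => A.
    Returns True if some chain of support loops back on itself.
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on the DFS stack / done
    color = {claim: WHITE for claim in supports}

    def dfs(claim):
        color[claim] = GRAY
        for premise in supports.get(claim, []):
            if color.get(premise, WHITE) == GRAY:   # back edge -> cycle
                return True
            if color.get(premise, WHITE) == WHITE and dfs(premise):
                return True
        color[claim] = BLACK
        return False

    return any(color[c] == WHITE and dfs(c) for c in list(color))

print(has_circularity({"A": ["B"], "B": ["C"], "C": ["A"]}))   # True
print(has_circularity({"A": ["B", "C"], "B": ["C"], "C": []})) # False
```

In practice the hard part is extracting the support graph from informally stated arguments, not the search itself.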

A.1.1. Circular Argument

An argument supports a claim with a premise that presupposes the claim [64]. Alternatively,
an argument is structured so that it assumes the truth of its claim even though this

assumption is implicit.

Example: The Mark II control system is acceptably safe because it is at least as safe as

the Mark I system, which is currently operational. The systems share identical safety

requirements and are sufficiently similar to warrant comparison. Moreover, the Mark I

system is acceptably safe because its operational behavior is identical to that of the Mark

II system.

This argument is circular because it assumes implicitly that the Mark II system is

acceptably safe, which is the claim of the argument.

A.1.2. Circular Definition

An argument defines a term that appears in its claim in such a way that the definition

has the effect of making the claim trivially true.

Example: The system safety requirements enforce safe system behavior. “Safe system

behavior” means that the system does not violate its safety requirements during operation.

The argument is circular because the definition it cites merely restates the claim.


A.2. Diversionary Arguments

Diversion occurs when an argument digresses from its claim and provides irrelevant

information, which can divert the reader’s attention from the substantial premises of the

argument and possibly cause the reader to accept an insufficient argument.

A.2.1. Irrelevant Premise

An argument uses premises that support a conclusion other than what the argument

claims. These premises could distract the reader from other substantive but possibly insuf-

ficient premises [76].

Example: The system operation and maintenance programs comply with the safety

requirements for the following reasons. The operation procedures were specified in the

operator’s handbook. The maintenance procedures were likewise specified in the mainte-

nance technician's handbook. Finally, a hazard analysis indicated that all credible hazards

were mitigated by the procedures.

Two of the three premises the argument offers in support of its claim are irrelevant.

That the operational and maintenance procedures have been specified has no bearing on

whether they comply with the safety requirements. The remaining premise describing the

hazard analysis is the only substantive premise of the argument.

A.2.2. Verbose Argument

An argument uses excessively verbose language that discourages the reader from
evaluating its premises. Such arguments can deceive the reader into believing that they possess

more merit than is warranted. As a specific form of this fallacy, the argument may use

unnecessary technical, regulatory, or legal jargon.


Example: The software was developed in accordance with the guidelines of DO-178B,

“Software Considerations in Airborne Systems and Equipment Certification.” In accor-

dance with §2.2.2, the software was assigned a software level of ‘B’ based on its failure

condition classification as stipulated in §2.2.1 and in FAA AC 25-1309-1A and/or JAA

AMJ 25-1309. Pursuant to Tables A-1 through A-7 in DO-178B, the software system was

developed, verified, and certified according to its assigned software level.

Through its references to specific passages of regulatory documents, this argument

might lead the reader to believe that careful attention has been given to complying with

those documents. With the references removed, the argument may be rewritten, “The soft-

ware was developed in accordance with DO-178B because it was developed in accordance

with the guidelines contained in DO-178B.” This argument is vacuous at best (see “Vacu-

ous Explanation”), and could even be considered circular. The obscure references attempt

to disguise this weakness, which is why the argument is verbose.

A.3. Fallacious Appeals

An argument may appeal to a standard, established practice, or an authority in support

of a claim. For example, an argument that claims a system to be safe because it was devel-

oped in compliance with DO-178B is appealing to a development standard. Such appeals

are not inherently fallacious, but they can be fallacious if the appeal itself is improper or if

the entity to which it is made lacks authority in the context of the argument.

A.3.1. Appeal to Common Practice

An argument supports a claim simply by stating that it is consistent with common

practice or popular belief. Conversely, an argument refutes a claim simply by noting that it

is inconsistent with common practice or popular belief. There might be good reasons why


the claim is widely accepted, but the argument should cite those reasons rather than rely-

ing upon the popularity of the practice or belief alone [57]. Arguments of this form also

commit the fallacy of “Destroying the Exception” if they fail to consider the applicability

of the practice or belief to the particular system and context in question.

Example: The hazard of a loss of flight control due to the failure of the flight control sys-

tem has been sufficiently mitigated by the use of a triple-modular-redundant (TMR) sys-

tem design with a voter. TMR is widely used throughout the aerospace industry as a means

of improving system reliability.

The underlined premise is a fallacious appeal to common practice because the fact that

TMR is widely used does not imply that its use in the system in question will mitigate the

risk of failure. The argument also suggests that incorporating TMR generally improves

reliability, but it does not consider whether TMR is applicable to this system, and so it also

commits the fallacy of “Destroying the Exception.”

A.3.2. Appeal to Improper/Anonymous Authority

An argument supports a claim by appealing to the judgment of some authority, but the

authority is unnamed, biased, or incompetent in the domain of the argument.

Example 1: All of the credible hazards associated with operating the software have been

identified. XYZ Enterprises conducted a functional hazard assessment to identify the haz-

ards. This assessment was then verified for completeness by ABC Corp. (ABC Corp. is a

subsidiary of XYZ Enterprises.)

The underlined premise is an appeal to an improper authority because any subsidiary

who is asked to evaluate its parent company’s work is likely to be biased. Depending on

the claim, the possibility for bias might be sufficient to undermine the argument.


Example 2: The flight management system meets its safety requirements. The software

for the system was developed in accordance with MISRA guidelines. (MISRA develops

best-practice documents for safety-related electronic systems in road vehicles, and spe-

cific standards pertaining to avionics software exist [77].)

This example and the one above it illustrate how contextual details are sometimes nec-

essary to identify a fallacious appeal. Without knowledge of MISRA and of the existence

of more relevant standards, the reader would be unable to determine that the MISRA

guidelines lack authority in the context of avionics software.

A.3.3. Appeal to Money

An argument supports a claim by noting the cost that has been incurred in making the

claim true. Alternatively, an argument supports a claim by noting the wealth of someone

associated with the claim.

Example: It is unlikely that any “safety-critical” errors exist in the software. Over 1,000

person-hours and $3 million USD were spent developing the software, and an additional

$1 million was spent reviewing it for errors. Moreover, the software developer is a Fortune

500 company with over $100 billion in assets.

The underlined premises exhibit the first and second forms of the fallacy, respectively.

A.3.4. Appeal to Novelty

An argument advocates a concept, practice, or technology simply because it is new.

Example: The software is unlikely to contain errors because it was developed following

the concept of model-based development. Model-based development is on the bleeding

edge of technologies for constructing high-confidence software.


Model-based development might well be a wise choice for developing high-confi-

dence software, but its novelty alone is an irrelevant basis for the argument’s claim.

A.3.5. Association Fallacy

An argument claims that properties of one thing are inherently properties of another

merely because the two are related somehow [76]. As a specific form of this fallacy, an

argument supports a claim about a subject simply by noting the people or organizations

affiliated with the subject.

Example 1: All of the credible hazards associated with operating the system have been

identified. The hazards were analyzed using HAZOP. HAZOP were developed by the

Chemical Industries Association (CIA), and they were used by XYZ Corp. to perform the

hazard analysis of the Mark II control system.

The underlined premise commits the specific form of the fallacy because it attempts to

justify the use of HAZOP by noting the organization that developed it and a company that

has used it previously. These facts do not aid the reader in evaluating whether the use of

HAZOP would entail the detection of all credible hazards, and so they are irrelevant.

Example 2: The PDP-11 minicomputer was considered as a platform for the air traffic

management system, but it was later eliminated as a candidate because of its affiliation

with the Therac-25 radiation therapy machine. Errors in the software for the Therac-25,

which ran on the PDP-11 platform, caused the machine to overdose six patients between

1985 and 1987.

This example exhibits the general form of the fallacy. The argument cites no evidence

for rejecting the PDP-11 other than its association with the failed Therac-25 software. The

factors that introduced the software errors could be unrelated to the decision to use the


PDP-11. If the factors were related, the argument should cite them directly as reasons for

rejecting the PDP-11 rather than its association with the Therac-25.

A.3.6. Genetic Fallacy

An argument attempts to justify a claim about a subject by noting the subject’s

origins [78]. Such arguments are fallacious because they ignore relevant changes that

might have occurred in the interim [57].

Example: FMECA was chosen over HAZOP to analyze the operational hazards. HAZOP

were originally developed to analyze hazards in chemical plants; therefore they do not

pertain to software.

The underlined premise commits the genetic fallacy because the original purpose of

HAZOP has no bearing on its present-day applicability.

A.4. Mathematical Fallacies

The fallacies in this category concern invalid statistical or probabilistic inferences and

the misuse of quantitative evidence. Although they are typically associated with faulty

experimentation and quantitative arguments, the fallacies may also appear in qualitative

arguments.

A.4.1. Faith in Probability

An argument claims that because an event is possible, it is inevitable. Alternatively, an

argument claims that because an event is unlikely to occur, it will not occur.

Example: The hazard of Byzantine failure due to spurious electromagnetic radiation

(EMR) has been eliminated. The components and interconnects have been hardened

against EMR with shielding that reflects 99.998% of external radiation.


The claim states that the hazard of failure has been eliminated; however, the mitigation

strategy described in the premise makes the hazard unlikely but not impossible. Conse-

quently, the claim does not follow from the premise.
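
The gap between "unlikely" and "eliminated" is easy to quantify. Shielding that reflects 99.998% of radiation leaves a per-exposure penetration probability of 2 × 10⁻⁵, which compounds over repeated exposures (the exposure counts below are purely illustrative):

```python
# Shielding that reflects 99.998% of radiation leaves a small but nonzero
# chance per exposure that EMR penetrates; the hazard is rare, not eliminated.
leak_p = 1 - 0.99998  # per-exposure penetration probability, ~2e-5

def p_at_least_one(p, n):
    """Probability of at least one penetration over n independent exposures."""
    return 1 - (1 - p) ** n

# Over enough exposures, the "eliminated" hazard becomes quite likely:
for n in (1, 10_000, 100_000):
    print(n, p_at_least_one(leak_p, n))
```

At 100,000 independent exposures the probability of at least one penetration already exceeds 0.86, which is why the claim does not follow from the premise.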

A.4.2. Gambler’s Fallacy

An argument claims that the probability of a random event has been altered by its
historical occurrence. The fallacy includes arguments of the following forms:

• A random event is more likely to occur because it has occurred seldom in the past.

• A random event is more likely to occur because it has occurred often in the past.

• A random event is less likely to occur because it has occurred seldom in the past.

• A random event is less likely to occur because it has occurred often in the

past. [76]

A typical justification for the first form of the fallacy is that past performance is incon-

sistent with long-term expectation, and so future performance must be different in order to

compensate. The second form could be justified by arguing that the current trend is likely

to continue in spite of the event’s underlying probability, which predicts otherwise. Arguments may also

focus on the chronology of occurrences instead of their frequency (e.g., an event is

unlikely to occur soon because it has occurred recently). Each form is fallacious because

past performance alone cannot influence the probability of a random event.

Not all arguments that make predictions from historical data are fallacious. Statistical

arguments may use sampled data to estimate the probability of an event or to refute a pre-

viously-accepted probability. These arguments do not exhibit the Gambler’s Fallacy

because they do not claim that the historical data have somehow altered the probability of

the event. Arguments that use historical data to compute conditional probabilities also do


not necessarily exhibit the fallacy. For example, an argument could rightly claim that the

probability that a triple-modular redundant (TMR) design will fail increases when one of

the modules fails. Finally, arguments that cite external factors that were possibly moti-

vated by historical performance as a basis for claiming that the probability of an event has

been altered do not commit the fallacy. After observing several failures of a component, a

design change might be made to improve the component’s reliability. In this case, it is the

design change and not the history of failures that is responsible for the improvement.

Example 1: Fault tree analysis shows the probability of component failure to be 10⁻³.

Ten-thousand components have been produced, and 10 have already failed. Therefore, no

additional component failures will occur.

Assuming the probability of component failure is random, past failures cannot influ-

ence the probability of future ones. This argument exhibits the first form of the fallacy

because it claims that future performance must compensate for past performance in order

to maintain consistency with long-term expectation.
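A short sketch of the arithmetic in Example 1, under the stated assumption that failures are independent and random (treating the quoted probability as the chance that any given surviving component eventually fails, which is an assumption made for illustration):

```python
p_fail = 1e-3        # per-component failure probability (from the example)
produced = 10_000
already_failed = 10

surviving = produced - already_failed
# Expected future failures among survivors: independence means the ten
# past failures have no bearing on this figure.
expected_future = surviving * p_fail
print(expected_future)  # ~10, not 0 as the fallacious argument concludes
```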

Example 2: The system has operated without failure for the past thirty years. We expect

this trend to continue.

The absence of failure for an extended period of time could be used to claim that a pre-

vious failure estimate was too conservative, but it cannot be used alone to predict when the

next failure will occur. This argument exhibits the second form of the fallacy. It could just

as easily claim that the lack of failure suggests that one is imminent.


A.4.3. Insufficient Sample Size

An argument makes a claim about a statistical population that it supports with sample

data, but the sample is too small to be statistically significant. This fallacy is also a form of

anecdotal argument.

Example: The failure rate of the software is less than 10⁻⁸ per hour. 100 replicates of the

software were life-tested simultaneously for 1000 hours and no failures were observed.

Butler and Finelli have shown that for life-testing to support a claim of 10⁻⁸ failures

per hour, the software must be tested for at least 11,415 years [16]. Thus, the sample size,

i.e., the duration of the testing, is grossly inadequate.
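The mismatch can be checked with back-of-the-envelope arithmetic. Assuming the claimed bound of 10⁻⁸ failures per hour, the required failure-free exposure is on the order of 1/10⁻⁸ hours, a rough approximation consistent with the figure cited above:

```python
target_rate = 1e-8                # failures per hour (claimed bound)
hours_needed = 1 / target_rate    # on the order of 1e8 failure-free hours
years_needed = hours_needed / (24 * 365)
print(int(years_needed))          # 11415 years, the figure cited above

hours_tested = 100 * 1000         # 100 replicates x 1000 hours each
print(hours_tested / hours_needed)  # ~0.001: three orders of magnitude short
```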

A.4.4. Pseudo-Precision

An argument makes a quantitative claim that it supports with qualitative data or quantitative data whose precision is less than that of the claim.

Example: The failure rate of the component is less than or equal to the required failure rate of 10⁻⁹. Fault tree analysis shows the failure rate of the component to be less than 10⁻⁷.

The claimed failure rate in this argument is two orders of magnitude smaller than that demonstrated by the evidence.

A.4.5. Unrepresentative Sample

An argument supports a claim about a statistical population with sample data, but the

sample is not random, i.e., it assigns preference to some members of the population over

others [1]. Consequently, the sample data do not represent the population of interest. This

fallacy only occurs when the sample is biased; arguments that invoke unbiased but insuffi-

cient samples commit “Insufficient Sample Size.” This fallacy is also a form of anecdotal

argument.



Example: The hazard of software failure has been sufficiently mitigated. Operational test-

ing of the software was conducted at 10 air traffic control facilities for a six-month period,

and no safety-related failures of the software were observed during the testing interval.

In this argument, the population of interest is the set of inputs to which the software

might be exposed, and the parameter of interest is the percentage of inputs that cause the

software to fail. The sample population is the set of inputs to which the software was

exposed during operational testing, but this sample is likely biased because beta testing

tends to exercise only common-case uses of the software. Beta testers may also be less

inclined to use the software as rigorously as users of the final product would, and they

might not report all of the failures they encounter.

A.5. Unsupported Assertions

These fallacies describe common ways in which an argument might assert that a claim

is true without offering any substantive supporting evidence. Since the burden of proof is

on the arguer to support his assertions, such arguments are fallacious.

A.5.1. Arguing from Ignorance

An argument supports a claim by citing a lack of evidence that the claim is false. The

argument does not exhibit the fallacy if it cites as evidence a sufficiently-exhaustive

search for counter-evidence that has turned up none.

Example: All of the credible hazards have been identified. Aside from the hazards noted

earlier, no evidence exists that any other hazards pose a threat to the safe operation of the

system.

This argument attempts to prove a negative (that there are no additional credible haz-

ards to system operation) by noting a lack of evidence contradicting the negative. It does


not cite any efforts that have been made to discover such evidence. A mere lack of evi-

dence that a claim is false does not make it true.

A.5.2. Unjustified Comparison

An argument claims that two things are alike, but it fails to consider important differences between them or the significance of the similarities identified.

Example: All of the credible hazards for revision 2 of the system have been identified.

The hazard analysis of revision 2 was based on that of revision 1. Specifically, the same

set of hazards identified for revision 1 was used to conduct the analysis for revision 2.

Although it is reasonable to base the hazard analysis for a subsequent version of a sys-

tem on that of its predecessor, this argument fails to consider any differences between the

two systems that might have created new hazards. For the argument to be sound it would

need to show that the differences between the versions have not created any new hazards;

otherwise the argument would need to address the new hazards specifically.

A.5.3. Unjustified Distinction

An argument claims that two things are distinct from each other, but it fails to provide

substantive evidence of a difference between them.

Example: Operational data from revision 1 of the control system indicated that observed

failure rates in the voter mechanism were higher than those predicted by its fault tree anal-

ysis. This behavior has been remedied in revision 2.

The argument claims that the subsequent revision of the system is free of the problems

that plague its predecessor, but it fails to explain why this claim is true.


A.6. Anecdotal Arguments

An anecdotal argument attempts to support its claim by citing evidence that suggests,

but does not entail, the truth of the claim. It may cite specific instances in which the claim

is true and then argue that the claim is true in general, or it may attempt to establish the rel-

ative superiority of its claim by contrasting the claim with inferior alternatives.

A.6.1. Correlation Implies Causation

An argument claims that two events are causally related because they are correlated,

but it does not consider the possibility that correlation is due to chance or a common

cause. As a specific form of this fallacy, an argument ascribes a cause to a set of data sim-

ply because the data occur in a cluster. If the argument cites a known mechanism by which

a correlation would be expected, then it may not be fallacious [76].

Example 1: The hazard of software failure has been sufficiently mitigated by developing

the software in accordance with the guidelines expressed in DO-178B. Following DO-

178B reduces the probability of software failure. Avionics software rarely fails, and all

avionics software is developed in accordance with DO-178B.

This argument cites a correlation between following DO-178B and a reduction of soft-

ware failure rates as evidence that following DO-178B reduces those failure rates. The

existence of a correlation alone is insufficient to establish that a causal one exists, however, because

other factors might account for the correlation. Companies that follow DO-178B could

employ other practices that are responsible for the reduction in failure rates. The argument

would need to show that such a common cause does not exist or explain why following

DO-178B reduces failure rates for it to be valid.


Example 2: All of the safety-related faults have been removed from the system. The certi-

fied mean-time-to-failure (MTTF) of the system is two years; however, the system has

operated for the past six years without failing. The extended period of fault-free operation

indicates that the remaining faults were removed during the maintenance six years ago.

While three times longer than the MTTF of the system, the extended period of fault-

free operation could be due to chance. Even if it is not, the argument provides no evidence

that the anomaly was caused by the last maintenance and not other factors such as a

change in how the system is operated.
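Under an assumed exponential failure model with the certified two-year MTTF, the chance of six failure-free years is straightforward to compute, and it is not negligible, so the observation is weak evidence on its own:

```python
import math

mttf_years = 2.0
observed_years = 6.0

# Exponential model: P(no failure in time t) = exp(-t / MTTF).
p_no_failure = math.exp(-observed_years / mttf_years)
print(p_no_failure)  # ~0.05: roughly a 1-in-20 chance by luck alone
```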

A.6.2. Damning the Alternatives

An argument considering several alternatives claims that one is superior because the

others are unattractive, but it fails to establish the relative superiority of the alternative it

advocates.

Example: The SPARKAda programming language is more amenable to the development

of safety-critical software than C, C++, and Java. Pointers in C and C++ are a common

source of errors, and there is no standard for safety certification of object-oriented soft-

ware. Java is unattractive because of the unpredictability of its garbage collector and the

performance penalty that Java programs suffer due to overhead imposed by the Java Vir-

tual Machine.

There are good reasons to use SPARKAda, but this argument does not list them.

Instead it points out weaknesses in other programming languages, but it does not consider

that SPARKAda might share those weaknesses or suffer from its own. The argument could

have just as easily advocated programming in Logo for the same reasons it cites.


A.6.3. Destroying the Exception

An argument claims that a norm is applicable to a particular scenario, but it ignores

relevant exceptions to the norm. As a special form of the fallacy, an argument claims that a

system will perform under extraordinary circumstances because the system was designed

to perform under normal circumstances.

Example: The Normal Control Law was designed to prevent the aircraft from entering a

stall; therefore it will prevent the aircraft from entering a stall even when the aircraft is

inverted.

Inverted flight (that is, flying upside-down) is an extremely rare occurrence in com-

mercial aviation, and so it is unlikely that the Normal Control Law was designed to

accommodate such a scenario. Consequently, merely asserting the general behavior of the

control law is insufficient to demonstrate that it would function under such an exceptional

circumstance.

A.6.4. Destroying the Rule

An argument makes a claim that contradicts a norm, and it attempts to support the

claim by invoking an exception to the norm, but the exception does not apply to the sub-

ject of the claim.

Example: The hazard of a flight control system failure causing a catastrophic loss of

flight control has been sufficiently mitigated. If a failure occurs, the aircraft will switch to

backup flight control within two seconds, or the pilot may switch to backup control manu-

ally. A two-second switchover delay is acceptable for autopilot systems, so it should be

acceptable for flight control systems. (Flight control systems are considered to be more

flight-critical than autopilot systems.)


This argument attempts to justify the two-second delay in switching from the primary

flight control system to the backup by noting that it is acceptable for autopilot systems;

however autopilot systems are considered to be less flight-critical than flight control sys-

tems and consequently are subject to more relaxed safety requirements. The argument

commits the fallacy because it appeals to the autopilot exception, which does not apply to

flight control systems.

A.6.5. False Dichotomy

An argument presents a limited set of alternatives for a decision and assumes that one

of them must be correct, but it fails to establish that the presented set of alternatives

encompasses all the possibilities.

Example: The hazard of a loss of clock synchronization between the nodes has been miti-

gated by employing star time-triggered architecture (TTA-star) between the nodes. TTA-

star is preferred over the TTA-bus architecture because the latter is vulnerable to spatial

proximity faults. (Factual material for this argument was taken from [79].)

This argument presents a false dichotomy between the TTA-star and TTA-bus archi-

tectures as alternatives for addressing the problem of clock synchronization. Other archi-

tectures have been developed to address this hazard, but the argument does not consider

them or otherwise discuss the completeness of its survey.

A.7. Omission of Key Evidence

An argument will often follow an accepted pattern of reasoning in order to more easily satisfy its burden of proof. These patterns may call for certain types and volumes of evidence in order to justify particular claims. An argument omits key evidence when it follows such a pattern but fails to provide all the required evidence or when it fails to address


known counter-evidence refuting its claim. This category encompasses the fallacy “Omis-

sion of Key Evidence” and specific forms of the fallacy that are broad enough to warrant

their own entries.

A.7.1. Omission of Key Evidence

An argument follows an accepted pattern of reasoning, but it fails to instantiate all of

the required elements of the pattern.

Example: Revision 2 of the system is acceptably safe because it is at least as safe as revi-

sion 1. Revision 1’s safety case shows it to be acceptably safe, Revision 2 is sufficiently

similar to revision 1, and Revision 2 meets or exceeds the safety requirements implied by

Revision 1’s safety record.

Although the two revisions of the system may be sufficiently similar to warrant a com-

parison, safety-related differences might exist between them depending on how the term

“sufficiently similar” is defined. The argument should address the differences or show

why none exist.

A.7.2. Fallacious Composition

An argument claims that a property holds for a system because it holds for each of the

system's components or functions, but it fails to consider interactions between system

components or functions that might violate the property.

Example 1: Deadlocks cannot occur in the air traffic management software. The software

is composed of four functions: sensor management, traffic display, conflict alert, and low-

altitude alert. Each of these functions has been shown separately to be free of deadlock.


Although each function of the software is free of deadlocks, deadlocks might still arise

between functions. This argument commits the fallacy of fallacious composition by

neglecting to consider interactions between functions.
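One classic way such compositional deadlocks arise is inconsistent lock ordering between functions that are each deadlock-free in isolation. The sketch below is purely illustrative (the function and lock names are hypothetical, not taken from the air traffic management example); it flags the hazard by finding a cycle in the combined lock-ordering graph:

```python
# Each function acquires its locks in the listed order. Individually each
# is deadlock-free, but composing the two creates an ordering cycle.
acquisition_orders = {
    "sensor_management": ["track_table", "display_buffer"],
    "conflict_alert":    ["display_buffer", "track_table"],
}

def has_order_cycle(orders):
    """Detect a cycle in the lock-ordering graph: edge a -> b whenever
    some function acquires lock b while already holding lock a."""
    edges = set()
    for locks in orders.values():
        for i, a in enumerate(locks):
            for b in locks[i + 1:]:
                edges.add((a, b))
    nodes = {n for e in edges for n in e}

    def dfs(node, stack, seen):
        stack.add(node)
        for a, b in edges:
            if a == node:
                if b in stack:          # back edge: ordering cycle found
                    return True
                if b not in seen:
                    seen.add(b)
                    if dfs(b, stack, seen):
                        return True
        stack.discard(node)
        return False

    return any(dfs(n, set(), {n}) for n in nodes)

print(has_order_cycle(acquisition_orders))  # True: composition can deadlock
```

A compositional freedom-from-deadlock argument would need evidence of this kind about the interactions, not just per-function analyses.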

Example 2: The failure rate of the system is 10⁻⁹. The system is composed of three redundant modules, each of which has been shown by its developer to have a failure rate of 10⁻³. The probability that all three modules would fail coincidentally is thus 10⁻³ × 10⁻³ × 10⁻³ = 10⁻⁹.

This argument assumes that the redundant modules comprising the system fail inde-

pendently of each other, but it offers no evidence to support this assumption.
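The cost of the unsupported independence assumption can be illustrated with a simple common-cause (beta-factor) sketch; the beta value below is an assumption chosen for illustration, not data from the argument:

```python
module_rate = 1e-3

# Claimed system failure probability, assuming fully independent modules.
independent = module_rate ** 3           # ~1e-09

# Beta-factor model: a fraction beta of module failures stem from a
# common cause that defeats all three modules at once (assumed value).
beta = 0.01
common_cause = beta * module_rate
independent_part = ((1 - beta) * module_rate) ** 3
with_common_cause = common_cause + independent_part

print(independent)        # ~1e-09
print(with_common_cause)  # ~1e-05: four orders of magnitude worse
```

Even a 1% common-cause fraction dominates the claimed figure, which is why the argument must justify independence rather than assume it.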

A.7.3. Fallacious Division

An argument claims that a property holds of a component or function of a system

because it holds for the larger system. In general, such arguments are fallacious because

the system property might originate from some other system component or function, or it

might be an emergent property that no single component or function possesses.

Example: Interlocks in the dosage computation software prevent it from commanding a

hazardous radiation dosage to the treatment apparatus. The dosage computation software

was reused from a previous model of the treatment apparatus that featured software and

hardware interlocks. That model has been in service for twenty years, and there have been

no reports of it improperly dosing a patient.

This argument commits the fallacy of division by inferring without evidence that the

fault-free service record of the previous model was due to the software interlocks in the



dosage computation algorithm. Some other aspect of the previous model, such as the hard-

ware interlocks, could have been responsible for the service record.

A.7.4. Ignoring Available Counter-Evidence

An argument makes a claim, but it fails to address known evidence refuting the claim.

Example: The risk of a Byzantine fault occurring in the bus architecture is remote. The

nodes on the bus and the interconnections between them have been shielded against alpha

particles and electromagnetic radiation. (Empirical data from systems with similar shield-

ing indicate that Byzantine faults occasionally still occur.)

The argument’s claim contradicts existing knowledge, but the argument fails to

address it, for example, by explaining why the observations of similar systems do not

apply in this context.

A.7.5. Oversimplification

An argument supports a claim about a subject with a model that oversimplifies relevant aspects of the subject.

Example: With a failure rate of 10⁻⁹, the software controlling the electronic braking system will either operate according to its requirements specification or halt and issue a warning to the driver. The software was developed so that it will comply with either its primary

or alternate specification at any given time. Under nominal conditions the software oper-

ates under its primary specification in which braking functions are available. If a deviation

between the specified and actual behavior of the software is detected, the software will

transition to its alternate specification in which it merely displays a warning to the driver

and does not provide braking functions. The software has been shown through model

checking to comply with its alternate specification.



This argument describes a software reconfiguration model to support its claim that the

software meets its required failure rate. The model is oversimplified, however, because it

fails to describe how deviations are detected and how the system maintains service as the

software is transitioning to the alternate specification. Detecting an oversimplified model

will usually require the reader to possess an understanding of the concepts upon which the

model is based; without this understanding the reader will be unable to determine whether

the model has satisfactorily discharged all the relevant engineering issues.

A.8. Linguistic Fallacies

Natural language is imperfect, and subtleties in the meaning of words can lead to multiple interpretations of the same prose, which in turn could deceive a reader into accepting

a fallacious argument. The fallacies in this category describe ways in which imprecise lan-

guage can conceal inconsistencies in arguments.

A.8.1. Ambiguity

An argument uses a term that can be interpreted in multiple discrete ways, but it is

unclear which interpretation is intended.

Example: The system is acceptably safe to operate because fault tree analysis shows the

probability of system failure to be 10⁻⁸. The operational procedures for the system are

also acceptably safe because they comply with applicable government regulations. Finally,

the maintenance plan for the system also meets these safety requirements.

Because the argument cites two different measures of safety (failure rate and regula-

tory compliance), it is unclear to which measure the safety requirements for the mainte-

nance plan refer.



A.8.2. Equivocation

An argument shifts the meaning of a term between premises or between a premise and

the claim.

Example: The system complies with applicable standards. For the given criticality level,

the standard requires that the probability of multiple component failures to be “remote.”

Hazard analysis using a risk assessment matrix shows the probability of multiple compo-

nent failures to be “very remote.”

If the standard and the risk assessment matrix were developed separately from each

other, they could define the term “remote” differently; however the argument uses the

term as if only one definition exists.

A.8.3. Suppressed Quantification

An argument uses wording that could mislead the reader into inferring more confidence in its claim than is warranted.

Example: Independent examiners reviewed the software code for errors and found none.

This claim fails to quantify the examiners who found no errors in the code, and so it is

unclear whether none of them found errors or only some of them found no errors.

A.8.4. Vacuous Explanation

An argument explains a phenomenon using a concept that offers no more information

than the phenomenon itself.

Example: Design features of the software prevent it from executing a flight path that

intersects terrain. When the pilot inputs a new flight path, the flight path is analyzed by a

terrain conflict detection algorithm that determines whether it intersects terrain. If the


algorithm determines that the flight path does intersect terrain, the software rejects the

flight path and displays an error message to the pilot.

In this argument, the phenomenon being explained is the software’s capability to

detect flight paths that intersect terrain. Rather than explaining how the phenomenon

occurs, however, the argument shifts the phenomenon to a specific component of the soft-

ware: the terrain conflict detection algorithm. The explanation is vacuous because it

reveals no details about how the phenomenon occurs.

A.8.5. Vagueness

An argument uses a poorly-defined term without providing a precise definition of it.

Example: The system is acceptably safe to operate. Fault tree analysis shows the most

likely component failure that could lead to catastrophic system failure to have a probability of 10⁻⁷. Hazard analysis shows the most likely system failure due to component failure interactions to have a probability of 10⁻⁸.

Although the failure rates the argument gives are low, it does not indicate whether they

comply with the requirements for the system to be considered “acceptably safe.” This

information is essential for the reader to determine whether the argument is sound.



Appendix B

MSAW Case Study Safety Arguments

This appendix contains complete pre-failure safety arguments constructed for the

MSAW case study of Pandora. The arguments are expressed in Goal Structuring Notation

(GSN). Nodes added in the post-failure arguments are denoted by a bold outline.


B.1. USAir 105

Figure B.1: USAir 105 Safety Argument, Top Level

G01: Air traffic controllers will detect altitude deviations of transponder-equipped aircraft and take timely action to assist the flight crew.
S01: FAA Order 7110.65
C01: An "altitude deviation" is an instance in which an aircraft has descended, or is predicted to descend, below the MSA for the region of airspace in which it is operating.
C02: "Assist the flight crew" refers to the issuance of a low-altitude advisory to the flight crew.
G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
G03: The controller will relay an MSAW alert to the flight crew so that they can take remedial action.
G04: Local controllers at non-approach control towers devote the majority of their time to visually scanning the runways and local area.
G05: Controller issuance of an MSAW alert is a first-priority duty.
G06: Controllers are adequately trained to use MSAW.
G07: The frequency of nuisance MSAW alerts will be minimized.
G08: New controllers are trained to use MSAW in their initial schooling and on the job.
G09: Controllers who were employed when MSAW came into service were trained on the job to use MSAW.
G10: MSAW may be inhibited when its continued use would adversely impact operational priorities.
S02: FAA Order 7210.3
G101: The risk of an altitude violation occurring in airspace in which MSAW monitoring has been inhibited has been sufficiently reduced.

Figure B.2: USAir 105 Safety Argument, Goal G02

G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
C03: FAA technical document NAS-MD-684
G11: Approach path monitoring detects aircraft on final approach that have descended, or will descend, below the glideslope path unless the violation occurs in an inhibited region of airspace.
G12: General terrain monitoring detects aircraft that have descended below the MSA within the local airspace unless the violation occurs in an inhibited region of airspace.
G13: Approach capture boxes aligned with runway final approach courses specify the regions subject to approach path monitoring.
C04: An approach capture box is a rectangular area surrounding a runway and final approach course.
G14: MSAW uses a "pseudo-glideslope" underlying the glideslope path to detect altitude violations during final approach.
G15: MSAW raises an alert if an aircraft descends below the pseudo-glideslope path.
G16: MSAW uses a terrain database customized for the environment around each airport to detect MSA violations.

Figure B.3: USAir 105 Safety Argument, Goal G101

G101: The risk of an altitude violation occurring in airspace in which MSAW monitoring has been inhibited has been sufficiently reduced.
G102: The creation of MSAW inhibit zones has been minimized to those regions of airspace where the use of MSAW would adversely impact operational performance.
S101: FAA Order 7210.3M
G103: Inhibit zones are only defined temporarily.
G104: Alternate measures are implemented to detect altitude deviations in MSAW-inhibited airspace.

B.2. TAESA Learjet 25D

Figure B.4: TAESA Learjet 25D Safety Argument, Top Level

G01: Air traffic controllers will detect altitude deviations of transponder-equipped aircraft and take timely action to assist the flight crew.
S01: FAA Order 7110.65
C01: An "altitude deviation" is an instance in which an aircraft has descended, or is predicted to descend, below the MSA for the region of airspace in which it is operating.
C02: "Assist the flight crew" refers to the issuance of a low-altitude advisory to the flight crew.
G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
G03: The controller will relay an MSAW alert to the flight crew so that they can take remedial action.
G04: Local controllers at non-approach control towers devote the majority of their time to visually scanning the runways and local area.
G05: Controller issuance of an MSAW alert is a first-priority duty.
G06: Controllers are adequately trained to use MSAW.
G07: The frequency of nuisance MSAW alerts will be minimized.
G08: New controllers are trained to use MSAW in their initial schooling and on the job.
G09: Controllers who were employed when MSAW came into service were trained on the job to use MSAW.
G10: MSAW may be inhibited when its continued use would adversely impact operational priorities.
S02: FAA Order 7210.3
G17: MSAW parameters have been configured to minimize the extent of inhibit areas.

174

Figure B.5: TAESA Learjet 25D Safety Argument, Goal G02

G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
G11: Approach path monitoring detects aircraft on final approach that have descended, or will descend, below the glideslope path unless the violation occurs in an inhibited region of airspace.
G12: General terrain monitoring detects aircraft that have descended below the MSA within the local airspace unless the violation occurs in an inhibited region of airspace.
G13: Approach capture boxes aligned with runway final approach courses specify the regions subject to approach path monitoring.
G14: MSAW uses a "pseudo-glideslope" underlying the glideslope path to detect altitude violations during final approach.
G15: MSAW raises an alert if an aircraft descends below the pseudo-glideslope path.
G16: MSAW uses a terrain database customized for the environment around each airport to detect MSA violations.
G201: The runway parameters specified for the capture box definitions have been verified to be correct.
G202: MSAW will issue an alert if a radar return for a tracked aircraft contains an altitude value below the MSA.
G203: The risk of MSAW discarding a genuine radar return indicating that an aircraft has descended below the MSA has been sufficiently reduced.
G204: The terrain data have been verified to correspond to the actual terrain surrounding the airport.
C03: FAA technical document NAS-MD-684
C04: An approach capture box is a rectangular area surrounding a runway and final approach course.


Figure B.6: TAESA Learjet 25D Safety Argument, Goal G17

G17: MSAW parameters have been configured to minimize the extent of inhibit areas.
G18: FAA provides site-adaptation guidance to minimize the extent of MSAW-inhibited areas.
G19: MSAW functions can be temporarily inhibited when their continued use would adversely affect operational priorities.
G20: A brief written report should be sent to the FAA air traffic directorate whenever MSAW functions are inhibited.
S02: FAA Order 7210.3


B.3. Beechcraft A36

Figure B.7: Beechcraft A36 Safety Argument, Top Level

G01: Air traffic controllers will detect altitude deviations of transponder-equipped aircraft and take timely action to assist the flight crew.
S01: FAA Order 7110.65
C01: An "altitude deviation" is an instance in which an aircraft has descended, or is predicted to descend, below the MSA for the region of airspace in which it is operating.
C02: "Assist the flight crew" refers to the issuance of a low-altitude advisory to the flight crew.
G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
G03: The controller will relay an MSAW alert to the flight crew so that they can take remedial action.
G04: Local controllers at non-approach control towers devote the majority of their time to visually scanning the runways and local area.
G05: Controller issuance of a MSAW alert is a first-priority duty.
G06: Controllers are adequately trained to use MSAW.
G07: The frequency of nuisance MSAW alerts will be minimized.
G08: New controllers are trained to use MSAW in their initial schooling and on the job.
G09: Controllers who were employed when MSAW came into service were trained on the job to use MSAW.
G10: MSAW may be inhibited when its continued use would adversely impact operational priorities.
S02: FAA Order 7210.3
G17: MSAW parameters have been configured to minimize the extent of inhibit areas.
G301: The controller will recognize when a MSAW alert occurs.


Figure B.8: Beechcraft A36 Safety Argument, Goal G02

G02: MSAW visually alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
G11: Approach path monitoring detects aircraft on final approach that have descended, or will descend, below the glideslope path unless the violation occurs in an inhibited region of airspace.
G12: General terrain monitoring detects aircraft that have descended below the MSA within the local airspace unless the violation occurs in an inhibited region of airspace.
G13: Approach capture boxes aligned with runway final approach courses specify the regions subject to approach path monitoring.
G14: MSAW uses a "pseudo-glideslope" underlying the glideslope path to detect altitude violations during final approach.
G15: MSAW raises an alert if an aircraft descends below the pseudo-glideslope path.
G16: MSAW uses a terrain database customized for the environment around each airport to detect MSA violations.
C03: FAA technical document NAS-MD-684
C04: An approach capture box is a rectangular area surrounding a runway and final approach course.


Figure B.9: Beechcraft A36 Safety Argument, Goal G17

G17: MSAW parameters have been configured to minimize the extent of inhibit areas.
G18: FAA provides site-adaptation guidance to minimize the extent of MSAW-inhibited areas.
G19: MSAW functions can be temporarily inhibited when their continued use would adversely affect operational priorities.
G20: A brief written report should be sent to the FAA air traffic directorate whenever MSAW functions are inhibited.
S02: FAA Order 7210.3


Figure B.10: Beechcraft A36 Safety Argument, Goal G301

G301: The controller will recognize when a MSAW alert occurs.
C301: MSAW alerts the controller to an altitude violation via a flashing circle around the data block of the offending aircraft.
C05: MSAW alerts the controller to an altitude violation via an aural alarm and a flashing circle around the data block of the offending aircraft.
A301: At least one controller is present in the control room whenever ATC service is available.
G302: MSAW alerts are sufficiently conspicuous to attract the attention of a controller.
G303: MSAW alerts are distinct from other alarms and indications used in the control tower.
G304: Controllers are trained to recognize MSAW alerts and indications.
G305: The visual MSAW alert is sufficiently conspicuous to attract the controller's attention to the offending data block on his radar display.
G306: The visual and aural MSAW alerts are sufficiently conspicuous to attract the controller's attention to the offending data block on his radar display.


B.4. Piper PA-32-300

Figure B.11: Piper PA-32-300 Safety Argument, Top Level

G01: Air traffic controllers will detect altitude deviations of transponder-equipped aircraft and take timely action to assist the flight crew.
S01: FAA Order 7110.65
C01: An "altitude deviation" is an instance in which an aircraft has descended, or is predicted to descend, below the MSA for the region of airspace in which it is operating.
C02: "Assist the flight crew" refers to the issuance of a low-altitude advisory to the flight crew.
G02: MSAW alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
G03: The controller will relay an MSAW alert to the flight crew so that they can take remedial action.
G04: Local controllers at non-approach control towers devote the majority of their time to visually scanning the runways and local area.
G05: Controller issuance of a MSAW alert is a first-priority duty.
G07: The frequency of nuisance MSAW alerts will be minimized.
G10: MSAW may be inhibited when its continued use would adversely impact operational priorities.
S02: FAA Order 7210.3
G17: MSAW parameters have been configured to minimize the extent of inhibit areas.
G401: The controller will recognize a MSAW alert when one occurs.
G402: A program exists to review nuisance alerts at each MSAW installation and examine site adaptations to reduce nuisance alerts while minimizing the extent of inhibit areas.


Figure B.12: Piper PA-32-300 Safety Argument, Goal G02

G02: MSAW alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude unless the violation occurs in an inhibited region of airspace.
C401: A MSAW alert consists of a visual indication on the controller's radar display and an aural alarm.
G11: Approach path monitoring detects aircraft on final approach that have descended, or will descend, below the glideslope path unless the violation occurs in an inhibited region of airspace.
G12: General terrain monitoring detects aircraft that have descended below the MSA within the local airspace unless the violation occurs in an inhibited region of airspace.
G13: Approach capture boxes aligned with runway final approach courses specify the regions subject to approach path monitoring.
G14: MSAW uses a "pseudo-glideslope" underlying the glideslope path to detect altitude violations during final approach.
G15: MSAW raises an alert if an aircraft descends below the pseudo-glideslope path.
G16: MSAW uses a terrain database customized for the environment around each airport to detect MSA violations.
G21: MSAW raises an alert when an altitude violation is detected.
G22: Proper alignment of approach capture boxes has been verified at all MSAW facilities.
C03: FAA technical document NAS-MD-684
C04: An approach capture box is a rectangular area surrounding a runway and final approach course.
C05: Alignment was verified based on a review of all user-defined site variables that control terrain warnings.


Figure B.13: Piper PA-32-300 Safety Argument, Goal G17

G17: MSAW parameters have been configured to minimize the extent of inhibit areas.
G18: FAA provides site-adaptation guidance to minimize the extent of MSAW-inhibited areas.
G19: MSAW functions can be temporarily inhibited when their continued use would adversely affect operational priorities.
G20: A brief written report should be sent to the FAA air traffic directorate whenever MSAW functions are inhibited.
S02: FAA Order 7210.3


Figure B.14: Piper PA-32-300 Safety Argument, Goal G401

G401: The controller will recognize a MSAW alert when one occurs.
C401: A MSAW alert consists of a visual indication on the controller's radar display and an aural alarm.
A401: At least one controller is present in the control room whenever ATC services are available.
G06: Controllers are adequately trained to use MSAW.
G08: New controllers are trained to use MSAW in their initial schooling and on the job.
G09: Controllers who were employed when MSAW came into service were trained on the job to use MSAW.
G403: MSAW alerts are sufficiently conspicuous to attract the controller's attention to the aircraft that generated the alert.
G404: The MSAW visual indication will attract the controller's attention to the offending radar track if he is viewing his radar display.
G405: The aural alarm will attract the controller's attention to the radar display if he is in the control room when the alert occurs.
G406: The aural alarm is distinct and audible throughout the control room.
G407: Controllers are prohibited from muting or muffling the aural alarm speaker.


B.5. Korean Air 801

Figure B.15: Korean Air 801 Safety Argument, Top Level

G01: A controller will provide timely notification to the flight crew whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted to descend below, a predetermined safe altitude.
C01: A Mode C transponder is an altitude-encoding transponder.
G02: MSAW visually and aurally alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted by the software to descend below, a predetermined safe altitude.
G03: Controllers vigilantly monitor MSAW and provide timely notification to either another controller or a flight crew when an MSAW alert indicates the existence of an unsafe situation.
G04: Controllers have received guidance on the use of MSAW.
G05: Controllers issue an advisory if MSAW generates an alert.
G06: Controllers receive on-the-job MSAW training.
G07: Controllers employed after MSAW was introduced receive initial hardware training on MSAW.
G08: Controller issuance of a MSAW-based safety alert could be a first-priority duty equal to separation of aircraft.
G09: Controllers are briefed on how to respond to MSAW alerts.
G501: Controllers notice MSAW alerts when they occur.


Figure B.16: Korean Air 801 Safety Argument, Goal G02

G02: MSAW visually and aurally alerts a controller whenever an IFR-tracked target with a Mode C transponder descends below, or is predicted by the software to descend below, a predetermined safe altitude.
C02: MSAW functions are incorporated into the ARTS software installed at FAA terminal radar processing systems.
G10: MSAW visually and aurally alerts a controller when it detects or predicts a minimum safe altitude violation.
G11: MSAW predicts altitude violations of IFR-tracked targets that are landing.
G12: MSAW detects altitude violations of all IFR-tracked targets.
G13: MSAW uses approach capture boxes aligned with runway final approach courses to identify which aircraft are landing.
G14: Proper alignment of the MSAW capture boxes has been verified at all MSAW sites.
G15: MSAW's approach path monitoring predicts altitude violations of aircraft operating within an approach capture box.
G16: MSAW's general terrain monitoring detects altitude violations of all targets within a predetermined geographic area.
G17: MSAW defines a "pseudo-glideslope" for each approach box that underlies the actual glideslope for the runway.
G18: Predicted or actual descent below the "pseudo-glideslope" normally produces a low-altitude alert.
G19: MSAW site variables have been reviewed to ensure compliance with prescribed procedures.
G20: MSAW uses computer software that contains a terrain database customized for the environment around each airport that utilizes ARTS processors.
C03: An approach capture box is a rectangular area surrounding a runway final approach course.
C04: FAA technical document NAS-MD-684
C501: The alert is suppressed if the altitude violation occurs in airspace in which MSAW processing has been inhibited.


Figure B.17: Korean Air 801 Safety Argument, Goal G10

G10: MSAW visually and aurally alerts a controller when it detects or predicts a minimum safe altitude violation.
C05: A visual MSAW alert consists of flashing the data block of the target that triggered the alert on the ARTS display.
C06: An aural MSAW alert consists of an aural alarm that sounds from a speaker located in the control room.
G21: If a controller is observing his ARTS display when a visual MSAW alert occurs, the alert will direct his attention to the target that generated the alert.
G22: If a controller is present in the control room when an aural alert occurs, the alert will direct his attention to the radar display depicting the visual alert.
G23: MSAW processing is inhibited only in those areas in which continued use of MSAW functions would adversely impact operational priorities.
G24: The MSAW aural alert speaker is audible throughout the control room.
G25: Supervisors check the MSAW speakers as part of the shift checklist and record the completion of this inspection on the appropriate facility logs.
S01: FAA Order 7210.3
G502: All MSAW installations are equipped with visual and aural alarms.


Figure B.18: Korean Air 801 Safety Argument, Goal G23

G23: MSAW processing is inhibited only in those areas in which continued use of MSAW functions would adversely impact operational priorities.
G19: MSAW site variables have been reviewed to ensure compliance with prescribed procedures.
G26: MSAW functions can be temporarily inhibited when their continued use would adversely impact operational priorities.
G27: Site adaptation guidance to minimize the extent of MSAW inhibited areas has been provided.
G28: A brief written report is sent to the FAA air traffic directorate whenever MSAW functions are inhibited.
G29: Assurance of continued positive radar identification could place distracting and operationally inefficient requirements upon the local controller.
S01: FAA Order 7210.3
S02: FAA Order 7110.65
G503: The FAA periodically evaluates modifications to MSAW and to MSAW site variables in order to reduce the extent of inhibit zones.
G504: The air traffic directorate must approve the report before MSAW functions may be inhibited.
G505: Technological measures exist to prevent unauthorized modification of MSAW site variables, including inhibit zones.
G506: MSAW site variables are routinely audited to ensure compliance with site adaptation procedures.


Appendix C

Survey Instruments

This appendix documents the experimental procedure that was employed in the controlled experimental evaluation of the fallacy taxonomy, and it contains the survey instruments that were administered to participants.

C.1. Experimental Procedure

The experimental procedure is divided into survey-packet generation, distribution, and scoring. The procedure for generating and scoring the packets was common to all three trials. Since paper forms were used in one of the trials and electronic forms were used in the other two, however, two distinct procedures were devised for distributing the packets to participants. These procedures are documented below.

C.1.1. Survey-Packet Generation

1. Generate a 3 × n table, where n is an even whole number at least as large as the number of participants expected to enroll in the trial.

2. Populate the first column of the table with a list of unique control numbers.


3. Populate the second column of the table with a list of boolean values in which half

the values are equal to true and half are equal to false. These values will represent

assignment to either the control group or to the treatment group.

4. Populate the third column with a list of random numbers. (Pseudo-random numbers generated by the RAND() function in Microsoft® Office Excel 2003 were used for the experiment.)

5. Sort the second and third columns using the third column as a key, which has the

effect of randomizing the mapping of control numbers to group assignments.

6. For each control number in the first column, assign the survey packet associated

with that control number to the control group if its corresponding entry in the second column is equal to true; otherwise, assign it to the treatment group.

7. Prepare a control-group survey packet for each control number assigned to the

control group, and mark the packet with the control number. The contents of the

control-group packets are listed in section 7.2.1 and appear in this appendix. (Steps

7 and 8 were accomplished using the Mail Merge feature of Microsoft Office®

Word 2003, which automatically marked each survey packet with its control number and tailored the survey instructions according to the control number's group

assignment.)

8. Prepare a treatment-group survey packet for each control number assigned to the

treatment group, and mark the packet with the control number. The contents of the

treatment-group packets are listed in section 7.2.1 and appear in this appendix.
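Steps 1 through 6 amount to a balanced random assignment of control numbers to the two groups. The sketch below is a hypothetical Python equivalent of the Excel-based procedure, not part of the original experiment; the control-number format is illustrative.

```python
import random

def assign_groups(n):
    """Sketch of steps 1-6: build the three columns (control numbers,
    group flags, random keys), then sort the flags by the random keys
    so that control numbers map to groups at random.

    n must be an even whole number so the groups are equal in size.
    """
    assert n % 2 == 0, "n must be an even whole number"
    control_numbers = ["CN%03d" % i for i in range(1, n + 1)]  # illustrative format
    flags = [True] * (n // 2) + [False] * (n // 2)  # True = control group
    keys = [random.random() for _ in range(n)]
    # Step 5: sorting the flag column by the random-number column
    # randomizes the mapping of control numbers to group assignments.
    shuffled_flags = [flag for _, flag in sorted(zip(keys, flags))]
    # Step 6: true -> control group, false -> treatment group.
    return {cn: ("control" if flag else "treatment")
            for cn, flag in zip(control_numbers, shuffled_flags)}

assignments = assign_groups(8)
```

Sorting on a column of random keys is equivalent to a uniform shuffle of the flag column, which is why the Excel RAND()-and-sort procedure yields an unbiased assignment.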


C.1.2. Survey-Packet Distribution (Paper Format)

The following steps pertain to the trial conducted at the University of Virginia, which was administered in paper format.

1. Publish the survey packets in paper form.

2. Seal each survey packet in an unmarked envelope, and shuffle the envelopes.

3. Distribute one sealed envelope to each participant.

C.1.3. Survey-Packet Distribution (Electronic Format)

The following steps pertain to the trials conducted at NASA Langley Research Center and the SCEH Conference, which were administered electronically.

1. Transmit the survey packets to a second experimenter who will not participate in

the scoring.

2. The second experimenter transmits a survey packet to each participant.

3. Upon receiving the completed packets from the participants, the second experimenter transmits the packets to the original experimenter for scoring.

C.1.4. Scoring

1. Receive the completed survey packets from the participants.

2. Separate the packets into questionnaires and safety-argument assessment surveys.

3. For each questionnaire received, record the control number of the questionnaire

and the participant’s responses to each question.

4. For each safety-argument assessment survey received, record the control number

of the survey and the participant’s responses. For yes/no questions, code yes

responses as "1," no responses as "0," and any other response as a blank. For open-ended questions, transcribe the participant's response as faithfully as possible

given the constraints of the chosen text-entry system.
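The yes/no coding rule in step 4 is mechanical enough to state directly. The helper below is a hypothetical sketch of that rule, not part of the original procedure; the function name is illustrative.

```python
def code_response(response):
    """Code a yes/no survey response per step 4 of the scoring
    procedure: "yes" -> "1", "no" -> "0", anything else -> blank."""
    normalized = str(response).strip().lower()
    if normalized == "yes":
        return "1"
    if normalized == "no":
        return "0"
    return ""  # blank for missing or ambiguous responses

# Example: coding a mix of clean, blank, and ambiguous responses.
codes = [code_response(r) for r in ["Yes", "no", "", "maybe"]]
```

Coding ambiguous responses as blank rather than guessing keeps the recorded data conservative, matching the rule stated in step 4.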


C.2. Survey Instruments

The remainder of this appendix contains a replica of the survey packet administered to members of the treatment group excluding the safety-argument fallacy taxonomy, which is documented in Appendix A. The control-group survey packet does not contain references to the fallacy taxonomy but is otherwise identical to the treatment-group packet.


Safety Argument Assessment Survey

Thank you for agreeing to participate in this study. Your responses to this survey will help us to better understand the manner in which software professionals assess safety arguments and, in the longer term, to improve practices for assuring the dependability of software systems.

In addition to this survey, the survey packet you received should contain the following items. If any items are missing, please contact the person listed below immediately.

Safety Argument Tutorial
Safety Argument Fallacy Taxonomy
Safety Argument Assessment Survey (this document)

This survey consists of two parts. Part I is a brief questionnaire about your educational and professional background. Your responses to these questions will give us insight into the types of professionals that have participated in our study. In Part II, you will be presented with a series of arguments pertaining to safety-related software systems and asked to determine whether the arguments have adequately supported their claims. We estimate that it will take you about three hours to complete the survey.

The information that you give in the study will be anonymous. Your name will not be collected or linked to the data.

Your participation in this study is completely voluntary. You may skip any questions that make you uncomfortable, and you may stop the survey at any time. You have the right to withdraw from the study at any time without penalty. If you want to withdraw, please write "withdraw" on your survey packet and return it to the person or address below.

When you have finished, please place your completed survey back into the survey packet envelope and then return your sealed packet to the person or address listed below. You may retain your copies of the tutorial and the fallacy taxonomy if you wish.

To ensure the validity of our results, please complete this survey without aid from others and without using any references except the materials included in your survey packet. Also, please do not discuss the survey with others until you have submitted your responses.

Please return your survey packet by «Stop_Date» to:

John Knight
Olsson Hall, Room 208
Department of Computer Science
151 Engineer's Way, P.O. Box 400470
Charlottesville, VA 22904-4740
Telephone: (434) 982-2216

Contact this person if you have questions about the study.


Part I. Questionnaire

Please complete this questionnaire before beginning the examination. Your responses to this questionnaire will give us insight into the types of people who have participated in our study.

1) Is English your native language?

Yes No

2) What is your current occupation?

_________________________________________________________________

2a) For how many years have you been employed in this occupation?

___________________

2b) If your occupation concerns the development or certification of safety-related systems, please describe the role you play in the development or certification of those systems.

________________________________________________________________________

________________________________________________________________________

3) Please list any academic degrees you have earned or are currently pursuing. For each degree, please list the type of degree (e.g., BA, BS, MA, MS, or PhD) and the discipline in which the degree was or will be awarded.

_________________________________________________________________

_________________________________________________________________

_________________________________________________________________

_________________________________________________________________

4) Please list any professional accreditations you have received.

_________________________________________________________________

_________________________________________________________________

_________________________________________________________________

5) Please describe any formal training you have received in logic or philosophy.

________________________________________________________________________

________________________________________________________________________


Part II. Argument Assessment

Before you continue with the survey, please read the Safety Argument Tutorial and the Safety Argument Fallacy Taxonomy included in your survey packet.

In this section you will be asked to evaluate several arguments concerning hypothetical software systems. Each argument begins with a brief description of the system in question and then presents a series of claims. Each claim is supported by evidence that the claim is true. Claims are printed in bold type and denoted by the word "Claim," and evidence appears as a bulleted list beneath the claim it supports.

For each claim, you will be asked whether the supporting evidence is sufficient to convince you that the claim is true. Assume that the supporting evidence is true for this purpose. In some cases, a piece of evidence may be elaborated as its own claim elsewhere in the argument. Such evidence is italicized. Assume that this evidence is true for the purpose of evaluating the claim that it supports, and then consider the truth of the evidence when it later appears as its own claim.

The sample argument below is intended as an exercise to familiarize you with the question format.

Argument 0. Sample Argument

Claim 1. All roses are red.

Supporting evidence:

A survey of roses in one researcher's private garden revealed that all of the roses in the garden were red.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Argument 1. Air Traffic Management System

A primary responsibility of air traffic control is to ensure that aircraft operating within controlled airspace are adequately separated from each other. If the required separation between two aircraft is lost, then there is a high risk that the aircraft will collide with each other. The airspace is serviced by a network of air traffic control facilities, each of which is responsible for a specific geographic region of the whole airspace.

This argument concerns a software-based air traffic management system (ATM) that tracks aircraft operating within a region of airspace and alerts controllers if two or more aircraft are not adequately separated from each other. The software is installed at each air traffic control facility.

Claim 1. The air traffic management system will ensure adequate separation between

aircraft operating within the whole airspace.

Supporting evidence:

The rules governing air traffic management ensure adequate separation. (Claim 2)

The implementation of these rules in the actual air traffic management system conforms to its specification. (Claim 3)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 2. The rules governing air traffic management ensure adequate separation.

Supporting evidence:

All credible loss-of-separation hazards in the airspace have been identified. (Claim 2.1)

Rules have been developed to prevent any of these hazards from occurring.

The sufficiency of these rules to prevent the occurrence of any single hazard has been established by mathematical proof.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 2.1. All credible loss-of-separation hazards in the airspace have been identified.

Supporting evidence:

Industry-accepted best practices were used to identify the hazards.

The panel that conducted the hazard identification consists entirely of established experts in air traffic control.

An independent review of the hazard identification certified it to be complete.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 3. The implementation of these rules in the actual air traffic management system conforms to its specification.

Supporting evidence:

The implementation of the rules in the air traffic management systems for each geographic region conforms to its specification.

The implementation of the rules governing transition from one geographic region to another conforms to its specification.

The implementation was developed in compliance with applicable industry standards.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Argument 2. Nuclear Reactor Shutdown System

This argument concerns an automatic reactor shutdown system for a nuclear power plant. The system monitors a set of reactor and plant variables and will shut down the reactor if certain conditions are met. In the argument, the term “hazard condition” refers to a reactor or plant condition under which continued operation of the reactor would compromise safety, and the term “monitored condition” refers to a condition which causes the system to trigger a shutdown.

Claim 1. The automatic reactor shutdown system will shut down the reactor whenever a hazard condition arises.

Supporting evidence:

The set of monitored conditions encompasses the set of hazard conditions. (Claim 2)

When the system engages, it reliably shuts down the reactor. (Claim 3)

The system does not engage in circumstances that do not require the reactor to be shut down. (Claim 4)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 2. The set of monitored conditions encompasses the set of hazard conditions.

Supporting evidence:

All hazard conditions were determined for a previous version of the reactor.

The new version of the reactor has been shown to be identical to the old version in all aspects relevant to shutdown.

The current system uses the same set of monitored conditions as was used on the old system.

No instances of failure to engage have been recorded for the old system in 5 years of operation.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 3. When the system engages, it reliably shuts down the reactor.

Supporting evidence:

The reliability target for the system is set at 1 failure in 100,000 hours of operation.

A state-of-the-art reliability estimation tool was used to compute the system reliability.

The tool computed the system reliability as 1 failure in 1,000,000 hours of operation.

The input data to the tool was confirmed to be accurate.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 4. The system does not engage in circumstances that do not require the reactor to be shut down.

Supporting evidence:

The system chief engineer certified this to be the case.

An independent consultant confirmed the certification.

The methods used by both the engineer and consultant to reach their opinion conform to accepted industry practice.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Argument 3. Electronic Throttle

This argument concerns an electronic throttle for a passenger car. The throttle valve regulates airflow into the car’s engine and is traditionally controlled via a simple cable attached to the accelerator pedal. In this electronic design, the throttle valve is controlled by an actuator that receives digital signals from an engine management computer. Sensors located in the accelerator pedal transmit the acceleration demand to the engine management computer; this input is used in combination with other measurements to determine the optimal throttle valve position. Since the throttle controls the car’s acceleration, safety of the throttle design is a concern.

Claim 1. The electronic throttle is acceptably safe to operate.

Supporting evidence:

All hazard probabilities are less than the targets identified by the developer. (Claim 2)

All of the hazard causes have been exhaustively identified. (Claim 3)

There is no software contribution to the system-level hazards. (Claim 4)

The identified hazards include and address human-factors issues. (Claim 5)

No single point of failure can lead to a system hazard. (Claim 6)

Development of the throttle was carried out in compliance with the ISO 9000-3, MISRA, and draft IEC 1508 standards.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 2. All hazard probabilities are less than the targets identified by the developer.

Supporting evidence:

The calculated probabilities from the fault-tree and Markov analyses meet the defined targets.

The fault-tree analysis is representative and complete.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 3. All of the hazard causes have been exhaustively identified.

Supporting evidence:

FMEA was used as a bottom-up confirmation of the fault-tree analysis results.

The FMEA source data are correct and complete.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 4. There is no software contribution to the system-level hazards.

Supporting evidence:

The software safety properties have been shown to hold through formal proof against the relevant specifications.

The software satisfies its timing requirements. (Claim 4.1)

All of the software safety properties have been identified.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 4.1. The software satisfies its timing requirements.

Supporting evidence:

No non-terminating loops are present in the software.

The software is capable of running at engine speeds far in excess of the physical engine revolution limit.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 5. The identified hazards include and address human-factors issues.

Supporting evidence:

Human-factors issues (e.g., likely driver reaction) were incorporated into the hazard analysis.

Real-world testing was used to determine the behavior of the vehicle upon all identified failure conditions.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 6. No single point of failure can lead to a system hazard.

Supporting evidence:

Failure modes and effects analysis (FMEA) was used as a bottom-up confirmation of the fault-tree analysis (FTA) results.

A mechanical backup exists to protect against a failure of the engine management computer.

The engine management computer features twin CPUs, one of which acts as the main CPU and the other as a monitor.

Each CPU runs a different version of the software developed following the principle of design diversity.

Diverse and redundant elements of the throttle architecture function independently of each other.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Argument 4. Explosive Chemical Plant

This argument concerns the safety assessment of a programmable control and protection system for a nitration vessel in an explosive chemical plant. The nitration vessel houses a reaction of pentaerythritol (PE) and nitric acid in a batch process to produce pentaerythritol tetranitrate (PETN), which is used in detonators. This reaction is strongly exothermic, and the temperature inside the vessel must be kept within acceptable limits to prevent decomposition of the PETN, which would produce toxic fumes and pose a fire hazard to other sections of the plant housing the finished explosive.

The programmable control and protection system is responsible for monitoring the temperature inside the nitration vessel and controlling the reaction to ensure that the temperature remains within acceptable limits. The system consists of a minicomputer and a separate programmable logic controller (PLC). The minicomputer receives temperature measurements from two sensors positioned inside the nitration vessel, and it can adjust the rate at which PE is fed into the vessel to speed up or slow down the reaction accordingly. If the control function of the minicomputer fails to keep the temperature below the acceptable limit, then a separate protection function will open a drain valve and divert the contents of the nitration vessel into a drowning tank where they are diluted in water. This protection function is duplicated in the PLC, which receives temperature measurements from a third sensor positioned inside the nitration vessel. Either the minicomputer or the PLC may operate the drain valve and the diverter.

The plant is located in Scotland, and the relevant authority with regulatory oversight of the plant is the U.K. Health and Safety Executive (HSE). The HSE Guidelines require the safety assessment to: (1) conduct a hazard analysis; (2) identify the safety-related systems associated with the hazards; (3) allocate safety integrity requirements to those systems so that the hazards are addressed; (4) design the systems to meet the integrity requirements; (5) analyze the safety integrity achieved; and (6) compare the integrity achieved with that which is required.

Claim 1. The programmable control and protection system ensures that the nitration vessel is acceptably safe to operate.

Supporting evidence:

The safety assessment of the programmable control and protection system was performed in accordance with the HSE Guidelines.

The programmable control and protection system is a safety-related system. (Claim 2)

The safety integrity requirements allocated to the system provide adequate assurance that the hazards have been addressed. (Claim 3)

The system was designed in accordance with the HSE Guidelines. (Claim 4)

The safety integrity achieved by the system design satisfies the safety integrity requirements. (Claim 5)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 2. The programmable control and protection system is a safety-related system.

Supporting evidence:

A system is considered to be safety-related if its operation, or failure to operate, could lead to one or more hazards.

The primary hazard associated with the batch production of PETN is decomposition of the contents of the nitration vessel.

Fault tree analysis (FTA) is an appropriate method of investigating the characteristics of the plant to identify events that might lead to decomposition. (Claim 2.1)

FTA determined that decomposition may occur only if the temperature within the vessel exceeds safe limits and the protection system fails to dump the contents of the vessel.

FTA determined that a failure of the control function of the minicomputer could result in a high temperature.

FTA determined that a failure of the protection function of the minicomputer could result in the contents of the nitration vessel not being dumped when the temperature exceeds safe limits.

FTA determined that a failure of the PLC could result in the contents of the nitration vessel not being dumped when the temperature exceeds safe limits.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 2.1. Fault tree analysis (FTA) is an appropriate method of investigating the characteristics of the plant to identify events that might lead to decomposition.

Supporting evidence:

FTA is routinely used to investigate hazards associated with chemical plant systems.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 3. The safety integrity requirements allocated to the system provide adequate assurance that the hazards have been addressed.

Supporting evidence:

The programmable system is required to have a safety integrity that is at least as good as that of a non-programmable system.

There are established standards and procedures for non-programmable safety systems.

The characteristics of existing plants using non-programmable systems are well understood.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 4. The system was designed in accordance with the HSE Guidelines.

Supporting evidence:

HSE Guidelines describe configuration, reliability, and overall quality as the three elements of a safety-critical system to be considered in attempting to satisfy its safety integrity requirements.

The configuration of the programmable control and protection system is similar to that adopted in non-programmable implementations.

There is comprehensive diversity in both the hardware and the software used in the programmable system. (Claim 4.1)

The reliability of the system was assessed by applying quantitative methods to the fault tree produced by FTA.

The probability assigned to each event in the fault tree was determined by a combination of good engineering practice and existing data on failures.

The overall quality of the system was assessed with respect to the established standards and procedures used within this area.

Although the established standards and procedures are concerned primarily with non-programmable systems, many aspects of programmable systems could be judged in a similar manner.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 4.1. There is comprehensive diversity in both the hardware and the software used in the programmable system.

Supporting evidence:

The control and protection element is implemented using a computer, whereas the protection unit is in the form of a PLC.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 5. The safety integrity achieved by the system design satisfies the safety integrity requirements.

Supporting evidence:

The configuration adopted within the control and protection system fulfills all the integrity requirements set out in the HSE Guidelines. (Claim 5.1)

The reliability of the system corresponds to a failure rate of about one failure every 1,100 years of operation. (Claim 5.2)

The assessment of the overall quality of the system considered the quality standards used, the staff involved, and the procedures followed. The hardware and software development processes were systematically examined using an extensive set of checklists following methods outlined in the HSE Guidelines.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 5.1. The configuration adopted within the control and protection system fulfills all the integrity requirements set out in the HSE Guidelines.

Supporting evidence:

The configuration includes a level of redundancy consistent with that used in equivalent non-programmable systems.

The configuration provides protection against both random and systematic failures within the programmable elements.


Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 5.2. The reliability of the system corresponds to a failure rate of about one failure every 1,100 years of operation.

Supporting evidence:

A quantitative analysis of the reliability of the system was performed by adding failure rate data to the system’s fault trees and then computing numerical failure rates for each of the events in the fault trees by studying the single and combined effects that would lead to the event.

The results of this analysis indicated a dangerous failure rate for the complete system of about 0.10 events per 10⁶ hours of operation, or about 8.8 × 10⁻⁴ failures per year.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________
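As an arithmetic cross-check of the figures quoted in Claim 5.2 (the failure rate and the interval between failures are the only values taken from the survey text), the unit conversion can be sketched in Python:

```python
# Cross-check of the Claim 5.2 figures: a dangerous failure rate of
# 0.10 events per 10^6 operating hours, converted to failures per year
# and to the mean interval between failures.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

rate_per_hour = 0.10 / 1e6                  # 1.0e-7 failures per hour
rate_per_year = rate_per_hour * HOURS_PER_YEAR
years_between_failures = 1 / rate_per_year

print(f"{rate_per_year:.2e}")               # 8.76e-04, i.e. about 8.8 x 10^-4
print(round(years_between_failures))        # 1142, i.e. about one failure
                                            # every 1,100 years
```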


Argument 5. Nuclear Trip System

(Like Argument 2, this argument concerns a shutdown system for a nuclear reactor. Please consider this argument independently.)

This argument concerns a trip system for a nuclear power plant. The plant is a gas-cooled nuclear reactor containing 400 fuel pins. Each pin is in a separate gas duct and is cooled by carbon dioxide gas. If the gas flow is restricted in any duct, the fuel pin in that duct could overheat and rupture. A reactor trip system is required to trip the reactor if an excessive temperature is observed in any duct.

The temperature in each duct is measured by two thermocouples. The reactor trip is implemented by dropping safety rods into the reactor that halt the reaction. The drop system is designed to be fail-safe: in case of a power loss, the control rods will drop into the reactor core. The thermocouples can fail to an open-circuit state, to a short-circuit state, or gradually degrade. The primary hazards associated with the trip system are (1) failure to trip on demand and (2) tardy trip in response to demand.

The trip system architecture consists of four independent computation channels. Each channel consists of a protection algorithm computer (PAC) and dynamic check logic (DCL). Each PAC receives temperature observations from both thermocouples in each duct (a total of 800 observations) and produces a dynamic output signal which is passed to the DCL. The DCL then produces a square-wave output signal indicating whether to trip the reactor. The output signals of all four DCLs are passed into a 2-out-of-4 voter.

As part of the fail-safe design, each DCL periodically tests the integrity of its associated PAC by injecting test signals into the PAC’s input lines. The test data consist of temperature values that should be just above and just below the trip level. Test values are obtained from one of two test sources, and the test sources are swapped between scans to ensure that a different test pattern is used on alternate cycles. Thus, the DCL is able to detect “stuck-at” failures because it expects different trip patterns on alternate scans.
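The alternate-pattern check described above can be illustrated with a minimal sketch; the trip level, margin, and function names here are hypothetical, not taken from the survey:

```python
TRIP_LEVEL = 650.0  # hypothetical trip temperature (not from the survey)

def pac_trip(temperature, trip_level=TRIP_LEVEL):
    """A healthy protection algorithm: trip when the input exceeds the level."""
    return temperature > trip_level

def stuck_at_detected(pac, trip_level=TRIP_LEVEL, margin=1.0):
    """Inject test values just above and just below the trip level on
    alternate scans; a healthy PAC trips on one but not the other, so
    identical outputs on the two patterns reveal a stuck-at failure."""
    trips_above = pac(trip_level + margin)
    trips_below = pac(trip_level - margin)
    return trips_above == trips_below

print(stuck_at_detected(pac_trip))        # False: healthy channel
print(stuck_at_detected(lambda t: True))  # True: output stuck at "trip"
```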

The integrity of the underlying computer hardware and compiler is checked using a concept known as reversible computing. This technique is able to detect systematic faults and random failures in the hardware or faults created by the compiler and should cause the system to halt, which is a fail-safe action.

A separate monitor computer verifies the output from each of the four PACs and performs startup checks on the consistency of the software configurations on each channel. It can also diagnose channel and thermocouple failures by comparing the outputs from the channels.

If either thermocouple reading for an arbitrary duct is too high, the system will trip the reactor; this logic is referred to as 1-out-of-2 high trip logic. If both thermocouples for an arbitrary duct have readings well below the average reading for all ducts, the system will trip the reactor; this logic is referred to as 2-out-of-2 low trip logic.
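The duct-level trip logic and the channel voter described above can be sketched as follows; the numeric thresholds in the example are illustrative assumptions, not figures from the survey:

```python
def duct_trip(t1, t2, high_limit, duct_average, low_margin):
    """1-out-of-2 high trip: trip if EITHER thermocouple reads too high.
    2-out-of-2 low trip: trip only if BOTH thermocouples read well below
    the average reading for all ducts."""
    high_trip = t1 > high_limit or t2 > high_limit
    low_trip = (duct_average - t1) > low_margin and (duct_average - t2) > low_margin
    return high_trip or low_trip

def voter_2oo4(dcl_trips):
    """2-out-of-4 voter: trip the reactor if at least two channels demand it."""
    return sum(dcl_trips) >= 2

print(duct_trip(700, 600, high_limit=650, duct_average=600, low_margin=100))  # True: 1oo2 high
print(duct_trip(600, 610, high_limit=650, duct_average=600, low_margin=100))  # False: no trip
print(voter_2oo4([True, True, False, False]))                                 # True
print(voter_2oo4([True, False, False, False]))                                # False
```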

Claim 1. The nuclear trip system is acceptably safe to operate.

Supporting evidence:

The trip system will correctly activate if the temperature is too high in any gas duct. (Claim 2)

The probability of failure on demand of the trip system is less than 0.001 per annum. (Claim 3)

The maximum response time of the trip system is less than five seconds. (Claim 4)

The spurious trip rate of the trip system is less than 0.1 per annum. (Claim 5)


The time required for fault identification and recovery (mean time to repair) of the trip system is less than 10 hours. (Claim 6)

The trip system is testable while online. (Claim 7)

The trip system can be modified to meet anticipated changes with minimal risk of maintenance-induced faults. (Claim 8)

The trip system can withstand maintenance errors and malicious attacks. (Claim 9)

The validity of the safety case will be maintained throughout the operational life of the trip system. (Claim 10)

No single fault affects the availability of the trip system. (Claim 11)

No two independent faults affect the safety of the trip system. (Claim 12)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 2. The trip system will correctly activate if the temperature is too high in any gas duct.

Supporting evidence:

Design simplicity assists in the test and verification of the trip function.

The software has been formally proven to perform the trip function as specified.

Program and trip parameters are maintained in separate PROMs to minimize the risk of introducing failures into the trip function.

Mature hardware and software tools have been used to minimize the risk of systematic faults within the trip function.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 3. The probability of failure on demand of the trip system is less than 0.001 per annum.

Supporting evidence:

The four channels and dual thermocouples reduce the risk of a failure on demand.


The risk of failure on demand due to an unrevealed fault is reduced through continuous online checks.

The software has been formally proven to perform the trip function as specified.

If either thermocouple measurement is too high, the reactor will trip.

Program and trip parameters are maintained in separate PROMs to minimize the risk of introducing failures leading to failure to trip on demand.

Mature hardware and software tools have been used to minimize the risk of systematic faults leading to failure on demand.

Failure per demand due to random failures is less than 0.001 per annum. (Claim 3.1)

Failure per demand is less than 0.001 per annum even if there are systematic faults. (Claim 3.2)

The design ensures that at least 90% of failures due to systematic faults are fail-safe. (Claim 3.3)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 3.1. Failure per demand due to random failures is less than 0.001 per annum.

Supporting evidence:

The inputs to the trip system have a fail-safe bias.

The demand rate of the trip system is 1 per annum.

The fault detection coverage, common mode factors, component failure rates, and repair times used as inputs to the Probabilistic Fault Tree Analysis (PFTA) are complete and correct.

Systematic faults are deemed to be incredible. (Claim 3.1.1)

The hardware reliability analysis supports the estimates used in the fault tree analysis.

PFTA estimates the probability of failure on demand to be 0.13 × 10⁻³ per annum.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 3.1.1. Systematic faults are deemed to be incredible.

Supporting evidence:

Inherent flaws in the hardware will be revealed and removed as a result of extensive use.

Tests will reveal all improper wiring or improper configuration of hardware.

Functional tests can reveal compiler-induced faults.

The software requirements are correct.

The use of an established design implies that there will be no systematic hardware flaws.

Tests have not revealed any systematic hardware flaws.

The software has undergone functional tests to reveal compiler-induced faults.

The software code has been formally proven to be correct.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 3.2. Failure per demand is less than 0.001 per annum even if there are systematic faults.

Supporting evidence:

The requirements are correct.

The trip scenarios used in testing are realistic.

The trip system was tested to a level of 10⁻⁴ reliability without failure, which yields 99% confidence in a probability of failure on demand of 10⁻³.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________
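The confidence figure quoted in Claim 3.2's evidence follows from the standard statistics of failure-free demand testing: after n independent failure-free demands, the confidence that the probability of failure on demand (pfd) is below p is 1 − (1 − p)^n. The sketch below is illustrative only and is not part of the original safety case; in particular, reading "tested to a level of 10⁻⁴" as roughly 10⁴ failure-free demands is an assumption.

```python
def confidence(n: int, p: float) -> float:
    """Confidence that pfd < p after n independent failure-free demands."""
    return 1.0 - (1.0 - p) ** n

# Assumed reading: testing "to a level of 10^-4" means ~10**4 failure-free demands.
c = confidence(10**4, 1e-3)
print(round(c, 5))  # ~0.99995, comfortably above the 99% cited in the evidence
```

With these assumptions the resulting confidence is well above the 99% figure the evidence cites.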

Claim 3.3. The design ensures that at least 90% of failures due to systematic faults are fail-safe.

Supporting evidence:

Thermocouples fail low in 90% of cases.

Online tests detect 90% of systematic failures.


Tests indicate a 99.995% fail-safe bias.

Double thermocouple disconnection or veto will cause a trip.

Compiler, loader, and processor flaws are protected against by the reversible computing technique.

Flaws in ADC, application software, configuration, trip limits, and trip logic will be revealed by dynamic online tests.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 4. The maximum response time of the trip system is less than five seconds.

Supporting evidence:

Excessive or infinite loops will be detected by the reversible computing implementation.

Design simplicity means that worst-case response time is bounded and can be readily determined via timing tests or code analysis.

The worst-case response time was determined to be 2.7 seconds. (Claim 4.1)

The worst measured time during test was 2.4 seconds.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 4.1. The worst-case response time was determined to be 2.7 seconds.

Supporting evidence:

The times used to estimate the execution time of individual instructions are correct.

The estimates of the time required to perform analog-to-digital conversions are correct.

Static analysis was used to determine the worst-case path through the trip function code.

The latency introduced on the inputs to and outputs from the software was determined as part of the analysis.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No


If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 5. The spurious trip rate of the trip system is less than 0.1 per annum.

Supporting evidence:

2-out-of-4 voting over the redundant channels reduces the spurious trip rate.

Software has been formally proven to trip only when required.

The 2-out-of-2 low trip logic design can withstand the transient loss of a single low-reading sensor (e.g., for repair) without using vetoes.

Program and trip parameters are maintained in separate PROMs to minimize the risk of introducing failures leading to spurious trips.

Mature hardware and software tools have been used to minimize the risk of systematic faults leading to spurious trips.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 6. The time required for fault identification and recovery (mean time to repair) of the trip system is less than 10 hours.

Supporting evidence:

Fault detection is aided by the system failing safe if it encounters a systematic or random fault.

A separate monitor computer enables online diagnosis of channel failures and failures in the thermocouples.

2-out-of-2 trip logic sensor comparison assists in detecting failed sensors.

Modular hardware replacement reduces the repair time.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 7. The trip system is testable while online.

Supporting evidence:

The periodic online test interval is three months.

Redundant channels enable testing to proceed on a single channel without causing a trip.

A separate monitor computer enables online diagnosis of channel failures and failures in the thermocouples.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 8. The trip system can be modified to meet anticipated changes with minimal risk of maintenance-induced faults.

Supporting evidence:

Redundancy in channels and thermocouples reduces susceptibility to maintenance-induced faults.

Design simplicity means that the system can be altered easily.

Simple input-output interfaces can be easily upgraded to accommodate new types of sensors.

Separate storage of program and trip parameters in the PROM isolates maintenance changes.

Sufficient protection is in place to prevent program or data updates from introducing dangerous faults. (Claim 8.1)

All anticipated changes can be accommodated by the design and safety case. (Claim 8.2)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Claim 8.1. Sufficient protection is in place to prevent program or data updates from introducing dangerous faults.

Supporting evidence:

Adequate support infrastructure is in place to safely accommodate anticipated changes (e.g., safety case review).

Procedures are in place to test program and data updates.

Flaws in the analog-to-digital converters, application software, configuration, trip limits, and trip logic will be revealed by dynamic online tests.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 8.2. All anticipated changes can be accommodated by the design and safety case.

Supporting evidence:

A change in the number of inputs can be accommodated by the design and safety case.

A change in the computer hardware or software tools can be accommodated by the design and safety case.

Anticipated changes in functional requirements can be accommodated by the design and safety case.

A change in sensors can be accommodated by the design and safety case.

Anticipated regulatory changes can be accommodated by the design and safety case.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 9. The trip system can withstand maintenance errors and malicious attacks.

Supporting evidence:

The reversible computing technique will reveal malicious program modifications.

A separate monitor computer performs pre-start checks on the consistency of the software in the four channels.


Changes to trip parameters and logic cannot be made without PROM-burning equipment and physical access to the machine.

Equipment is locked and can only be accessed using the appropriate key, which is different for each channel.

Safeguards are in place for all anticipated maintenance and operational errors. (Claim 9.1)

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 9.1. Safeguards are in place for all anticipated maintenance and operational errors.

Supporting evidence:

Safeguards are in place to protect against errors in proof testing.

Safeguards are in place to protect against errors in fault diagnosis.

Safeguards are in place to protect against errors in repair activities.

Safeguards are in place to protect against errors in operating vetoes.

Safeguards are in place to protect against errors in refueling.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 10. The validity of the safety case will be maintained throughout the operational life of the trip system.

Supporting evidence:

Adequate support infrastructure is in place to safely accommodate anticipated changes (e.g., safety case review).

Operational records will be kept and analyses performed to confirm the assumptions and estimates given within the safety case.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No


If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________

Claim 11. No single fault affects the availability of the trip system.

Supporting evidence:

Up to two of the four channels may fail into a no-trip mode (in which a channel does not trip the reactor on demand) before the safety function is compromised.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________
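Claim 11's evidence states that up to two of the four channels may fail into a no-trip mode before the safety function is compromised. Under a simple, hypothetical model of independent channel failures with per-demand probability q (my illustration, not the safety case's analysis), the trip function is lost only when three or more channels fail:

```python
from math import comb

def p_loss(q: float, n: int = 4, k_min: int = 3) -> float:
    """Probability that at least k_min of n independent channels fail."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(k_min, n + 1))

# With an (arbitrary) per-channel no-trip probability of 1%, the four-channel
# arrangement loses the trip function with probability of roughly 4 in a million.
print(p_loss(0.01))
```

The q value here is arbitrary; the safety case's actual 0.13 × 10⁻³ figure comes from its probabilistic fault tree analysis, which also models common-mode factors that this independence sketch ignores.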

Claim 12. No two independent faults affect the safety of the trip system.

Supporting evidence:

The safety function is maintained even if two channels fail into a no-trip mode (in which a channel does not trip the reactor on demand).

The trip system will fail safe in the presence of systematic faults, random failures in the hardware, or faults created by the compiler.

Assume the evidence is true. Is the evidence sufficient to convince you that the claim is also true?

Yes No

If you answered “No,” please explain why the evidence is not sufficient.

___________________________________________________________________________

___________________________________________________________________________


Safety Argument Tutorial

The following overview is adapted from Storey’s Safety-Critical Computer Systems [1]:

Safety, in the context of engineered systems, is “a property of a system that it will not endanger human life or the environment.” A safety-related system is one whose operation can affect the safety of its environment. Generally, there are two types of safety-related software systems: control systems and protection systems. A control system determines the operation of a piece of equipment or a plant. If the system being controlled is capable of hazardous operations, then the control system is safety-related because the safety of the controlled system is dependent on the behavior of the software controlling it. Examples of safety-related control systems include nuclear reactor control systems, medical life-support systems, and aircraft flight control systems.

A protection system is one whose purpose is to enhance safety by detecting fault conditions and producing outputs to mitigate their effects. These outputs might be commands to shut down a system, commands to engage an emergency-management system, or alarms or other indications intended to attract attention to the unsafe condition. Examples of protection systems include trip systems for reactors and chemical plants, air traffic management systems, and fire alarm systems. Protection systems are often built into control systems to ensure that the control systems will not command hazardous actions.

Care must be taken in the development of safety-related software to ensure that it possesses its required safety properties. If the software is a control system, then it must be assured that the software will not command hazardous operations. If it is a protection system, then its detection and mitigation functions must be assured to be accurate and reliable. A plethora of software development techniques exist for enhancing safety, and to enumerate them would be beyond the scope of this tutorial. Roughly, these techniques enhance safety either by improving the quality of the software development process or by verifying that the software possesses the required safety properties.

Whatever development techniques he or she chooses, a developer must be able to show that the software produced meets its safety requirements. The safety case, produced by the developer, is the comprehensive argument that this is the case. More precisely, a safety case is “a documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment” [2]. The essential elements of a safety case are:

- An explicit set of safety claims about the system;
- Evidence supporting the claims; and
- A set of arguments linking the claims to the evidence.

Each claim in a safety case is accompanied by a safety argument that the claim is true. The safety argument presents the evidence supporting the claim, which may be facts about the system itself or its development (such as results from hazard analyses, verification, or testing), contextual details, assumptions, or more specific claims that are supported by their own arguments.
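The claim–evidence–subclaim structure described above maps naturally onto a tree. The sketch below is purely illustrative (the class and field names are invented, not a published safety-case notation), using Claim 3 of the survey's trip-system argument as sample data:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """A safety claim, its supporting evidence, and any subclaims."""
    text: str
    evidence: list[str] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)

trip_pfd = Claim(
    "The probability of failure on demand of the trip system is "
    "less than 0.001 per annum.",
    evidence=["The four channels and dual thermocouples reduce the "
              "risk of a failure on demand."],
    subclaims=[Claim("Failure per demand due to random failures is "
                     "less than 0.001 per annum.")],
)
print(len(trip_pfd.evidence), len(trip_pfd.subclaims))  # 1 1
```

Assessing such an argument then amounts to walking the tree and asking, at each node, whether the evidence and subclaims are sufficient to support the claim — exactly the question posed for each claim in the survey above.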

References

[1] Neil Storey. Safety-Critical Computer Systems. Harlow, England: Addison-Wesley, 1996.

[2] Peter Bishop & Robin Bloomfield. “A Methodology for Safety Case Development.” <http://www.adelard.co.uk/resources/papers/index.htm>


Appendix D

Taxonomy Evaluation Data & Plots

This appendix contains the coded responses to the quantitative component of the

Safety-Argument Assessment Survey administered as part of the controlled experimental

evaluation of the safety-argument fallacy taxonomy. Also included are the sample data,

and the frequency distributions of the sample data.

D.1. Coded Yes/No Responses

Tables D.1 through D.6 contain participants’ responses to the yes-or-no question that was posed after each claim they were asked to consider in the survey: “Assume the evidence is true. Does the evidence convince you that the claim is also true?” A “yes” response is coded as a “1,” and a “no” response is coded as a “0.” A blank indicates no response or a response that could not be coded (e.g., answering both yes and no). Tables D.1, D.2, and D.3 contain the responses for the control group, and Tables D.4, D.5, and D.6 contain those for the treatment group.


Table D.1: Yes/No Responses, Control Group

Argument 1 / Argument 2 / Argument 3

Control # 1:1 1:2 1:2.1 1:3 2:1 2:2 2:3 2:4 3:1 3:2 3:3 3:4 3:4.1 3:5
Baseline 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1001 0 0 0 1 0 0 0 0 0 0 0 1
1002 0 0 1 1 0 0 0 1 0 0 0 0 0 1
1004 0 1 1 1 0 0 1 0 0 0 0 0
1005 0 0 0 1 1 0 0 0 1 0 0 0 0 0
1007 0 1 1 0 0 0 0 0 1 0 1 0 0 1
2001 0 0 0 0 1 0 0 0 0 0 0 0 0 1
2010 0 0 0 1 0 0 0 1 0 0 1 1 0 1
2011 0 0 0 0 0 1 0 0 0 0 0 0 0 1
2012 0 0 1 1 0 0 0 1 1 1 0 0 0 1
3004 0 1 0 1 1 0 1 0 0 0 0 0 0

Argument 4

Control # 3:6 4:1 4:2 4:2.1 4:3
Baseline 0 0 1 0 0
1001 0 0 1 0 1
1002 0 0 1 0 1
1004 0 1 1 1 1
1005 0 1 1 0 0
1007 0 1 1 0 1
2001 0 0 1 0 0
2010 0 0 1 1 0
2011 0 0 1 0 0
2012 1 1 1 1 0
3004 1 1 0 0 0


Table D.2: Yes/No Responses, Control Group (Continued)

Argument 4 / Argument 5

Control # 4:4 4:4.1 4:5 4:5.1 4:5.2 5:1 5:2 5:3 5:3.1 5:3.1.1 5:3.2 5:3.3
Baseline 0 0 0 0 0 1 0 1 1 0 0 0
1001 1 0 1 1 0 0 0 0 0 0 0
1002 0 0 1 0 0 0 1 1 0 1 1 0
1004 0 0 0 1 0 1 1 1 1 1 1 1
1005 0 0 0 0 0 0 0 1 0 0 0 0
1007 0 0 1 1 0 1 1 1 1 0 1
2001 1 0 0 0 1 0 0 1 0 0
2010 1 0 0 0 0 0 0 1 1 0 1 1
2011 0 1 0 0 0 1 1 1 1 1 1 1
2012 1 1 1 1 1 1 1 1 1 0 1 1
3004 1 0 0 0 0 0 0 0 1 0 1 0

Argument 5 (cont.)

Control # 5:4 5:4.1 5:5 5:6 5:7 5:8
Baseline 1 0 0 0 0 0
1001 1 0 0 0 1 1
1002 1 0 1 0 0 1
1004 1 1 0 0 1 1
1005 0 0 0 0 0 0
1007 1 1 0 0 1 1
2001 1 0 0 0 0 1
2010 1 0 0 0 0 1
2011 1 0 0 0 1 1
2012 1 1 1 1 1 1
3004 1 0 1 0 1 0


Table D.3: Yes/No Responses, Control Group (Concluded)

Argument 5

Control # 5:8.1 5:8.2 5:9 5:9.1 5:10 5:11 5:12

Baseline 0 1 0 1 1 0 0

1001 0 0 0 0 0 1 0

1002 1 0 1 0 1 0 0

1004 1 0 0 1 1 0 0

1005 0 0 1 0 0 0 0

1007 1 0 1 0 1 1 1

2001 0 1 0 1 0 0 0

2010 1 1 1 1 0 0 0

2011 0 0 0 0 1 0 0

2012 1 1 1 1 1 0 1

3004 0 1 0 1 1 0 0


Table D.4: Yes/No Responses, Treatment Group

Argument 1 / Argument 2 / Argument 3

Control # 1:1 1:2 1:2.1 1:3 2:1 2:2 2:3 2:4 3:1 3:2 3:3 3:4 3:4.1 3:5
Baseline 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1008 0 1 0 1 0 0 0 0 0 0 0 0 0 0
1009 0 0 0 0 0 0 0 0 0 0 0 0 0
1011 0 1 0 1 1 0 0 0 0 1 0 1 0 1
1012 1 1 0 1 1 0 0 0 0 1 1 0 0 0
2002 0 1 0 0 0 0 0 1 0 1 0 0 0 1
2003 0 0 0 0 0 0 0 0 0 0 0 0 0
2005 0 1 0 1 0 0 0 0 0 1 0 0 0 0
2007 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3002 1 1 0 1 1 0 1 0 0 1 1 0 1
3003 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Argument 4

Control # 3:6 4:1 4:2 4:2.1 4:3
Baseline 0 0 1 0 0
1008 0 1 1 0 0
1009 0 0 1 0 0
1011 1 1 1 0 0
1012 0 0 0 0 0
2002 1 0 0 0 0
2003 0 0 1 0 0
2005 0 0 1 0 0
2007 0 0 1 1 0
3002 1 1 1 1 0
3003 0 1 1 0 0


Table D.5: Yes/No Responses, Treatment Group (Continued)

Argument 4 / Argument 5

Control # 4:4 4:4.1 4:5 4:5.1 4:5.2 5:1 5:2 5:3 5:3.1 5:3.1.1 5:3.2 5:3.3
Baseline 0 0 0 0 0 1 0 1 1 0 0 0
1008 0 0 1 0 0 1 1 0 0 0 1 0
1009 0 0 0 0 0 0 0 0 0 0 0 0
1011 0 0 1 1 0 1 1 1 1 0 1 1
1012 0 0 0 1 1 1 0 0 0 1 0 0
2002 0 1 0 0 1 1 0 1 1 0 1 0
2003 0 0 0 0 0 0 0 0 0 0 0
2005 0 0 1 0 0 0 0 1 0 0 0 0
2007 0 0 0 0 0 0 0 1 0 0 0 0
3002 0 1 1 0 1 1 0 1 1 0 1 0
3003 0 0 0 0 0 0 0 0 1 0 0 0

Argument 5 (cont.)

Control # 5:4 5:4.1 5:5 5:6 5:7 5:8
Baseline 1 0 0 0 0 0
1008 1 1 0 0 1 1
1009 1 0 0 0 0 0
1011 1 1 0 0 1 1
1012 1 1 0 0 0 1
2002 1 0 1 0 1 0
2003 0 0 0 0 0 0
2005 1 0 0 0 0 1
2007 0 1 0 0 0 0
3002 1 0 0 0 1 1
3003 1 0 0 0 0 0


Table D.6: Yes/No Responses, Treatment Group (Concluded)

Argument 5

Control # 5:8.1 5:8.2 5:9 5:9.1 5:10 5:11 5:12

Baseline 0 1 0 1 1 0 0

1008 0 1 1 1 1 0 0

1009 0 0 0 0 0 0 0

1011 1 0 1 1 0 1 1

1012 1 1 0 0 1 0 0

2002 0 0 0 0 0 0 0

2003 0 0 0 0 0 0

2005 0 0 1 0 0 0 1

2007 0 0 0 0 0 0 0

3002 1 0 1 0 1 1 1

3003 0 0 1 0 0 0 0


Taxonomy Evaluation Data & Plots 225

D.2. Sample Data

D.2.1. Acceptance Rate

Tables D.7 and D.8 contain acceptance rates for the control and treatment groups.

Table D.7: Acceptance Rates, Control Group

Control # Arg. 1 Arg. 2 Arg. 3 Arg. 4 Arg. 5

1001 1/4 0 1/6 5/9 4/19

1002 1/2 1/4 1/7 1/3 1/2

1004 3/4 0 1/7 5/9 7/10

1005 1/4 1/4 1/7 2/9 1/10

1007 1/2 0 3/7 5/9 14/19

2001 0 1/4 1/7 1/4 6/19

2010 1/4 1/4 3/7 1/3 1/2

2011 0 1/4 1/7 2/9 11/20

2012 1/2 1/4 4/7 8/9 9/10

3004 1/2 1/2 1/6 2/9 2/5

Mean 0.35 0.20 0.25 0.41 0.49

Table D.8: Acceptance Rates, Treatment Group

Control # Arg. 1 Arg. 2 Arg. 3 Arg. 4 Arg. 5

1008 1/2 0 0 1/3 11/20

1009 0 0 0 1/9 1/20

1011 1/2 1/4 4/7 4/9 3/4

1012 3/4 1/4 2/7 2/9 2/5

2002 1/4 1/4 3/7 2/9 7/20

2003 0 0 0 1/9 0

2005 1/2 0 1/7 2/9 1/4

2007 0 0 0 2/9 1/10

3002 3/4 1/2 2/3 2/3 3/5

3003 0 0 0 2/9 3/20

Mean 0.33 0.13 0.21 0.28 0.32
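The Mean rows in Tables D.7 and D.8 are arithmetic means of the per-participant fractions above them. As a quick sanity check (not part of the original analysis), the Argument 1 mean for the control group can be recomputed from Table D.7:

```python
from fractions import Fraction

# Argument 1 acceptance rates, control group (Table D.7, top to bottom).
arg1 = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1, 4),
        Fraction(1, 2), Fraction(0), Fraction(1, 4), Fraction(0),
        Fraction(1, 2), Fraction(1, 2)]

mean = sum(arg1) / len(arg1)
print(float(mean))  # 0.35, matching the Mean row of Table D.7
```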


D.2.2. Accuracy Rate

Tables D.9 and D.10 contain accuracy rates for the control and treatment groups.

Table D.9: Accuracy Rates, Control Group

Control # Arg. 1 Arg. 2 Arg. 3 Arg. 4 Arg. 5

1001 3/4 3/4 5/7 5/9 1/2

1002 1/2 3/4 6/7 7/9 9/20

1004 1/4 1/2 6/7 5/9 11/20

1005 3/4 3/4 6/7 8/9 13/20

1007 1/2 1 4/7 5/9 2/5

2001 1 3/4 6/7 7/9 4/5

2010 3/4 3/4 4/7 7/9 13/20

2011 1 3/4 6/7 8/9 3/5

2012 1/2 3/4 3/7 2/9 9/20

3004 1/2 1/2 5/7 2/3 3/4

Mean 0.65 0.73 0.73 0.67 0.58

Table D.10: Accuracy Rates, Treatment Group

Control # Arg. 1 Arg. 2 Arg. 3 Arg. 4 Arg. 5

1008 1/2 1 1 7/9 3/5

1009 1 1 6/7 1 7/10

1011 1/2 3/4 3/7 2/3 2/5

1012 1/4 3/4 5/7 2/3 13/20

2002 3/4 3/4 4/7 2/3 7/10

2003 1 1 6/7 1 13/20

2005 1/2 1 6/7 8/9 3/5

2007 1 1 1 8/9 13/20

3002 1/4 1/2 2/7 4/9 11/20

3003 1 1 1 8/9 7/10

Mean 0.68 0.88 0.76 0.79 0.62


D.3. Frequency Distributions

D.3.1. Acceptance Rate

Figures D.1 through D.5 depict the frequency distributions of the acceptance rate sample data for each of the five arguments in the survey.
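The figures bin each participant's rate into intervals of width 0.1, labeled by the bin's lower edge on the horizontal axis. A hypothetical reconstruction of that binning (my own sketch, using the control group's Argument 1 acceptance rates from Table D.7):

```python
from collections import Counter
from math import floor

# Argument 1 acceptance rates, control group (Table D.7).
rates = [0.25, 0.5, 0.75, 0.25, 0.5, 0.0, 0.25, 0.0, 0.5, 0.5]

# Lower edge of the width-0.1 bin containing each rate.
bins = Counter(floor(r * 10) / 10 for r in rates)
print(sorted(bins.items()))  # [(0.0, 2), (0.2, 3), (0.5, 4), (0.7, 1)]
```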

Figure D.1: Frequency Distribution, Acceptance Rate, Argument 1

[Histogram omitted: frequency (0–7) of participants by acceptance-rate bin (0 to 0.9, width 0.1); series: Control and Treatment.]


Figure D.2: Frequency Distribution, Acceptance Rate, Argument 2

Figure D.3: Frequency Distribution, Acceptance Rate, Argument 3

[Histograms omitted for the two figures above: frequency (0–7) of participants by acceptance-rate bin (0 to 0.9, width 0.1); series: Control and Treatment.]


Figure D.4: Frequency Distribution, Acceptance Rate, Argument 4

Figure D.5: Frequency Distribution, Acceptance Rate, Argument 5

[Histograms omitted for the two figures above: frequency (0–7) of participants by acceptance-rate bin (0 to 0.9, width 0.1); series: Control and Treatment.]


D.3.2. Accuracy Rate

Figures D.6 through D.10 depict the frequency distributions of the accuracy rate sample data for each of the five arguments in the survey.

Figure D.6: Frequency Distribution, Accuracy Rate, Argument 1

[Histogram omitted: frequency (0–7) of participants by accuracy-rate bin (0 to 1, width 0.1); series: Control and Treatment.]


Figure D.7: Frequency Distribution, Accuracy Rate, Argument 2

Figure D.8: Frequency Distribution, Accuracy Rate, Argument 3

[Histograms omitted for the two figures above: frequency (0–7) of participants by accuracy-rate bin (0 to 1, width 0.1); series: Control and Treatment.]


Figure D.9: Frequency Distribution, Accuracy Rate, Argument 4

Figure D.10: Frequency Distribution, Accuracy Rate, Argument 5

[Histograms omitted for the two figures above: frequency (0–7) of participants by accuracy-rate bin (0 to 1, width 0.1); series: Control and Treatment.]


Bibliography

[1] Petroski, Henry. Design Paradigms: Case Histories of Error and Judgment in Engineering. Cambridge: Cambridge University Press, 1994.

[2] Knight, John C. “Safety Critical Systems: Challenges and Directions.” Proceedings of the 24th International Conference on Software Engineering (2002): 547-550.

[3] U.S. National Transportation Safety Board. Controlled Flight Into Terrain, Korean Air Flight 801, Boeing 747-300, HL7468, Nimitz Hill, Guam, August 6, 1997. Aircraft accident report NTSB/AAR-00/01. Washington, DC, 2000.

[4] Lions, J. L. “ARIANE 5 Flight 501 Failure: Report by the Inquiry Board.” Paris, 1996.

[5] Mars Climate Orbiter Mishap Investigation Board. “Phase I Report.” ftp://ftp.hq.nasa.gov/pub/pao/reports/1999/MCO_report.pdf (accessed August 27, 2006).

[6] JPL Special Review Board. “Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.” http://sunnyday.mit.edu/accidents/mpl_report_1.pdf (accessed August 27, 2006).

[7] Main Commission Aircraft Accident Investigation Warsaw. “Report on the Accident to Airbus A320-211 Aircraft in Warsaw.” http://www.rvs.uni-bielefeld.de/publications/Incidents/DOCS/ComAndRep/Warsaw/warsaw-report.html (accessed August 27, 2006).

[8] Aeronautica Civil of the Republic of Colombia. Controlled Flight Into Terrain, American Airlines Flight 965, Boeing 757-223, N651AA, Near Cali, Colombia, December 20, 1995. Santafé de Bogotá, Colombia, 1996.

[9] Leveson, Nancy G. Safeware: System Safety and Computers. Boston: Addison-Wesley, 1995.

[10] McDermid, John A. and David J. Pumfrey. “Software Safety: Why is there No Consensus?” Proceedings of the 19th International System Safety Conference, Huntsville, Alabama (2001).


[11] Modarres, Mohammad, Mark Kaminskiy and Vasiliy Krivtsov. Reliability Engineering and Risk Analysis: A Practical Guide. New York: Marcel Dekker, 1999.

[12] Avižienis, Algirdas, Jean-Claude Laprie, Brian Randell, and Carl Landwehr. “Basic Concepts and Taxonomy of Dependable and Secure Computing.” IEEE Transactions on Dependable and Secure Computing 1, no. 1 (2004): 11-33.

[13] Lowrance, William W. Of Acceptable Risk: Science and the Determination of Safety. Los Altos, California: William Kaufmann, 1976.

[14] Federal Aviation Administration. “System Design and Analysis.” Advisory Circular AC 25.1309-1A. Washington, DC, 1988.

[15] Littlewood, Bev and Lorenzo Strigini. “Validation of Ultrahigh Dependability for Software-based Systems.” Communications of the ACM 36, no. 11 (1993): 69-80.

[16] Butler, Ricky W. and George B. Finelli. “The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software.” IEEE Transactions on Software Engineering 19, no. 1 (1993): 3-12.

[17] Weaver, Robert Andrew. “The Safety of Software — Constructing and Assuring Arguments.”PhD diss., University of York, 2004.

[18] RTCA, Inc. “Software Considerations in Airborne Systems and Equipment Certification.” RTCA/DO-178B. Washington, DC, 1992.

[19] Fenton, Norman. “How to Improve Safety-Critical Standards.” Safer Systems: Proceedings of the 5th Annual Safety-Critical Systems Symposium, Brighton (1997). ed. F. Redmill and T. Anderson.

[20] U.K. Ministry of Defence. “Defence Standard 00-55: Requirements for Safety Related Software in Defence Equipment.” 1997.

[21] U.K. Ministry of Defence. “Interim Defence Standard 00-56: Safety Management Requirements for Defence Systems.” 2004.

[22] International Electrotechnical Commission. “Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems.” IEC 61508. Geneva, 1998.

[23] McDermid, John A. “Software Safety: Where’s the Evidence?” Proceedings of the Sixth Australian Workshop on Industrial Experience with Safety Critical Systems and Software, Brisbane, Australia (2001): 1-6. ed. P. Lindsay.

[24] Kelly, Tim. “Software Safety – by Prescription or Argument?” Safety Systems. 1997. http://www-users.cs.york.ac.uk/~tpk/tpkscsarticle.pdf (accessed August 27, 2006).

[25] Bishop, Peter and Robin Bloomfield. “A Methodology for Safety Case Development.” Industrial Perspectives of Safety Critical Systems: Proceedings of the Sixth Safety-Critical Systems Symposium (1998).


[26] Adelard. The Adelard Safety Case Development Manual. http://www.adelard.co.uk/resources/ascad/index.htm (accessed August 27, 2006).

[27] Kelly, Tim and Rob Weaver. “The Goal Structuring Notation – A Safety Argument Notation.” Proceedings of the Dependable Systems and Networks 2004 Workshop on Assurance Cases, Florence (2004).

[28] Adelard. “ASCE.” http://www.adelard.co.uk/software/asce/index.htm (accessed August 27, 2006).

[29] CET Advantage, Ltd. “GSNCaseMaker: Goal Structuring Notation Tool.” http://www.cetadvantage.com/website/GSNCaseMaker.aspx

[30] Govier, Trudy. A Practical Study of Argument, 6th ed. Belmont, California: Thomson-Wadsworth, 2005.

[31] Kelly, Tim and John McDermid. “Safety Case Construction and Reuse Using Patterns.” Proceedings of the 16th International Conference on Computer Safety, Reliability, and Security (1997).

[32] Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[33] Kelly, Tim and John McDermid. “Safety Case Patterns – Reusing Successful Arguments.” Proceedings of the IEE Colloquium on Understanding Patterns and Their Application to System Engineering, London (1998).

[34] Kelly, Timothy Patrick. “Arguing Safety — A Systematic Approach to Managing Safety Cases.” PhD diss., University of York, 1998.

[35] Storey, Neil. Safety-Critical Computer Systems. Harlow, England: Prentice Hall, 1996.

[36] McCormick, Norman J. Reliability and Risk Analysis. New York: Academic Press, 1981.

[37] U.S. National Transportation Safety Board. Aviation Investigation Manual: Major Team Investigations. Washington, DC, 2002.

[38] Lyu, Michael R. (ed). Handbook of Software Reliability Engineering. New York: McGraw-Hill, 1996.

[39] Leveson, Nancy. “A New Accident Model for Engineering Safer Systems.” Safety Science 42, no. 4 (2004): 237-270.

[40] Johnson, Christopher W. “What Are Emergent Properties and How Do They Affect the Engineering of Complex Systems?” Reliability Engineering and System Safety 91, no. 12 (2006): 1475-1481.

[41] Perrow, Charles. Normal Accidents: Living with High-Risk Technologies. Princeton: Princeton University Press, 1999.


[42] Johnson, C.W. Failure in Safety-Critical Systems: A Handbook of Incident and Accident Reporting. Glasgow: University of Glasgow Press, 2003.

[43] Ladkin, Peter and Karsten Loer. Causal System Analysis. 1998. http://www.rvs.uni-bielefeld.de/publications/books/ComputerSafetyBook/index.html (accessed August 27, 2006).

[44] Johnson, C.W. “Forensic Software Engineering: Are Software Failures Symptomatic of Systemic Problems?” Safety Science 40, no. 9 (2002): 835-847.

[45] Rushby, John. “Using Model Checking to Help Discover Mode Confusions and Other Surprises.” Proceedings of the 3rd Workshop on Human Error, Safety, and Systems Development, Belgium (1999). ed. Denis Javaux.

[46] Buys, J. R. and J. L. Clark. “Events and Causal Factors Analysis.” http://www.eh.doe.gov/analysis/trac/14/trac14.html (accessed August 27, 2006).

[47] Leveson, Nancy G. and Nicolas Dulac. “Safety and Risk-Driven Design in Complex Systems-of-Systems.” http://sunnyday.mit.edu/space-exploration.doc (accessed August 27, 2006).

[48] Johnson, Chris. “Using IEC 61508 to Guide the Investigation of Computer-Related Incidents and Accidents.” Computer Safety, Reliability, and Security. Berlin: Springer, 2003.

[49] U.S. National Transportation Safety Board. “NTSB History and Mission.” http://www.ntsb.gov/Abt_NTSB/history.htm (accessed November 28, 2006).

[50] Lebow, Cynthia C., Liam P. Sarsfield, William L. Stanley, Emile Ettedgui, and Garth Henning. Safety in the Skies: Personnel and Parties in NTSB Aviation Accident Investigations. Santa Monica, California: RAND, 1999.

[51] U.S. National Transportation Safety Board. “About the NTSB: The Investigative Process.” http://www.ntsb.gov/Abt_NTSB/invest.htm (accessed December 9, 2006).

[52] International Civil Aviation Organization. “Aircraft Accident and Incident Investigation,” 9th ed. July 2001.

[53] U.S. National Transportation Safety Board. Aviation Investigation Manual: Major Team Investigations. November 2002. http://www.ntsb.gov/info/inv_guides.htm (accessed December 11, 2006).

[54] Gerdsmeier, Thorsten, Peter Ladkin and Karsten Loer. “Analysing the Cali Accident With a WB-Graph.” http://www.rvs.uni-bielefeld.de/publications/Reports/caliWB.html (accessed December 12, 2006).

[55] Kelly, T. P. and J. A. McDermid. “A Systematic Approach to Safety Case Maintenance.” Reliability Engineering and System Safety 71, no. 3 (2001): 271-284.

[56] Greenwell, William S., Elisabeth A. Strunk, and John C. Knight. “Failure Analysis and the Safety-Case Lifecycle.” Human Error, Safety and Systems Development (2004): 163-176.


[57] Damer, T. Edward. Attacking Faulty Reasoning: A Practical Guide to Constructing Fallacy-Free Arguments, 5th ed. Belmont, California: Wadsworth, 2005.

[58] Greenwell, William S., John C. Knight, C. Michael Holloway, and Jacob J. Pease. “A Taxonomy of Fallacies in System Safety Arguments.” Proceedings of the 24th International System Safety Conference, Albuquerque, New Mexico (2006).

[59] Dependability Research Group. “Safety Case Repository.” http://dependability.cs.virginia.edu/research/safetycases/safetycasesexamples.php (accessed August 27, 2006).

[60] EUROCONTROL. “The EUR RVSM Pre-Implementation Safety Case.” http://dependability.cs.virginia.edu/research/safetycases/safetycasesexamples.php (accessed August 27, 2006).

[61] Nagra. “Project Opalinus Clay: Safety Report.” http://dependability.cs.virginia.edu/research/safetycases/Opalinus_Clay.pdf (accessed August 27, 2006).

[62] Kinnersly, Steve. “Whole Airspace ATM Safety Case - Preliminary Study.” http://dependability.cs.virginia.edu/research/safetycases/Opalinus_Clay.pdf (accessed August 27, 2006).

[63] Curtis, G. “Fallacy Files.” http://www.fallacyfiles.org/ (accessed August 27, 2006).

[64] Dowden, Bradley. “Fallacies.” Internet Encyclopedia of Philosophy. http://www.iep.utm.edu/ (accessed August 27, 2006).

[65] Pirie, Madsen. The Book of the Fallacy: A Training Manual for Intellectual Subversives. London: Routledge & Kegan Paul, 1985.

[66] U.S. National Transportation Safety Board. “Korean Air Flight 801 B-747-300, Agana, Guam, August 6, 1997: Public Hearing Exhibit List.” http://www.ntsb.gov/Events/kal801/exhibit.htm (accessed August 27, 2006).

[67] Milton, J. Susan and Jesse C. Arnold. Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computer Sciences, 4th ed. New York: McGraw-Hill, 2003.

[68] Campbell, Donald T. and Julian C. Stanley. Experimental and Quasi-Experimental Designs for Research. Boston: Houghton Mifflin, 1963.

[69] University of Virginia Office of the Vice President for Research & Graduate Studies. “SBS Guide for Researchers.” http://www.virginia.edu/vprgs/irb/sbs_help.html (accessed November 26, 2006).

[70] Barker, Stephen, Ian Kendall, and Anthony Darlison. “Safety Cases for Software-intensive Systems: an Industrial Experience Report.” Proceedings of the 16th International Conference on Computer Safety, Reliability, and Security, York (1997): 332-342.

[71] Kitchenham, Barbara A., Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. “Preliminary Guidelines for Empirical Research in Software Engineering.” IEEE Transactions on Software Engineering 28, no. 8 (2002): 721-734.

[72] Sjøberg, Dag I. K., Jo E. Hannay, Ove Hansen, Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg, and Anette C. Rekdal. “A Survey of Controlled Experiments in Software Engineering.” IEEE Transactions on Software Engineering 31, no. 9 (2005): 733-753.

[73] Clark, Donald. “Hawthorne Effect.” http://www.nwlink.com/~donclark/hrd/history/hawthorne.html (accessed November 26, 2006).

[74] Draper, Stephen W. “The Hawthorne, Pygmalion, placebo and other expectancy effects: some notes.” http://www.psy.gla.ac.uk/~steve/hawth.html (accessed November 26, 2006).

[75] Kitchenham, Barbara and Shari Lawrence Pfleeger. “Principles of Survey Research: Part 6: Data Analysis.” ACM SIGSOFT Software Engineering Notes 28, no. 2 (2003): 24-27.

[76] Anonymous. “Logical fallacy.” Wikipedia. http://en.wikipedia.org/wiki/Logical_fallacy (accessed August 27, 2006).

[77] MISRA Ltd. “Frequently asked questions about MISRA.” http://www.misra.org.uk/ (accessed August 27, 2006).

[78] “Genetic fallacy.” The Oxford Companion to Philosophy. Oxford University Press, 2005.

[79] Rushby, John. “A Comparison of Bus Architectures for Safety-Critical Embedded Systems.” Technical report NASA/CR-2003-212161. NASA Langley Research Center.