Risk Analysis
TOPIC 1
THE CONCEPT OF RISK
Preview
Introduction
Objectives
Required reading
Nature of risk
Loss and the two dimensions of risk
Subjective nature of risk
Hazard vs risk
Types of engineering risks
People risks
Asset risks
Environmental risks
Liability risks
Business interruption risks
Project risks
Summary
Exercises
References and further reading
Suggested answers
Unit 415 Engineering Risk Management
PREVIEW
INTRODUCTION
This topic examines the concept of risk. The emphasis is on engineering risks associated
with industrial activities, and not on the commercial risks of financing and money
management (which are dealt with in Unit 406 Corporate Finance), the risks associated
with insurance or a detailed legal appreciation of negligence and liability (which is dealt
with in Unit 202 Legal Studies).
We will begin by discussing the nature of risk and explaining how a risk differs from a
hazard. We will then discuss the various types of engineering risks including people risks,
asset risks, environmental risks, liability risks, business continuity risks and project risks.
This will lead us logically to Topic 2, where an overview of the issues related to managing
engineering risks is outlined.
OBJECTIVES
After studying this topic you should be able to:
define the terms 'risk' and 'hazard' and explain how they differ
recognise that there is no such thing as 'zero' risk
describe the different types of engineering risks
identify hazards, potential loss events and types of risks in a given scenario.
REQUIRED READING
There is no additional reading required for this topic.
NATURE OF RISK
Risk is a very broad concept and means different things to different people. Here are three
examples.
a) Risk as perceived by a safety professional
A safety professional may interpret risk in a given industrial facility as the likelihood
that a major fire or explosion, structural failure, machine malfunction or human error
will occur with possible consequent injury or fatality.
b) Risk as perceived by a production manager
A manager in charge of production operations may see risk as the likelihood that a
major business interruption will occur, resulting in loss of production, because of an
accident, equipment breakdown, or industrial dispute.
c) Risk as perceived by a fund manager
A fund manager may interpret risk as fluctuations in the market (a combination of both
positive and negative outcomes), bond rate and interest rate variations, and volatility in
foreign exchange rates that could undermine the value of the investment, or affect
overseas borrowing, against which hedging is necessary.
Whilst perceptions and interpretations of risk may vary, the above examples illustrate three
facets of the nature of risk:
risk is associated with some form of 'loss'
risk involves two different dimensions—severity (of consequence) and likelihood
risk is often subjective.
We will now explore these points in more detail.
LOSS AND THE TWO DIMENSIONS OF RISK
Historically risk has been associated with some form of harmful loss such as:
loss of life or quality of life
loss of physical assets or infrastructure
loss of money
loss of environment.
Regardless of the type of loss, risk involves two separate dimensions:
the severity or magnitude of the adverse consequences of the loss event
the likelihood or chance of the loss event occurring.
It is essential that the technologist or risk manager appreciate both of these dimensions
because this leads to a two-pronged approach to managing risks—namely minimising the
severity or magnitude of a loss event, and minimising or eliminating the likelihood of the
event. The following definition of risk incorporates both the concept of loss and the
two-dimensional nature of risk.
Definition—Risk
Risk is the chance of something happening that will have an impact upon objectives. Risk is measured in terms of a combination of the consequences of an event and their likelihood. (AS/NZS 4360:2004)
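In practice the two dimensions in this definition are often combined through a qualitative risk matrix that maps likelihood and severity categories to an overall rating. The sketch below is a minimal illustration of that common convention; the category names, the multiplicative scoring and the rating thresholds are assumptions chosen for illustration, not values prescribed by AS/NZS 4360.

```python
# Minimal qualitative risk-matrix sketch. The scales and thresholds are
# illustrative assumptions, not values prescribed by AS/NZS 4360:2004.

LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3,
              "likely": 4, "almost certain": 5}
SEVERITY = {"negligible": 1, "minor": 2, "moderate": 3,
            "major": 4, "catastrophic": 5}

def risk_rating(likelihood: str, severity: str) -> str:
    """Combine the two dimensions of risk into a single qualitative rating."""
    score = LIKELIHOOD[likelihood] * SEVERITY[severity]
    if score >= 15:
        return "extreme"
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

# An unlikely but catastrophic event (e.g. a major tanker spill)
# still rates highly on the combined scale.
print(risk_rating("unlikely", "catastrophic"))  # prints "high"
```

Note how the function reflects the two-pronged approach described earlier: reducing either the likelihood or the severity of a loss event lowers the combined rating.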
Let's apply this definition to some engineering examples.
a) Large oil tankers transport crude oil from production fields to the oil refineries in many
parts of the world. If there is an accidental release of oil, there is potential for major
environmental damage as was seen in the Exxon Valdez incident in Alaska, and the
incident involving a Spanish tanker in the Shetlands, off the coast of Scotland. In this
context, the risk in large tankers carrying oil could be characterised in terms of the
value of the oil lost, the damage it causes (severity of consequences), and the likelihood
of such an event occurring in a given time period.
b) Hundreds of people work in underground mines every day across the world.
Underground mining is associated with certain risks: for instance, the potential for
serious injury or fatality by roof fall. The mining company might use the following
criteria to measure such risks:
Likelihood of an accident resulting in serious injury to an employee in a given time
period (e.g. one year).
Likelihood of an accident resulting in the death of an employee in a given time
period (e.g. one year).
c) A mineral processing company has a production target to be met for the year. One of
the important steps in the operations is the crushing of raw material ore to size for
further processing. A large rotating ball mill is used to crush the ore. If a major failure
occurs in this section of the plant, the downstream processing will have to shut down
and considerable loss of production could occur. The following criteria might be used
to measure the risk.
Likelihood of 10% loss of production for one week.
Likelihood of total loss of production for one month.
d) A construction company has a contract to complete a railway overpass that can carry
heavy vehicle traffic. The project is to be completed by an agreed date and a cost
penalty applies for delays. The integrity of the installation is critical as the
consequential costs of a structural failure are very high. The construction company can
adopt a number of risk measures such as the following:
Likelihood of project completion being delayed by a specified period (one or two
months).
Likelihood of budget overrun by 15%.
Likelihood of a structural failure during the operational life of the overpass.
SUBJECTIVE NATURE OF RISK
Risk is an abstract concept; it does not exist the way a thing or a physical attribute such as
size does. We often talk of 'estimating' the risk of a given situation by using information
from the past to predict the future, but in reality there is rarely sufficient, applicable data for
such estimates to be accurate. This means that risk analysis essentially involves estimating
uncertainty using the concept of likelihood. So risk is almost always an assigned quantity
that acquires credibility only by consensus. The consensus is most often professional and
managerial, but community and legal consensus usually underpins these opinions.
The subjective nature of risk raises many questions about the reliability of risk analysis. For
risk analysis to be meaningful, the assessment of a given risk must be considered relative to
that of other risks.
HAZARD VS RISK
The terms 'hazard' and 'risk' are often wrongly used interchangeably. It is essential to
understand the difference between these two terms because both are used in risk
management.
Definition—Hazard
A hazard may be defined as a source of potential harm or a situation with a potential to cause loss. (AS 3931:1998 and AS/NZS 4360:2004)
Some examples of hazards include:
Smoking in bed in domestic dwellings and hotel rooms. This has the potential to cause
a fire and toxic smoke which can result in fatalities. In 1974, this was the cause of a
major hotel fire in Seoul, South Korea, which resulted in 88 fatalities.
Storage of large quantities of LP gas in a depot. A leak and ignition has the potential to
cause a major explosion and loss of life. In 1984, such an explosion in Mexico City
caused more than 450 fatalities and 7 000 injuries.
Storage of toxic gas in a chemical factory. A leak and dispersion downwind could
cause serious injury and possibly death among the exposed population. The leak of
methyl isocyanate gas from the Union Carbide pesticide manufacturing plant in Bhopal,
India, in 1984, resulted in at least 15 000 fatalities and 150 000 injuries.
An object falling from a height; e.g. a tool on a construction site. This can injure or kill
a person below.
Two aircraft on the same runway in an airport. Each plane represents a hazard to the
other. This could result in a collision with multiple fatalities and the loss of both
planes, as happened in the Canary Islands in 1977 when a KLM jet collided with a
PanAm jet in dense fog. There were 583 fatalities and 61 people injured.
Derailment of a commuter train. In 2003 a train travelling at excessive speed derailed at Waterfall on the outskirts of Sydney, resulting in 7 deaths and 42 injuries (out of a total of 49 people on board).
Production and storage of chemicals. On November 13, 2005 there was a series of
explosions at the No.101 Petrochemical Plant in Jilin City, Jilin Province, China. The
explosions killed five people, injured dozens, and caused the evacuation of tens of
thousands of residents. The blasts created an 80 km long toxic slick in the Songhua
River, a tributary of the Amur. The slick passed into the Amur River and into Russia
over subsequent weeks. Water supplies to millions of people in Harbin and other cities
were disrupted.
The essential point to note here is that a hazard is a potential and is not an actuality. In
other words, a hazard may not be realised if it is managed and kept under control.
You will also note that in all the examples of hazard above, there is no mention of
likelihood. This comes under the purview of risk.
The difference between a hazard and a risk can be seen clearly by thinking of a situation
and asking the following questions:
What can cause harm? (Hazard)
What are the adverse consequences if the hazard were realised? (Loss event)
How serious would these consequences be? (Severity, one dimension of risk)
How likely is it that the hazard could be realised? (Likelihood, the second dimension
of risk)
Have sufficient measures been adopted to reduce the likelihood of the hazard being
realised and/or to mitigate the severity of its adverse consequences? (Risk control)
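The five questions above amount to a simple screening record, and writing them down as a structured type makes the hazard/risk distinction concrete. The sketch below is one hypothetical way to encode such a record; the field names and the example values are illustrative assumptions, not drawn from any standard.

```python
from dataclasses import dataclass, field

@dataclass
class HazardScreening:
    """One record answering the five screening questions in the text.
    Field names are illustrative assumptions, not taken from any standard."""
    hazard: str       # What can cause harm?
    loss_event: str   # Adverse consequences if the hazard is realised
    severity: str     # How serious would the consequences be? (one dimension of risk)
    likelihood: str   # How likely is realisation? (the second dimension of risk)
    controls: list = field(default_factory=list)  # Measures reducing likelihood/severity

# Hypothetical example based on the falling-object hazard above.
tool_drop = HazardScreening(
    hazard="Tool falling from height on a construction site",
    loss_event="Struck-by injury to a person below",
    severity="major",
    likelihood="possible",
    controls=["tool lanyards", "exclusion zone below work area", "toe boards"],
)
print(tool_drop.loss_event)  # prints "Struck-by injury to a person below"
```

The hazard field records only the potential source of harm; severity and likelihood carry the two dimensions of risk, so the two concepts stay separate in the data.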
TYPES OF ENGINEERING RISKS
All industrial activities involve risks. While the risks can be kept under control and
minimised, they cannot be totally eliminated without abandoning the activity altogether.
For instance, underground mining or offshore oil and gas production have certain intrinsic
risks due to the nature of the environment in which the activities are carried out. The only
way to achieve zero risk in these activities is not to carry out the activities at all.
There are many different types of risks which reflect various facets of an organisation's
operations. It is important to identify which risk types are applicable before undertaking a
risk analysis.
The main types of engineering risks are risks affecting:
people
assets
the environment
liability
business continuity
projects.
This is not an all-encompassing list and could be extended to include things like reputation,
competitive edge and information.
Table 1.1 provides an overview of each of these types of engineering risks. A discussion of
each of these follows. It should be noted that each risk type interlinks and overlaps with
others, and cannot be considered in isolation.
Table 1.1: Overview of engineering risk types

People: injury; fatalities; illness or disease. (Overlaps liability risk.)

Assets: direct losses (damage to buildings or plant; theft and pilferage); indirect losses (drop in property value; drop in share price; drop in product value). (Overlaps business continuity risk.)

Environment: air pollution; water pollution (surface, groundwater); soil contamination; loss of habitat; land and water degradation. (Overlaps liability risk.)

Liability: contract default; omissions; legal; bankruptcy; employee liability. (Overlaps people, environment and project risks.)

Business continuity: failure of equipment; property loss; liability issues; industrial disputes; sudden loss of key employees; supplier failure. (Overlaps people, asset and liability risks.)

Projects: budget blowout; completion time blowout; contract default by third party; political risk; project financing problems; project failure. (Overlaps environment, liability and business continuity risks.)
PEOPLE RISKS
People risks affect employees, contractors, other persons in the workplace (e.g. visitors) and
members of the public. They arise from unsafe environments, unsafe systems of work and
unsafe equipment and/or materials. People risks are generally described in terms of the
following adverse consequences of exposure to hazards:
the so-called 'near miss' i.e. the null outcome
workplace injury
workplace fatality
occupational illness or disease.
Most exposures to hazards result in a near miss and no damage. For example, a person
tripping over a small object may stumble but not actually fall or sustain an injury.
Injury
When a workplace injury occurs from an exposure to a hazard it is usually described in
terms of the type of injury, the extent of the injury, the part of the body affected and the
level of medical intervention required: for example, a minor facial cut requiring first aid or
a serious leg crush injury requiring medical intervention and amputation. Other terms used
may include lost time injury, temporary disability and permanent disability.
The tangible costs to an organisation from workplace injuries are generally reflected in the
premium paid for worker's compensation insurance. This covers the salary for time lost and
medical treatment as well as rehabilitation and related expenses. Note that it has been estimated that the true cost of an injury is at least ten times the compensation cost, once such things as lost production, investigation time, reporting time and the time taken to train a replacement employee are included.
Fatalities
A workplace fatality negatively affects the morale of other employees and generates adverse
publicity for the organisation. If there are multiple fatalities, the ramifications for the
organisation can be devastating.
Example 1.1
In 2004 an explosion at BHP Boodarie Iron in Western Australia killed one worker
and seriously burned three others. The regulatory authorities immediately issued
BHP with a notice requiring it to demonstrate that it could operate the plant safely before being allowed to restart production. Production never restarted. In 2006 BHP commenced demolition of the $2.6 billion plant.
Illness or disease
Illness or disease can result from a number of hazards:
use of chemicals in the workplace and potential for worker exposure
exposure to substances that cause long-term effects such as lead, silica and asbestos
exposure to excessive noise from rotating machinery or construction equipment which
can result in permanent hearing loss
exposure to blood-borne pathogens or micro-organisms that can cause human infection, such as Legionnaires' disease.
In order to determine whether long-term exposure to a substance presents a risk to health,
the actual exposure usually needs to be quantified. Measuring worker exposures is the
domain of the occupational/industrial hygienist.
If an incident impacts on the health and safety of members of the public it can have major
ramifications for the organisation. The reputation of the company can suffer, affecting its
ability to stay in business.
Example 1.2
In 1986, a meltdown in one of the nuclear reactors at Chernobyl in the Ukraine
resulted in high levels of radioactive fallout over a very large area surrounding the
plant. There was an immediate loss of 28 lives due to acute radiation sickness
amongst workers involved in the emergency response. The airborne radioactive
fallout extended to many European countries, contaminating crops, animals and
water supplies. Even reindeer herders in the arctic regions of Scandinavia had their
livelihood threatened by radioactive contamination of lichens on which the animals
graze. Over 4 000 cases of thyroid cancer, mainly in children, have been attributed
to exposure to radioactive iodine following the accident. The plant ceased
operations and there is still an ongoing international effort to make the plant safe for
the future.
Example 1.3
In 2000, there were 101 cases of Legionnaires' disease among individuals who were at or near the new Melbourne Aquarium between 11 and 25 April, making this Australia's largest Legionnaires' disease outbreak. The disease claimed the lives of two
women aged 79 and 83. Two men aged 77 and 83 also died of the disease, but
health authorities could not confirm that their illnesses were associated with a visit to
the aquarium. The outbreak was caused by high levels of legionellae in the
aquarium's cooling towers. The Melbourne Aquarium replaced the water-cooled
air-conditioning system with an air-cooled system after the outbreak.
ASSET RISKS
Most organisations face the risk of loss of assets, although an industry with large sources of
hazardous materials or potentially damaging energy will generally have a higher exposure to
asset risk than an office-based organisation, unless the business of the latter is dealing with
property. Asset losses can be divided into two major sub-categories: direct losses and
indirect losses.
Direct losses
Direct losses of assets mainly take the form of:
damage to buildings or plant
theft and pilferage.
Damage to buildings or plant mainly arises from either industrial accidents such as fires in
warehouses and explosions in industrial plants, or from natural disasters such as storms,
floods and earthquakes. Theft and pilferage mainly arise from a breach of physical security
or a breach of 'intellectual security', i.e. industrial espionage.
For many engineering organisations, direct losses arising from damage to buildings or plant
tend to be greater than direct losses arising from theft and pilferage. However, if a breach
of security results in sabotage or arson, the magnitude of loss could be much higher.
Equally, the cost of breach of intellectual security in an information technology (IT)
company can be very high.
Example 1.4
In 2005 at a BP refinery in Texas City, a series of explosions occurred during the restarting of a hydrocarbon isomerisation unit. Fifteen workers were killed and
about 170 others were injured. The explosions occurred when a distillation tower
flooded with hydrocarbons and was over-pressurised, causing a geyser-like release
from the vent stack.
Indirect losses
Indirect losses generally occur as a secondary effect and can be associated with a
non-property type of risk. The causes of the indirect losses may be internal or external to
the organisation. Indirect losses mainly take the form of:
drop in property value
drop in share price
drop in product value.
A drop in property value may occur for a number of reasons. Rapid changes in technology
can cause an organisation's assets in plant and equipment to become worthless if the
technology is completely superseded.
Example 1.5
In the 1950s and early 1960s, Gestetner invested significant capital in the manufacture and distribution of stencil duplicating machines. Text typed on special stencil paper could be passed through a printing process to make copies of the typed manuscript. The advent of photocopiers made
this technology obsolete almost immediately.
The value of land purchased for development will drop significantly if it is subsequently discovered that the soil, and possibly the groundwater beneath it, has been contaminated with chemicals during previous use. Land and physical assets can also be
rendered worthless by industrial accidents.
Example 1.6
Following the toxic gas leak from the Union Carbide pesticide manufacturing plant
in Bhopal, India, the plant was forcibly closed. Physical assets such as plant and
equipment had to be written off.
A drop in a company's share price most commonly occurs as a consequence of poor profit
performance, but it may also occur as a consequence of an industrial accident that damages
a company's reputation and results in subsequent legal and financial liabilities.
Example 1.7
Following the chemical accident at Bhopal, the share price of Union Carbide fell on
the New York stock exchange, mainly from speculation on the amount of liability
compensation that the company might have to pay. The share price recovery took
quite a few years.
A drop in the market value of an organisation's products can occur for many reasons. For
example:
If an automobile manufacturer or food manufacturer is seen to be regularly issuing
recall notices on defective products, consumer confidence in the company's products
will fall, along with the value of the products.
New products of next generation technology will cause the value of old products to fall.
Increased competition in the marketplace may permanently lower the sales price and
thus the value of products.
Food contamination scares, whether real, imagined or hoax, can lead to a loss of
consumer confidence and hence lost sales.
Example 1.8
The Australian beef industry lost a huge share of its main market when Japanese consumers turned away from beef due to the emergence of 'mad cow' disease (bovine spongiform encephalopathy) in a number of Japanese cattle.
ENVIRONMENTAL RISKS
Since the 1980s, organisations such as Greenpeace and Friends of the Earth have been
successful in raising public awareness of environmental risks and have encouraged many
companies to make environmental issues part of the decision-making and risk management
processes. In most developed countries today there are laws to protect the environment
from industrial processes and industrial accidents.
Risks to the environment mainly arise from land and water degradation, loss of habitat, air
pollution, water pollution and soil contamination. The longer-term consequences of these
types of risks present a major challenge for organisations. Unlike loss of assets, which can
be quickly replaced, damage to the environment almost invariably takes a long time to
repair. This means that clean-up, restoration and monitoring costs can be extremely high.
Example 1.9
In 2000, a breach in the tailings dam of a gold mine in Romania, operated by the
Romanian Government and the Esmeralda Company, released some 100 000 m3 of
cyanide-rich tailings waste into the Somes River. The cyanide found its way into the
Danube River, affecting aquatic life in Romania, Hungary and Yugoslavia.
Example 1.10
Leaks from underground storage tanks for petroleum products and chemicals can
result in soil contamination. In some cases, there has been migration of polluted
rainwater to the groundwater aquifer.
Example 1.11
In 2006 in Indonesia a mishap at an exploratory oil well resulted in sulphurous hot
mud inundating a large area with over one million cubic metres of mud. Over 8 000
people were displaced and there was major disruption to business and commerce.
The Indonesian government declared that the company responsible would have to
pay all costs associated with the environmental and economic damage.
LIABILITY RISKS
Some level of overlap exists between liability risks, people risks and environmental risks.
For example, environmental damage or an injury to a member of the public from an incident
carries a liability for the organisation under statute law (acts and regulations) and/or
common law.
Contract default
In many engineering enterprises, part or all of the project work is contracted to external firms.
Whilst the contractor carries a liability risk for contract default on requirements such as
deadlines or quality of deliverables, the organisation also carries a liability risk because
contract default can cause things like increased interest payment on borrowing, depreciation
on non-performing assets, or loss of market share due to delays, all of which may not be
recovered through liability claims alone.
With more and more public and private organisations outsourcing goods and services, the
risk of contract default is becoming a serious issue.
Omissions
Omissions on the part of a goods or services provider carry liability risks. The omission
could be intentional or through negligence. If an organisation designs a bridge, and there
are design faults in the project resulting in a failure of the structure, a whole range of
liabilities arises. These include financial liability in rebuilding to a correct design,
compensation for the injured, and legal costs and possible penalties or damages associated
with criminal and/or negligence charges.
Legal
Legal liability may arise from the following:
common law claims on the company by a third party
industrial accident that requires coronial inquiry or inquest
prosecution by a government agency for breach of Occupational Health and Safety
(OHS) legislation
product defects that threaten the safety of the consumer (for example defective toys that
could affect child safety)
third-party damages arising from a firm's industrial activity; these may arise from
injury, environmental impairment, loss of amenities etc.
The major consequences of legal liability are legal costs, cost of complying with injunctions
and court orders for specific performance, money for settlements, fines and compensatory
damages. Legal costs include not only the cost of legal representation but also the cost of
the time of company staff in assisting legal counsel to prepare the case. The latter usually
far exceeds the former.
Bankruptcy
An organisation's inability to meet its liabilities would place it under receivership, and
ultimately result in bankruptcy. For the purposes of this unit we are not concerned with
bankruptcy arising from an organisation's poor commercial performance, but rather with
bankruptcy arising from the cost of liability risks.
Employee liability
In certain cases employees, as individuals, can be held liable. For example, there have been
a number of instances where managers or supervisors have been prosecuted for breach of
OHS law. Senior managers are increasingly being targeted by law enforcement agencies.
Example 1.12
The Enschede fireworks disaster in 2000 in the Netherlands was caused by a fire. In
the series of explosions that followed, 22 people were killed, 947 were injured and
about 2 000 homes were destroyed. The two managers of the company were later
sentenced to 15 months imprisonment for violation of environmental safety
regulations and dealing in illegal fireworks.
BUSINESS CONTINUITY RISKS
There is considerable overlap between business continuity risk and the other risks
previously discussed, as each of those could bring about an interruption to business.
Business continuity risks include:
Failure of critical equipment. If the facility does not hold the spare parts needed to carry out repairs, or if the entire equipment item needs to be replaced, there may be considerable lead time for delivery and installation.
Property loss caused by fires or explosions. Significant delays are likely to occur
before production can recommence due to investigations, insurance loss adjustment and
claims processing, as well as the lead time for replacing equipment.
Liability issues causing a temporary halt in operations. If a product defect is identified,
production may have to be suspended until the cause is identified and corrected.
Liability issues causing the permanent closure of the business. This is part of the
bankruptcy risk.
Industrial disputes.
In smaller organisations, the sudden loss of a few key employees (e.g. by resignation).
This may seriously upset operations until suitable replacements can be found. In large
organisations this risk is often less severe because staff may be able to be redeployed
from other areas of the organisation.
Failure of a supplier, particularly a sole supplier.
Example 1.13
In 1998, an explosion at the Esso Longford gas plant killed two workers and left the whole of Victoria without gas for over two weeks. Parts of the facility
remained closed for some time due to investigations and the time taken to repair and
replace the plant. It also resulted in major interruptions for restaurants and other
businesses across Victoria. Subsequently, Esso was convicted of breaches of OHS
legislation and fined $2 million. The company also faced a huge class action under
common law by affected businesses which resulted in Esso having to pay damages of
$32.5 million. Loss to industry during the crisis was estimated at $1.3 billion.
Example 1.14
In 1998 after four power cable failures, Mercury Energy Limited, the major
distributor of electrical power to the City of Auckland in New Zealand, announced it
could no longer supply power to the central business district (CBD) of Auckland.
The disruption to supply and consequently to business in the CBD lasted several
months.
PROJECT RISKS
At the outset of a project it is essential to clearly understand and plan for the associated
risks. Some of the risks discussed above would be present as part of overall project risk.
Key project risks include:
Project budget blowout. If the project is in its early stages, this may cause the project
to be abandoned as the projected return on investment may be lowered significantly.
Project completion time blowout. This can result in financial loss due to interest
payment on non-performing capital, and any cost penalties for delivery delays in the
contract.
Contract default by third-party services. While this can be partially covered by liability
clauses in the contract, it would cause a blowout in both the cost and completion time
of the project.
Political risk. External interest groups with political influence may raise environmental
or other concerns that cause delays, expensive design modifications or the
abandonment of a project that is otherwise economically sound.
Project financing problems. If sources of finance collapse or fail to materialise, the
delay or abandonment of the project is inevitable.
Example 1.15
In the late 1980s Associated Pulp and Paper Mill (APPM) planned to build a pulp
plant at Wesley Vale in Tasmania. The Greens political movement generated
significant public controversy over effluent discharges to the ocean, especially
organo-chlorines from a chlorine bleach process, and after lengthy debates the
company abandoned its plan for the paper pulp plant.
Example 1.16
In 1986 Bayer Australia proposed to build an agricultural and veterinary chemicals
formulation facility on the Kurnell Peninsula in Sydney. Local residents expressed
considerable concern about the concentration of chemical, oil and gas facilities on
the peninsula, and the potential for toxic chemicals from the Bayer facility to reach
Botany Bay and threaten the local oyster industry. The environmental controls
subsequently imposed on the company were so severe that it decided the project was
not economically viable and abandoned the Kurnell site for the project.
ACTIVITY 1.1
List the major activities of your organisation and identify the hazards, potential loss
events and types of risks associated with each activity. Summarise your findings in a
table such as the one shown below.
Activity | Hazards | Potential loss events | Risk types
Retain your list for Activity 2.1 in the next topic.
SUMMARY
In this topic we discussed the nature of risk and noted three critical points:
risk is associated with some form of 'loss'
risk involves two different dimensions—severity and likelihood
risk is often subjective.
We then discussed the difference between a hazard (a source of potential harm) and a risk
(the chance of something happening that will have an impact upon objectives). We
concluded the topic with an examination of the most common types of risks that can affect
engineering organisations, including some real-life examples.

EXERCISES
1.1 Hazard, loss event and risk identification
Identify the hazards, potential loss events and types of risks arising from the following
activities. State any assumptions you make.
a) Storage of chlorine gas for public swimming pool disinfection.
b) Delivery of LP gas from bulk tanker to suburban automotive retail outlet.
c) Handling heavy items by crane for construction of a high-rise building.
d) Movement of large oil tankers carrying crude oil supply to a marine terminal.
e) Outsourcing equipment testing and maintenance.
f) Operating a suburban bus transport company.
g) Development of a cross-country high-pressure natural gas pipeline.
h) Provision of catering services to an airline.
i) Project management of bridge construction to a specified load bearing capacity.
j) Transportation of petrol using a bulk road tanker with a leaking valve.
k) Road transport of explosives from armament factory to army magazines.
l) Project management for the construction of an Olympic Aquatic Centre.
m) Development of combat software for computer control in a warship.
1.2 Case study—Tanker spill
A bulk road tanker carrying petrol was travelling along a road that had been partly closed
for road works. Due to inadequate lighting, signposting and safeguards, the driver of the
road tanker did not notice the road closure in time. He manoeuvred too quickly and his
truck overturned, rupturing the tank. The spilled petrol contaminated the soil around the
roadway.
The soil was porous and some of the contaminants leached into groundwater used as the
sole source of drinking water for the surrounding community. As a result, the local
residents could not use the groundwater and feared adverse health effects, loss of amenities
and a drop in property values.
The tanker was owned and operated by separate businesses with separate insurers. There
were delays in sorting out who was to manage and pay for the clean-up costs.
a) Identify all the parties involved in this case.
b) Categorise the types of risks faced by each of the parties using the risk types described
in this topic (people, asset, environment, liability, business interruption and project).
c) Describe the adverse consequences to each party from each type of risk.
REFERENCES AND FURTHER READING
Bahr, Nicholas J. (1997) System Safety Engineering and Risk Assessment: A Practical
Approach, Taylor & Francis, Washington D.C.
Bernstein, Peter L. (1996) Against the Gods: The Remarkable Story of Risk, John Wiley &
Sons, New York.
Chapman, Chris & Ward, Stephen (2003) Project Risk Management: Processes,
Techniques and Insights, John Wiley & Sons, Chichester.
Gigerenzer, Gerd (2003) Reckoning with Risk: Learning to Live with Uncertainty, Penguin
Press, London.
Perrow, Charles (1999) Normal Accidents: Living with High Risk Technologies, Princeton
University Press, Princeton, New Jersey.
Smith, David J. & Simpson, Kenneth G.L. (2001) Functional Safety: A Straightforward
Guide to IEC 61508 and Related Guidance, Butterworth-Heinemann, Oxford.
Standards Australia (1998) AS/NZS 3931:1998 Risk Analysis of Technological Systems—
Application Guide, Standards Australia/Standards New Zealand, Sydney.
Standards Australia (2004) AS/NZS 4360:2004 Risk Management, Standards Australia/
Standards New Zealand, Sydney.
Standards Australia (2004) HB 436:2004 Risk Management Guidelines: Companion to
AS/NZS 4360:2004, Standards Australia/Standards New Zealand, Sydney.
Storey, Neil (1996) Safety-Critical Computer Systems, Addison-Wesley, Reading,
Massachusetts.
SUGGESTED ANSWERS
EXERCISES
1.1 Hazard, loss event and risk identification
Note: There is no such thing as a single complete answer for this exercise. Your responses
will depend on the assumptions you make about each situation.
a) Hazards: Storage of toxic material. Potential loss events: Leak of chlorine gas causing injury or health problems for staff and pool users. Risk types: People, environment, liability.
b) Hazards: Transferring flammable material. Potential loss events: Leak and ignition of gas; tanker collision with bulk tank; overfill of bulk tank and release. Risk types: People, assets, environment, liability.
c) Hazards: Shifting heavy load. Potential loss events: Dropped load causing injury/fatality; swinging load causing property damage and injury. Risk types: People, assets, liability.
d) Hazards: Transportation of toxic and flammable material. Potential loss events: Oil spill and ignition. Risk types: People, assets, environment, liability, business continuity.
e) Hazards: Reliance on supplier integrity. Potential loss events: Contractor incompetence or failure to deliver; loss of internal knowledge and skills. Risk types: Liability, business continuity.
f) Hazards: Driving buses, especially in traffic. Potential loss events: Schedule delay causing inconvenience to users; road accident causing injury/fatality and asset damage/loss. Risk types: People, assets, environment, liability, business continuity.
g) Hazards: Flammability of gas under high pressure; project contract requirements; community perception of project. Potential loss events: Community opposition to pipeline; failure to meet contract requirements; pipeline failure, gas release, ignition and major fire; extended interruption to gas supply. Risk types: People, assets, environment, liability, business continuity, project.
h) Hazards: Contract requirements; scheduling; food storage and handling. Potential loss events: Inability to deliver food on time and to required quality; food contamination due to poor storage or handling; passenger illness; airline delays or strikes; excessive food wastage. Risk types: Environment, liability, business continuity.
i) Hazards: Contract requirements; financing; site suitability. Potential loss events: Cost/time blowouts; quality problems; OHS incidents; problems with financing sources; collapse of bridge during building or after completion causing injury/fatality and property damage; extended traffic interruption. Risk types: People, assets, environment, liability, business continuity, project.
j) Hazards: Transferring flammable material in an unsafe vehicle. Potential loss events: Leak, ignition, fire; tanker explosion through escalation; injury, fatality. Risk types: People, assets, environment, liability.
k) Hazards: Transportation of explosives. Potential loss events: Road accident; explosion followed by fire; injury, fatality, property damage. Risk types: People, assets, environment, liability.
l) Hazards: Contract requirements; financing; site suitability. Potential loss events: Cost/time blowouts; quality problems; defective construction materials; site contamination; OHS incidents; problems with financing sources; collapse of Centre during building or after completion causing injury/fatality and property damage. Risk types: People, assets, environment, liability, business continuity, project.
m) Hazards: Software operability; security of intellectual property. Potential loss events: Software is defective; system malfunctions or fails to perform; software falls into wrong hands. Risk types: Assets, liability, business continuity.
1.2 Case study—Tanker spill
a) Parties involved, b) types of risks and c) adverse consequences:
Government department responsible for roads. Risks: Legal liability (negligence). Consequences: Sued for poorly laid out road works causing the accident.
Road maintenance contractor. Risks: Legal liability (negligence). Consequences: Sued for poorly laid out road works causing the accident.
Oil company (product owner). Risks: Legal liability due to environmental impairment; asset loss. Consequences: Fined for slow, inadequate advice during the emergency; loss of oil.
Petrol transport contractor. Risks: Legal liability due to environmental impairment; asset loss. Consequences: Fined for slow, inadequate advice during the emergency; loss of tanker.
Local public. Risks: Loss of amenities; loss of quality of life. Consequences: Contaminated soil.
Water supply authority. Risks: Legal liability (water supply contract). Consequences: Contaminated water supply.
Environment protection authority. Risks: Reputation. Consequences: Criticised for inadequate planning and monitoring.
Local government authority. Risks: Reputation. Consequences: Criticised for inadequate planning and monitoring.
TOPIC 2
RISK MANAGEMENT OVERVIEW
Preview 2.1 Introduction 2.1 Objectives 2.1 Required reading 2.1 Approaches to managing risk 2.1 One-dimensional severity control approach 2.2 Two-dimensional severity and likelihood control approach 2.2 Three-dimensional severity, likelihood and cost control approach 2.2 Reasons for managing risk 2.3 Legislative and regulatory requirements 2.3 Common law duty of care 2.5 Commercial reasons 2.6 Evaluating alternative options 2.6 Risk management framework 2.7 Other risk management models 2.10 Risk acceptability 2.11 The ALARP principle 2.11 Rational and emotive issues in risk management 2.12 Summary 2.13 Exercise 2.13 References and further reading 2.15 Readings Suggested answers
PREVIEW
INTRODUCTION
In the previous topic we distinguished between 'hazard' and 'risk' and provided definitions
of both appropriate to specific situations. We now move on to providing an overview of the
general framework in which risk management takes place. We will begin with a discussion
of different approaches to risk management and the reasons why organisations are
increasingly employing a proactive systems approach. We will then examine a framework
for risk management before concluding the topic with a brief discussion of risk acceptability
principles and issues.
OBJECTIVES
After studying this topic you should be able to:
discuss different approaches to managing risk
outline the legal and commercial reasons that organisations use a systematic approach
to managing risk
outline the steps involved in a typical risk management framework
explain the ALARP principle
develop an awareness of the significance and validity of different perceptions of risk
acceptability.
REQUIRED READING
Reading 2.1 'Reducing risks, protecting people'
Reading 2.2 'On the ALARP approach to risk management'
Reading 2.3 'Getting to maybe: some communications aspects of siting hazardous
waste facilities'
APPROACHES TO MANAGING RISK
Traditionally, a reactive approach was used to manage risk. For each loss event that
occurred, management reacted by developing countermeasures to prevent a recurrence. The
action was after the event. No attempt was made to systematically identify hazards and
estimate the risks associated with them before an event.
Over time, business and community attitudes have changed and the reactive approach has
ceased to be acceptable. Most large organisations have had to change their approach in
order to survive. However, the reactive approach remains common in small business.
The traditional approach has been replaced by the proactive systems approach which is
undertaken before any loss event has occurred. The objective is to prevent the occurrence
of unwanted events by all reasonably practicable means.
There are three types of proactive systems approaches to managing risk:
the one-dimensional severity control approach
the two-dimensional severity and likelihood control approach
the three-dimensional severity, likelihood and cost control approach.
ONE-DIMENSIONAL SEVERITY CONTROL APPROACH
The one-dimensional systems approach to managing risk attempts to identify the hazards in
a given scenario and reduce the severity of their adverse consequences if a loss event
occurs. There the effort ends. No attempt is made to estimate the likelihood of a loss event
occurring and reduce this likelihood if it is unacceptably high.
The advantage of this approach is that it is simple; it mitigates the severity of the
consequences of loss events. The disadvantages are that it does little to encourage risk
prevention or assist organisations in determining how to best use their limited risk
management resources. An example is given below to illustrate this point.
Example 2.1
A printing press uses a flammable solvent-based ink for printing. The solvent is
stored in a tank and pumped to the mixing vessel for dilution of the ink to the
required consistency. Solvent vapour is extracted by a ventilation fan from the
printing room.
The main hazard associated with the operation is the flammable solvent. If a
one-dimensional systems approach is applied, risk management will focus on
reducing the severity of the adverse consequences if the solvent catches fire, for
example by installing a sprinkler system. However, nothing will be done to reduce
the likelihood of a fire occurring, for example by better housekeeping, control of
ignition sources, control of spills, or regular maintenance of the ventilation system.
Emergency response measures that are aimed at mitigating the consequences of an
unplanned loss event are typical of the one-dimensional approach.
TWO-DIMENSIONAL SEVERITY AND LIKELIHOOD CONTROL APPROACH
The two-dimensional systems approach to managing risk attempts to identify the hazards in
a given scenario and estimate both the severity of the adverse consequences if a loss event
occurs and the likelihood of such an event occurring. Acceptability criteria are then applied
to determine the appropriate risk control measures that should be taken. However, the cost
of these control measures is not considered.
THREE-DIMENSIONAL SEVERITY, LIKELIHOOD AND COST CONTROL APPROACH
The three-dimensional systems approach to managing risk is a logical extension of the
two-dimensional approach. It includes the two dimensions of severity and likelihood, and
adds a third dimension, risk control costs.
All risk control measures involve a cost penalty, but the return on this investment is
loss-free operation of the business. However, a curve of risk versus cost of risk control
is asymptotic: beyond a certain point, each additional dollar of expenditure yields
diminishing returns.
The three-dimensional approach to managing risks involves conducting a cost-benefit
analysis of different control measures for a given risk and selecting the optimum option
based on the best return for the 'risk' dollar. This enables organisations to use their risk
dollars to control the maximum number of risks to the best effect rather than needlessly
using them to control only one or two risks. This is an important element of risk
management decision-making and will be discussed further in Topic 6.
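As one way to sketch the cost-benefit ranking described above, each candidate control measure can be scored by the annual risk-cost reduction it buys per dollar spent. All names and dollar figures below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Sketch of a three-dimensional comparison: each candidate control measure
# is scored by the annualised risk-cost reduction per dollar of expenditure.
# All figures are hypothetical, for illustration only.

controls = [
    # (name, annual risk cost before, annual risk cost after, annualised control cost)
    ("Sprinkler system",        50_000, 20_000, 6_000),
    ("Ignition source control", 50_000, 35_000, 1_000),
    ("Ventilation maintenance", 50_000, 40_000, 2_500),
]

def benefit_per_dollar(before, after, cost):
    """Annual risk-cost reduction bought per dollar of control expenditure."""
    return (before - after) / cost

# Best return for the 'risk' dollar first.
ranked = sorted(controls,
                key=lambda c: benefit_per_dollar(c[1], c[2], c[3]),
                reverse=True)
for name, before, after, cost in ranked:
    print(f"{name}: {benefit_per_dollar(before, after, cost):.1f} risk $ saved per $")
```

With these assumed figures, the cheap ignition-source controls rank first even though the sprinkler system removes more absolute risk, which is exactly the kind of insight the third (cost) dimension adds.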
REASONS FOR MANAGING RISK
In the previous section we discussed how organisations have moved to a systems approach
to managing risk in order to survive in a changing world. Let's now examine some of the
reasons why this shift has occurred.
LEGISLATIVE AND REGULATORY REQUIREMENTS
In all industrialised countries and most developing countries there is some form of
legislation that governs various aspects of risks from industrial operations and requires
organisations to protect the health and safety of employees, the public and the environment.
Failure to comply with such legislation can lead to the prosecution of the company and, in
some cases, its directors and employees.
In Australia, legislative and regulatory requirements vary from State to State and may be
broadly divided into three groups.
Group 1: Protection of people in workplaces
Occupational health and safety Acts and Regulations
Exposure levels for airborne contaminants in the workplace
Risk management of major hazard facilities
Storage and handling of dangerous goods and hazardous substances
Fire protection and building regulations
Acts and Regulations regarding electrical safety, gas safety and radiation safety.
Group 2: Protection of the public and public health
Planning/zoning regulations
Design codes and standards
Siting of hazardous industries in relation to land use safety
'Safety case' requirements for major hazard facility operators addressing public safety
issues
Health risk regulations for contaminated land and contaminants in surface/groundwater
Drinking water quality standards
Surface water quality standards
Regulations covering cooling towers, public amusement equipment and fireworks.
Group 3: Protection of the environment
Air, water and noise control regulations
Environmentally hazardous chemicals control
Contaminated land management
Waste generation and disposal
Various other pollution control regulations.
The number of regulations is vast and it is beyond the scope of this unit to provide specific
references for every country or state. Some of the more important examples are given
below and selected websites are provided at the end of the topic.
In Australia, the National Occupational Health and Safety Commission has published a
National Standard and Code of Practice for the Control of Major Hazard Facilities
(NOHSC Australia, 1996), but it is not mandatory. Many jurisdictions have adopted, or are
in the process of adopting, safety case legislation for major hazards and specific areas such
as gas, rail and offshore petroleum.
The European Commission has developed legislation for the EU Community that includes
the environment, consumer and health protection. Member countries have developed
regulations to address these issues. The main framework for control of major hazards is the
Seveso II Directive (96/82/EC) of December 1996.
In the United Kingdom, major hazards are controlled by the COMAH (Control of Major
Accident Hazards) Regulations (1999) administered by the UK Health and Safety
Executive. This is in response to the Seveso II Directive of the EC. The Health and Safety
at Work Act and its associated Statutory Instruments cover a very wide range of activities.
Major hazard regulations require facility operators to identify the hazards posed by their
facility, the potential effects of these hazards, both on-site and off-site, including the
severity and likely duration, and the control measures the operator has in place to prevent
major incidents and limit their consequence to persons and environment. They also require
operators to prepare on-site emergency plans and to collaborate with the local authorities in
the preparation of off-site emergency plans.
In the USA, there is no federal equivalent of the COMAH Regulations in the UK and the
control of major hazard facilities is dealt with by individual state regulations. The
Occupational Safety and Health Act of 1970 (with amendments), and associated regulations
and standards govern health and safety at work, and are administered by the Occupational
Safety and Health Administration (OSHA). Public health and land uses are protected by a
set of environmental acts and regulations administered by the US Environment Protection
Agency (US EPA), of which the following are relevant:
Emergency Planning and Community Right-to-know Act
Toxic Substances Control Act
Resource Conservation and Recovery Act (Hazardous Waste Regulation).
ACTIVITY 2.1
Using the list of organisational activities that you prepared in Activity 1.1, list the
safety and environmental acts and regulations applicable to your organisation's
operations. Focus on the specific site you are involved in, or if you work at
corporate level, choose one of the operating sites. Wherever possible, identify the
specific legislation applicable.
Set up this list as a file to which you can add information as you proceed through this
unit, and check your list with relevant staff in your organisation (e.g. legal staff,
safety staff, colleagues). Producing a complete and accurate list is a difficult task (as
is keeping it up-to-date), but one well worth starting, even if you are not able to
complete it on your own.
COMMON LAW DUTY OF CARE
In those countries with an English common law heritage (especially the UK, US, Canada
and Australia), in addition to complying with legislation there is an all-embracing common
law 'duty of care'. Common law actions arise when one party who has suffered harm sues
another party whom they believe caused the harm in order to recover damages. In the event
of an accidental event, an organisation must be able to demonstrate that all reasonable care
has been taken in identifying the hazards and risks associated with the facility and its
operations, and that, on the balance of probability, adequate hazard control measures have
been put in place. This principle is illustrated in Figure 2.1.
Figure 2.1: How would a reasonable defendant or utility respond to the foreseeable risk?
Source: Sappideen & Stillman, 1995: 22.
Where the duty of care has not been visibly demonstrated, a company may be found
negligent, and therefore liable for damages, should an incident occur from its commercial
activities resulting in serious harm to people, property, business or the environment.
The overall situation is perhaps best summarised by Chief Justice Gibbs of the High Court
of Australia:
Where it is possible to guard against a foreseeable risk which, though perhaps not great, nevertheless cannot be called remote or fanciful, by adopting a means which involves little difficulty or expense, the failure to adopt such means will in general be negligent.
Turner v. The State of South Australia (1982) (High Court of Australia before Gibbs CJ, Murphy, Brennan, Deane and Dawson JJ).
In later topics we will see how duty of care is reflected in managing safety and
environmental risks in particular.
[Figure 2.1 balances the magnitude of the risk (probability of occurrence, severity of harm) against the expense, difficulty and inconvenience of precautions, and the utility of the conduct.]
COMMERCIAL REASONS
There are strong commercial reasons for maximising business continuity and minimising
equipment and property damage. A systematic risk assessment not only identifies the
hazards, but also helps to rank the allocation of resources in a cost- and time-effective
manner. Such an approach also assists in minimising the organisation's insurance costs.
Example 2.2
A gas producer has been contracted to supply natural gas to a power generation
utility. The contract is to supply gas to meet the consumer demand for at least 98%
of the time. This is an onerous task, as downtime in gas supply can occur from time
to time due to breakdown of gas well control equipment or gas processing plant
equipment.
Minimising downtime requires an assessment of the reliability of the gas supply
system design, the level of redundancies built into the design to cope with
breakdowns, the spare parts management, and maintenance planning. Without a
systematic reliability study, it would be difficult to develop a design to meet the
contractual obligations.
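The redundancy question in this example can be sketched with a simple parallel-availability calculation. The availability figure and the independence assumption below are hypothetical, used only to show why redundancy helps meet a 98% supply target.

```python
# Rough availability sketch for a gas supply train (hypothetical figures).
# A single processing train at 97% availability misses a 98% contract target;
# a redundant parallel train lifts availability because supply fails only when
# both trains are down at once (assuming independent failures).

single_train_availability = 0.97  # assumed, for illustration

def parallel_availability(a, n):
    """Availability of n independent parallel trains, any one sufficient."""
    return 1 - (1 - a) ** n

one = parallel_availability(single_train_availability, 1)
two = parallel_availability(single_train_availability, 2)
print(f"1 train: {one:.4f}, 2 trains: {two:.4f}")
```

The two-train figure comes at the cost of duplicated capital, which is why the reliability study, spares policy and maintenance philosophy have to be traded off together.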
The study would also provide input into the optimum level and type of redundancy
required and the type of maintenance philosophy that should be adopted. These
decisions would have a significant impact on the overall capital cost of the project.

EVALUATING ALTERNATIVE OPTIONS
In project feasibility studies, several alternative options are often initially considered. For
facility-related engineering projects, the options may be related to the site for the facility,
the process technology to be adopted, logistics of raw material supply and product
distribution, availability of skill base, etc. The final shortlist of options is generally based
on location and commercial considerations.
An assessment of the risks associated with each of the options provides an additional
dimension of input to the decision-making process. The options initially shortlisted may
have to be reconsidered on the basis of risk.
Example 2.3
A producer of animal health and veterinary chemicals decided to construct a new
formulation plant near a major metropolitan area. Three possible locations were
selected. All the locations were suitable in terms of area of land, land prices and
proximity to markets.
Before making a final decision on purchasing a specific piece of land, the company
decided to undertake a preliminary risk assessment study of the impact of the
proposed plant on the surrounding areas. For near identical operations, each of the
sites revealed quite different aspects of risk related to environmental issues
(proximity to sensitive waterways) and transportation issues (movement of chemicals
along highly populated thoroughfares). It also became apparent that the costs of
mitigating the risks in the three sites were so different that, when these costs were
included in the cost–benefit analysis of the project, there was only one clear winner.
If a risk management survey had not been undertaken, and a piece of land had been
purchased without this additional dimension allowed for, the project might have
become financially non-viable and it could have been difficult to obtain the
necessary planning and environmental approvals from statutory authorities.
In infrastructure projects there may also be a number of options. For example, in order to
eliminate a railway level crossing, consideration may be given to building a rail bridge over
an existing road, building a rail tunnel under an existing road, building a road bridge over
an existing rail or building a road tunnel under an existing rail. Each of these solutions may
result in differing levels of risk for trains, vehicles and pedestrians.
RISK MANAGEMENT FRAMEWORK
The following risk management framework is based on the standard hazard-based risk
management models available in the literature. The framework represents a
three-dimensional systems approach to risk management and consists of seven broad steps
that underpin the remaining topics in this study guide.
Figure 2.2: Risk management framework
[Flowchart summary: Step 1: Define system and risk management objectives -> Step 2: Identify hazards and potential loss events (Topic 3) -> Step 3: Estimate severity of consequences (Topic 4) -> Step 4: Estimate likelihood of occurrence (Topic 5) -> Step 5: Measure and rank risk (Topic 5) -> Step 6: Make decisions (Topic 6) -> Is risk at or below ALARP level? (Topic 6). If No, develop additional prevention/mitigation measures and reassess; if Yes, proceed to Step 7: Manage residual risk (Topics 7-10), supported by policies, a safety management system, an environmental management system, an emergency management plan, training, auditing, a quality management system, perceptions and communication.]
Step 1: Define system and risk management objectives
What is the system within which we want to manage the risks, and what are its boundaries?
What are our risk management objectives? The system may be a whole organisation, a
single department or an individual project (e.g. construction of a new bridge). The risk
management objectives may take many forms, depending on the various aspects of risk. For
example, a design safety objective can be that a bridge should be capable of sustaining
existing plus projected increases in load without failure for a period of 100 years.
Step 2: Identify hazards and potential loss events
This step is sometimes referred to as hazard identification and is the most critical step in
the entire risk management process. If a hazard is not identified at this stage it is unlikely
to be addressed at all.
A number of techniques are available for identifying hazards or potential loss events. These
include:
Past experience
Checklist reviews
Hazard and operability study (HazOp)
Failure modes and effects analysis (FMEA)
Failure modes, effects and criticality analysis (FMECA)
Preliminary hazard or safety analysis
Scenario-based hazard identification.
No single technique is capable of identifying the hazards for all situations. Depending on
the system, a combination of two or more techniques should be used. We will discuss each
of the above techniques in detail in Topic 3.
Step 3: Estimate severity of consequences
Once the various hazards that could result in loss events are identified, the next step is to
estimate the severity of their adverse consequences. This could be the severity of an injury,
the cost of compensation and working days lost, the level and cost of asset loss or business
interruption, the extent of environmental damage and consequent clean-up costs, the level of
damage to reputation, the cost and flow-on effects of contract default, possible bankruptcy,
and so on.
For engineering risks, sophisticated mathematical techniques are available for estimating
severity. We will investigate a range of these techniques in Topic 4.
Step 4: Estimate likelihood of occurrence
This step is the principal contributor to uncertainty and subjectivity in the risk assessment
process, because there is often inadequate data for statistical validity.
The best likelihood estimates are based on statistically reliable historical data. However,
historical estimates can only be used for future predictions if the circumstances under which
the historical events occurred have not changed, i.e. design, operations and maintenance
philosophy, management systems, etc.
For major consequence events (e.g. major fire or explosion, structural collapse, dam failure)
where historical data is typically scarce, the likelihood of occurrence may be estimated by
logical combinations of a number of contributory causes for which more reliable statistical
data is available. In the comparatively new information technology industry, the estimation
of likelihood is even more difficult as a significant component is software failure/human
error.
If the likelihood of occurrence is quantified, it is desirable to conduct a sensitivity analysis
on the assumptions upon which the value was derived, in order to establish upper and lower
bounds on the estimate.
In situations where a quick estimate of risk likelihood is required, a qualitative rather than
quantitative assessment method may be used. This would be the case when evaluating
alternative options in the early stages of a project. In Topic 5 we will discuss both
quantitative and qualitative estimation methods.
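Two of the ideas above, estimating a rare event's likelihood by logically combining contributory causes and then bounding the result with a sensitivity check, can be sketched as follows. All rates are hypothetical, loosely echoing the solvent fire scenario of Example 2.1.

```python
# Combining contributory causes for a rare top event (hypothetical rates).
# OR-gate: either leak cause produces a leak (rare-event approximation:
# P(A or B) ~ P(A) + P(B) for small probabilities).
# AND-gates: a major fire needs leak AND ignition AND sprinkler failure.

pump_seal_leak = 6e-3       # assumed leaks per year
hose_failure   = 4e-3       # assumed leaks per year
ignition_given_leak = 0.1   # assumed probability of ignition given a leak
sprinkler_fail = 0.05       # assumed probability of failure on demand

leak_per_year = pump_seal_leak + hose_failure            # OR
fire_per_year = leak_per_year * ignition_given_leak      # AND
major_fire_per_year = fire_per_year * sprinkler_fail     # AND

def bounds(value, factor=3.0):
    """Crude sensitivity band: scale the best estimate down and up by `factor`."""
    return value / factor, value * factor

low, high = bounds(major_fire_per_year)
print(f"best estimate {major_fire_per_year:.1e}/yr, band [{low:.1e}, {high:.1e}]")
```

The factor-of-three band is an arbitrary placeholder; in practice the upper and lower bounds would come from varying each input assumption over its plausible range.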
Step 5: Measure and rank risk
For each hazard or loss event, the risk may be measured as a combination of the severity
and the likelihood. The severity gives the consequence per event, and the likelihood gives
the expected frequency of the event per unit of time. Thus, risk is the expected
consequence per unit of time.
For example, if the loss event is a vehicle accident that results in a fatality, and the
likelihood of such an event occurring is 0.000001 per year, then the risk of a fatality from a
vehicle accident is 1 in 1 000 000 per year; if the loss event is an environmental spill that
results in a clean-up cost of $100 000, and the likelihood of such an event occurring is 0.1
per year, then the cost of risk is $10 000 per year.
Once the risk of each hazard or loss event is measured, they may be ranked according to
magnitude. If risk is measured quantitatively, ranking becomes easier as the risk value is
numerically available. We will discuss techniques for measuring and ranking risk in
Topic 5.
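Step 5 can be sketched in a few lines. The first two figures reproduce the worked examples in the text (the vehicle-accident fatality risk and the spill clean-up cost of risk); the ranking below then uses cost-based risks only, since ranking is only meaningful when all consequences are expressed in a common measure. The extra loss events and their numbers are hypothetical.

```python
# Sketch of Step 5: risk = likelihood (events per year) x severity
# (consequence per event), giving a consequence per unit of time.

fatality_risk = 1e-6 * 1     # fatalities per year (i.e. 1 in 1 000 000 per year)
spill_risk = 0.1 * 100_000   # dollars per year of clean-up cost

# Hypothetical cost-based loss events, so they can be ranked on one scale:
events = {
    "environmental spill": spill_risk,
    "minor equipment damage": 2.0 * 3_000,   # frequent, low cost per event
    "major fire": 0.001 * 20_000_000,        # rare, high cost per event
}
ranked = sorted(events.items(), key=lambda kv: kv[1], reverse=True)
for name, cost in ranked:
    print(f"{name}: ${cost:,.0f} per year")
```

Note how the rare, high-consequence event can outrank the frequent, low-consequence one once both are reduced to the same units; this is exactly why quantitative measurement makes ranking easier.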
Step 6: Make decisions
Based on the information generated in the previous steps, decisions have to be made
regarding how to best manage the identified risks. Considerations include:
Is the risk at or below regulatory requirements? This would apply to people risks,
environmental risks, and some liability risks.
Is the risk low enough in relation to internal risk targets and objectives?
If the risk is higher than acceptable levels, what control measures need to be taken to
reduce the risk, and at what cost?
Should the risk be avoided altogether, and what are the implications?
What is the residual risk after implementation of the risk control measures?
In Topic 6 we will discuss the considerations involved in making risk management
decisions, including the general principle that a risk is to be reduced to levels As Low As
Reasonably Practicable (ALARP). This principle is discussed later in this topic.
Step 7: Manage residual risk
The strategies for managing residual risk will form part of the organisation's overall risk
management system that we will address in Topics 7 to 10.
OTHER RISK MANAGEMENT MODELS
The risk management framework we introduced in Figure 2.2 is similar to that used in the
Western Australian public sector (Department of Premier & Cabinet WA, 1996) and the
Australian Standard AS/NZS 4360:2004 Risk Management. Figure 2.3 shows the risk
management process described in AS/NZS 4360:2004.
Figure 2.3: AS/NZS 4360:2004 risk management process
Source: AS/NZS 4360:2004, page 13.
The flowchart comprises five sequential steps, with 'communicate and consult' and 'monitor and review' activities running continuously alongside all of them:
Establish the context: the internal context; the external context; the risk management context; develop criteria; define the structure.
Identify risks: what can happen? when and where? how and why?
Analyse risks: identify existing controls; determine consequences and likelihood; determine the level of risk.
Evaluate risks: compare against criteria; set risk priorities. (If treatment is required, proceed to the next step; otherwise continue monitoring.)
Treat risks: identify options; assess options; prepare and implement treatment plans; analyse and evaluate residual risk.
A C T I V I T Y 2 . 2
How does your organisation's risk management framework compare to those
presented in this topic? Are all types of risk covered or only safety risks?
RI S K AC C E P TA B I L I T Y
THE ALARP PRINCIPLE
How do we know when a risk is low enough to be acceptable? How low is low enough, and
how do we strike an optimum balance between risk control and cost?
A principle known as ALARP (As Low As Reasonably Practicable) is commonly used to
guide such decisions. It is based on the idea that risks can be divided into three categories:
1. Those that are intolerable because the quantified risks cannot be justified except in
extraordinary circumstances.
2. Those that are broadly acceptable provided risk management systems are in place, and
do not require expenditure on further risk reduction.
3. Those that are 'tolerable' if a benefit is desired, and further risk reduction is either
impracticable or disproportionately costly. Such risks are considered 'as low as
reasonably practicable' at the time of assessment, but they must be kept under review.
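The three-way categorisation can be expressed as a simple screening rule. The threshold values below are illustrative assumptions only (indicative individual-risk figures of this general order appear in UK HSE guidance, but the exact criteria vary by jurisdiction and risk type):

```python
# Sketch: screening an individual fatality risk (per year) into the three
# ALARP categories. Threshold values are illustrative assumptions.

INTOLERABLE = 1e-3         # per year: not justified except in extraordinary cases
BROADLY_ACCEPTABLE = 1e-6  # per year: no detailed ALARP demonstration needed

def alarp_category(risk_per_year):
    if risk_per_year >= INTOLERABLE:
        return "intolerable"
    if risk_per_year <= BROADLY_ACCEPTABLE:
        return "broadly acceptable"
    return "tolerable if ALARP (show further reduction is impracticable or grossly disproportionate in cost)"

print(alarp_category(5e-5))
```

A real assessment would not stop at the category: a risk in the middle band still requires a documented argument that the cost of further reduction is grossly disproportionate to the benefit gained.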
Figure 2.4: Risk tolerability and the ALARP principle
Source: IEC/AS 61508-5: 1998–1999. Annex B, Figure B1–Tolerable Risk and ALARP.
The figure shows three regions of risk:
Intolerable region: the risk cannot be justified except in extraordinary circumstances.
The ALARP or tolerability region (risk is undertaken only if a benefit is desired): tolerable only if further risk reduction is impracticable or if its cost is grossly disproportionate to the improvement gained. As the risk is reduced, proportionately less needs to be spent to reduce it further to satisfy ALARP; this concept of diminishing proportion is shown by the triangle.
Broadly acceptable region (no need for detailed working to demonstrate ALARP): at this negligible level of risk, it is necessary only to maintain assurance that risk remains at this level.
When determining if a risk is ALARP, several parameters should be considered.
Is it technically possible to reduce the risk further?
Who gains the benefit and who wears the cost?
Is the risk ethically acceptable?
Do we have enough information to make the decision ('the precautionary principle')?
What happens if we do nothing to reduce the risk?
What happens if we do not proceed (e.g. with a project or a proposed change)?
In OHS legislation, 'practicable' is defined as having regard to the severity and likelihood of
the outcome, the state of knowledge about the hazard, the means and availability of
controlling the risk, and the cost of controlling it.
In general, the final decision is made by management, a management committee or a
regulatory body. However, it must be remembered that risk is an assigned quantity and only
gains acceptance by consensus. Some guidelines on ALARP decision-making are suggested in Topic 6.
You should now download Reading 2.1 'Reducing risks, protecting people' from the UK
Health & Safety Executive website http://www.hse.gov.uk/risk/theory/r2p2.pdf and read
pages 5–20. We will return to this reading in Topic 6.
RATIONAL AND EMOTIVE ISSUES IN RISK MANAGEMENT
Risk assessment and risk management specialists generally agree that the principal standard
for judging and regulating risks should be based on the relative seriousness of the risk, i.e.
the severity of the consequences and the likelihood of occurrence.
In recent years, more lay people in the community have become involved in risk
decision-making and have made very different judgments from the experts as to which risks
most merit public concern and regulatory attention. Whilst the experts sometimes dismiss the
lay people's arguments as emotional rather than rational, this response ignores the power of
perception and the validity of non-scientific views. It can lead to major problems for
organisations, as the following example shows.
Example 2.3
In 1990 the Australian Federal Airports Corporation undertook an environmental
impact assessment study for construction of a third runway at Sydney's Kingsford
Smith Airport. Aircraft noise at residential areas was identified as a potential
environmental risk.
Scientific calculations were carried out and noise contours were drawn up for the
various flight options. An extensive public consultation process was held, but
opposition to the proposal steadily increased from local residents and local
government agencies who had input into the decision-making process.
The environmental impact assessment identified only limited areas that would be
affected by the noise, and recommended soundproofing the residential dwellings in
these areas. Strong objections were raised by the public on the following grounds.
The scientific study was flawed and did not include a sensitivity analysis on the
assumptions made.
The noise contour could only represent a diffused and uncertain boundary on
either side of the 'scientific' contour and could not be used as a demarcation line
between a high noise and a low noise area.
Quality of life and amenity was being irreparably damaged, and soundproofing
was only a limited mitigation measure given that a resident spends a
considerable amount of time outside the house (for example in the garden).
Such opposition was dismissed as being emotional rather than rational and a decision
was made to proceed with the third runway.
Within a short time of the runway being completed and put into operation, it became
clear that the residents' fears were not unfounded, and that the noise levels were
much higher than originally thought by experts. As a result, the cost of
soundproofing exceeded all budget expectations and a passenger levy for use of
Sydney airport had to be imposed to cover the costs. The issues are still not fully
resolved.
This example shows that it is imperative that organisations recognise the significance and
validity of different perceptions of risk acceptability and attempt to manage both the social
and commercial aspects of risk. Topics 9 and 10 will be devoted to this subject area, but it
is important that you are aware of it as you examine the techniques that can be used to
identify, analyse and respond to risks presented in the following topics.
You should now read Reading 2.2 'On the ALARP approach to risk management'. This
article provides a good summary of many of the concepts we will deal with in this unit.
You should then read Reading 2.3 ‘Getting to maybe: some communications aspects of
siting hazardous waste facilities'.
SUMMA RY
In this topic we examined different approaches to risk management and discussed why most
organisations now use a proactive systems approach rather than the traditional reactive
approach. We then introduced a risk management framework that consists of seven broad
steps and underpins the remaining topics in this study guide. We concluded the topic with a
brief discussion of the ALARP principle of risk acceptability and the significance and
validity of both scientific and non-scientific perceptions of risk acceptability.
EX E RC I S E
2.1 APPLYING THE SYSTEMS APPROACH TO MANAGING RISK
Most large corporations have a formal risk management strategy in place. While there are
variations in the details, the general approach appears to be the same. However, many
small businesses involved in engineering do not have a formal risk management strategy
and sometimes come to grief in the event of an incident. (A small business may be taken as
an organisation employing fewer than 50 people.)
Select one of the following small engineering organisations and complete the following
tasks.
a) Discuss the reasons the organisation should adopt a three-dimensional systems
approach to risk management.
b) Using the risk management framework in Figure 2.2:
(i) define the system and risk management objectives
(ii) identify the hazards and potential loss events
(iii) identify the information you would need to gather to estimate the severity of
consequences and likelihood of occurrence for each of the potential loss events.
1. Pipeline maintenance contractor
This company holds the contract for inspection and maintenance of high-pressure gas
pipelines owned and operated by a large organisation. The gas pressure may be up to
100–120 bar, and the pipeline runs cross-country in rugged terrain for several hundred
kilometres. The contract covers maintenance of the compressor station, intermediate valve
stations, and the pipeline corridor. The most common cause of a pipeline failure is
inadvertent third-party interference such as excavation.
The company's responsibility includes monitoring the integrity of the pipeline, regular
inspections (external and internal), and carrying out of emergency maintenance work, as
required by the owner.
2. Equipment fabricator
This company fabricates equipment to engineering specifications for large corporations.
Equipment generally consists of vessels for storing bulk solids or liquids, including pressure
vessels.
The company's range of work can involve undertaking design, fabrication (including
welding of alloy steels), inspection, radiographic and magnetic particle testing of welds,
hydrostatic pressure testing, obtaining statutory registration where required, and delivery to
client. Strict adherence to fabrication design codes and quality assurance is essential as the
clients expect high standards of delivery.
3. Chemicals warehousing and distribution facility
This company stores a range of hazardous chemicals for distribution to clients. The
chemicals are owned by the clients, and the company's responsibility is restricted to contract
storage. This includes managing receipt of delivery, storage, and distribution according to
demand by the client. The warehouse buildings and on-site facilities are owned by the
company.
The types of chemicals stored include flammable liquids, flammable solids, oxidising agents
(e.g. pool chlorine), toxic liquids (e.g. pesticides) and corrosive liquids (acids and alkalis).
Apart from flammable liquids that are stored in bulk storage tanks, in filled drums or as
packaged products, all other substances are stored in packages. These packages are not
opened on the premises, and no other processing occurs on the site.
4. Fire protection systems custom design and construction
This small organisation undertakes custom design of fire protection systems (e.g. firewater
ring main, hydrants, firewater pumps, fire detectors, sprinkler systems, drainage systems)
and installs the systems at the clients' premises for a variety of industries. National
standards and relevant international standards are used in the design. Verification of the
design and quality assurance is critical, as is the performance guarantee of the installed
system. The adequacy of the design must be approved by the fire authority. Quality
assurance during procurement of the various components for construction is also crucial to
the delivery of goods and services.
RE F E R E N C E S A N D F U RT H E R R E A D I N G
Publications
Department of Premier & Cabinet WA (1996) Guidelines for Managing Risks in the
Western Australian Public Sector. The Government of Western Australia, Perth.
Haldar, Achintya (2006) Recent Development in Reliability-based Civil Engineering,
World Scientific Publishing Co.
Health and Safety Executive (HSE) (1989) Risk Criteria for Land-Use Planning in the
Vicinity of Major Industrial Hazards, HSE Books, UK.
Health and Safety Executive (HSE) (2001) Reducing Risks, Protecting People: HSE's
Decision-Making Process, HSE website, http://www.hse.gov.uk/risk/theory/r2p2.pdf
(accessed 4 September 2006).
IEC/Standards Australia (1998–1999) IEC/AS 61508-5 Functional Safety of
Electrical/Electronic/Programmable Electronic Safety Related Systems—Part 5:
Examples of Methods for the Determination of Safety Integrity Levels, International
Electrotechnical Commission/Standards Australia.
McManus, J. (2004) Risk Management in Software Development Projects, Elsevier
Butterworth-Heinemann, Burlington, Massachusetts.
Melchers, R.E. (2001) 'On the ALARP approach to risk management', Reliability
Engineering and System Safety, 71(2), February: 201–208.
National Occupational Health & Safety Commission Australia (1996) National Standard
[NOHSC:104 (1996)] and National Code of Practice [NOHSC:2016 (1996)] for the
Control of Major Hazard Facilities, AGPS, Canberra.
Royal Society (1992) Risk: Analysis, Perception and Management, Royal Society
Publishing, London.
Sandman, P.M. (1986) 'Getting to maybe: some communications aspects of siting hazardous
waste facilities', Seton Hall Legislative Journal, Spring: 437–465,
http://www.psandman.com/articles/seton.htm (accessed 4 September 2006).
Sappideen, C. & Stillman, R.H. (1995) Liability for Electrical Accidents: Risk, Negligence
and Tort, Engineers Australia, Crows Nest, Sydney.
Standards Australia (1998) AS/NZS 3931:1998 Risk Analysis of Technological Systems—
Application Guide, Standards Australia/Standards New Zealand, Sydney.
Standards Australia (2004) AS/NZS 4360:2004 Risk Management, Standards Australia/
Standards New Zealand, Sydney.
Standards Australia (2004) HB 436:2004 Risk Management Guidelines: Companion to
AS/NZS 4360:2004, Standards Australia/Standards New Zealand, Sydney.
Websites
Standards Australia http://www.standards.com.au
http://www.riskmanagement.com.au
Australian Safety & Compensation Council http://www.ascc.gov.au
BSI British Standards http://www.bsi-global.com
Engineers Media http://www.engaust.com.au
European Commission for the Environment http://ec.europa.eu/environment/index_en.htm
International Organization for Standardization (ISO) http://www.iso.org/iso/en/ISOOnline.frontpage
Legislation in Australasia http://www.austlii.edu.au
UK Health and Safety Executive http://www.hse.gov.uk
US Environmental Protection Authority http://www.epa.gov
US Occupational Safety & Health Administration http://www.osha.gov
RE A D I N G 2 .2
ON THE ALARP APPROACH TO RISK MANAGEMENT
R. E. MELCHERS
1. INTRODUCTION
The management of risks associated with potential hazardous activities in society remains a
matter of profound public and technical interest. There has been and continues to be
considerable development in the range and extent of regulatory activity. Many new
regulatory frameworks have been established. Except for public input to risk assessments
for very specific and contentious projects, there appears to have been remarkably little
public debate (and perhaps even understanding) of the more general and philosophical
issues involved. This is despite the rather spectacular failure in recent years of electricity,
gas and other services over large regional areas and the occurrence of several major
industrial accidents.
One issue which might have been expected to have received some public discussion is how
decisions about hazardous facilities and activities are to be regulated. Should it be through
regulatory or consent authorities, and if so, what form and allegiances should such bodies
have? Alternatively, should it be through 'self-regulation', or should there be some other
mechanism(s)? These options have been explored in an interesting discussion paper.1
However, it appears largely to have been ignored in practice. Perhaps by default, the
regulatory approach is the most common route in attempting to exert control over
potentially hazardous activities. This trend is being followed in a number of countries. It is
appropriate, therefore, to review some aspects of these directions. In particular, the present
paper will focus on the use of the so-called as low as reasonably practicable (ALARP)
approach [also sometimes known as the as low as reasonably attainable/achievable
(ALARA) approach]. It will be viewed primarily from the perspective of so-called
'Common Law' countries, that is, those with a legal system parallel to that of the USA or the
UK. For countries such as Norway, where ALARP is also very extensively used, some of
the comments to follow may not be completely applicable. However, it is considered that
the bulk of the discussion is sufficiently general.
The ALARP approach grew out of the so-called safety case concept first developed
formally in the UK.2 It was a major innovation in the management of risks for potentially
hazardous industries. It requires operators and intending operators of a potentially
hazardous facility to demonstrate that (i) the facility is fit for its intended purposes, (ii) the
risks associated with its functioning are sufficiently low and (iii) sufficient safety and
emergency measures have been instituted (or are proposed). Since in practice there are
economic and practical limits to which these actions can be applied, the actual
implementation has relied on the concept of 'goal setting' regulations. The ALARP
approach is the best known of these. It is claimed by some to be a more
'fundamental' approach to the setting of tolerable risk levels.3,4
Conceptually the ALARP approach can be illustrated as in Fig. 1. This shows an upper
limit of risk that can be tolerated in any circumstances and a lower limit below which risk is
of no practical interest. Indicative numbers for risks are shown only for illustration—the
precise values are not central to the discussion herein but can be found in relevant country-
specific documentation. The ALARP approach requires that risks between these two limits
must be reduced to a level 'as low as reasonably practicable'. In relevant regulations it is
usually required that a detailed justification be given for what is considered by the applicant
to satisfy this 'criterion'.
Fig. 1: Levels of risk and ALARP, based on UK experience.3
As a guide to regulatory decision-making the ALARP concept suggests both 'reason' and
'practicality'. It conveys the suggestion of bridging the gap between technological and
social views of risk and also that society has a role in the decision-making process. In
addition, it has a degree of intuitive appeal, conveying feelings of reasonableness amongst
human beings. As will be argued in more detail below, these impressions are somewhat
misleading. There are also considerable philosophical and moral shortcomings in the
ALARP approach. Perhaps rather obliquely, the discussion will suggest what should be
done to improve the viability of ALARP or what characteristics need to be embodied in
alternatives. However, it is acknowledged that this is not a paper offering 'solutions' but
rather one which it is hoped will focus more attention on the issues and stimulate discussion
in order to bring about solutions.
To allow attention to be focussed more clearly on the difficulties with the philosophy of
ALARP, it is necessary first to review some matters fundamental to the interpretation and
management of risk in society. These issues include: (i) risk definition and perception, (ii)
risk tolerance, (iii) the decision-making framework, and (iv) its implementation in practice.
2. RISK PERCEPTION
2.1. Risk understanding and definition
Increased levels of education, awareness of environmental and development issues and
greater political maturity on the part of society generally have led to a much keener interest in
industrial risk management practices, policies and effectiveness. Apart from hazardous
industries, public interest derives also from notable public policy conflicts over the siting of
facilities perceived to be hazardous or environmentally unfriendly. Despite this, 'risk' as a
concept perceived by the general public appears to be rather poorly defined, with confusion
between probability, something involving both probability and consequences and something
implying monetary or other loss.
Vlek and Stallen5 gave some ten different definitions of 'risk' or riskiness, using various
ways of 'mixing' all or parts of the two main component ideas. Traditional decision
analysis, of course, simply multiplies the chance estimate by the consequence estimate.
This is only a 'first-order' approach, with both the chance estimate and the consequence
estimate being mean values. It is possible, at the expense of greater complexity in analysis,
but perhaps reflecting more accurately personal and societal perception, to invoke measures
of uncertainty, such as the standard deviation of each estimate.6 Nevertheless, there is likely
to remain some disagreement over a core definition of risk (as there appears to be in most
sociological and psychological works about any term) depending on one's viewpoint and
stake in the eventual outcome.1
In the mathematical/statistical literature and in most engineering oriented probability
discussions, risk is simply taken as another word for probability of occurrence or 'chance',
with consequences, however they might be measured, kept quite separate. Herein the
approach will be adopted to use 'risk' as a generic term, implying both probabilities and
consequences without specifying how these are to be combined.
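The 'first-order' combination mentioned above, and the second-moment refinement using standard deviations, can be written out explicitly. This is a standard result for the product of two independent estimates, stated here for illustration rather than taken from the article:

```latex
% First-order risk: product of the mean chance and mean consequence estimates
R \approx \mathrm{E}[P]\,\mathrm{E}[C]

% For independent estimates P and C, the uncertainty in the product is
\mathrm{Var}(PC) = \mathrm{E}[P]^2\,\mathrm{Var}(C)
                 + \mathrm{E}[C]^2\,\mathrm{Var}(P)
                 + \mathrm{Var}(P)\,\mathrm{Var}(C)
```

so the standard deviation $\sqrt{\mathrm{Var}(PC)}$ can accompany the point estimate $R$ as the kind of uncertainty measure the article alludes to.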
2.2. Risk as an objective matter
It has become increasingly clear that 'risk' is not an objective matter. Thus all risk
assessment involves both 'objective' and 'subjective' information. Matters generally
considered to be capable of 'objective' representation, such as physical consequences, are
seldom completely so, since in their formulation certain (subjective, even if well accepted)
decisions have had to be made regarding data categorization, its representation, etc. This
also applies to areas of science once considered to be 'objective', a matter which is now
considered briefly.
In the development of mathematical and numerical models in science, model 'verification' is
the proof that the model is a true representation. It may be possible to do this for so-called
'closed' systems. These are completely defined systems for which all the components of the
system are established independently and are known to be correct. But this is not the
general case or the case for natural systems. For these 'verification' is considered to be
impossible.7
Model 'validation', on the other hand, is the establishment of legitimacy of a model,
typically achieved through contracts, arguments and methods. Thus models can be
confirmed by the demonstration of agreement between observation and prediction, but this
is inherently partial. "Complete confirmation is logically precluded by the fallacy of
affirming the consequent … and by incomplete access to natural phenomena … Models can
only be evaluated in relative terms."7 Philosophical arguments also point to the
impossibility of proving that a theory is correct—it is only possible to disprove it.8,9
Moreover, in developing scientific work, models are routinely modified to fit new or
recalcitrant data. This suggests that models can never be 'perfect'.10 It follows that for
theories and models to be accepted, there is necessarily a high degree of consensus-forming
and personal inter-play in their development and the scientific understanding underpinning
them.11 Some of this can be brought about by 'peer' reviews of risk assessments and
procedures, as is widely practised in the nuclear industry.
These concepts carry over directly to risk estimation since risk estimates are nothing but
models of expectation of outcomes of uncertain systems (i.e. 'open' systems), couched in
terms of the theory of probability. Thus, in the context of PSA, "… often the probabilities
are seen as physical properties of the installation and how it is operated …" and while this
view is useful for making comparative statements about riskiness or for comparison to
standards, this interpretation is inconsistent with "all standard philosophical theories of
probability …"12
2.3. Factors in risk perception
There are many factors involved in risk perception.1 These include:
(i) the likely consequences should an accident occur;
(ii) the uncertainty in that consequence estimate;
(iii) the perceived possibilities of obviating the consequences or reducing the probability of
the consequences occurring, or both;
(iv) familiarity with the 'risk';
(v) level of knowledge and understanding of the 'risk' or consequences or both; and
(vi) the interplay between political, social and personal influences in forming perceptions.
The last two items in particular deserve some comment. Knowledge and understanding of
risk issues on the part of individuals and society generally implies that (risk) communication
exists, that it is utilized to convey meaningful information and that the capacity exists to
understand the information being conveyed and to question it. Perhaps the most critical
issue is the actual availability of relevant and accurate information. For a variety of
reasons, there has been an increasing requirement placed on governments and industry to
inform society about the hazards to which its members might be exposed. There has
developed also greater possibility for access to government and government agency files
under 'Freedom of information'-type legislation. Whether these developments have been
helpful in creating a better informed public is not entirely clear, as it involves also issues
such as truthfulness in communications and the trust which society is willing to place in the
available information.
That there will be an interplay between individual and societal perceptions of risk follows
from individuals being social beings. Their very existence is socially and psychologically
intertwined with that of others. Formal and informal relationships and institutions "set
constraints and obligations upon people's behavior, provide broad frameworks for the
shaping of their attitudes and beliefs, and are also closely tied to questions both of morality
and of what is to be valued and what is not. There is no reason to suppose that beliefs and
values relating to hazards are any different from other more general beliefs and values …"1
3. DECISION FRAMEWORKS
3.1. New technology
Society as a whole is constantly faced with the need to make decisions about existing
hazardous or potentially hazardous projects. Usually these decisions are delegated to
organizations with recognized expertise in the area. For existing technology, that expertise
will rely on past experience, including accident statistics and 'incident' (or 'near-miss')
statistics for hazardous facilities. In many cases hazard scenario and contingency planning
also will be carried out. It is in this area that the techniques of probabilistic risk analysis are
recognized to have validity in the sense of Section 2.2.6
For the potential risks associated with new technologies, however, the problem of
management is more acute. This is because the basis for making decisions, that is a base of
accumulated knowledge and experience, is not available. The dilemma can be seen clearly
in the earlier writings related to nuclear risks, prior to the occurrence of the accidents at
Three Mile Island, Chernobyl and the like. For example, Stallen13, in reviewing the works
of Hafele and Groenewold notes that the only solutions for the control of risks caused by
new technology tend to involve extensive use of other (and older) forms of technology.
History suggests that a new technology will only survive if it has no major catastrophes
early in its development. Thereafter, the risks are apparently small because: (i) the
operating experience base is small; (ii) particular care tends to be taken; and (iii) there has
not been enough time for in-service problems to become sufficiently evident. This may lead
to the false sense that the actual risks involved are small. Further, for new technologies it is
generally the case that the scientific understanding of the total socio-technical system, its
limitations and assumptions, is rather incomplete, adding further to the difficulties of
satisfactory risk estimation. The 'trial-and-error' underpinning much of the understanding of
conventional and well-developed technology is missing.
In connection with the development of science, Popper8,9 has argued that only falsifications
(i.e. failures) lead to new developments—verifications of existing ideas merely add to our
apparent confidence in them, but they could be wrong. The inferences for risk analysis are
not difficult to make.14
3.2. A wider perspective
Under these circumstances, how can society deal with the evaluation of risks imposed by
new technology? It is suggested that some light may be thrown on this question by an
examination of the parallel issue of the rationality of science. Noted philosopher
Habermas15 has argued that the rationality of science stems not from any objective, external
measures such as 'truth' but from agreed formalisms (see also Section 2.2). This involves
transactions between knowledgeable human beings and agreement between them about what
can be considered to be 'rational', given the base of available knowledge and experience. It
presupposes a democratic and free society with equal opportunities for contributing to the
discussion, for discourse and for criticism. It also requires truthfulness of viewpoint and the
absence of power inequalities. Although these might seem like tall orders indeed,
Habermas argues that there are very few situations where these conditions are not met or
cannot be met eventually since open and free discourse will uncover the limitations which
might exist. The implication for risk analysis and evaluation is that the rationality of the
criteria and the degree to which risk might be accepted should be based, ultimately, on the
agreed position of society obtained through internal and open transactions between
knowledgeable and free human beings.

READING 2.2: ON THE ALARP APPROACH TO RISK MANAGEMENT (UNIT 415 ENGINEERING RISK MANAGEMENT)
Such a position has been put in different, but essentially analogous ways by others.1 The
importance of giving consideration to public opinion underlies much writing on risk criteria.
However, the practical difficulties of "arriving at consensus decisions over the question of
acceptable risk in society" are considerable. According to Layfield16, commenting on
Britain's Sizewell B reactor … "The opinions of the public should underlie the evaluation of
risk. There appears to be no method at present for ascertaining the opinions of the public in
such a way that they can be reliably used as the basis for risk evaluation. More research on
the subject is needed."
Moreover, society is a complex mix of sub-groups with differing aims, ambitions, views,
opinions and allegiances. It is not surprising then that when faced with most matters about
which profound decisions need to be made society responds with a variety of view-points
and courses of action. Although there are always inter-plays between short-term and
longer-term self-interests and morally 'high-ground' views, it appears in many cases that the
diversity of views and the convictions with which they are held is inversely related to the
knowledge sub-groups of society have about the matter being considered.
Layfield16 noted …"As in other complex aspects of public policy where there are benefits
and detriments to different groups, Parliament is best placed to represent the public's
attitude to risks." In practice, of course, such a course of action might be taken only for
major policy decisions, such as whether the nation should have nuclear power or not, etc.
However, Wynne17 and others have argued that Parliament is ill-equipped both in time and
expertise to fully appreciate the implications and changes likely to be brought about by the
introduction or further development of new technologies. In his view, particularly for major
new technology issues, the political process can only be considered to be defective.
A historical review of the introduction of any really new technology shows, however, just
how ill-informed and ill-equipped parliaments tend to be, often remaining unaware of the
changes taking place around them. For most major technological innovations (irrespective
of their hazard potential) parliamentary interest tends to follow well after the technologies
have been introduced. There are many examples of this in the developing Industrial
Revolution18 and more recent examples include IVF technology, gene technology, internet
technology, etc.
Moreover, even within society more generally there is seldom much awareness of potential
problems and hence little or no debate or detailed consideration of them. Usually only after the
technology has been established and some of its problems have become evident does public
perception become active. This suggests that risk assessment in general, and approaches
such as ALARP, can deal only with the control of the further development of already
established technology.
3.3. Practical decisions
Whatever the idealized situation ought to be, the need to make day-to-day decisions about
lesser hazards in society has invariably led to regulatory approaches as more convenient
substitutes for public or parliamentary debate. One reason sometimes given for leaving the
decisions to public servants is that the public is uneducated, ill-informed and irrational in
dealing with complex issues; an argument which can hardly be sustained in a
modern society. However, to invoke public debate and discussion ideally requires time and,
for many individuals, much background education when the discussion is about complex
issues. None of these conditions tends to be met in practice, for a variety of reasons (see
also Section 2.3). Often regulators will facilitate some form of public participation, such as
through making documents available and through providing background briefings.
Unfortunately, in advancing along this line, there is a danger that there may no longer be
much left of Habermas's vision of transactions between knowledgeable and free individuals
in coming to a consensus.
The methods which have evolved for the solution of acceptable or tolerable risk problems in
a bureaucratic setting may be categorized broadly to include:1
1. professional judgement as embodied in institutionally agreed standards (such as
engineering codes of practice) or as in commonly accepted professional skills;
2. formal analysis tools such as cost-benefit analysis or decision analysis, with or without
public discussion opportunities; and
3. so-called 'boot-strapping' approaches employing techniques such as 'revealed
preferences' as used in social–psychology, or using extrapolations from available
statistical data about risks currently accepted in other areas of endeavor.
Aspects of all three are commonly in use. As will be seen, the ALARP approach falls
essentially in the third category.
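The third, 'boot-strapping' category can be given a concrete flavour with a small numerical sketch. The activity risks and the safety margin below are hypothetical illustrations, not survey data: the technique simply extrapolates a nominal tolerable level from risks currently accepted in other areas of endeavour.

```python
# 'Boot-strapping' sketch: extrapolate a nominal tolerable risk level from
# statistical data about risks currently accepted elsewhere.
# All figures are hypothetical annual individual fatality risks.
accepted_risks = {
    "road travel": 1e-4,
    "manufacturing work": 5e-5,
    "domestic accidents": 3e-5,
}

# Heuristic value judgement: an imposed, involuntary technological risk
# should sit well below risks accepted voluntarily (the factor is arbitrary).
INVOLUNTARY_MARGIN = 100

lowest_accepted = min(accepted_risks.values())
nominal_tolerable = lowest_accepted / INVOLUNTARY_MARGIN

print(f"Lowest accepted risk:   {lowest_accepted:.1e} per year")
print(f"Nominal tolerable risk: {nominal_tolerable:.1e} per year")
```

Note that the 'revealed preference' step is the statistics; the margin is a value judgement smuggled in alongside them, and, as the text later notes, the revealed-preferences criterion has itself been much criticised.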
4. RISK TOLERABILITY
The levels of risk associated with a given facility or project that might be acceptable to, or
tolerated by, an individual or society or sub-groups is an extremely complex issue, about
which much has been written. It is not possible to deal with this matter here, but see Reid19
for a useful summary and critique.
Of course, 'tolerability' and 'acceptability' are not necessarily the same, although it has been
common in risk analysis to loosely interchange the words. According to the HSE3,
" 'tolerability'… refers to the willingness to live with a risk to secure certain benefits and in
the confidence that it is being properly controlled. To tolerate a risk means that we do not
regard it as negligible or something we might ignore, but rather as something we need to
keep under review and reduce still further if and when we can." Acceptability, on the other
hand, implies a more relaxed attitude to risk and hence a lower level of the associated risk
criterion. According to Layfield16, in terms of the nuclear power debate, the term
'acceptable' fails to convey the reluctance that individuals commonly show towards being
exposed to certain hazardous activities.
Although the distinction between the terminology 'acceptability' and 'tolerability' is
important, it is also the case that the term 'acceptable' has been used in relation to consent or
acceptance of a proposed risk situation on the part of regulatory authorities. This suggests
by implication that the decisions of the regulatory authorities in some manner reflect
'tolerability' on the part of society.
5. ALARP
5.1. Definition of terms
As noted, the ALARP approach has been advocated as a more fundamental approach to the
setting of tolerable risk levels, particularly suitable for regulatory purposes.20 Fig. 1
summarizes the approach, in which the region of real interest lies between the upper and
lower limits. This is the region in which risks must be reduced to a level ALARP. Since
this objective is central to the approach a very careful discussion and explanation of terms
might be expected. However, apart from appeals to sensible discussion and reasonableness
and the suggestion that there are legal interpretations, there is little in print which really
attempts to come to terms with the critical issues and which can help industry focus on what
might be acceptable.3

The critical words in ALARP are 'low', 'reasonably' and 'practicable'. Unfortunately, these
are all relative terms—standards are not defined. 'Reasonably' is also an emotive word,
implying goodness, care, consideration etc. However, as will be discussed below, what may
be reasonable in some situations can be seen as inappropriate in others.

Regarding 'practicable', the Oxford Dictionary refers to 'that can be done, feasible…', i.e.
what can be put into practice. Of course, many actions can be implemented, provided the
financial rewards and resources are sufficient. Thus there are very clear
financial/economic implications—" 'reasonable practicability' is not defined in legislation
but has been interpreted in legal cases to mean that the degree of risk can be balanced
against time, trouble, cost and physical difficulty of its risk reduction measures. Risks have
to be reduced to the level at which the benefits arising from further risk reduction are
disproportionate to the time, trouble, cost and physical difficulty of implementing further
risk reduction measures."3

It is therefore clear that financial implications are recognized—"in pursuing any safety
improvement to demonstrate ALARP, account can be taken of cost. It is possible, in
principle, to apply formal cost-benefit techniques to assist in making judgement(s) of this
kind."3 This assumes that all factors involved can be converted to monetary values.
Unfortunately, it is well-known that there are not inconsiderable difficulties and hence
implied value judgements in evaluating or imputing monetary values for both benefits and
costs. This problem is particularly acute for the analysis of hazardous facilities where the
value of human life and the (imputed) cost of suffering and deterioration of the quality of
life may play a major role in the analysis.

Further, an approach based on cost analysis implicitly assumes equal weighting for each
monetary unit, a proposition known to cause difficulties with cost-benefit analysis when
applied to issues with social implications. It is considered that the selection of tolerable risk
is of this type. Value judgements which society might make are subsumed in the valuations
required for cost analysis.

In addition, there is also the problem that the optimum obtained in cost-benefit analyses is
seldom very sensitive to the variables involved. This means that cost-benefit analysis alone
is unlikely to provide a clear guide to the selection of appropriate policy.

Finally, it is unclear how value judgements such as 'low', 'reasonably' and 'practicable'
correlate with a minimum total cost outcome. The value judgements required involve issues
well beyond conventional cost-benefit analysis, a matter well recognized in dealing with
environmental issues.21
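The three-region structure of Fig. 1 and the cost-balancing test just described can be sketched in code. Every numerical value below is an illustrative assumption only (boundary figures of the order of 1e-3 and 1e-6 per year appear in the HSE literature, but the precise limits, the disproportion factor and the monetary value of a life are regulatory and moral choices, as the text emphasizes):

```python
# Illustrative sketch of the ALARP framework: three risk regions plus a
# 'disproportionate cost' test. All numbers are hypothetical assumptions.

def alarp_region(individual_risk, upper_limit=1e-3, lower_limit=1e-6):
    """Classify an annual individual fatality risk against illustrative
    upper (intolerable) and lower (broadly acceptable) boundaries."""
    if individual_risk > upper_limit:
        return "intolerable"
    if individual_risk < lower_limit:
        return "broadly acceptable"
    return "ALARP region"

def reduction_justified(cost, risk_reduction, value_per_life=2e6,
                        disproportion_factor=3.0):
    """A risk-reduction measure is 'reasonably practicable' unless its cost
    is disproportionate to the benefit. Monetary values are illustrative."""
    benefit = risk_reduction * value_per_life
    return cost <= disproportion_factor * benefit

print(alarp_region(5e-5))   # a risk lying between the two limits
print(reduction_justified(cost=1e5, risk_reduction=2e-2))
print(reduction_justified(cost=1e6, risk_reduction=2e-2))
```

Under these assumed values, a measure costing 100 000 units that removes a 2% chance of a fatality passes the test, while the same measure at ten times the cost does not; the result hinges entirely on the imputed monetary values, which is precisely the difficulty the text identifies.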
5.2. Openness
In the expositions of the ALARP approach it appears that the specific tolerable probability
levels which would qualify for acceptance by a regulatory authority are not always in the
public domain. The tolerable risk criterion may not be known to the applicant and some
process of negotiation between the regulatory authority and the applicant is needed.
Societal groups concerned about openness in government might well view this type of
approach with concern.
A related problem with implementation of the ALARP approach can arise in the evaluation
of two similar projects assessed at different times, possibly involving different personnel
within the regulatory body and different proponents. How is consistency between the
'approvals' or 'consents' to be attained? Irrespective of the care and effort expended by the
regulatory authority, there is a real danger that an applicant with a proposal which needs to
be further refined or which is rejected, will cry 'foul'. Without openness and without
explicit criteria, such dangers are not easily avoided. Is there not also a danger of
corruption?
5.3. Morality and economics
The issue of morality and how this is addressed by the ALARP approach can be brought
most clearly into focus by a discussion based around the nuclear power industry. That
industry took a major blow in the USA with the Three Mile Island and other incidents.
Currently there are no new facilities planned or under construction. This is possible in the
USA because there are alternative sources of electric power with perhaps lower perceived
risks, including political risks. Opposition to nuclear power and the potential consequences
associated with it are clearly in evidence. Such an open opposition may not always be
tolerated in some other countries, nor may there be viable alternative power sources. Thus
there may be pressures for public opposition to be ignored and to be discredited and for
access to information to be less easy to obtain. For example, there have been claims of
'cover-ups', such as over UK nuclear accidents. Whatever the precise reasons, it is clear
that in some countries the nuclear industry remains viable. Comparison to the US situation
suggests that what might be considered 'reasonable and practical' in some countries is not so
considered in the US, even though the technology, the human stock and intellect and the
fear of nuclear power appear to be much the same. The only matters which appear to be
different are: (i) the economic and political necessities of provision of electrical power; and
perhaps (ii) acquiescence to a cultural system, reflected in the political authority and legal
systems, which precludes or curtails the possibility of the protracted legal battles apparently
only possible in Common Law countries. Do these matters then ultimately drive what is
'reasonable and practical'? And if they do, is the value of human life the same?
The dichotomy between socio-economic matters and morality issues has other implications
also. It is known that in some countries the nuclear power system is of variable quality,
with some installations known to have a considerable degree of radiation leakage—far in
excess of levels permitted under international standards. Even if, as is likely, the costs to
bring the facilities to acceptable standards are too high, there will be economic pressures to
keep the facilities in operation, despite the possibility that some plant workers would be
exposed to excessive radiation. It is known that in some cases maintenance work in high-
radiation areas has been carried out by hiring, on a daily basis, members of the lowest
socio-economic classes to do the work. Because the remuneration was good by local
standards there was no shortage of willing workers, even though it has come to be known
that many develop radiation sickness and serious tumors within weeks of being exposed.
Although somewhat stark, this example illustrates that the criteria of 'reasonableness' and
'practicability' so essential in the ALARP approach are ultimately issues of morality. While
for projects having the potential for only minor or rather limited individual or social
consequences there is probably no need to be concerned, for other, more significant projects
the question must be asked whether it is acceptable for decisions about such issues to be left
for private discussion between a regulatory authority and project proposers.
5.4. Public participation
As noted earlier, for many systems in common usage there is a long and established base of
experience (both good and bad) upon which to draw. This is not necessarily the case for all
facilities and projects, particularly those subject to risk assessment requirements. It would
seem to be precisely these projects for which risk analysis should be open to public scrutiny
and debate so that the issue of their rationality in respect to society can be considered. As
noted, the ALARP approach would appear to permit a small group of people making
decisions about a potentially hazardous project, away from public scrutiny, and in
consultation with the proponents of the project. According to the Royal Society report1,
"The (ALARP) approach has …been criticised on the grounds that it does not relate
benefits clearly enough to tolerability. More importantly, however, it does not address the
critical issue of how public input to tolerability decisions might be achieved, beyond an
implicit appeal to the restricted, and now much criticised … revealed-preferences
criterion"…and…"The question of how future public input to tolerability decisions might
be best achieved is also closely related to recent work on risk communication…"

It is acknowledged that public debate and participation at a level leading to worthwhile input
is not always practical. As noted earlier, only some participants will have the time, energy
and capability to become fully acquainted with the technical intricacies involved in significant
projects. There are the dangers also of politicizing the debate and perhaps trivializing it
through excessive emotional input. Nevertheless, there are strong grounds for not ignoring
non-superficial public participation and involvement in risk-based decisions.1
5.5. Political reality
Risk tolerability cannot be divorced from wider issues in the community. It is intertwined
in matters such as risk perception, fear of consequences and their uncertainty etc. as well as
various other factors which influence and change society with time. Societal risk
tolerability would be expected to change also. Change can occur very quickly when there is
a discontinuity in the normal pattern of events in society—a major industrial accident is one
such event. The implication for the ALARP approach might well be as follows. What
would have been considered sufficiently 'low' for a particular type of facility prior to an
'accident' might not be considered sufficient for other generally similar facilities after an
accident. Yet there will be very considerable societal and political pressures for changing
the acceptance criteria. Is it appropriate to do so?

Following an accident, there is, usually, a call for an investigation, better safety measures,
more conservative design approaches, better emergency procedures etc. However, some
accidents must be expected. The fact that it is admitted at the consent, approval or design
stage of a project that there is a finite probability of failure associated with the project
implies that an accident is likely to occur sooner or later. The fact that the probability might
have been shown to be extremely low does not alter this fact. Perhaps unfortunately,
probability theory cannot usually suggest when an event might occur. Rationality suggests
that 'knee-jerk' political and regulatory responses might well be inappropriate—yet such
responsiveness is implicit in the 'reasonable' and 'practical' aspects of ALARP.
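The point that a finite failure probability implies an accident 'sooner or later' is simple arithmetic. A minimal sketch, assuming a constant annual failure probability and independence between facilities and years (all figures hypothetical):

```python
# Sketch: a 'finite probability of failure' implies an eventual accident.
# Assumes a constant, independent annual failure probability p for each of
# n similar facilities over t years (all figures hypothetical).
p = 1e-4   # annual failure probability per facility
n = 100    # number of similar facilities
t = 50     # years of operation

p_no_accident = (1.0 - p) ** (n * t)   # no failure anywhere, in any year
p_at_least_one = 1.0 - p_no_accident

print(f"P(at least one accident in {t} years) = {p_at_least_one:.2f}")
```

With these figures the chance of at least one accident is roughly 0.4, even though each facility's annual risk is 'extremely low'; and, as the text notes, probability theory says nothing about when in those fifty years the event will fall.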
6. DISCUSSION AND POSSIBILITIES
In science, it is recognized that progress comes in relatively slow steps, learning by trial-
and-error and modifying the body of theory and understanding in the light of apparent
contradictions. Similarly, in the more practical arts such as engineering, progress comes
about through a slow progression, carefully learning from past mistakes. Major problems in
engineering are likely when past observations and understanding appear to have been
forgotten or ignored.22,23 It may be that an appropriate strategy for risk management lies
along these lines also. Moreover, it is increasingly being recognized that such matters are
best treated using risk analysis and that risk analysis is best performed using probabilistic
methods.24

Even then, probability-based risk management faces an added problem when it must deal
with low-probability, high-consequence events. These, morally and practically, do not allow
the luxury of a trial-and-error learning process. There may be just too much at stake—hence
the advocates of the 'precautionary principle'.
Nevertheless, it is generally the case that the technology involved is not totally new but
rather is a development of existing technology for which there is already some, or perhaps
already extensive, experience. Associated with that existing technology are degrees of risk
acceptance or tolerance reflected in the behavior of society towards them. It is then
possible, in principle, to 'back-calculate'25,26 the associated, underlying, tolerance levels,
even if the analysis used for this purpose is recognized to be imperfect. The new
technology should now be assessed employing, as much as possible, the information used to
analyze the existing technology and using a risk analysis methodology, as much as possible,
similar in style and simplifications to that used to determine the previous tolerance levels.

The process sketched above is one which elsewhere has been termed 'calibration'25,26, i.e.
the assessment of one project against another, minimizing as much as possible the
differences in risk analysis and data bases and not necessarily attempting to closely anchor
the assessment in societal tolerable risk levels. The risk levels employed are derived from
previously accepted technology only, using admittedly simplified models, and are of a
nominal nature, having no strong validity outside the framework in which they have been
employed.

A somewhat similar approach is already implicit in the nuclear industry, with professionally
agreed or accepted models being used for probability and other representations and with a
strong culture of independent ('peer') reviews of risk analyses. The resulting probability
estimates are likely to be internally consistent, and to have a high degree of professional
acceptance, even though they may not relate very closely to underlying (but perhaps
unknowable) probabilities of occurrence.

7. CONCLUSIONS
Risk management should embody fundamental principles such as societal participation in
decision-making. It is recognized that this may be difficult for a variety of reasons and that
alternative decision-making procedures are required. The current trend appears to be one of
increasing involvement of regulatory authorities, with acceptance criteria not always open
to the public or the applicants and in some cases settled by negotiation. This is also the case
with the ALARP approach. It is suggested that there are a number of areas of concern
about the validity of this approach. These include representativeness, morality, philosophy,
political reality and practicality. It is suggested that risk assessments recognize peer review
and the incremental nature of technological risks.
ACKNOWLEDGEMENTS
The support of the Australian Research Council under grant A89918007 is gratefully
acknowledged. Some parts of this paper appeared in an earlier conference contribution.
The author appreciates the valuable comments on a number of issues made by the
reviewers. Where possible their comments have been addressed.
REFERENCES
1. Royal Society Study Group. Risk: analysis, perception and management, Royal
Society, London (1992).
2. Cullen The Hon. Lord. The public inquiry into the Piper Alpha disaster, HMSO,
London (1990).
3. HSE. The tolerability of risk from nuclear power stations, Health and Safety Executive,
London (1992).
4. Kam JCP, Birkinshaw M, Sharp JV. Review of the applications of structural reliability
technologies in offshore structural safety. Proceedings of the 1993 OMAE, vol. 2,
1993. p. 289–96.
5. C.J.H. Vlek and P.J.M. Stallen, Rational and personal aspects of risk. Acta
Psychologica (1980), vol. 45, pp. 273–300.
6. M.G. Stewart and R.E. Melchers. Probabilistic risk assessment of engineering systems,
Chapman and Hall, London (1997).
7. N. Oreskes, K. Shrader-Frechette and K. Belitz, Verification, validation, and
confirmation of numerical models in the earth sciences. Science (1994), vol. 263, pp.
641–646.
8. K. Popper. The logic of scientific discovery, Basic Books, New York.
9. K. Popper. The growth of scientific knowledge, Basic Books, New York (1963) (see
also Magee B., Popper. Fontana Modern Masters, 1987).
10. T.S. Kuhn. The structure of scientific revolution, University of Chicago Press, Chicago,
IL (1970).
11. J.R. Ravetz. Scientific knowledge and its social problems, Clarendon Press, Oxford
(1971).
12. S.R. Watson, The meaning of probability in probabilistic safety analysis. Reliability
Engineering and System Safety (1994), vol. 45, pp. 261–269.
13. P.J.M. Stallen. In: J. Conrad, Editor, Risk of science or science of risk? Society,
technology and risk assessment, Academic Press, London (1980), pp. 131–148.
14. D.I. Blockley, Editor, Engineering safety, McGraw-Hill, London (1990).
15. M. Pusey. Jurgen Habermas, Ellis Horwood/Tavistock, Chichester, UK (1987).
16. F. Layfield. Sizewell B public inquiry: summary of conclusions and recommendations,
HMSO, London (1987).
17. B. Wynne. In: J. Conrad, Editor, Society and risk assessment—an attempt at
interpretation, Society, technology and risk assessment. Academic Press, London
(1980), pp. 281–287.
18. J.R. Lischka. Ludwig Mond and the British alkali industry, Garland, New York (1985).
19. S.G. Reid. In: D.I. Blockley, Editor, Acceptable risk, Engineering Safety, McGraw-
Hill, London (1992), pp. 138–166.
20. JV Sharp, JC Kam, M. Birkinshaw, Review of criteria for inspection and maintenance
of North Sea structures. Proceedings of the 1993 OMAE, vol. 2, 1993. p. 363–8.
21. P.R.G. Layard. Cost-benefit analysis: selected readings, Penguin, Harmondsworth
(1972).
22. A.C. Pugsley, The prediction of proneness to structural accidents. The Structural
Engineer, 51 6 (1973), pp. 195–196.
23. P.G. Sibley and A.C. Walker, Structural accidents and their causes. Proceedings of the
Institution of Civil Engineers, Part 1, 1977, pp. 191–208.
24. C. Kirchsteiger, On the use of probabilistic and deterministic methods in risk analysis.
Journal of Loss Prevention in the Process Industries 12 (1999), pp. 399–419.
25. R.E. Melchers. Structural reliability analysis and prediction (2nd ed.), Wiley,
Chichester, UK (1999).
26. R.E. Melchers. Probabilistic calibration against existing practice as a tool for risk
acceptability assessment. In: R.E. Melchers and M.G. Stewart, Editors, Integrated risk
assessment, Balkema, Rotterdam (1995), pp. 51–56.
Source: Reliability Engineering & System Safety, February 2001, 71(2): 201–208.
RE A D I N G 2 .3
GETTING TO MAYBE: SOME COMMUNICATIONS ASPECTS OF SITING HAZARDOUS WASTE FACILITIES
PETER M. SANDMAN, WITH FOREWORD BY JAMES S. LONARD
FOREWORD
Professor Sandman's article must be given a great deal of attention by community leaders,
government officials and industry representatives. It begins to develop an innovative
approach to the dilemma of where to site unwanted hazardous waste facilities. If its
proposals and recommendations (or appropriate modifications thereto) are followed, a
successful facility siting process could emerge which would result in: an acceptable, more
environmentally sound waste facility; a stronger, more empowered community; a
government with credibility in the host community; and a developer who will be able to
build its facility with minimal delays and few additional expenses. If a process other than
the one Sandman develops is utilized, a waste facility may be sited but it will be one which
falls short of having all the safeguards for which a community group could have negotiated.
It will also be a site decided upon only after a long delay caused by litigation, and a site
realized at a great financial cost to the community, the government and the developer. Such
a process would also result in a serious loss of trust in government and industry by the host
community.
In defining the dilemma, community leaders ask three difficult questions whenever a new
waste facility is proposed: (1) Do we really need it? If so, then (2) can it be made safe?
And if this answer is also "yes", then (3) will it remain safe?
None of these questions are trivial, and none have obvious answers as proponents and
regulators of new waste facilities often suggest. Environmentalists (including myself) have
concluded that a few new facilities are probably needed, although there must first be a
serious move toward source recycling and source reduction. While many of us often share
the host community's concerns about the site selection process, we do not know which types
are the most appropriate or where they should be located. Many environmentalists also
believe that new technology exists which allows us to conclude that the initial design of a
new facility could be made safe. I believe that once built and operating, a new waste
facility will only remain safe if there has been continuous and comprehensive community
oversight and monitoring during the facility's entire construction, operation and
maintenance phases.
It is relatively easy for me to answer these questions. I currently do not live in a community
which may become the home for such a facility and, in addition, I work full-time on
environmental affairs and study waste generation and waste disposal issues very closely.
But what about community residents who have full-time jobs and full-time family
responsibilities to whom this subject is so foreign? How should they find the answers? I do
know one thing: they must find the answers for themselves. They should not rely on
government and industry. While they may ask traditional environmentalists like myself for
some advice, they still need their own sources and data. Let us try to understand the
community's perspective for a moment (Sandman's article does this in much greater detail)
so we will be in a position to attempt to resolve the dilemma.
A. THE COMMUNITY'S PERSPECTIVE
Before disasters such as Love Canal and Chemical Control, citizens were not very
involved in, nor knowledgeable about, the siting of landfills and other hazardous waste
disposal practices. The public trusted the government and its experts. Most assumed they
were protected against these types of disasters. But, with the serious health problem
discovered by residents living near Love Canal, and with the extensive human exposure to
toxic fumes caused by the fire at Chemical Control, the public quickly began to feel
betrayed by their government. They lost confidence, developed a good deal of cynicism
and distrust, and realized that they now had to play a major role in the decision-making
process for future waste disposal facilities.
The community, of course, cannot match the government and the developer when it comes
to obtaining the technical resources needed to fully assess a
proposed waste facility. While the present hazardous waste facility siting law1 does provide
for some resources to be given to the local government for its use to review a proposed
waste facility2, community acceptance of a new waste facility is extremely unlikely.
Community residents have no real incentive to support it; they usually have been severely
let down in the past, and in all likelihood, they believe that their community already bears
more of the burden than it should when it comes to hosting unwanted facilities: sewage
treatment plants, prisons, and old, polluting factories. By
choosing to fight the proposal, the community can muster a large amount of resources and
can cause long delays before a final decision to build or not to build is reached.
The problem that remains, then, is how to get the community to the negotiating table.
How do you convince the community to temporarily forgo their efforts to block a proposal
and agree to discuss possible solutions which will be acceptable to them, to the developers
and to the State? Sandman's article suggests a significant portion of the answer. Let me
give the reader a glimpse of what is to come with several concise thoughts about this
question. It should be kept in mind that the community negotiating team can call off the
negotiations at any time and institute a full scale effort to block the proposed facility. This
tactic should only be employed after the community decides that the developer is not
negotiating in good faith or if the developer is not willing to meet the community's bottom
line.
B. RESOLVING THE DILEMMA
The basic presumption that underlies the negotiation process is that the developer is willing
to sit down with the community. This presumption is strong, given that there is virtually no
risk involved since the negotiations are not binding until each side agrees to be bound.
Furthermore, the community will surely oppose the proposed facility without prior
negotiation. The community must also consider what the benefits of negotiation may be. I
will discuss several areas of concern which should be negotiated but which would not be
mandated by the DEP if negotiation were absent.
1. Oversight: The developer should provide resources to the community to enable it to
hire its own experts to participate in any changes to the proposed plans. The
community should also require regular and frequent (but unannounced) access to the
facility by a committee of community residents and by the community's professional
experts, paid for by the developer, but hired by and working for the community.
2. Operation and Maintenance: The developer would agree to a procedure whereby the
community experts' suggested improvements and/or changes to the planned operation
and maintenance of the facility would be reviewed and implemented as appropriate.
3. Emissions Offsets: A community with foresight would require the developer to pay for
new pollution control equipment to reduce the emissions of neighboring old facilities
so that even with the increase of emissions from the new waste disposal plant, the
overall emissions in the community would be less than if the plant were not built at all.
4. Stipulated Penalties: Any violations of operating permits could not be contested.
Rather, the fines would go immediately into a community trust fund which would be
administered by community leaders for use in monitoring the community environment.
5. Insuring Property Values: The developer would insure nearby property owners against
any decline in property values caused by proximity to the facility.
6. Protection Against Transportation-Related Accidents: This would require specifying
routes for trucks to use to and from the facility and provide for immediate fines for any
transportation-related accidents (stipulated penalties) and for any time a truck fails to
use a specified route.
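The offset arithmetic in item 3 (Emissions Offsets) can be sketched numerically. All figures below are hypothetical; the only claim taken from the text is that the developer-funded reduction at older plants must exceed the new facility's added emissions:

```python
# Illustrative emissions-offset arithmetic for item 3 (Emissions Offsets).
# All figures are hypothetical, in tonnes of pollutant per year.
old_plants_before = 120.0   # current emissions from neighbouring older facilities
offset_reduction = 45.0     # cut funded by the developer at those facilities
new_facility = 30.0         # emissions added by the proposed waste disposal plant

community_before = old_plants_before
community_after = (old_plants_before - offset_reduction) + new_facility

# The offset clause holds only if total community emissions end up lower
# than if the plant were not built at all.
assert community_after < community_before
print(f"before: {community_before:.0f} t/yr, after: {community_after:.0f} t/yr")
# → before: 120 t/yr, after: 105 t/yr
```

The point of the clause is exactly this inequality: the community trades a new emission source for a net reduction in its overall burden.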
These are only a few of the ways a community group can effectively participate in the
decision-making process for hazardous waste facilities. The benefits accrue to all interested
parties. The community is empowered to make meaningful and educated decisions about
the proposed facility and is protected against improper operation and maintenance of the
facility. On the other hand, the developer is able to build and operate its facility without
long and costly delays and litigation. Finally, the State is able to continue its efforts to
ensure that hazardous waste is disposed of as safely as possible and is not forced to exercise
its powers of eminent domain and override local ordinances. Of course, we all have to get
to the table. Peter Sandman's Getting to Maybe should help us get there.
INTRODUCTION
The United States generates roughly fifty million metric tons of non-radioactive hazardous
wastes annually.3 While much can be done to reduce this figure, a healthy economy will
require adequate facilities for transporting, treating, storing and disposing of hazardous
wastes for the foreseeable future. Current facilities are far from adequate; new and safer
ones must be sited and built. The alternatives are dire: economic and technological
slowdown on the one hand, or "midnight dumping" and similar unsafe, illegal and
haphazard disposal practices on the other.
The principal barrier to facility siting is community opposition: "not in my backyard".
Experience amply justifies this opposition. Communities have learned, largely from the
media, that hazardous waste facilities endanger public health, air and water quality, property
values, peace of mind and quality of life. They have also learned, largely from the
environmental movement, that they can mobilize politically to block the siting of a facility,
eminent domain statutes notwithstanding.
Technical improvements have reduced, though not eliminated, the risk of "hosting" a
hazardous waste facility. State governments have learned how to regulate facilities more
effectively. Responsible hazardous waste generators have come to terms with the need to
reduce waste flow and handle remaining wastes properly. Responsible environmentalists
have come to terms with the need to accept some waste and some risk in its disposal. A
consensus is emerging in support of state-of-the-art facility design, development and siting.
However, this consensus is not enough. The community typically rejects the consensus, and
may well enforce its dissent through its exercise of a de facto veto.4
The comments that follow are predicated on several assumptions: (1) A facility can be
designed, managed and regulated so that risks are low enough to justify community
acceptance (without this, the task of siting is unethical); (2) Community acceptance is more
desirable and more feasible than siting over the community's objections (without this, the
task of meeting with a community is unnecessary); and (3) The positions of the siting
authority and the developer are sufficiently flexible legally, politically and economically to
permit meaningful concessions to community demands (without this, the task of gaining
community approval is unachievable).
ACKNOWLEDGE THE COMMUNITY'S SUBSTANTIAL POWER TO SLOW OR STOP THE SITING PROCESS
Despite the preemption and eminent domain provisions of New Jersey's Major Hazardous
Waste Facilities Siting Act5, many observers are convinced that a facility cannot be sited
over a community's objections. The resources in the community's hands are many: legal
delay, extralegal activities, political pressure, legislative exemption, gubernatorial override.
The subtitle of one of the leading books on the siting problem testifies to the conviction of
authors David Morell and Christopher Magorian that the community has something close to
a veto. The book is entitled Siting Hazardous Waste Facilities: Local Opposition and the
Myth of Preemption.6 Moreover, in a January 25, 1985 interview with The New York
Times, Department of Environmental Protection (DEP) Commissioner Robert E. Hughey
agreed. "Siting," he said, "will be fought everywhere. I think everything else but this has
an answer."7 At the Seton Hall Symposium on siting, Douglas Pike of Envirocare
International acknowledged the veto power of communities when he stated: "We have to
operate as if there is no eminent domain."
Ironically, nearly everyone is impressed by the community's power of opposition except the
community, which sees itself as fighting a difficult, even desperate uphill battle to stop the
siting juggernaut. From a communication perspective, this is the worst possible state of
affairs. Suspecting that the "fix" is in, the community judges that it simply cannot afford to
listen, to consider alternatives, or to negotiate modifications. Intransigence looks like its
best shot, perhaps its only shot. But suppose the Commission and the developer were to
acknowledge to the community its considerable power: "Look, we probably can't site this
thing unless you agree, and there are plenty of chances for you to stop it further on down the
pike. Why don't we put the possible battle on ice for now and explore whether there is any
possible agreement. If the talks fail, you can always go back to the fight." It will not be
easy, of course, to persuade the community that this is not a trick, that it is forfeiting
nothing by negotiating now, that it can switch its stance from "no" to "maybe" while
protecting the road back to "no". It will take some effort not to overstate the community's
power. Though more powerful than it thinks, the community is not omnipotent, and the risk
of override is real. The goal is to let the community know, publicly, what other participants
already know privately: that it will be extremely difficult to site a facility over community
objections, and that the siting authority would greatly prefer not to try. Formal
acknowledgments of community power, such as a developer's pledge to honor a community
referendum on any agreement that might be negotiated, are sometimes possible. But even
an informal acknowledgment will reduce intransigence and encourage open discussion.
Acknowledging the community's substantial power will have three other desirable impacts.
First, it will reduce community resentment of what is seen as a power imbalance, an
outrageous imposition of state control over local self-determination. This resentment and
the deep-seated feeling of unfairness that accompanies it are major factors in community
rejection of hazardous waste facilities. Residents look at New Jersey's siting law and note
that in the final analysis, state action prevails over local preference. Angrily, they resolve to
resist. Open acknowledgment of de facto power will lessen the anger at the imbalance of de
jure power.8
Second, acknowledging community power will reduce fear about the health effects of a
hazardous waste facility. One of the best documented findings in the risk perception
literature is that we fear voluntary risks far less than involuntary ones. According to one
study, people will accept a risk one thousand times as great if it is chosen than if it is
imposed by others.9 Therefore, to the extent that the community feels itself in control of the
siting decision, the risks of the facility become much more acceptable and much less fear-
arousing.
Third, acknowledging community power will put the dialogue on a more frank footing than
the classic "one-down/one-up" pattern that tends to dominate siting discussions. Under this
pattern a community tries to prove itself the equal of the developer and the siting authority,
while secretly feeling that it is not. The developer and the authority adopt a parental
"the-decision-is-not-yours-but-we-value-your-input" attitude, while secretly fearing the
community's de facto veto. Negotiations are much easier when the parties are
acknowledged equals.
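The factor-of-one-thousand finding on voluntary versus involuntary risk, cited above, can be made concrete with a back-of-envelope sketch. Only the 1000:1 ratio comes from the study cited in the text; the baseline threshold and the facility risk figure are invented for illustration:

```python
# Back-of-envelope illustration of the voluntary/involuntary finding.
# Only the 1000:1 ratio comes from the text; the thresholds are hypothetical.
VOLUNTARY_FACTOR = 1000

involuntary_threshold = 1e-6   # assumed tolerable annual risk when imposed by others
voluntary_threshold = involuntary_threshold * VOLUNTARY_FACTOR

facility_risk = 5e-5           # hypothetical annual risk from the facility

# Judged against the involuntary threshold, the facility is unacceptable...
assert facility_risk > involuntary_threshold
# ...yet the same risk sits comfortably inside the range people tolerate
# when they feel the choice is their own.
assert facility_risk < voluntary_threshold
```

Nothing about the facility changes between the two comparisons; only the perceived voluntariness of the risk does, which is why acknowledging community control matters so much.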
AVOID IMPLYING THAT COMMUNITY OPPOSITION IS IRRATIONAL OR SELFISH
Nothing interferes so thoroughly with the settlement of a dispute as the suggestion from
either side that the other is being irrational or selfish. Yet developers, siting authorities and
their expert consultants often aim this charge at community opponents. The acronym
"NIMBY"—Not In My Back Yard—has become a sarcastic code, implying that opponents
approve of siting in principle but oppose it in their neighborhoods for insupportable
reasons. Some community groups, by contrast, still use the phrase as an anthem of their
battle to prevent the Love Canals of the future. For example, Nicholas Freudenberg's book
on how to organize community opposition is entitled Not In Our Backyards.10 But the
sarcastic meaning prevails. Opponents now take offense when developers or siting
authorities start talking about "the NIMBY syndrome"—and they are correct to be
offended.
Some opponents disapprove of siting new facilities anywhere, but choose to fight only in
their own communities where their stake is greatest and their power base strongest. Some
argue that source reduction and recycling can eliminate the need for new facilities, or that
facility siting should be conditioned on policies that will reduce the waste stream, or that
expansion of existing facilities is a wiser alternative, or that we should wait for
improvements in waste treatment technology. Some take the position that the type of
facility proposed is unduly dangerous, or that the site chosen is environmentally
inappropriate, or that the developer's record is unsatisfactory. Others assert that equity
dictates a different location. Rural dwellers argue that they should not serve as host to a
facility because they did not produce the waste in the first place. Urbanites argue, on the
other hand, that they have suffered enough pollution already. These are all coherent
positions that deserve respectful responses. Dismissing them as a manifestation of the
NIMBY syndrome is neither fair, accurate nor strategically wise.
Similarly, community distrust of risk estimates by experts is not irrational. The experts
generally work for interests with a stake in reassuring answers. Even with total integrity,
non-resident experts in pursuit of a site can be expected to reach less cautious conclusions
than residents with no special interest in siting. Moreover, there is ample precedent in the
last several decades of siting experience to justify fears of a lack of integrity, or of
incompetence or callousness. At best, the field is new and risk estimates are inherently
uncertain. It is rational to distrust the experts even without any expertise of one's own.
People who are trying to sell a hazardous waste facility are no different from people who
are trying to sell, say, insulation for a home. One does not have to understand what they are
saying technically to suspect that they are not to be trusted.
Furthermore, many siting opponents have acquired impressive expertise of their own. They
have sifted the evidence in pursuit of technical arguments to support their position. In some
cases, the opponents have become impressively knowledgeable. When pro-siting experts
dismiss all objections as ignorant because some are without foundation, they are arguing ad
hominem, inaccurately and unfairly.
It is important to note that many siting questions have no technical answers: How much risk
is too much? What should you do when the answers are uncertain? These are "trans-
scientific" questions, sometimes couched in technical language but unanswerable by
technical methods.
Sociologists divide people into the categories "risk-aversive" and "risk-tolerant". What
separates them is a fundamental values difference. The risk-aversive believe that if you are
not sure of what you are doing you should not do anything, that meddling usually makes
things worse. The risk-tolerant believe that problems should be solved incrementally, that
the new problems caused by their tinkering will be solved later by someone else's tinkering.
(See Note below.) Neither position is unreasonable, and neither can be supported or refuted
by technical information.
Note: Since I wrote this, it has become clearer to me that risk-tolerance and risk-aversion
are less global than I thought. Sky-divers may be afraid of spiders. More relevantly, the
political left tends to be risk-aversive about the ecosphere but risk-tolerant about the
sociosphere—don't muck around with the environment if you're not sure what you're doing,
but go ahead and experiment with social values. The political right has the opposite
tendency, assuming the natural environment to be robust and remediable but social norms
to be fragile and at constant risk of irreparable harm.
It takes courage for community activists to pit their newly acquired knowledge and deeply
felt values against the professional stature of the experts. Unsure of their technical ground,
these activists defend it all the more tenaciously, sensitive to the merest hint of disrespect.
They deserve respect instead and they will not listen until they feel they have it.
INSTEAD OF ASKING FOR TRUST, HELP THE COMMUNITY RELY ON ITS OWN RESOURCES
Most of the people working to site a hazardous waste facility consider themselves moral and
environmentally responsible people. Many are incredibly dedicated to meeting society's
need for a decent facility. They also view themselves as professionals, as careful specialists
who know what they are doing. In both of these roles they feel that they deserve at least
trust, if not gratitude. They experience community distrust—sometimes even community
hatred—with great pain. The pain often transforms into a kind of icy paternalism, an "I'm-
going-to-help-you-even-if-you-don't-know-what's-good-for-you" attitude. I suspect that
much of the rhetoric about community irrationality, selfishness and the "NIMBY syndrome"
has its origins in hurt feelings. It is entirely reasonable for socially responsible experts to
want to be trusted, to feel that they deserve to be trusted, and to resent the fact that they are
not trusted.
It is sometimes said that the solution to the siting problem is to build trust. To be sure, the
siting authority and the developer must make every effort not to trigger still more mistrust.
For example, any hint of ex parte discussions between the siting authority and the developer
must be avoided. But just as it is reasonable for siting experts to expect to be trusted, it is
also reasonable for local citizens to withhold their trust, to insist on relying on their own
judgment instead. The Commission must not only accept this, but also encourage and
facilitate it.
Information policy is an excellent case in point. As noted earlier, one need not understand a
technology in order to distrust experts with a vested interest. One must, however,
understand the technology in order to decide whether the experts are right despite their
vested interest. There is wisdom in the Siting Act's provision of research grants to the
community at two stages in the siting process.11 Methods should be found for the
Commission to help the community inform itself even earlier in the process, when positions
are still relatively fluid. The advantage of an independently informed community is not
only that citizens will understand the issues, but that they will be satisfied that they
understand the issues, and thus feel less pressure to construct a rejectionist front. A
community that believes it has the knowledge to decide what should be done and the power
to do it can afford to be reasonable. A community that believes it lacks sufficient
knowledge and power, even if it has them, must conclude that the undiscriminating veto is
the wisest course.
Similarly, communities want to know that if a facility is built they will not need to rely on
outside experts for monitoring and enforcement. Many mechanisms can provide this
autonomy:
1. training of local health authorities, and citizen activists, to monitor effluents;
2. funding for periodic assessments by consultants accountable to the community;
3. duplicate monitoring equipment in a public place, so citizens can check, for example,
the incinerator temperature for themselves;
4. establishment of a trust fund, with trustees acceptable to the community, to supervise
compensation in the event of accident, so citizens need not rely on the state courts.
Do not underestimate the depth of community disillusionment. Modern society depends on
letting experts decide. When experts fail to decide wisely we are jolted into belated and
reluctant attention. We feel betrayed. We are angry because we must now pay attention.
We feel guilty for having relinquished control in the first place. We do not know what to do
but are convinced we cannot trust others to decide for us. Above all, we fear that others
will impose their unwise decisions on us even now that we are paying attention.
When the community grimly demands its autonomy, it is too late to ask for trust. Experts
must instead presume distrust while helping the community exercise its autonomy wisely.
ADAPT COMMUNICATIONS STRATEGY TO THE KNOWN DYNAMICS OF RISK PERCEPTION
When people consider a risk, the process is far more complex than simply assessing the
probability and magnitude of some undesired event. Departures from statistical accuracy in
risk perception are universal and predictable. Communications strategy can therefore take
the departures into consideration. It is crucial to understand that the following patterns of
risk perception are "irrational" only if one assumes that it is somehow rational to ignore
equity, uncertainty, locus of control and the various other factors that affect, not "distort",
our sense of which risks are acceptable and which are not. Rational or not, virtually
everyone considers getting mugged a more outrageous risk than skidding into a tree on an
icy highway. And virtually everyone is more frightened by a hazardous waste facility than
by a gasoline storage tank. Our task is not to approve or disapprove of these truths, but to
understand why they are true and how siting communication can adapt to them.
The points in the following section deal with why communities fear hazardous waste
facilities more than technical experts judge that they "should", and how communication can
be used to reduce the discrepancy. It might be possible to employ this counsel to the
exclusion of all else in this article, hoping to pacify community fears without
acknowledging, much less honoring, community power. Such an effort would, I think, fail
abysmally. Communications strategy must be part of fair dealing with the community, not a
substitute for it.
Patterns of risk perception
1. Unfamiliar risks are less acceptable than familiar risks. The most underestimated
risks are those, such as household accidents, that people have faced for long periods
without experiencing the undesired event. The sense of risk diminishes as we continue
to evade it successfully. Thus, the perceived riskiness of a hazardous waste facility is,
in part, a reflection of its unfamiliarity. Stressing its similarity to more familiar
industrial facilities can diminish the fear; so can films, tours and other approaches
aimed at making the facility seem less alien. Even more important is to make the
wastes to be treated seem less alien. Detailed information on the expected waste
stream—what it is, where it comes from and what it was used to make—should reduce
the fear level considerably.
2. Involuntary risks are less acceptable than voluntary risks. As mentioned earlier, some
studies show acceptance of voluntary risks at one thousand times the level for
involuntary risks.12 Eminent domain, preemption and the community's general feeling
of outside coercion thus exacerbate the level of fear. Acknowledging the community's
power over the siting decision will lessen the fear and make siting a more acceptable
outcome.
3. Risks controlled by others are less acceptable than risks under one's own control.
People want to know that they have control over not only the initial decision but also
the entire risky experience. To some extent this is not possible. Once a facility is built
it is difficult to turn back. But credible assurances of local control over monitoring and
regulation can be expected to reduce risk perception by increasing control. Similarly,
trust funds, insurance policies, bonds and such contractual arrangements can put more
control in local hands. Quite apart from any other advantages, these arrangements will
tend to diminish the perception of risk.
4. Undetectable risks are less acceptable than detectable risks. A large part of the dread
of carcinogenicity is its undetectability during its latency period. As a veteran war
correspondent told me at Three Mile Island, "In a war you worry that you might get hit.
The hellish thing here is worrying that you already got hit." While it is not possible to
do much about the fear of cancer, it is possible to make manifest the proper, or
improper, operation of the facility. For instance, a local monitoring team, or a satellite
monitoring station in the City Hall lobby, can make malfunctions more detectable, and
can thereby reduce the level of fear during normal operations. Not coincidentally,
these innovations will also improve the operations of the facility.
5. Risks perceived as unfair are less acceptable than risks perceived as fair. A
substantial share of the fear of hazardous waste facilities is attributable to the fact that
only a few are to be sited. A policy requiring each municipality to manage its own
hazardous waste would meet with much less resistance. A more practical way of
achieving equity is to negotiate appropriate benefits to compensate a community for its
risks and costs (this is, of course, after all appropriate health and safety measures have
been agreed to). In a theoretical free market, the negotiated "price" of hosting a facility
would ensure a fair transaction. The point to stress here is that compensation does not
merely offset the risk faced by a community. It actually reduces the perceived risk and
the level of fear.
6. Risks that do not permit individual protective action are less acceptable than risks that
do. Even for a very low-probability risk, people prefer to know that there are things
they can do, as individuals, to reduce the risk still further. The proposed protective
action may not be cost-effective, and the individual may never carry it out, but its
availability makes the risk more acceptable. Discussion of hazardous waste facility
siting has appropriately focused on measures to protect the entire community. Some
attention to individual protective measures may help reduce fear.
7. Dramatic and memorable risks are less acceptable than uninteresting and forgettable
ones. This is generally known as the "availability heuristic": people judge an event as
more likely or frequent if it is easy to imagine or recall.13 The legacy of Love Canal,
Kin-Buc, Chemical Control and the like has made hazardous waste dangers all too easy
to imagine and recall. A corollary of the availability heuristic is that risks that receive
extensive media treatment are likely to be overestimated, while those that the media fail
to popularize are underestimated. The complex debate over media handling of
hazardous waste goes beyond the scope of this article.
8. Uncertain risks are less acceptable than certain risks. Most people loathe uncertainty.
While probabilistic statements are bad enough, zones of uncertainty surrounding the
probabilities are worse. Disagreements among experts about the probabilities are worst
of all. Basing important personal decisions on uncertain information arouses anxiety.
In response, people try either to inflate the risk to the point where it is clearly
unacceptable or to deflate it to the point where it can be safely forgotten.
Unfortunately, the only honest answer to the question "Is it safe?" will sound evasive.
Nonetheless, the temptation, and the pressure, to offer a simple "yes" must be resisted.
Where fear and distrust coexist, as they do in hazardous waste facility siting, reassuring
statements are typically seen as facile and self-serving. Better to acknowledge that the
risk is genuine and its extent uncertain.
9. Cross-hazard comparisons are seldom acceptable. It is reasonable and useful to
compare the risks of a modern facility to those of a haphazard chemical dump such as
Love Canal. The community needs to understand the differences. It is also reasonable
and useful to compare the risks of siting a facility with the risks of not siting a
facility—midnight dumping and abandoned sites. This comparison lies at the heart of
the siting decision. On the other hand, to compare the riskiness of a hazardous waste
facility with that of a gas station or a cross-country flight is to ignore the distinctions of
the past several pages. Such a comparison is likely to provoke more outrage than
enlightenment.
10. People are less interested in risk estimation than in risk reduction, and they are not
interested in either one until their fear has been legitimized. Adversaries who will
never agree on their diagnosis of a problem can often agree readily on how to cope with
it. In the case of facility siting, discussions of how to reduce the risk are ultimately
more relevant, more productive and more satisfying than debates over its magnitude.
Risk reduction, however, is not the only top priority for a fearful community. There is
also a need to express the fear and to have it accepted as legitimate. No matter how
responsive the Commission is to the issue of risk it will be seen as cold and callous
unless it also responds to the emotional reality of community fear.
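As a rough summary of the ten patterns, perceived risk can be modelled as the statistical estimate scaled by qualitative "outrage" factors, a framing often associated with Sandman's later "risk = hazard + outrage". The factor names below follow the patterns above, but every weight is an invented illustration, not a measured value:

```python
# Toy model: perceived risk as the statistical estimate scaled by "outrage"
# factors. Factor names follow the patterns above; the weights are invented
# purely for illustration.
OUTRAGE_WEIGHTS = {
    "unfamiliar": 3.0,
    "involuntary": 5.0,
    "controlled_by_others": 3.0,
    "undetectable": 2.0,
    "unfair": 4.0,
    "memorable": 3.0,
    "uncertain": 2.0,
}

def perceived_risk(statistical_risk: float, attributes: set) -> float:
    """Scale a statistical risk estimate by the outrage factors present."""
    scale = 1.0
    for name in attributes:
        scale *= OUTRAGE_WEIGHTS.get(name, 1.0)
    return statistical_risk * scale

# Two hazards with the same statistical risk, very different perceived risk:
facility = perceived_risk(1e-6, {"unfamiliar", "involuntary", "unfair", "memorable"})
gas_station = perceived_risk(1e-6, set())
assert round(facility / gas_station) == 180  # 3 * 5 * 4 * 3
```

The sketch makes the essay's point mechanical: the same statistical estimate yields wildly different perceived risks, which is why communication strategy must address the factors, not just the probabilities.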
DO NOT IGNORE ISSUES OTHER THAN HEALTH AND SAFETY RISK
The paramount issue in hazardous waste facility siting is undoubtedly the risk to health,
safety and environmental quality. But this is not the only issue. It is often difficult to
distinguish the other issues so they can be addressed directly—especially if legal and
political skirmishes have thrust the risk issue to the fore.
Negotiated compensation is especially useful in dealing with these other issues. Moreover,
negotiation helps to distinguish them from the risk issue. It is not uncommon, for example,
for a community group to insist in adversary proceedings on marginal protective measures
at substantial expense. In negotiations where other issues can more easily be raised, the
group may reveal that it is also worried about the possible fears of prospective home
purchasers and the resulting effect on property values. The developer may find it easy to
bond against this risk. The homeowners have thus protected their property at a cost that the
developer, who plans to establish an excellent safety record, expects will be low. It is
extremely useful, in short, to probe for concerns other than risk, and to establish a context,
such as mediated negotiation, where such concerns can be raised.
Aside from health risk, the impacts of greatest concern are: (1) the decline in property
values; (2) the inability of the community to keep out other undesirable land uses once one
has been sited; (3) the decline in quality of life because of noise, truck traffic, odor and the
like; (4) the decline in the image of the community; (5) the overburdening of community
services and community budgets; and (6) the aesthetically objectionable quality of the
facility.
READING 2.3 GETTING TO MAYBE: SOME COMMUNICATIONS ASPECTS OF SITING HAZARDOUS WASTE FACILITIES (UNIT 415 ENGINEERING RISK MANAGEMENT)
Apart from these possible impacts, a number of non-impact issues may create adverse
community reaction to a proposed facility:
Resentment of outside control, including the threat of preemption and eminent domain.
The sense of not being taken seriously; resistance to one-way communication from
planners and experts who seem to want to "educate" the community but not to hear it;
perceptions of arrogance or contempt.
The conviction that the siting process is unfair, that "the fix is in".
The conviction that the choice of this particular community is unfair, that the community
is being asked to pay a high price for the benefit of people who live elsewhere, and that
it would be fairer to ask someone else to pay that price. This feeling is especially strong
in communities that are poor, polluted or largely minority. These communities see their
selection as part of a pattern of victimization.
Support for source reduction and recycling instead of new facilities.
Another issue that often surfaces is whether the facility will accept non-local waste. In a
recent Duke University poll of North Carolina residents, only seven percent approved of
allowing out-of-state waste to be disposed of in their county.14 By contrast, thirty-eight
percent would allow waste from other North Carolina counties and forty-nine percent would
allow waste from within the county.15 Technically, it may well be impractical to require
each community to cope with its own waste. Psychologically, however, this is far more
appealing than central facilities, for at least three reasons:
It seems intrinsically fairer to have to dispose of one's own waste than to be forced to
dispose of everyone else's;
A strictly local facility will not earn a community an image as the hazardous waste
capital of the state or region; and
Local wastes already exist, either stored on-site or improperly dumped, and a new local
facility thus represents no net increase in local risk.
Enforceable guarantees to limit "imported" waste should alleviate in part at least one source
of opposition to a facility.
MAKE ALL PLANNING PROVISIONAL, SO THAT CONSULTATION WITH THE COMMUNITY IS REQUIRED
A fatal flaw in most governmental public participation is that it is grafted onto a planning
procedure that is essentially complete without public input. Citizens quickly sense that
public hearings lack real provisionalism or tentativeness. They often feel that the important
decisions have already been made, and that while minor modifications may be possible to
placate opponents, the real functions of the hearing are to fulfill a legal mandate and to
legitimize the fait accompli. Not surprisingly, citizen opponents meet what seems to be the
charade of consultation with a charade of their own, aiming their remarks not at the planners
but at the media and the coming court battle.
This scenario is likely even when the agency sees itself as genuinely open to citizen input.
For legal and professional reasons, experts feel a powerful need to do their homework
before scheduling much public participation. In effect, the resulting presentation says to the
citizen: "After monumental effort, summarized in this 300-page document, we have reached
the following conclusions … Now what do you folks think?" At this point it is hard enough
for the agency to take the input seriously, and harder still for the public to believe it will be
taken seriously. Thus, Siting Commission Chairman Frank J. Dodd complained that the
siting hearings "have turned into political rallies. The last thing that was discussed was
siting criteria. It was how many people can you get into an auditorium to boo the speakers
you don't like and cheer for the ones you support."16
The solution is obvious, though difficult to implement. Consultations with the community
must begin early in the process and must continue throughout. Public participation should
not be confined to formal contexts like public hearings, which encourage posturing. Rather,
participation should include informal briefings and exchanges of opinion of various sorts,
mediated where appropriate. The Commission must be visibly free to adjust in response to
these consultations, and must appear visibly interested in doing so. Above all, the proposals
presented for consultation must be provisional rather than final—and this too must be
visible. A list of options or alternatives is far better than a "draft" decision. "Which shall
we do?" is a much better question than "How about this?"
This sort of genuine public participation is the moral right of the citizenry. It is also likely
to yield real improvements in the safety and quality of the facilities that are built. As a
practical matter, moreover, public participation that is not mere window-dressing is
probably a prerequisite to any community's decision to forgo its veto and accept a facility.
This is true in part because the changes instituted as a result of public participation make the
facility objectively more acceptable to the community. Public participation has important
subjective advantages as well. Research dating back to World War II has shown that
people are most likely to accept undesirable innovations, such as rationing, when they have
participated in the decision.17
Much in the Siting Act and in the behavior of the Commission represents important
progress away from the traditional "decide–announce–defend" sequence, whereby an
agency ends up justifying to the public a decision it has already made. Holding hearings on
siting criteria instead of waiting for a site was progress.18 The money available for
community research is progress.19 There is also progress evidenced in a recent statement by
Commission Executive Director Richard J. Gimello that hearings have persuaded him that
two incinerators would be wiser than the one originally proposed in the draft hazardous
waste management plan.20 However, there is a long history of "decide–announce–defend"
to be overcome before we achieve what communication theorists call "two-way symmetric
communication" and politicians call "a piece of the action".
INVOLVE THE COMMUNITY IN DIRECT NEGOTIATIONS TO MEET ITS CONCERNS
The distinction between community input and community control is a scale, not a
dichotomy. Planning expert Sherry Arnstein describes an eight-rung "ladder of public
participation", as follows: manipulation; therapy; informing; consultation; placation;
partnership; delegated power; citizen control.21 She adds:
Inviting citizens' opinions, like informing them, can be a legitimate step toward their full
participation. But if consulting them is not combined with other modes of participation,
this rung of the ladder is still a sham since it offers no assurance that citizen concerns and
ideas will be taken into account.22
A really meaningful participation program, Arnstein argues, involves some framework for
explicit power-sharing with the community.23
In hazardous waste facility siting, today's community has two kinds of power: (1) the legally
guaranteed right to provide input at many stages of the siting process; and (2) the political
ability to delay, harass and quite possibly stop that process. The first, as Arnstein points
out, is not enough to reassure a community that feels little trust for those at whom the input
is directed.24 That leaves the other source of power, the de facto veto.
This sort of analysis has led many observers to propose siting legislation that accords
greater power to the community. Indeed, one state, California, makes siting virtually
contingent on community acceptance.25 Others, such as Massachusetts and Connecticut, do
not go so far as to provide a de jure community veto, but do require the community to
negotiate with the developer, with binding arbitration in the event of deadlock.26 Still other
states permit local regulation of the facility, but grant to a state agency the authority to
override community regulations that make siting impossible.27 As Morell and Magorian
note, "expanded public participation procedures in a preemptive siting process are a far cry
from such a balance of state and local authority".28
While New Jersey's Siting Act does not require negotiations with the community, it
certainly does not foreclose the option—an option far more useful to the community than
mere input, and far more conducive to siting than the de facto veto. The most productive
option is probably negotiation between the developer and the community, with or without a
mediator. If they are able to come to terms, the Commission could incorporate these terms
in its own deliberations while still retaining its independent responsibility to protect health
and environmental quality. If they are unable to come to terms, the Commission could
retain its preemptive capabilities and the community its political ones. For the community,
then, the incentive to negotiate is the likelihood that it can secure better terms from the
developer than it can get from the Commission in the event of deadlock. For the developer,
the incentive is the considerable possibility that there will be no facility at all unless the
community withdraws its objections.
What is negotiated? What the community has to offer is of course its acceptance of the
facility. What the developer has to offer is some package of mitigation (measures that make
an undesirable outcome less likely or less harmful), compensation (measures that
recompense the community for undesirable outcomes that cannot be prevented) and
incentives (measures that reward the community for accepting the facility). The terms are
value judgments. For example, a developer is likely to see as an incentive what the
community sees as mere compensation. The distinctions among the three nonetheless have
great psychological importance. Communities tend to see mitigation as their right.
Compensation for economic costs is seen as similarly appropriate, but compensation for
health risks strikes many people as unethical. Incentive offers, especially where health is
the principal issue, may strike the community as a bribe.
Of course some forms of mitigation, compensation, and incentives are built into the Siting
Act; among the most notable provisions are the five percent gross receipts tax29 and the
provision for strict liability30, which permits compensation for damage without proof of
negligence. Clearly a still more attractive package is needed to win community support.
What can help the parties in negotiating the package? I suggest training in negotiation for
community representatives. An impartial mediator might also be provided, perhaps from
the Center for Dispute Resolution of the Public Advocate's Office. Finally, a clear
statement from the Siting Commission on how it will deal with a settlement if one is
achieved would be useful.
Much will depend, of course, on the delicacy and skill of the developer. Compensation, in
particular, should be tied as closely as possible to the damage to be compensated. A
straight cash offer may be hotly rejected, whereas a trust fund to protect water quality would
be entirely acceptable. Similarly, cash for damage to health is much less acceptable than
cash for damage to community image. Where possible, compensation and incentive
proposals should come from the community or mediator to avoid any suggestion of bribery.
Some risks, of course, are so terrible that they are, and should be, unacceptable regardless
of the compensation. No negotiation is possible unless the community agrees that a
hazardous waste facility does not pose an unacceptable risk.
A great advantage of negotiation is that it encourages an openness about goals and concerns
that is inconceivable in an adjudicatory process. Citizens concerned about property values
may find themselves in a hearing talking instead about safety—but in a negotiation they will
talk about property values. Similarly, a developer in an adjudicatory proceeding tends to
understate risk. In a negotiation the community will insist that if the risk is so low the
developer should have no objection to bonding against it. Suddenly both the developer and
community will have an incentive to estimate the risk accurately. This pressure to be open
affects not only the compensation package but the actual facility design as well. If
developers must contract to compensate those they injure, they will be more likely to take
the possibility of injuries into account in their planning than if they are merely instructed to
"consider" social costs.
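The incentive logic of bonding can be made concrete with a little arithmetic. The sketch below compares the expected cost of a compensation bond under the developer's stated risk estimate and under the community's; all figures are invented for illustration.

```python
payout = 5_000_000        # hypothetical compensation if property values fall
p_developer = 0.01        # developer's stated probability of the harm
p_community = 0.10        # community's feared probability of the harm

# Expected cost of honouring the bond under each party's belief.
cost_developer_view = p_developer * payout
cost_community_view = p_community * payout

print(f"Developer's view: ${cost_developer_view:,.0f}")
print(f"Community's view: ${cost_community_view:,.0f}")
```

If the developer's estimate is honest, the bond is cheap to offer, and refusing it suggests the stated risk is understated; either way, both parties now gain from estimating the probability accurately.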
ESTABLISH AN OPEN INFORMATION POLICY, BUT ACCEPT COMMUNITY NEEDS FOR INDEPENDENT INFORMATION
Former EPA Administrator William D. Ruckelshaus was fond of quoting Thomas Jefferson:
"If we think [the people are] not enlightened enough to exercise their control with a
wholesome discretion, the remedy is not to take it from them, but to inform their
discretion." Ruckelshaus usually added, "Easy for him to say".
Part of the problem of informing the public about hazardous waste facility siting is that the
skills required to explain technical information to the lay public are uncommon skills. They
are especially uncommon, perhaps, among those who possess the requisite technical
knowledge. There are techniques to be learned: a standard called "communicative
accuracy" to help determine which details may be omitted and which may not; various sorts
of "fog indexes" to measure readability and comprehensibility; and other ways of
simplifying, clarifying and dramatizing without distorting. The range of media available for
the task also extends well beyond such standbys as pamphlets and formal reports.
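One of the readability measures mentioned above, the Gunning fog index, is straightforward to compute: 0.4 times the sum of the average sentence length and the percentage of "complex" (three-or-more-syllable) words. A minimal sketch, using a crude vowel-group syllable count as an assumed approximation:

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups; good enough for a fog estimate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    # Gunning fog index: 0.4 * (average sentence length in words
    # + percentage of "complex" words, i.e. words of 3+ syllables).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

sample = ("The facility accepts waste. "
          "Residents questioned the preliminary environmental documentation.")
print(round(gunning_fog(sample), 1))
```

The score roughly approximates the years of schooling a reader needs; draft public materials scoring far above 12 are candidates for simplification.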
The desire to explain technical issues in popular language is at least as difficult to acquire as
the ability to do so. Experts in all fields prefer to confine their expertise to fellow
professionals; "if laypeople misunderstand me I will have done them a disservice, and if
they understand me what will have become of my expertise?" All fields ostracize their
popularizers. When the information is uncertain, tainted with values, and potent
ammunition in a public controversy, the case for professional reticence becomes powerful
indeed.
Nonetheless, it is essential to the success of the siting effort that information policy be as
open as humanly possible. Unless legally proscribed, all information that is available to the
Commission should be available to the community. The Commission should also make
available simplified summaries of key documents and experts to answer whatever questions
may arise. It is particularly important that all risk information be available early in the
siting process. Failure to disclose a relevant fact can poison the entire process once the
information has wormed its way out—as it invariably does. The standard is quite simple:
any information that would be embarrassing if disclosed later should be disclosed now.
Even the most open information program, however, can expect only partial success.
Individuals who are uninvolved in the siting controversy will not often bother to master the
information, since there is nothing they plan to do with it. Individuals who are heavily
involved, on the other hand, generally know what side they are on, and read only for
ammunition. This is entirely rational. If changing one's mind is neither attractive nor likely,
why endure the anxiety of listening to discrepant information? When many alternatives are
under consideration, as in a negotiation, information has real value and helps the parties
map the road to a settlement. When the only options are victory and defeat, objective
information processing is rare.
Even in a negotiation, information carries only the limited credibility of the organization
that provides it. As a rule, the parties prefer to provide their own. The Siting Commission
would be wise to facilitate this preference. Rather than insisting that its information is
"objective" and berating the community for distrusting it, the Commission can guarantee
that all parties have the resources to generate their own information. The information
should be generated as early as possible, while positions are fluid. Finally, the Commission
should make sure the community has a real opportunity to use the information it acquires—
ideally in negotiation. Information without power leads only to frustration, while the power
to decide leads to information-seeking and a well-informed community.
CONSIDER DEVELOPING NEW COMMUNICATION METHODS
There are a wide variety of all-purpose methodologies for developing means to facilitate
interaction, communication, trust and agreement. Some are a bit trendy or "touchy–feely";
some are potentially explosive—all require careful assessment and, if appropriate at all,
careful design and implementation in the hands of a skilled practitioner. The list that
follows is by no means exhaustive. These are tools that are available to the Siting
Commission, to a developer, to a community group, or to anyone interested in making
negotiation more likely or more successful.
1. Delphi methodology. This is a formal technique for encouraging consensus through
successive rounds of position-taking. It is appropriate only where the grounds for
consensus are clear—for helping the community clarify its concerns, for example, but
not for helping it reach agreement with the developer.
2. Role-playing. Playing out the stereotyped roles of participants in a controversy can
help all sides achieve better understanding of the issues. Under some circumstances
this can greatly reduce the level of tension. There are many variations. Most useful for
facility siting would probably be exaggerated role-playing, in which participants
burlesque their own positions. This tends to produce more moderate posturing in real
interactions. Counter-attitudinal role-playing, in which participants take on each other's
roles, tends to yield increased appreciation of the multi-sidedness of the issue. Both
require some trust, but much can be learned even from role-playing without the
"enemy" present.
3. Gaming-simulation. This is a variation on role-playing, in which the participants
interact not just with each other but with a complex simulation of the situation they
confront. Game rules control how the participants may behave and determine the
results—wins, losses, or standoffs. Participants learn which behaviors are effective and
which are self-defeating. As with any role-playing, the participants may play
themselves or each other, and may undergo the game in homogeneous or heterogeneous
groups. Massachusetts Institute of Technology has recently developed a hazardous
waste facility siting gaming-simulation.
4. Coorientation. This is a tool to help participants come to grips with their
misunderstanding of each other's positions. A series of questions is presented to all
participants, individually or in groups. First they answer for themselves, then
participants predict the answers of the other participants (those representing conflicting
interests). Responses are then shared, so that each side learns: (a) its opponent's
position; (b) the accuracy of its perception of its opponent's position; and (c) the
accuracy of its opponent's perception of its position. The method assumes that
positions taken will be sincere, but not that they are binding commitments.
5. Efficacy-building. This is a collection of techniques designed to increase a group's
sense of its own power. In some cases this includes skills-training to increase the
power itself. In other cases, the stress is on increasing group morale, cohesiveness, and
self-esteem. To the extent that community intransigence may be due to low feelings of
efficacy, then efficacy-building procedures should lead to increased flexibility.
6. Focus groups. A focus group is a handful of individuals selected as typical of a
particular constituency. This focus group is then asked to participate in a guided
discussion of a predetermined set of topics. Often the focus group is asked to respond
to particular ideas or proposals, but always in interaction with each other, not in
isolation as individuals. The purpose of the focus group methodology is to learn more
about the values of the constituency and how it is likely to respond to certain
messages—for example, a particular compensation package in a siting negotiation.
Focus groups do not commit their constituency, of course, but in the hands of a skilled
interviewer and interpreter they yield far better information than survey questionnaires.
7. Fact-finding, mediation, and arbitration. These are all third-party interventions in
conflict situations. Fact-finding concentrates on helping the parties reach agreement on
any facts in contention. Mediation helps the parties find a compromise. Arbitration
finds a compromise for them. These approaches assume that the parties want to
compromise, that each prefers agreement to deadlock or litigation. They have been
used successfully in many environmental conflicts, including solid waste siting
controversies. The Center for Dispute Resolution of the Public Advocate's Office
offers these services, as do several specialized environmental mediation organizations.
8. Participatory planning. This is the label sometimes given to a collection of techniques
for making public participation more useful to the decision-maker and more satisfying
to the public. To a large extent the value of public participation is in the agency's
hands. It depends on how early in the process participation is scheduled, how flexible
agency planners are, and how much real power is given to the community. Even if
these questions are resolved in ways that make participation more than mere window-
dressing, the success of the enterprise still depends on technique: on how people are
invited, on how the policy questions are phrased, on what speakers are allowed to talk
about what issues and for how long, on who moderates the meeting, etc. Many techniques
of participatory planning, in fact, do not involve a meeting at all.
9. Feeling acceptance. A classic misunderstanding between communities and agencies
centers on their differing approaches to feeling; citizens may sometimes exaggerate
their emotions while bureaucrats tend to stifle theirs. Not surprisingly, "irrational" and
"uncaring" are the impressions that result. Feeling acceptance is a technique for
interacting with people who feel strongly about the topic at hand. It involves
identifying and acknowledging the feeling, then separating it from the issue that
aroused it, and only then addressing the issue itself.
10. School intervention. In situations where strong feelings seem to be interfering with
thoughtful consideration, it is sometimes useful to introduce the topic into the schools.
Primary school pupils, in particular, are likely to approach the issue less burdened by
emotion, yet they can be relied upon to carry what they are learning home to their
parents. It is essential, of course, to make sure any school intervention incorporates the
views—and the involvement—of all viewpoints in the community. Any effort to teach
children a single "objective" agency viewpoint will bring angry charges of
indoctrination. Existing curricula that are themselves multi-sided can augment the local
speakers.
11. Behavioral commitment. People do not evolve new attitudes overnight; rather, change
comes in incremental steps. The most important steps are not attitudes at all, but
behaviors, preferably performed publicly so as to constitute an informal commitment.
The behavioral commitment methodology, sometimes known as the "foot in the door",
asks people to take small actions that will symbolize, to themselves and their
associates, movement in the desired direction. Among the possible actions which can
be taken: to request a booklet with more information, to urge rational discussion on the
issue, to state that one is keeping an open mind, to agree to consider the final report
when it is complete, to agree to serve on an advisory committee, to meet with citizens
concerned about Superfund cleanup, etc.
12. Environmental advocacy. In a large proportion of successfully resolved siting
controversies in recent years, respected environmentalists played a crucial intermediary
role. Environmental organizations may need to play that role in New Jersey's
hazardous waste facility siting. By counseling caution on industry assurances while
agreeing that new facilities are needed and much improved, environmentalists position
themselves in the credible middle.
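Several of these methods have simple quantitative cores. The coorientation tool (item 4), for instance, reduces to comparing each side's own answers with the other side's predictions of them. A minimal sketch with invented Likert-scale responses (1 = strongly disagree, 5 = strongly agree):

```python
# Invented Likert responses (1 = strongly disagree ... 5 = strongly agree)
# to the same four questions about a proposed facility.
community_own     = [5, 4, 2, 5]   # community's actual positions
developer_own     = [2, 3, 2, 4]   # developer's actual positions
community_predict = [1, 2, 2, 1]   # community's guess at the developer's answers
developer_predict = [5, 5, 3, 5]   # developer's guess at the community's answers

def mean_abs_gap(a, b):
    # Average distance between two response sets (0 = identical views).
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

agreement          = mean_abs_gap(community_own, developer_own)
community_accuracy = mean_abs_gap(community_predict, developer_own)
developer_accuracy = mean_abs_gap(developer_predict, community_own)

print(f"actual gap {agreement}, community accuracy gap {community_accuracy}, "
      f"developer accuracy gap {developer_accuracy}")
```

A large gap between a party's predictions and its opponent's actual answers signals misunderstanding that discussion can correct, even where the actual disagreement is modest.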
A credible middle is badly needed on this issue, but it will take time. Now is not the time to
ask any New Jersey community to accept a hazardous waste facility. From "no" to "yes" is
far too great a jump. We should ask the community only to consider its options, to explore
the possibility of a compromise. Our goal should be moderate, fair, and achievable: getting
to maybe.
NOTES
1. N.J. Stat. Ann. 13:1E–49 to –91 (West Supp. 1985); see also Lanard, "The Major
Hazardous Waste Facilities Siting Act," 6 Seton Hall Legis. J. 367 (1983), and
Goldshore, "Hazardous Waste Facility Siting," 108 N.J.L.J. 453 (1981).
2. See N.J. Stat. Ann. 13:1E–59 (West Supp. 1985).
3. See Superfund Strategy (Apr. 1985) (Office of Technology Assessment).
4. Black's Law Dictionary (5th ed. 1979) defines "de facto" as a "phrase used to
characterize a state of affairs which must be accepted for all practical purposes but is
illegal or illegitimate."
5. N.J. Stat. Ann. 13:1E–81 (West Supp. 1985) ("Eminent domain").
6. D. Morell & C. Magorian (1982).
7. Carney, "D.E.P.: The Record and the Problems," N.Y. Times, Jan. 27, 1985, 11 at 6.
8. Black's Law Dictionary (5th ed. 1979) defines "de jure" as "descriptive of a condition
in which there has been total compliance with all requirements of the law." Here the
term refers to the actual legal authority of the state to site a facility over the objection
of a municipality, whether or not that approach will ever be taken.
9. Starr, "Social Benefit Versus Technological Risk," 165 Science 1232–38 (1969).
10. N. Freudenberg (1984).
11. N.J. Stat. Ann. 13:1E–59.d. (West Supp. 1985); see also N.J. Stat. Ann. 13:1E–60.c.(4)
(West Supp. 1985).
12. See Starr supra note 9.
13. Slovic, Fischhoff, Layman & Combs, "Judged Frequency of Lethal Events," 4 Journal
of Experimental Psychology: Human Learning and Memory 551–578 (1978).
14. D. Morell & C. Magorian, "Siting Hazardous Waste Facilities: Local Opposition and
the Myth of Preemption," at 74 (1982).
15. Id.
16. Goldensohn, "Opponents, Officials Charge Politicizing of Waste Site Debate," Star-
Ledger (Newark, NJ), Dec. 12, 1984, at 12.
17. M. Karlins & H. Abelson, Persuasion, at 62–67 (2d ed. 1970).
18. See Dodd, "The New Jersey Hazardous Waste Facilities Siting Process: Keeping the
Debate Open" in this issue.
19. See supra note 11.
20. See Response to Comments on "Draft" Hazardous Waste Facilities Plan Issued
September 1984 (Mar. 26, 1985) (copies available from the Siting Commission, CN–
406, Trenton, NJ 08625).
21. S. Arnstein, "A Ladder of Citizen Participation," in The Politics of Technology,
at 240–43 (1977).
22. Id.
23. Id.
24. Id.
25. See Duffy, 11 B.C. Env. Affairs L. Rev. 755, 755–804 (1984).
26. Id.
27. Id.
28. D. Morell & C. Magorian, supra note 14, at 102.
29. N.J. Stat. Ann. 13:1E–80.b. (West Supp. 1985).
30. N.J. Stat. Ann. 13:1E–62 (West Supp. 1985) ("Joint and several strict liability of
owners and operators").
Source: Seton Hall Legislative Journal, Spring 1986: 437–465,
http://www.psandman.com/articles/seton.htm
(accessed 4 September 2006).
SU G G E S T E D A N S W E R S
EXERCISE
2.1 Applying the systems approach to managing risk
Note: There is no such thing as a single complete answer for this exercise. Your responses
will depend on the assumptions you make about each situation.
1. Pipeline maintenance contractor
a) Reasons the organisation should adopt a three-dimensional systems approach to risk
management.
In most countries the company would need to comply with legislative and
regulatory requirements to protect the health and safety of employees, the public
and the biophysical environment.
If the company is operating in a common law country it will have an additional
common law duty of care obligation to employees and the public.
Given the nature of the work, a commitment to risk management may be imposed
by the principal or owner of the gas pipelines, in which case the contractor has to
follow it or forfeit the contract.
As a small business with limited resources it is critical that the contractor allocate
risk management resources in the most cost- and time-effective manner.
A systematic approach to risk management is likely to assist in minimising the
contractor's insurance costs.
b) (i) System definition and risk management objectives
The system consists of the high-pressure gas pipeline, valve stations, compressor
station, associated instrumentation and monitoring system.
The risk management objectives for the maintenance contracting company are to:
provide an efficient maintenance service to the pipeline owner and ensure the
continuation of the contract
provide a safe working environment for their employees
control the level of resources that are spent on risk management.
(ii) Hazards and potential loss events
Human error: failure to detect and report pipeline deterioration during inspections results in liability for loss of asset/gas supply interruption.
Inadvertent third party interference (by excavation etc.): failure to prevent third party interference results in liability for damage to the pipeline and gas supply interruption.
Flammability of gas under high pressure: fire/explosion resulting in employee/public injuries/fatalities and loss of assets.
Terrain: employee injury/fatality due to working in difficult conditions or undertaking pipeline surveillance from a helicopter/light plane.
Employee availability: inability to supply maintenance personnel on call in an emergency results in liability for delays in restoring supply.
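For record-keeping, a hazard table like this can be held as a small machine-readable register. The sketch below is our own illustration (the class and field names are not from the unit materials) of one way to do it in Python:

```python
from dataclasses import dataclass, field

@dataclass
class HazardEntry:
    """One row of a hazard / potential-loss-event table."""
    hazard: str
    loss_events: list = field(default_factory=list)

# Entries abridged from the pipeline contractor table above.
register = [
    HazardEntry("Human error",
                ["Undetected pipeline deterioration -> liability for asset loss"]),
    HazardEntry("Flammability of gas under high pressure",
                ["Fire/explosion -> injuries/fatalities and loss of assets"]),
]

for entry in register:
    for event in entry.loss_events:
        print(f"{entry.hazard}: {event}")
```

A register like this makes it easy to sort, filter and audit hazards as the risk management plan evolves.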
(iii) Information required to estimate the severity and likelihood for each of the
potential loss events
Cost of gas supply interruption per day to the owner which could be passed on
to the maintenance contractor.
Land uses along the pipeline corridor that could cause third party interference.
Likely extent of damage should third party interference occur.
Likely extent of damage should a fire or explosion occur.
Cost of workers compensation and rehabilitation for injured employees.
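Once figures like these are collected, severity and likelihood combine into an expected annual loss that can be compared across loss events. A minimal sketch, in which every number is invented purely for illustration:

```python
# Expected annual loss = likelihood (events/year) x severity ($ per event).
# All figures below are hypothetical, not from the exercise.
loss_events = {
    "Gas supply interruption": {"per_year": 0.10, "severity": 500_000},
    "Third party pipeline damage": {"per_year": 0.05, "severity": 2_000_000},
}

for name, e in loss_events.items():
    eal = e["per_year"] * e["severity"]
    print(f"{name}: expected annual loss = ${eal:,.0f}")
```

Ranking loss events by expected annual loss is one way to allocate a small contractor's limited risk management resources.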
2. Equipment fabricator
a) Reasons the organisation should adopt a three-dimensional systems approach to risk
management.
In most countries the company would need to comply with legislative and
regulatory requirements to protect the health and safety of employees and the
public.
If the company is operating in a common law country it will have an additional
common law duty of care obligation to employees and the public.
As a small business with limited resources it is critical that the company allocate
risk management resources in the most cost- and time-effective manner.
A systematic approach to risk management is likely to assist in minimising
insurance costs.
Prevention of loss events through risk management leads to increased profitability
by minimising asset loss and business interruption.
Prevention of loss events protects the company's reputation and will assist it in
gaining and keeping clients.
b) (i) System definition and risk management objectives
The system consists of equipment design, fabrication shop, materials store, testing
and inspection area, and product storage area.
The risk management objective is to ensure the delivery of quality products on time
and according to specifications.
(ii) Hazards and potential loss events
Human error: design error or incorrect selection of material leading to a product of the wrong specification.
Fumes, noise: employee injury.
Welding process: employee spark injuries; equipment failure due to incorrect welding technique; fire in the fabrication shop/warehouse resulting in loss of assets and employee injuries/fatalities.
Testing process: failure to perform testing to the required standard; damage to products during the testing process.
Materials availability: problems in the supply of materials for fabrication causing delays in production and delivery.
Employee availability: strikes/illness causing delays in production and delivery.
Transportation of equipment to clients: accident resulting in equipment/vehicle damage and/or employee/general public injury/fatality.
(iii) Information required to estimate the severity and likelihood for each of the
potential loss events
Historical data on the rate and cost of human errors.
Historical data on the lost time injury rate.
Cost of workers compensation and rehabilitation for injured employees.
Likely extent of damage should a fire occur.
Historical data on employee strike actions.
Data regarding the reliability of suppliers.
Current skill level of employees.
3. Chemicals warehousing and distribution facility
a) Reasons the organisation should adopt a three-dimensional systems approach to risk
management.
In most countries the company would need to comply with legislative and
regulatory requirements to protect the health and safety of employees, the public
and the environment.
If the company is operating in a common law country it will have an additional
common law duty of care obligation to employees and the public.
As a small business with limited resources it is critical that the company allocate
risk management resources in the most cost- and time-effective manner.
A systematic approach to risk management is likely to assist in minimising
insurance costs which are likely to be significant for chemicals storage.
Prevention of loss events is essential to protect the company's reputation and to
maintain and build its client base.
A major loss event for this type of company could easily result in bankruptcy.
b) (i) System definition and risk management objectives
The system includes the warehouse complex, the products stored, and the receipt
and dispatch area.
The risk management objectives are to:
operate the facility safely without a major incident
accommodate the client storage and dispatch requirements on an 'as needed'
basis.
(ii) Hazards and potential loss events
Flammable chemicals: fire/explosion where flammable chemicals are transported/stored, resulting in toxic fumes, employee/public injuries/fatalities, damage to the storage facility, asset loss/business interruption for clients, and damage to the biophysical environment through firewater runoff.
Toxic/corrosive chemicals: storage containers break/leak, causing injury to employees from exposure to chemicals, damage to the storage facility, and asset loss/business interruption for clients.
Human error: asset loss from storage of incompatible goods in the same storage location.
Transportation of chemicals for clients: accident resulting in fire, equipment/vehicle damage, employee/general public injury/fatality, and damage to the biophysical environment.
(iii) Information required to estimate the severity and likelihood for each of the
potential loss events
Historical data on the rate of major loss events for this facility and for other
similar facilities.
Cost of business interruption per day to the company.
Cost of business interruption per day to each client.
Likely extent of damage should a fire or explosion occur.
Likely extent of damage should a container break or leak.
Historical data on the lost time injury rate.
Cost of workers compensation and rehabilitation for injured employees.
4. Fire protection systems custom design and construction
a) Reasons the organisation should adopt a three-dimensional systems approach to risk
management.
In most countries the company would need to comply with legislative and
regulatory requirements to protect the health and safety of employees, the public
and the environment.
If the company is operating in a common law country it will have an additional
common law duty of care obligation to employees and the public.
As a small business with limited resources it is critical that the company allocate
risk management resources in the most cost- and time-effective manner.
A systematic approach to risk management is likely to assist in minimising
insurance costs.
Prevention of loss events is essential to protect the company's reputation and to
maintain and build its client base.
b) (i) System definition and risk management objectives
The system consists of critical evaluation of customer needs, design of the fire
protection system, procurement and installation, testing and commissioning,
handover.
The risk management objective is to provide a 'fit for purpose' fire protection
system design that is reliable and effective.
(ii) Hazards and potential loss events
Client consultation/custom specifications: incorrect understanding of customer needs resulting in an ineffective fire protection system design, modifications delaying system implementation, and liability for client losses sustained in the event of a fire.
Component quality and availability: problems in the supply of components that meet quality standards, causing delays in installation of the system.
System installation: incorrect installation of the fire protection system resulting in liability for client losses sustained in the event of a fire.
(iii) Information required to estimate the severity and likelihood for each of the
potential loss events
Data regarding the reliability of suppliers.
Cost to the company of modifying a system after installation.
Cost of business interruption per day to each client.
TOPIC 3
IDENTIFYING HAZARDS AND POTENTIAL LOSS EVENTS
Preview
Introduction
Objectives
Required reading
Coupling and interactions
Engineering system components
Linear interactions
Complex interactions
Hazard identification techniques
Past experience
Checklist reviews
Failure modes and effects analysis (FMEA) and failure modes, effects and criticality analysis (FMECA)
Hazard and operability study (HazOp)
Preliminary hazard or safety analysis
Scenario-based hazard identification
Summary
Exercises
References and further reading
Readings
Suggested answers
PREVIEW
INTRODUCTION
In the risk management framework described in Topic 2, the first two steps are:
1. Define system and risk management objectives.
2. Identify hazards and potential loss events.
Systematic identification of hazards and potential loss events is one of the crucial steps in
risk management. It can yield a wealth of information for the risk management team and
form the basis on which the risk management plan is developed.
In this topic we will explore how to define a system and its risk management requirements,
and how to select and apply appropriate techniques for identifying hazards and potential
loss events. The techniques we will examine can be applied across a range of industries,
once their philosophy is understood.
For the purposes of this topic, the meaning of the word 'hazard' has been stretched to its
limit to encompass anything that has the potential to cause some form of loss, regardless of
the specific nature of that loss. For example, in project risk management, anything that
might cause a project to fail to meet its performance objectives is a hazard because the
outcome is likely to be a financial loss or project delays. Note that textbooks on project risk
management may not necessarily use the term hazard in this way. Another term commonly
used is 'threat', which is broader and not specific to safety.
OBJECTIVES
After studying this topic you should be able to:
define an engineering system and its risk management objectives
understand both linear and complex interactions in engineering systems
outline the various structured techniques available for hazard identification
outline the advantages and limitations of each technique, and select and use the
appropriate technique for a given engineering context
identify contributors to hazards so that prevention and/or mitigation measures may be
developed for managing the risk.
REQUIRED READING
Reading 3.1 'Hazard identification checklists'
Reading 3.2 'Software FMEA Techniques'
Reading 3.3 'Hazard and operability (HAZOP) studies applied to computer-controlled
process plants'
Reading 3.4 'Using a modified Hazop/FMEA methodology for assessing system risk'
Reading 3.5 'Preliminary safety analysis'
COUPLING AND INTERACTIONS
A review of major accidents in engineering enterprises raises the following questions:
What kinds of systems are most prone to system accidents?
Why were these events not anticipated and identified?
Why is it that in those situations where the event was identified as a potential hazard,
though remote, no action was taken by management?

The answers lie in the fact that modern industrial systems are strongly coupled and have
significant interactions. Failure to identify these couplings and interactions often results in
the hazard escaping the scrutiny of analysts.
Before we discuss hazard identification techniques it is therefore necessary to gain an
understanding and appreciation of these couplings and interactions.

ENGINEERING SYSTEM COMPONENTS
In order to analyse interactions, it is useful to think of an engineering system as having six
subsystems—Design, Equipment, Procedures, Operators, Supplies and materials, and
Environment. This is sometimes referred to as the DEPOSE framework (Perrow, 1999: 77).
Design
The design of an engineering system includes the following:
philosophy of how a set of inputs (e.g. raw materials) can be transformed into a set of
outputs (e.g. goods or services)
the production capacity
codes and standards applicable to the design
the specification for various equipment items required, including constraints and
tolerances
quality assurance of the design process.
A design error, if not identified at this stage, can propagate through the other subsystems
and ultimately result in a major loss event.
Equipment
The plant and equipment required to produce the outputs must be:
fit for purpose
in conformance with design specifications
quality assured
inspected, tested and properly maintained.
Fitness for purpose is an important criterion. This is illustrated in the following example.
Example 3.1
In 1998, the fuel tanker ship Westralia of the Royal Australian Navy underwent
some modifications to the fuel system in the engine room. A flexible line was
installed. When put back into operation, the line failed, resulting in a major engine
room fire, killing four naval personnel. Subsequent public inquiry found that the
flexible line installation process was flawed as no stress analysis had been carried
out, and that the modified equipment was not fit for purpose.
Procedures
Once the equipment is installed, a set of procedures is required for operation and
maintenance of the equipment. These include:
operating procedures and work instructions
maintenance procedures including preventive maintenance schedules
manufacturer-recommended practices
emergency procedures in the event of an operational deviation.

The operating procedures not only ensure that production proceeds routinely, but also
establish that the system can be started up and shut down safely. Similarly, the maintenance
procedures are designed to ensure that, at the end of the maintenance and handover to
production, the equipment is fit for purpose.
Operators
Next in the chain of subsystems are the human resources required to operate the production
process and maintain the plant and equipment. It is critical that all personnel are:
qualified for the duties required of them
trained in the operating and maintenance procedures
trained to identify potential operational deviations, and respond correctly to alarms, etc.
involved in regular emergency drills and exercises to reinforce the response plan.

Human errors have contributed to many industrial accidents. There should be
reinforcement of the operating limits of the plant, i.e. a plant should not be operated outside
its design parameters.
Supplies and materials
Once the plant is built to a certain design, and the operators are trained, a supply of
materials is required to perform production. These include:
raw materials and storage
other accessories to production
material testing facilities (e.g. laboratory)
finished goods and storage
equipment spare parts
quality control of materials.

Many production problems may be attributed to changes in the material supplied for which
the plant was not designed.
Environment
The operating environment forms the final important subsystem. It includes both the
workplace environment and the external environment.
Workplace environment
The workplace environment is important in influencing the attitudes and aptitudes of
operators. The major parameters are:
workplace aesthetics and ergonomics—an unpleasant, uncomfortable or poorly
designed working environment can lead to lower productivity, lower levels of
employee commitment and increased workplace injuries and illnesses
management commitment—if the top management of an organisation does not
sincerely believe that safety and loss prevention are 'good business', the message is
unlikely to pass down to the workforce, despite the best efforts of middle management
quality systems and procedures—a well developed quality system with supporting
procedures and training improves the workplace environment and provides operational
efficiency
organisational culture and workplace climate—people will tend to respond to situations
in accordance with cultural and workplace norms, for example, Australians tend to be
individualistic and perceive a relatively flat power gradient between manager and
subordinate, so if they are given a directive they believe to be either impractical or
unsafe they will tend to assess the situation and do it their own way.
External environment
A number of elements in the external environment affect the overall operating environment
of an organisation. These include:
legislative and regulatory requirements—as we discussed in Topic 2, all industrialised
countries and most developing countries require organisations to protect the health and
safety of their employees, the public and the environment
changes in the marketplace—these may include new players entering the market and
new technology threatening loss of market share, however, as these are business risks
rather than engineering risks, they will not be discussed in detail in this topic
public perception and the political environment—as we mentioned in Topic 2, these
can significantly affect an organisation by preventing projects from proceeding or
leading to changes in legislative requirements which may increase operating costs.
LINEAR INTERACTIONS
All the subsystems in our DEPOSE framework interact with one another. Since one is
dependent on the other in a more or less linear chain—i.e. design leading to equipment
specification, development of procedures, selection and training of operators, ordering of
supplies, and operating in a given environment—Perrow (1999: 78) terms these 'linear
interactions' and defines them as follows:
'Linear interactions are those in expected and familiar production or maintenance sequence, and those that are quite visible even if unplanned'.
It is essential to note that the notion of a linear system in this context does not mean the
physical layout of the plant or production processes, nor does it mean an assembly line.
The main import of a linear system is that a subsystem tends to interact mainly with one
other subsystem in a visible manner.
Linear interactions predominate in all systems, and the first step in hazard identification for
engineering risk management is the recognition of all linear interactions, and the provision
of adequate decoupling to minimise these interactions.

Example 3.2
Let us consider a factory that manufactures detergents and operates continuously 24
hours a day. The factory has three major production units:
1. A manufacturing unit that produces the detergent base.
2. A processing unit that mixes the detergent base with additives to create liquid or
powder detergents.
3. A packaging, warehousing and dispatch unit.
The three units are linearly coupled because the output of one unit becomes the input
of the next. This means that if the manufacturing unit has to shut down production
due to operating or maintenance problems, the other two units will also have to shut
down as they will have no inputs to work with. This is a business interruption risk.
The interaction can, however, be decoupled by providing intermediate buffer storage
for the detergent base so that if Unit 1 is shut down for a period, there would be
sufficient buffer inventory of the product to feed Unit 2. This storage capacity could
also be used to keep Unit 1 operating in the event that Unit 2 was shut down for a
period and could not immediately use the detergent base. The decoupling of Units 1
and 2 via the intermediate buffer storage thus becomes critical in minimising
business interruption risk, and good risk management would consider possible shut
down reasons and durations and ensure that the buffer storage capacity is designed to
cope with this contingency.
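The buffer-sizing reasoning in Example 3.2 can be sketched as a short calculation. The consumption rate, outage duration and design margin below are invented for illustration, not figures from the example:

```python
# Size the intermediate buffer so Unit 2 keeps running while Unit 1 is down.
# All figures are hypothetical.
unit2_consumption = 10.0        # tonnes of detergent base per hour
longest_expected_outage = 36.0  # hours Unit 1 might plausibly be shut down
design_margin = 1.25            # allowance for restart delays and uncertainty

required_buffer = unit2_consumption * longest_expected_outage * design_margin
print(f"Required buffer capacity: {required_buffer:.0f} tonnes")
```

The key design input is the longest credible outage: good risk management surveys possible shutdown causes and durations before fixing the storage capacity.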
ACTIVITY 3.1
Look up the US Chemical Safety and Investigation Board website at
http://www.csb.gov and go to their Video Room. Download and view the video
titled 'Dangers of Flammable Gas Accumulation: Acetylene Explosion at ASCO,
Perth Amboy, New Jersey'. Consider this event in terms of the DEPOSE
components presented earlier. Is this event an example of linear interactions causing
an explosion?
COMPLEX INTERACTIONS
Whilst 99% of the interactions in most operations are linear, 1% are complex, and it is these
that pose the greatest risk. Many major industrial accidents have occurred, and many lives
have been lost, because the 1% of complex interactions escaped scrutiny.
Complex interactions are those in which one component can interact with one or more
components outside of the normal production sequence, sometimes by design but often
unintentionally. Perrow (1999: 78) defines these as follows:
'Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible.'
The main problems that can arise from complex interactions are common mode failures,
human error and hidden interactions.
Common mode failures
Common mode failures, or dependent failures, refer to the simultaneous failure of multiple
components or systems due to a single, normally external, cause. They can be distinguished
from discrete single mode failures of individual components or systems that are caused by a
defect arising locally within that component or system.
Recognition of common mode failure at the design and operational stages, and provision of
an inherently robust design backed up with error diagnostics and operator training, is
critical in managing engineering risks. However, the increasing complexity of modern
technology makes this recognition difficult unless significant effort is directed towards it.
Because of the importance of common mode failures, some examples are provided to
illustrate the concept.
Example 3.3
In the early days of motor vehicle design, there was a single master cylinder for
hydraulic brakes. A single failure in the hydraulic line from the cylinder would
disable all the brakes at the same time. This is a common mode failure.
To overcome this problem, Volvo designed a brake system with dual master
cylinders, but with each cylinder supplying fluid to one front brake and its diagonally
opposite rear brake. This way, if a failure occurs in one cylinder, one front
brake and one rear brake remain operational.
Example 3.4
There are two chemical reactors in a facility. In Reactor A, heat is created by the
reaction and has to be removed to maintain the reactor operation within a small
temperature range. A heat exchanger (cooling coil) is installed to remove the heat,
and at the same time raise steam. This is quite common in process plants.
However, if this steam is utilised somewhere else in the process, there is significant
energy saving, reducing production costs. In this facility the steam is used to drive a
steam turbine pump that pumps one of the raw materials to Reactor B, at some
distance away. The system is schematically shown in Figure 3.1.
Figure 3.1: Reactor heat removal system schematic
If the feedwater pump to the heat exchanger fails, this results in two problems at the
same time.
1. Heat is no longer removed from Reactor A, so if the reactor is not shut down
immediately, there could be a runaway reaction, resulting in an explosion.
2. There is no steam to drive the turbine pump, and one type of raw material is no
longer added to Reactor B, creating a separate set of problems.
The system design is energy efficient, but the coupling between units means the
interactions are now complex rather than linear and could cause common mode
failures.
[Figure 3.1 components: Reactor A and its raw material feed, cooling coil, feedwater tank, feedwater pump, steam separator, steam trap, steam turbine and pump, and Reactor B with its raw material feed and raw material tank.]
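The hidden coupling in Example 3.4 can be made visible by tracing failure propagation through a component dependency graph. The graph below is a hand-built sketch of the Figure 3.1 schematic; the component names and dependencies are our own reading of the figure:

```python
from collections import deque

# What each component directly disables when it fails (hypothetical model
# of the Figure 3.1 schematic).
disables = {
    "feedwater pump": ["cooling coil", "steam supply"],
    "cooling coil": ["Reactor A temperature control"],
    "steam supply": ["steam turbine pump"],
    "steam turbine pump": ["Reactor B raw material feed"],
}

def affected_by(failure):
    """Breadth-first walk: everything a single failure eventually disables."""
    seen, queue = set(), deque([failure])
    while queue:
        for downstream in disables.get(queue.popleft(), []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen

# A single feedwater pump failure reaches both reactors: a common mode.
print(sorted(affected_by("feedwater pump")))
```

Any single failure whose reachable set spans more than one production unit is a candidate common mode worth decoupling.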
Example 3.5
In a fire protection system design, the designer decided to install two firewater
pumps (redundancy), so that in the event of one pump failing, the second one could
operate and provide the necessary water for fire fighting. There are three choices as
to how to do this.
a) Provide two electric motor driven pumps. The common mode failure problem
in this design is that if there is a power failure, both pumps are disabled. Fire
service authorities have recognised this problem and generally do not approve a
two-electric pump installation.
b) Provide two diesel engine driven pumps. This makes the system independent of
power failures. However, a single diesel storage tank is provided from which
the engines draw fuel. The common mode failure in this design is that if the fuel
runs out, both pumps are disabled. Regular inspection checks and topping up of
the fuel tank are essential to maintain integrity.
c) Provide one electric pump and one diesel pump. This system decouples the
common mode and provides a higher reliability.
A common mode could still be the main water valve in the common manifold for the
pumps; if this valve fails to open, no water is delivered, even if the pumps operate.
An important observation may be made from these examples:
The more coupled a system is, the more the chance of a common mode failure. The
design should therefore cater for decoupling as much as possible and, if this is not
possible, provide fallback systems for failures.
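This observation can be put in numbers with the beta-factor model commonly used in reliability engineering, where a fraction of each unit's failure probability is attributed to a shared cause. The model choice and all probabilities below are illustrative assumptions, not figures from this unit:

```python
# Beta-factor model: a fraction beta of each pump's failure probability is a
# common cause that disables both pumps together (e.g. power loss, shared fuel).
# Probabilities are invented for illustration.
def system_failure_prob(p_pump, beta):
    p_common = beta * p_pump        # shared-cause contribution
    p_indep = (1 - beta) * p_pump   # independent-failure contribution
    # System fails if the common cause strikes, or both pumps fail independently.
    return p_common + (1 - p_common) * p_indep ** 2

for beta in (0.0, 0.1, 0.5):
    print(f"beta = {beta}: P(no firewater) = {system_failure_prob(0.05, beta):.4f}")
```

Even a modest common-cause fraction dominates the system failure probability, which is why decoupling (one electric pump, one diesel pump) buys more reliability than simply duplicating identical pumps.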
Human error
A complex system does not run by itself, it needs humans to operate it. Whilst equipment
failure rates have decreased from better engineering, some major catastrophes involving
modern technology have highlighted the importance of human error. For example, a major
contributor to the Chernobyl disaster was undue reliance on operating rules in the design,
and improper plant operation. Similarly, the equipment failure in the space shuttle
Challenger crash was augmented by complacency in management and pressure to meet
deadlines (Feynman, 1988).
Very often, post-disaster inquiries find human error was a major contributor, and the
organisation reacts with more procedures, more training and more discipline. However, the
coupling of human interactions with a sophisticated high technology production process is
highly non-linear, and human error is just one factor in a set of complex interactions.
When assessing human error rates there are a number of key references in the field of
human reliability assessment (HRA) including the seminal US nuclear reactor safety study
(United States Atomic Energy Commission, 1974), Lees (1996) and Kirwan (1994).
The figures in the following table show the failure rate of humans performing different tasks
recorded in the 1974 US nuclear reactor safety study.
Table 3.1: Human error rates
Type of activity                                           Probability of error per task
Critical routine task (tank isolation)                     0.001
Non-critical routine task (misreading temperature data)    0.003
Non-routine operations (start-up, maintenance)             0.01
Checklist inspection                                       0.1
Walk-around inspection                                     0.5
High-stress operations (responding after a major accident):
  first five minutes                                       1
  after five minutes                                       0.9
  after thirty minutes                                     0.1
  after several hours                                      0.01
Source: United States Atomic Energy Commission 1974, Table 1: Human Error Rates.
The 'critical routine task' described in Table 3.1 can be compared to driving through a red
traffic signal in a car. Most of us will have done this at least once in our lives.
It is something we have been trained not to do, but despite our best efforts we
occasionally get it wrong.
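Per-task error probabilities compound over repeated performances of the task. A quick sketch of the 'at least one error' probability, assuming independent attempts:

```python
# P(at least one error in n attempts) = 1 - (1 - p)^n, assuming attempts are
# independent. p = 0.001 is the critical routine task rate from Table 3.1.
def p_at_least_one(p_per_task, n):
    return 1 - (1 - p_per_task) ** n

for attempts in (30, 365, 3650):
    print(f"{attempts} attempts: P(at least one error) = "
          f"{p_at_least_one(0.001, attempts):.3f}")
```

Over thousands of repetitions even a 1-in-1000 task error rate makes an eventual slip almost certain, which is consistent with the red-light analogy above.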
Human error does not have to be confined to making a mistake. In some cases, inaction can
cause a problem. For example, in Example 3.5, if the feedwater pump fails the alarms from
both reactors will sound in the control room. This may confuse the operator since it is an
unexpected interaction between two normally independent subsystems, Reactors A and B.
If the operator is unable to establish the common mode link and take appropriate action
quickly, Reactor A will experience a runaway reaction and explode.
Recognising the human error failure modes in the operation of a complex system, and
countering them as much as possible by engineering design and better information
management, contributes greatly to minimising risks. Trying to eliminate human error
through training and procedures soon reaches the point of diminishing returns. Total
elimination of human error is impossible, and this should be recognised and acknowledged.
Hidden interactions
If a complex interaction can be identified it can be dealt with using design and procedures.
However, not all complex interactions are visible. Hidden interaction is an important
attribute of complex systems, and this has only been adequately recognised in the aftermath
of some terrible industrial disasters such as those discussed in the following examples.
Example 3.6: Flixborough
On June 1 1974, a major explosion occurred in a chemical plant at Flixborough,
England. Its aftermath had long-term consequences for the industry.
The plant produced caprolactam, a monomer used in the manufacture of nylon. The
process requires oxidation of cyclohexane. The reaction was carried out in a cascade of six
reactors in series, each successive reactor located at a slightly lower level than its
predecessor.
Reactor No. 5 had to be taken out of service for corrosion related repairs, and the
decision was made by the management to connect Reactor No. 4 to No. 6 and
continue production. No one 'appears to have appreciated that the connection of
No. 4 reactor to No. 6 reactor involved any major technical problems or was
anything other than a routine plumbing job' (United Kingdom Department of
Employment, 1975). Minimising production delay was important, and the temporary
modification was conducted 'as a rush job'. No drawing was made, nor any
calculation of strain on the pipework, and the designer's guide for such a bypass was
not consulted.
The plant operated with minor problems for about two months until June 1 1974,
when the temporary 500 mm diameter connection between Reactors 4 and 6 failed,
resulting in an explosive gas cloud of approximately 30 tonnes of cyclohexane being
released into the atmosphere. The gas cloud ignited and exploded. The force of the
blast was estimated to be that of 15 to 45 tonnes of TNT.
The blast killed 28 employees and injured another 36. Beyond the plant boundary,
53 people were injured according to official records, and many more suffered
unreported injuries. The plant was destroyed, and in the surrounding community at
least three houses were demolished and approximately 2 000 homes sustained some
form of damage, including some with broken windows as far as 2.5 km away.
A commission of inquiry identified the main factors that contributed to the event as
organisational ineptitude, shortage of engineering expertise, production pressures
dictating hasty decisions and failure to get expert advice (United Kingdom
Department of Employment, 1975). A number of recommendations arose from the
inquiry, and these were later reflected in the Control of Industrial Major Accident
Hazards (CIMAH) legislation in the UK, which was a precursor to the later Control
of Major Accident Hazards (COMAH) regulations.
Example 3.7: Piper Alpha
An explosion occurred on the Piper Alpha oil and gas platform in the North Sea in
1988. One of the platform's two large condensate pumps had been isolated for
maintenance, and its pressure relief valve had been removed. The on-duty engineer
had filled out a form stating that the pump was not ready and must not be
switched on under any circumstances; however, this form was subsequently lost.
During the evening, the second pump failed and could not be restarted.
Not realising that the pressure relief valve had been removed from the pump
isolated for maintenance, the evening shift personnel decided to restart it and
continue the operation. When the pump was started, gas leaked out, caught fire and
resulted in an explosion that destroyed the switch room.
Normally if a fire occurred, the platform's automatic fire-fighting system would
switch on and suck in large amounts of seawater to extinguish the flames. However,
on this occasion the system had been switched to manual because there were divers
in the water who could be sucked in with the seawater. The only way to manually
start the fire-fighting system was through the switch room, but the explosion in the
switch room made this impossible.
Staff gathered under the helicopter deck and in the living quarters because the fire
prevented them from getting to the lifeboat stations. The platform and living
quarters filled with smoke causing asphyxiation of personnel, but no evacuation
order was given.
After the first explosion, the Piper Alpha immediately stopped oil and gas
production to prevent new oil from feeding the fire. However, Piper Alpha was part
of a network of platforms and two other platforms continued to pump oil into the
network in accordance with management policies. A riser pipe connecting Piper
Alpha to one of the other platforms melted and tonnes of gas escaped. This caused a
much larger explosion that engulfed and destroyed the entire platform.
Of the 229 crewmen on board, 167 were killed. A whole community was shattered
and a nation and the entire oil and gas industry were shaken.
Numerous interactions and factors contributed to this event. These included:
shift handover communication problems
equipment that was not fit for purpose
inadequate training for senior personnel on emergency management
management policies that failed to appropriately balance safety and productivity
facility design problems, including unrecognised (and unnecessary) couplings
and insufficient redundancies in safety systems. (Paté-Cornell, 1993)
A C T I V I T Y 3 . 2
Return to the Video Room in the CSB website and look at the video titled 'Explosion
at BP Refinery, Texas City, Texas'. This shows an example of complex interactions
involving procedural failures, component failures and human error.
A C T I V I T Y 3 . 3
Consider a work process with which you are familiar that involves complex
interactions. Using either your own sketch of the process or any available schematic
diagrams, try to identify any potential common mode failures, human errors or
hidden interactions that could occur. How does your organisation try to identify and
manage such problems?
HA Z A R D I D E N T I F I C AT I O N T E C H N I Q U E S
Hazard identification is a requirement of OHS legislation in most western countries. In this
section we will discuss each of the hazard identification techniques mentioned in Topic 2.
Remember that no single technique is capable of identifying the hazards and potential loss
events for all situations, so in every instance a combination of two or more techniques
should be used.
PAST EXPERIENCE
Past experience can be useful for identifying hazards and potential loss events, but it has
significant limitations and cannot be used in isolation, even when the system's interactions
are linear rather than complex.
The limitations associated with relying on past experience include:
a) Not all previous incidents may have been reported, and for those that were, the level of
detail recorded will depend on the organisational culture and systems in place.
b) It is unlikely that all credible threat scenarios for a plant or organisation have occurred
in the past.
c) The causes of past loss events are often complex and may not have been fully
established, particularly if evidence was destroyed in the incidents. Thus, past
experience may yield a list of incidents but no information about the sequence of events
that led to each incident, which is needed to identify possible preventive measures.
d) Most organisations do not publish information on incidents or things that go wrong,
so there is limited information in the public domain. Generally, major incidents are
only fully analysed and publicly reported by those charged with responsibility for
investigation and enforcement. Useful information may be found in alerts and bulletins
issued by public authorities or in the transcripts of court cases.
CHECKLIST REVIEWS
A checklist is a list of questions about plant organisation, operation, maintenance, and other areas of concern. Historically, the main purpose for creating checklists has been to improve human reliability and performance during various stages of a project or to ensure compliance with various regulations and engineering standards. Each item can be physically examined or verified while the appropriate status is noted on the checklist. Checklists represent the simplest method used for hazard identification. (Hessian & Rubin, 1991)
Checklists are useful to ensure that various requirements have not been overlooked or
neglected both before and after activities such as concept design or construction are
complete. Such requirements may include those set out in engineering codes of practice
and statutory regulations.
There are ten steps involved in developing and carrying out checklist reviews.
1. Define the objectives of each checklist. What is its purpose, where will it be applied
and what is the expected outcome?
2. Identify the areas of content that each checklist must cover.
3. Identify any specialist areas of content where expert input may be needed. For
example, a design completion checklist might require expert input regarding
mechanical, electrical, civil, structural and process requirements.
4. Select and consult with expert personnel in each specialist area of content.
5. Develop a first draft of each checklist. Each checklist should begin with a statement of
objectives and contain a logical and systematic list of questions or requirements that is
divided into subsections as required. Tailor the level of detail in the checklist to the
complexity of the system: the test of whether to include an item is the extent to which
it contributes to achieving the checklist's objectives.
6. Organise for the draft checklists to be reviewed by people not involved in the drafting
process but who are familiar with the intended content. This will help to identify any
items that are missing, unclear, unnecessary or illogically ordered.
7. Revise the checklists to address issues raised by the reviewers.
8. Undertake a final 'walk through' of the checklists (i.e. physically check against each
checklist subject) to ascertain there are no gross omissions.
9. Finalise the checklists and put them into use.
10. Periodically review and revise the checklists as part of an ongoing cycle of continuous
improvement.
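The record-keeping side of a checklist review can be sketched in code. This is a minimal illustration only, not a tool described in this unit; the class and item names are our own, and the two checklist questions are hypothetical examples.

```python
# Minimal sketch of a checklist review record: each item is verified in
# the field and its status noted, then outstanding items are reported.
# All names and checklist content here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    question: str
    status: str = "not checked"   # e.g. "yes", "no", "n/a"
    note: str = ""

@dataclass
class Checklist:
    objective: str
    items: list = field(default_factory=list)

    def record(self, question, status, note=""):
        # Note the verified status against the matching question.
        for item in self.items:
            if item.question == question:
                item.status, item.note = status, note
                return
        raise KeyError(question)

    def open_items(self):
        # Items still unverified, or answered 'no' (a gap to follow up).
        return [i.question for i in self.items
                if i.status in ("not checked", "no")]

# Usage: a two-item design-completion checklist (hypothetical content).
cl = Checklist(
    objective="Verify flammable-liquid storage complies with the design code",
    items=[ChecklistItem("Separation distances meet code minimums?"),
           ChecklistItem("Bund capacity at least 110% of largest tank?")],
)
cl.record("Separation distances meet code minimums?", "yes")
print(cl.open_items())  # the bund question is still outstanding
```

The point of the sketch is step 8 of the process above: every item carries an explicit status, so a final walk-through can mechanically list anything not yet verified.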
Examples of checklists are given in Reading 3.1. Whilst the details of their content relate
to the chemical process industry, the concepts they illustrate are relevant across other
engineering industries.
Advantages of checklist reviews
Checklists are rule-based and can be implemented by people with minimal training
once they have been developed by knowledgeable and experienced personnel.
Checklists provide a valuable audit tool for checking design items, construction items,
project handover, etc.
Limitations of checklist reviews
Checklist items tend to depend largely on the existence of applicable codes and
standards and/or the knowledge and expertise of the preparer and the reviewers.
If checklists are prepared by inexperienced persons and/or are not independently
verified, any omitted items may go undetected.
Even where applicable codes and standards exist, these often cover 'minimum
requirements' and may be inadequate for the situation or activity. For example, the
separation distances specified in some codes for storage of flammable liquids are more
for protecting the facility from activities outside its site boundary than for protecting
the environment surrounding the facility from its hazardous activities.
Checklists focus on a single item at a time; they do not provide any insight into system
interactions or interdependencies.
Checklists merely provide the status of the item in question, but not the reasons for this
status. For example, if a checklist attribute is 'Compressor Running?' and the answer is
'No', this does not provide any insight into the reason for its failure.
Checklists do not rank the items in order of priority.
Checklists have to be very detailed and specific if they are to be used by 'non-experts'.
A C T I V I T Y 3 . 4
Using the methodology provided in this section, compile a checklist for identifying
hazards in a small section of your workplace. You may be able to find a checklist on
the internet which you can modify to suit your industry.
FAILURE MODES AND EFFECTS ANALYSIS (FMEA) AND FAILURE MODES, EFFECTS AND CRITICALITY ANALYSIS (FMECA)
The failure modes and effects analysis (FMEA) methodology is designed to identify
potential single failure modes that could cause an accident or loss event. The analysis
focuses on equipment failures and does not usually specifically consider human error,
except as a cause of an equipment failure mode. An extension of the FMEA methodology is
the failure modes, effects and criticality analysis (FMECA) in which the criticality of a
failure mode is assessed and used as a ranking tool.
A FMEA/FMECA is conducted by a small team of experienced people who are familiar
with the operation and plant equipment under investigation. The process is led by a team
leader and consists of the five key steps shown in Figure 3.2 and discussed in detail below.
The outcome is usually documented in the form of a datasheet such as the one shown in
Table 3.5 at the end of this discussion. Further examples of FMEA and FMECA datasheets
can be found at http://www.fmeainfocentre.com/examples.htm.
Step 1: Develop a block diagram system description
A block diagram or flow chart is used to identify and visually illustrate the system
components, limits and dependencies. The level of detail included in this diagram will
depend on the size and complexity of the system and the extent of analysis desired. As a
general rule it is not necessary to document the system sub-components (e.g. the individual
elements that make up a centrifugal pump) unless the sensitivity of the application means there
is a specific need for it (e.g. nuclear or aerospace industry).
Figure 3.2: Failure modes, effects and criticality analysis
Step 2: Identify potential failure modes
A failure mode is a way in which a piece of equipment or operation can fail. Typical failure
modes for system components are:
failure to open/close/start/stop or continue operation
spurious failure
degradation
erratic behaviour
scheduled service/replacement
external/internal leakage.
For example, failure modes for a belt conveyor system might include: belt snaps; roller
bearing fails; roller seizes; conveyor collapses.
Step 3: Identify potential causes of failure
There are many different causes of equipment failure, some of which relate to the materials
and mechanisms involved, and others of which relate to some form of human error. For
example, a centrifugal pump may stop working due to defective materials or the effects of
ageing (materials and mechanisms), but it may also stop due to poor maintenance or poor
workmanship (human error).
Step 4: Identify possible effects and criticality
The possible effects of the identified failure mode(s) for the specific piece of equipment
should be examined from multiple perspectives including safety to personnel, plant damage,
financial loss due to production interruption and environmental damage.
As part of this process, the probability of failure may be assessed based on typical values
derived from industry 'norms' such as those shown in Tables 3.2 and 3.3. The level of
criticality may also be determined based on the way the failure mode affects the system.
Table 3.4 shows an example of a criticality ranking system based on that used by the US
Department of Defense.
Table 3.2: Qualitative measures of frequency (components)
Rating of frequency | Failures/hour of operation
Probable | 1 in 10^4
Reasonably probable | 1 in 10^4 – 10^5
Remote | 1 in 10^5 – 10^7
Extremely remote | 1 in > 10^7
Table 3.3: Qualitative measures of frequency (human error)
Rating of frequency/performance | Situation | Probability of error/operation
Low | Routine | 0.0001 – 0.001
High | Emergency | 0.1 – 0.9
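The component bands in Table 3.2 can be applied mechanically once a failure rate estimate is available. The following is a minimal sketch under the assumption that the rate is expressed in failures per hour of operation; the function name is our own.

```python
# Sketch: classify a component failure frequency (failures per hour of
# operation) into the qualitative ratings of Table 3.2. Band edges
# follow the table; the function name is an illustrative assumption.

def frequency_rating(failures_per_hour):
    if failures_per_hour >= 1e-4:    # 1 in 10^4 hours or more often
        return "Probable"
    if failures_per_hour >= 1e-5:    # between 1 in 10^4 and 1 in 10^5
        return "Reasonably probable"
    if failures_per_hour >= 1e-7:    # between 1 in 10^5 and 1 in 10^7
        return "Remote"
    return "Extremely remote"        # rarer than 1 in 10^7

print(frequency_rating(2e-6))  # Remote
```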
Table 3.4: Qualitative measures of criticality Criticality classification Description of effects
Category 1: Catastrophic A failure which may cause death or [major property or system] loss.
Category 2: Critical A failure which may cause severe injury, major property damage, or major system damage that will result in major downtime or production loss.
Category 3: Marginal A failure which may cause minor injury, minor property damage, or minor system damage which will result in delay or loss of system availability or degradation.
Category 4: Minor A failure not serious enough to cause injury, property damage, or system damage, but which will result in unscheduled maintenance or repair.
Source: Based on United States Department of Defense, MIL-STD-1629A, 1980: 9–10.
The estimation of probability and criticality, which is covered in Topics 4 and 5, is not
essential to the process as the objective of the analysis is to scrutinise possible failure
modes and recommend actions to prevent them.
Step 5: Recommend possible actions
For each of the system components analysed a decision must be made as to the acceptability
of the potential failure modes and effects based on any existing controls in place. Existing
controls may include automatic system shutdown mechanisms or the ability of an operator
to respond in time. If the current situation is unacceptable then you will need to recommend
possible actions to reduce the probability of occurrence or severity of effects. Such actions
might include hardware changes or the introduction or modification of procedures.
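Steps 2 to 5 can be illustrated as a small FMECA datasheet in code. This is a sketch only: the conveyor component names, failure modes and actions are invented for illustration, while the criticality categories follow Table 3.4 (based on MIL-STD-1629A).

```python
# Minimal FMEA/FMECA sketch: build datasheet rows (Steps 2-5), then use
# criticality as a ranking tool. Row content is illustrative; categories
# follow Table 3.4.

CRITICALITY = {1: "Catastrophic", 2: "Critical", 3: "Marginal", 4: "Minor"}

def fmea_row(item, failure_mode, cause, effect, category, action=""):
    assert category in CRITICALITY
    return {
        "item": item,
        "failure mode": failure_mode,
        "cause": cause,
        "effect": effect,
        "category": category,
        "criticality": CRITICALITY[category],
        "recommended action": action,
    }

rows = [
    fmea_row("Roller", "bearing fails", "lost lubrication",
             "local heating; possible ignition of dust", 3,
             "add bearing to lubrication schedule"),
    fmea_row("Belt conveyor", "belt snaps", "fatigue at splice",
             "production stops; whipping belt near walkway", 2,
             "guard walkway; inspect splice monthly"),
]

# FMECA ranking: review the most critical categories first.
rows.sort(key=lambda r: r["category"])
print([(r["item"], r["criticality"]) for r in rows])
```

Sorting by category number is the simplest form of the criticality ranking mentioned above; a fuller FMECA would also weight each row by its estimated frequency (Tables 3.2 and 3.3).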
Table 3.5: Typical FMEA datasheet
Failure Modes & Effects Analysis
System Name: Precipitation Protector. FMEA Number: ___________
Major Function: Protect user from rainfall. Page: ___________
Prepared By: Precipitation Protection Team. Date: ___________

Item: Fabric
Potential failure mode: Tear in protective fabric
Potential cause: Foreign sharp object damages material
Possible effects: User gets wet; fabric flaps and contacts user
Detection method/design controls: Fabric must have high toughness and must withstand 5 N/mm^2 of pressure
Additional actions recommended: Use material at least as strong as current umbrellas
Potential cause: Excessive tension on fabric when in use
Additional actions recommended: Limit tension to 5 lbf

Potential failure mode: Fabric separates from arm
Potential cause: Stitching breaks
Possible effects: User gets wet; fabric flaps and contacts user
Detection method/design controls: Key life testing for operation (10 hr x 300 days x 8 yr = 24 000 hours)

Item: Arms
Potential failure mode: Arm of device breaks
Potential cause: User abuse during operation
Possible effects: User gets wet; fabric flaps and contacts user; arm swings and contacts user
Detection method/design controls: Key life testing for opening and closing (8 x 300 days x 8 yr = 19 200 cycles)
Additional actions recommended: Evaluate possibility of thicker arms, or high strength materials
Potential cause: High winds
Detection method/design controls: Must withstand steady 30 mph wind

Item: Folding mechanism
Potential failure mode: Folding mechanism jams
Potential cause: User improperly operates device
Possible effects: User can't fold or unfold device
Additional actions recommended: Clarify instructions; poka-yoke process for operation; control clearance between cap and arms
Potential cause: Improper assembly of arm pivots and chassis
Additional actions recommended: Revise assembly procedure
Potential cause: Tolerances of arm joints not correct
Additional actions recommended: Re-tolerance arm joints

Potential failure mode: Insert falls out
Potential cause: Press fit of insert fails
Possible effects: Device falls apart
Additional actions recommended: Re-tolerance insert-to-chassis joint
Advantages of FMEA/FMECA
FMEA/FMECA enables critical failures to be identified quickly and easily.
It is the most useful hazard identification technique for machinery and material
handling systems, for systems with predominantly linear or sequential interactions, and
for man/machine interactions.
FMEA/FMECA provides valuable information on the failure modes which can be used
in more sophisticated techniques such as fault tree analysis for quantification of system
failure frequency. This is described in Topic 5.
Limitations of FMEA/FMECA
It addresses only one component at a time, and may not reveal the complex and hidden
interactions in the subsystem and between subsystems in the system that lead to
accidents. In some cases, this coupling can be identified by asking: 'What is the effect
of failure on the system? What other system/component is affected?'
It does not provide sufficient detail for quantification of system consequences.
You should now read Reading 3.2 'Software FMEA Techniques' which examines the
application of FMEA to software.
HAZARD AND OPERABILITY STUDY (HAZOP)
The purpose of a Hazard and Operability Study (HazOp) is to systematically identify actual
or potential deficiencies in the design, layout or operating procedures of a proposed or
existing installation. A HazOp is generally undertaken before beginning construction or
major modifications, provided the relevant engineering diagrams are completed. This is
because the earlier a potential problem is found, the less expensive and easier it is to rectify,
and the more likely it is that the solution will be implemented.
The HazOp technique was originally pioneered in the chemical industry (Tweeddale, 1992)
and has since been adapted in a wide range of industries. It can be applied to almost any
operational situation, whether simple or complex. If the HazOp is being conducted on a
major or complex installation it may be necessary to sub-divide the study into sections.
The essential features of a HazOp study are as follows.
It is a systematic examination of the design and/or operation of the selected system.
It concentrates on exploring the causes and consequences of deviations from the usual
operating conditions.
A team who know most about the project or facility, typically those who designed and
operate it, participate in the process.
A series of guidewords are used repeatedly to ensure consistency and repeatability.
The success of the method depends heavily on the skills, experience and commitment of
those taking part. The team should comprise approximately ten people, including a team
leader who is responsible for facilitating the HazOp and a documenter responsible for
recording the process and outcomes. It is desirable to have at least one person with
expertise in each of the main technical disciplines relevant to the installation or component
that is being examined. The assembled team must have the authority to make on-the-spot
decisions when required.
Where a HazOp study identifies serious deficiencies, a detailed examination of the
likelihood and severity of potential loss events will need to be undertaken, along with a
cost-benefit analysis of any major design or procedural changes that are suggested.
However, it is important that the HazOp does not degenerate into a redesign session.
A HazOp study could form the basis of a submission to a statutory authority requesting
approval for a new installation or significant modifications to an existing installation. In
jurisdictions where Major Hazard Facilities regulations exist, HazOp studies are expected to
form part of the submission to gain a licence to operate a facility.
Methodology
The study begins with a discussion of the broad function of the relevant installation or
procedure. Each of its elements is then systematically examined using a checklist of
guidewords designed to focus attention on deviations from the normal operating conditions.
Guidewords are developed by combining a primary word that describes the process or
design intentions with a secondary word that suggests a possible deviation.
Some examples of primary guidewords are as follows: flow, movement, pressure,
temperature, heat transfer, position, level, amount, composition, concentration,
density, viscosity, quality, size, energy, timing, absorb, drain, purge, separate,
mix, filter, isolate, vent, load, reduce, react, maintain, monitor, test, inspect,
control, shut down, start up, signal, inert, trip, action, protection, containment.
Some examples of secondary deviation guidewords are as follows: no, none, loss,
more, less, inadequate, excessive, contaminated, part, multi-phase, high, low,
fast, slow, early, late, small, large, thick, thin, weak, strong, short, long,
wide, narrow, imbalance, uneven, misaligned, reverse, incorrect, poor, failure,
change, vibration, friction, slip, obstacles, vacuum, other.
Typical HazOp guidewords for fluid and non-fluid systems are shown in Tables 3.6 and 3.7.
Table 3.6: Sample HazOp guidewords for fluid systems
Primary guideword: Secondary deviation guidewords
Flow: High, low, no, reverse, uneven, loss, multi-phase
Level/Pressure/Temperature: High, low, no, loss, uneven
Amount: More, less, incorrect, excessive, inadequate, changes
Concentration: Incorrect, imbalance, thick, thin, weak, strong, changes
Reaction: Failure, no, late, slow, fast, incorrect, changed, multi-phase
Monitoring/Control: No, failure, inadequate, excessive, slow response
Maintenance/Testing: None, slow, inadequate, failure, incorrect, changes, late
Containment: Loss (fugitive emissions, minor leaks, major leaks, isolation)
Table 3.7: Sample HazOp guidewords for non-fluid systems
Primary guideword: Secondary deviation guidewords
Position: Too high, too low, too far, misaligned, incorrect
Movement: Fast, slow, none, reverse, vibration, friction, slip, obstacles
Load: High, low, loss of, uneven, imbalance
Energy (e.g. electrical, pneumatic, hydraulic, steam):
Low, high, failure, no
Timing: Late, early, short, long, incorrect sequence
Size: Too large, too small, too long, too short, too wide, too narrow
Quality: Contaminated (water, oil, dust), inadequate, poor, low, uneven
Monitoring/Control: No, failure, inadequate, excessive, slow response
Maintenance/Testing: None, slow, inadequate, failure, incorrect, changes, late.
Once the set of guidewords have been determined, each element of the design or procedure
is examined systematically by following the process shown in Figure 3.3.
Figure 3.3: HazOp study process
The flowchart proceeds as follows. Select an element to examine, then select a
deviation guideword (e.g. no pressure). Ask 'Can the deviation occur?'; if no,
record the outcome and move on to the next guideword or element. If yes, identify
and list all possible causes and consequences, then ask 'Are any of these
consequences of concern?'; if no, record and move on. If yes, list
existing/proposed safeguards to prevent the incident or reduce the consequences,
then ask 'Are these safeguards adequate?'; if yes, record and move on. If no,
identify actions to improve the system and/or safeguards and ask 'Is the cost of
the proposed actions justifiable?'; if yes, record the actions; if no, accept the
risk. In every case, record the outcome and move on to the next guideword or
element.
You will notice that this process involves answering four key questions:
1. Can the deviation occur?
For physical and other reasons, not all deviations are feasible. For example, in a line
where flow is from a high-pressure system to a low-pressure system, 'reverse flow' is
not possible. If the deviation cannot occur, proceed to the next guideword or element.
2. Are any of the consequences of concern?
Even if a deviation is possible, its consequences may not cause enough concern to
warrant any action. If this is the case, proceed to the next guideword or element.
However, if the consequences are of any level of concern, continue to the next step in
the process. If the team is unsure about the answer to this question, a detailed analysis
should be undertaken of the severity of the consequences if the deviation occurs.
3. Are the existing/proposed safeguards adequate?
Existing or proposed safeguards may include alarms, automated response systems or
manual detection by the operator. It is critical to consider whether these allow enough
time for corrective action before an incident escalates. Questions to ask include:
What if an automated response system fails? Is there sufficient time for an
operator to detect the error and make a manual correction?
Can an operator detect, understand and respond to a deviation quickly enough if he
or she has other responsibilities and may not be immediately available?
What if the operator responds incorrectly? Is there sufficient time to detect the
error and make a correction?
It is important not to over-estimate the reliability of automated response systems or the
quick diagnostic ability and response speed of operators.
4. Is the cost of the additional actions justified?
If the team is unsure about the answer to this question, a cost-benefit analysis should be
completed. If the cost of the additional actions is prohibitive and there are no
alternatives, you must accept the risk and move on to the next guideword or element.
When all elements have been completed, the design or procedure as a whole is examined
against a set of overview guidewords. Typical overview guidewords are given in Table 3.8.
Table 3.8: Overview guidewords for HazOp
Overall primary guideword: Overall secondary guidewords
Dangerous goods: Storage and handling (toxicity, handling procedures, precautions, exposure monitoring, escape routes)
Electrical systems: Hazardous area classification, isolation, earthing
Equipment integrity: Materials of construction (vessels, piping/valves/gaskets/pumps/seals, others), codes and standards
Breakdown: Utilities and services (instrument air, plant air, nitrogen, cooling water, process water, demin. water, steam, electricity, natural gas, aux. fuel), computer control, hydraulics
Commissioning: Sequence, procedures
Start up: First time, routine
Shut down: Planned, unplanned, emergency
Waste: Effluent (gaseous, liquid, solid), treatment, disposal
OH&S: Noise (sources, statutory limits, control measures), safety equipment (personal protection, breathing apparatus), access/egress, training, location of safety showers
Fire protection: Fire/explosion detection systems, separation distances, blast proofing, passive and active fire protection, access
Quality: Output and efficiency (reliability, conversion, product testing)
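The four-question decision flow can be sketched as a loop over guideword deviations. This is an illustration, not a HazOp tool: in a real study the yes/no answers come from the team's discussion, whereas here they are supplied as a hypothetical `judgement` dictionary, and all element and deviation names are invented.

```python
# Sketch of the HazOp decision flow as a loop over deviations.
# The team's yes/no judgements are assumed inputs (one dict per deviation).

def hazop_review(element, deviations, judgement):
    records = []
    for dev in deviations:
        j = judgement[dev]
        rec = {"element": element, "deviation": dev}
        if not j["can_occur"]:
            rec["outcome"] = "not feasible"
        elif not j["consequence_of_concern"]:
            rec["outcome"] = "no significant consequence"
        elif j["safeguards_adequate"]:
            rec["outcome"] = "existing safeguards adequate"
        elif j["cost_justified"]:
            rec["outcome"] = "action recommended"
        else:
            rec["outcome"] = "risk accepted"  # cost prohibitive, no alternative
        records.append(rec)   # every deviation is recorded, then we move on
    return records

# Hypothetical judgements for two deviations on one element.
judgement = {
    "no flow": {"can_occur": True, "consequence_of_concern": True,
                "safeguards_adequate": False, "cost_justified": True},
    "reverse flow": {"can_occur": False, "consequence_of_concern": False,
                     "safeguards_adequate": False, "cost_justified": False},
}
out = hazop_review("solvent transfer line", ["no flow", "reverse flow"], judgement)
print([r["outcome"] for r in out])  # ['action recommended', 'not feasible']
```

Note that even a 'not feasible' or 'risk accepted' outcome is recorded, matching the documentation requirement that decisions to do nothing further are written down.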
HazOp study documentation
For each of the subsystems considered in a HazOp study, a datasheet is usually completed
consisting of the following elements.
A header showing the name of the subsystem and system, relevant drawings, study
team, date and location of study.
Primary and secondary guidewords used in the review. Sometimes these are combined
in a single column (e.g. reverse flow).
Possible causes that could give rise to the deviation in question. It is essential to list
both equipment failures and secondary causes from linear or complex interactions.
Possible consequences caused by the deviation. Immediate consequences as well as
escalation potential in other areas through complex interactions are listed here.
Existing/proposed safeguards to either prevent the deviation occurring or enable its
detection and reduce its consequences. If none exist, this should also be recorded.
Any additional agreed actions. If a decision is made to accept a risk and do nothing
further, this should also be recorded.
The person or department responsible for implementation of any agreed actions.
Example 3.8: HazOp study
A company plans to manufacture electrical components for industrial applications.
To ensure product quality, the components must be free of oil and grease. This will
be achieved by cleaning the components in a tank containing trichloroethylene
solvent. The solvent is required to be maintained at 70°C for effective degreasing.
Figure 3.4 shows a schematic diagram of the degreasing system.
Figure 3.4: Schematic diagram for degreasing system
(The schematic shows a solvent tank with an immersed heating element and power
supply, a temperature element (TE) and temperature indicator (TI) with high (H)
and low (L) settings, a vent, a pump, a batch cleaning tank, and a solvent
recovery still.)
The solvent tank will be maintained at between 65°C and 75°C by electrical heating
coils immersed in the solvent. A temperature element (TE) and a temperature
indicator (TI) will be installed. The TI has high and low settings to control the
temperature. When the temperature reaches a high of 75°C, a relay will open the
circuit breaker to cut off the power supply to the heating coils. When it reaches a
low of 65°C, the relay will close the circuit breaker to begin heating again.
Once the solvent is at the required temperature, it will be pumped to a cleaning tank
(batch process), where the electrical components are immersed for a specified
duration. The 'dirty solvent' will then be pumped to a solvent recovery still and
recycled back to the solvent tank. The solvent recovery still will be periodically
cleaned and the residue/sludge removed.
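The relay behaviour described above is a simple on/off control with a deadband (hysteresis), and can be sketched as follows. The setpoints come from the example; the function name and the temperature trace are illustrative assumptions.

```python
# Sketch of the solvent tank temperature control: the relay opens the
# circuit breaker at 75 °C and closes it again at 65 °C. Between the
# setpoints the relay holds its previous state (hysteresis).

HIGH, LOW = 75.0, 65.0   # setpoints from the example

def relay_state(temperature, heating_on):
    """Return whether the heating circuit should be closed (on)."""
    if temperature >= HIGH:
        return False      # high setting reached: open circuit breaker
    if temperature <= LOW:
        return True       # low setting reached: close circuit breaker
    return heating_on     # between setpoints: hold previous state

# Trace a hypothetical heat-up/cool-down cycle.
state = True
trace = []
for t in [66, 70, 75, 74, 68, 65, 70]:
    state = relay_state(t, state)
    trace.append(state)
print(trace)  # [True, True, False, False, False, True, True]
```

The deadband is what prevents the relay chattering on and off around a single setpoint; it is also why the HazOp below treats a stuck TE/TI as a credible cause of both high and low temperature deviations.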
A HazOp study datasheet for this system is shown in Table 3.9.
Table 3.9: HazOp study datasheet for degreasing system
A final report is then prepared containing the following information.
Study purpose and scope
Team members
Installation elements/procedures addressed by the study
Study procedure adopted including documentation examined and guidewords used
Completed HazOp study datasheets
Summary of outcomes and recommendations including a list of any unresolved issues.
From the above lists you can see that a lot more information is required for the HazOp study
than for the FMEA study because the HazOp study tries to unravel the full effects of an
unplanned deviation on couplings and interactions.
Study title: HazOp of degreaser system
Unit: Degreasing tank
Line/equipment description: Solvent line from tank to cleaning tank
Drawing no.: 3.4    Page: 1 of 1    Date: 8 December 2006
Location: Brisbane plant    Issue: A

Guideword: High flow
Possible causes: Pump racing.
Possible consequences: Cleaning tank filled too quickly. Overflow potential.
Proposed safeguards: Operator to be present during filling of cleaning tank. Remote switch to turn off pump to be provided at cleaning tank.
Responsibility: Engineering, Production

Guideword: Low flow
Possible causes: Pump cavitating, pump stopped.
Possible consequences: Delays in filling cleaning tank. Not serious.
Proposed safeguards: Operator investigates when cleaning tank is not being filled at expected rate.
Responsibility: Production

Guideword: Low level
Possible causes: Drain valve in tank leaks.
Possible consequences: Loss of product, but contained within bund. Environment problems.
Proposed safeguards: Provide measures for recovering product from the bund, e.g. air driven pump. Personal protection equipment must be worn.
Responsibility: Production, Engineering

Guideword: High temperature
Possible causes: TE reading low, and heating continues. Circuit breaker fails to open on high temperature.
Possible consequences: Solvent boils and vapour releases through tank vent. If ignited, a tank fire is possible. Toxic vapour to atmosphere. Toxic combustion product in a fire.
Proposed safeguards: Provide an independent TE and high temperature alarm, to cut off power supply to heater. Develop emergency response plan for a potential vapour release event.
Responsibility: Production, Engineering

Guideword: Low temperature
Possible causes: TE reading high, no heating. Circuit breaker fails in open position.
Possible consequences: Degreasing not effective in cleaning tank. Product quality problems.
Proposed safeguards: The independent TE to alarm if the temperature drops below 65°C.
Responsibility: Engineering

Guideword: High pressure
Possible causes: Tank content boils and vent is restricted.
Possible consequences: Potential for tank failure and loss of contents. Serious safety/environmental issue.
Proposed safeguards: Ensure that the vent sizing is adequate. Clear any buildup in the vent line at regular intervals.
Responsibility: Maintenance, Engineering

Guideword: Low pressure
Possible causes: Vent is blocked. Vacuum in tank when product is withdrawn.
Possible consequences: Tank 'sucked in'. Major structural failure.
Proposed safeguards: As for high pressure.
Responsibility: Maintenance

Guideword: Testing—trips and alarms
Proposed safeguards: The independent high and low temperature alarms, and high temperature cutout should be tested at least at quarterly intervals.
Responsibility: Maintenance
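The columns of a HazOp datasheet such as Table 3.9 can be held in a simple record structure. The sketch below is illustrative only (the field and function names are ours), populated from the high-temperature deviation:

```python
from dataclasses import dataclass

@dataclass
class HazopEntry:
    """One row of a HazOp datasheet: a guideword (deviation) plus the
    findings recorded against it."""
    guideword: str
    causes: list
    consequences: list
    safeguards: list
    responsibility: str

entry = HazopEntry(
    guideword="High temperature",
    causes=["TE reading low, heating continues",
            "Circuit breaker fails to open on high temperature"],
    consequences=["Solvent boils, vapour releases through tank vent",
                  "Tank fire possible if ignited"],
    safeguards=["Independent TE and high-temperature alarm to cut heater power",
                "Emergency response plan for a vapour release"],
    responsibility="Engineering",
)

def unresolved(entries):
    """Deviations with no existing or proposed safeguard: these must
    still be recorded, and flagged as unresolved issues in the report."""
    return [e.guideword for e in entries if not e.safeguards]
```

A final report then summarises such records, including the output of `unresolved()` as the list of open issues.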
Advantages of HazOp
The multidisciplinary approach helps identify a whole range of issues (safety,
operations, maintenance, design, construction etc.).
It is a powerful medium for communicating the designer's intent to the operations
personnel.
It identifies both linear and complex interactions between various subsystems in the
system, and between systems.
It highlights hazardous events that could occur from a combination of causes, both
visible and hidden, and provides input for detailed hazard analysis.
For new projects and extensions to existing operations, the review is conducted on
paper before the design is complete and offers the flexibility to identify operability
issues and make the necessary design changes before commissioning, thus avoiding
costly shutdowns and modifications at a later stage.
When conducted on an existing operation following an incident, it reveals not only the
appropriate action to be taken to prevent a recurrence, but also a whole range of other
actions to prevent potential incidents that may not yet have occurred.
Limitations of HazOp
It is a highly time-consuming exercise and requires the participation of a number of key
personnel for significant periods (depending on the project size).
If it is conducted on an existing plant, there is a limit to which hardware changes can be
implemented due to design and installation constraints.
The effectiveness of the HazOp is very dependent on the composition and experience
of the participating team members and the experience of the team leader; if the team is
inexperienced, it is possible to miss identifying some of the hazards.
Like all schematic analyses, it may not detect zonal or geographic interactions.
You should now read Reading 3.3 'Hazard and operability (HAZOP) studies applied to
computer-controlled process plants'. Then read Reading 3.4 'Using a modified
Hazop/FMEA methodology for assessing system risk', which demonstrates how the two key
techniques we have just studied can be combined.
PRELIMINARY HAZARD OR SAFETY ANALYSIS
A preliminary hazard or safety analysis is conducted during the early stages of a project
before the design is complete. The aim is to identify all the hazardous characteristics of the
plant, process or project prior to final design or specification stage so that they can be more
easily designed out or reduced.
A number of different methods can be used to carry out a preliminary hazard or safety
analysis. These include:
concept safety review
concept hazard analysis
critical examination of system safety
preliminary consequence analysis
preliminary hazard analysis
functional concept hazard analysis
threat and vulnerability analysis.
You should now read Reading 3.5 'Preliminary safety analysis' for an overview of the first
five of these methods. We will then discuss the final two methods separately below.
Functional concept hazard analysis
Rasmussen and Whetton (1993) developed a variation on the concept hazard analysis
method that can be used to identify adverse variances in outcome in any operation.

In this method, a plant, process or project is divided into functional subsystems, each
of which comprises the three elements shown in Figure 3.5:
An intent, which describes the functional goal of the specific plant activities in question
Methods, which describe the items (personnel, procedures, hardware, software, codes,
etc.) that are used to carry out the intent or operations
Constraints, which describe the items (physical laws, organisational context, control
systems, contractual requirements, regulatory requirements, production requirements,
etc.) that exist to supervise or restrict the intent.

Figure 3.5: Functional concept hazard analysis model
For example, a subsystem of a construction project might be:
Construct a bridge [intent] using prestressed concrete [method] as set out in a specified
building code [method] without accident or incident [safety constraint] and within a
given timeframe [time constraint] and budget [cost constraint].

Alternatively, a subsystem of a plant might be:

Run a production unit [intent] using specified staff, equipment, materials and
procedures [methods] without interruptions between scheduled shutdowns [production
constraints].

Each method and constraint may itself be treated as a separate subsystem or a component of
a subsystem with its own intent, methods and constraints.
To carry out a functional concept hazard analysis, complete the following steps.
1. Define the overall intent of the system.
2. Subdivide the system into subsystems (and components if necessary).
3. For each subsystem, identify the intent, methods and constraints.
4. Decide on a set of keywords. These are similar to the primary guidewords used in a
HazOp study and are best generated from the intent, methods and constraints of the
specific system/subsystems. Examples are shown in Table 2 of Reading 3.3 and in
Table 3.10 on the next page, and also in our previous discussion of primary guidewords
for HazOp studies.
5. For each method and constraint associated with a given intent, systematically apply the
keywords to identify:
possible deviations (dangerous disturbances or undesired events)
possible consequences of the deviation (including complex interactions)
suggested safeguards/prevention measures required
actions and comments.
6. Summarise the findings and prioritise key areas for further in-depth study (e.g.
HazOp, FMEA).
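Steps 3 to 5 amount to systematically crossing every method and constraint with every keyword. A minimal sketch of that bookkeeping, using the bridge subsystem above (the keyword set and function name are illustrative, not prescribed by the method):

```python
from itertools import product

# Step 3: intent, methods and constraints for one subsystem (the bridge example)
subsystem = {
    "intent": "Construct a bridge",
    "methods": ["prestressed concrete", "specified building code"],
    "constraints": ["no accident or incident", "given timeframe", "budget"],
}

# Step 4: keywords, similar to HazOp primary guidewords, generated
# from the intent, methods and constraints of the specific system
keywords = ["not done", "more of", "less of", "wrong"]

# Step 5: systematically apply each keyword to each method and constraint,
# producing the deviation prompts the team then assesses for consequences,
# safeguards and actions
def deviation_prompts(subsystem, keywords):
    items = subsystem["methods"] + subsystem["constraints"]
    return [f"{kw}: {item}" for item, kw in product(items, keywords)]

prompts = deviation_prompts(subsystem, keywords)
# 5 items x 4 keywords = 20 prompts for the team to work through
```

The value of the method lies in this exhaustiveness: no method or constraint escapes examination under any keyword.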
Table 3.10: Additional generic keywords for concept hazard analysis

Type of risk: Staff
Keywords: Workplace environment; Safe work practices; Safety management system (SMS); Organisational culture; Human error management; Training; Emergency preparedness

Type of risk: Environmental
Keywords: Atmospheric discharges; Liquid waste; Solid waste; Pollution; Contamination; Surface water quality; Groundwater quality

Type of risk: Liability
Keywords: Breach of contract; Regulatory requirements; Employer 'duty of care' issues; Negligence

Type of risk: Software
Keywords: Software quality; Fit between system and tasks; Software error; Software failure; Error diagnostic tools; Hardware compatibility; Compatibility with socio-technical changes (structure, task, technology, users); Application scope; Backup system; System performance; Real time performance; Maintainability; Extendability; User interface; Internal support; External support
Advantages of functional concept hazard analysis
It provides a good basis for a more detailed study.
It identifies hazards prior to final design or specification stage enabling them to be
more easily designed out or reduced.
The multidisciplinary approach helps identify a whole range of issues (e.g. safety,
operations, maintenance, design, construction).
It identifies both linear and complex interactions between various subsystems in the
system, and between systems.
It tests underlying design assumptions particularly within the commercial framework.
Limitations of functional concept hazard analysis
It concentrates only on major hazards.
It may not detect zonal or geographic interactions.
It is possible to miss identification of some hazards if the study is conducted by an
inexperienced team.
ACTIVITY 3.5
a) Select a specific operation from your work environment. The operation should
have a man/machine interface and require a sequence of manual operations to be
performed. Both the sequence and correctness of operations are important for
the safe and successful completion of the operation.
Using the functional concept hazard analysis technique, analyse the sequence of
operations by identifying the intent, methods, constraints and potential
deviations.
Some examples of operations that could be analysed include:
Transfer of a shipping container containing hazardous substances from the
ship to the wharf using the container terminal crane.
Filling an above ground LPG storage tank from a bulk road tanker in an
automotive retail outlet.
or
b) Select a project with which you are or have been associated and use functional
concept hazard analysis technique to identify the risks involved in the project.
Some examples of projects might be:
A component of a construction contract, either local or offshore. (If it is a
joint venture, identify the risks for one party only.)
Upgrading an inventory management software system for a small
supermarket chain wishing to expand its operations.
Vulnerability analysis
A vulnerability analysis is a top down method that involves identifying the assets or critical
success factors for a plant or project and matching these against credible threats to identify
critical vulnerabilities. Originally developed by military intelligence organisations, it has
many variations and is often used as a preliminary hazard or safety analysis because it
provides a completeness check to ensure that no significant vulnerabilities have been
overlooked in the initial stages of design or functional specification.
A vulnerability is the weakness of an asset with respect to a threat. It may be intrinsic to the
asset, for example train seats are more vulnerable to vandalism than train wheels, or it may
be due to location, for example facilities in northern Australia are vulnerable to damage by
tropical cyclones. Vulnerabilities are deemed critical if they can halt the business or cause
damage to a significant part of its operations. A tropical cyclone in Tasmania is not a
credible threat and so a credible vulnerability cannot arise from this threat in this region.
Figure 3.6 shows a simple diagram of the vulnerability analysis process.
There are four steps involved:
1. Identify all of the plant or project's assets or critical success factors. Examples include
staff, physical assets, reputation, business continuity and customer loyalty.
2. Identify all credible threats to the plant or project. Examples include smoke, fire,
explosion, natural hazards such as rain, snow, wind, earthquake, staff injury or illness,
critical plant failure, failure of a major supplier, sabotage and acts of aggression.
3. Systematically assess the extent to which each asset or critical success factor is
vulnerable to each threat. This is often done using a matrix or table such as that shown
in Figure 3.7.
4. Develop risk management strategies for all critical vulnerabilities.
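Steps 3 and 4 can be sketched as a scored asset/threat matrix from which the critical cells drop out. The scores below follow the sample matrix in Figure 3.7 (3 = critical, 2 = moderate, 1 = minor, 0 = none); the code itself is only an illustration:

```python
threats = ["Technical failure", "Community issues",
           "Political (change of government)", "Credit squeeze", "Flood"]

# Step 3: each asset scored against each threat (rows from Figure 3.7)
matrix = {
    "Reputation":  [2, 0, 1, 3, 1],
    "Operability": [2, 0, 1, 3, 3],
    "Staff":       [2, 2, 1, 2, 2],
}

def critical_vulnerabilities(matrix, threats, threshold=3):
    """Step 4 input: the (asset, threat) pairs scored critical, i.e. those
    that could halt the business or damage a significant part of it."""
    return [(asset, threats[i])
            for asset, scores in matrix.items()
            for i, s in enumerate(scores) if s >= threshold]

crit = critical_vulnerabilities(matrix, threats)
# only 3 of the 15 intersections are critical; these are the pairs
# that need risk management strategies
```

Consistent with the observation that critical vulnerabilities are typically a small fraction of the asset/threat intersections, only 3 of 15 cells survive the filter here.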
Figure 3.6: Vulnerability analysis process
Figure 3.7: Sample vulnerability analysis matrix
Assets \ Threats   Technical failure   Community issues   Political (change of government)   Credit squeeze   Flood
Reputation         xx                  -                  x                                  xxx              x
Operability        xx                  -                  x                                  xxx              xxx
Staff              xx                  xx                 x                                  xx               xx

Scoring system:
xxx  Critical potential vulnerability that must be (seen to be) addressed
xx   Moderate potential vulnerability
x    Minor potential vulnerability
-    No detectable vulnerability
va   Possible value adding
The power of the process rests on the fact that whilst there may be a large number of
identified assets or critical success factors to be protected against a large number of threats,
the actual number of critical vulnerabilities is usually quite small, typically about 10% of
the intersections of an asset/threat matrix. The process therefore prevents the
misapplication of resources to things that are really only threats and not vulnerabilities.
Advantages of vulnerability analysis
It is one of the few techniques that attempts to provide a 'completeness' check. If all
assets or critical success factors are defined and all threats are defined then all
vulnerabilities can be identified and analysed.
The multidisciplinary approach helps identify a whole range of issues.
It is a powerful medium to ensure contextual awareness of designers.
If done on a zonal basis for a plant design it is very good at identifying propagation
potentials.
[Figure 3.6 shows the flow: Assets/Critical Success Factors and Credible Threats feed into Critical Vulnerabilities, which lead to Risk Management Strategies and Residual Vulnerabilities]
Limitations of vulnerability analysis
If an asset that requires protection is not identified then unwanted surprises may occur.
If too many overlapping assets are identified then it becomes unwieldy.
As a top down technique, it can become sidetracked by small issues if too few
high-level analysts are present.
The vulnerability technique is very useful for project risk management at the concept stage.
However, care must be taken to differentiate between assessing overall project risk as
opposed to assessing the risk of several project options. An overall project risk assessment
is concerned with minimising impacts during the life of the project so that it is completed on
time and on budget. However, during the concept stage it may also be appropriate to assess
the risks associated with several different design options or possible locations, as we
discussed in Topic 2 with regard to the elimination of a level crossing. These are two
distinctly different risk assessments.
SCENARIO-BASED HAZARD IDENTIFICATION
Application of many of the hazard identification techniques described in this topic results in
a tabulation of deviation/causes/consequences that can be used to construct risk scenarios.
Scenario creation is important because most of the techniques we have discussed are
bottom-up, that is they examine individual components or process deviations. Scenario
creation requires postulating multiple failures or deviations concurrently or sequentially.
An example would be what happens if two seemingly independent systems fail at the same
time—such as compressed air supply and cooling water. Is there a hidden common failure
mode? Whilst failures of each may be manageable if they occur at different times, can
failure of one mask failure of the other and can a dual failure have serious consequences?
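The compressed air/cooling water question can be made systematic by listing each system's support dependencies and checking every pair of 'independent' systems for a shared one, i.e. a hidden common failure mode. A hedged sketch (the systems and dependencies below are invented for illustration):

```python
from itertools import combinations

# Illustrative support dependencies -- not taken from the text
dependencies = {
    "compressed air": {"electrical supply", "instrument air dryer"},
    "cooling water":  {"electrical supply", "raw water intake"},
    "fire pump":      {"diesel fuel"},
}

def common_mode_pairs(dependencies):
    """Pairs of seemingly independent systems that share a support system,
    so a single upstream failure can take out both at once."""
    return [(a, b, dependencies[a] & dependencies[b])
            for a, b in combinations(dependencies, 2)
            if dependencies[a] & dependencies[b]]

pairs = common_mode_pairs(dependencies)
# here compressed air and cooling water share the electrical supply:
# one power loss fails both simultaneously
```

Each pair returned is a candidate scenario: postulate both failures occurring together and ask whether the combined consequences are more serious than either alone.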
SUMMARY
In this topic we discussed the first two steps of the risk management framework: defining
the system and identifying hazards and potential loss events. We started with a discussion
of the significance of couplings and interactions in engineering systems and then discussed
each of the following hazard identification techniques:
Past experience
Checklist reviews
Hazard and operability study (HazOp)
Failure modes and effects analysis (FMEA)
Failure modes, effects and criticality analysis (FMECA)
Preliminary hazard or safety analysis
Scenario-based hazard identification.
Selecting the appropriate techniques for a given situation is a skill that you will develop
with experience. If a technique is not giving you the results you're looking for, try another
one, and remember that no single technique is capable of identifying the hazards and
potential loss events for all situations.
EXERCISES
3.1 CASE STUDY—FUEL STORAGE TERMINAL
A company intends to establish a petroleum products storage and distribution terminal. The
site will include the storage tank farm for bulk fuels, butane storage facilities and a tanker
loading facility.
Unleaded automotive fuel, automotive diesel fuel, jet fuel and bunker fuel will be imported
by ship from the nearby wharf via an underground 350 mm pipeline. Blending facilities
will be provided in the terminal to enable the production of premium unleaded automotive
fuel from the unleaded automotive fuel by the controlled addition of butane and tetra ethyl
lead (TEL). Butane and TEL will be imported by bulk road tankers to the terminal.
Four truck loading bays will be constructed for product distribution. The following
equipment and operations are included in the project:
14 above ground petroleum storage tanks and piping consisting of 5 x 17 megalitre
(ML) tanks, 3 x 10 ML tanks, 3 x 5.3 ML tanks, 1 x 1.5 ML tank and 4 day tanks
21 product transfer pumps
butane storage vessel of capacity 40 tonnes
underground petroleum pipeline from wharf to the terminal (approximately 2.5 km)
ship unloading of product
product transfer from the wharf to the terminal
filling of road tankers
butane unloading from a road tanker
management of waste water on site
TEL storage area
additives tanks.
Delivery of the products into the terminal will be via the ship's pump.
The following safety systems are proposed.
Ship unloading hoses will include dry break couplings.
Electronic monitoring of tank levels during all product movements.
High-level alarms on all tanks, and high-level cut-out switches on the smaller blend
tanks and day tanks.
Access to road tanker loading bays controlled by a card swipe system identifying
driver, truck and load requirements.
Road tanker loading using a 'Scully' probe type system to ensure that the static probe is
installed before the computer controls can be activated. The system will stop the
transfer should the road tanker drive away still connected, or on a high tanker level via
links to sensing probes on each dip point of each compartment.
Computer controlled loading of road tankers. Each truck compartment volume is
pre-entered into the system so that a fixed amount can be filled, preventing both
overfilling and overloading of the vehicle.
Top loading flow controlled via a spring to close dead man loading valve combined
with a timer system to prevent the control valve opening fully until after an elapsed
time with the loading valve held open.
Foam injection provided to all unleaded automotive fuel and jet fuel storage tanks.
Fire monitors and hydrants provided via a ring main system to cover all tanks, pumps,
butane storage and tanker loading bays, with the provision to deliver both water and
foam.
Onsite water and foam storage to meet a fire fighting demand for cooling water for 1.5
hours and foam to several of the tanks for 20 minutes. Main fire pump and foam
generating pump to be diesel-driven in case of power failure.
The significant hazard in the terminal is fire. Some of the specific potential loss events are:
atmospheric tank roof fires
tank farm bund fires (intermediate and full bund)
pool fire at tanker loading bay and pump slab
butane tank fire and explosion
pool fire due to product release from shipping pipeline
spills at wharf.
Task
Use the checklists in Reading 3.1 to identify specific hazards in the terminal.
3.2 FAILURE MODES AND EFFECTS ANALYSIS
It is necessary to maintain a spray of warm water at a fixed temperature to control a
biological process. The process is operated at 45°C. Too low a temperature would result in
insufficient reaction, and too high a temperature would destroy the micro-organisms. Cold
water is supplied at ambient temperature and could vary depending on the time of the year.
Hot water is supplied from the site's hot water source at about 80°C. The spray system for
mixing hot and cold water to deliver at the set temperature is shown in Figure 3.8 below.
Figure 3.8
Both hot and cold water are supplied from overhead head tanks. The levels in the tanks are
maintained by float valves. The area is generally unattended but is patrolled at regular
intervals by an operator who takes a sample from the reactor for laboratory analysis.
The cold water flow is controlled by providing a set point using a hand switch. The flow
rate measured by a flow element (FE) is controlled by a flow controller (FC), which in turn
adjusts the flow control valve (FCV2) to provide the set flow. The temperature of the spray
is measured by a temperature element (TE). Based on the difference between the
temperature measured by TE and the temperature set point, the temperature controller (TC)
adjusts the hot water flow control valve (FCV1) at the required temperature.
Task
a) Using the failure modes and effects analysis technique, analyse the above circuit and
identify the conditions under which the reaction may become ineffective, or the 'bugs'
would be destroyed. Record the findings in an FMEA datasheet similar to that shown
in Table 3.5.
b) Suggest additional measures that may be required in the design to reduce the risk of
losing the 'bugs' and to improve workplace safety.
3.3 HAZARD AND OPERABILITY STUDY
Repeat Exercise 3.2 using the HazOp technique and relevant guidewords selected from
Tables 3.6 and 3.7. Record the results in a datasheet similar to that shown in Table 3.9.
3.4 FUNCTIONAL CONCEPT HAZARD ANALYSIS
A bus transport company decided to explore the use of compressed natural gas instead of
liquid fuels in its buses. This would result in significant savings in operating costs.
Metered low-pressure natural gas supply is available from the street mains. It is
compressed to a pressure of 12 000 kPa in a multi-stage reciprocating compressor, and
filled into a thick walled cylinder that could be mounted on the bus, similar to LPG
cylinders in motor vehicles. A number of gas filled cylinders would be filled and stored for
use. Empty cylinders removed from the buses would be stored in a separate dedicated area.
The compressor only needs to operate for about eight hours per day; no night time operation
would be required. The compressor would be located within a building and provided with
acoustic protection to meet the noise regulations. Water cooling of gas in between
compression stages in the multi-stage compressor is to be provided by installing a small
dedicated cooling tower, an off-the-shelf design.
An operator will conduct regular inspection/surveillance of the compressor house wearing
suitable ear protection. The compressor house will be air-purged to keep the ambient
temperature in the room to workplace health and safety standards for operator comfort.
A preliminary review revealed a number of risk issues associated with natural gas. There is
potential for fire and explosion in the compressor house in the event of a gas leak. A leak
of high-pressure gas from the cylinder storage outside the building may result in a jet fire if
ignited, and could impinge on buses parked nearby. The buses are also parked close to one
another (less than 1m apart), to maximise the depot floor space. There is also concern as to
whether there would be an incremental reduction in passenger safety.
The Operations Manager is also concerned that if something goes wrong with the new
technology, the buses may have to be taken off the road, severely affecting the company's
ability to service the sectors according to established schedule. This may undermine
passenger confidence in the bus company.
The company wants to ensure that all risks are identified and that adequate prevention and
mitigation measures are developed for protection of assets and employee/passenger safety,
before making the capital expenditure decision.
Task
Carry out a functional concept hazard analysis for the natural gas compressor station and
cylinder storage/handling area. Make relevant assumptions where appropriate. Note that
since students of this unit are from different engineering disciplines, only a simple analysis
is required for this exercise.
3.5 VULNERABILITY ANALYSIS
Your company has won a government tender to complete a major freeway upgrade to a
regional centre, and you have been appointed as project manager. Undertake a vulnerability
analysis for this project by adapting the vulnerability matrix and scoring system shown in
Figure 3.7.
REFERENCES AND FURTHER READING
Bowles, J.B. & Wan, C. (2001) 'Software failure modes and effects analysis for a small
embedded control system', 2001 Proceedings Annual Reliability and Maintainability
Symposium, IEEE: 1–6.
Chapman, Chris & Ward, Stephen (2003) Project Risk Management: Processes,
Techniques and Insights, 2nd edn, John Wiley & Sons, Chichester.
Charoenngam, C. & Yeh, C-Y. (1999) 'Contractual risk and liability sharing in hydropower
construction', International Journal of Project Management, 17(1): 29–37.
Chung, P. & Broomfield, E. (1995) 'Hazard and operability (HAZOP) studies applied to
computer-controlled process plants', Computer Control and Human Error, Institution
of Chemical Engineers, Warwickshire, UK.
Cooper, Dale et al. (2004) Project Risk Management Guidelines: Managing Risk in Large
Projects and Complex Procurements, John Wiley & Sons, West Sussex, England.
Department of Planning, NSW (1995) Hazardous Industry Planning Advisory Paper No. 8:
HazOp Guidelines, NSW Department of Planning, Sydney.
Edwards, Peter J. & Bowen, Paul (2005) Risk Management in Project Organisations,
UNSW Press, Sydney.
Energy Institute (UK) (2005) Top Ten Human Factors Issues Facing Major Hazards
Sites—Definition, Consequences, and Resources, available at:
http://www.energyinst.org.uk/content/files/hftopten.doc, accessed 11 December 2006.
Feynman, R.P. (1988) What Do YOU Care What Other People Think? Further Adventures
of a Curious Character, (as told to Ralph Leighton), Norton, New York.
Goddard P.L. (1993) 'Validating the safety of embedded real-time control systems using
FMEA', 1993 Proceedings Annual Reliability and Maintainability Symposium, IEEE:
227–230.
Goddard, P.L. (2000) 'Software FMEA techniques', 2000 Proceedings Annual Reliability
and Maintainability Symposium, IEEE: 118–123.
Hessian, R.T. Jr & Rubin, J.N. (1991) 'Checklist reviews', in Greenberg, H.R. & Cramer,
J.J. (eds), Risk Assessment and Risk Management for the Chemical Process Industry,
van Nostrand Reinhold, New York: 30–47.
Keil, M., Cule, P.E., Lyttinen, K. & Schmidt, R.C. (1998) 'A framework for identifying
software project risks', Communications of the ACM, 41(11): 76–83.
Kirwin, B. (1994) A Guide to Practical Human Reliability Assessment, Taylor & Francis,
London.
Lam, Patrick T.I. (1999) 'A sectorial review of risks associated with major infrastructure
projects', International Journal of Project Management, 17(2): 77–87.
Lees, F.P. (1996) Loss Prevention in the Process Industries: Hazard Identification,
Assessment and Control, 2nd edn, Butterworth-Heinemann, Boston. (3 Volumes)
Leveson, N.G. (1995) Safeware—System Safety and Computers, Addison-Wesley.
Lyytinen, K., Mathiassen, L. & Ropponen, J. (1998) 'Attention shaping and software risk—
a categorical analysis of four classical risk management approaches', Information
Systems Research, 9(3), September: 233–255.
McKelvey, T.C. (1988) 'How to improve the effectiveness of hazard and operability
analysis', IEEE Transactions on Reliability, 37(2), June: 167–170.
Nguyen, D. (2001) 'Failure modes and effects analysis for software reliability', 2001
Proceedings Reliability and Maintainability Symposium, IEEE: 219–222.
Paté-Cornell, M.E. (1993) 'Learning from the Piper Alpha accident: A postmortem analysis
of technical and organizational factors', Risk Analysis, 13(2): 215–231.
Pentti, H. & Atte, H. (2002) Failure Mode and Effects Analysis of Software-Based
Automation Systems, STUK, Helsinki, available at: http://www.stuk.fi/julkaisut/tr/stuk-
yto-tr190.pdf, accessed 13 December 2006.
Perrow, C. (1999) Normal Accidents: Living with High Risk Technologies, Princeton
University Press, Princeton, New Jersey.
Rasmussen, B. & Whetton, C. (1993) Hazard Identification Based on Plant Functional
Modelling, The University of Sheffield, UK, and Riso National Laboratory, Roskilde,
Denmark.
Sherrod, R.M. & Early, W.F. (1991) 'Hazard and operability studies', in Greenberg, H.R. &
Cramer, J.J. (eds), Risk Assessment and Risk Management for the Chemical Process
Industry, van Nostrand Reinhold, New York: 101–25.
Smith, David J. & Simpson, Kenneth (2004) Functional Safety: A Straightforward Guide to
IEC 61508 and Related Standards, 2nd edn, Elsevier, Burlington.
Standards Australia/Standards New Zealand (2004) Risk Management, Australian/New
Zealand Standard AS/NZS 4360:2004.
Standards Australia (2004) HB 436:2004 Risk Management Guidelines: Companion to
AS/NZS 4360:2004, Standards Australia/Standards New Zealand, Sydney.
Thompson, P.A. & Perry, J.G. (1992) Engineering Construction Risks: A Guide to Project
Risk Analysis and Assessment Implications for Project Clients and Project Managers,
Thomas Telford, London.
Trammell, S.R. & Davis, B.J. (2001) 'Using a modified Hazop/FMEA methodology for
assessing system risk', Proceedings of Engineering Management for Applied
Technology (EMAT) 2001, 2nd International Workshop, 16–17 August: 47–53.
Tummala, V.M.R. & Burchett, J.F. (1999) 'Applying a risk management process (RMP) to
manage cost risk for the EHV transmission line project', International Journal of
Project Management, 17(4): 223–235.
Tweeddale, H.M. (1992) Risk Management, Engineering Education Australia, Milsons
Point, NSW.
United Kingdom Department of Employment (1975) The Flixborough Disaster: Report of
the Court of Inquiry, HMSO, London.
United States Atomic Energy Commission (1974) Reactor Safety Study: An Assessment of
Accident Risks in US Commercial Nuclear Power Plants, United States Atomic Energy
Commission, Washington, DC.
United States Department of Defense (1980) Procedures for Performing a Failure Mode,
Effects and Criticality Analysis, MIL-STD-1629A, US Department of Defense,
Washington, DC.
United States Department of Energy Quality Managers (2000) Software Risk Management:
A Practical Guide, US Department of Energy, available at:
http://cio.energy.gov/documents/sqas21_01.doc, accessed 13 December 2006.
Van Well-Stam, D. et al. (2004) Project Risk Management: An Essential Tool for
Managing and Controlling Projects, Kogan Page, London.
Wells, G., Wardman, M. & Whetton, C. (1993) 'Preliminary safety analysis', Journal of
Loss Prevention in Process Industries, 6(1): 47–60.
Wideman, R. Max (1998) 'Project risk management', Chapter 9 in Pinto, J.K. (ed.) Project
Management Handbook, Jossey-Bass, San Francisco, 138–158.
Yeo, K.T. & Tiong, R.L.K. (2000) 'Positive management of differences for risk reduction in
BOT projects', International Journal of Project Management, 18(4): 257–265.
RE A D I N G 3 .1
HAZARD IDENTIFICATION CHECKLISTS
ROBERT T. HESSIAN JNR & JACK N. RUBIN
The following sample checklists have been developed to assist a hazards analyst in
identifying problems that may require further attention. The examples are general, and
therefore a paragraph stating the objective and describing the focus for the checklist is not
provided. The checklists should be modified to reflect specific objectives and facilities
prior to application in an actual facility.
CHECKLIST A—PLANT ORGANIZATION AND ADMINISTRATION
1. Organization
a) Corporate organization chart detailing areas of responsibility for each division and the
name and telephone number of the key person responsible.
b) Divisional organization chart identifying supervisors, group assignments and functions,
and the names of personnel in each group.
c) Is a procedure in place to periodically update these charts and distribute to appropriate
personnel?
d) Specialty areas highlighted for quick reference (e.g., Fire Warden, Plant Safety
Supervisor, Emergency Response Coordinator).
e) Are adequate facilities available (e.g., offices, technical library, warehouses,
laboratories)?
f) Are personnel with technical expertise readily available?
g) Are there any plans for expansion or modernization of the facility?
2. Administration
a) Plant operators
1. Are plant procedures readily available?
2. Are emergency procedures available?
3. Are the operators periodically evaluated to check their competency?
4. Are operators periodically retrained?
5. Has the training program been formalized?
6. Are the operators periodically drilled on responses to random simulated emergency
situations?
b) Maintenance group
1. Are adequate facilities available (e.g., offices, records library, warehouses,
maintenance equipment)?
2. Are vendor equipment manuals available for quick reference?
3. Have personnel been periodically retrained and educated on new techniques?
4. Are personnel supported by an engineering staff or contracted maintenance
professionals?
5. Is a program in place for preventive and predictive maintenance?
6. Are findings from maintenance activities cataloged and routed to the engineering
staff for evaluation?
7. Are functions and responsibilities, especially safety and inspection interfaces, well
defined?
c) Emergency response group
1. How is the plant shut down in case of a fire emergency?
a) Panic button to emergency shut-down (ESD) system.
b) Individual motor-operated valves (MOVs).
c) Fire alarm to ESD system.
d) Manual valve operation.
2. Is an emergency response plan available and supported by management?
3. Are procedures in place for activation of the plan?
4. Emergency protocol: Is there a notification sequence, and is it prominently
displayed on the operating floor and in the control room?
5. Is the plan evaluated and updated periodically?
6. Have local authorities been briefed and trained in the plan and its major features?
7. Is emergency support equipment in place and adequately maintained?
8. Are procedures for deactivation and recovery detailed in the plan?
CHECKLIST B—GENERAL OPERATIONS
1. Inventory control
a) Are dangerous or hazardous substances stored in remote locations?
b) Is on-site inventory maintained at a minimum acceptable level?
c) Are detectors and alarms provided for detection of leaks or spills?
d) Is inventory maintained in a safe fashion (e.g., are drums stacked a maximum of
two high) and hazardous substances segregated?
e) Is storage area in compliance with local building codes (e.g., electrical utilities, fire
protection)?
2. Production area
a) Are dangerous or hazardous substances staged to the process in an acceptable
manner?
b) Is staging area protected from adjacent operations or traffic?
c) Has process instrumentation been adequately maintained?
d) Is local instrumentation readily accessible or visible to operators from local control
panels?
e) Are drain connections valved and capped?
f) Are maintenance valves locked in the appropriate position for operation?
g) Are local annunciators furnished to alert floor operators of problems?
3. Intermediates and by-product discharges
a) Are all hazardous intermediates properly labeled?
b) Are discharges monitored?
c) Are safeguards in place to prevent improper discharges?
d) Are vents routed to flares or scrubbers?
4. Final product handling
a) Is product packaged for on-site use or for off-site use?
b) Is product adequately protected from other operations?
c) Is product adequately labeled?
5. Are alternate operating modes discussed and researched?
6. Are equipment qualifications reviewed with operators?
7. Are interim training sessions held when plant modifications are performed?
8. Is a full-time training instructor assigned for process operators and maintenance
personnel?
9. Is a training room available with various visual aid apparatus (e.g., overhead projector,
video recorder/monitor, large drawings and charts, film projector)?
10. Is a training course curriculum available with printed handbooks, test sheets, and other
learning aids?
11. Are process operators and maintenance personnel retrained to keep them up to date
when plant modifications or new equipment are introduced?
CHECKLIST C—MAINTENANCE
1. Has a maintenance program been formalized?
a) Are warehouse inventory control procedures in place?
b) Is an automated or manual inventory procurement program in place?
c) Can a surplus of hazardous materials be procured?
2. How are maintenance department activities coordinated with plant operations?
3. Are maintenance personnel available when required by operations?
4. Is equipment usually operated at its optimum design range? If not, what problems have
been encountered?
5. Has degraded equipment forced operating requirements to be outside design
parameters?
a) Is the instrumentation and control system maintained adequately?
6. Is operation of instrumentation in the manual mode required because of
a) Process stability problems?
b) Inadequate maintenance?
7. Are analyses performed to determine the best approach:
a) Repair/delay.
b) Repair/replace.
8. Who determines repair or replacement?
9. What efforts are made to upgrade equipment?
10. How are feedback and new technology incorporated?
11. Are spare parts available in support of maintenance? Which spare parts are fabricated
at the facility? Are all spare parts original equipment manufacturer (OEM) parts? Is the
inventory inspected periodically?
12. Are spare parts and chemical stocks replaced after maintenance? How are stocking
levels determined? Is a spare part inventory available?
13. What type of storage system exists? Are new materials inspected?
14. Are spare parts and chemical inventories interfaced with other plants?
15. Are replacement materials made in kind or is the state of the art considered? Is
obsolescence considered?
16. Are spare parts available for maintenance during an unscheduled shutdown?
17. Are spares and materials classified by replacement cost, frequency, delivery, labor
intensity, sources, or effect on production or safety?
18. What records are maintained?
a) Time and personnel staffing records.
b) Equipment and machinery maintenance logs.
c) Record system (coding and inventory control).
d) Lubrication schedules.
e) Instrument and control calibration.
f) Actual expenditures and schedules vs. budgets (performance).
g) Frequency of unscheduled shutdowns and causes.
h) Are maintenance findings routed to the engineering staff for evaluations?
19. Technical manuals and prints.
a) Are vendors' manuals available and up to date?
b) Are prints available and up to date?
c) Are as-built drawings up to date?
d) Are vendor recommendations followed?
20. Are written maintenance orders or work requests used and is there a written procedure
defining the system?
21. Do work requests contain the following information?
a) Clear description of malfunction or problem.
b) Description of work.
c) Tools required and special test equipment.
d) Tagging requirements.
e) Test required.
f) Safety precautions.
g) Drawings or procedures' references.
h) Identification of material needed and spare parts.
i) Priority (who assigns it?).
j) Estimated time to repair.
k) Status of plant during repair.
l) Personnel requirements.
m) Means for documenting cost.
n) Approval and authorization provisions.
22. Are sparkproof tools available? Who determines whether sparkproof tools are to be
used?
23. Work schedules: Are the following used?
a) Maintenance staff available for all shifts.
b) Daily and weekly work schedules.
c) Personnel assignments.
d) Long-range planning schedules.
24. Are job planners used?
25. Are maintenance schedules coordinated with plant operation?
26. Who coordinates the turnaround?
27. What meetings, if any, are held during turnaround?
28. Is the sequence of maintenance work defined? If so, are the functions of each step in
the procedure defined (e.g., job planner, coordinator)?
29. Is there a preventive maintenance program?
30. Turnaround planning.
a) Is the planning process a daily activity? How is backlog addressed?
b) Are priorities established for modifications or repairs during an unscheduled plant
shutdown?
c) What is the constraint to reducing typical scheduled turnaround time?
d) How is the interface of area activities with systems activities achieved?
31. Personnel.
a) Morale.
1. Has impact of daily work on quality of life been stressed?
b) Overtime practices
1. Which department shows the highest amount of overtime?
c) Use of subcontractors?
1. For routine maintenance.
2. For specialty services.
3. For plant turnaround.
32. Training.
a) Training records.
b) Apprentice training or similar program.
c) Periodic review training.
d) Vendor schools.
e) On-the-job training.
f) Personnel goals.
g) Levels of qualification.
h) Educational and training material available.
i) Does management support the training effort?
1. Organizationally.
2. With budget and resources.
The following checklist was developed to verify various activities performed during a
modification.
CHECKLIST D—INSPECTION
1. Replacement equipment procurement
a) Are appropriate specifications prepared? Have data sheets been completed and
verified?
1. Are references to consensus standards included?
b) Have vendor shops been visited to verify qualifications?
1. Is a quality-assurance program in place?
2. Is a certification program available?
c) Is a receipt inspection program in place?
1. Verification against procurement specifications required?
2. Equipment storage
a) Have appropriate provisions and precautions been taken to protect equipment
while it is in storage?
b) Has shelf life of subcomponents been noted?
c) Is equipment protected from other storage area activity?
3. Piping and vessels
a) Is ultrasonic thickness testing of vessels and piping done on a regular basis (e.g.,
during turnaround)?
b) What other methods of inspection and nondestructive testing are used (e.g., dye
penetrant, magnetic particle)?
c) Does the maintenance department do this testing or are there special personnel for
inspection and testing? Is new or modified piping tested, and how is this done?
d) How often and in what manner is PSV testing performed?
e) Are corrosion-prone areas of process piping and vessels inspected on a regular
basis?
f) If pipe metal failure or weld failure has occurred, was analysis done by outside
laboratories?
g) Is X-ray inspection apparatus available; can plant maintenance personnel interpret
X-rays?
4. Instrumentation
a) Are trip circuits tested on a regular basis?
1. Are procedures prepared for this work?
2. Is there a sign-off list for these tests?
3. Are operators doing a functional test after each trip to verify system
availability?
4. Are bypass switches provided for testing?
5. Are these bypass switches accessible to all personnel, or are they locked in a
cabinet with special personnel responsible for keys?
b) Are instruments zero checked or calibrated on a routine basis, or are they checked
when reason for accuracy or doubt exists?
c) Is an instrument technician available on a 24-hour-per-day basis?
1. Are instrument technicians on call (Is a roster of personnel available)?
2. Are instrument technicians' skills upgraded on a routine basis through special
training or other means?
5. Pumps and compressors
a) Are records kept to trace frequency of failure of seals and other parts? Do records
include exact description of spares used, mechanics who did job, and other job
specifics?
b) Are compressors or other large, nonspared machinery inspected on routine basis
(such as during turnaround), or is maintenance based on problem observation?
1. Is large rotating machinery fitted with vibration-analysis equipment?
2. Is portable vibration equipment available for spot-checks?
3. Is vibration spot-checking done on a regular basis?
4. Was large rotating machinery voice-printed for vibration at initial plant
startup?
c) Is major overhaul performed by plant maintenance, or are vendors' representatives
called in?
1. Is this work done by an outside contractor or shop?
2. What is experience with outside shop work, if any?
CHECKLIST E—SAFETY
1. Are procedures available and used when isolating equipment for maintenance?
2. Is Safety Department responsible for work order signature, or is this done by operations
or maintenance personnel?
3. Are blind lists made for each isolation job, who keeps them, and who checks that all are
installed or removed?
4. Is safety and life-saving equipment inspected on a regular basis, and who is responsible
for this work?
5. Are operators and maintenance personnel instructed and trained in firefighting and
first-aid procedures?
6. Are plant personnel trained to respond to major emergency situations?
7. What is the level of firefighting equipment or capability in the plant? Is outside backup
available?
8. Is emergency medical treatment available at all times?
9. Is an automatic gas or vapor detection system installed showing location and alarm
point in control room?
10. Is the fire water system tested on a regular basis?
11. Are steam or water curtains provided for critical equipment and areas?
12. Are automatic fire-extinguishing systems installed (Halon, CO2, foam, etc.)?
13. Is the control room located and built to withstand certain fire and explosion hazards?
14. Are remotely operated emergency shutoff valves provided? If so, are these tested on a
regular basis?
15. Are air packs provided; if so, what is their location and who tests and refills these?
What are site rules regarding personnel with beards?
16. How are vessels checked before entering? What nitrogen safety procedures are used?
17. How are vessels freed of hydrocarbons and mercury before entering? How are they
checked?
18. Is safety consciousness emphasized?
19. Are good safety records rewarded in any way?
20. Is a safety committee established in the Operations Department? In the Maintenance
Department?
21. Are standard operating procedures reviewed for safety hazards? Who reviews them?
22. Is the Safety Department entitled to enforce housekeeping?
23. Which department is responsible for gate perimeter security?
24. Is all safety equipment checked on a regular basis for proper function? Who signs off?
25. Is safety shoe and eyeglass protection mandatory?
26. Are lines marked for contents (acid, caustic substances, etc.)? Are adequate safety
showers and eyewash facilities provided?
27. Is a safety training course in effect? How often does it convene, who takes part, who
teaches it? How many hours per month are spent in training?
28. Are operating and maintenance techniques updated when new equipment is introduced?
29. Are motors, switch panels, ignition panels, and solenoids adequate for the electrical
area classification?
30. Is the integrity of electrical grounds maintained?
31. Are fire isolation considerations applied to curbs, drains, or sewer systems?
32. Are operating personnel instructed in purpose and functioning of mechanical safety
devices (e.g., tank breathers, overspeed protective devices, float switches, trip
systems)?
33. Are charts available identifying every chemical or compound being used in the plant,
and are toxicity and first-aid measures described?
34. Are ignition sources (switchgear, smoking areas, workshops, etc.) close to the boundary
of a hazardous area?
CHECKLIST F—HAZOPS
1. Is there a hazards and operability study available for facilities?
2. Is each piece of equipment protected against overpressure caused by operational
upsets?
3. Is each piece of equipment protected against overpressure caused by fire?
4. What coincidental conditions is the flare system designed for?
5. Can PSVs be taken out of service when the plant is on-line?
6. Have any modifications been made since the plant was built? If so, how are the
modifications documented? Is the HAZOPS study updated? Are as-built drawings
updated?
7. Is it possible to overpressurize atmospheric storage tanks by
a) Loss of liquid level in vessel feeding tanks?
b) High vapor pressure material being sent to tanks?
8. Are trip circuits normally energized or normally deenergized?
9. How are trip circuits tested, and how often?
10. What are consequences of trip failure?
11. What are the consequences of temporary fuel gas failure? Can gas be restored to a hot
furnace?
12. Is rotating machinery protected against backspin when a relief valve blows?
13. Is the flare system protected against liquid entrainment?
14. What is the design velocity at flare tip?
15. What is the radiation level at the edge of the flare field? Is the flare field fenced off?
16. What is the location of the oily sewer relative to forced draft fans and other combustion
sources?
17. Are combustible gas detectors installed at all combustion sources?
18. What trips are bypassed in day-to-day operation? How are they documented?
19. How does the plant operate compared to design:
a) Closer to PSV settings?
b) Higher throughput?
c) Colder?
d) Hotter?
e) Lower voltage?
f) High cooling water?
Source: Extract from Chapter 3 'Checklist Reviews' in Greenberg, H.R. & Cramer, J.J.,
Risk Assessment and Risk Management for the Chemical Process Industry,
Van Nostrand Reinhold, New York, 1991: 33–47.
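Checklists of this kind lend themselves to light tooling. As a small illustrative sketch (the record fields and example items are assumptions, not part of the reading), each question can be held as a structured record so that unanswered or adverse items are easy to surface for follow-up:

```python
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    """One question from a hazard identification checklist."""
    ref: str          # e.g. "B.1.c" (hypothetical reference scheme)
    question: str
    answer: str = ""  # "yes", "no", or "" while unanswered
    note: str = ""    # analyst's comment or finding

@dataclass
class Checklist:
    title: str
    items: list = field(default_factory=list)

    def open_items(self):
        """Items still unanswered or answered 'no' - candidates for follow-up."""
        return [i for i in self.items if i.answer != "yes"]

# Example: a fragment modelled on Checklist B - General Operations
clb = Checklist("B - General Operations", [
    ChecklistItem("B.1.b", "Is on-site inventory maintained at a minimum acceptable level?", "yes"),
    ChecklistItem("B.1.c", "Are detectors and alarms provided for detection of leaks or spills?", "no",
                  "No gas detection in drum store"),
    ChecklistItem("B.3.d", "Are vents routed to flares or scrubbers?"),
])

for item in clb.open_items():
    print(item.ref, "-", item.question)
```

This keeps the checklist itself modifiable for a specific facility, as the authors recommend, while the follow-up logic stays unchanged.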
RE A D I N G 3 .2
SOFTWARE FMEA TECHNIQUES
PETER L. GODDARD
SUMMARY AND CONCLUSIONS
Assessing the safety characteristics of software driven safety critical systems is problematic.
Methods to allow assessment of the behavior of processing systems have appeared in the
literature, but provide incomplete system safety evaluation. Assessing the safety
characteristics of small embedded processing platforms performing control functions has
been particularly difficult. The use of fault tolerant, diverse, processing platforms has been
one approach taken to compensate for the lack of assurance of safe operation of single
embedded processing platforms. This approach raises cost and, in at least some cases
where a safe state can be demonstrated, is unnecessary. Over the past decade, the author
has performed software FMEA on embedded automotive platforms for brakes, throttle, and
steering with promising results. Use of software FMEA at a system and a detailed level has
allowed visibility of software and hardware architectural approaches which assure safety of
operation while minimizing the cost of safety critical embedded processor designs.
Software FMEA has been referred to in the technical literature for more than fifteen years.
Additionally, software FMEA has been recommended for evaluating critical systems in
some standards, notably draft IEC 61508. Software FMEA is also provided for in the
current drafts of SAE ARP 5580. However, techniques for applying software FMEA to
systems during their design have been largely missing from the literature. Software FMEA
has been applied to the assessment of safety critical real-time control systems embedded in
military and automotive products over the last decade. This paper follows on from, and
significantly expands on, the software FMEA techniques originally described in the
1993 RAMS paper "Validating The Safety Of Real-Time Control Systems Using FMEA".
1. INTRODUCTION
Failure Modes and Effects Analysis (FMEA) is a traditional reliability and safety analysis
technique which has enjoyed extensive application to diverse products over several
decades. Application of FMEA to software has been somewhat problematic and is less
common than hardware and system FMEAs. Software FMEA appeared in the literature
as early as 1983. However, the number of papers dedicated to software FMEA has
remained small, and few of them describe the exact methodology to be
employed. This paper provides a summary overview of
two types of software FMEA which have been used in the assessment of embedded control
systems for the past decade: system software FMEA and detailed software FMEA. The
techniques discussed are an expansion and refinement of those presented in reference 1.
System level software FMEA, which was not discussed in reference 1, can be used to
evaluate the effectiveness of the software architecture in ensuring safe operation without the
large labor requirements of detailed software FMEA analysis. The FMEA techniques
described in this paper are consistent with the recommendations of SAE ARP 5580,
reference 2.
2. SOFTWARE FMEA
2.1 Software FMEA application
Software FMEA can be applied to diverse system designs, allowing the analysis to identify
potential design weaknesses and allowing design improvements to be recommended.
System level software FMEAs can be performed early in the software design process,
allowing safety assessment of the chosen software architecture at a time when changes to
the software architecture can be made cost effectively. System level software FMEA is
based on the top level software design: the functional partitioning of the software design
into CSCIs, CSCs, and modules. Detailed software FMEA is applied late in the design
process, once at least pseudo code for the software modules is available. Detailed software
FMEA is used to verify that the protection which was intended in the top level design and
assessed using system level software FMEA has been achieved. Both system and detailed
software FMEAs evaluate the effectiveness of the designed-in software protections in
preventing hazardous system behavior under conditions of failure. Software failure can be
the result of errors in software design being expressed due to the specific environmental
exposure of the software or of transient or permanent hardware failures. The exact cause of
the failure is comparatively unimportant to the analysis results. Software FMEA assesses
the ability of the system design, as expressed through its software design, to react in a
predictable manner to ensure system safety.
The techniques of system and detailed software FMEA have been used extensively on
embedded control systems. Specific applications have included braking, throttle, and
steering for automotive applications. Each of these systems has the potential for safety
critical failure occurrences. These systems have also had defined safe states to which the
control system was driven in case of failure. However, application of software FMEA
techniques, particularly system level software FMEA techniques, does not appear to be
limited to systems with safe states. The methodology can be applied to redundant systems
to assess the ability of the software and hardware to achieve a known state under conditions
of hardware and software failure, allowing redundant elements to effect system recovery.
Detailed FMEA may also be required for fault tolerant control processing depending on the
hardware protection provided.
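Both levels of analysis produce a worksheet of failure modes and their effects. As a rough sketch of how such a worksheet can be held and screened in software (the field names and severity scale here are illustrative assumptions, not the author's format):

```python
from dataclasses import dataclass

@dataclass
class FmeaEntry:
    """One row of a (system- or detailed-level) software FMEA worksheet."""
    element: str        # CSCI/CSC/module, or a variable for detailed FMEA
    failure_mode: str
    local_effect: str
    system_effect: str
    severity: int       # assumed scale: 1 (negligible) .. 4 (catastrophic)
    detected: bool      # caught by a designed-in protection?

def unprotected_critical(entries):
    """Failure modes that are severe and not caught by any designed-in check."""
    return [e for e in entries if e.severity >= 3 and not e.detected]

# Hypothetical rows for an automotive throttle controller:
worksheet = [
    FmeaEntry("throttle_ctrl", "control value out of range",
              "incorrect actuator command", "unintended acceleration", 4, True),
    FmeaEntry("sensor_input", "stale sensor reading",
              "control computed from old data", "degraded response", 3, False),
]

for e in unprotected_critical(worksheet):
    print("REVIEW:", e.element, "-", e.failure_mode)
```

The screening function reflects the purpose the text gives both FMEA levels: finding failure modes for which the designed-in protections do not prevent hazardous system behavior.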
2.2 Architectural considerations
The software FMEA techniques described in the remainder of this paper were developed in
response to a need to validate hardware and software designs for embedded control
platforms. These embedded control platforms have several unique characteristics which
help make software FMEA a valued technique for assessing effectiveness of their safety
design.
A typical, and much simplified, hardware architecture for an embedded control system is
shown in Figure 1. The basic hardware architecture provides for input from a variety of
sensors and output of control signals to various control elements such as motors, valves, etc.
In modern embedded control systems, the physical hardware is often simplified through the
use of highly integrated controllers which include a microprocessor, A to D and D to A
conversion capability, multiplexing, and specialized control and communications circuitry
on board a single integrated circuit. This can result in the peripheral circuits being limited
to those needed to buffer incoming signals to protect the microcontroller, and to amplify
and provide current sources for the output control signals. These highly integrated
microcontroller integrated circuits typically have minimal or no memory, internal
communications, or processor integrity protection. Thus, analysis methods which assess
hardware and software failure effects must include the effects of memory, processing
integrity, and communications failures.
Figure 1: Hardware architecture
As shown in the non-italicized pseudo code of Figure 2, embedded control system software
follows a straightforward architecture: read sensors, calculate control values, output control
signals to actuators. The read-calculate-output loop is repeated endlessly for the control
being exercised. Failures of the software or the supporting hardware can result in either
incorrect control values, the result of which is detected by the system user, or no system
output due to a sufficiently incorrect fault response (e.g. execute no-ops to the end of
memory). For safety critical systems, the response of the system to plausible hardware and
software failures must be able to be determined prior to failure occurrence. The design
must leave the system in as safe a state as is plausible given the occurrence of failure. The
requirement for deterministic behavior under failure conditions results in a software
architecture which more closely approximates the complete pseudo code of Figure 2:
perform self checks, read sensors, validate sensor values, calculate control values, validate
control values, validate output hardware condition, enable hardware outputs if output
hardware correct, output control to actuators if all checks pass else return to safe state. The
technique of continually validating the correctness of the supporting hardware, along with
checks to ensure that software has executed the expected routines in the correct order, is the
minimum necessary for embedded safety critical control systems. Additionally, functional
redundancy, implemented in the software through the use of diverse control calculation
algorithms and variables, is sometimes needed.
Figure 2: Control system software architecture
Program Control
begin
  sys_valid := test_all_control_hw();
  initialize;
  done := false;
  while ((not done) and sys_valid) do
  begin
    read_sensors();
    sys_valid := sys_valid and validate_sensor_values();
    calculate_control_values();
    sys_valid := sys_valid and validate_control_values();
    sys_valid := sys_valid and validate_output_hardware();
    if (sys_valid) then
    begin
      enable_output_hardware();
      output_control_signals();
    end;
    sys_valid := sys_valid and test_critical_hardware();
  end;
  set_system_to_safe_state();
end.
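The control flow of Figure 2 can also be sketched in runnable form as below. The check functions are hypothetical stand-ins supplied by the caller; what the sketch preserves is the pattern the text describes: validate at every step, and drive the system to its safe state the moment any check fails.

```python
def run_control_loop(read_sensors, calculate, write_outputs,
                     validate_sensors, validate_control, validate_hw,
                     safe_state, max_cycles=1000):
    """Read-validate-calculate-validate-output loop in the style of Figure 2.

    Any failed check drops sys_valid, skips further output, and drives the
    system to safe_state. Returns the number of completed output cycles.
    """
    cycles = 0
    sys_valid = validate_hw()
    while cycles < max_cycles and sys_valid:
        sensors = read_sensors()
        sys_valid = sys_valid and validate_sensors(sensors)
        if sys_valid:
            control = calculate(sensors)
            sys_valid = sys_valid and validate_control(control)
            sys_valid = sys_valid and validate_hw()
        if sys_valid:
            write_outputs(control)
            cycles += 1
    safe_state()
    return cycles

# Toy harness: the sensor goes out of range on the fifth read.
readings = iter([10, 11, 12, 13, 999, 14])
log = []
n = run_control_loop(
    read_sensors=lambda: next(readings),
    calculate=lambda s: s * 2,
    write_outputs=log.append,
    validate_sensors=lambda s: 0 <= s <= 100,
    validate_control=lambda c: 0 <= c <= 200,
    validate_hw=lambda: True,
    safe_state=lambda: log.append("SAFE"))
print(n, log)
```

The harness shows the deterministic behavior under failure that the text requires: four good cycles, then an invalid reading, then the safe state, with no output ever produced from unvalidated data.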
2.3 Software hazard analysis
Unlike hardware and system FMEAs, a software FMEA cannot easily be used to identify
system level hazards. Since software is a logical construct, instead of a physical entity,
hazards must be identified and translated into software terms prior to the analysis. Prior to
beginning the development of a software FMEA, a system preliminary hazard analysis
(PHA) for the system should exist. The PHA needs to include all the hazards which can
have software as a potential cause. The first step in developing a software FMEA is to
translate potential system hazards with possible software causes into an equivalent set of
system and software states through the process of software hazard analysis. To perform a
software hazard analysis, the analyst begins with each hazard identified in the PHA and
performs a fault tree analysis of the potential causes of the hazard. For each potential
hazard and potential hazard cause which could be the result of software failures, the analyst
must extend the fault trees through the system hardware and software until a sensible set of
software input and output variable values is identified. The value set associated with each
hazard cause is then identified as a software hazard. Figure 3 shows the form of the output
table which results from the software hazard analysis and which is used to determine the
criticality of the result of any software failures.
Figure 3: Software hazard analysis results

                          Critical software variables
                          Variable 1    Variable 2    ...    Variable n
Hazard 1    Cause 1       Value         Value         ...    Value
            Cause 2       Value         Value         ...    Value
            ...
            Cause n       Value         Value         ...    Value
Hazard 2    Cause 1       Value         Value         ...    Value
            Cause 2       Value         Value         ...    Value
            ...
            Cause n       Value         Value         ...    Value
...
Hazard n    Cause 1       Value         Value         ...    Value
            Cause 2       Value         Value         ...    Value
            ...
            Cause n       Value         Value         ...    Value
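The output table of Figure 3 is, in effect, a mapping from each hazard cause to the set of critical variable values that constitute it. A minimal sketch in Python, with the hazard, cause and variable names all invented for illustration:

```python
# Invented example of a software hazard analysis table: each hazard maps to
# its potential causes, and each cause to the software variable values
# (the "software hazard") associated with it.
software_hazards = {
    "unintended_actuation": {
        "stale_sensor_value": {"enable_output": True, "sensor_age_ms": ">100"},
        "corrupt_setpoint":   {"enable_output": True, "setpoint": "out_of_range"},
    },
    "loss_of_shutdown": {
        "blocked_watchdog":   {"watchdog_served": False, "enable_output": True},
    },
}

def hazardous_value_sets(table, variable):
    """Return every (hazard, cause) pair whose value set constrains `variable`,
    i.e. the failures for which this variable is safety critical."""
    return [(hazard, cause)
            for hazard, causes in table.items()
            for cause, values in causes.items()
            if variable in values]
```

A table in this form supports the use described in the text: given a postulated software failure, the resulting variable values can be compared against each cause's value set to determine criticality.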
2.4 Software safety requirements
One of the crucial elements of any safety program for a software intensive system is the
development of software requirements to guide the design team in the creation of the
software architecture and implementation which includes all the features needed to support
safety critical processing. The existence and understanding of these requirements by both
the safety and software design groups is crucial to achieving a system design which is
adequate for the intended application, and allows the software design group to understand
the results of and recommendations from the software FMEA. Safety requirements,
appropriate for critical software, can be found in several published sources (references 3–8).
A compendium of requirements selected from these sources and tailored for the specific
application should be released early in the software design process, ideally prior to the start
of top level software design. Discussions of FMEA findings can then be organized to relate
to achievement of the previously identified requirements, significantly simplifying the
communications process between safety and software engineering.
In addition to requirements imposed directly on the software design, safety requirements
will need to be imposed on the software development and execution environments and on
development tools. The safety analyst needs to ensure that requirements are imposed which
ensure that the behavior of the software is consistent with that expected by the software
developer and the analyst. One of the critical elements of the software design which needs
to be controlled is the language which is used for software development and the compiler
for that language. Compilers which have been carefully tested to the language specification
and certified for accuracy of the compiled code must be used in the development of safety
critical software if analysis based on the high order language listings for the compiled code
is to have validity. Use of the language itself also needs to be limited to those features
which are fully defined by the language specifications. Elements of a language whose
behavior has been left to the compiler designer to decide should be avoided. A good
discussion of the needed controls for the language 'C' can be found in reference 9. The
software safety requirements must also specify that indeterminate behavior of the compiler
be avoided. Features such as optimization, which can produce indeterminate results in the
final object code, must be specified as being disabled. Any operating system or scheduler
intended for use with safety critical software also needs to be carefully selected. The
executive functions provided by the operating system or scheduler can significantly impact
the ability of the developed software to provide the intended level of safety. Requirements
which specify the use of a safety certified executive as a part of the software are appropriate
if a software FMEA is to have validity.
2.5 System software FMEA
System software FMEA should be performed as early in the design process as possible to
minimize the impact of design recommendations resulting from the analysis. The analysis
may need to be updated periodically as the top level software design progresses, with the
final system software FMEA update occurring during detailed design, in parallel with the
detailed software FMEA. The organization performing the system level software FMEA
needs to balance the update periodicity and expected benefits with the associated costs.
Labor costs for system level software FMEAs are modest and allow identification of
software improvements during a cost effective part of the design process.
Once the software design team has developed an initial architecture and has allocated
functional requirements to the software elements of the design, a system software FMEA
can be performed. The intent of the analysis is to assess the ability of the software
architecture to provide protection from the effects of software and hardware failures. The
software elements are treated as black boxes which contain unknown software code, but
which implement the requirements assigned to the element. The failure modes which are
used to assess the protection provided by each software element are shown in Figure 4. The
failure modes to be applied to each software element include: failure of the software
element to execute, incomplete execution of the software element, incorrect functional
result produced, and incorrect execution timing. Additional 'black box' failure modes may
need to be added which are specific to the intended software application. Failure of the
software to execute and incomplete execution are particularly important to real time
systems. The potential for 'aging' of data in real time control systems must be carefully
evaluated. In addition to the failure modes for each software element, the analyst must
evaluate the ability of the software design to protect against system failures in hardware and
software. As shown in Figure 4, the system level software failure modes evaluate the ability
of the system to provide protection against incorrect interrupt related behavior, resource
conflicts, and errors in the input sensor and output control circuits.
Figure 4: System level software failure modes

Element Failure Modes:
    Fails to execute
    Executes incompletely
    Output incorrect
    Incorrect timing (too early, too late, slow, etc.)

System Failure Modes:
    Input value incorrect (logically complete set)
    Output value corrupted (logically complete set)
    Blocked interrupt
    Incorrect interrupt return (priority, failure to return)
    Priority errors
    Resource conflict (logically complete set)
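The element-level part of the analysis just described can be sketched as a simple driver: each black-box element is assessed against the four primary failure modes, the analyst's judgment about the resulting effect is modelled here as a function, and any effect matching a defined software hazard is flagged. All element and hazard names are illustrative.

```python
# The four primary 'black box' failure modes from Figure 4.
ELEMENT_FAILURE_MODES = [
    "fails_to_execute",
    "executes_incompletely",
    "output_incorrect",
    "incorrect_timing",
]

def system_software_fmea(elements, effect_of, software_hazards):
    """For each software element and each Figure 4 failure mode, ask the
    supplied effect model what the output state would be, and flag any
    (element, mode) pair whose effect matches a defined software hazard."""
    findings = []
    for element in elements:
        for mode in ELEMENT_FAILURE_MODES:
            effect = effect_of(element, mode)  # analyst judgment, as a function
            if effect in software_hazards:
                findings.append((element, mode, effect))
    return findings
```

In practice the `effect_of` step is the analyst's (or the team's) assessment, not an executable function; modelling it this way only makes the bookkeeping of the analysis explicit.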
To perform the system level software analysis, the analyst assesses, for each element, the
effect on the software of the four primary failure modes and any appropriate additional
failure modes. The effect of the failure mode on the software outputs is then compared to
the previously performed software hazard analysis to identify potentially hazardous
outcomes. If hazardous software failure events are identified, the analyst then needs to
identify the previously defined software safety requirement which has not been adequately
implemented in the design. If the potentially hazardous failure mode cannot be traced to an existing
requirement, the analyst needs to develop additional software requirements which mandate
the needed protection. In addition to the failure modes for each software element, the
analyst assesses the effect of each of the system level software failure modes on the
software outputs and compares the effects against the software hazards and software safety
requirements.
The system level software FMEA should be documented in a tabular format similar to that
used for hardware FMEAs. Tabular FMEA documentation techniques are well developed
in most organizations and familiar to the design engineering staff. Tabular documentation
techniques also allow extensive, free form, commentary to be provided as a part of the
failure effect documentation. The ability to provide extended commentary on the software
design and design requirements is crucial to allowing software engineers to understand the
FMEA results and the needed design changes. In many organizations, software engineers
can respond effectively only to a requirements-based presentation of results.
2.6 Detailed software FMEA
Detailed software FMEA is used to validate that the implemented software design does
achieve the safety requirements which have been specified for the design, providing all
needed system protection. Detailed software FMEA is similar to component level hardware
FMEA. The analysis is lengthy and labor intensive. The results are not available until late
in the design process. Thus, detailed software FMEAs are mostly appropriate for critical
systems with minimal or no hardware protection of memory, processing results, or
communications. For large systems with hardware provided protection against memory,
bus, and processing errors, detailed software FMEA may be difficult to economically
justify.
Detailed software FMEA requires that a software design and an expression of that design in
at least pseudo code exist. Implicit in this requirement is the existence of software
requirements documentation, top level design descriptions, and detailed design descriptions.
Final implemented code may not be necessary if the software elements are described in
pseudo code and the software development process provides adequate assurance that the
implemented design matches the pseudo code description of the detailed design
documentation. To perform the analysis, the analyst postulates failure modes for each
variable and each algorithm implemented in each software element. The analyst then traces
the effect of the postulated failure through the code and to the output signals. The resultant
software state is then compared to the defined software hazards to allow identification of
potentially hazardous failures.
If the software hazard analysis has previously been completed to support system level
software FMEA, the first step in the detailed software FMEA is development of a variable
mapping. The analyst will need to develop, or have produced by automated software
development tools, a mapping which shows which variables are used by each software
module and whether the variable is an input variable, an output variable, a local variable, or
a global variable. As a part of the variable mapping, the analyst needs to clearly identify the
source of each input variable and the destination(s) of each output variable. This mapping
will be used to allow the analyst to trace postulated failures from the originating location to
the output variable set.
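A variable map of this kind is straightforward to represent directly. In the sketch below the module names, variable names and role labels are invented for illustration; the two helper functions answer the source and destination questions the text describes.

```python
# Invented example of a variable map: for each software module, which
# variables it uses and in which role.
variable_map = {
    "read_sensors":  {"inputs": ["adc_raw"],      "outputs": ["sensor_vals"],
                      "locals": ["i"],            "globals": []},
    "calc_control":  {"inputs": ["sensor_vals"],  "outputs": ["control_vals"],
                      "locals": ["err"],          "globals": ["gain"]},
    "write_outputs": {"inputs": ["control_vals"], "outputs": ["dac_cmd"],
                      "locals": [],               "globals": []},
}

def producers(varmap, variable):
    """Source of a variable: every module that lists it as an output."""
    return [m for m, roles in varmap.items() if variable in roles["outputs"]]

def consumers(varmap, variable):
    """Destination(s) of a variable: every module that lists it as an input."""
    return [m for m, roles in varmap.items() if variable in roles["inputs"]]
```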
Once the variable map is complete, the analyst should develop software 'threads' for the
processing being analyzed. The software threads are mappings from an input set of
variables through the various processing stages to the system output variables. The
software threads will assist the analyst in rapidly tracing postulated failures to system
variables and effects. Definition of the software 'threads' will often be available from the
software design team through existing design documentation or as a defined output of the
automated design tools being used by the design team.
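Given input and output roles for each module, a software thread is essentially a forward reachability computation over the variable map. A hedged sketch, assuming a map shaped like the one described in the text:

```python
def trace_thread(varmap, start_var):
    """Follow a postulated failure in `start_var` forward through the
    variable map, collecting every downstream variable it can reach
    (one software 'thread' from input toward the system outputs)."""
    reached, frontier = set(), [start_var]
    while frontier:
        var = frontier.pop()
        if var in reached:
            continue
        reached.add(var)
        # any module consuming this variable may corrupt its outputs
        for module, roles in varmap.items():
            if var in roles["inputs"]:
                frontier.extend(roles["outputs"])
    return reached
```

This forward trace is what lets the analyst move quickly from a postulated variable failure to the affected system output variables.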
To perform the detailed software FMEA, the analyst next needs to develop failure modes
for the processing algorithms as they are implemented in each module. The algorithm
failure modes are unique to each software development. A logically complete set of failure
modes for each of the variable types also needs to be developed. Reference 1 provides a
description of the straightforward process used to develop variable failure modes for simple
variable types: boolean, enumerated, real, integer. Development of a logically complete set
of variable failure modes for more complex variables will need to be done based on the
specifics of the language in use and the compiler implementation. Since the primary
purpose of postulating failure of each variable is to assess the impact of memory failures in
processing platforms which do not have effective memory protection, a detailed knowledge
of the underlying storage scheme is required. For high order languages, it may be necessary
to obtain the needed implementation details from the developer of the compiler and from
the language specification.
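For the simple variable types, a failure-mode generator might look like the following. The specific mode lists are our own plausible rendering of the approach attributed to reference 1, not the paper's actual lists, and real lists would depend on the language and compiler in use.

```python
def variable_failure_modes(var_type, values=None):
    """Illustrative 'logically complete' failure-mode sets for simple
    variable types (boolean, enumerated, integer, real). The lists are
    invented examples of the kind of set the analysis requires."""
    if var_type == "boolean":
        return ["stuck_true", "stuck_false", "inverted"]
    if var_type == "enumerated":
        # corruption to each legal value, plus an illegal bit pattern
        return [f"corrupted_to_{v}" for v in (values or [])] + ["illegal_value"]
    if var_type in ("integer", "real"):
        return ["value_too_high", "value_too_low", "stale_value"]
    raise ValueError(f"no failure-mode set defined for {var_type!r}")
```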
Once the variable and algorithm failure modes have been developed, the analyst can
perform the detailed software FMEA. For each module, algorithm failures are postulated,
the effect traced to the module outputs and in turn to the software system output variables
using the software threads and the variable map. The system variable effects are then
compared against the software hazard analysis to determine whether or not the postulated
failure could lead to a system hazard. The analyst then postulates failures for each of the
variables used in the module and traces the effects to the system outputs and the defined
software hazards in a similar manner. The detailed software FMEA process is analogous to
the component level hardware FMEA process except that variables and the variable map
substitute for the signals and signal paths of electronic hardware.
If the detailed FMEA identifies failure modes which trace to the defined software hazards,
the analyst needs to assess which software safety requirements have not been implemented
correctly, or if one or more requirements are missing. Similar to system level software
FMEA, the most effective way to communicate software design deficiencies is through
identification of those requirements which have not been met.
Documentation of the detailed software FMEA can be either tabular or using the matrix
documentation recommended in reference 1. Matrix documentation provides some
desirable compactness for detailed software FMEA. However, tabular documentation is
more familiar to most design groups and allows extensive commentary to be included. The
choice of documentation style can be left to the preference of the individual analyst or
analysis team.
2.7 Analysis limitations
Software FMEA can provide insight into the behavior of safety critical software intensive
systems, particularly embedded control systems. However, as with all FMEAs, the analysis
cannot provide complete system safety certification. Software FMEA examines the
behavior of the system being analyzed under conditions of software single point failure. In
many cases, the assumption of single point failures may be difficult to fully justify. Many
software failures can be induced by failures in the underlying hardware. For systems with
minimal memory protection, failures in the memory hardware can appear as errors in
variable storage values which can propagate errors through the software into the output
variables and subsequently to system behavior. Single point memory failure assumptions
can be appropriate for processing memory which has been carefully architected to preclude
multiple errors, but they are not a safe general assumption unless the implementation of the
storage is known. The implementation details of the memory circuitry in highly integrated
microprocessors and microcontrollers are likely to be proprietary to the device manufacturer
and unknown to the analyst.
Software FMEA does not provide evaluation of the behavior of a software intensive system
under conditions of unfailed operation. For many control systems, the stability of the
control loop is a crucial parameter in determining safety of operation. Simulation and
modeling are appropriate tools for evaluating control stability. FMEA cannot provide the
needed evaluation of control loop stability under either normal or failed operation.
Similarly, software FMEA provides limited insight into the safety risks associated with
changes in timing due to either software or hardware failures. Timing and sizing analysis
for worst case interrupt arrivals and resource demands may be needed to provide insight
into the effects of some failures postulated during the software FMEA.
3. CONCLUSIONS
Software FMEA has been applied to a series of both military and automotive embedded
control systems with positive results. Potential hazards have been uncovered which could
not be identified by any other analytical approach, allowing design corrections to be
implemented. Additionally, system level software FMEA can be applied early in the design
process, allowing cost effective design corrections to be developed. System software
FMEA appears to be valuable for both small embedded systems and large software designs,
and should be cost effective so long as a mature software design process—one which can
provide needed software design information in a timely manner—is in use. Detailed
software FMEA is appropriate for systems with limited hardware integrity, but may not be
cost effective for systems with adequate hardware protections. For designs with limited
hardware integrity, detailed software FMEA provides an effective analysis tool for verifying
the integrity of the software safety design.
4. REFERENCES
1. Goddard, P. L., 'Validating The Safety Of Real Time Control Systems Using FMEA',
Proceedings of the Annual Reliability and Maintainability Symposium, January 1993.
2. SAE Aerospace Recommended Practice ARP-5580, Recommended Practices For
FMEA, Draft Version, June 1999.
3. Underwriters Laboratory Standard UL-1998, Standard For Safety: Safety Related
Software, First Edition, January 1994.
4. NATO Standardization Agreement STANAG 4404, Safety Design Requirements And
Guidelines For Munition Related Safety Critical Computing Systems, Edition 1.
5. United States Air Force System Safety Handbook SSH1-1, Software System Safety,
5 September 1985.
6. Electronic Industries Association Bulletin SEB6-A, System Safety Engineering In
Software Development, April 1990.
7. Leveson, N. G., Safeware: System Safety And Computers, ISBN 0-201-11972-2, 1995.
8. Deutsch, M. and Willis, R., Software Quality Engineering, ISBN 0-13-823204-0,
1988.
9. Hatton, L., Safer C, ISBN 0-07-707640-0, 1994.
5. BIOGRAPHY
Pete Goddard is currently employed as a Senior Principal Engineer with the Raytheon
Consulting Group in Troy, Michigan. He holds a bachelor's degree in Mathematics from
the University of La Verne, and a master's degree in Computer Science from West Coast
University. Mr. Goddard has published papers in the proceedings of the Annual
International Logistics Symposium, the RAMS Symposium, the AIAA Computers in
Aerospace Symposium, and the INCOSE Symposium. He was the principal investigator for
the 1984 Rome Labs sponsored "Automated FMEA Techniques" research study and was
program manager and part of the research team for the 1991 Rome Labs sponsored
"Reliability Techniques For Combined Hardware And Software Systems" research study.
He is a co-author of "Reliability Techniques for Software Intensive Systems". Mr. Goddard
is an active member of the SAE G-11 Division and is part of the subcommittee on FMEA in
the G-11. He is a member of IEEE and an ASQ member and CRE.
Source: 2000 Proceedings Annual Reliability and Maintainability Symposium, IEEE:
118–123.
RE A D I N G 3 .3
HAZARD AND OPERABILITY (HAZOP) STUDIES APPLIED TO COMPUTER-CONTROLLED PROCESS PLANTS
PAUL CHUNG & EAMON BROOMFIELD
'There is a strong family resemblance about misdeeds, and if you have all the details of a
thousand at your finger ends, it is odd if you can't unravel the thousand and first.'
Sherlock Holmes in A Study in Scarlet by Arthur Conan Doyle
1. INTRODUCTION
Due to the speed and flexibility of computers, there is an increasing use of software in
industry to control or manage systems that are safety-critical. In some cases, as systems
become more and more complex, and faster and faster response time is required, the use of
computer and application software is the only feasible approach. In this chapter a safety-
critical system refers to a system which, if it malfunctions, may cause injury to people, loss
of life or serious damage to property. To ensure the quality of safety-critical systems with
software components, standards and guidelines have been, or are being, produced by
government and professional organizations.
The guidance generally given is that software quality is achieved through rigorous
management of the software life cycle which involves requirement analysis, specification,
design, implementation, testing, verification and validation. Safety assessment is a new
dimension which needs to be added to the life cycle of safety-critical software. For
example, the draft Defence Standard 00–56: Safety Management Requirements for Defence
Systems Containing Programmable Electronics states that, 'The contractor shall identify
hazards and their associated accident sequences, calculate safety targets for each hazard and
assess the system to determine whether the safety targets have been met'. Although safety
assessment has been accepted as an important part of the software life cycle, little help is
given to engineers about when and how to do it. Safety assessment involves two different
activities: hazard identification and hazard analysis. The aim of the former is to identify the
potential hazards that may arise from the use of a particular safety-critical system, and their
possible causes. The aim of the latter is to quantify the risks that are associated with the
identified hazards and to assess whether the risks are acceptable. The focus of this chapter
is on hazard identification.
In the process industry, Hazop (hazard and operability studies) is a long-established
methodology used for identifying hazards in chemical plant design. Some attempts have
been made to modify conventional Hazop for computer-related systems. Modified versions
of Hazop are generally referred to as Chazop (computer Hazop) or PES (programmable
electronic systems) Hazop in the literature.
In this chapter we provide a brief description of the conventional Hazop as used in the
process industry and an overview of the different Chazop frameworks/guidelines suggested
by engineers and researchers over the past few years. The overview shows that there is as
yet no agreed format for how Chazop should be done and that the different approaches were
developed ad hoc. The main emphasis of the rest of the chapter is on a new Chazop
methodology which we have systematically developed and which is based on incident
analysis. We discuss the strategy used to develop the methodology and illustrate the
application of the methodology using examples.
2. COMPUTER-RELATED HAZARDS
Hazards are sometimes caused by system failures, or by systems deviating from their
intended behaviour. System failures can be categorized into two classes:
random failures typically result from normal breakdown mechanisms in hardware; the
reliability based on failure rate can often be predicted in a quantified statistical manner
with reasonable accuracy;
systematic failures are all those failures which cause a system to fail, and which are not
due to random failures.
McDermid has pointed out that, 'software is quite different to hardware in that its only
"failure mode" is through design or implementation faults, rather than any form of physical
mechanism such as ageing'. Therefore, all software-induced system failures are systematic
failures. 'There is some evidence that as the level of complexity [of a system] increases the
proportion of systematic failures increases'.
However, a piece of software in itself is not hazardous. It is hazardous only when it
interacts with equipment that can cause injury to people, loss of life or damage to property.
Therefore safety-critical software should, as far as possible, be:
able to respond to external failures, hardware or human, in an appropriate manner.
This means that the design specification should have no omissions, and every
conceivable problem should be considered and dealt with accordingly;
free from error, so that it will not make any wrong decisions and cause wrong actions to
be taken.
An ideal hazard identification methodology, therefore, should be able to deal with system
design/specification, software implementation and maintenance.
3. HAZOP
Hazop is a methodology developed by ICI in the 1960s for reviewing chemical plant
designs. A Hazop team should consist of a leader who controls the discussion and members
from the production, technical and engineering departments. This is to ensure that the
required expertise for reviewing a particular design is present at the meeting. The team has
an engineering line diagram (ELD) in front of them and the general intention of the system
is explained. To help the team go through the design in a systematic manner, members
review the design section by section, or line by line. Guide words are used as prompts to
help them explore possible causes and consequences of deviations from design intent. For
example, the guide words include: none, more of and less of. The deviations associated
with the guide word none are no flow and reverse flow. The team then consider questions
such as What will cause no flow along this line? and What will cause low level in this tank?
If the cause of a particular deviation is credible and the consequence is believed to be
significant then a change is made to the design or method of operation, or the problem is
considered in detail outside the Hazop meeting. An action may specify that protective
equipment needs to be installed, or detailed analysis of the cause and consequence needs to
be carried out. Thus a Hazop meeting generates a report in the format shown in Table 2.1.
This conventional form of Hazop is carried out when the ELD of a design is completed.
However, delaying hazard studies until the ELD is available means that many major design
decisions will have been made and orders will have been placed. Therefore, changes made
at this stage can be very costly. For this reason ICI introduced two preliminary hazard
studies prior to the ELD stage (which is referred to as Study 3). The purpose of Study 1 is
to ensure 'that the hazardous properties of all the materials involved in the process and their
potential interactions are understood'. Study 2 is carried out when the process flow
diagrams are available. The sections making up the plant—for example, reaction,
scrubbing, distillation, etc—are studied in turn. The approach used is to consider 'top
events', potential hazardous events such as fire, explosion and so on, and to 'identify those
which present a serious hazard, so that an appropriate design can be developed'.
Table 2.1: Conventional Hazop table

Guide word   Deviation                    Possible causes   Consequences   Action required
None         No flow                      …                 …              …
             Reverse flow                 …                 …              …
More         More flow                    …                 …              …
             More pressure                …                 …              …
             More temperature             …                 …              …
             More level                   …                 …              …
Less         (similar to more)            …                 …              …
Part of      Concentration                …                 …              …
Other        Maintenance                  …                 …              …
             Start-up                     …                 …              …
             Shutdown                     …                 …              …
             Extra constituent or phase   …                 …              …
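The guide-word discipline lends itself to mechanical generation of the prompt questions a Hazop team works through. A small sketch, with the guide words and deviations taken from the text above; the line name in the usage example is invented.

```python
# Guide words and their associated deviations, as described in the text.
DEVIATIONS = {
    "none": ["no flow", "reverse flow"],
    "more": ["more flow", "more pressure", "more temperature", "more level"],
    "less": ["less flow", "less pressure", "less temperature", "less level"],
}

def hazop_prompts(line_name):
    """One 'What will cause ...?' question per guide-word deviation,
    applied to a single section or line of the design."""
    return [f"What will cause {deviation} along {line_name}?"
            for deviations in DEVIATIONS.values()
            for deviation in deviations]
```

For example, hazop_prompts("the reactor feed line") yields the question "What will cause no flow along the reactor feed line?" among its prompts, mirroring the team discussion the text describes.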
ICI later added Hazard Studies 4 to 6. Prior to plant start-up, Study 4 is done by the plant
or commissioning manager to check that all actions from previous studies have been carried
out and to review that appropriate procedures for operating the plant are in place. Study 5
involves a site inspection, paying particular attention to means of access and escape,
guarding, provision of emergency equipment, etc. Study 6 reviews changes made during
commissioning of the plant.
An earlier study (Hazard Study 0) is now being introduced. It is carried out at the start of a
project, before the engineering design department is involved, and asks if the right product
is being made by the most suitable route and in the most suitable location.
Two related hazard identification techniques—FMEA (Failure Modes and Effects Analysis)
and FMECA (Failure Modes Effects and Criticality Analysis)—will also be referred to later
in this chapter. In contrast to Hazop, FMEA and FMECA represent a 'bottom up' approach
to hazard identification. They start by focusing on a component and then address the
questions:
what are the modes of failure (that is, what equipment can fail and in which way)?
what are the causes of the failures?
what are the consequences?
FMECA goes further than FMEA by considering the questions 'How critical are the
consequences?' and 'How often does the failure occur?'
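The FMEA/FMECA questions map naturally onto a worksheet row. A sketch with invented example values, where the last two fields correspond to the additional FMECA questions:

```python
from dataclasses import dataclass

@dataclass
class FmecaEntry:
    """One row of an FMEA/FMECA worksheet (illustrative field names)."""
    component: str
    failure_mode: str      # what can fail, and in which way
    causes: list           # what causes the failure
    consequences: list     # what the consequences are
    criticality: str = ""  # FMECA only: how critical the consequences are
    frequency: str = ""    # FMECA only: how often the failure occurs

# Invented example row for a process-plant component.
entry = FmecaEntry(
    component="feed pump",
    failure_mode="fails to start",
    causes=["motor winding failure", "control relay stuck open"],
    consequences=["no flow to reactor"],
    criticality="high",
    frequency="rare",
)
```

Leaving the last two fields empty gives a plain FMEA row; filling them in makes it an FMECA row, which is the whole difference between the two techniques.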
4. COMPUTER HAZOP
As mentioned earlier, because of the successful application and widespread use of Hazop in
the process industry, researchers and engineers are suggesting ways of adapting Hazop to
safety-critical systems. This section describes the results of some of these adaptations of
Hazop. The description is brief. It highlights the different guide words and questions
proposed under different schemes to assist the hazard identification process during Chazop
meetings. Interested readers should refer to the original articles referenced throughout the
section. A general discussion about the different schemes is given at the end of the section.
4.1 Scheme 1
An obvious way of developing a Chazop methodology is to simply replace or supplement
the process-related guide words and deviations with computer-related ones. Burns and
Pitblado have identified two sets of guide words for reviewing computer control systems.
One set is for considering the hardware and logic of the system (see Table 2.2), and the
other is for considering human factors (see Table 2.3).
Table 2.2: PES Hazop guide words and deviations (after Burns and Pitblado)

Guide word   Deviation
No           No signal
             No action
More         More signal
             More action
Less         Less signal
             Less action
Wrong        Wrong signal
             Wrong action
The draft guideline for Chazop produced by the UK Ministry of Defence extends the list of
guide words associated with conventional Hazop with the following words: early, late,
before and after. The words early and late are for considering actions or events relative to
time and the words before and after are for considering the ordering of actions or events.
Table 2.3: Human factors Hazop guide words and deviations (after Burns and Pitblado)

Guide word   Deviation
No           No information
             No action
More         More information
Less         Less information
Wrong        Wrong action
During a Chazop meeting a team will go through a diagrammatic representation of a system
by considering all the links between different components on the diagram. Possible
deviations from design intent are investigated by systematically applying the guide words to
attributes such as dataflow, control flow, data rate, data value, event, action, repetition
time, response time and encoding.
Not all combinations of guide words and attributes are meaningful. The guideline
recommends that 'inappropriate guide words should be removed from the study list during
the planning stage' and 'the interpretations of all attribute/guide word combinations should
be defined and documented by the study leader'. At the discretion of the study leader, new
guide words may also be added.
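The planning-stage screening of attribute/guide word combinations described above can be sketched as follows. The guide words (including the MoD draft guideline's early, late, before and after) and the attributes come from the text; the particular excluded combinations are purely illustrative stand-ins for the study leader's documented judgement:

```python
from itertools import product

# Guide words from the conventional Hazop list, extended with the UK MoD
# draft guideline's timing/ordering words (early, late, before, after).
GUIDE_WORDS = ["no", "more", "less", "wrong", "early", "late", "before", "after"]

# Computer-related attributes to which the guide words are applied.
ATTRIBUTES = ["data flow", "control flow", "data rate", "data value",
              "event", "action", "repetition time", "response time", "encoding"]

# Combinations the study leader has judged inappropriate and removed at the
# planning stage (illustrative choices only, not from the guideline).
EXCLUDED = {("before", "data rate"), ("after", "data rate"),
            ("early", "encoding"), ("late", "encoding")}

def deviation_checklist(guide_words, attributes, excluded):
    """Return the documented list of guide-word/attribute deviations to study."""
    return [(gw, attr) for gw, attr in product(guide_words, attributes)
            if (gw, attr) not in excluded]

checklist = deviation_checklist(GUIDE_WORDS, ATTRIBUTES, EXCLUDED)
print(len(checklist))  # 8 x 9 = 72 combinations, minus the 4 excluded -> 68
```

The point of the sketch is only that the study list is generated exhaustively and then pruned explicitly, so every omission is a recorded decision rather than an oversight.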
Fink et al have devised a set of application-specific guide words and deviations. The
application is a clinical laboratory information system where patient test details are kept.
Access to the system is provided via computer terminals, and it is interfaced to computers
which control large capacity analysers (7000 tests/hr). Patient information, including patient
identity and test request code, is entered into the system and sent to the analysers. Each
sample tube also has a label identifying the patient from whom the sample was drawn.
The guide words used for the Chazop of this system were: no, not, more, less, as well as,
part of, other than, sooner, later, where else, interrupt, reverse, more often and less often.
Example deviations for the guide word no are no label and no operating. Chazop was used
to consider complex and interrelated procedures. A complementary technique, FMECA,
was used to consider individual component failures.
4.2 Scheme 2
In developing guidelines for carrying out Chazop on computer-controlled plants, Andow's
approach is that a Chazop methodology should have the essential ingredients of the
'traditional' Hazop but need not stick rigidly to the format. The ingredients identified as
essential are:
an interdisciplinary team must carry out the study;
the methodology must be based on questions;
the methodology must be systematic.
Andow suggests that Chazop should be done in two stages: preliminary and full. The
purpose of a preliminary Chazop is to identify early in design critical factors that influence
the overall architecture and functionality of the system; it should be carried out as part of an
early Hazop. He recommends that the following be considered at the early stage:
the proposed architecture of the system;
safety-related functions;
system failure;
failure of power and other services.
The full Chazop is to evaluate the design in detail at a later stage. The team should consider
three different aspects of the system:
computer system/environment;
input/output (I/O) signals;
complex control schemes.
A short list of headings and/or questions is provided for each aspect (see Tables 2.4, 2.5 and
2.6).
4.3 Scheme 3
Lear suggests a Chazop scheme for computer control systems which is similar to Andow's
full Chazop. In Lear's scheme the three top level concerns are:
hardware;
continuous control;
sequence control.
In this scheme guide words used for hardware include short- and long-term power supply
failure. It also suggests using the check-list published by the UK Health and Safety
Executive. Examples of guide words/questions relating to continuous control and sequence
control are shown in Tables 2.7 and 2.8.
Table 2.4: Headings and questions relating to computer system/environment (after Andow)
Failure Hardware Question
Gross Whole machine What should happen?
Will the operator know?
What should he do?
Will the failure propagate to
other machines?
Any changes needed?
Random Cabinets, crates, etc (similar to whole machine)
Controller, I/O cards (similar to whole machine)
Communication links (similar to whole machine)
Operator consoles (similar to whole machine)
Power supplies (similar to whole machine)
Watchdog timers (similar to whole machine)
Other utilities (similar to whole machine)
Table 2.5: Headings and questions relating to input/output signals (after Andow)
Signal/actuator Deviation Question
Signal Low Does it matter?
Will the operator know?
Any action required by the
operator or other systems?
Any changes needed?
High (similar to deviation low)
Invariant (similar to deviation low)
Drifting (similar to deviation low)
Bad (similar to deviation low)
Actuator Driven/failure high (similar to signal deviation low)
Driven/failure low (similar to signal deviation low)
Stuck (similar to signal deviation low)
Drifting (similar to signal deviation low)
Table 2.6: Considerations relating to complex control schemes (after Andow)
Scheme consideration Aspects to be considered
Purpose and method of operation Safety-related functions
I/O signals used
Points of operator access Set-points, cascades that may be
made or broken, etc
Limits applied Careful use of limits gives a good
safeguard and/or early warning
Interaction with other schemes Start-up, normal operation, shutdown.
Synchronization and timing issues.
Expected or required operator
actions.
Controller tuning Initialization and wind-up
Relationships with trips and alarms
Action in the event of major plant
upsets
Loss of utilities. Spurious or correct
operation of emergency shutdown
valves.
Protection against unauthorized
modifications
Other Spreading a large scheme over more
than one controller file
4.4 Scheme 4
The Chazop framework used by Nimmo et al for reviewing process plant design also
highlighted three aspects for consideration:
hardware;
software interactions;
the effect software has on the process.
In this scheme, the first stage is to carry out a conventional Hazop on a plant design,
treating the computer as a 'black box' (see Chapter 1, item 4.1, page 17). The next stage is
to re-trace the process route taking into account concerns from the first stage but this time
concentrating on determining how the software will respond under different circumstances.
The third stage is to consider how the software achieves its control actions. The software is
divided into major areas such as sequence control, continuous control, operator
conversations and data links. Key enquiries in the second and third stages revolve around
such questions as:
how will the computer know what it has to do or has already done?
how sensitive is the level of input or output to transmission of the correct action?
what are the potential interactions?
Table 2.7: Considerations for continuous control (after Lear)
System Consideration
Input/output parameters Bad measurement
Transmitter accuracy
Conditioning
Tuning parameters Correct?
Change in process conditions
Entire loop Control philosophy
Safety-related
Performance
Overall system Interaction
Order of tuning/implementation
Training
Table 2.8: Considerations for sequence control (after Lear)
Review stage Consideration
Overall operation Files/reports
What (de)activates the sequence?
Communications
Start-up module Is operator interaction required?
Any areas of critical timing?
Major equipment interactions
Running module (similar to start-up)
Shutdown module (similar to start-up)
Step (a small number of
flow chart symbols)
(similar to start-up)
Final overview Testing
Display of sequence to operator
Training
Nimmo also provides several lists of topics for discussion in a series of Chazop meetings.
The discussion topics are listed under the following headings: the overall plant, the safety
backup system, instrumentation and the PES.
4.5 DISCUSSION
Ideas on how Chazop should be done are still evolving. An emerging consensus is that a Chazop methodology requires a 'total' system view. Software cannot be considered
in isolation. The work by Burns and Pitblado emphasizes the need to assess the logic of the
system and also human factors; Fink et al couple Chazop with FMECA; the frameworks
suggested by the other authors also include hardware, software and the environment in
which they operate.

The main strength of conventional Hazop is that it facilitates systematic exploratory
thinking. The use of guide words and deviations prompts the team to think of hazards
which they would otherwise have missed. However, up to now, attempts made by researchers and engineers to create various Chazop schemes and to derive guide words/headings and questions have been rather ad hoc. Some guide words, headings or questions are obvious, as they appear in different schemes. On the other hand, it is not clear why
some are included and why some are left out. It is difficult to assess the relative merits of
the different schemes as there is very little experience in applying them. The relevance of
various guide words or questions will only become evident through practical applications.
An overview of the above schemes shows that there are different methods of generating and
grouping guide words/deviations and questions. Scheme 1 follows very closely the format
of conventional Hazop. The procedure is based on selecting interconnections in the design
representation. However, it concentrates on identifying hazards rather than operability
problems. New guide words and computing-related attributes are proposed. It is
recognized that the combinations of some of the guide words/attributes may not be
meaningful or may be ambiguous. On the other hand, application-specific attributes are not
likely to be useful in general because safety-critical systems can be very varied.

Schemes 2 and 3 group guide words and questions according to the general categories of
hardware, software, input/output and other considerations. This approach attempts to cover
the total system separately. It is very important, however, to understand and consider the
interactions between different system components in order to identify hazards in a complex
safety-critical system. This approach falls short in this respect.

Scheme 4 makes a strong distinction between hardware and software. However, the
strength of this scheme is that the assessment procedure is geared towards understanding
how the computer will respond to a process deviation and how the computer will control
and affect the process. This scheme provides an interesting way of linking Chazop with
conventional Hazop for the process industry. The problem is that the Chazop scheme as
outlined cannot be applied in the early stages of the design process to identify any potential
problems.

Instead of trying to synthesize a new scheme by merging different schemes or by modifying
a particular scheme, in the next section we consider the systematic development of a new
Chazop methodology based on incident analysis. Our aim is to develop a general Chazop
methodology that will apply to different industrial sectors. Past incidents provide us with a
wealth of information on what can go wrong with safety-critical systems. Our basic premise
is that this information can be organized to provide a structured framework for considering
future applications.

Source: Kletz, T., Chung, P., Broomfield, E. & Shen-Orr, C., Computer Control and Human Error, Institution of Chemical Engineers, Warwickshire, UK, 1995: 45–56.
References omitted.
RE A D I N G 3 .4
USING A MODIFIED HAZOP/FMEA METHODOLOGY FOR ASSESSING SYSTEM RISK
STEVEN R. TRAMMELL & BRETT J. DAVIS
1. REASONS TO USE RISK ASSESSMENT
Many regulatory programs and customer quality and environmental management
expectations have been the impetus for Motorola to institute risk management processes
utilizing both qualitative and quantitative risk assessment techniques. As briefly described
below, in some cases the regulator or customer has prescribed the risk assessment
techniques to be used for risk management, while in other cases there is leeway given to
select a risk assessment technique of choice.
Motorola's experience in the implementation of these risk management activities has
demonstrated the synergistic benefits from cross-functional risk assessments of process
designs and modifications.
Participation by environmental and safety compliance, operations, maintenance and
engineering functions allows for risks to be properly ranked and for agreement on
acceptable levels of residual risk. We have founded a risk assessment "core team" that
facilitates and keeps records of many of the required risk assessments as well as those
initiated by Motorola for process quality assurance and control. For these latter
assessments, the core team has developed a risk assessment technique that is tailored to
effective analysis of a wide range of our processes. The team also keeps the formal records
of risk assessments, ensuring the tracking of best practices and lessons learned.
2. REGULATORY REQUIRED RISK ASSESSMENTS
The United States Environmental Protection Agency's (EPA) Risk Management Program
(RMP) prescribes a risk assessment methodology for listed substances above an established
storage quantity threshold. Risk is determined by calculating the "populations potentially
affected" by worst and alternative case releases of gases and vapors. In this risk assessment,
risk is essentially equated to consequence alone. Likelihood is not quantified, but the
program attempts to reduce it by mandating the development of release prevention and
response plans.
The United States Occupational, Safety and Health Administration's (OSHA) Process
Safety Management (PSM) program requires risk assessments, known as hazard analyses,
for listed substances above an established storage quantity threshold. A variety of risk
assessment methodologies are identified as acceptable under the standard, including Hazop
and FMEA. In addition, the program calls for written procedures for management of
change. While Motorola does not have any above threshold processes for either the RMP
or PSM programs, we have accepted our responsibilities under the General Duty Clause of
the RMP program to perform risk assessments on a variety of hazardous chemicals and
wastes, stored in quantities below the RMP and PSM thresholds. OSHA's Voluntary
Protection Program requires Job Safety Analyses (JSA) be performed to ensure that safety
is considered in the development of operational procedures. At Motorola we perform JSAs
to identify hazards and develop procedures or physical system changes required to perform
tasks safely. JSAs are also used to comply with OSHA regulations (29 CFR 1910.132)
requiring employers to base selection of personal protective equipment on a hazard
assessment of the subject work process.
The Uniform Fire Code (UFC) allows the chief to authorize "alternate materials and
methods" that comply with the "intent of the code" (1997 UFC 103.1.2). The Austin Fire
Department (AFD) encourages the use of quantitative risk to compare the level of risk
provided by code compliant design and an alternative. Motorola has used Fault Tree
Analyses (FTA) to accomplish this comparison and successfully demonstrate that an
alternative design is safer than that prescribed by the UFC. AFD has recently implemented
a "distinct hazard" policy prohibiting bulk chemical storage operations that represent a risk
exceeding 1.4 x 10-6 exposed persons per year. This risk equates to the generally accepted
risk from underground storage at a gasoline station. The risk calculation is a function of
consequence determined using a gas dispersion model and population density, and
probability of component failure and fire, using established component failure rates and fire
rates based upon AFD experience. Motorola has developed a spreadsheet that allows an
assessment of whether or not any proposed bulk chemical system will be designated as a
distinct hazard, in which case risk reduction strategies are employed typically to reduce the
likelihood of release.
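The text does not reproduce Motorola's spreadsheet, but the distinct-hazard test it describes can be sketched as a simple product of consequence (exposed persons, from dispersion modelling and population density) and likelihood (component failure and fire frequency), compared against the AFD threshold. The function name and the input figures are illustrative assumptions:

```python
def distinct_hazard(exposed_persons, release_frequency_per_year,
                    threshold=1.4e-6):
    """Flag a proposed bulk chemical system as a 'distinct hazard'.

    risk = consequence (exposed persons) x likelihood (releases per year).
    The default threshold is the AFD criterion of 1.4e-6 exposed persons
    per year quoted in the text; the combination rule is a sketch, not
    Motorola's actual spreadsheet logic.
    """
    risk = exposed_persons * release_frequency_per_year
    return risk, risk > threshold

# Hypothetical system: 120 persons within the modelled plume footprint,
# release-and-fire frequency of 1e-8 per year.
risk, flagged = distinct_hazard(exposed_persons=120,
                                release_frequency_per_year=1.0e-8)
# 120 * 1e-8 = 1.2e-6 exposed persons per year, below the 1.4e-6 threshold,
# so this design would not be designated a distinct hazard.
```

A system that is flagged would then be redesigned, typically to reduce the likelihood of release, and re-checked against the same threshold.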
3. CUSTOMER REQUIRED RISK ASSESSMENTS
ANSI/ISO 14001-1996 requires an annual analysis of potential impacts from
"environmental aspects" of an operation for the determination of environmental objectives.
At Motorola, ranking the impacts using a quantitative risk assessment methodology
prescribed in a Management Systems (MS) document enhances this analysis. Action items
are assigned to environmental staff to reduce the severity and/or likelihood of any impacts
above an acceptability threshold established in the MS document. In addition, formal and
informal processes are in place to identify pending process changes requiring risk
management.
Motorola's semiconductor manufacturing operations are required to be QS9000 certified by
our automotive industry customers. The QS system mandates management of change to
minimize impact to product quality. At Motorola, this objective is accomplished by
performing an FMEA risk assessment on all new or modified processes, including
environmental and safety systems.
4. MOTOROLA REQUIRED RISK ASSESSMENTS
Motorola requires that all semiconductor manufacturing equipment that it purchases be
compliant with Semiconductor Equipment and Materials International (SEMI) Safety
Guideline S2, Environmental, Health and Safety Guideline for Semiconductor
Manufacturing Equipment which establishes a risk assessment requirement for a variety of
hazards posed by such equipment. The technique to be used for these risk assessments, in
which hazards are ranked to determine which are acceptable and which require further
mitigation, is prescribed in SEMI S10, Safety Guideline for Risk Assessment.
And finally, for quality assurance of new processes and quality control of process
modifications, Motorola has developed a hybridized Hazop and FMEA technique that is the
primary focus of this paper. The risk prioritization method developed for this technique
allows separate consideration of risks to human safety, the environment, facility or product
damage and business interruption. Because of this multiple functionality, this hybrid
Hazop/FMEA technique has been well accepted by the Environmental, Health and Safety,
Facilities Operations, Maintenance and Engineering, and Manufacturing Operations
functions. Process designs are no longer considered complete until a thorough
Hazop/FMEA has been performed.
5. DEVELOPMENT OF THE HAZOP + FMEA METHODOLOGY
The purpose of developing a risk assessment methodology is to provide a systematic
method to thoroughly review failure modes of complex, interacting system components, and
the effects of failures on the overall system. The methodology must also provide the ability to review effects on safety of personnel, the facility and/or
infrastructure, and on the manufacturing process (ability to manufacture good product).
The addition of the business interruption review element was a logical evolution of the
methodology. Although the analysis method could be applied to individual EHS and
system reliability evaluation efforts, it is clearly evident that much commonality exists, both
in review team members and solution development when reviewing overall effects of failure
events. Accordingly, we realize significant efficiencies when combining EHS and
reliability assessments with regard to utilization of personnel resources.
6. METHODOLOGIES
Several risk assessment methodologies are used within Motorola. The Hazop and the
FMEA are most common, although Fault Tree Analysis has been used for specific
assessment efforts involving fire and building code alternative method submittals. Hazop
has historically been used as a general risk assessment technique on systems to evaluate
potential hazards mainly to personnel and the environment. This method is favored by
many of our design consultants because of its relative ease of use, ability to draw on diverse
expertise and proven track record in the chemical processing industry. Many of the risk
assessments performed by third party evaluators on purchased equipment or packaged
chemical delivery systems are of the Hazop type. The FMEA is the method of choice for
the Reliability and Quality Assurance (R&QA) organizations within Motorola. Although
used mainly for evaluations in the product design phase, process systems and some support
systems within the manufacturing envelope have also been subject to FMEA. The primary
driver for use of this methodology within R&QA is the requirements set by QS9000. All of
our automotive customers require Motorola to comply with the methods within QS9000,
including the requirement to systematically review a system for failure modes.1 Although
FMEA is not mandated, it is the method most preferred by the customer.
7. STRENGTHS AND WEAKNESSES
Hazop is a mature methodology, with system failure mode identification as its strength. Dividing complex systems into smaller, more manageable "nodes" for study, combined with the systematic identification of process parameter deviations, makes for a thorough identification of system failure modes. However, a typical Hazop is not strong or
necessarily effective in prioritization of effects of the failures. Also, a Hazop usually does
not study the relative effectiveness of identified corrective actions. On the other hand, the
QS9000 based FMEA method contains a thorough, semi-quantitative evaluation of effects
of failure modes. By studying and scoring based on severity, occurrence and detection
attributes, the team gains a thorough understanding of the failure mechanism, and more
importantly, insight on determining truly effective corrective actions. The FMEA method
also assists in prioritizing failure mode effects such that resources can be applied more
effectively. Conversely, the FMEA is relatively weak in failure mode identification, as it
does not provide a systematic method of evaluating system deviations (other than reviewing
every individual component and subcomponent of a system). This "bolt-by-bolt" approach
is extremely laborious and can become an extreme challenge to the long-term efficiency of
the study team.
8. HAZOP+FMEA
Historically, certain groups within Motorola's Environmental Health and Safety (EHS) and
Facilities organizations have used both Hazop and FMEA methods with varying degrees of
success. As EHS moved towards a risk-based approach for decision making and as the
importance of facility support systems' reliability grew, both organizations were looking for
techniques that would improve the quality of these studies. It was also observed during a
number of FMEA studies, that the review team struggled with the basic concept of failure
mode identification. The typical component-by-component review was taking a
considerable amount of time, and the teams were becoming frustrated with the fact that the
majority of components assessed had minimal if any impact on the system. Soon the teams
were skipping review of sometimes potentially critical components based solely on the
perception that no potential hazard existed. This led to a "shotgun" type approach to failure
mode identification as the team members picked system components to review based on
personal history or experience. It was clear that a structured approach to system evaluation
was needed. Our experience with Hazop led to the idea that if the failure mode
identification method utilizing the concept of deviations from known or expected process
parameters could be married to the strong scoring mechanism of the FMEA, the overall
methodology could be improved. Documentation of typical Hazop and FMEA studies was
reviewed, and with slight modification of our QS9000 based FMEA spreadsheet, we were
able to develop a documentation scheme which captured results from our Hazop-type
failure mode identification method, while keeping the risk scoring and prioritization method
used in the FMEA.
9. HAZOP AND FMEA METHODOLOGY
The starting point for the Hazop/FMEA process is to obtain a complete set of the piping and
instrumentation diagrams. If the design is still in progress, the FMEA should be delayed
until the design is essentially complete, because the review is more effective when the
design package is fairly complete. A key point in the process is for the facilitator to keep
the team focused on evaluation of the failure modes and to avoid the tendency to try to
"engineer" the corrective actions. Determining improvements to the design has a place in
the FMEA process; however, this should take place in an orderly fashion. The FMEA
process is more efficient if the roles of facilitator and scribe are kept separate.
The challenge of evaluating a complex piping diagram is overcome by breaking the system
into manageable sections. These are typically called nodes for the purposes of the study.
Nodes are sections of the design with definite boundaries, such as line sections between
major pieces of equipment, tanks, pumps, etc. The power of the Hazop lies in identifying
the failure modes through the Hazop deviation. The Hazop utilizes process parameters and
guidewords to systematically identify deviations to the system or failure modes. An
example of a guidewords and process parameters chart is shown in the following:

Hazop Guidewords: No, Less, More, Part of, As Well As, Reverse, Other Than
Process Parameters: Flow, Level, pH, Time, Viscosity, Pressure, Information, Voltage, Addition, Temp., Speed

Deviations to be evaluated would be "no flow", "less flow", "more flow", "reverse flow", etc. As these deviations are identified, the Hazop node and the deviation are logged on the worksheet. Hazop deviations are noted on the FMEA worksheet as potential failure modes. Each of these deviations is reviewed to determine the consequences, which are logged onto the FMEA worksheet as potential effects of failure. The Hazop causes are logged onto the FMEA form as Potential Cause Mechanisms. Note the worksheet in Figure 1.

Figure 1: Hazop/FMEA Methodology Worksheet
FMEA WORKSHEET header fields: Issue; Project Title; Control Number/Issue; FMEA Type (Design/System); Company/Group; Site/Business Unit; Prepared By (Rev.); Core Team.

Worksheet columns:
Process Function/Requirements (Hazop Node/Item)
Potential Failure Mode (Hazop Deviation)
Potential Effect(s) of Failure (Hazop)
SEV
Potential Cause(s)/Mechanisms (Hazop Causes)
OCC
Current Design/Process Controls
DET
RPN
Recommended Action(s)
SEV, OCC, DET, RPN (re-scored after the recommended actions)
The next step in the FMEA evaluation is the rating of the severity, occurrence and detection of the failure modes and effects. The following definitions are used:

Severity: A rating corresponding to the seriousness of an effect of the potential failure mode.

Occurrence: An evaluation of the rate at which a first level cause and the failure mode will occur.

Detection: A rating of the likelihood that the current controls will detect/contain the failure mode before it affects persons, process or the facility.

Each of the nodes of the diagram is evaluated and then rated using the FMEA method. The severity of the "Potential Effect of
Failure", the occurrence of the "Potential Cause Mechanisms" and the detection of the
"Current Design/Process Controls" are ranked by the cross-functional FMEA team. A
typical ranking scale is integer values from 1 to 10. A standardized scoring chart should be
used to maintain consistency. A typical scoring chart is shown in Figure 2.

Figure 2: Hazop and FMEA Scoring Chart
Severity is a rating corresponding to the seriousness of an effect of the potential failure mode. Occurrence is an evaluation of the rate at which a first level cause and the failure mode will occur. Detection is a rating of the likelihood that the current control will detect/contain the failure mode before it affects persons, process or facility.

Rank 1
Severity: No effect on people. No production impact. Process utility in spec. System or equipment or operations failures can be corrected after an extended period.
Occurrence: Failure unlikely in similar processes or products. No Motorola or industry history of failure. <1x10-6 (1 event in 114 years).
Detection: Reliable detection controls are known with similar processes or products. Online instrumentation with automated controls to prevent failure. Example: UPW return divert system automatically activated by low resistivity.

Rank 2
Severity: People will probably not notice the failure. Nuisance effects. No production impact. Process utility in spec. System or equipment or operations failure can be corrected at next scheduled maintenance.
Occurrence: Remote chance of failures. <5x10-6 (1 event in 23 years).
Detection: History with similar processes or products is available. Online instrumentation with trend data indicating potential failure with no automatic controls. Example: Online resistivity with automated data acquisition.

Rank 3
Severity: Slight effects. No injury to people. No production impact. Process utility in spec. Equipment or operations failures to be corrected ASAP.
Occurrence: Very few failures likely. <1x10-5 (1 event in 11 years).
Detection: Controls highly likely to detect the failure mode. Online instrumentation with no trend data or controls to potentially prevent failure.

Rank 4
Severity: Minor effects. No injury to people. No production impact. Process utility in spec. Equipment or operation failure to be corrected immediately.
Occurrence: Few failures likely. <5x10-5 (1 event in 2.3 years).
Detection: Controls likely to detect the failure mode. Advanced predictive maintenance program utilizing SPC to predict failure, or monitoring performed several times daily. Example: vibration analysis, operator rounds.

Rank 5
Severity: No injury to people. No production impact. Process utility out of spec. No tool impact. No product scrap.
Occurrence: Occasional failures. <1x10-4 (1 event per year).
Detection: Controls might detect the failure mode. Preventative maintenance based on daily monitoring and performed less than the average failure frequency.

Rank 6
Severity: No injury to people. Production impact confirmed or likely. Critical process utility out of spec. One or more production tools impacted. Possible product scrap.
Occurrence: Moderate number of failures. <5x10-4 (1 event every 3 months).
Detection: Low likelihood that controls will detect the failure mode (highest reliable human-only based control method). Preventative maintenance program. Example: scheduled lubrication, operator observations or walk by.

Rank 7
Severity: No injury to people. Production outage < 8 hrs. Critical process utility outage < 4 hrs, or severely out of spec < 4 hrs. Product scrap likely.
Occurrence: Frequent failures likely. <1x10-3 (1 event every 1.5 months).
Detection: Slight likelihood that controls will detect failure mode (typical human-only based control). Once-weekly observation by operators or laboratory testing.
Rank 8
Severity: Possible minor injury or regulatory investigation. Production outage < 24 hrs. Critical process utility outage 4–12 hrs or severely out of spec 4–12 hrs. Substantial product scrap likely.
Occurrence: High number of failures likely. < 5 x 10^-3 (1 event per week)
Detection: Controls unlikely to detect the failure mode. Maintenance performed when a problem is indicated. Random or quarterly maintenance program.
Rank 9
Severity: Possible major injury or regulatory action. Production outage < 48 hrs. Critical process utility outage 12–24 hrs or moderate contamination of cleanroom or process utility. Substantial product scrap likely.
Occurrence: Failures certain to occur in near future. Some company or industry history. < 1 x 10^-2 (2 events per week)
Detection: Controls remotely likely to detect the failure mode. No maintenance program.
Rank 10
Severity: Possible severe injury, or regulatory action will occur. Production outage > 48 hrs. Critical process utility outage > 24 hrs or severe contamination of cleanroom or process utility. Substantial product scrap likely.
Occurrence: Certain to occur soon. Significant company or industry history. < 1 x 10^-1 (3 events per day)
Detection: Controls are almost certain not to detect the failure mode. No controls are available, or no practical or scientific method to detect failure.
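As a hedged illustration, the occurrence thresholds in the ranking table can be expressed as a lookup from an estimated failure frequency to a rank. The function name is mine, the per-hour interpretation of the probabilities is an assumption (though it is consistent with the bracketed intervals, e.g. 5 x 10^-5 per hour is roughly one event in 2.3 years), and ranks below 4 are omitted because they fall outside this excerpt.

```python
# Hedged sketch: map an estimated failure frequency (assumed to be events per
# operating hour) to the FMEA occurrence rank whose threshold covers it.
# Threshold values come from the ranking table above; ranks 1-3 are omitted
# as they are not shown in this excerpt.
OCCURRENCE_THRESHOLDS = [
    (5e-5, 4),   # few failures likely
    (1e-4, 5),   # occasional failures
    (5e-4, 6),   # moderate number of failures
    (1e-3, 7),   # frequent failures likely
    (5e-3, 8),   # high number of failures likely
    (1e-2, 9),   # failures certain to occur in near future
    (1e-1, 10),  # certain to occur soon
]

def occurrence_rank(freq_per_hour):
    """Return the lowest rank whose upper threshold covers the frequency."""
    for threshold, rank in OCCURRENCE_THRESHOLDS:
        if freq_per_hour < threshold:
            return rank
    return 10  # beyond the last threshold: treat as the maximum rank

print(occurrence_rank(7e-5))  # between 5e-5 and 1e-4, so rank 5
```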
Each of the parameters is ranked and the rankings are multiplied together: the Risk Priority Number (RPN)
is the product of the Severity, Occurrence and Detection rankings. The RPN values should be
used to rank-order the concerns in the process in Pareto fashion. The resulting RPNs are
evaluated for recommended actions that could reduce the calculated risk through corrective
actions. Corrective action should be directed at the highest-ranked RPN. Effort should be
applied to identifying positive corrective actions that minimize risk from the failure mode by
eliminating or controlling the potential cause mechanisms. The effect of the recommended
actions can then be re-evaluated for Severity, Occurrence and Detection and the resulting
RPN noted. Properly applied, the FMEA ranking method is an iterative continuous-improvement
process that can be used to minimize the system risk.
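The RPN arithmetic and Pareto ordering described above can be sketched in a few lines. The failure-mode names and their rankings below are hypothetical examples, not data from the reading.

```python
# Hedged sketch of the RPN calculation described above: RPN is the product of
# the Severity, Occurrence and Detection rankings, and failure modes are then
# rank-ordered in Pareto fashion so corrective action targets the highest RPN.
# All failure modes and rankings here are hypothetical examples.
failure_modes = [
    {"mode": "pump seal leak",   "severity": 7, "occurrence": 5, "detection": 6},
    {"mode": "filter blinding",  "severity": 4, "occurrence": 8, "detection": 3},
    {"mode": "valve fails open", "severity": 9, "occurrence": 3, "detection": 7},
]

# Compute RPN = S x O x D for each mode.
for fm in failure_modes:
    fm["rpn"] = fm["severity"] * fm["occurrence"] * fm["detection"]

# Pareto ordering: highest RPN first, so the worst concern heads the list.
for fm in sorted(failure_modes, key=lambda fm: fm["rpn"], reverse=True):
    print(f'{fm["mode"]}: RPN = {fm["rpn"]}')
```

Re-scoring after a corrective action is just a matter of updating the rankings and recomputing, which mirrors the iterative use of the method described above.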
10. CONCLUSION
Multiple assessments using the Hazop+FMEA methodology have been performed to date.
In all cases, the diverse teams of EHS, Facilities, Maintenance, Engineering and
Manufacturing worked well and efficiently with the method. It was noted that about 15
minutes of method description with simple worked examples was enough to orient the
team to the method. Within an hour of the meetings, all team members were fully engaged
and participating in the review. One key to maximizing effectiveness was the presence of a
strong facilitator familiar with the methodology and a dedicated scribe recording the results.
Another key to the success of the method was the previous familiarity of most manufacturing
personnel with the QS9000 FMEA method. This "automatic" buy-in to the scoring criteria
resulted in minimal debate on the validity of the method.
NOTE
1. "Potential Failure Mode and Effects Analysis (FMEA) Reference Manual",
ASQC/AIAG, Second Edition, Feb 1995.
Source: Proceedings of Engineering Management for Applied Technology (EMAT)
2001, 2nd International Workshop, 16–17 August: 47–53.
READING 3.5
PRELIMINARY SAFETY ANALYSIS
GEOFF WELLS, MIKE WARDMAN & CRIS WHETTON
Various major safety studies are carried out at appropriate stages during a project. Many
companies do some form of preliminary analysis at points between initial project concept
and when the process design is completed. These studies aim to ensure that the decisions
on process design and site selection take full account of process safety requirements and
related risk and environmental constraints.

Methods have been incorporated and developed during this work to take account of best
industrial practice for such safety studies. These are listed under the general heading of
preliminary safety analysis (PSA) and are carried out from the time of the concept safety
review until such time as reasonably firm process flow diagrams or early P & I diagrams
are available. The methods included are as follows:
concept safety review (CSR)
critical examination of system safety (CE)
concept hazard analysis (CHA)
preliminary consequence analysis (PCA)
preliminary hazard analysis (PHA).

These have been developed from a model of the plant and its interpretation as part of an
incident scenario. The emphasis throughout is on utilizing the best points to start the
search to identify undesired events contributing to the development of accidents.

For the main method described, preliminary hazard analysis, this search has as its starting
point and fulcrum the 'dangerous disturbances of plant' which arise at a point in the
incident scenario just after emergency control measures have failed to control the
situation. The study should be conducted using risk evaluation sheets which model each
stage of the incident scenario and allow for a short-cut assessment of risk when this is
desired.

The above methods are demonstrated by part of a simplified case study. The methods
function well and provide not only a good model of incident scenarios but are readily
developed into fault and event trees and operating procedures. They are invaluable for the
development of safety reports for regulatory authorities. Furthermore, by not imitating
HAZOP methods they strengthen the effectiveness of the search process.

THE PURPOSE OF PRELIMINARY SAFETY ANALYSIS
Preliminary safety analysis is a systematic approach to the identification of potential
hazards and hazardous conditions which is carried out at an early stage of the design of the
plant, before the commencement of detailed engineering (except for specially selected
items). It aims to make safety objectives more readily attainable by subsequent design,
engineering, realization, commissioning and production methods. It suggests ways to
challenge the design and encourages an understanding of the consequences of failures as
well as identifying the principal incident scenarios stemming from deviations from normal
or expected behaviour.
The objective of a preliminary safety analysis is not to identify all possible scenarios and
initiators of incidents.1 It is to consider any impact (either safety, health or environmental)
which the project may have either on-site or off-site and identify significant hazards.
Special attention is paid to loss of containment leading to a significant release of material
which can have major consequences, usually resulting in harm or damage to the system and
its total environment. The preliminary safety analysis should also identify those changes to
process conditions which could lead to an adverse discharge leading to the consent levels
for gaseous, liquid or solid effluents being exceeded. Where the project can create
significant on-site or off-site impacts, then the risk of such consequences should be
evaluated and compared with appropriate criteria in order to determine whether further
action must be taken to reduce the risk or abandon the project in its present form. In some
cases a quantified risk analysis should be completed.
Concept safety review follows or is incorporated in the review of the scope of the project
and the means for an early assessment of safety, health and environmental hazards. It links
in with other project work beginning at this time and contributes to key policy decisions
such as siting and preferred route.
A concept hazard analysis is used for the identification of hazard characteristics to identify
areas which are recognized as being particularly dangerous from previous incidents. It also
identifies the need to explore any difficulties which might be experienced with unwanted
reactions. As well as identifying environmental damage, the analysis may also consider
whether the proposal fulfils the 'green' policies of the company.
A critical examination of system safety is used either to eliminate or to reduce the possible
consequences of a hazardous event by an early study of the design intent of a particular
processing section. This should be carried out at an early stage and well before the process
design is completed.
A preliminary consequence analysis can be used to identify likely major events. Such
studies assist in the selection of the site if this is a required project objective. This is an
abbreviated form of preliminary hazard analysis in which gross assumptions are made for
the frequency of events. It enables the major events which may result from the process to
be identified. The event tree section of the HAZCHECK knowledge base provides the
necessary information on the development of incident scenarios.
A review of health hazards should consider measures proposed to prevent employees being
exposed to either chronic or acute health hazards and should be carried out considering
periodic emissions and fugitive emissions.
A preliminary hazard analysis is undertaken to identify applicable hazards and their
possible consequences with the aim of risk reduction, i.e. to reduce the frequency of
significant consequences to an extent that is comparable with project and manufacturing
objectives and which meets the constraints imposed by regulatory and local authorities. It
should be carried out at a stage when change in the design is still possible.
The methods listed above are a compilation of techniques used in industry. Several of these
have been described by Turney 19902 and James 19923. This work has modified the way
they are carried out and has modified the documentation procedure. The technique
developed for preliminary hazard analysis is, as far as we are aware, original.
CONCEPT SAFETY REVIEW (CSR)
At the start of a preliminary safety analysis the analyst and others should carry out a
preliminary concept review. This is carried out as early as possible, sometimes during
process development.
The objectives and scope of the project should be previewed and defined. This should
include general information about the development plan and the plant or object being
analysed. It is particularly important to ascertain the need for a range of options including
process development, available processes and whether these will be licensed, the
availability of alternative sites and modes of transport of raw materials and products, the
availability of experience within the company and site etc. It may be that a particular
project does not require study of all these items and it is as well to make such matters clear
at the start. Subsequently the concept safety review should determine the need for safety
reviews and their timing.
Information should be obtained on the safety, health and environmental hazards of all
chemicals and materials involved in the new process. This should take account of both
individual and collective properties of materials. Helpful information is contained in
regulations such as COSHH and CIMAH in the UK. General appreciation should also be
generated of the main hazards presented by the plant such as fire, explosion and release of
harmful substances such as toxic gases and liquids, effluent, radioactive and corrosive
materials etc.
The study should review information on previous incidents on the plant using both
information available on incidents within the company and its affiliates and information
available from global sources. For a project under development the latter information
should be augmented by studies of the route and incidents affecting plants using related
reactions.
At each site under consideration it is necessary to consider on-site and off-site transport of
raw materials, products and wastes including loading, off-loading, type of transport and
route. The requirements for facilities and services, emergency planning, interaction with
other plants etc. must be examined.
The study should consider all organizational factors affecting the project including the
availability of experienced staff both within the company and at the site. This experience
should be reviewed in terms of general experience, experience of related plants and specific
experience of the plant. Means to overcome any problems should be discussed. The impact
of the plant on the general health and safety management policy of the site should be
identified. Criteria should be established for all safety, health and environmental factors
with which the plant must comply together with relevant company standards, national
legislation and other regulatory approvals and consents. Any effect on the position of the
site with respect to effluents and emissions and status under CIMAH regulations must be
reviewed. General project criteria should be defined including the codes of practice to be
followed and the extent and timing of all safety reviews.
The preliminary concept safety review should be a means by which improvements in design
procedures are made known to the designers and by which it is ensured that current thinking
on ways of improving the design practice is implemented.
CONCEPT HAZARD ANALYSIS (CHA)
The concept hazard analysis must identify the hazardous characteristics of the project. A
hazard has the potential to cause harm, including: ill-health and injury; damage to property,
plant, products or the environment; production losses; business harm and increased
liabilities. Ill-health includes acute and chronic ill-health caused by physical, chemical or
biological agents, as well as adverse effects on mental health. Hazards are system-independent.
They can be split into the categories chemical, thermodynamic, electrical/electromagnetic
and mechanical. Chemicals can be further subdivided into toxics,
flammables, pollutants and reactants. Further lists can be used to identify health hazards. A
hazard is any potential source of threat or potential danger. There is a need to identify
external threats to the system and these include unplanned changes in the plant or its use.
It is important to distinguish between a hazard and a hazardous condition. A hazard is
solely a qualitative term but a hazardous condition includes a quantitative element in its
description of a hazardous state, e.g. the amount of hazardous material used. It is not an
undesired event in itself, but has the potential to induce one or more undesired or dangerous
events. Hazardous characteristics embrace both hazards and hazardous conditions. Hence
when reference is made to hazard identification, it is more often than not the identification
of hazardous characteristics which is of concern. After all a hazard can be identified with
relative ease. It is the impact of a hazard and the frequency of occurrence which is difficult
to estimate.
The structure of a concept hazard analysis
The methodology of a concept hazard analysis is shown in Table 1.
Table 1: Methodology of a concept hazard analysis
Assemble a study team
Define the objectives and scope of the study
Agree a set of keywords
Partition each process flow diagram or block diagram into reasonably-sized sections
Identify the dangerous disturbances and consequences generated by each keyword
Determine if the hazard can be designed out or the hazard characteristics reduced
Determine any protections and safeguards
Determine comments and actions
Report using proforma
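The proforma produced in the final step above might be represented as a simple record, one per keyword per plant section. This is an illustrative sketch only: the field names are assumptions derived from the steps in Table 1, and the example values (drawn loosely from the methanator case discussed later in the reading) are hypothetical.

```python
# Hedged sketch of a concept-hazard-analysis record, one per keyword per
# partitioned plant section, following the steps listed in Table 1.
# Field names and example values are illustrative assumptions, not a
# prescribed proforma.
from dataclasses import dataclass, field

@dataclass
class CHARecord:
    section: str                # plant section from the partitioned flow diagram
    keyword: str                # agreed keyword (e.g. "Thermodynamic hazards")
    disturbance: str            # dangerous disturbance generated by the keyword
    consequence: str            # consequence of the disturbance
    design_out: bool            # can the hazard be designed out or reduced?
    safeguards: list = field(default_factory=list)  # protections and safeguards
    actions: list = field(default_factory=list)     # comments and actions

record = CHARecord(
    section="methanator",
    keyword="Thermodynamic hazards",
    disturbance="over-temperature in methanator",
    consequence="possible runaway reaction",
    design_out=False,
    safeguards=["temperature trip"],
    actions=["confirm trip settings with process engineering"],
)
print(record.keyword, record.actions)
```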
A concept hazard analysis may be commenced at a stage when the block diagrams or a
preliminary process flow diagram are available. It aims to identify the main hazards which
the proposed plant will generate or face. The approach used can vary considerably from a
general identification of hazards to a thorough look at each section of plant. Usually each
section of the plant is evaluated at a preliminary meeting considering the items given in
Tables 2 and 3.
A list of streams and substance characteristics should be prepared beforehand by process
engineering. A brief review of each stream is generally helpful and describes the process.
The report should be updated as actions are taken or resolved with respect to safeguards and
the assembly of further information. As fresh hazardous conditions are identified these can
be incorporated within the record for appropriate action.
Table 2: Keywords
Flammables: Ignition; Fire; Explosion/detonation
Chemicals: Toxicity; Corrosion; Off-specification
Pollutants: Emissions; Effluents; Ventilation
Health hazards: Chemical contact; Noise; Illumination
Electrical/radiation hazards: Electrical; Radiation; Laser
Thermodynamic hazards: Overpressure; Underpressure; Over-temperature; Under-temperature
Mechanical hazards: Structural hazards; Collapse, drop
Mode of operation: Start-up; Shutdown; Maintenance; Abnormal; Emergency
Release of material: Release on rupture; Release by discharge; Fugitive emissions; Periodic emissions; Handling; Entry
Loss of services: Electricity; Water; Other services
External threats: Accidental impact; Drop/fall; Act of God; Extreme weather; External interference; Loosening/vibration; Vibration; Sabotage/theft; External energetic event; External toxic event; External contamination; Corrosion/erosion
Table 3: Keywords in concept hazard analysis
The keywords in Tables 2 and 3 are related to specific hazardous events. The perceived
dangers are noted together with suggestions for safeguards (the latter denoting a general aim
rather than an actuality). Appropriate comments are added for action. As well as
identifying general hazards the opportunity is taken to add any specific hazards for which
the equipment has previously given problems. Various companies use different keywords
and additional ones include off-specification, fire, effluents, loss of services etc.
An example of a concept hazard analysis is applied to the methanator section of a hydrogen
plant in Table 4. An early P&I diagram of this plant is given in Figure 1. The process
involves removing small quantities of oxides of carbon from a hydrogen product by reaction
with hydrogen at 400°C and 20 bar. Some companies may prefer at this point to use
HAZOP keywords to highlight further problem areas. Such actions are more likely to be
taken if this study is carried out as a form of preliminary hazard analysis. Such action is not
recommended as it is important to use alternative search procedures at different stages in
project development.
The documentation shown here is more extensive than that independently developed at
BNFL.3 These simply document keywords, discussion and action/recommendations. This
approach has the advantage of speed and is particularly recommended when the initial
information is scanty and one objective is to give advice to the designer team.
The study undertaken at this stage will vary considerably according to the knowledge which
the participants have about the process. Many projects considered by industry are
modifications to process plant, costing up to £1 million (1992 values). For these
considerable information will be available. In other projects the study can be used to
transfer information from process licensers etc. In the case of a development project the
study can highlight key safety areas requiring further study. This is important to determine
whether both a concept hazard analysis and a preliminary safety analysis are required.
CRITICAL EXAMINATION OF SYSTEM SAFETY
At some stage it is important to review the design seeking radical change to improve safety.
A critical examination of system safety is one such means of tackling the problem.
Method study became widely used in the 1960s. Numerous courses were run to give
information on how to conduct the critical examination of any problem. The initial
questions aimed to resolve 'what, when, how and where?' relating to a particular activity or
operation. The answer to each of these questions was further probed by asking 'why, why
then, why that way, why there?' etc. There was also emphasis on the use of brainstorming
to generate alternatives.
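The questioning pattern described above pairs each primary question with a matching "why" probe. A trivial sketch makes the pairing explicit; the question lists themselves are taken from the text, and expressing them in code is purely illustrative.

```python
# Hedged sketch of the method-study questioning pattern described above:
# each primary question ("what, when, how, where") is probed by a matching
# "why" question. The pairings follow the text.
primary = ["what?", "when?", "how?", "where?"]
probes = ["why?", "why then?", "why that way?", "why there?"]

for question, probe in zip(primary, probes):
    print(f"{question} -> {probe}")
```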
Critical examination seeks to reveal the problem and its formulation. The argument is
made that only when designers understand the reason why they are being asked to produce a
solution are they really likely to solve the problem. Here a revised approach is suggested
for critical examination, which differs from that used by Elliott and Owen4 in its aims and
rigour. The emphasis is on process safety, if possible, without the need for add-on safety.
The need for rigour is reduced as criteria are subsequently evaluated by other safety studies.
The only deviations considered under how the task might be accomplished are major
disturbances affecting plant safety.
Table 4: Concept hazard analysis
Table 4: Concept hazard analysis (continued)
Figure 1: A P&I diagram of the methanator section of a hydrogen plant, to which concept hazard analysis was applied
The method
Examples of the method are given in Table 5 and these should be consulted to ascertain the
format to be used. The first feature of the method is to write down a statement of the design
intent describing clearly what is to be done or achieved and how this is to be accomplished.
Individual statements may be necessary for some processes or task activities covering all the
what, when, how, where and who questions of the proposal. If the plant is not in normal
operation for the purpose of the study then this must be stated, identifying in minimum
detail the change of state achieved by an operation, reaction or activity. This usually
indicates the operating conditions and equipment involved but not the full details. These
are made available to the analyst in other documents. A similar statement is subsequently
added indicating any dangerous condition, here defined as one leading to a dangerous
disturbance of plant.
Each significant aspect of the achievement is then probed by querying the proposal or
existing facts and its purpose. The aim is to expose the strengths and weaknesses of the
present situation. The emphasis is on how to avoid the dangerous conditions noted and not
on how to improve the process economics etc. Such conditions should be those which are
essentially a function of the process and its structure rather than a list of standard features
which are automatically checked (for example, the loss of lubricating oil to a compressor).
Alternatives are then generated. Some keywords with which to systematically associate
each significant part of the achievement are given in Table 6. Doubtless other effects than
those noted can be generated. However, the important matter is that a structure is given to
aid the generation of possible improvements.
For a safety study it is important to examine how the proposal is achieved, paying particular
attention to the following:
materials: change the quantities or qualities/use extra or different materials
method: change the operating conditions or activities/change the route and method of
processing/change the sequence, frequency, absolute time or duration
equipment: use different equipment.
The impetus for change should be to make the frequency of a major incident less likely and
to lessen the consequences of such an incident.
The technique, when applied in this manner, ensures that an attempt has been made to
improve the inherent safety of the proposed system by using a formal procedure rather than
leaving it as a matter for consideration by individuals.
It is also essential to study any dangerous condition and its cause. These should be readily
identifiable from an equipment knowledge base or the knowledge of the process engineer.
Then the keywords are used to effect analysis. Alternatives or modifications can be
suggested. The analyst should try to avoid only recommending measures to control the
situation or shutdown plant. These should be a back-up only to other protective barriers.
There is no reason to complete the study of both sections independently. The dangerous
condition affects the decisions made on how the process should be achieved and vice versa.
Table 5: Critical examination of methanator section
Table 6: Critical examination: keyword dictionary
Eliminate
    Eliminate by a completely different method or part of a method
    Eliminate certain chemicals, change the route, use a lean technology
    Eliminate additives, solvents, heat exchange mediums
    Change the equipment or processing method
    Eliminate leakage points; use a weld not a bolted fitting, etc.
    Eliminate a prime mover or heat exchanger or agitator
    Eliminate a separation stage or step
    Eliminate intermediate storage
    Eliminate an installed spare
    Eliminate manual handling
    Eliminate sneak paths, openings to atmosphere
    Eliminate waste
    Eliminate entry into vessels or disconnection
    Eliminate products that are harmful in use
    Eliminate an ignition source, particularly a permanent flame
Avoid
    Avoid extremes of operating conditions
    Avoid operating in a flammable atmosphere
    Avoid possible layering of materials, inadequate mixing
    Avoid flashing liquids, particularly in extensive heat exchanger networks
    Avoid production of large quantities of dangerous intermediates
    Avoid unwanted reactions in and outside reactors
    Avoid operating near extremes of materials of construction
    Avoid operating conditions leading to rapid deterioration of plant
    Avoid maintenance on demand and in short time periods
    Avoid items of plant readily toppled by explosions
    Avoid a stage, step or activity by doing something as well as or instead of it
Modify
    Modify any of the topics above
    Modify batch operation to continuous operation or vice versa
Alter
    Alter the composition of waste, emissions and effluents
    Alter the sequence or method of working
    Alter the time or duration of an activity (faster/slower, earlier, later?)
    Alter the frequency of an activity (more/less, why then?)
    Alter the quality, quantity, rate, ratio or speed of any part of an operation or activity
    Alter who does an activity (why them? more/fewer people?)
Prevent
    Prevent emissions and exposure by totally enclosed processes and handling systems
    Prevent exposure by use of remote control
Increase
    Increase heat transfer and separation efficiency or capacity
    Increase conversion in reactions
Reduce
    Reduce inventory: less storage, hold-up, smaller equipment, less piping
    Reduce the amount of energy in the system
    Reduce pressure and temperature above ambient
    Reduce emissions and exposure by improved containment, piped vapour return, covers, condensation of return, use of reactive liquids, wetting dust
    Reduce frequency of opening, improve ventilation, change dilution or mixing
    Reduce the size of possible openings to atmosphere
Segregate
    Segregate by distance, barriers, duration and time of day
    Segregate plant items to avoid certain common-mode failures
    Segregate fragile items from roads, etc.
Isolate
    Isolate plant by shutdown systems and emergency isolation valves
Improve
    Improve plant integrity, reliability and availability
    Improve control or computer control; use user-friendly controls
    Improve response
    Improve quality of engineering, construction, manufacture and assembly
PRELIMINARY CONSEQUENCE ANALYSIS
A preliminary consequence analysis of major incidents examines the impact of what might
occur on a particular process plant. It is usually carried out as soon as a description of the
process flow diagram is available. If the site is to be selected it may be done very early.
Such a study may well only consider pipe breaks and common leaks. The analysis can be
carried out following critical examination before a decision is made to proceed with more
extensive design. Although here the emphasis is on plant, it is necessary to do similar
studies on the transport of raw materials and products.
Process information
In order to ascertain the problems, it is necessary to identify the proposed site and
approximate layout of the plant. The basic information required is listed below and some of
this information is subsequently transmitted to regulatory and planning authorities when
required.
Information should be obtained on the nature and scale of the use of dangerous substances
at a site and how the proposed activity fits in with the existing requirements of regulatory
bodies, local authorities, river authorities, etc. (See the preliminary concept safety review.)
This information is also required on every dangerous substance involved in the activity.
This should indicate the concentrations of those materials likely to be present and the names
of the main impurities. Inventory levels of vessels are required and the analyst requires
information on the possible impact of any hazardous chemicals on people and the
environment.
Information normally noted about a major hazard installation is given in the CIMAH
regulations5 and includes the following items:
A map of the site and its surroundings, to a scale large enough to show any features that
may be significant in the assessment of the hazard or risk associated with the site. If the
environment is at risk then it may be necessary to show the site and surrounding area on
a scale that is large enough (1:100 000) to show all the significant features of the natural
and built environment.
A scale plan of the site identifying the location and quantities of all significant
inventories of the dangerous substances.
A description of the process or storage involving the dangerous substance, its inventory
and an indication of the conditions under which it is normally held.
The maximum number of persons likely to be present on site.
Information about the nature of the land use and the size and distribution of the
population in the vicinity of the industrial activity to which the report relates.
The general information should be sufficient to enable any external threats to the plant to be
identified including adjacent plants, major hazard sites in the locality, roads etc.
Information on effluents, noise, risk etc., should be assembled. This data should be
supplemented by information on the arrangements for safe operation of the site and the new
activity, the emergency planning requirements and the requirements for additional expertise
for the operation of the plant. A safety audit of the management and organization should be
carried out, if not carried out earlier for other projects.
Preliminary consequence analysis of major hazards
The preliminary consequence analysis of major hazards will not give an accurate
assessment of the frequency of any incident or the measures used to control or avoid the
release. It should, however, consider ways of dealing with the resulting emergency and
instigating the emergency response.
The report should at this stage concentrate on the response to the emergency rather than
countermeasures to a specific release. However, due attention must be given to the possible
escalation of the incident, including escalation as a result of mitigating efforts, such as
fighting fires. The main factors to be considered in modelling the behaviour and
impact of a substance on release are:
release size, phase and properties
duration of release
weather and terrain
probability of ignition and explosion
probability of escape
probability of persons evacuated
duration of exposure
population density
proportion of persons indoors
building ventilation rates.
For preliminary studies, where no danger arises outside the plant boundaries, it is often
sufficient to consider general values only.
Hazardous events and their impact
The main hazardous events that should be considered are as follows:
fire: flash fire, pool fire, torch fire
explosion: confined chemical explosion, dust explosion, physical explosion, BLEVE
(boiling liquid expanding vapour explosion), vapour cloud explosion
release of missiles
release of toxic materials to humans, water, land, flora or fauna
release in a form liable to cause normal accidents.
It is particularly important to identify the worst accident which might occur such as the
largest release of toxic gas, the most severe contamination of an aquifer and the greatest fire
or explosion. This is required for emergency planning purposes.
Accurate assessments of damage and harm are difficult especially for a toxic release as the
basic toxicology data is generally not based on the effects on humans. On top of this
inaccuracy is the probability of mitigation. On detection of a leak about 80% of persons in
the immediate vicinity are likely to escape but 20% will act inappropriately or have no
opportunity to escape. For a toxic release the general advice is to find shelter (not cars) and
evacuation is usually only worthwhile in the event of a change in wind direction during
prolonged release, or for cases where there is a progressive warehouse fire. This is due to
there being little or no opportunity for either plant management or local services to
influence the chances of escape.
The impact of an explosion is more readily assessed apart from the likelihood of ignition.
Escape action is generally obvious for trained personnel. For a BLEVE there is a high
probability of escape; a probability greater than 0.5 when the time from initial release to
BLEVE is 20 minutes or more. For delayed ignition of a flammable cloud only early
escape action by individuals is relevant. In the event of a conventional fire the aim should
be to escape immediately, closing any doors in buildings on escape. Also the heat radiating
on doors should be checked before opening doors. Unfortunately people act
inappropriately on such events, as the King's Cross Underground fire has displayed.
Damage and harm must be considered with respect to people, property and the environment,
paying particular attention to the following cases for major hazards:
on-site: at least three people suffering death, or at least five people suffering injury
requiring first aid treatment or hospitalization
off-site: at least one person suffering death, or at least five people being physically and
directly affected
damage to property and sites of historical or archaeological interest and buildings given
statutory protection against deliberate change or damage
loss of normal occupancy of property for three months
permanent or long-term damage to water, land, flora or fauna in a significant area of
terrestrial, freshwater or marine habitat.
It should also be noted how the business will be affected by any incident, considering loss
of production or market share, legal liabilities and costs including damages paid in civil
actions, and the knock-on effects on other business interests at local, national and
international level.
Simplified consequence analysis
The sources of major accidents are as follows:
failure of vessels giving either an instantaneous loss or a continuous loss for 30 minutes,
normally assuming connected pipework
pipe breaks
the loss of process material by discharge through an abnormal opening, or an adverse
change in a normal product, discharge or vent.
A simplified consequence analysis can be carried out assuming typical leak areas and using
historical data for the frequency of failures of pipes, flanges and seals. For a selected leak
the consequences can be estimated using appropriate computer software. Obviously these
results are most readily interpreted if the consequence analysis tool plots appropriate
contours over the site and plot plan. Alternatively qualitative consequences can be
expressed based on the experience of analysts or industry. General values for flammable
releases (allowing for different size of a leak) can be taken for the probability of ignition
and for explosion in the event of ignition. Event trees branch outwards according to
different scenarios, consequently for overall reporting it is important to develop a list of
accidents seen as TOP events. Part of a preliminary consequence analysis is given in
Table 7. At a later stage this can be amplified by preliminary hazard analysis and further
branching questions introduced to examine failure to mitigate or escape in more detail.
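The event-tree branching described above can be sketched numerically. The function below distributes a release frequency across typical outcomes for a flammable release; the frequency and branch probabilities shown are illustrative assumptions, not values from the text:

```python
def flammable_release_event_tree(release_freq, p_imm_ign, p_del_ign, p_expl):
    """Distribute a release frequency (per year) across event-tree outcomes.

    Branch order: immediate ignition -> jet/pool fire; otherwise delayed
    ignition -> vapour cloud explosion or flash fire; otherwise safe dispersal.
    """
    delayed = release_freq * (1.0 - p_imm_ign) * p_del_ign
    return {
        "jet/pool fire": release_freq * p_imm_ign,
        "vapour cloud explosion": delayed * p_expl,
        "flash fire": delayed * (1.0 - p_expl),
        "safe dispersal": release_freq * (1.0 - p_imm_ign) * (1.0 - p_del_ign),
    }

# Assumed values: release 1e-3/yr, P(immediate ignition) 0.1,
# P(delayed ignition) 0.2, P(explosion given delayed ignition) 0.4.
outcomes = flammable_release_event_tree(1e-3, 0.1, 0.2, 0.4)
```

A useful consistency check is that the outcome frequencies sum back to the release frequency, since the branches are exhaustive and mutually exclusive.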
Table 7: Preliminary consequence analysis
PRELIMINARY HAZARD ANALYSIS (PHA)
A preliminary hazard analysis is structured in a similar manner to a HAZOP study.
However, it is usually possible to partition the plant into fewer sections. Thus, instead of
proceeding line by line it may be practical to consider just main items of plant and
associated lines and heat exchangers. It has been found helpful to consider what happens if
the products and planned discharges are off-specification.
Plant information assembly
Plant information should include process information, such as notes on fundamental process
chemistry including dangerous reactions and side-reactions; data on hazardous materials;
process flow diagrams showing control measures and safeguards; equipment specification
sheets and inventory levels and any available operating information. The studies noted
earlier should be completed as a precursor to preliminary hazard analysis. It is important
prior to the preliminary hazard analysis to have a clear specification of the objectives: a full
process specification of feeds, products and wastes; constraints on emissions and effluents;
specification of utilities.
Partition of the plant into critical sections
The plant is usually partitioned according to the main plant items and their associated
ancillary equipment. The design intent of this section should then be defined carefully. If
not done previously then a critical examination of the design intent should be carried out.
The best starting point of the analysis is at a point on the incident scenario termed
'dangerous disturbance of plant'. The variations of parameters considered to be relevant to a
dangerous disturbance form the deviations examined at this stage. They are as follows:
disturbances resulting in rupture on exceeding mechanical limits: overpressure;
over-temperature; machine overload or stress underpressure; under-temperature
critical defect in construction: critical defect left in construction or critical deterioration
in construction
flow through abnormal opening to atmosphere: abnormal opening left in plant or
abnormal opening made in plant
adverse change in a planned product or other release: change before leaving plant or
change after leaving plant.
The analyst expands each cause of a dangerous disturbance leading to rupture and discharge
by progressing down to immediate cause as appropriate.
The immediate causes of incidents are classified as follows:
inadequate action by personnel
defects directly causing loss of integrity
plant or equipment inadequate or inoperable
control systems inadequate or inoperable
deliberate change from design intent
environmental and external threats.
A risk evaluation sheet should be used to conduct the analysis. In this case it is immaterial
if the analysis starts at immediate cause and follows the scenario up to consequences of the
release. However, it is necessary always to return to the dangerous disturbance as the
fulcrum of the study.
An example, taken from a case study, is given in Table 8. In this particular version of the
form up to 2 dangerous disturbances and 3 x 2 immediate causes can be studied. The
hazardous disturbances noted on the form correspond to HAZOP style deviations. It is
generally unnecessary to complete the form in the detail shown. The risk data is added after
and not during the meeting.
It is important that the search does not become a preliminary HAZOP study, as the two
search processes would otherwise become too similar in nature. The PHA should emphasise disturbances of
temperature and pressure whereas a HAZOP usually starts with studying deviations of flow.
Sometimes it will be found necessary to expand a particular box. For example, the operator
action may need to evaluate whether the operator is alerted or stimulated, whether the
correct diagnosis is made and whether the right action is taken. Such action may be
drastically wrong. In this case an appropriate continuation sheet can be used or a special
note added. Also as forms can get congested, it may be desirable to append a separate
action sheet or extend the size of sheet used for the analysis. Simplified sheets are used in
meetings to carry out the analysis.
Table 8: PSA risk evaluation sheet
TARGET RISK AND THE RISK EVALUATION SHEET
Risk is here defined as the likelihood, L, of a specific undesired event occurring within a
given period or in particular circumstances. The likelihood is measured as a frequency per
year. The severity, S, is a measure of the expected consequence of an incident outcome.
The target risk is defined by the equation:
Target risk = log10(10^L) + log10(10^S) = L + S

where L is the exponent of the likelihood as measured by frequency (a negative value) and S is
a severity ranking set by the company, referring to a set of five severity ranges from
minor (1) to catastrophic (5).
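This calculation can be sketched in a few lines of Python; the function and variable names are our own, and the example frequency and severity are assumed values:

```python
import math

def target_risk(frequency_per_year, severity_rank):
    """Target risk = log10(frequency) + severity rank (1 = minor .. 5 = catastrophic).

    The result is acceptable when it is equal to or less than zero.
    """
    L = math.log10(frequency_per_year)  # exponent of likelihood; negative for rare events
    return L + severity_rank

# An event expected once in 10 000 years (1e-4/yr) with severity rank 3:
# target risk = -4 + 3 = -1, which is <= 0 and therefore acceptable.
risk = target_risk(1e-4, 3)
```

The additive form makes the trade-off explicit: each tenfold reduction in frequency buys one unit of allowable severity.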
The target risk is only acceptable when its value is equal to or less than zero. To reduce the
risk, measures should be taken to reduce the likelihood of occurrence, which is a measure of
the expected probability or frequency of occurrence of an event, or to ameliorate the
severity of the consequences of occurrence by appropriate measures. For example, the
exposure of an individual to a hazardous substance which cannot be eliminated by other
means might involve measures aimed at prevention of exposure, reduction of emission or
exposure and provision of means for dealing with residual risk.
Results which are clearly not acceptable are prioritized for further study with risk reduction
or elimination as the aim.
It is particularly helpful to evaluate risk using risk evaluation sheets as this ensures that the
contribution to mitigation effected by the operators is particularly noted. This may also
highlight the need for specific training. The technique has been applied to maintenance
problems, evaluation of the effect of emergency control systems being inoperable, and
incident investigation. In most cases it is not necessary to have absolute accuracy for risk
estimates as the relative improvement or sensitivity of overall risk to certain criteria is the
factor of interest.
CONCLUSIONS
All hazard identification methods aim to model part of the incident (accident) scenario. If
one considers the amount of data available to the analyst at any stage during the
development of the plant, it is clear that the starting point of the search must be selected
carefully. Methods start from different points: e.g. FMEA at a failure mode, HAZOP at a
hazardous deviation.
In the main method described here, preliminary hazard analysis, the analysis pivots around
a dangerous disturbance of plant which is identified as a point just before the release of
material. Also the method utilizes a model of the incident scenarios for documentation
purposes. Furthermore, the opportunity is taken to evaluate the risk.
It will be noted how all the methods used in preliminary safety analysis combine to produce
a comprehensive safety study which can be carried out at an early stage of the design, and
can be developed further as the detailed engineering of the plant proceeds.
The risk evaluation sheets provide a ready record which can be examined during production
to identify the effect on risk should changes in plant and its availability arise.
ACKNOWLEDGEMENTS
Mike Wardman is sponsored by the UK Science and Engineering Research Council and
Cris Whetton by the EC STEP programme.
REFERENCES
1. Wells, G. L. 'Preliminary Safety Analysis', Module 1, PSLP Course, Sheffield, Oct 12-
15, 1992.
2. Turney, R. D. Process Safety & Environmental Protection, February 1990, 12.
3. James, R. A. 'Applications of HAZOP and the Pre-HAZOP technique', Module 1,
PSLP Course, Sheffield, Oct 12-15, 1992.
4. Elliot, T. D. M. and Owen, J. M. The Chemical Engineer, November 1968, 377.
5. 'The Control of Industrial Major Accident Hazards Regulations', SI 1984/1902, 1984.
Source: Journal of Loss Prevention in Process Industries, 1993, 6(1): 47–60.
SUGGESTED ANSWERS
EXERCISES
3.1 Case study—Fuel storage terminal
When you apply the checklists to the case study, you will find that there is not enough
information regarding organisational structure to provide answers to some of the points. It
is not critical for hazard identification at this stage. However, important issues such as the
need for a laboratory to check blending composition/quality etc. need to be highlighted.
For example, question A1(e) asks 'Are adequate facilities available (e.g. … laboratories)?'
A hazard can be identified by this question:
Out of specification jet fuel (no laboratory or inadequately equipped) and risk to
aviation safety
Human error in diverting gasoline to automotive diesel fuel tank, and low flash point
fuel in automotive diesel fuel tank. This has an explosion potential when injected into
a high compression ratio diesel engine.
Hazard identification using the checklists for some items is given below as a guideline.
Note that in this case not all the questions in the checklists can be used for identifying
hazards.
B1(c): There are no detectors or alarms for detection of leaks. The hazard is that if a leak
occurs, it could continue undetected for a prolonged period, and hence the incident may escalate.
However, it is possible to provide an alarm from the level transmitter in the tanks, for an
unscheduled change in level, and this can be a recommendation of the review.
B3(a): Incorrect labelling of tanks and product contamination.
C11: Incorrect spare parts used, e.g. wrong flange gasket resulting in product leak and
ignition.
C18(e): No records of instrument and control calibration. Level transmitter reading low,
tank overfilled and product overflow in the bund.
C21: Incorrectly filled permit-to-work form, hot work carried out in wrong area.
C22: Use of non-intrinsically safe electrical equipment (potential source of ignition) in tank
farm area by untrained contractors.
D4(a): Alarms and interlocks not tested and no schedule exists. This means that the
reliability of the alarm to operate on demand is questionable.
D5(a): Wrong spares used on product pumps. Wrong product lined up to pumps.
Sections E and F: No answers provided here. It is left to you to complete, based on the
examples given above, along with the description of hardware safety systems given in the
case study.
3.2 Failure modes and effects analysis
Partial analysis results are provided in Table 3.11 below. Not all the components have been
covered and you should attempt to complete the table as part of the exercise.
Note that the listing of components in an FMEA study may not be exhaustive, depending on
the component level to which the system has been broken down. For instance, a valve can
stick open, stick closed, or stick in the current position. If a more detailed analysis is
required, the valve would be split into additional components such as the body, the trim,
and the actuator. Such details have not been considered in this exercise.
Apart from routine maintenance, additional measures that would reduce the risk of losing
the bugs are:
independent temperature element
high temperature alarm to alert the operator so that immediate action could be taken to
turn off the hot water until the system is repaired
if monitoring is conducted remotely, a deviation alarm between the two temperature
elements can be designed so that if one of the probes fails, an alarm would sound for
the operator to attend and fix minor deviations quickly before high temperature is
reached.
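The deviation-alarm idea in the last bullet can be sketched as simple logic. The thresholds below are illustrative assumptions, not design values:

```python
def check_alarms(te1_degC, te2_degC, high_limit_degC=60.0, max_dev_degC=2.0):
    """Return alarm flags for a pair of redundant temperature elements.

    The deviation alarm catches a failing probe early, before the process
    itself drifts to the high-temperature limit.
    """
    return {
        "deviation": abs(te1_degC - te2_degC) > max_dev_degC,
        "high_temperature": max(te1_degC, te2_degC) > high_limit_degC,
    }
```

With two healthy probes the readings track closely, so a persistent deviation points to instrument failure rather than a real process excursion.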
Table 3.11: Results of FMEA
| Ref. No. | Component | Failure mode | Cause of failure | Possible effects | Possible action to reduce failure rate or effects |
|---|---|---|---|---|---|
| 1 | Hot water head tank float valve | Fails to close | Corrosion, debris build-up, mechanical failure | Hot water overflows tank. Injury to personnel. | Routine inspection and preventive maintenance |
| 2 | FCV1 | Sticks open | Corrosion, debris build-up | Too much hot water flow. Reactor temperature high. | Routine valve maintenance. High temperature alarm on water flow to reactor. |
| 3 | FCV1 | Fails in closed position | Pneumatic actuation system failure | No hot water. Reactor gets cold. No reaction. | Routine maintenance. Regular operator patrol of area. |
| 4 | TE/TC | Reads low | Wrong calibration, calibration drift | TC assumes the temperature is low and opens FCV1 more. Effect same as Ref. No. 2. | Regular calibration of temperature element. Redundant independent TE and high temperature alarm. |
3.3 Hazard and operability study
Partial results for this exercise are provided in Table 3.12. The cold water line has not been
considered and you should complete this as part of the exercise.
Note that the findings and actions are similar to FMEA, but the focus is on operation rather
than individual components. For instance, more than one failure mode can result in the
operational deviation being considered. When a guideword is selected for a specific line,
for causes of that deviation, we once again look at all the components in that line, and the
possible failure modes of those components that could result in the given deviation.
For example, when we consider 'High Flow' of hot water in the line, we look at all the
failure modes, i.e. FCV1 failures, TE/TC failures etc. as a single package, whereas in
FMEA we consider each component and the operational deviation a certain failure mode
would cause.
Note that several of the deviations may give rise to the same action, which only goes to
confirm that the course of action is correct. The reason there appears to be a lot of
repetition in the HazOp process is that flow/level/temperature tend to be interrelated and a
change in one affects others. The structure of the HazOp technique is also such that if the
causes of a deviation are not correctly identified in one step, they are captured in the next
step.
Table 3.12: HazOp study datasheet

Study title: HAZOP of hot water system
Unit: Hot water tank
Line/equipment description: Hot water line from tank to cold water line junction / Mixed spray to reactor
Date: 8 December 2006    Location: Adelaide plant    Page: 1 of 1    Issue: A

| Guideword | Possible causes | Possible consequences | Proposed safeguards | Responsibility |
|---|---|---|---|---|
| High flow | FCV1 sticks open. TE reads low. TIC fails to low. Manual setpoint too high (human error). | Too much hot water to spray system. High temperature. Bugs affected. | Routine maintenance. Independent TE and high temperature alarm. | Maintenance |
| Low flow | FCV1 fails in closed position. TE reads high. TC fails to high. Drain valve in tank leaks. | Insufficient hot water. Low temperature. No reaction. | Independent TE to alarm on low temperature as well. | Engineering |
| Low level | Float valve in hot water head tank fails to open when water level is low. | Same as above. | Regular operator patrol of areas. | Production |
| High temperature | Same as for high flow. | — | Same as above. | Engineering |
| Low temperature | Same as for low flow. | — | — | Production |
3.4 Functional concept hazard analysis
In a real life situation you would need to have input from the gas compression
engineer/specialist as well as the vendor's representative in order to conduct a more detailed
analysis. For this exercise, it is sufficient to demonstrate a clear understanding of the
functional concept analysis technique.
A high-level analysis is provided in Table 3.14. Note that a different group of people may
select different keywords and arrive at a slightly different answer, although similar
deviations and consequences should have been identified.
3.5 Vulnerability analysis
First identify the 'assets' or critical success factors (those things which must be protected),
then consider the threats to these. Then evaluate the criticality of each threat to each asset.
Finally, determine the control measures you need to manage each critical vulnerability.
A sample analysis is provided in Table 3.13. Note that this table focuses on the consequence
value of credible threats rather than their likelihood; that is, if a threat is credible and did
happen, what would the realistic worst-case result be? This is the focus of senior decision
makers and the courts after the event.
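One simple way to prioritise the threats in a table like Table 3.13 is to sum the consequence marks across all critical success factors. The numeric mapping below is our own illustrative assumption, not part of the method as stated:

```python
# Map the table's consequence marks to numeric ratings (assumed scale).
RATING = {"—": 0, "x": 1, "xx": 2, "xxx": 3}

def threat_score(marks):
    """Sum the consequence ratings a threat scores across all critical success factors."""
    return sum(RATING[m] for m in marks)

# 'Scope changes after sign off' row of Table 3.13:
score = threat_score(["xxx", "xxx", "—", "—", "—", "x", "xx", "xx"])  # 3+3+0+0+0+1+2+2 = 11
```

Ranking rows by this score highlights which threats deserve control measures first, though it deliberately ignores likelihood, as the text notes.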
Table 3.13: Vulnerability analysis
| Threats | Completion on time | Completion on budget | Environment | Government sponsor satisfaction | Internal sponsor satisfaction | Community satisfaction | Safety | Statutory compliance |
|---|---|---|---|---|---|---|---|---|
| Conditions of contract issues | xx | x | — | — | x | x | x | xx |
| Scope changes after sign off | xxx | xxx | — | — | — | x | xx | xx |
| Litigation/liability issues | x | xxx | xxx | xx | xxx | — | x | x |
| Insurance issues (lack of, length) | x | x | xxx | xx | — | — | x | x |
| Unforeseen site difficulties | xx | xx | x | — | xx | xx | xx | xx |
| Weather | xxx | x | x | xxx | x | x | xx | — |
| Mismatch of staff skills/resources/availability | xx | x | — | — | x | — | xxx | xx |
| Succession planning/loss of expertise/knowledge | xx | x | x | — | x | — | xx | — |
| Inadequate processes/policies/decision making | xx | xx | xxx | xxx | xx | x | xx | xx |
| Subcontractor tendering issues | x | — | — | x | — | — | xx | x |
| Subcontractor delivery issues | xxx | x | x | x | x | — | xxx | xxx |
| IR disputes | xxx | x | — | xx | x | x | xxx | xx |
| IT/data/information retrieval failure | x | x | — | — | — | — | — | — |
Table 3.14: Functional concept hazard analysis for natural gas compressor station and cylinder storage/handling area
TOPIC 4
ESTIMATING THE SEVERITY OF CONSEQUENCES
Preview
Introduction
Objectives
Required reading
Estimating consequence severity
Effect and vulnerability models
Root causes of system failures
Technical and organisational factors
Accounting for event dependency consequences
Qualitative estimation of severity
Consequence assessment of release of hazardous chemicals
Release of liquid from atmospheric storage
Release of liquid stored under pressure above boiling point
Release of gas
Calculations for leak rates
Fire consequence assessment
Types of fires
Vulnerability models for fires
Explosion consequence assessment
Vulnerability models for explosions
Toxicity consequence assessment
Exposures
Effect models for toxic releases
Vulnerability models for toxic release
Structural failure consequence assessment
Project risk impact assessment
Sensitivity analysis
Summary
PREVIEW
INTRODUCTION
In the last topic we explored how to define an engineering system's components, couplings
and interactions and identify hazards and potential loss events. Once a hazard list is
generated, the next step is to estimate the magnitude or severity of the adverse
consequences should a loss event occur. This is important as an aid to both inherently safer
design and pre-incident planning. It involves carrying out appropriate calculations, which
will vary according to the industry and the nature of the hazard.

In the processing industries, such calculations are designed to assess:
the physical effects of unplanned releases of hazardous chemicals
the damage consequences of the releases.

In the utilities area they are designed to assess:
loss of water supply for specified periods
loss of power or gas supply with associated consequences.

In the area of civil infrastructure they may relate to structural failure of a dam or bridge and
associated consequences such as flooding or accidents.

Many of these calculations are routinely done using commercial software. However,
sometimes in the initial stages of a risk analysis it may be useful to perform simple manual
calculations to obtain a feel for the numbers and their corresponding physical realities.

Consequence calculations are specific to each industry type and take us into the realm of
hazard analysis. Since the focus in this topic is on risk management issues, the discussion
of analysis and calculation has been kept to a minimum. For those interested in the details
of analysis relating to their industry, relevant references are provided.
OBJECTIVES
After studying this topic you should be able to:
identify the type and depth of analysis required to estimate consequence severity
identify the type of specialist assistance required
specify the output requirement of the investigation
make judgments on the scale of the loss event
identify actions that will eliminate or mitigate the loss event.
REQUIRED READING
There is no additional reading required for this topic. However, it would be useful to
become familiar with hazard analysis techniques in the industry of your discipline, using the
references listed at the end of this topic. In particular, the US EPA website provides substantial downloadable information on Risk
Management Program Guidance for Offsite Consequence Analysis. This can be obtained
by visiting http://www.epa.gov/ceppo. It includes methods, references and relevant
properties of chemicals. A number of Australian regulators refer to these guidelines.
ESTIMATING CONSEQUENCE SEVERITY
Having identified a range of hazards and potential loss events, the next step is to estimate
the severity of the adverse consequences if a loss event occurs. The main loss events
encountered across a range of engineering disciplines are:
fires (flammable liquids/solids/gases and combustible substances)
explosions (gas, dust, chemical, use of explosives)
toxic effects from exposure to accidentally released chemicals or to combustion
products from fires
major structural failures (plant and equipment, buildings, bridges, dams)
major breakdowns causing business interruption
environmental pollution due to unplanned releases
project failures or overruns (commercial consequences).
Estimating the severity of a loss event involves determining both the types of effects of such
an event and the amount of damage caused by these effects. This requires the use of
knowledge, experience, mathematical models, logic models or a combination of these
methods in order to make an informed judgment. Quantifying the consequences of loss
events that result in monetary loss is generally easier than quantifying the consequences of
those that result in loss of assets or loss of life.
The estimation of loss event consequences involves four distinct steps:
1. Define system. This is generally done as part of the hazard identification stage (see
Topic 3) and involves developing an outline of the system for which calculations of
loss event consequences are to be carried out. The outline should set out:
a) the system boundaries, for example one identifiable section of a plant such as
bulk fuel storage area, a specific warehouse section, a bridge or a dam, a
production line or a software package
b) the subsystem or equipment whose failure would cause a loss event, for example
vessels, piping, an LPG tank, a flammable packaged goods depot, a reservoir or
dam, a bridge, a gas or water supply pipeline, a power transmission system
c) a description of the internal environment of the system, i.e. pressure, temperature,
inventory, state of the fluid (vapour, liquid, two-phase mixture, etc.), process
flow rates/loads, structural strength, maximum allowable operating pressure in
the case of gas pipelines, maximum load/stress in the case of structures.
2. Develop incident scenarios. This involves formulating hypothetical failure scenarios
based on historical data, the outputs from hazard identification techniques and
experience.
3. Model calculations. This involves identifying the types of consequences that may
occur by examining the different potential sequences of events and then calculating the
effect levels of particular consequences (e.g. release rate of a hazardous chemical,
thermal radiation levels from fires, blast overpressure levels for explosions, ground
level concentration from dispersion of toxic gases, structural strength analysis,
vibration analysis).
4. Quantify damage. This involves translating the effect levels into damage estimates
such as injury, fatality, structural damage, environmental impairment or extent of
business interruption.
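As a small illustration of step 3, the discharge rate of a liquid through a hole in an atmospheric tank can be screened with the standard orifice (Torricelli) equation. The hole size, liquid head, density and discharge coefficient below are assumed example values:

```python
import math

def liquid_leak_rate(hole_diameter_m, liquid_head_m, density_kg_m3, cd=0.61):
    """Mass discharge rate (kg/s) through a sharp-edged orifice:
        m = Cd * A * rho * sqrt(2 * g * h)
    cd ~0.61 is a typical discharge coefficient for a sharp-edged hole.
    """
    g = 9.81  # m/s^2
    area = math.pi * (hole_diameter_m / 2.0) ** 2
    return cd * area * density_kg_m3 * math.sqrt(2.0 * g * liquid_head_m)

# A 25 mm hole, 5 m of liquid head, fuel density ~800 kg/m3
# gives a rate of roughly 2.4 kg/s.
rate = liquid_leak_rate(0.025, 5.0, 800.0)
```

Even this crude number is useful for step 4: over a 10-minute response time it implies more than a tonne of fuel in the bund.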
Example 4.1
A local government authority maintains an aquatic centre consisting of two
swimming pools: a small pool for swimming lessons for children and a large pool for
adult swimmers. The water is chlorinated by direct injection of chlorine gas from a
chlorination facility consisting of liquid chlorine cylinders and associated dosing
control system. To estimate the consequences of an accidental release of chlorine
gas (highly toxic), the following steps would be applied:
1. Define system. This consists of chlorine storage cylinders, connecting
pipework, dosing control system and safety shutdown system. Chlorine is a
liquefied gas under pressure (approximately 700 kPa) and is at ambient
temperature. The total quantity in a cylinder is 70 kg, and the system consists of
6 cylinders, connected to a pipe manifold. A chlorine gas detector is installed
which, on sensing gas, would raise an alarm and automatically shut down the
system.
2. Develop incident scenarios. Two scenarios may be considered:
a) Rupture of a cylinder and sudden loss of cylinder inventory.
b) Rupture of a pipeline and slow release of chlorine until shutdown occurs. If
automatic shutdown fails, the system must be manually shut down by
personnel wearing self-contained breathing apparatus.
3. Model calculations. Methods exist for calculating release rates and gas
dispersion to predict gas concentrations within the facility as well as outside.
This is often conducted by specialists.
4. Quantify damage. Based on the toxic gas concentration and the duration of
exposure, it is possible to estimate the extent of injury or potential fatality to
exposed persons. This is based on toxicology data for the specific component
(chlorine); once again, specialist skills are required.
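Step 3 of Example 4.1 refers to gas dispersion modelling carried out by specialists. As a rough illustration of the idea only, a ground-level Gaussian plume calculation can be sketched in Python; the release rate and dispersion coefficients below are assumed values, not figures for any real chlorine installation.

```python
import math

def ground_level_concentration(Q, u, sigma_y, sigma_z, y=0.0):
    """Ground-level Gaussian plume concentration (kg/m^3) for a
    continuous release at ground level.

    Q       release rate (kg/s)
    u       wind speed (m/s)
    sigma_y horizontal dispersion coefficient at the distance of interest (m)
    sigma_z vertical dispersion coefficient at the same distance (m)
    y       crosswind offset from the plume centreline (m)
    """
    # Ground reflection doubles the concentration of an unbounded plume,
    # turning the usual 2*pi factor into pi.
    return (Q / (math.pi * u * sigma_y * sigma_z)) * math.exp(-y**2 / (2 * sigma_y**2))

# Illustrative only: 0.2 kg/s chlorine leak, 2 m/s wind, dispersion
# coefficients assumed for roughly 100 m downwind in neutral conditions.
c = ground_level_concentration(Q=0.2, u=2.0, sigma_y=20.0, sigma_z=10.0)
print(f"{c * 1e6:.0f} mg/m^3")  # ~159 mg/m^3 at the centreline for these inputs
```

Comparing such a concentration against toxicology data for chlorine (step 4) is the specialist part of the exercise; the plume arithmetic itself is simple.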
Example 4.2
A two-lane bridge over a railway line in suburbia was built in the 1960s. The traffic
volumes at that time were low, and 'B-double' articulated trucks, with a total weight of
40 tonnes, had not yet been developed.
In recent times, not only has heavy vehicle traffic increased, but several trucks may
stand on the bridge for minutes at a time, waiting for the traffic to clear. This has
placed additional dynamic load on the bridge, and measurements of the level of
vibration during routine inspections have shown an increase. To estimate the
consequences of a structural failure, the following steps would be applied:
1. Define system. This consists of the bridge, the postulated worst load on the
bridge and the duration, and the number of such load cycles per day. The static
load-bearing capacity and the limits on the vibration are known. Strain gauge
measurements of the extent of strain and the cycles are available.
2. Develop incident scenarios. These may include failure of a span of the bridge
between two sets of supports, or failure of a support.
3. Model calculations. A finite-element analysis of the stresses and vibrations for
various postulated dynamic and static loads would be required. This is a
specialist exercise.
4. Quantify damage. The model calculations would provide the extent of physical
damage that could occur to the structure, from which other effects can be
assessed, for example vehicle accidents, repair/rebuilding costs, liabilities, and
traffic disruption costs.
EF F E C T A N D V U L N E R A B I L I T Y MO D E L S
There are two types of models used to estimate the consequences of a loss event:
effect models, which are usually mathematical and are used to quantify the effects
vulnerability models, which are usually empirical and are used to quantify damage.

Effect models calculate the effect levels that will result from particular loss event
consequences. For instance, assessment of the effect of a fire may consider the levels of
thermal radiation intensity (or heat flux) at various distances from the source of the fire. A
toxicity effect model may calculate the ground level concentration of a toxic gas at various
downwind/crosswind distances from the emission source.

Vulnerability models take the output of an effect model and assess the resources that will be
affected (e.g. people, structures, biophysical environment) and the extent of damage to these
resources. A brief summary of effect and vulnerability models is given in Table 4.1.

Table 4.1: Effect and vulnerability models

Loss event | Effect | Resources affected | Damage (vulnerability)
Flash fire | Thermal radiation | People | Burn injury/fatality
Burning pool of liquid | Thermal radiation | People; Structures | Burn injury/fatality; Failure
Explosion | Blast, flying fragments | People; Structures | Injury/fatality; Structural damage, glass breakage
Gas jet/torch fire | Thermal radiation, flame impingement | People; Structures | Burn injury/fatality; Failure
Toxic release | Toxic vapour, toxic dose | People; Environment | Irritation/distress, injury/fatality; Environmental damage
Collision | Mechanical impact | People; Structures | Injury/fatality; Mechanical damage
Radioactive leak | Nuclear radiation | People | Injury/fatality
Earthquake | Structural failure | People; Structures | Injury; Mechanical damage
Food contamination | Poisoning, sickness | People | Illness/fatality
Structural overload, excessive vibration | Structural failure | People; Structures | Injury/fatality; Mechanical damage/loss
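Vulnerability models for toxic exposure are often expressed as probit equations, which link concentration and exposure time to a probability of harm. The sketch below uses the generic probit form with made-up constants for illustration, not published values for any substance.

```python
import math

def probit_to_probability(pr):
    """Convert a probit value to a probability via the standard normal
    CDF (a probit of 5 corresponds to 50%)."""
    return 0.5 * (1.0 + math.erf((pr - 5.0) / math.sqrt(2.0)))

def toxic_probit(c_ppm, t_min, a, b, n):
    """Generic toxic-load probit: Pr = a + b*ln(C^n * t).
    The constants a, b and n are substance-specific; the values used
    below are purely illustrative."""
    return a + b * math.log(c_ppm**n * t_min)

# Illustrative constants only -- not published chlorine probit values.
pr = toxic_probit(c_ppm=50.0, t_min=10.0, a=-8.3, b=0.92, n=2.0)
p = probit_to_probability(pr)
```

In practice the constants come from toxicology references and carry significant uncertainty, which is one reason specialist input is needed at the damage-quantification step.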
Both effect and vulnerability models have a number of limitations that need to be
recognised. Some of these limitations are listed below.
Effect models are generally based on idealised systems and can only approximate real
situations.
Many of the models are empirical/semi-empirical, based on limited data.
Most models have been verified only in small-scale tests.
The influence of the environment (terrain, buildings, etc.) is generally not considered in
gas dispersion models, except in highly sophisticated ones.

Sometimes combined effect/vulnerability models are referred to as vulnerability models
(VM) or population vulnerability models (PVM). In this representation the consequences
of a loss event are split into physical effects (effect) and damage effects (vulnerability).
RO OT C AU S E S O F S YS T E M FA I LU R E S
TECHNICAL AND ORGANISATIONAL FACTORS
Very often vulnerability models do not address the mitigating effects of non-hardware
systems (i.e. management factors, procedures, training etc.). This can
lead to an incomplete representation of the consequences and a pessimistic assessment of
risk. In most instances, however, the reverse is true: so-called 'human factors' generally
contribute to or cause major loss events.
For example, the outcomes of the investigation into the explosion that ultimately resulted in
the loss of the Piper Alpha oil and gas platform in the North Sea concluded that the
following factors all contributed to the event:
complacent organisational culture
unrecognised (and unnecessary) couplings in design
insufficient redundancies in safety systems
difficulties in managing the trade-off between productivity and safety
a tendency to stretch maintenance operations when production pressures increase.
(Paté-Cornell, 1993)
The above factors, if present in an organisation, should be recognised. The modelling will
initially consist of effects calculations of postulated failure events. In the next step, when
vulnerability assessment is made from the effects calculations, the organisational and human
deficiencies should be accounted for.
ACCOUNTING FOR EVENT DEPENDENCY CONSEQUENCES
In the consequence analysis of major loss events, it is essential that all couplings,
interactions and event dependencies be modelled wherever possible to provide a full picture
of the risk. Generally, a small initiating event triggers a progressive series of other events
and escalates into a major event because of the inadequacy or failure of safeguard systems
and the absence of, or deficiencies in, the management system.
An analysis of the aftermath of the destruction of the Piper Alpha oil and gas platform in the
North Sea in 1988 by Paté-Cornell (1993) led to the development of a model for event
dependency consequence analysis. Figure 4.1 shows a simplification of this model.
Each step in Figure 4.1 is quite complex and consists of a number of interacting and
sequential components. A simplified description follows. A schematic layout diagram of
the Piper Alpha modules is shown in Figure 4.2 to help you follow the discussion.
Figure 4.1: Event dependency consequence analysis model for Piper Alpha
[Figure 4.1 shows a chain: causal factors lead to primary initiating events (A), which
produce subsystem states EA and losses LA; these lead to secondary initiating events (B)
with subsystem states EB and losses LB; which in turn lead to tertiary initiating events (C)
with subsystem states EC and losses LC.]
A: Primary initiating event—first explosion
Process disturbance
Two redundant pumps inoperative in module C; Hydrocarbon condensate pump 'B'
trips; the redundant pump 'A' was shut down for maintenance
Failure of a flange assembly at the site of a pressure safety valve in module 'C'
Release of condensate vapours in module 'C'
First ignition and explosion
Failure of firewall leading to damage of emergency systems in adjacent module.
EA Subsystem states after primary initiating event
Immediate loss of electric power
Failure of emergency lighting
Control room failure
Failure of public address/general alarm system
Failure of radio telecommunication room
Some people escape from 68' level to 20' level, others jump into the sea.
LA Losses after primary initiating event
Loss of emergency systems (deluge, communication)
Loss of helipad operation for rescue due to smoke
Casualties in modules A, B and C.
B Secondary initiating event—second explosion
Rupture of B/C firewall
Rupture of a pipe in module B due to projectiles from B/C firewall
Large crude oil leak in module B
Fireball and deflagration in module B
Fire spreads to module C through failed B/C firewall.
Figure 4.2: Piper Alpha module layout
Source: Paté-Cornell 1993: 217.
EB Subsystem states after secondary initiating event
Fire in modules B and C spread to various containers (lube oil drums,
industrial gas bottles)
Pipes and tanks rupture in modules B and C
Smoke engulfs many parts of the platform preventing escape from deck to
living quarters
Smoke ingress into living quarters
Some survivors jump into sea from 68' and 20' levels
Failure of firewater pumps; automatic start had been turned off; manual start
pumps damaged by C/D firewall breach.
LB Losses after secondary initiating event
Some fatalities in living quarters due to smoke ingress and asphyxiation
Escalating damage to structures due to spread of fire
Some people unable to be rescued from the sea.
C Tertiary initiating event—jet fire from process riser
Rupture of riser (Tartan to Piper Alpha) caused by flame impingement from fires
Third violent explosion and large fire and smoke engulf the platform
Intense impingement of large jet fire on platform support structural members.
EC Subsystem states after tertiary initiating event
Most people trapped in living quarters
Some survivors jump from the helideck into the sea (175' level)
Collapse of platform at 68' level below module B
Fourth violent explosion and rupture of Claymore gas riser
Major structural collapse in various sections of platform
Accommodation module overturned into the sea
Rescue of survivors at sea (throughout the accident) by onsite vessels.
LC Losses after tertiary initiating event
Human casualties: 167
Total loss of the platform
Damage in excess of US$3 billion.
If these events were fully depicted there would be interactions between EA and LA, EA and
EB, EB and LB, EB and LC, and so on, making it extremely complex. However, Figure 4.1
does provide a simple framework for describing the initiation of a loss event and accident
progression.
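For an exercise such as Activity 4.1, the A, EA, LA tiers of Figure 4.1 can be captured in a simple data structure. The Python sketch below condenses the Piper Alpha sequence above; the stage summaries are abbreviations for illustration, not a complete record of the accident.

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    """One tier of the event dependency model of Figure 4.1."""
    initiating_event: str
    subsystem_states: list = field(default_factory=list)
    losses: list = field(default_factory=list)

# A condensed rendering of the Piper Alpha sequence described above.
sequence = [
    Stage("First explosion (condensate leak in module C)",
          ["loss of electric power", "control room failure"],
          ["loss of emergency systems", "casualties in modules A, B and C"]),
    Stage("Second explosion (crude oil leak in module B)",
          ["fire spreads via failed B/C firewall", "firewater pumps fail"],
          ["fatalities in living quarters", "escalating structural damage"]),
    Stage("Jet fire from ruptured Tartan riser",
          ["major structural collapse"],
          ["167 fatalities", "total loss of platform"]),
]

for stage in sequence:
    print(f"{stage.initiating_event}: {len(stage.losses)} loss categories")
```

Recording your own analysis in this form makes the couplings between stages explicit and is easy to extend with the interactions (EA to EB, EB to LB, and so on) noted above.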
A C T I V I T Y 4 . 1
Consider a loss event that has occurred in your workplace, e.g. a fire or spill, then
conduct a dependency consequence analysis of it using the model in Figure 4.1. If
you have not had such an event in your workplace, use a major incident that has been
well documented. For example, you may wish to consider:
the collapse of the World Trade Center buildings in New York
one of the many bridge collapses caused by ship impact, flood or structural
failure
the Exxon Valdez oil spill
the Esso Longford gas explosion.
Follow through to final resolution of the crisis in each case—do not just stop after
the initiating event. Document your model in a series of dot points as in the Piper
Alpha example above.
Keep your results for use in Topic 5.
QUA L I TAT I V E E S T I M AT I O N O F S E V E R I T Y
Since most quantitative assessments of consequence severity require specialist assistance, it
is necessary to conduct an initial qualitative assessment in order to determine the extent of
quantification required. The basic steps in a qualitative assessment are:
1. Identify the hazards and potential loss events (Topic 3).
2. Identify the affected parties (the organisation, public, industrial neighbours, customers,
stakeholders, regulators, financiers).
3. Identify the potential adverse consequences for each affected party.
4. Assess the severity level of the adverse consequences to each affected party.
5. If the consequences must be estimated in financial loss terms, the loss is the sum total
of the following:
direct costs of the event (injury, fatality, asset damage, environmental damage etc.)
consequential losses (investigation costs, compensation costs, liabilities, legal
costs)
lost opportunity costs during business downtime
remediation costs (measures required to restore the facility and environment back
to original condition).

The Standards Australia Risk Management Guidelines (HB 436:2004) suggest different
qualitative levels for consequence severity. An example is shown in Table 4.2.

Table 4.2: Severity levels and descriptors

Level | Descriptor
1 | Negligible
2 | Minor
3 | Moderate
4 | Major
5 | Catastrophic

For each severity level, criteria must be defined for different types of risk. A sample
consequence table is given in Table 4.3. These criteria have to be devised for each facility,
organisation or context before an assessment is undertaken.

Table 4.3: Sample consequence table

Level | People | Environment | Asset loss | Business interruption
1 | First aid injury | Slight effect (within site boundary) | < $1000 | < 4 hours
2 | Medically treated injury | Minor effect (temporary contamination) | $1000–$10,000 | 1 shift
3 | Lost time injury | Local effect (recoverable environmental loss) | $10,000–$0.1m | 1–2 days
4 | Disability/single fatality | Major effect (severe damage, recoverable) | $0.1m–$1m | up to 1 week
5 | Multiple fatalities | Massive effect (widespread long-term damage) | > $1m | 2–4 weeks
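The financial loss aggregation in step 5 and the asset-loss bands of Table 4.3 can be combined in a short sketch; all dollar figures below are illustrative.

```python
def total_financial_loss(direct, consequential, lost_opportunity, remediation):
    """Sum the four cost categories from step 5 of the qualitative assessment."""
    return direct + consequential + lost_opportunity + remediation

def severity_level(loss):
    """Map an asset-loss figure (in dollars) to the 1-5 levels of Table 4.3."""
    thresholds = [(1_000, 1), (10_000, 2), (100_000, 3), (1_000_000, 4)]
    for limit, level in thresholds:
        if loss < limit:
            return level
    return 5

# Illustrative cost breakdown for a hypothetical loss event.
loss = total_financial_loss(direct=250_000, consequential=80_000,
                            lost_opportunity=40_000, remediation=30_000)
print(loss, severity_level(loss))  # 400000 4 (between $0.1m and $1m: Major)
```

Encoding the criteria this way also makes the point that the thresholds are context-specific: a different facility would supply a different `thresholds` list.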
CO N S E Q U E N C E A S S E S S M E N T O F R E L E A S E O F H A Z A R D O U S C H E MI C A L S
Hazardous chemicals are those that are flammable, combustible or toxic. Some can be both
flammable and toxic (e.g. ammonia). Other terms used include 'hazardous substances' for
materials that are toxic and 'dangerous goods' for materials that are flammable or
combustible and sometimes also toxic. In Australia, these terms are about to be replaced by
the term 'workplace hazardous chemicals' which will cover all types of materials that can
give rise to hazardous situations.

Many different industries produce, use, transport or store hazardous chemicals. These
include:
chemical process industry
utilities (e.g. water treatment)
mining and mineral processing
gas industry
transport industry (handling and storage of fuel)
construction industry
agriculture
manufacturing.

When the release of a hazardous chemical occurs, the consequences vary depending on the
physical properties of the chemical and the pressure and temperature at which it is stored.
The four types of release events are:
1. Release of liquid from atmospheric storage. The boiling point of the liquid is generally
well above ambient temperature.
2. Release of liquid stored under pressure above its boiling point. Examples include:
liquefied petroleum gas (LPG), which is stored as a liquid under pressure
other liquids with boiling points above ambient temperature, but processed at
much higher temperatures under pressure such as in chemical/petroleum
processing plants.
3. Release of gas from pressurised containers.
4. Release of cryogenic liquid stored at normal pressure, which vaporises at ambient
temperature and rapidly expands in volume. For inert cryogenic liquids, the main
hazard is displacement of oxygen.

The size of the release is estimated by examining the spectrum of possible failures and
identifying those that could occur on the site under investigation. In descending order of
magnitude of effect, the spectrum of possible failures comprises:
immediate catastrophic rupture of pressure vessels
large leaks from atmospheric storage vessels
complete rupture of large pipes
large leaks in pressure vessels
large holes in large pipes
complete rupture of small pipes
fitting and flange leaks.

Specific types of vessel leaks include:
a) Small leaks to sizes < 6 mm.
b) Full bore leak from a nozzle on the vessel. A range of sizes may be used, typically
from 25 mm to 150 mm.
c) A flange gasket leak on all the nozzles (equivalent to 6 mm–10 mm hole).
Based on the liquid level in a given vessel, leaks from both vapour space and liquid space
are considered.
Specific types of pipework leaks and ruptures include:
a) Flange gasket leak, between two adjacent bolts, giving an equivalent hole size of
6 mm–10 mm, depending on the type of gasket.
b) A 20 mm hole (typical instrument nozzle size).
c) A 25 mm–50 mm hole (pipe rupture).
d) Full bore leak. Since full bore failures of large diameter pipes are unlikely, or are due
to impact effects, the leak size can be restricted to 150 mm maximum.
The resulting list of possible failure scenarios forms the basis of the consequence analysis.
These are normally divided into a few discrete scenarios for ease of analysis.
A brief discussion of the concepts behind consequence modelling is provided on the
following pages. Detailed equations required for the analysis are not described as they refer
mainly to the chemical process industry and are not of interest to all engineering disciplines.
RELEASE OF LIQUID FROM ATMOSPHERIC STORAGE
The driving force for the liquid release is the hydrostatic head of liquid in the storage
vessel; as there is no excess static pressure in atmospheric storage, on release the liquid
would spread on the ground as a pool. Normally a bulk liquid storage is provided with
secondary containment by bunds or dykes. The bunds would restrict the size of the
spreading liquid pool. If ignited, a pool fire would result.
Depending on the vapour pressure of the liquid, it may slowly evaporate and disperse in the
air. If the liquid is toxic, then exposure to the vapour from evaporating liquid could have
toxic effects. Figure 4.3 shows the possible consequences of a liquid release from
atmospheric storage and an example is provided below.

Example 4.3
A hose rupture occurs during the transfer of petrol from a bulk tanker to an
underground storage tank in an automotive retail outlet. The discharge is by gravity
and no pumping is involved. The tanker is at atmospheric pressure.
The area is not bunded, therefore the right-hand branch of Figure 4.3 would apply.
The sequence is:
Leak occurs.
Leak spreads to form a pool.
Product vaporises and disperses.
Driver/onsite personnel attempt to stop the leak.
Vapour contacts an ignition source and flashes back to form a pool fire.
Tanker engulfed by fire.
If the leak is isolated, the duration of the fire will be limited to a few minutes.
While structural damage may not occur, injury to people is possible.
If attempts at isolating the leak are unsuccessful, a major fire will result causing
injury, possible fatality and structural damage.
If the material does not ignite, then the spill may flow into the stormwater drain.
There is potential for explosion in the drain and for environmental pollution.
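The gravity discharge in Example 4.3 can be roughly quantified with the Bernoulli (Torricelli) equation referenced later in Table 4.5. The head, hole size and fluid density below are assumptions for illustration, not data for a real tanker.

```python
import math

def gravity_leak_rate(head_m, hole_diameter_m, density=740.0, cd=0.61):
    """Mass release rate (kg/s) of a liquid draining under hydrostatic
    head alone, as for the atmospheric tanker in Example 4.3.

    head_m          liquid head above the hole (m)
    hole_diameter_m equivalent hole diameter (m)
    density         liquid density (kg/m^3); 740 is roughly petrol
    cd              discharge coefficient (~0.61 for a sharp-edged orifice)
    """
    area = math.pi * (hole_diameter_m / 2.0) ** 2
    velocity = math.sqrt(2.0 * 9.81 * head_m)   # Torricelli outflow velocity
    return cd * density * area * velocity

# Illustrative: 1.5 m head, fully severed 75 mm hose.
rate = gravity_leak_rate(head_m=1.5, hole_diameter_m=0.075)
```

Even this crude estimate (of the order of 10 kg/s for the assumed inputs) shows why rapid isolation of the leak dominates the outcome in the example.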
Figure 4.3: Consequences of liquid release from atmospheric storage
[Figure 4.3 is a decision tree:

Liquid release
    Spill in a bunded area (within dyke)
        Liquid pool spread in bund (restricted)
            Ignition: pool fire in bund
            No ignition: liquid contained within bund
    Spill outside bunded area
        Liquid pool spread (unrestricted)
            Ignition: pool fire outside bunded area
            No ignition: liquid spill to environment]
RELEASE OF LIQUID STORED UNDER PRESSURE ABOVE BOILING POINT
When a pressurised storage of liquid is released through an opening, the system behaviour
is dependent on the physical properties of the material and the pressure and temperature of
the released inventory.
If the release is directly from the vessel, the leak is generally a liquid leak. This will cause
an increase in the vapour space in the vessel and a reduction in pressure. The vapour space
is filled by flashing vapour from the liquid, with consequent reduction in temperature. This
process continues until the inventory is fully depleted. There will be a gradual reduction in
the leak rate as the static pressure decreases. For smaller inventory, this temperature
reduction is ignored for simplicity's sake and the release is treated as isothermal.
In the case of leaks from pipework, there is a length of pipeline between the liquid
inventory and the source of leak. There would be a significant drop in the pressure,
resulting in the partial vaporisation (known as 'flashing') of the liquid in the pipeline. The
resulting leak is therefore a mixture of vapour and liquid, referred to as a two-phase flow.
The vapour would tend to choke the flow at the leak source to the choke velocity, i.e. the
maximum velocity (also referred to as the sonic velocity). Therefore, the resulting
two-phase flow will have aerosol droplets in the spray, part of which may rain out, and the
rest evaporate into the vapour phase.
As a rule of thumb, the release rate from a two-phase flow tends to be approximately
30%–40% of the liquid-release-only condition.
For large release rates, depressurising the inventory may have a significant effect on the
consequences. The leak rates would vary with time as depressurising progresses. An
integrated average rate is generally used for consequence impact assessment.
Adiabatic flash of released liquefied gas
Since the liquefied gas (e.g. propane, butane, anhydrous ammonia, liquid chlorine, liquid
sulfur dioxide) is stored under pressure above its atmospheric boiling point, when a liquid is
released into the atmosphere it will tend to expand rapidly. The initial expansion is so rapid
that there is no time for heat exchange between the product and the surroundings. Thus the
expansion may be assumed to be adiabatic.
The liquid would cool down to its atmospheric boiling point and form a pool on the ground.
The heat given up in the expansive cooling is taken up by part of the liquid itself to
vaporise. The ratio of the flash portion of liquid to the total release is known as adiabatic
flash fraction. Table 4.4 shows typical adiabatic flash fractions for a range of materials.
Table 4.4: Adiabatic flash fractions of selected substances (storage/ambient temperature 20ºC)

No. | Substance | Adiabatic flash fraction
1 | Propane | 0.325
2 | Butane | 0.125
3 | Ammonia | 0.191
4 | Chlorine | 0.172
5 | Sulphur dioxide | 0.105
6 | Vinyl chloride monomer | 0.127
In consequence analysis, the calculated adiabatic flash fraction is normally doubled to allow
for the entrained aerosol fraction (Cox et al., 1990). This means that, for instance, if there
is a leak of LPG (propane), nearly two-thirds of it will flash off as a flammable gas cloud.
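The adiabatic flash fraction can be approximated from the liquid's specific heat and latent heat of vaporisation: the sensible heat held above the atmospheric boiling point supplies the heat needed to vaporise part of the liquid. A minimal sketch, using rounded textbook property values for propane:

```python
def adiabatic_flash_fraction(cp, t_storage, t_boil, h_vap):
    """Approximate adiabatic flash fraction of a pressurised liquefied gas.

    cp        liquid specific heat (kJ/kg.K)
    t_storage storage/ambient temperature (degC)
    t_boil    atmospheric boiling point (degC)
    h_vap     latent heat of vaporisation (kJ/kg)
    """
    f = cp * (t_storage - t_boil) / h_vap
    return min(f, 1.0)

# Rounded property values for propane (approximate).
f = adiabatic_flash_fraction(cp=2.5, t_storage=20.0, t_boil=-42.0, h_vap=428.0)
airborne = min(2.0 * f, 1.0)   # doubled for entrained aerosol (Cox et al., 1990)
```

The simple linear estimate gives roughly 0.36 for propane, of the same order as the 0.325 in Table 4.4; the difference reflects temperature-dependent properties, which a more exact integrated calculation would account for.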
Example 4.4
A hose rupture occurs during the transfer of LPG from a bulk tanker to a static tank
in an automotive retail outlet. The discharge is by pumping at a pressure of
1000 kPa. The LPG has an atmospheric boiling point of –42ºC.
Since the hose may be treated as a section of flexible pipework, the leak path
between the tanker and the rupture point is greater than twice the diameter of the
hose and the release would be a two-phase gas/liquid spray release. The sequence is:
Leak occurs.
Twice the adiabatic flash fraction becomes a vapour cloud (approximately 65%
according to Table 4.4).
The unflashed portion forms a liquid pool, but rapidly boils if receiving heat
from the ground and ambient air, which have a temperature of about 20ºC.
If safety systems isolate the leak quickly at both ends, the leak duration is
restricted to less than one minute. An ignition would result in a flash fire, but
possibly no explosion. Serious injury/fatality to exposed people may occur, but
no structural damage.
If the leak is unable to be isolated, then a vapour cloud explosion may result
following ignition, with severe structural damage, injury or fatality.
In the worst case, a phenomenon known as Boiling Liquid Expanding Vapour
Explosion (BLEVE) may occur. BLEVE concepts are discussed later in this
topic.
If ignition does not occur, then cold burns on exposed skin would occur because
the propane vapour temperature is very low.
RELEASE OF GAS
Gas releases are easier to model than flashing liquid releases, but a number of factors
should be considered.
If the pressure is greater than about 2 bar, choke velocity would be reached in the
orifice, and the release is referred to as sonic flow or critical flow.
For sub-sonic releases, the turbulent momentum jet effect is significantly lower and
often ignored.
For sonic releases, the gas jet has significant momentum, resulting in air entrainment
into the jet. Prevailing meteorological conditions play a lesser role compared to jet
momentum effects.
Depending on the size of the leak and the inventory of gas between isolatable sections,
significant system depressurising can occur. This means that within a few seconds the
leak rate would be much lower than the initial release rate. Consequence modelling
based on the initial release rate alone could lead to pessimistic estimation of results.
In the case of gas releases from vapour space of vessels, the line friction is generally
ignored. However, if the release is from a long pipeline, then the release rate is
significantly reduced within a few seconds (typically 10% of the initial release rate), as
the frictional forces in the line dominate. Failure to consider this would lead to an
over-estimation of the consequences.
For gas releases from larger inventory, there would be a drop in temperature of the
system due to gas expansion. Modelling this temperature effect may be necessary to
ensure that the pipe material specification is adequate.
Instantaneous release of the inventory would result in adiabatic expansion, and flash
fire of the air-vapour mixture if ignited. However, if the discharge rate of the release is
controlled, then an ignition of flammable gas would result in a jet or torch fire, for
sonic releases.
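The sonic/sub-sonic threshold in the first dot point corresponds to the critical pressure ratio of compressible flow, which can be checked directly. The ratio of specific heats used below is that of air-like gases and is an assumption for other gases.

```python
def critical_pressure_ratio(gamma):
    """Upstream/ambient absolute pressure ratio above which a gas release
    through an orifice is choked (sonic). gamma is the ratio of specific
    heats of the gas."""
    return ((gamma + 1.0) / 2.0) ** (gamma / (gamma - 1.0))

def is_sonic(p_upstream_kpa_abs, gamma=1.4, p_ambient_kpa=101.325):
    """True if the release would be sonic (critical flow)."""
    return p_upstream_kpa_abs / p_ambient_kpa >= critical_pressure_ratio(gamma)

# For gamma = 1.4 the critical ratio is about 1.89, which is why releases
# above roughly 2 bar absolute are treated as sonic.
print(round(critical_pressure_ratio(1.4), 2))  # 1.89
```

This is the origin of the "greater than about 2 bar" rule of thumb quoted above.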
The physical behaviour of a gas in a variety of release situations is depicted in Figure 4.4.
Figure 4.4: Consequences of gas release
[Figure 4.4 is a decision tree:

Gas release
    Instantaneous leak
        Adiabatic expansion
            Immediate ignition: flash fire
            Delayed ignition: flash fire/vapour cloud explosion
            No ignition: dispersion to atmosphere
    Pressurised leak, discharge rate controlled (sonic flow)
        Immediate ignition: jet fire
        Delayed ignition: flash/jet fire, vapour cloud explosion
        No ignition: jet dispersion to atmosphere
    Low pressure leak (sub-sonic)
        Immediate ignition: flash fire
        Delayed ignition: flash fire/vapour cloud explosion
        No ignition: dispersion to atmosphere]
A C T I V I T Y 4 . 2
Make an inventory of the (bulk) storage of hazardous chemicals within your
workplace. Using Figure 4.3 or Figure 4.4 as appropriate, identify the types of
release events that can occur and the potential consequences (e.g. pool fires, jet fires
from natural gas supply line failure, explosion from LPG container failure).
If your workplace does not have hazardous chemicals, use the following details
concerning an ammonia production facility to carry out this exercise.
Feedstock is a natural gas which is processed through a series of reactor vessels and
pipework to be chilled and then stored as liquid in two 40,000 tonne storage vessels.
The liquid ammonia is exported via a 4 km dual pipeline to a nearby port. Ammonia
is constantly circulated in the pipeline to keep it cool between export shipments,
which occur about every ten days. Other chemicals used in significant quantities are
gaseous chlorine (drawn from four 70 kg cylinders of liquefied chlorine) and smaller
amounts of liquid nitrogen and nitric acid. (If you are not familiar with the
properties of the chemicals, then consult the Material Safety Data Sheets (MSDSs)
for them. These are readily accessible via the internet.)
The natural gas feedstock is piped in to the facility. The nearest town is about 15 km
away and there are no closer inhabitants. In the port region there are approximately
200 workers. There is a beach about 2 km away from the plant which is popular on
weekends with the inhabitants of the nearest town. The only road to the beach
passes by the plant.
Keep your results for later activities.
CALCULATIONS FOR LEAK RATES
Whilst detailed calculation methods are not given in this topic, an overview of relevant
references is given in Table 4.5 for the interested reader. The equations can also be found
in any reference book on hydraulics or fluid mechanics.
Table 4.5: Release rate calculation methods
Leak type | Release calculation method | Reference
Liquid leak | Bernoulli equation | Cox, Lees & Ang (1990)
Two-phase leak | Fauske's equation | Fauske & Epstein (1988)
Gas leak | Bernoulli equation modified for gas compressibility | Cox, Lees & Ang (1990)
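The Bernoulli method for a liquid leak in Table 4.5, together with the 30%–40% two-phase rule of thumb given earlier, can be sketched as follows. The hole size, pressure and density are assumed values; a proper two-phase estimate would use Fauske's equation.

```python
import math

def liquid_leak_rate(dp_pa, hole_area_m2, density, cd=0.61):
    """Bernoulli orifice equation: mass release rate (kg/s) of a liquid
    driven by a pressure difference dp_pa (Pa) across a hole."""
    return cd * hole_area_m2 * math.sqrt(2.0 * density * dp_pa)

def two_phase_leak_rate(dp_pa, hole_area_m2, density, fraction=0.35):
    """Rule-of-thumb two-phase rate: roughly 30-40% of the liquid-only
    rate (use Fauske's equation for a proper estimate)."""
    return fraction * liquid_leak_rate(dp_pa, hole_area_m2, density)

# Illustrative: liquefied gas at 1000 kPa driving pressure, 10 mm hole,
# assumed liquid density of 500 kg/m^3.
area = math.pi * (0.010 / 2.0) ** 2
rate = two_phase_leak_rate(dp_pa=1.0e6, hole_area_m2=area, density=500.0)
```

The same orifice equation, modified for gas compressibility, underlies the gas leak method in the table.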
FI R E C O N S E Q U E N C E A S S E S S M E N T
TYPES OF FIRES
Fires may be classified into the following categories:
1. Pool fires (flammable and combustible liquids)
2. Jet fires (gases/two-phase sprays)
3. Flash fires (flammable gas cloud ignition without explosion)
4. BLEVE (Boiling Liquid Expanding Vapour Explosion)
5. Storage facility fires (flammable and combustible materials)
6. Building and other fires.
Pool fires
A leak of flammable or combustible liquid from equipment or pipework will result in the
formation of a liquid pool on the floor. If this pool ignites before it can effectively drain, a
pool fire will result. Such fires can emit high heat radiation intensities which pose a risk to
people and may result in the failure of equipment and structures, if engulfed by fire.
A distinction must be made between the heat intensity experienced by an object outside the
pool fire and one that is engulfed in the fire. An object located at a distance from the pool
fire would experience mainly the heat radiation emanating from the flame surface. This
flame surface radiation flux (intensity) varies according to the fuel type and the amount of
soot and smoke generation. For low molecular weight fuels (e.g. LPG), the surface heat
flux is high because of cleaner flames, and is generally of the order of 100–120 kW/m2.
Experiments for crude oil fires have recorded flame surface radiation intensity of
approximately 20–40 kW/m2 (Considine, 1984), depending on the pool diameter. This low
figure is due mainly to the presence of appreciable soot and smoke in crude oil fires and the
surface heat flux is reported to drop rapidly with increasing pool diameter.
Objects engulfed in pool fires experience heat intensities from flame surface radiation flux,
flame impingement and heat convection. Tests on crude oil fires have recorded flame
temperatures of 920K (Husted and Sonju, 1985). A heat flux of 100 kW/m2 is generally
used for objects engulfed in a hydrocarbon pool fire.
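For an object outside the fire, the received flux is often screened with a simple point-source model before any detailed analysis: a fraction of the total heat release is assumed to radiate uniformly in all directions. The burning rate, heat of combustion and radiative fraction below are assumed illustrative values.

```python
import math

def point_source_flux(mass_burn_rate, heat_of_combustion, distance_m,
                      radiative_fraction=0.25, transmissivity=1.0):
    """Received thermal radiation flux (kW/m^2) at a distance from a fire,
    using the point-source model.

    mass_burn_rate     fuel burning rate (kg/s)
    heat_of_combustion net heat of combustion (kJ/kg)
    distance_m         distance from the idealised flame centre (m)
    radiative_fraction fraction of heat radiated (assumed; varies with fuel
                       and pool size, often quoted as 0.2-0.4)
    transmissivity     atmospheric transmissivity (1.0 = no absorption)
    """
    q_total = mass_burn_rate * heat_of_combustion   # total heat release, kW
    return transmissivity * radiative_fraction * q_total / (4.0 * math.pi * distance_m**2)

# Illustrative: 2 kg/s burning rate, 45 000 kJ/kg (typical hydrocarbon).
flux_20m = point_source_flux(2.0, 45_000.0, 20.0)
```

The inverse-square fall-off makes clear why flame surface flux figures (100–120 kW/m2 for clean-burning fuels) apply only to engulfed or very close objects.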
Jet fires
If a flammable gas, under pressure, escapes through an orifice and ignites, the result may be
a 'jet' or 'torch' fire. Typical sources include flanges, holes in pipes and pipe fractures.
Such a fire can rapidly damage equipment because of the flame's intensity (high flame
temperatures due to turbulent mixing with air, high radiation efficiency) and its length.
Jet fires can cause significant damage with direct flame impingement on objects due to the
high heat fluxes involved. Although surface heat fluxes for jet fires are of the order of
200 kW/m2, heat fluxes up to 300 kW/m2 can be generated in direct flame engulfment.
In general, a jet flame impinging on a steel structure can raise its temperature to above
500ºC in less than 10 minutes, at which point the structure loses its load-bearing capacity.
Flash fires
If a flammable vapour cloud ignites but fails to explode because the rate of combustion is
too low to generate a percussive pressure wave, a flash fire of extremely short duration (2 to
5 seconds) will result.
4.18 TOPIC 4 ESTIMATING THE SEVERITY OF CONSEQUENCES
UNIT 415 ENGINEERING RISK MANAGEMENT
Because the radiation from a flash fire is very high, it poses a serious risk to personnel
enveloped within the flammable cloud and to those very close to the flame. Flash fires do
not affect structures and equipment because the duration of exposure is too short.
Modelling flash fires involves estimating the dimensions of the flammable cloud using gas
dispersion models.
BLEVE
A BLEVE (Boiling Liquid Expanding Vapour Explosion) is defined as the sudden rupture
of a vessel/system containing liquefied flammable gas under pressure due to flame
impingement from an external fire. The vessel will usually rupture into a number of large
pieces which rocket considerable distances. This is accompanied by a large fireball and
some explosive pressure effects produced from the liquid expanding rapidly during the
propagation of fracture as the vessel ruptures. The pressure effects are generally minor
compared with the heat radiation from the fireball.
The surface heat flux in a BLEVE would be in the range of 250–350 kW/m2. It is modelled
as a rising fireball, approximated by a spherical geometry.
Whilst BLEVEs are associated with explosive effects causing structural failures, the thermal
radiation impact of a BLEVE is far more significant for exposed people because radiation
distances can be much larger than explosion effect distances. A 100 tonne LPG vessel in a
storage depot, if subjected to a BLEVE, can cause injury to personnel 1200 m away. The
LPG industry has progressively improved design and installation standards over the last
decade to significantly reduce the chance of such an event.
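The fireball is commonly sized with empirical correlations of the form D = 5.8 M^(1/3). The sketch below uses constants of the kind quoted in consequence-modelling guidance (e.g. CCPS); treat them as indicative rather than as values endorsed by this unit.

```python
def fireball_size(mass_kg):
    """Approximate BLEVE fireball diameter (m) and duration (s) from
    commonly quoted empirical correlations:
        D = 5.8 * M^(1/3)
        t = 0.45 * M^(1/3)  for M < 30,000 kg
        t = 2.6  * M^(1/6)  otherwise
    """
    diameter = 5.8 * mass_kg ** (1.0 / 3.0)
    if mass_kg < 30_000:
        duration = 0.45 * mass_kg ** (1.0 / 3.0)
    else:
        duration = 2.6 * mass_kg ** (1.0 / 6.0)
    return diameter, duration

# The 100 tonne LPG vessel mentioned in the text
d, t = fireball_size(100_000.0)
```

For the 100 tonne case this gives a fireball roughly 270 m across lasting under 20 seconds, which is consistent with the large injury distances quoted in the text.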
Storage facility fires
These fires are more common because storage facilities carry significant amounts of
combustible materials and some store hazardous chemicals. The major hazards associated
with the storage of flammable or combustible materials are fire and toxic products formed
by combustion or decomposition.
The main parameters of interest are the activation time and effectiveness of sprinkler
systems, distances from the storage facility at which critical radiation intensities occur, and
the dispersion of toxic gases downwind from the storage.
To quantify these dangers, it is necessary to study the growth of the fire and the
effectiveness of the installed sprinkler system. Once the fire has passed a point called
'flashover', where all fuel surfaces are burning, it will be virtually impossible to control the
fire. Flashover is a phenomenon when the temperature of the hot gas layer at the roof
exceeds the structural failure temperature of load bearing members.
As the stored materials burn, toxic gases form and rise in the fire plume due to buoyancy
effects. The dispersion of toxic gases can be modelled using a Gaussian model corrected
for release from the area source rather than a point source. This is necessary since the fire
covers the area of the storage facility and toxic gases are released from this burning area
(i.e. release is not from a point source).
A simplified flowchart for fast fire growth in storage facilities is shown in Figure 4.5.
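The decision logic of that flowchart can be sketched in code as follows (a minimal sketch; the parameter names are illustrative).

```python
def storage_fire_outcome(sprinklers_installed: bool,
                         activation_time_s: float,
                         time_to_flashover_s: float,
                         sprinklers_effective: bool) -> str:
    """Outcome of a fast-growing storage facility fire, following the
    decision logic of the simplified flowchart (Figure 4.5)."""
    if (sprinklers_installed
            and activation_time_s < time_to_flashover_s
            and sprinklers_effective):
        return "fire extinguished"
    # No sprinklers, activation later than flashover, or ineffective
    # sprinklers: flashover occurs, leading to a fully developed fire
    # with both fire effects and combustion product effects.
    return "fully developed fire"
```

The key quantitative inputs are therefore the sprinkler activation time and the time to flashover; once flashover occurs, the fire is treated as uncontrollable.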
Figure 4.5: Simplified flowchart for fast fire growth in storage facilities
A number of software packages are available for assessing the effects of fires, including:
EFFECTS and DAMAGE—Calculation of physical effects of release of hazardous
materials and the damage effects. Developed by TNO in the Netherlands.
PHAST—Hazard consequence models for release of hazardous materials. Developed
by Det Norske Veritas (DNV) in Norway.
FRED (Fire Radiation Explosion Dispersion)—Developed by Shell Global Solutions in
the UK.
Firewind—Developed by Dr Victor Shestopal (who was formerly with CSIRO and has
since formed his own consultancy Fire Modelling and Computing).
CFAST—Developed by the Building and Fire Research Laboratory (BFRL) in the US
for modelling fires in large warehouse type of buildings.
Building and other fires
Building fires can also generate toxic smoke but the main issue is the ability of occupants to
escape safely. Fires in commercial buildings do not usually cause fatalities unless there are
inadequate exit routes and/or overcrowding, such as can occur in nightclubs. Even
non-enclosed buildings can result in fatalities if escape is impeded. The Bradford football
stadium fire in the UK in 1985 resulted in 52 deaths and 265 injuries. Fire spread rapidly in
the timber structure and many were unable to escape the intense heat in time. Those who
headed towards the exits rather than onto the ground were trapped because after the start of
the match the gates were kept locked to prevent gatecrashers. Tunnel fires can be catastrophic. In Austria in 2000, 155 people died in a fire onboard a
funicular railway as it passed through a 3 km tunnel. The fire was caused by a faulty heater
at the rear of the train. Those that escaped to the rear of the train survived as the tunnel
created a chimney effect for the toxic smoke.
[Figure 4.5 logic: if sprinklers are installed, activate in less time than the time to flashover,
and are effective, the fire is extinguished. Otherwise fast fire growth leads to flashover and
a fully developed fire, with fire effects and combustion product effects.]
VULNERABILITY MODELS FOR FIRES
The effects of thermal radiation from fires are summarised in Table 4.6.
Table 4.6: Effects of thermal radiation

Heat flux (kW/m2)  Effect
1.2    Received from the sun at noon in summer.
2.1    Minimum to cause pain after 1 minute.
4.7    Will cause pain in 15–20 seconds and injury after 30 seconds exposure (at least
       second degree burns will result).
12.5   Significant chance of fatality for extended exposure. High chance of injury.
       After long exposure, causes the temperature of wood to rise to a point where it can
       be readily ignited by a naked flame. Thin steel with insulation on the side away
       from the fire may reach a thermal stress level high enough to cause structural failure.
23     Likely fatality for extended exposure and chance of fatality for instantaneous
       exposure. Spontaneous ignition of wood after long exposure. Unprotected steel
       will reach thermal stress temperatures which can cause failures. Pressure vessel
       needs to be relieved or failure will occur.
35     Cellulosic material will pilot ignite within one minute's exposure. Significant
       chance of fatality for people exposed instantaneously.

Source: Department of Planning, NSW, 1997b.
Fire effects on people
Exposure to radiation intensities from a large fire may result in either severe burns or
fatalities, as was the case in the Bradford stadium fire. The effect is a function of both the
intensity of radiation and the duration of exposure. Some results are shown in Table 4.7.
Table 4.7: Effects of thermal radiation on people

A. Thermal radiation intensity

Intensity (kW/m2)  Effect                                                  Reference
1.5    Threshold of pain                                        Atallah and Allan (1971)
2.1    Level at which pain is felt after 1 minute               Atallah and Allan (1971)
1      Level just tolerable to a clothed man                    HSE (1978)
8      Level which causes death within minutes                  HSE (1978)
4.7    Threshold of pain; average time to experience pain 14.5 s  Crocker and Napier (1986)

B. Thermal dose

Dose (kJ/m2)  Effect                                                  Reference
40     Second degree burns                                      Williamson and Mann (1981)a
125    Third degree burns                                       Williamson and Mann (1981)a
65     Threshold of pain                                        Rijnmond Public Authority (1982)
125    First degree burns                                       Rijnmond Public Authority (1982)
250    Second degree burns                                      Rijnmond Public Authority (1982)
375    Third degree burns                                       Rijnmond Public Authority (1982)
c.100  Threshold of blistering                                  Crossthwaite (1984)a
200    Blistering                                               Crossthwaite (1984)a
700    50% fatality                                             Crossthwaite (1984)a
65     Threshold of pain, no reddening or blistering of skin    BS 5908: 1990
125    First degree burns                                       BS 5908: 1990
200    Onset of serious injury                                  BS 5908: 1990
250    Second degree burns                                      BS 5908: 1990
375    Third degree burns                                       BS 5908: 1990

Source: Lees, 1996: 16/249. a For thermal radiation from a fireball.
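Although this topic presents only tabulated effects, risk analysts often convert an intensity-duration pair into a fatality probability with a probit relation such as Eisenberg's. The sketch below assumes the commonly quoted Eisenberg constants (intensity in W/m2, time in seconds) and should be checked against the reference used in any real assessment.

```python
import math
from statistics import NormalDist

def thermal_fatality_probability(intensity_kw_m2, exposure_s):
    """Probability of fatality from thermal radiation using the
    Eisenberg probit:  Y = -14.9 + 2.56 * ln(t * I^(4/3) / 1e4),
    with I in W/m2 and t in seconds. The probability is the standard
    normal CDF evaluated at (Y - 5)."""
    i_w = intensity_kw_m2 * 1000.0   # kW/m2 -> W/m2
    probit = -14.9 + 2.56 * math.log(exposure_s * i_w ** (4.0 / 3.0) / 1e4)
    return NormalDist().cdf(probit - 5.0)

# 12.5 kW/m2 for one minute: roughly a 0.2 probability of fatality,
# consistent with "significant chance of fatality for extended exposure"
p = thermal_fatality_probability(12.5, 60.0)
```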
Fire effects on structures and materials
The strength and stiffness properties of metals decrease as the temperature rises. Both the
yield stress and modulus of elasticity decrease with increasing temperatures. The intensity
of stress in a steel member influences the load carrying capacity. The higher the load stress,
the more quickly a member will fail at elevated temperatures. A temperature of 500ºC is
normally considered the critical temperature for unprotected steel. At this temperature the
yield stress in the steel decreases to about one half of the value at ambient temperature.
This is the approximate level normally used as the design working stress.
Experimental research has been undertaken on the effects of fires on offshore equipment
and structures. Shell Research conducted experiments on pipe sections (540 mm diameter
and 13 mm wall thickness) exposed to large-scale propane jet fires (Bennett et al., 1990).
For unprotected structures, it was found that a temperature of 900–1000ºC was reached
within ten minutes from the time of ignition. For structures protected by fireproofing (e.g.
Mandolite), the temperature did not exceed 100ºC even after 40 minutes of exposure.
For exposure to hydrocarbon pool fires, temperature rise with time may be approximately
estimated using Figure 4.6. The time for failure in a jet fire is considerably shorter, less
than 50% of the time required for pool fire engulfment.
Figure 4.6: Average rate of heating of steel plates exposed to open gasoline fire on one side
Source: API, 2000.
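A very rough first estimate of how long engulfed steel takes to heat up can be made with a lumped thermal-mass balance, as sketched below. Re-radiation and convective losses are ignored, so the result is a lower bound, and the density and specific heat values are assumed typical figures for carbon steel, not values from this unit.

```python
def time_to_temperature(flux_kw_m2, thickness_m,
                        t_start_c=20.0, t_target_c=500.0,
                        rho=7850.0, cp=520.0):
    """Idealised time (s) for an unprotected steel plate, heated on one
    side by a constant net heat flux, to reach a target temperature:
        rho * cp * thickness * dT/dt = flux
    Losses are ignored, so this underestimates the real time.

    rho : steel density (kg/m3), assumed
    cp  : steel specific heat (J/kg.K), assumed mid-range value
    """
    flux_w = flux_kw_m2 * 1000.0
    return rho * cp * thickness_m * (t_target_c - t_start_c) / flux_w

# 13 mm plate engulfed in a hydrocarbon pool fire (~100 kW/m2):
# a few minutes to reach the 500 C critical temperature
t_fail = time_to_temperature(100.0, 0.013)
```

The estimate of roughly four minutes sits below the ten-minute figures quoted above, as expected for a model that neglects all heat losses.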
EX P LO S I O N C O N S E Q U E N C E A S S E S S M E N T
Explosions can be of several types. In the case of explosives such as TNT, there is
condensed phase explosion or detonation, generating a blast wave. In gas explosions the
mechanism is quite different, and the percussive pressure wave is generated by acceleration
of the flame front, which is increased by obstacles. Finally, in the case of explosions within
enclosures (gas, dust), the blast effect is due to rapid pressure rise from both volume and
temperature increase resulting from combustion.
Since the mechanisms of blast generation are vastly different, the same methodology cannot
be applied for all types of explosions. Some concepts relating to explosions and the effects
of explosions on people and structures are discussed below. No calculation methods are
provided. The interested reader is referred to Lees (1996) and IChemE (1994).
Detonation is defined as the sudden and violent release of mechanical, chemical or nuclear
energy from a confined space which creates a shockwave that travels at supersonic speeds.
It is sometimes used interchangeably with the word explosion.
The term condensed phase explosions covers the direct use of explosives such as in the
mining industry and military applications, and to some extent, explosions involving
oxidising agents such as ammonium nitrate.
The TNT equivalence model is used extensively for effects modelling. In the past, this
model was also used for gas explosions, but it was abandoned by practitioners when it was
recognised that the mechanism of gas explosion is vastly different to TNT explosions.
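For condensed phase explosions, the TNT equivalence model reduces to computing an equivalent TNT mass and a Hopkinson-scaled distance, which is then read against published blast charts. A sketch follows; the 5% yield factor and 4.68 MJ/kg TNT blast energy are commonly quoted values, but both are assumptions that must be confirmed against the reference used.

```python
def tnt_equivalent_mass(fuel_mass_kg, heat_of_combustion_mj_kg,
                        yield_factor=0.05, e_tnt_mj_kg=4.68):
    """TNT-equivalent mass (kg) of a fuel release.

    yield_factor : fraction of combustion energy contributing to blast
                   (assumed; a few percent is often quoted)
    e_tnt_mj_kg  : blast energy of TNT (commonly quoted value)
    """
    return (yield_factor * fuel_mass_kg * heat_of_combustion_mj_kg
            / e_tnt_mj_kg)

def scaled_distance(distance_m, w_tnt_kg):
    """Hopkinson-scaled distance Z = R / W^(1/3) (m/kg^(1/3)), used to
    read peak overpressure off published TNT blast charts."""
    return distance_m / w_tnt_kg ** (1.0 / 3.0)

# 1000 kg of propane (~46 MJ/kg), observer at 100 m
w = tnt_equivalent_mass(1000.0, 46.0)
z = scaled_distance(100.0, w)
```

The peak overpressure itself is then taken from empirical charts (e.g. Kingery-Bulmash curves) as a function of Z; no simple closed form is given here.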
The result of an explosion is the generation of a pressure wave higher than atmospheric for
a short duration. The pressure wave above atmospheric pressure is referred to as
'overpressure', and the highest overpressure reached in the deflagration process is referred
to as 'peak overpressure'. The duration of this overpressure until it reduces back to
atmospheric pressure is referred to as the 'positive phase duration' (see Figure 4.7).
Deflagration is defined as the extremely rapid burning of a material. This is much faster
than normal combustion, but slower than detonation.
Figure 4.7: Typical overpressure time curve
The peak overpressure caused by the deflagration of a hydrocarbon and air mixture in a
totally enclosed space initially at atmospheric pressure is of the order of 8 bar, whereas a
detonation may give a peak overpressure as high as 20 bar with the possibility of higher
pressure at the point of transition. In contrast, combustion of a completely unconfined
cloud of fuel and air produces only a few millibar overpressure even if the cloud is
optimally premixed.

A detonation generates much greater pressures and is much more destructive than a
deflagration. The conditions necessary to generate a detonation, i.e. very rapid acceleration
of the flame front or a powerful shock to the system, are not generally considered to occur
in gas explosions, but instead occur mainly in condensed phase explosions.

Obstacles, i.e. equipment layout, will always increase the overpressure in gas explosions,
but to a greater or lesser extent depending on their profile, number, size and location, as
well as absolute scale. In exploring the effect of design modifications on reducing
overpressure in a plant, the following guidelines are suggested.
a) Minimise inventories wherever possible.
b) Minimise volumes of potentially explosive mixture, but be careful not to reduce the
vent area ratio to an unacceptable value.
c) Maximise vent areas, but be careful not to open up new pathways that would allow
additional flame acceleration through obstacle arrays and be careful not to create
potential for cascade events.
d) Minimise the obstructions in the flame path as the flame propagates.
VULNERABILITY MODELS FOR EXPLOSIONS
Explosion effects on people
Explosions can cause injury or fatality to people through the effects of heat radiation, blast
and combustion products. Injury from blast may be from direct and indirect blast effects
including overpressure, missiles and whole body translation.

The effect of blast overpressure on people depends on the peak overpressure, the rate of rise
and the duration of the positive phase. The damaging effect of a given peak overpressure is
greater if the rise is rapid.

A relatively high overpressure (>90 kPa) will cause fatalities from direct blast effects,
primarily due to lung haemorrhage (Lees, 1996). However, lower overpressures can also
result in fatalities due to indirect effects such as missiles and whole body translation.

Estimating the injury effects from explosions is complex. The use of probit equations and
other mathematical methods cannot satisfactorily account for the complex effects of blast
impact on humans which may include:
overpressure effects on sensitive organs such as lungs
generation of high velocity fragments
dislocation of heavy equipment
'blowing' of person's body against hard and/or sharp surfaces
collapse of structures on the person.

Risk analysts have developed qualitative guidelines for the effects of explosion
overpressures on people based on review of quantitative methods and past explosion
incidents. Table 4.8 provides a rough guide from which approximate fatality probability
can be assigned for various overpressure levels.
Table 4.8: Expected effects on personnel at various explosion overpressures
Overpressure
(kPa) Personnel injury
186 Personnel will be killed by blast, by being struck by debris, or by impact against hard surfaces.
83 Personnel will suffer severe injuries or death from direct blast, building collapse, or translation.
55 Personnel are likely to be injured seriously due to blast, fragments, debris and translation. There is a 15 percent chance of eardrum rupture.
24 Personnel may suffer serious injuries from fragments, debris, firebrands or other objects. There is a two percent chance of eardrum damage to personnel.
16 Occupants of exposed structures may suffer temporary hearing loss or injury from blast effects, building debris and displacement. Although personnel in the open are not expected to be killed or seriously injured by blast effects, fragments and debris may cause some injuries.
12 Occupants of exposed, unstrengthened structures may be injured by secondary blast effects, such as falling building debris. Although personnel in the open are not expected to be killed or seriously injured by blast effects, fragments and debris may cause some injuries.
6–8 Personnel in buildings are provided a high degree of protection from death or serious injury; however, glass breakage and building debris may still cause some injuries. Personnel in the open are not expected to be injured seriously by blast effects. Fragments and debris may cause some injuries.
Source: Based on United States Department of Defense Ammunition and Explosives Safety
Standards, DoD 6055.9-STD, October 5 2004: 28–31.
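The bands in Table 4.8 lend themselves to a simple threshold lookup. The sketch below paraphrases the table's effect descriptions, and the handling of overpressures below the lowest band is an assumption.

```python
import bisect

# (overpressure threshold in kPa, paraphrased effect) from Table 4.8,
# in ascending order of severity
_EFFECTS = [
    (6.0,   "glass breakage and building debris may cause some injuries"),
    (12.0,  "injuries from secondary blast effects such as falling debris"),
    (16.0,  "temporary hearing loss or injury from blast effects possible"),
    (24.0,  "serious injuries from fragments, debris and firebrands possible"),
    (55.0,  "serious injury likely; ~15% chance of eardrum rupture"),
    (83.0,  "severe injuries or death from blast, collapse or translation"),
    (186.0, "fatalities from blast, debris impact or translation"),
]

def personnel_effect(overpressure_kpa):
    """Return the Table 4.8 effect band for a given peak overpressure."""
    thresholds = [t for t, _ in _EFFECTS]
    i = bisect.bisect_right(thresholds, overpressure_kpa) - 1
    if i < 0:
        # Below the lowest tabulated band (an assumption, not in the table)
        return "no significant injury expected"
    return _EFFECTS[i][1]
```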
Explosion effects on structures
The pressure loading generated by explosions and deflagrations has complex effects on
structures and structural components. High combustion rates produce a pressure loading
that varies with time, and the response of the structure to this variable load is itself time
dependent. The usual practice is to convert the pressure-time characteristics into an
equivalent static loading which is more convenient for structural response calculations.
In general, the structural response broadly depends on the peak overpressure and the ratio
of the duration of the imposed pressure load (td) to the natural period of vibration (tn) of the
structure. The duration of the main overpressure peak in a vented or partially confined
vapour cloud explosion is typically of the order of 100–200 milliseconds (ms). The natural
period of vibration of structural building components depends on the method of
construction and size of components, but typically lies in the range 10–50 ms. Since the
duration of the overpressure is generally larger than the natural period of vibration of the
structural element, the loading experienced will be equivalent to a static load of magnitude
equal to the peak overpressure generated by combustion.
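The reasoning in the paragraph above amounts to classifying the loading regime by the ratio td/tn. The numerical thresholds in this sketch are indicative only; practice varies between references.

```python
def loading_regime(td_ms, tn_ms):
    """Classify blast loading on a structural element by the ratio of
    load duration (td) to natural period of vibration (tn).
    Thresholds are indicative, not standardised."""
    ratio = td_ms / tn_ms
    if ratio > 2.0:
        # Load lasts much longer than the structural response: treat as
        # a static load equal to the peak overpressure
        return "quasi-static"
    if ratio < 0.1:
        # Load is over before the structure responds: response is
        # governed by the impulse, not the peak pressure
        return "impulsive"
    return "dynamic"

# Typical vapour cloud explosion (td ~150 ms) on a wall panel (tn ~30 ms)
regime = loading_regime(150.0, 30.0)
```

For the typical values quoted in the text (td of 100 to 200 ms against tn of 10 to 50 ms), the result is quasi-static, matching the conclusion above.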
The few experimental studies that have investigated the response of structures to gas
explosions have been confined to typical building materials. Some extremely rough
estimates on the effect of various overpressures on equipment and structures are shown in
Table 4.9. It is not possible to present satisfactory approximations for explosion
overpressure damage because of the complexity of these effects. The severity of these
effects is dependent not only on the peak overpressure but also on the duration, blast wave
reflections and the structural properties of the equipment.
Table 4.9: Effect of explosion overpressures on structures
Overpressure range (kPa)
Damage effect
70+    Pumps, compressors, vertical pressure vessels and turbines damaged. Pipes ruptured and damaged. Equipment displaced off mountings.
35–70  Horizontal pressure vessels and heat exchangers damaged. Pipe breaks at flanges. Damage to thin walled steel equipment. Complete demolition of houses.
14–35  Control room and switch room walls damaged. Steel panels damaged. Houses uninhabitable.
7–14   Cladding and insulation damaged.
4–7    Windows broken, glass breakage, glass fragments fly. Damage to internal partitions and joinery, but can be repaired.
TOX I C I T Y C O N S E Q U E N C E A S S E S S M E N T
This section is primarily of interest to those involved in the storage, handling and
processing of toxic chemicals, therefore the description is brief and qualitative. References
for further information are provided for the interested reader.
EXPOSURES
Toxic effect models are employed to assess the consequences to human health of exposure
to toxic substances. There are two types of exposures.
1. Acute exposures
These can occur from accidental release of toxic substances to the atmosphere. An
example is exposure to chlorine gas in a water treatment facility from a failure of the
pipework/fitting. Other examples may include exposure to toxic fumes from a cargo
spill as a result of a truck accident on the road.
In general, acute exposures to small doses may not have a long-term effect on the
persons exposed. However, larger doses may cause irreversible damage and in some
instances can be fatal.
2. Chronic exposures
The term 'chronic exposure' is generally taken to mean regular exposures to small doses
of the toxic substance that may result in adverse health effects after a long period.
Examples of chronic exposure may include occupational exposure to chemicals in the
workplace and small dose exposures to users of contaminated land/groundwater.
Toxic responses caused by acute exposures to hazardous materials are difficult to evaluate
for several reasons (CCPS, 1999).
1. Humans experience a wide range of acute adverse health effects including irritation,
narcosis, asphyxiation, sensitisation, blindness, organ system damage and death. In
addition, the severity of many of these effects varies with intensity and duration of
exposure. For example, exposure to a substance at an intensity that is sufficient to
cause only mild throat irritation is of less concern than one that causes severe eye
irritation, lacrimation or dizziness, since the latter effects are likely to impede escape
from the area of contamination.
2. There is a high degree of variation in response among individuals in a typical
population. Factors such as age, health and degree of exertion affect toxic responses.
Generally, sensitive populations include the elderly, children and persons with diseases
that compromise the respiratory or cardiovascular system.
3. For the overwhelming majority of substances encountered in industry, there is not
enough data on toxic responses of humans to permit an accurate or precise assessment
of the substance's hazard potential. Frequently, the only data available is from
controlled experiments conducted with laboratory animals to estimate likely effects in
humans. This extrapolation requires the professional judgment of a toxicologist.
4. Many releases involve multiple components. There are presently no 'rules' on how these
types of releases should be evaluated. Are they additive, synergistic or antagonistic in
their effect on the population? As more information is developed on the
characterisation of multi-component releases from source and dispersion
experimentation and modeling, corresponding information is needed in the toxicology
arena. Unfortunately, even toxic response data of humans to single component
exposures are inadequate for a large number of chemical types.
5. No toxicology testing protocols exist for studying episodic releases on animals. This
has been a neglected aspect of toxicology research. There are experimental problems
associated with testing toxic chemicals at high concentrations for very short durations
in establishing the concentration/time profile. In testing involving fatal concentration/
time exposures, there is the question of how to incorporate early and delayed fatalities
into the study results.
Despite the difficulty in accurately calculating the toxicological responses, there are some
established methods that can be used for risk assessment purposes. These are discussed in
the next section.
EFFECT MODELS FOR TOXIC RELEASES
When a toxic substance is released, a number of things occur depending on the nature of the
material.
If the material is a gas at ambient temperatures and pressure, it vaporises instantly and
disperses downwind.
If the material is a liquid at ambient conditions, it forms a spreading pool. If the liquid
is volatile, it evaporates and the vapour disperses downwind.
If the material is relatively non-volatile, then it affects only those who are in the
immediate vicinity through inhalation and/or dermal contact.
Gas dispersions can be in the form of a puff or plume. Plumes refer to continuous
emissions, and puffs to burst emissions that are of short duration compared with the travel
time or sampling time.
The following factors affect the concentration of toxic gases as they disperse in the
atmosphere:
Nature and physical properties of the gas.
Wind speed and atmospheric (Pasquill) stability class. The latter is a parameter that
defines the dispersion characteristics. It is a measure of the vertical mixing of the
dispersing gas as a result of temperature variation with height (known as the lapse rate).
Surface roughness.
Momentum of gas released vertically, causing plume to rise.
Air entrainment in the vicinity of the escape point.
Density and buoyancy effects.
Atmospheric chemistry and stability.
Terrain effects.
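For neutrally buoyant releases, these factors feed into the familiar Gaussian plume model. The sketch below omits ground reflection, and the linear dispersion coefficients are crude stand-ins for the Pasquill-Gifford curves that a real assessment would use per stability class.

```python
import math

def gaussian_plume_concentration(q_g_s, u_m_s, x_m, y_m=0.0, z_m=0.0,
                                 a=0.08, b=0.06):
    """Gaussian plume concentration (g/m3) for a continuous release,
    without ground reflection.

    q_g_s : release rate (g/s)
    u_m_s : wind speed (m/s)
    x_m   : downwind distance; y_m crosswind; z_m vertical
    a, b  : crude linear fits sigma_y = a*x, sigma_z = b*x (assumed);
            real work uses Pasquill-Gifford curves per stability class
    """
    sigma_y = a * x_m
    sigma_z = b * x_m
    return (q_g_s / (2.0 * math.pi * u_m_s * sigma_y * sigma_z)
            * math.exp(-y_m ** 2 / (2.0 * sigma_y ** 2))
            * math.exp(-z_m ** 2 / (2.0 * sigma_z ** 2)))

# 100 g/s release, 3 m/s wind, plume centreline at 500 m downwind
c = gaussian_plume_concentration(100.0, 3.0, 500.0)
```

Note the model's limits: it does not apply to dense gas releases, which slump and hug the ground and need the heavy gas models discussed next.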
Figure 4.8 shows the two types of toxic releases that can be modelled:
Neutrally buoyant dispersion for gases whose density is close to that of air.
Heavy gas dispersion for gases that are denser than air and tend to hug the ground.
Whilst a number of mathematical models are available for this purpose, they should only be
used by trained personnel because the source term specification requires skill and the results
may be incorrectly interpreted.
Figure 4.8: Flowchart for toxic release
[Figure 4.8 steps: for a toxic release, determine the exposure duration from incident
analysis and the concentration from a dispersion model (neutrally buoyant or dense gas,
plume or puff); calculate the toxic dose; apply the probit equation of toxic dose and
evaluate dose-response relationships; then determine the probability of fatality, or the
toxic exposure category for injury effects.]
VULNERABILITY MODELS FOR TOXIC RELEASE
The consequences to an individual of a toxic release exposure can be expressed in terms of
a probability of the effect (fatality or injury). The type and severity of the effects of a toxic
gas or vapour depends on its concentration and the exposure duration.
The inhalation of toxic gases can cause a wide range of effects. These may be severe and
result in fatality, or they may be mild, such as irritation of the throat or eyes. A summary is
given in Table 4.10.
Table 4.10: Possible effects from toxic exposure

Effect           Mode
Irritation       Respiration (chlorine, sulphur dioxide, ammonia etc.); skin; eyes
Narcosis         Respiration (hydrocarbon)
Asphyxiation     Simple (nitrogen, helium): inert gas displaces oxygen;
                 chemical (carbon monoxide, hydrogen cyanide)
Systemic damage  Irreversible effects
The dose-response relationship is generally non-linear, that is as the concentration
increases, the time required to produce a given level of fatality decreases rapidly. For low
concentration exposure effects, the American Conference of Governmental Industrial
Hygienists (2003) has developed an On-Site Emergency Response Planning Guide which
recommends concentrations for different chemicals for up to one-hour exposures.
Toxic gas concentrations that may be injurious or cause distress to exposed people can be
used for consequence assessment, where no fatality is involved. The exposure levels have
been determined from available animal toxicology data and human experience. In
Australia, short-term and long-term exposure limits are specified by the Australian Safety
and Compensation Council (ASCC), formerly the National Occupational Health and Safety
Commission (NOHSC). The Environmental Health Criteria series published by WHO for
a number of chemicals provides valuable information on dose-response for low-level
exposures.
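Where fatality effects must be quantified, the probit approach referred to in Figure 4.8 combines concentration and duration into a toxic load C^n * t. A sketch follows; the constants a, b and n are substance-specific and must be taken from the literature (e.g. Lees, 1996). The numbers in the example are placeholders of realistic magnitude, not values for any specific substance.

```python
import math
from statistics import NormalDist

def toxic_fatality_probability(conc_ppm, exposure_min, a, b, n):
    """Probability of fatality from the generic toxic probit
        Y = a + b * ln(C^n * t),
    with C in ppm and t in minutes. The probability is the standard
    normal CDF evaluated at (Y - 5). Constants a, b, n must come from
    the literature for the substance of interest."""
    probit = a + b * math.log(conc_ppm ** n * exposure_min)
    return NormalDist().cdf(probit - 5.0)

# Placeholder constants for illustration only (not a tabulated substance)
p = toxic_fatality_probability(400.0, 30.0, a=-8.29, b=0.92, n=2.0)
```

The exponent n greater than 1 captures the non-linear dose-response noted above: doubling the concentration shortens the time to a given fatality level by much more than half.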
ST RU C T U R A L FA I LU R E C O N S E Q U E N C E A S S E S S M E N T
Failure of critical structures can result in:
loss of life or injury
major environmental damage
financial loss from business interruption
consequential losses such as liability and lost opportunity costs.
Typical examples of structural failure accidents are:
failure of dams and water reservoirs causing flooding downstream
failure of bridges and crossings
failure of tailings dams in mine sites affecting people and the environment
building collapses or partial collapses, e.g. roofs.
Quantitative assessments of structural failure consequences and safe operating envelopes
are generally made using stress analysis of the static and dynamic loading for the given
geometry and design. Since these studies are often conducted at the design stage and
sufficient safety margins are then allowed in the design, the probability of failure is very low. In the case of
margins are then allowed in the design, the probability of failure is very low. In the case of
a tailings dam, the embankment is progressively raised as the mining activity continues and
the shear strength may vary according to the rock material. An assessment method is
described by Jackson and Fell (1993).
Linear finite element analysis is commonly used to calculate displacements and the resultant
stresses in the loaded components. This is applicable where the displacements are small
compared to component size, and the resulting stresses are below the yield stress of the
material. For major failure scenarios, the displacements are excessive and the stresses
exceed the yield stress, therefore, non-linear finite element analysis must be used. This is a
specialist area and appropriate advice must be sought.
The Australian National Committee on Large Dams (ANCOLD) has developed guidelines
for risk assessment of dam failures (ANCOLD, 2003). The guidelines recommend the use
of the US Bureau of Reclamation (USBR) method for failure consequence assessment. The
following steps are involved.
Identify modes of failure
Determine inundation areas
Assess threat to life
Assess economic damage
Determine environmental impacts.
The likelihood assessment would involve a probabilistic analysis, which we will discuss in
Topic 5. Many related references on dam failure consequences are listed in the ANCOLD
guidelines (2000).
PRO J E C T R I S K I M PAC T A S S E S S M E N T
The consequences of project risks are mainly related to costs. Safety and environmental
impacts can be covered by techniques described earlier. The commercial impact from
variations in key cost parameters can result in:
project cost overruns
project schedule delays
operating cost estimate blow-outs.
Life cycle costs are generally considered in project cost impact assessments rather than any
single cost in isolation.
SENSITIVITY ANALYSIS
Sensitivity analysis is used to identify the impact on the total cost from a change in a single
risk variable. The main risk variables or parameters in project risk are:
design cost
capital equipment cost
construction cost
project schedule
operating cost
maintenance cost
abandonment cost (when considering life cycle)
miscellaneous costs (land purchase, statutory approvals, etc.).
The major advantage of sensitivity analysis is that it explicitly shows the robustness of the
ranking of alternative projects. It also identifies a point at which a given variation in the
expected value of a cost parameter changes a decision.
Flanagan and Norman (1993) describe the spider diagram technique for using sensitivity
analysis. The steps described by the authors are as follows.
1. Calculate the expected total life cycle cost by using expected values.
2. Identify the variables subject to risk.
3. Select one risky variable or cost parameter and re-calculate the total life cycle cost
using different assumptions about the value of this parameter. The life cycle cost is
recalculated assuming that the cost parameter changes by ±1%, ±5%, and so on.
4. Plot the resulting life cycle costs on the spider diagram, interpolating between the
values. This generates the line labelled 'parameter 1' as shown in Figure 4.9.
5. Repeat steps 3 and 4 for the other risky variables.
The flatter a given parameter line is, the more sensitive the life cycle costs will be to
changes in that parameter. For example, in Figure 4.9, the life cycle costs are much more
sensitive to variation in parameter 1 than to variation in parameter 2.
Spider diagrams become difficult to read when too many variables are plotted. The
practical answer is to have several spider diagrams. Flanagan and Norman (1993)
recommend having one spider diagram for the financial and capital aspects of the project,
and a separate spider diagram for running costs.
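The steps above can be sketched in code. The cost components, dollar figures and the simple additive cost model below are hypothetical placeholders; a real study would substitute the project's own life cycle cost model.

```python
# Spider-diagram sensitivity analysis for a simple life cycle cost model.
# Each risky parameter is varied in turn while the others stay at their
# expected values; the (variation, total cost) pairs form one line of the
# spider diagram (steps 3-5 described by Flanagan and Norman).

EXPECTED = {                  # expected values, $ million (illustrative only)
    "capital_cost": 50.0,
    "operating_cost": 30.0,   # present value over the project life
    "maintenance_cost": 12.0,
}

def life_cycle_cost(params):
    """Total life cycle cost; here simply the sum of the cost components."""
    return sum(params.values())

def spider_points(parameter,
                  variations=(-0.05, -0.03, -0.01, 0.0, 0.01, 0.03, 0.05)):
    """Vary one risky parameter, holding the others at expected values."""
    points = []
    for v in variations:
        varied = dict(EXPECTED)
        varied[parameter] = EXPECTED[parameter] * (1.0 + v)
        points.append((v, life_cycle_cost(varied)))
    return points

for name in EXPECTED:                       # one spider line per parameter
    for variation, total in spider_points(name):
        print(f"{name} {variation:+.0%}: ${total:.2f}M")
```

The flatter the printed cost response is across the variation range, the less that parameter matters; a steep change flags a parameter worth managing closely.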
Figure 4.9: Spider diagram for sensitivity analysis
[Figure shows life cycle cost on the horizontal axis against % variation in parameter (−6% to +5%) on the vertical axis, with lines for Parameter 1, Parameter 2 and Parameter 3 passing through the expected total life cycle cost at 0% variation.]
Source: Flanagan & Norman, 1993: 99.
SUMMARY
In this topic we have discussed the third step of the risk management framework: estimating
the severity of the consequences should a loss event occur. We focused on two types of
models that are used to estimate the consequences of a loss event:
effect models, which are usually mathematical and are used to calculate the physical effects of a loss event
vulnerability models, which are usually empirical and are used to quantify the resulting damage.
We emphasised the importance of accounting for couplings, interactions and event
dependencies wherever possible to provide a full picture of the risk.
Since most quantitative assessments of consequence severity require specialist assistance,
we explained how to carry out an initial qualitative assessment in order to determine the
extent of quantification required. We then provided a basic overview of the quantitative
consequence assessments that can be conducted by trained specialists for hazardous
chemical releases, fire, explosions, toxicity and structural failure. We concluded the topic
with a brief discussion of how to conduct a sensitivity analysis of project risks.
EXERCISES
4.1 QUALITATIVE SEVERITY LEVEL ASSESSMENT
For the following risk scenarios, ascribe a qualitative severity level to each consequence
using the sample consequence table shown in Table 4.3. Give reasons for the ranking
selected.
a) A small leak of chlorine gas occurs from the storage facility at a swimming pool
complex, resulting in a concentration that can cause coughing and distress. A group of
primary school children is visiting the complex for swimming lessons.
b) A leak occurs during transport of LPG in a bulk tanker on a section of highway. The
gas ignites and the jet flame starts to impinge on the vessel. The driver notices it and
stops the vehicle, then stops all the traffic at some distance from the tanker. The vessel
ultimately fails resulting in a BLEVE.
c) A new rail link is constructed under a BOOT scheme (Build, Own, Operate, Transfer)
between two airport terminals and a suburban train station that leads to the city. The
company must forecast a certain passenger volume and revenue prior to undertaking
the project. If the passenger volumes are not met, the company may face a financial
risk.
d) A large water storage dam has outlet pipes that feed a water filtration plant. The
motorised isolation valves on the pipes are of an old design and cannot be closed
during flow. To close the valve, the pressures between the parallel pipelines have to be
balanced by opening a balance valve, and then the required valve may be closed.
Should a failure occur on the pipeline, there would be uncontrolled flow from the dam,
and it may take several days before the flow can be stopped by blocking the inlet to the
pipe on the dam side.
e) In high temperature ore smelters, accretions build up on the furnace walls and from
time to time, small quantities of explosives are used to break up the accretions. Care
must be taken to ensure that premature initiation of the explosive does not occur until
the operator has time to move away from the location. Should an accident occur, the
amount of explosive in the charge can generate a blast overpressure of 10 kPa near
where the operator is standing (see Table 4.8).
4.2 IDENTIFICATION OF INFORMATION REQUIREMENTS
For the following risk scenarios, list the information you would need to gather to enable an
external specialist to undertake a quantitative hazard consequence analysis. The types of
hazards for (a) to (f) are given in the suggested answer to Exercise 1.1 in Topic 1.
a) Storage of chlorine gas for public swimming pool disinfection.
b) Delivery of LP gas from bulk tanker to suburban automotive retail outlet.
c) Handling heavy items by crane for construction of a high-rise building.
d) Movement of large oil tankers carrying crude oil supply to a marine terminal.
e) Material defect identified in a cross-country high-pressure natural gas pipeline.
f) Software development for inventory management in a large retail store.
REFERENCES AND FURTHER READING
Publications
American Conference of Governmental Industrial Hygienists (2003) On-Site Emergency
Response Planning Guide.
Australian National Committee on Large Dams (ANCOLD) (2000) Guidelines on
Assessment of the Consequences of Dam Failure.
Australian National Committee on Large Dams (ANCOLD) (2003) Guidelines on Risk
Assessment.
API (2000) API RP520 Design and Installation of Pressure-Relieving Systems in
Refineries: Part 1—Sizing and Selection, 7th edn, American Petroleum Institute,
Washington, DC.
Atallah, S. & Allan, D.S. (1971) 'Safe separation distances from liquid fuel fires', Fire
Technology, 7(1):47.
Bennett, J.F. et al. (1990) Shell Offshore Flare Impingement Protection Programme:
Part 3—Performance of Charkel Type III Coated Specimens, Shell Research Limited,
Thornton Research Centre.
CCPS (1999) Guidelines for Chemical Process Quantitative Risk Analysis, Center for
Chemical Process Safety, American Institution of Chemical Engineers, New York.
Considine, M. (1984) Thermal Radiation Hazard Ranges from Large Hydrocarbon Pool
Fires, SRD, UK.
Cox, A.W., Lees, F.P. & Ang, M.L. (1990) Classification of Hazardous Locations,
IChemE, Rugby, UK.
Crocker, W.P. & Napier, D.H. (1986) 'Thermal radiation hazards of liquid pool fires and
tank fires', Hazards X, Hazards in the Process Industries, IChemE Symposium series
No. 97: 159–183.
Crossthwaite, P.J. (1984) 'HSE's approach to the control of developments near to notifiable
LPG installations', in Petts, J.I. (ed.) Major Hazard Installations: Planning and
Assessment, Seminar at the Department of Chemical Engineering, Loughborough
University of Technology.
Department of Planning, NSW (1997a) Hazardous Industry Planning Advisory Paper
No. 4: Risk Criteria for Land Use Safety Planning. NSW Department of Planning,
Sydney.
Department of Planning, NSW (1997b) Hazardous Industry Planning Advisory Paper
No. 6: Guidelines for Hazard Analysis, NSW Department of Planning, Sydney.
Fauske, H.K. & Epstein, E. (1988) 'Source term considerations in connection with chemical
accidents and vapour cloud modelling', Journal of Loss Prevention in the Process
Industries, volume 1.
Flanagan, R. & Norman, G. (1993) Risk Management and Construction, Blackwell
Scientific Publications, Oxford, England.
Health and Safety Executive (HSE) (1978) Canvey: An Investigation of Potential Hazards
from Operations in the Canvey Island/Thurrock Area, HMSO, London.
Husted, J. & Sonju, O.K. (1985) 'Radiation and size scaling of large gas and gas-oil
diffusion flames', 10th International Colloquium on Dynamics of Explosion and
Reactive Systems, Berkeley, CA.
IChemE (1994) Major Hazards Monograph: Explosions in the Process Industries, IChemE
Major hazards monograph, 2nd edn, A Report of the Major Hazards Assessment Panel,
Overpressure Working Party, 1994, Institution of Chemical Engineers, Rugby.
Jackson, S.D.F. & Fell, R. (1993) 'A risk based approach to the characterisation of mine
waste rock embankments', in R.E. Melchers & M.G. Stewart (eds), Probabilistic Risk
and Hazard Assessment, A.A. Balkema, Rotterdam: 95–109.
Lees, F.P. (ed.) (1996) Loss Prevention in the Process Industries: Hazard Identification,
Assessment and Control, 2nd edn, Butterworth-Heinemann, Oxford.
National Occupational Health & Safety Commission (1995) Exposure Standards for
Atmospheric Contaminants in the Occupational Environment, Guidance Note
[NOHSC:3008 (1995)] and National Exposure Standards:[NOHSC:1003 (1995)]
National Occupational Health & Safety Commission (1996) National Standard for the
Control of Major Hazard Facilities [NOHSC:1014 (1996)]
Paté-Cornell, M.E. (1993) 'Learning from the Piper Alpha accident: A postmortem analysis
of technical and organizational factors', Risk Analysis, 13(2): 215–231.
Standards Australia (1997) AS 2885.1–1997 Pipelines—Gas and Liquid Petroleum—
Design and Construction, Standards Australia, Sydney.
Standards Australia (1997) AS/NZS 4452:1997 The Storage and Handling of Toxic
Substances, Standards Australia/Standards New Zealand, Sydney.
Standards Australia (1998) AS/NZS 3931:1998 Risk Analysis of Technological Systems—
Application Guide, Standards Australia/Standards New Zealand, Sydney.
Standards Australia (2004) AS 1940–2004 The Storage and Handling of Flammable and
Combustible Liquids, Standards Australia, Sydney.
Standards Australia (2004) AS/NZS 4360:2004 Risk Management, Standards Australia/
Standards New Zealand, Sydney.
Standards Australia (2004) HB 436:2004 Risk Management Guidelines: Companion to
AS/NZS 4360:2004, Standards Australia/Standards New Zealand, Sydney.
TNO (1996) Methods for the Calculation of the Physical Effects of the Escape of
Dangerous Material, TNO Institute of Environmental and Energy Technology,
Apeldoorn, The Netherlands (known as 'The Yellow Book').
United States Department of Defense (2004) DOD Ammunition and Explosives Safety
Standards, DoD 6055.9-STD, October 5, US Department of Defense, Washington DC,
http://www.dtic.mil/whs/directives/corres/pdf/p60559std_100504/p60559s.pdf,
accessed 29 September 2006.
United States Department of Energy Quality Managers (2000) Software Risk Management:
A Practical Guide, US Department of Energy, available at:
http://cio.energy.gov/documents/sqas21_01.doc, accessed 13 December 2006.
United States Environmental Protection Agency, Chemical Emergency Preparedness and
Prevention Office (1999) Risk Management Program Guidance for Offsite
Consequence Analysis, http://www.epa.gov/ceppo, accessed 28 September 2006.
Williamson, B.R. & Mann, L.R.B. (1981) 'Thermal hazards from propane (LPG) fire balls',
Combustion Science Technology, 25: 141.
Websites
Standards Australia http://www.standards.com.au
http://www.riskmanagement.com.au
BSI British Standards http://www.bsi-global.com
BFRL—CFAST software http://fast.nist.gov
DNV Software—PHAST http://www.dnv.com/software/all/phast/productInfo.asp
Fire Modelling & Computing—FireWind http://members.optusnet.com.au/~firecomp
International Standards Organization http://www.iso.org/iso/en/ISOOnline.frontpage
Shell Global Solutions—FRED software http://www.shell.com/static/globalsolutions-en/downloads/services_and_technologies/business_consultancy/hse/cts_bc_hse_fred.pdf
TNO—EFFECTS and DAMAGE software http://www.tno.nl/bouw_en_ondergrond/producten_en_diensten/software/industriele_veiligheid/index.xml
UK Health and Safety Executive http://www.hse.gov.uk
US Defense Technical Information Centre http://www.dtic.mil
US Environmental Protection Authority http://www.epa.gov
SUGGESTED ANSWERS
EXERCISES
4.1 Qualitative severity level assessment
a) The impact level is described as producing coughing and distress. Since children are
present, if they are exposed, there is potential for serious injury, not simply distress.
Therefore, the severity level from Table 4.3 is Level 3.
b) Thanks to the timely action of the driver in stopping all traffic, there is unlikely to be a
fatality. If the BLEVE had occurred without this action, the driver and other motorists
nearby would have been fatally injured. From Table 4.3, this is a Level 5 incident.
c) It is difficult to rank this incident without having some information on the extent of
revenue loss that may occur if expected passenger volumes are not achieved. If it is of
the order of Level 4 or 5 per year (losses in the millions of dollars), the viability of the
operation is threatened.
d) Uncontrolled flow of water from a large dam not only causes environmental damage
downstream due to flooding, but also results in loss of water supply from the dam.
Alternative supplies have to be found and the cost of transportation is very high. The
cost of this event would be in tens of millions of dollars, and hence it is a Level 5
incident.
e) From Table 4.8, the impact of a 10 kPa explosion overpressure would not result in
serious injury unless the operator is hit by flying debris. Since there is insufficient information
available, we can conservatively assess this to be a Level 3 lost time injury rather than a
Level 2 medically treated injury.
4.2 Identification of information requirements
a) Quantity of chlorine, method of storage, location of storage, ventilation rate of storage
room, size of connections from the storage to chlorination point, location of chlorine
detector, whether chlorine alarm can be heard at all locations in the facility, response
procedures to an alarm, and the pressure of chlorine in storage and physical properties
of chlorine.
b) The amount of LPG carried by the tanker, the fittings and connections in the tanker, the
size of hose, the emergency isolation valves on the tanker and how they are operated,
ignition sources near the unloading area, the pressure of LPG in the tanker and physical
properties of LPG.
c) Size and capacity of the crane, height of lift, operating load as a percentage of total
load capacity of the crane, operating envelope with respect to the operating load, type
of rigging, method of securing to load during lifting, communication procedures
between crane driver and dogman, area to be cleared of people during lift, potential for
the load to swing, wind conditions.
d) Volume of crude oil carried by tankers, physical properties of crude oil, tanker speed,
whether or not it is being piloted, other users of the waterway, weather conditions, leak
detection method, spill response procedures.
e) Diameter of pipeline, wall thickness, maximum allowable operating pressure of
pipeline, physical properties of natural gas, length of pipeline, operating pressure in the
pipeline, soil conditions, location of nearest valve stations, leak detection mechanism,
response to leak alarms, time for isolation, sensitive land uses along pipeline route such
as population centres, river crossings, etc.
f) Purpose of software, details of specification of software, validation and testing
methods, complexity and user friendliness, software system architecture, error
diagnostic potential, backup/recovery systems.
TOPIC 5
ESTIMATING EVENT LIKELIHOOD AND MEASURING AND RANKING RISK
Preview 5.1 Introduction 5.1 Objectives 5.1 Required reading 5.1
Probability and frequency 5.2
Qualitative estimation of likelihood 5.3
Estimation of likelihood using statistical data 5.3 Failure rates 5.4 Sources of failure rate data 5.4 Typical failure rate data 5.6 Adjusting for the effects of safety and maintenance management systems 5.8 Human reliability analysis (HRA) 5.9 Calculating event frequency from historical data 5.12 Probability distributions 5.14 Reliability and availability 5.21 Screening reliability data 5.25
Estimation of likelihood using analytical techniques 5.28 Fault tree analysis 5.28 Event tree analysis 5.29 Cause–consequence analysis 5.31
Risk measurement and ranking 5.32 Qualitative risk matrix approach 5.33 Approaches for risk to people 5.34 Approaches for risk to projects 5.39
Summary 5.42
Exercises 5.42
References and further reading 5.44
Appendix 5.1 5.48
Readings
Suggested answers
PREVIEW
INTRODUCTION
In this topic we examine the fourth and fifth steps of the risk management framework:
estimating the likelihood of a loss event occurring and measuring and ranking the overall
level of risk.
There are two dimensions that need to be taken into account in likelihood estimates: event
probability and event frequency. We therefore begin this topic with a discussion of the
distinction between these dimensions.
We then discuss three basic approaches to estimating the likelihood of loss events:
1. A simple qualitative approach that can be used before undertaking a detailed estimation
to help decide which of the two quantitative approaches is most appropriate to a given
scenario.
2. A quantitative approach using statistical data to estimate the likelihood of loss events
caused by single failures. This is sometimes called the 'historical approach' or the
'actuarial method' and is often used in the insurance industry. It is also used by
organisations to estimate the likelihood of low consequence/high frequency and
medium consequence/medium frequency loss events such as workplace injuries, short
production interruptions caused by equipment breakdowns and non-conformance in a
quality assurance system.
3. A quantitative approach using analytical techniques such as a fault tree analysis, an
event tree analysis or a cause–consequence analysis to estimate the likelihood of loss
events caused by multiple failures, by breaking them down into their contributing
causes. This approach is commonly used for high consequence/low frequency loss
events such as major fires or explosions, structural collapses or dam failures because
the infrequency of such events means that limited statistical data is available and
circumstances and contributing factors are generally complex and change between
event occurrences (e.g. new designs, management systems and operations and
maintenance philosophies).
Once the likelihood of a loss event has been estimated, the overall level of risk can be
measured by combining the consequence severity estimate with the likelihood estimate.
The results can then be ranked according to magnitude of risk. We will therefore conclude
the topic by discussing a range of techniques for measuring and ranking risk.
OBJECTIVES
After studying this topic you should be able to:
distinguish between probability and frequency
conduct simple qualitative assessments of likelihood for initial screening
estimate event frequency using statistical data
estimate event probability and assess the level of uncertainty in the result
construct simple fault trees and event trees
measure and rank risks to people and projects using appropriate methods.
REQUIRED READING
Reading 5.1 'Fault trees'
PROBABILITY AND FREQUENCY
The estimation of event likelihood involves consideration of event probability and event
frequency. The terms probability and frequency are often used interchangeably in risk
management. This is technically incorrect, as the following definitions show.
Definition—Probability 'A measure of the chance of occurrence expressed as a number between 0 and 1'
(AS/NZS 4360:2004).
Probabilities are generally used to measure the reliability of protection systems, or the
reliability of the barriers against realisation of a hazard.
Example 5.1
The probability of a firewater pump failing to start on demand is 0.005. This means
that out of 1000 such demands on the fire pump to start, it could fail on 5 occasions.
Definition—Frequency 'A measure of the number of occurrences per unit of time' (AS/NZS 4360:2004).
Frequency has a time element associated with it. In risk assessments of facilities that have
several years of operating life, the timeframe is usually taken as one year, therefore the
frequency may be expressed as the number of occurrences per year.
Example 5.2
The frequency of a minor fire in a goods storage facility is 0.01 per year.
This may be interpreted in two different ways. Insurance companies will generally
interpret it to mean that out of every 100 similar facilities operating under similar
conditions around the world, a minor fire could occur in one of them in a given year.
However, a manager of a specific facility will generally interpret it to mean that there
is a 1% chance of a fire in that facility in a given year.
In risk management, both frequency and probability are important parameters. For instance:
Frequency of a major loss event =
Frequency of an initiating minor loss event x Probability the event was not contained.
Example 5.3
A facility is equipped with a fire protection system, and a firewater pump is installed
to supply the sprinkler system. The frequency of a minor fire is 0.01 per year (p.a.)
and the probability of the firewater pump failing to start on demand is 0.005.
If a fire occurs and the firewater pump fails, there would be delay in mobilising other
fire fighting measures and the minor fire could escalate to a major fire. Thus:
Frequency of a major fire = Frequency of a minor fire x Probability of firewater pump failing to start on demand
= 0.01 p.a. x 0.005 = 5 x 10⁻⁵ p.a.
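The escalation calculation in Example 5.3 can be reproduced in a few lines of code, using the same figures as the example:

```python
def major_event_frequency(minor_freq_pa, containment_failure_prob):
    """Frequency of a major loss event = frequency of the initiating minor
    event (per annum) x probability the event is not contained."""
    return minor_freq_pa * containment_failure_prob

minor_fire_freq = 0.01   # minor fires per annum (a frequency, carries units)
pump_fail_prob = 0.005   # firewater pump fails to start on demand (dimensionless)

major_fire_freq = major_event_frequency(minor_fire_freq, pump_fail_prob)
print(f"{major_fire_freq:.0e} p.a.")  # prints 5e-05 p.a.
```

Note how the per-annum unit travels with the frequency value while the probability stays dimensionless, which is exactly the labelling discipline recommended in the text.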
Note that the frequency value has a unit attached to it (p.a.) whilst the probability
value is dimensionless. It is good practice to always label the unit of the frequency
value to prevent confusing the two parameters in numerical manipulations.
QUALITATIVE ESTIMATION OF LIKELIHOOD
Before undertaking a detailed quantification of the likelihood of a loss event occurring, it is
helpful to carry out a quick qualitative assessment to give you a feel for whether you should
consider using a statistical quantitative approach or an analytical quantitative approach.
A useful qualitative grading system for event likelihood is shown in Table 5.1.
Table 5.1: Qualitative measures of likelihood

Level  Descriptor       Explanation
A      Almost certain   Chance of the event occurring multiple times in a year, say weekly to monthly.
B      Likely           Chance of the event occurring once in a year.
C      Possible         Chance of the event occurring once in 10 years.
D      Unlikely         Very low chance of the event occurring, say once in 100 years.
E      Rare             Possible, but improbable event, say once in 1000 years.
If you assess that a particular loss event is either almost certain, likely or possible, there is a
reasonable chance that reliable statistical data may be available that will assist you in
quantifying the likelihood in more detail. However, if you assess that a loss event is
unlikely or rare, there is little chance that reliable statistical data will be available which
means an analytical quantitative approach may be required.
Remember, a qualitative assessment should only be used for screening purposes and is not a
substitute for a detailed quantitative estimation of likelihood.
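The screening rule above can be captured as a small lookup. The indicative annual frequencies below are illustrative midpoints chosen for this sketch, not values taken from Table 5.1 itself:

```python
# Qualitative likelihood levels (Table 5.1) with indicative annual
# frequencies; the numeric values are assumptions for illustration only.
LIKELIHOOD_LEVELS = {
    "A": ("Almost certain", 12.0),    # multiple times a year, say monthly
    "B": ("Likely", 1.0),             # about once a year
    "C": ("Possible", 0.1),           # once in 10 years
    "D": ("Unlikely", 0.01),          # once in 100 years
    "E": ("Rare", 0.001),             # once in 1000 years
}

def screening_advice(level):
    """Apply the screening rule: levels A-C suggest statistical data may
    exist; D and E usually force an analytical approach."""
    if level in ("A", "B", "C"):
        return "statistical data likely available"
    return "analytical approach (e.g. fault tree) probably required"

for level, (descriptor, freq) in LIKELIHOOD_LEVELS.items():
    print(f"{level} {descriptor}: ~{freq}/yr -> {screening_advice(level)}")
```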
ESTIMATION OF LIKELIHOOD USING STATISTICAL DATA
A quantitative approach using statistical data is commonly employed to estimate the
likelihood of low consequence/high frequency and medium consequence/medium frequency
loss events caused by single failures. Since an operational system typically consists of
hardware, software and human operators, two different types of statistical data need to be
considered: statistical failure rates for hardware and software, and data on the probability of
human error.
In this section we examine how failure rate data and human reliability analysis are used to
calculate the likelihood of loss events. We also examine probability distributions in detail.
FAILURE RATES
The failure rate of an equipment item or component is defined as the number of failures per
unit of time. A failure rate is therefore a frequency value.
The failure rate of an equipment item or component is not constant. In the early 'run in'
stages of installation and operation, the failure rate could be higher due to installation errors
and commissioning problems. Once these are solved, the failure rate reduces and remains
relatively constant for the 'useful' operating life of the equipment, when it is subject to the
manufacturer's recommended maintenance routine. Finally, the equipment reaches the 'wear
out' stage when the failure rate increases due to wear and tear and the sheer age of the
equipment. A much higher level of repair and maintenance is required and eventually the
equipment must be replaced.
In general, failure rates reported in generic statistical databases refer to the useful operating
life period. These are 'mean' failure rates and are treated as the mean of a statistical
distribution. In some instances, a lower bound and an upper bound value of the distribution
may also be provided.
Failure rates are normally expressed as number of failures per million hours. The hours can
be calendar hours or operating hours. Since risk is often expressed on a 'per year' basis for
decision making purposes, the failure rate per million hours can be converted to a per year
basis for calculation purposes.
Example 5.4
The failure rate for critical failures of a compressor is 190 per million hours. The
compressor operates around the clock, except for scheduled maintenance periods.
The mean failure rate per annum is calculated as follows.
Failure rate = 190/10⁶ hours = 1.9 x 10⁻⁴/h
Hours/year (continuous operation) = 8760
Failure rate/year = 1.9 x 10⁻⁴ x 8760 = 1.66 p.a.
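The unit conversion in Example 5.4 generalises to any rate quoted per million hours; a minimal sketch, assuming round-the-clock operation as in the example:

```python
HOURS_PER_YEAR = 8760  # continuous (24 h/day) operation

def failures_per_annum(rate_per_million_hours):
    """Convert a failure rate quoted per million operating hours to a
    per-annum figure, assuming continuous operation."""
    return rate_per_million_hours / 1e6 * HOURS_PER_YEAR

print(round(failures_per_annum(190), 2))  # prints 1.66 (critical failures p.a.)
```

For equipment that runs fewer hours, replace 8760 with the actual annual operating hours before converting.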
SOURCES OF FAILURE RATE DATA
Failure data can be obtained from two principal sources:
in-house records
generic statistical databases.
In-house records
Data from a company's own operations records about a particular process or facility is the
most accurate data available. Data from other similar facilities within the same company is
not quite as accurate but is still better than data from generic sources because it reflects the
design, construction, operations and maintenance philosophies and practices of the
company. Such data is particularly valuable for reliably estimating the likelihood of high
consequence events such as fires and major equipment breakdown.
The difficulty with compiling in-house data is that long periods of operating time are
required to obtain statistically significant probability data for low frequency events, and for
the failure rate of reliable but infrequently used equipment. The collection of data must also
be stringently managed to ensure all incidents are recorded. This means that accurate
first-hand data is rarely available, so it is generally necessary to draw upon generic
databases.
Generic statistical databases
A list of generic statistical data sources is provided in Appendix 5.1. The Norwegian
University of Science and Technology ROSS website http://www.ntnu.no/ross/info/data.php
is also a useful source of information.
For most populations of equipment items upon which generic estimates are based, the
number of failures is insufficient to determine the variation of failure rate with time. Given
the accuracy limits of the basic data, it is usually assumed that the failure rate (λ) is
constant. Under this assumption, an item operating at time t will fail in a subsequent
interval δt with probability λδt that is independent of t.
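The constant-failure-rate assumption corresponds to an exponential time-to-failure distribution, whose memoryless property makes the chance of failing in the next small interval the same regardless of how long the item has already survived. A brief numerical check, with λ chosen arbitrarily for illustration:

```python
import math

lam = 2e-4  # assumed constant failure rate, failures per hour (arbitrary)

def survival(t):
    """Probability the item is still working at time t (exponential model)."""
    return math.exp(-lam * t)

def fail_in_next(t, dt):
    """Probability of failure in (t, t + dt], given survival to time t.
    Memorylessness makes this independent of t."""
    return 1.0 - survival(dt)

# For a small interval the conditional probability approaches lam * dt,
# no matter how long the item has already been operating:
for t in (0.0, 1000.0, 50000.0):
    print(t, fail_in_next(t, 1.0))  # same value (~lam) at every t
```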
The failure rates quoted in generic databases generally include an upper and lower bound
on the failure rate. In most cases this estimate interval is due to the statistical sampling
uncertainty and is calculated assuming a constant failure rate. The more failures observed,
the narrower this uncertainty.
These estimate intervals usually do not indicate the likely spread of failure rates within one
industry, let alone between different industries. Thus it can be expected that different
estimate intervals for the 'same' item of equipment may not always overlap, and experience
at a particular site need not fall within the quoted interval. The uncertainty interval does not
indicate the possible range of expected failure rates for a component in a particular
application. A better indication of this is given by the range of failure rates for similar
components from a number of sources. However, because of the varying operating
conditions of components from different populations, some judgment of the suitability of
each source is required.
Various United States military references quote base failure rate values for most electronic
equipment, together with scaling factors to take account of the most significant factors
affecting these rates (e.g. operating temperature). The same level of precision is not
possible for engineering equipment, and scaling factors for particular operating conditions
are not readily available. However, usage patterns and operating environment affect the
reliability of engineering equipment more than they affect that of electronic equipment.
The following points are the major factors to consider when selecting an estimate for a
specific item of engineering equipment.
Equipment failure rates are specific to the mode of failure. For example, the rate at
which a valve fails to open may be substantially different to the rate at which the same
valve fails to close. The definition of the failure mode should therefore be identified
wherever possible.
Many generic estimates are based on all modes of failure, which in practice means all
failures reported in the maintenance history. However, in a particular application only
one mode may be relevant. For example, failure rates for a compressor that include
the drive unit, gearbox, compression unit, lubrication system and cooling system
obviously differ from those that include only the compression unit. Thus the estimates
may be re-scaled by an assessed ratio of the mode of concern to the all-mode estimate.
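The re-scaling of an all-modes estimate described above can be sketched as follows. This is a hypothetical illustration: the valve failure-mode fractions and the all-modes rate are assumed values, not source data.

```python
# Hypothetical illustration: re-scaling an all-modes generic failure rate
# to the single failure mode of concern. All numbers below are assumptions.

all_modes_rate = 12.0  # failures per 10^6 h, all reported failure modes (assumed)

# Assessed fraction of maintenance-history failures attributable to each mode (assumed)
mode_fractions = {
    "fails to open": 0.15,
    "fails to close": 0.05,
    "external leak": 0.80,
}

# Re-scale the generic estimate by the assessed ratio of each mode to the all-mode total
mode_rates = {mode: all_modes_rate * f for mode, f in mode_fractions.items()}

for mode, rate in mode_rates.items():
    print(f"{mode}: {rate:.2f} failures per 10^6 h")
```

The single-mode rates sum back to the all-modes rate, which is a useful sanity check on the assessed fractions.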
TOPIC 5: ESTIMATING EVENT LIKELIHOOD AND MEASURING AND RANKING RISK (UNIT 415 ENGINEERING RISK MANAGEMENT)
Within a given class of equipment, different equipment types will have different failure
rates. For example, a gear pump, a centrifugal pump and a positive displacement pump
will all have different failure rates. It is therefore essential to find the specific failure
rate for a particular type of equipment.
Site knowledge should be taken into account, particularly where there is an interest in
the relative reliability of items of equipment that have been in use for some time. For
example, even though generic data would indicate that equipment type A is more
reliable than equipment type B, it may happen that on a particular site type B performs
better than A because of the way it is used.
The internal and external environment can have a significant effect on equipment
reliability. For example, electric motor burn-out is mainly due to excessive
temperature of the windings. Winding temperature is influenced by the ambient
temperature, motor load, dust and use of protective sensors. It is therefore necessary to
consider the extent to which a specific environment differs from that of the generic data
source. When selecting estimates to use, consider factors such as:
the nature of substances handled (e.g. acids will cause corrosion)
internal temperature, pressure, vibration
external humidity, atmospheric salts, sunlight, moisture, cold, heat, vibration,
altitude, dust
design limits and margins.
The level of operation significantly influences equipment reliability. Equipment lightly
loaded can be expected to fail less often than equipment heavily loaded, and continuous
operation under uniform conditions is usually less arduous than repeated stops and
starts. Equipment operated on standby or only in an emergency will generally have
poorer reliability than similar equipment operated more regularly. It may be more
useful to quote failure rates of such equipment on a per cycle basis or as a fail-to-start
percentage.
TYPICAL FAILURE RATE DATA
Indicative failure rates for a range of equipment items are presented in Table 5.2 on the
following page. This data is provided for illustrative purposes to demonstrate the
differences in failure rate between different equipment items and to provide an approximate
guide to their magnitude.
A typical data sheet of reliability data is shown in Figure 5.1.
Table 5.2: Typical generic failure rate data

1. Piping, leaks/m/yr (Cox et al., 1990); A = cross-sectional area of pipe
   Diameter (mm)   Rupture (A)   Major (0.1A)   Minor (0.01A)
   25              1 x 10–6      1 x 10–5       1 x 10–4
   50              1 x 10–6      1 x 10–5       1 x 10–4
   100             3 x 10–7      6 x 10–7       3 x 10–5
   300             1 x 10–7      3 x 10–6       1 x 10–5

2. Pumps, leaks/m/yr (Cox et al., 1990); A = cross-sectional area of pump connection
   Rupture (A): 3 x 10–5   Major (0.1A): 3 x 10–4   Minor (0.01A): 3 x 10–3

3. Flanges: major failure 5 x 10–6 p.a. per flange connection (Blything & Reeves, 1988)

4. Non-return valves: failure rate 3 x 10–7/h to 4.2 x 10–5/h (Blything & Reeves, 1988)

5. Excess flow valves: failure probability on demand 0.13 (Blything & Reeves, 1988)

6. Remote shutdown valves: failure probability on demand 0.001 to 0.005 (Blything & Reeves, 1988)

7. Pressure vessels, failure frequency per yr (Pape & Nussey, 1985)
   Instantaneous: 1 x 10–6 to 3 x 10–6   25–50 mm hole: 6 x 10–6   6–13 mm hole: 30 x 10–6

8. Pneumatic transmitters, failure frequency per 10^6 hrs (CCPS, 1989a)
   Level: 2.32 to 141.0   Flow: 1.93 to 109.0   Pressure: 0.159 to 91.3
   Differential pressure: 1.01 to 218.0   Temperature: 1.68 to 97.0

9. Electric switches, failure frequency per 10^6 hrs (CCPS, 1989a)
   Flow: 0.917 to 26.8   Level: 0.737 to 1.74   Pressure: 0.525 to 49.6   Temperature: 0.102 to 2.28

10. Pneumatic switches, failure frequency per 10^6 hrs (CCPS, 1989a)
    Level: 0.0972 to 0.62   Pressure: 2.18 to 5.20   Temperature: 1.09 to 5.00

11. Flame detectors: failure frequency per 10^6 hrs 0.053 to 1760.0 (CCPS, 1989a)

12. Annunciators: failure frequency per 10^6 hrs 0.0272 to 0.77 (CCPS, 1989a)
Figure 5.1: Typical data sheet of reliability data

[Data sheet, "Data on selected process systems and equipment": Equipment Description: FLAME DETECTORS; Taxonomy No. 2.1.5; Process Severity: UNKNOWN; Operating Mode, No. of Demands, Calendar time and Operating time not stated. Failure modes are classed as Catastrophic (a. functional without signal; b. failed to function when signalled), Degraded (a. functioned at improper signal level; b. intermittent operation) and Incipient (a. in-service problems). For the catastrophic mode the sheet quotes failures per 10^6 hrs of 0.053 (lower), 432.0 (mean) and 1760.0 (upper). An equipment boundary diagram shows power supply, sensor, computational unit and indicator/alarm inside the boundary. Reference No. (Table 5.1): 1, 4.]

Source: CCPS, 1989a.
ADJUSTING FOR THE EFFECTS OF SAFETY AND MAINTENANCE MANAGEMENT SYSTEMS
Generic industry data is normally based on statistical data of equipment failures in similar
or allied industries. Therefore, in using generic data the analyst assumes (or implies) that
the facility's equipment and systems are maintained at standards equivalent to the industry
average. This may not be the case. If a facility's safety and maintenance management
systems are significantly inferior or superior to the industry's average, the failure rate of
equipment may be up to orders of magnitude lower or higher than the generic rate. Any
assessment of the risks at a facility must therefore include an assessment of how processes
and equipment are operated and maintained at that facility.

There has been much discussion amongst regulatory authorities about whether it is possible
to apply some numerical factor to the 'average' data to allow for non-average quality of
safety management. The Health and Safety Executive in the UK (HSE, 1990) argues that if
such an approach is used, it should be done only within narrow limits. A large adjustment
to reduce the generic failure rate for an above-average safety management system could
well be optimistic given the possibility of changes over the years; conversely, a large
adjustment to increase the generic failure rates for a below-average safety management
system would seem to imply that a below-average level of safety is tolerable, which is not
the case.

An attempt has been made to develop a method that accounts for the influence of safety
management systems on the frequency of loss events (Murphy and Paté-Cornell, 1996).
The approach entails undertaking a safety management audit of a facility and using the
results to derive a numerical factor to be used for the adjustment of failure frequencies. As
a guide, generic frequencies could be reduced by a factor of up to three for superior safety
management (best practice situation) or increased by a factor of ten for poor safety
management. The validity of this method is yet to be proven and standard practice is to use
industry average failure rate data from the generic databases.

Many industries undertake a reliability centred maintenance (RCM) program to optimise the
maintenance requirements. This is a powerful risk management tool and is discussed in
Topic 7.
HUMAN RELIABILITY ANALYSIS (HRA)
An operational system typically consists of hardware, software and human operators.
Analysing the failure rates of hardware and software therefore tells us only part of what we
need to know to estimate loss event likelihood: to complete the picture we also need to
analyse the probability of human error.

A human error is an action that fails to meet some of the limits of acceptability as defined
for a system. The action may be physical (e.g. closing a valve) or cognitive (e.g. fault
diagnosis or decision making). Human errors have been classified into the following
categories (HSC, 1991).
a) Skill-based errors that arise during the execution of a well-learned, fairly routine task
such as calibration, testing, or responding to an alarm.
b) Rule-based errors that occur when a set of operating instructions or rules to guide a
sequence of actions are either not followed, misunderstood, or a wrong sequence is
used, for example not following the startup/shutdown procedures.
c) Knowledge-based errors that arise when a decision has to be made between alternative
plans of action, for example deciding in an emergency whether to shutdown or continue
to operate, and whether to evacuate or try to fight a fire.

Human reliability analysis (HRA) is concerned with the qualitative and quantitative analysis
of human error to facilitate the design of systems with greater error-tolerance. However,
predicting human error is complex and the accuracy and validity of HRA methods has often
been criticised from both theoretical and practical viewpoints (HSC, 1991). To date, there
has been limited application of HRA beyond the nuclear industry, the aerospace industry
and the defence forces.
The most common HRA methods are shown in Table 5.3.
Table 5.3: Human reliability analysis methods

Method    Feature    Reference
THERP (Technique for Human Error Rate Prediction)    Tables of task error probabilities as a generic database    Swain & Guttmann (1983)
HCR (Human Cognitive Reliability model)    Time-related analysis    Moieni et al. (1994)
HEART (Human Error Analysis and Reliability Technique)    Based on performance shaping factors (PSF)    Williams (1986)
INTENT    Based on performance shaping factors (PSF)    Gertman et al. (1992)
In addition to these methods, Yu et al. (1999) have suggested a complementary method
called Human Error Criticality Analysis (HECA). HECA is similar to FMECA for
hardware systems, and is used to identify critical human tasks that have a high error
probability or severe consequences. It is important to remember that not all human errors
will result in severe consequences because recovery is possible in some instances. HECA
enables attention to be focused on critical tasks only.
When assessing the contribution of human error to a potential loss event, two distinct stages
in the event sequence should be considered: pre-event and post-event. During both stages,
the probability that a human error will result in a loss event is dependent on various factors
that affect performance in the operators' environment. These are commonly referred to as
performance shaping factors (PSF) (Swain and Guttmann, 1983) and the most important of
these are:
critical equipment control design
training of operators
communication and procedures
instrumentation feedback and design
preparedness (expected frequency of situation)
stress.
A set of general guidelines for estimating the probability of operator error for various
situations, both pre-event and post-event, is listed in Table 5.4.
Once a loss event sequence has started, the most important variable is the time the operators
have to detect and correct errors before a serious condition results. The more time they
have, the more likely they are to be able to detect and diagnose the problem, decide on a
course of action, and implement the desired response. Figure 5.2 provides a general guide
to the probability of operator error as a function of time available for action (CCPS, 1989b).
Table 5.4: General estimates of probability of human error

Estimated error probability    Activity
0.001 Selection of a switch dissimilar in shape or location to the desired switch, assuming no decision error, e.g. operator actuates large-handled switch rather than small switch.
0.003 General human error of commission, e.g. misreading label and therefore selecting wrong switch.
0.01 General human error of omission where there is no display in the control room of the status of the item omitted, e.g. failure to return manually-operated test valve to proper configuration after maintenance.
0.003 Errors of omission, where the items being omitted are embedded in a procedure rather than at the end as above.
1.0 If an operator fails to operate correctly one of two close coupled valves or switches in a procedural step, he also fails to correctly operate the other valve.
0.1 Personnel on different work shift fail to check condition of hardware unless required by checklist or written directive.
0.5 Monitor fails to detect undesired position of valves etc. during general walk-around inspections, assuming no checklist is used.
0.2—0.3 General error rate given very high stress levels where dangerous activities are occurring rapidly.
2^(n–1)x Given severe time stress, as in trying to compensate for an error made in an emergency situation, the initial error rate x for an activity doubles for each attempt n after a previous incorrect attempt, until the limiting condition of an error rate of 1.0 is reached or until time runs out.
1.0 Operator fails to act correctly in the first 60 seconds after the onset of an extremely high stress condition, e.g. loss of coolant in a nuclear reactor.
0.9 Operator fails to act correctly after the first five minutes after the onset of an extremely high stress condition.
0.1 Operator fails to act correctly after the first 30 minutes after the onset of an extremely high stress condition.
0.01 Operator fails to act correctly after the first several hours in a high stress condition.
Source: Health and Safety Commission (HSC), 1991: 88–89.
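The time-dependent entries at the bottom of Table 5.4 can be turned into a rough time-reliability curve by interpolating log-linearly in both axes, which is essentially what Figure 5.2 shows graphically. A minimal sketch; treating "several hours" as 3 hours (180 minutes) is an assumption made here purely for illustration:

```python
import math

# Anchor points taken from the time-based rows of Table 5.4 (HSC, 1991):
# (time available in minutes, probability of operator error).
# The 180-minute entry for "several hours" is an assumed value.
anchors = [(1.0, 1.0), (5.0, 0.9), (30.0, 0.1), (180.0, 0.01)]

def hep(t_minutes):
    """Human error probability for a given time available, log-log interpolated."""
    if t_minutes <= anchors[0][0]:
        return anchors[0][1]
    if t_minutes >= anchors[-1][0]:
        return anchors[-1][1]
    for (t0, p0), (t1, p1) in zip(anchors, anchors[1:]):
        if t0 <= t_minutes <= t1:
            frac = math.log(t_minutes / t0) / math.log(t1 / t0)
            return p0 * (p1 / p0) ** frac

print(f"HEP with 10 minutes available: {hep(10.0):.2f}")
```

As expected, the interpolated value at 10 minutes falls between the 5-minute (0.9) and 30-minute (0.1) anchors, consistent with the downward trend of Figure 5.2.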
Figure 5.2: Probability of failure by control room personnel to correctly diagnose an abnormal event

[Log-log plot: vertical axis, probability of failure, from 1E–5 to 1E+0; horizontal axis, time available (in minutes) for diagnosis of an abnormal event after control room annunciation, from 1 to 10 000.]

Source: CCPS, 1989b: 242.
CALCULATING EVENT FREQUENCY FROM HISTORICAL DATA
According to the CCPS Guidelines for Chemical Process Quantitative Risk Analysis
(2000), there are five steps required to calculate event frequency from historical data:
1. Define context.
2. Review source data.
3. Check data applicability.
4. Calculate event frequency.
5. Validate frequency.
These steps are described below using extracts from the CCPS Guidelines (CCPS, 2000:
300–301).
Step 1 Define context. The historical approach may be applied at any stage of a
design—conceptual, preliminary, or detailed development—or to an existing
facility. System description and hazard identification should be completed to
provide the details necessary to define the loss event list. These steps are
potentially iterative as the historical record is an important input to hazard
identification. The output of this step is a clear specification of the loss events
for which frequency estimates are sought.
Step 2 Review source data. The relevant source data should be reviewed for
completeness and independence. Lists of loss events will almost certainly be
incomplete and some judgment will have to be used. The historical period must
be of sufficient length to provide a statistically significant sample size.
Loss event frequencies derived from lists containing only one or two events of a
particular type will have large uncertainties. When multiple data sources are
used, duplicate events must be eliminated. Sometimes the data source will
provide details of the total plant or item exposure (plant-years, etc.). Where the
exposure is not available, it will have to be estimated from the total number and
age of plants in operation, the total number of vehicle-miles driven, etc.
Step 3 Check data applicability. The historical record may include data over long
periods of time (5 or more years). As the technology and scale of plant may
have changed in the period, careful review of the source data to confirm
applicability is important. It is a common mistake for designers to be
overconfident that relatively small design changes will greatly reduce failure
frequencies. In addition, larger-scale plants (those that employ new technology)
or special local environmental factors may introduce new hazards not apparent
in the historical record. It is commonly necessary to review event descriptions
and discard those failures not relevant to the plant and scenario under review.
Step 4 Calculate event frequency. When the data are confirmed as applicable and the
loss events and exposure are consistent, the historical frequency can be obtained
by dividing the number of incidents by the exposed population. For example, if
there have been five major leaks from pressurised ammonia tanks from a
population of 2500 vessel-years, the leak frequency can be estimated at 2 x 10–3
per vessel-year.
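The Step 4 arithmetic for the ammonia-tank example is simple but worth making explicit: the historical frequency is just incidents divided by exposure.

```python
# Step 4 sketch: historical event frequency = number of incidents / exposed population.
# Figures are those quoted in the CCPS ammonia-tank example.

incidents = 5        # major leaks observed
exposure = 2500.0    # vessel-years of exposure

leak_frequency = incidents / exposure  # per vessel-year
print(f"leak frequency = {leak_frequency:.1e} per vessel-year")
```

With only five events in the record, the resulting estimate carries the large uncertainty warned about in Step 2.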
Where the historical data and the plant under review are not totally consistent, it
is necessary to exercise judgment to increase or decrease the event frequency.
Where the data are not appropriate, an alternative method, such as fault tree
analysis, must be employed.
Step 5 Validate frequency. It is often possible to compare the calculated event
frequency with a known population of plant or equipment not used for data
generation. This is a useful check as it can highlight an obvious mistake or
indicate that some special feature has not received adequate treatment.
Example 5.5
The following example is taken from the CCPS Guidelines for Chemical Process
Quantitative Risk Analysis (CCPS, 2000: 300–303) and illustrates the estimation of
leakage frequencies for a gas pipeline. Note that values have been metricated.
Step 1 Define context. The objective is to determine the leakage frequency of a
proposed 200 mm diameter, 16 km long, high-pressure ethane pipe, to be
laid in a semi-urban area. The proposed pipeline will be seamless, coated,
and cathodically protected, and will incorporate current good design and
construction practices.
Step 2 Review source data. Three sources of data are available:
British Gas;
European Gas Pipelines Association; and
US Department of Transportation.
The database found to be the most complete and applicable is the gas
transmission leak report data collected by the US Department of
Transportation for the years 1970–1980. It is based on 400 000 pipe-km
of data, making it the largest such database. It contains details of failure
mode and design/construction information. Conveniently, it contains both
incident data and pipeline exposure information.
Step 3 Check data applicability. The database includes all major pipelines, of
mixed design specifications and ages. Thus, inappropriate pipelines and
certain non-relevant incidents must be rejected. The remaining
population exposure data are still extensive and statistically valid. Those
data rejected are:
Pipelines:
– pipelines that are not steel;
– pipelines installed before 1950; and
– pipelines that are not coated, not wrapped, or not cathodically
protected.
Incidents:
– incidents arising at a longitudinal weld;
– incidents where construction defects and materials failures
occurred in pipelines that were not hydrostatically tested.
Step 4 Calculate likelihood. The pipeline leakage frequencies are derived from
the remaining Department of Transportation data using the following
procedure:
1. Estimate the base failure rate for each failure mode (i.e. corrosion,
third party impact, etc.).
2. Modify the base failure rate, as described above, where necessary to
allow for other conditions specific to this pipeline. In particular, the
Department of Transportation failure frequency attributable to
external impact is found to be diameter dependent, and data
appropriate for a 200 mm pipeline should be used. As the pipeline is
to be built in a semi-urban area, the failure frequency for external
impact is judged to increase by a factor of 2 to reflect higher
frequency digging activities. Conversely, the semi-urban location is
expected to reduce the frequency of failure due to natural hazards,
because of the absence of river crossings, etc. The frequency of this
failure mode is judged to be reduced by a factor of 2.
Table 5.5 shows the application of Steps 3 and 4 to the raw frequency
data. The approximate distribution of leak size (full bore, 10% of
diameter, pinhole) by failure mode is then obtained from the database.
This distribution is used to predict the frequency of hole sizes likely from
the pipeline. Thus, if this distribution were 1, 10, and 89%, respectively,
the full bore leakage frequency for the 16 km pipeline would be:
0.01 x (0.413 leaks/1000 pipe km-years) x 16 km = 6.6 x 10–5 per year.
Table 5.5: Contribution of failure mechanisms to pipeline example

Failure frequency (per 1000 pipe km-years)*

Failure mode    Raw DOT data    Modified data (inappropriate data removed)    Modification factor (judgment)    Final values
Material defect    0.131    0.044    1.0    0.044
Corrosion    0.20    0.031    1.0    0.031
External impact    0.313    0.15    2.0    0.300
Natural hazard    0.219    0.013    0.5    0.006
Other causes    0.038    0.031    1.0    0.031
Total failure frequency    0.90    0.27    -    0.413

* This value is appropriate for a 200 mm pipe.
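The arithmetic behind Table 5.5 and the full-bore leakage estimate can be reproduced directly; the 1% full-bore split is the assumed hole-size distribution quoted in the text.

```python
# Sketch of Steps 3-4 of Example 5.5: applying judgment factors to the
# modified DOT data and scaling to the 16 km pipeline. Values per 1000 pipe km-years.

modified = {  # after removing inappropriate pipelines and incidents
    "material defect": 0.044,
    "corrosion": 0.031,
    "external impact": 0.15,
    "natural hazard": 0.013,
    "other causes": 0.031,
}
factors = {  # judgment factors for this semi-urban 200 mm line
    "material defect": 1.0,
    "corrosion": 1.0,
    "external impact": 2.0,  # more frequent digging activity
    "natural hazard": 0.5,   # no river crossings, etc.
    "other causes": 1.0,
}

total = sum(modified[m] * factors[m] for m in modified)  # leaks per 1000 km-yr
full_bore_fraction = 0.01   # 1% of leaks assumed full bore (from the quoted distribution)
length_km = 16.0

full_bore_freq = full_bore_fraction * (total / 1000.0) * length_km
print(f"total = {total:.3f} per 1000 km-yr; full bore = {full_bore_freq:.1e} per year")
```

The computed total (0.4125, which the text rounds to 0.413) and the full-bore frequency of 6.6 x 10–5 per year match Table 5.5 and the calculation in the text.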
Step 5 Validate likelihood. In the United Kingdom, the British Gas Corporation
reported 75 leaks on their transmission pipelines between 1969 and
1977, on a pipeline exposure of 134 400 km-years. This gives a final
leakage frequency of 0.556 per 1000 km-years, which is consistent with
the value given in Table 5.5.
PROBABILITY DISTRIBUTIONS
Until the mid-1970s, items were seen as exhibiting a standard failure profile consisting of
three separate characteristics:
an infant mortality period due to quality of product failures
a useful life period with only random stress-related failures
a wear-out period due to increasingly rapid conditional deterioration resulting from
use or environmental degradation.
This was referred to as the 'bathtub curve' and is shown in Figure 5.3.
The consequence of such beliefs was that equipment was taken out of service and
maintained at particular intervals, regardless of whether it was exhibiting signs of wear.
Figure 5.3: Bathtub failure curve

[Plot of failure rate against time, showing three regions: infant mortality, useful life and wear-out.]
However, actuarial studies of aircraft equipment failure data conducted in the early 1970s
identified a more complex relationship between age and the probability of failure (Smith,
1993). This is illustrated in Figure 5.4.
Figure 5.4: Failure rate curves

[Six failure-rate-versus-age patterns, labelled: wear-in then random; random over measurable life; increasing during wear-in and then random; steadily increasing; random then wear-out; and wear-in to random to wear-out. The percentages shown against the curves are 89%, 2%, 5%, 7%, 14% and 68%.]
The bathtub curve was discovered to be one of the least common failure patterns, and
periodic maintenance was shown to increase the likelihood of failure. This led to the idea
that the maintenance regime ought to be based on the reliability of the components and the
required level of availability of the system as a whole.
Weibull distribution
The three regions in any failure curve may be described by the Weibull distribution, which
has two parameters: η and β.
a) Failure density function:
f (t) =
βη
tη
⎛ ⎝ ⎜
⎞ ⎠ ⎟
β −1
exp −tη⎛ ⎝ ⎜
⎞ ⎠ ⎟
β⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
(5.1)
b) Mean:
μ = ηΓ 1+
1β
⎛ ⎝ ⎜
⎞ ⎠ ⎟ (5.2)
where Γ represents the Gamma function. c) Variance:
σ 2 = η2 Γ 1+2β
⎛ ⎝ ⎜
⎞ ⎠ ⎟ − Γ 1+
1β
⎛ ⎝ ⎜
⎞ ⎠ ⎟
⎡
⎣ ⎢
⎤
⎦ ⎥
2⎧ ⎨ ⎪
⎩ ⎪
⎫ ⎬ ⎪
⎭ ⎪ (5.3)
A three-parameter Weibull distribution is also available; it is more flexible and can fit a
wider range of data.
Gamma distribution

The Gamma distribution also has two parameters (a and b); it is similar to the Weibull
distribution but simpler to use.

a) Failure density function:

   f(t) = [1 / (b Γ(a))] (t/b)^(a–1) exp(–t/b)        (5.4)

b) Mean:

   μ = ba        (5.5)

c) Variance:

   σ² = b²a        (5.6)
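The moment formulas in equations (5.2), (5.3), (5.5) and (5.6) can be checked numerically with the standard library Gamma function. The parameter values below are arbitrary illustrations; note that β = 1 reduces the Weibull to the exponential case, so its mean should equal η.

```python
import math

# Numerical check of the Weibull moments (5.2)-(5.3) and Gamma moments (5.5)-(5.6).
# Parameter values are illustrative assumptions only.

def weibull_moments(eta, beta):
    mean = eta * math.gamma(1.0 + 1.0 / beta)
    var = eta ** 2 * (math.gamma(1.0 + 2.0 / beta) - math.gamma(1.0 + 1.0 / beta) ** 2)
    return mean, var

def gamma_moments(a, b):
    return b * a, b ** 2 * a

m_w, v_w = weibull_moments(eta=1000.0, beta=1.0)  # beta = 1: exponential special case
m_g, v_g = gamma_moments(a=2.0, b=500.0)
print(m_w, v_w, m_g, v_g)
```

For β = 1 the Weibull mean is η Γ(2) = η and the variance is η², matching the exponential distribution's mean 1/λ and variance 1/λ² with λ = 1/η.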
Negative exponential distribution
A risk assessment mainly concentrates on the 'useful life' region of the bathtub curve in
Figure 5.3, since a piece of equipment is likely to be replaced by the time it reaches the
'wear-out' region. Where this is not the case for an existing operation, the safety
management systems of the organisation should be improved with increased emphasis on
preventive maintenance.
During the 'useful life' period, the failure rate is constant. In other words, a failure could
occur randomly regardless of when a previous failure occurred (i.e. no previous memory).
This results in a negative exponential distribution for the failure frequency. Therefore, the
failure rates used in fault tree analysis are the means of negative exponential distributions
(Wells, 1991; Lees, 1996). Note that this treatment is simplistic in the sense that the data
sources for the failure rates may also contain failures from the 'infant mortality' region and
the 'wear-out' region.
a) Failure density function:

   f(t) = λ exp(–λt)        (5.7)

b) Mean:

   μ = 1/λ        (5.8)

c) Variance:

   σ² = 1/λ²        (5.9)

where λ is the failure rate per year.
Fitting field data to distributions
Where in-house maintenance data is available for equipment and components, a Weibull or
negative exponential distribution may be fitted to the raw data.
The processed data will provide the mean failure rate (for use in fault tree analysis), as well
as the variance indicating the 'spread' of the distribution and associated uncertainty.
Sophisticated regression and variance reduction techniques, available in numerical analysis
texts, are required to process the raw data and obtain the parameters of the distributions.
The interested reader is referred to Lees (1996) for further information.
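As a minimal illustration of fitting, the negative exponential case has a closed-form maximum-likelihood estimate: λ is simply the reciprocal of the mean time between failures. The failure times below are hypothetical; fitting a full Weibull requires the iterative regression techniques mentioned above.

```python
# Minimal sketch: maximum-likelihood fit of a negative exponential distribution
# to (hypothetical) in-house times between failures, recorded in hours.

times_between_failures = [1200.0, 3400.0, 800.0, 2600.0, 2000.0]  # assumed data

n = len(times_between_failures)
mean_time = sum(times_between_failures) / n    # mean time between failures (MTBF)
failure_rate = 1.0 / mean_time                 # lambda, failures per hour
failure_rate_per_year = failure_rate * 8760.0  # for use in fault tree analysis

print(f"MTBF = {mean_time:.0f} h, lambda = {failure_rate_per_year:.2f} per year")
```

The sample variance of the data, compared against 1/λ² from equation (5.9), gives a crude indication of how well the constant-failure-rate assumption holds.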
Probability of failure on demand
In the previous sections we have considered obtaining information on failure rates of
equipment. This data is normally available as a frequency, e.g. number of failures per
million hours. However, very often in fault tree and event tree analysis we also need
information on the probability of failure on demand. The distinction between the two
should be appreciated, and is critical to a correct analysis.
Many processes and equipment have specific protection systems (e.g. gas or fire detection,
emergency shutdown system, firewater deluge) and the failure rate data of these protection
systems needs to be processed into a probability of failure on demand.
Every protection system failure can be placed into one of two categories.
1. The failure is revealed. In this case, a failure can be detected before an actual demand
on the system occurs. One example is a protection system that is proof-tested at regular
intervals. Any failure that had occurred between two successive test intervals would be
revealed.
2. The failure is unrevealed until the demand occurs. The protection system would not
operate if it had failed, but there is no way of knowing this a priori if no proof-testing
is carried out.
The reliability of the protection systems may be assessed by using different calculation
methods, depending on whether it is a revealed failure or not.
A useful parameter when considering failures in protective systems is the probability of
unavailability or probability of failure on demand, known as fractional dead time (FDT).
This parameter is a probability and is the average fraction of time that the protective system
is unavailable. If the frequency of a demand (demand rate (D)) on a protective system is
known, then a resulting 'hazard or loss event rate' (HR) can be calculated. For low demand
rates and small FDTs, the hazard or loss event rate can be obtained by direct multiplication
of the demand rate and FDT.
HR = D * FDT (5.10)
where:
HR = hazard or loss event rate/year
D = demand rate/year
FDT = fractional dead time.
For revealed faults, a component may be found in either a failed or an operational state when
proof-testing is carried out. Whether a protective system is working is revealed in one of
two ways:
1. A demand occurs between proof-tests and the protective system has to operate.
2. The next proof-test, conducted as part of a routine schedule, checks the system.
Within the 'useful life' of the equipment, the probability of failure within a time period is as
shown in Figure 5.5.
Figure 5.5: Exponential distribution for failures

[Two plots of the probability of failure by time t, each rising along a negative exponential (cumulative) curve towards 1.0; one shows the probability over continuous time and the other shows it resetting at each proof-test.]
The FDT of a single component protective system due to component failure is, therefore, a
function of both the mean failure rate of the component (λ) and the proof-test interval (T).
The failure rate dictates on average how often failures occur. If it is assumed they occur
randomly at any time during a proof-test interval, then on average over a large number of
test intervals, a failure could occur halfway through the proof-test interval. Within a
proof-test interval, the average time the system could be in a failed state would then be
approximately (T/2).
The fractional dead time is given by the expression:

   FDT = 1 – (1/(λT)) [1 – exp(–λT)]        (5.11)
If we expand the exponential series and truncate it after the second-order term, a simplified
expression results:

   FDT = 0.5λT   for λT << 1        (5.12)
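A quick numerical comparison of the exact expression (5.11) with the approximation (5.12), for an assumed failure rate and proof-test interval, shows how small the truncation error is when λT << 1:

```python
import math

# Exact FDT, equation (5.11), versus the linearised approximation, equation (5.12).
# The failure rate and test interval below are assumed illustrative values.

def fdt_exact(lam, T):
    return 1.0 - (1.0 / (lam * T)) * (1.0 - math.exp(-lam * T))

def fdt_approx(lam, T):
    return 0.5 * lam * T

lam, T = 0.1, 0.5  # failures per year; proof-test interval in years (assumed)
print(fdt_exact(lam, T), fdt_approx(lam, T))
```

With λT = 0.05 the exact value is about 0.0246 against the approximate 0.025, so the simplified formula is slightly conservative here.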
Typical magnitudes for FDT values are shown in Table 5.6.
Table 5.6: Typical FDT values

FDT      System
0.01     A simple system, regularly tested and reasonably maintained.
0.001    The practical limit for process plant, unless designed and tested by high-integrity specialists and maintained and tested to those standards.
0.0001   Achieved only in nuclear installations, or in process plant with unusually high standards of operation, maintenance, supervision and management, and a benign operating environment.
Source: Tweeddale, 1992.
In the case of an operator acting as the protection barrier (i.e. responding to an alarm and
taking necessary action), the human error probability is directly used in the analysis.
FDT can be reduced by:
1. reducing the proof-test interval (T); or
2. reducing the mean failure rate (λ) of the component.
However, an indiscriminate increase in proof testing will not necessarily reduce the FDT.
Strictly speaking, the FDT should take into account the following:
1. (1/2)λT (as described above)
2. τ/T (the fraction of each test interval spent testing)
3. ε (the human error probability of leaving the protection system disarmed after each test).
Therefore,

FDT = (1/2)λT + τ/T + ε        (5.13)

where λ is the failure rate per year, T is the proof-test interval, and τ is the time required
to test the system.
If τ << T, the term τ/T can be neglected, but ε may not be negligible.
Example 5.6
The failure rate of an emergency shutdown valve is, say, 0.1 p.a. The proof-test
interval is once in six months (two tests/year). Each time the test is conducted, the
isolation system is bypassed for approximately one hour. Referring back to Table
5.4, the general human error probability of omission to re-arm the trip is 0.003 per
operation, for a simple non-routine operation.
Thus, we have:
λ = 0.1 p.a.
T = 0.5 year
τ = 1/8760 year (one hour)
ε = 0.003
FDT = 0.025 + 2.28E-4 + 0.003
= 0.0282
The error in neglecting the last term is 11%.
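As a quick sketch of equation (5.13) in plain Python (the function name is our own), Example 5.6 can be reproduced as follows:

```python
def fractional_dead_time(lam, T, tau=0.0, eps=0.0):
    """Fractional dead time per equation (5.13).

    lam -- component failure rate (per year)
    T   -- proof-test interval (years)
    tau -- test duration (years), during which the protection is bypassed
    eps -- probability of leaving the system disarmed after a test
    """
    return 0.5 * lam * T + tau / T + eps

# Example 5.6: half-yearly testing of an emergency shutdown valve
fdt = fractional_dead_time(lam=0.1, T=0.5, tau=1 / 8760, eps=0.003)
print(round(fdt, 4))  # 0.0282
```

Re-running the function with T = 1/12 reproduces the monthly-testing figure of 0.0085 discussed next.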
It is commonly believed that if the system were proof-tested more frequently, the
reliability would improve. Let us assume monthly testing, with T = 1/12 year.
Therefore,
FDT = 0.0042 + 0.0014 + 0.003
= 0.0085
The reliability turns out to be only three times better than half-yearly testing because
human error begins to dominate.
In general, a three-to-six monthly interval is considered reasonable for emergency shutdown
systems.
If a protective system is never proof tested, the system will continue to degrade until it fails.
The probability of failure on demand will increase as a function of time. An approximate
formula for calculating the hazard frequency for a system comprising a component which
can generate a demand for protection and an untested protection system is:
HR = Dλ / (D + λ)   (5.14)
where:
D = demand rate per year.
λ = protection system failure rate (failures/year).
Example 5.7: Hazard rate for revealed vs unrevealed failures
Equipment Item A has a failure frequency of λ = 0.5 p.a. (i.e. it will fail on average
once every two years, at any time in that two year period).
Demand Event B has a frequency of occurrence of D = 0.1 p.a. (i.e. the demand
event will occur on average once every ten years).
Revealed failure:
HR = D . FDT
where:
FDT = 1/2 λT
= 1/2 x 0.5 x (1/4) for quarterly testing
= 0.0625
Therefore,
HR = 0.1 x 0.0625
= 0.00625 p.a.
Unrevealed failure:
From equation (5.14)
HR = (0.1 × 0.5) / (0.1 + 0.5)
= 0.083 p.a.
The quarterly testing produces an order of magnitude difference in the hazard rate
for the event, clearly indicating the importance of regular function testing of
protection systems as part of the overall safety management system.
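The two cases in Example 5.7 can be checked numerically with a short sketch (function names are ours):

```python
def hazard_rate_tested(D, lam, T):
    """Hazard rate with a regularly proof-tested protection system: HR = D x FDT."""
    return D * 0.5 * lam * T

def hazard_rate_untested(D, lam):
    """Approximate hazard rate with a never-tested protection system, equation (5.14)."""
    return D * lam / (D + lam)

print(hazard_rate_tested(D=0.1, lam=0.5, T=0.25))      # 0.00625 p.a. (quarterly tests)
print(round(hazard_rate_untested(D=0.1, lam=0.5), 3))  # 0.083 p.a.
```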
RELIABILITY AND AVAILABILITY
Reliability is defined as the probability that a device will satisfactorily perform a specified
function for a specified period of time under given operating conditions (Smith, 1993: 28).
This may also be stated as the probability that an item will perform a required function for a
stated period of time (Lees, 1996).
For the negative exponential distribution, the failure rate of the component is constant,
hence the reliability:
R = exp(–λt) (5.15)
The mean life of a component is expressed as the mean time between failures (MTBF),
given by:
MTBF = 1/λ   (5.16)
For systems with repair, a repair time distribution can be developed. Assuming a negative
exponential distribution for repair times (in reality it is likely to be Weibull), with a mean
repair rate of μ, the mean time to repair (MTTR) is given by:
MTTR = 1/μ   (5.17)
The failure time and repair time distributions can be used to obtain a system availability. In
general, the availability A(t) is a function of time. It is expressed as:
A(t) = u(t) / (u(t) + d(t))   (5.18)
where:
u(t) = uptime (i.e. system running)
d(t) = downtime (i.e. system under repair).
For long time periods, t → ∞, u(t) = MTBF and d(t) = MTTR. Therefore:

A(∞) = MTBF / (MTBF + MTTR)   (5.19)
From equations (5.16) to (5.19), the system availability can also be written as:
A(∞) = μ / (λ + μ)   (5.20)
The unavailability of the system (U) is given by:
U(∞) = 1 − A(∞)   (5.21)
Example 5.8
In Example 5.4, we calculated the failure rate for a compressor to be 1.66 p.a.
Assuming that the mean time to repair a breakdown is approximately 72 hours, the
availability of the compressor can be calculated as follows.
Failure rate = 1.66 p.a.
Number of hours/year = 8760
Mean time between failures (MTBF) = 8760/1.66
= 5277 hours
Mean time to repair (MTTR) = 72 hours
Availability (A) = 5277/(5277 + 72)
= 0.987
By carrying critical spare parts and arranging additional manpower, let us say that
the repair time can be halved to 36 hours. The new availability becomes:
A = 5277/(5277 + 36)
= 0.993
The increased availability of 0.6% may contribute to improved productivity. A
cost–benefit analysis may be used to weigh the gains obtained against the additional
costs incurred in carrying the spare (inventory cost) and providing additional
maintenance resources (labour cost).
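Equations (5.16) to (5.19) are straightforward to apply directly; a minimal sketch reproducing Example 5.8:

```python
def availability(mtbf, mttr):
    """Steady-state availability, equation (5.19); mtbf and mttr in the same units."""
    return mtbf / (mtbf + mttr)

mtbf_hours = 8760 / 1.66  # failure rate of 1.66 p.a. converted to an MTBF in hours
print(round(availability(mtbf_hours, 72), 3))  # 0.987 (72-hour repairs)
print(round(availability(mtbf_hours, 36), 3))  # 0.993 (repair time halved)
```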
Availability analysis is an extremely valuable tool in making decisions about capital
investment or inventory management and in planning maintenance strategy. The
methodology can be extended to complete systems in series, complete systems in parallel
and series-parallel systems.
Sometimes a system may have a number of components connected in series (a linear
system). Each component may have its own λ and μ values. In such a case (O'Connor,
1991), the global availability is given by:
As = Π (i = 1 to n) [μi / (λi + μi)]   (5.22)

   = Π (i = 1 to n) Ai   (5.23)
where:
As = availability of series system. n = number of components.
If the system is arranged in parallel as shown in Figure 5.6, and all components are
operating, the availability becomes:
AP = 1 − Π (i = 1 to n) [λi / (λi + μi)]   (5.24)
Equation (5.24) assumes series repair, i.e. single repair team. For a complete system
consisting of series/parallel units, the system is broken down into simpler blocks and each
block availability is calculated before the system availability is obtained.
Figure 5.6: Configurations for series/parallel systems
Series
Parallel
Series-parallel
Example 5.9
A telemetry system for monitoring automatically controlled unmanned operations at
a remote location consists of the following components at both the transmission end
and the receiving end:
Radio modem
Radio switch
Data link switch.
The full system is duplicated (active redundancy). Assume the MTBF and MTTR
values are as given in Table 5.7.
Table 5.7: Failure/repair time data
Component          MTBF (hours)   MTTR, control room (hours)   MTTR, remote location¹ (hours)
Radio modem        30 000         24                           96
Radio switch       250 000        16                           88
Data link switch   300 000        24                           96
Note 1: Assumes access time of 72 hours
Calculate the system availability.
The availability block diagram configuration is shown in Figure 5.7.
Figure 5.7: Availability diagram

[Block diagram: at the transmission end, two parallel chains, each consisting of radio modem, radio switch and data link switch in series; these feed, in series, an identical pair of parallel chains at the receiver end.]
This is a series–parallel system. The decomposition may be made as follows:

Radio modem: A1 = 30 000 / (30 000 + 24) = 0.99920
Radio switch: A2 = 250 000 / (250 000 + 16) = 0.99994
Data link switch: A3 = 300 000 / (300 000 + 24) = 0.99992

Availability of one unit in the control room (ACR1):
A1 × A2 × A3 = 0.99906

Availability of two units in parallel in the control room:
ACR = 1 − (1 − ACR1)² ≈ 1.0
Similarly, the availabilities at the remote location are:

Radio modem: A4 = 30 000 / (30 000 + 96) = 0.99681
Radio switch: A5 = 250 000 / (250 000 + 88) = 0.99965
Data link switch: A6 = 300 000 / (300 000 + 96) = 0.99968

Availability of one unit in the field (AF1):
A4 × A5 × A6 = 0.99614

Availability of two units in parallel in the field:
AF = 1 − (1 − AF1)² ≈ 1.0

Therefore, the system availability is:
AS = ACR × AF ≈ 1.0
If the active redundancy were not provided:

AS = ACR1 × AF1 ≈ 0.9952

An availability gain of about 0.5% is achieved by providing the redundancy. While this
may appear small, the cost penalties of losing the telemetry may be very high, hence the
near-100% availability offered by the redundant system is worthwhile.
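Equations (5.23) and (5.24) can be sketched as follows (plain Python, 3.8+ for math.prod; the worked numbers reproduce Example 5.9):

```python
from math import prod

def series_availability(avails):
    """Equation (5.23): all components in the chain must be up."""
    return prod(avails)

def parallel_availability(avails):
    """Equation (5.24): the system is down only if every branch is down."""
    return 1 - prod(1 - a for a in avails)

# Example 5.9: one chain at the control room end and one at the remote location
a_cr1 = series_availability([30000 / 30024, 250000 / 250016, 300000 / 300024])
a_f1 = series_availability([30000 / 30096, 250000 / 250088, 300000 / 300096])

no_redundancy = a_cr1 * a_f1
duplicated = parallel_availability([a_cr1] * 2) * parallel_availability([a_f1] * 2)
print(round(no_redundancy, 4))  # 0.9952
print(round(duplicated, 4))     # 1.0
```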
SCREENING RELIABILITY DATA
A reliability database consists of component failure rates distributed across various failure
modes. For example, pump failure modes could be:
seal failure
significant external leak
loss of performance (loss of discharge head)
fails to run
electrical failure of drive motor.
If we are assessing health and safety risks where loss of containment becomes a major
factor (say the pump is pumping acid), the only failure modes of significance are seal
failures and external leaks. However, if we are assessing production continuity risks where
the on-line time and performance of the pump becomes critical, then all the above failure
modes need to be included. Therefore, the data required for frequency analysis depends on
the nature of risk being assessed and the failure modes that are relevant to that risk.
Where a single global failure rate value is given without a breakdown by failure mode,
use of this value in a safety assessment will produce a pessimistic estimate of risk.
Health and safety assessment
Failure rate data for health and safety assessment generally would include the following
failure modes:
Failure rates of detection systems (gas, fire).
Failure rates of protection systems (isolation, fire protection).
Probability of failure of protection systems on demand.
Frequency of initiating events (fire, spill, loss of containment).
Example 5.10
The failure rate data for an oil/gas well emergency shutdown valve on an offshore
production platform is given in Figure 5.8. The information relevant for a safety
assessment, and the reasons for its selection, are provided in Table 5.8.
Figure 5.8: Failure rate data for oil/gas ESD valve
Taxonomy number and item
1.2.1.3
Process Systems Valves ESD (Emergency Shut-Down).
Description
Gate valves, ball valves and globe valves. Electric, pneumatic or hydraulic actuator. Size
2"–34", typically 2"–4" or greater than 8".
Application
Used to shut off part of or the entire process during emergency. Normally held open,
fail-safe construction. When the valve has closed, it must be opened manually.
Operational mode
Normally open (fail-safe-close). Tested regularly.
Internal environment
Crude oil, gas or water.
External environment
Enclosed, partially enclosed, outdoor.
Maintenance
The Emergency Shutdown System (including ESD valves) shall be designed so that it can
be tested when the installation is in operation.
Item boundary specification
Only failures within the boundary indicated by the dashed line in the figure below are
included in the reliability data source.
[Boundary diagram: the valve, actuator and control unit (including the contact breaker for motor actuation and the pilot valve for hydraulic actuation) lie inside the boundary; the power supply, monitoring unit and remote instrumentation lie outside it.]
Taxonomy no.: 1.2.1.3. Item: Process systems, valves, ESD.
Population: 322. Installations: 12. Aggregated time in service (10⁶ hours): 6.4065 (calendar time); operational time and number of demands not recorded.

Failure mode                     No. of    Failure rate (per 10⁶ hours)   Active repair   Repair (man-hours)
                                 failures  Lower    Mean    Upper         (hours)         Min     Mean    Max
Critical                         64        6.46     9.17    12.29         12.3            1.0     20.5    245.0
  External leakage               2         0.09     0.28    0.85          5.5             6.0     8.5     11.0
  Faulty indication              4         0.25     0.56    1.26          3.7             2.0     5.5     10.0
  Fail to close                  27        2.77     3.81    5.24          9.3             1.0     15.2    169.0
  Fail to open                   15        1.36     2.12    3.25          12.9            1.0     21.6    125.0
  Internal leakage               1         0.03     0.14    0.63          3.5             5.0     5.0     5.0
  Overhaul                       2         0.09     0.28    0.85          140.4           245.0   245.0   245.0
  Significant external leakage   1         0.03     0.14    0.65          1.7             2.0     2.0     2.0
  Seepage                        1         0.03     0.14    0.63          56.5            98.0    98.0    98.0
  Significant internal leakage   7         0.00     1.12    2.64          12.3            11.0    20.6    45.0
  Spurious operation             3         0.17     0.43    1.06          3.5             2.0     5.0     8.0
  Unknown                        1         0.02     0.14    0.65          6.3             10.0    10.0    10.0
Degraded                         19        1.94     2.95    4.40          11.2            2.0     18.5    98.0
  Delayed operation              1         0.02     0.16    0.71          37.7            65.0    65.0    65.0
  External leakage               6         0.47     0.93    1.79          9.9             2.0     16.3    82.0
  Faulty indication              2         0.09     0.31    0.95          4.9             7.0     7.5     8.0
  Internal leakage               9         0.67     1.40    2.56          11.0            2.0     18.2    98.0
  Unknown                        1         0.03     0.16    0.71          6.3             10.0    10.0    10.0
Incipient                        51        5.78     7.69    10.01         6.4             0.5     10.2    126.0
  External leakage               12        0.79     1.63    2.88          14.3            2.0     24.0    126.0
  Faulty indication              10        0.85     1.58    2.73          3.6             2.0     5.2     20.0
  Internal leakage               19        1.83     2.93    4.45          1.1             0.5     0.9     4.0
  Other modes                    1         0.02     0.16    0.71          14.9            25.0    25.0    25.0
  Seepage                        5         0.37     0.77    1.58          13.0            2.0     21.8    76.0
  Unknown                        4         0.26     0.65    1.42          3.9             2.0     5.8     10.0
Faulty indication                9         0.49     1.11    2.11          5.1             2.0     8.0     26.0
Overhaul                         4         0.14     0.50    1.24          15.0            6.0     25.3    58.0
Unknown                          4         0.15     0.60    1.43          5.0             3.0     7.8     18.0
All modes                        151       17.23    22.03   27.26         9.7             0.5     15.9    245.0
Source: OREDA, 1992.
Table 5.8: Oil well isolation valve failure data relevant for safety assessment
Failure mode   Reasons for selection   Failure rate per 10⁶ hours (mean)
Critical external leakage
Includes external leakage and significant external leakage. An ignition has serious downstream safety consequences.
0.42
Fail to close Unable to isolate a downstream leak. Potentially serious.
3.81
Critical internal leakage
Includes seepage, internal leakage and significant internal leakage. If a leak occurs downstream of valve, isolation may not be effective.
1.4
Unknown Since it is listed as a critical failure and the failure mode is not known, it is included for a conservative assessment.
0.14
Total 5.77
Source: Calculated from Figure 5.8.
Spurious operation is listed as a failure mode. Since the valve is normally open, a
spurious operation would refer to an unwanted closure. Whilst this would be a
production continuity risk, it is not a safety risk as being closed is the 'fail-safe'
position for the valve.
Degraded failures include external leakage, but this would only be very small
(otherwise it gets into the critical list) and can be handled safely by a planned
shutdown for maintenance.
Out of the 9.17 failures per 106 hours, only 5.77 in 106 hours (63%) contribute to a
safety risk.
Production continuity assessment
For production continuity, it is not sufficient to look only at the failure modes associated
with safety risks. Business continuity risk requires identification of all failures that will
require a system shutdown for maintenance, resulting in production loss.

Example 5.11
Using the information in Figure 5.8, the failure modes required for inclusion in the
assessment of production continuity are as follows.
Table 5.9: Oil well isolation valve failure rate relevant for production continuity assessment

Failure mode; reasons for inclusion; failure rate per 10⁶ hours (mean):
All safety-related failures: as in Table 5.8. 5.77
Other critical failures: fail to open, overhaul, spurious operation, faulty indication. 3.39
Incipient: will require a shutdown for planned maintenance. 7.69
Degraded/unknown: effect on system not known; included to maintain conservatism. 2.95
Total: 19.80
Nearly 90% of the total failure rate could result in production interruption from that well,
because the maintenance repairs would require the well to be shut down.
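The screening logic of Examples 5.10 and 5.11 amounts to summing different subsets of the failure-mode rates. A sketch (the dictionary is a hand-condensed subset of Figure 5.8; mean rates per 10⁶ hours):

```python
# Mean failure rates per 10^6 hours, condensed from Figure 5.8 / Tables 5.8-5.9
rates = {
    "critical external leakage": 0.42,  # external + significant external leakage
    "fail to close": 3.81,
    "critical internal leakage": 1.40,  # seepage + internal + significant internal
    "critical unknown": 0.14,
    "other critical": 3.39,             # fail to open, overhaul, spurious operation, faulty indication
    "incipient": 7.69,
    "degraded/unknown": 2.95,
}

# Safety screening keeps only the loss-of-containment/isolation modes
safety_modes = ["critical external leakage", "fail to close",
                "critical internal leakage", "critical unknown"]

safety_rate = sum(rates[m] for m in safety_modes)
production_rate = sum(rates.values())  # production continuity includes everything
print(round(safety_rate, 2))      # 5.77 (Table 5.8)
print(round(production_rate, 2))  # 19.8 (Table 5.9)
```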
ESTIMATION OF LIKELIHOOD USING ANALYTICAL TECHNIQUES
In the previous section we discussed how to use statistical data to estimate the likelihood of
low consequence/high frequency and medium consequence/medium frequency loss events.
However, a different approach is required to estimate the likelihood of high consequence/
low frequency loss events such as major fires or explosions, structural collapses or dam
failures, because reliable statistical data is rarely available and these types of loss events are
usually caused by a complex combination of failures rather than a single failure alone.
In this section we will examine three analytical techniques that can be used to estimate the
likelihood of high consequence/low frequency loss events:
1. Fault tree analysis
2. Event tree analysis
3. Cause–consequence analysis.
FAULT TREE ANALYSIS
Fault tree analysis (FTA) is a widely used tool for the systematic analysis of combinations
of events that can lead to a loss event. A fault tree is a logic diagram showing the different
ways that a system can fail in terms of a defined final failure event.
You should now read Reading 5.1 'Fault trees' which provides an overview of the
construction and use of fault trees.
Reading 5.1 refers to the terms 'demand' and 'protection action or device' in relation to fault
tree construction. These terms are commonly used in FTA and need to be clearly
understood.
In general, the failure of an item of equipment or the development of an undesirable
situation (e.g. high level in tank) will create a 'demand' on the protection device to operate,
e.g. level switch to close feed valve. The undesirable top event occurs when there is a
demand and the protective device fails.
A 'demand' on the protective device to be brought into operation is generally expressed as a
frequency (e.g. number of times/year). The chance that the protective device will fail when
the demand occurs is expressed as a probability (no time units).
For example, the presence of gas in the vicinity of an LPG installation is a demand on the
gas detector (protective device) to shut off the isolation valves. If the detection system fails
when called upon to act, or the isolation valve fails to close, then there is the chance of a
fire or gas explosion, if the leak finds an ignition source. A simplified fault tree for such an
event is shown in Figure 5.9.
Figure 5.9: Example of a fault tree for an LPG fire

[Fault tree: the base events 'gas detector fails' (6.7 × 10⁻²) and 'SDV fails to close' (5 × 10⁻³) combine through an OR gate into 'leak not isolated' (7.17 × 10⁻²). An AND gate combines this with 'LPG leak occurs' (3.3 × 10⁻⁴/yr) to give 'LPG leak not isolated' (2.36 × 10⁻⁵/yr), which combines with 'ignition occurs' (0.3) to give the top event 'LPG fire occurs' (7.09 × 10⁻⁶/yr).]
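The arithmetic in Figure 5.9 can be verified with a small sketch (AND gates multiply, OR gates combine complements; independence of the base events is assumed):

```python
def or_gate(*probs):
    """Probability that at least one of several independent base events occurs."""
    result = 1.0
    for p in probs:
        result *= 1 - p
    return 1 - result

leak_freq = 3.3e-4                      # LPG leak occurs, per year
p_not_isolated = or_gate(6.7e-2, 5e-3)  # gas detector fails OR SDV fails to close
p_ignition = 0.3

fire_freq = leak_freq * p_not_isolated * p_ignition  # AND of the three branches
print(f"{p_not_isolated:.3f}")  # 0.072
print(f"{fire_freq:.2e}")       # 7.09e-06 per year
```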
A C T I V I T Y 5 . 1
1. Develop a fault tree for the loss event you analysed in Activity 4.1.
2. Attempt to quantify the fault tree. Base failure rates can be from experience
(e.g. obtained by talking to production/maintenance staff.)
3. Compare the top event frequency calculated against experience.
4. Conduct a sensitivity analysis on the failure rates to confirm any discrepancy
between calculated values and actual experience.
EVENT TREE ANALYSIS
Event tree analysis (ETA) is applied when a single hazardous event can result in a variety of
consequences. The analysis identifies and evaluates potential event outcomes that might
result following a failure or upset, normally called an initiating event. Demand frequencies
and component failure probabilities are applied to calculate the frequency of outcome
events. The analysis is presented in the form of an event tree logic diagram.

Event trees are primarily safety-oriented and are particularly suitable for the analysis of
systems where time is a significant factor, for example, when manual intervention can avoid
the escalation of an event if applied within a specified timeframe. Working forward in time
from the failure event, the operation of each safety failure or contingency plan is
considered. If these fail to achieve the desired result, the consequence is established and the
frequency is determined.
Generally, each node in an event tree has two branches, although several branches from the
same node are possible (similar to a decision tree). The two branches in each node
represent success (yes) or failure (no) of the protective device or system and can lead to a
different outcome, depending on the path. The protective devices or systems can include
hardware items (e.g. firewater pump) or procedural items (e.g. emergency response), or
both. Each protective device or system is treated as a separate node, and the outcome of its
success or failure is analysed through the two branches.
The estimation of failure probabilities in each node of the event tree or each base event of
the fault tree requires information from historical equipment failure rate data and/or human
error assessment.
Example 5.12
Figure 5.10 shows a typical event tree. Starting with the initiating event (motor
burnout), the tree branches into various fire damage scenarios with five possible final
outcomes. For each branch, a corresponding probability value is ascribed. The
probability of a given final outcome is obtained by multiplying the individual
probabilities along the route leading to that final outcome. The sum of the
probabilities (or frequencies if the initiating event is given as a frequency) of all the
final outcomes should equal the initiating probability (or frequency).
Figure 5.10: Example of event tree
[Event tree: initiating event 'motor overheats' (P0). Branch probabilities ('yes'): P1 = overheating causes fire (10⁻⁴); P2 = fire not extinguished (10⁻¹); P3 = line rupture (10⁻²); P4 = explosion (10⁻¹). Final outcomes:
P0 (1 − P1): 5 hr delay + $1000 damage
P0 P1 (1 − P2): 10 hr delay + $2000 damage
P0 P1 P2 (1 − P3): 15 hr delay + $10 000 damage
P0 P1 P2 P3 (1 − P4): 3 month delay + $100 000 damage
P0 P1 P2 P3 P4: 1 yr delay, 10 killed + $2 million damage]
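A useful consistency check on any event tree is that the outcome probabilities sum back to the initiating probability. A sketch using the branch values from Figure 5.10 (outcome labels abbreviated by us):

```python
p0 = 1.0  # initiating probability, taken as certain for the check
p1, p2, p3, p4 = 1e-4, 1e-1, 1e-2, 1e-1

# Each outcome probability is the product of the branch probabilities on its path
outcomes = {
    "5 hr delay + $1000":          p0 * (1 - p1),
    "10 hr delay + $2000":         p0 * p1 * (1 - p2),
    "15 hr delay + $10 000":       p0 * p1 * p2 * (1 - p3),
    "3 month delay + $100 000":    p0 * p1 * p2 * p3 * (1 - p4),
    "1 yr delay, 10 killed + $2M": p0 * p1 * p2 * p3 * p4,
}

total = sum(outcomes.values())
print(round(total, 12))  # 1.0 -- the branches are exhaustive and mutually exclusive
```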
Example 5.13
An example of an event tree for the loss of emergency power supply is shown in
Figure 5.11. When normal grid power supply is interrupted, the following backup
systems are used:
diesel alternator
battery power.
If both backup systems fail, then there is total loss of power.
The frequency of total loss of power is calculated by adding the contributions of the two
failure paths shown in Figure 5.11:
1. F1 = f × P1 × (1 − P2) × (1 − P3) = 9.8 × 10⁻⁷ p.a.
2. F2 = f × (1 − P1) × (1 − P3) = 2 × 10⁻⁵ p.a.
Total frequency = 2.1 × 10⁻⁵ p.a.
The larger contribution comes from the failure of the diesel engine to start; therefore,
efforts to improve the backup systems should be directed at improving the reliability of
the diesel alternator.
Figure 5.11: Event tree for loss of grid power supply
[Event tree: initiating event 'loss of grid power supply', f = 0.1/year. Nodes: diesel engine starts (P1 = 0.98); diesel engine runs for the required period (P2 = 0.999); battery power supply functional (P3 = 0.99). Total loss of power occurs when the battery fails (1 − P3 = 0.01) together with the diesel engine either failing to start (1 − P1 = 0.02) or stopping early (1 − P2 = 0.001); all other paths allow continued operation.]
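The two failure paths of Example 5.13 can be sketched as follows:

```python
f_loss = 0.1    # loss of grid supply, per year
p_start = 0.98  # diesel engine starts (P1)
p_run = 0.999   # diesel engine runs for the required period (P2)
p_batt = 0.99   # battery power supply functional (P3)

# Path 1: diesel starts but stops early, and the battery also fails
f1 = f_loss * p_start * (1 - p_run) * (1 - p_batt)
# Path 2: diesel fails to start, and the battery also fails
f2 = f_loss * (1 - p_start) * (1 - p_batt)

print(f"{f1:.1e}")       # 9.8e-07 p.a.
print(f"{f2:.1e}")       # 2.0e-05 p.a.
print(f"{f1 + f2:.1e}")  # 2.1e-05 p.a.
```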
A range of software packages is available to carry out the fault tree and event tree analyses.
A demonstration version of RM Consultants' LOGAN fault and event tree analysis program
can be downloaded from: http://www.rmclogan.co.uk/index2.htm.
CAUSE–CONSEQUENCE ANALYSIS
By combining a fault tree analysis and an event tree analysis, the frequencies for the
outcomes of all loss events can be obtained. This is referred to as a cause–consequence
analysis. Examples of cause–consequence analyses are shown in Figures 5.12 and 5.13.

Figure 5.12: Example of cause–consequence analysis (1)
[Layout: a fault tree analysis of the causes (corrosion, erosion, material defect, impact/collision, human error, ignition sources) leads to the accident event 'hydrocarbon release/ignition', which is the top event of the FTA and the start event of the ETA; an event tree analysis then follows the escalation through safety system failures (gas/fire detection, emergency shutdown, deluge, emergency response) to the outcomes (injury, fatality, structural damage, environmental pollution).]
Figure 5.13: Example of cause–consequence analysis (2)
[Layout: a fault tree analysis of the causes (inadequate design, excessive load, subsidence, seismic activity, soil erosion) leads to the accident event 'road bridge weakened/high vibration', which is the top event of the FTA and the start event of the ETA; an event tree analysis then follows the escalation through safety system failures (degradation not detected due to human error, inspection delayed, vibration monitoring equipment incorrect, load restriction not followed) to the outcomes (bridge collapse, injury, fatality, major structural damage).]
A cause–consequence analysis can be expressed diagrammatically as a cause–consequence
model, which consists of a fault tree and an event tree joined in the centre by the event of
concern, generally known as the loss of control point or accident. This provides a
quantitative method for calculation of consequence probabilities, e.g. fatality. It also allows
the analyst to identify the key factors that can be modified/improved in order to reduce the
probability of the undesired consequences.
An example of a cause–consequence model is shown in Figure 5.14. Further information
on cause–consequence modelling can be found in Robinson et al. (2006).
Figure 5.14: Cause-consequence model
[A fault tree analysis on the left culminates in the top event, which serves as the start event of the event tree analysis on the right.]
RISK MEASUREMENT AND RANKING
The results of the consequence severity and likelihood analyses are combined for each
outcome of each loss event to obtain an overall measure of risk associated with each
outcome. These individual risk contributions may be summed to provide total risk
measures for the facility.
Measuring the risk of loss events serves the following purposes.
The risks can be ranked to identify the major risk contributors and provide a sound
basis for risk management.
The calculated risk levels can be compared with risk targets or criteria and/or the
historical risk level of the industry, company or other installations.
The significance of the calculated risk levels can be reconciled with risks from other
activities.
The risk levels of different design/operating options can be compared.
Decisions can be made about whether a certain level of risk is tolerable or whether to
proceed with a project.
There is no single standard method of risk measurement, ranking and presentation. The
most suitable method(s) depends on the information and resources available, the objectives
of the risk assessment and the intended audience. Three different types of approaches are
discussed in this section:
the qualitative risk matrix approach
approaches for risk to people
approaches for risk to projects.
It must be emphasised that risk analysis can only provide estimates of risk. When using
these estimates to make technical decisions, develop management strategies or
communicate risk to the public or the government, it is essential that the uncertainties be
known and acknowledged.
QUALITATIVE RISK MATRIX APPROACH
A qualitative risk matrix is a graphical representation of the risk as a function of
consequence severity and likelihood, and is very useful for an initial assessment and ranking
of risks to enable priorities to be allocated.
A typical qualitative risk matrix is shown in Figure 5.15. The matrix brings together the
information shown in Table 5.1 and Table 4.2 and shows events of decreasing likelihood
from top to bottom, and events of increasing severity from left to right. It groups risk into
four categories: Extreme (E), High (H), Moderate (M) and Low (L).
Figure 5.15: Qualitative risk matrix
                       SEVERITY
LIKELIHOOD             1 Insignificant   2 Minor   3 Moderate   4 Major   5 Catastrophic
A (Almost certain)     H                 H         E            E         E
B (Likely)             M                 H         H            E         E
C (Possible)           L                 M         H            E         E
D (Unlikely)           L                 L         M            H         E
E (Rare)               L                 L         M            H         H

E = extreme risk; immediate attention required
H = high risk; senior management attention required
M = moderate risk; management responsibility must be specified
L = low risk; manage by routine procedures
The advantage of the qualitative risk matrix is that it graphically identifies the events that
require priority action from management. The disadvantage is that it uses qualitative scales
and risk categories that are open to highly subjective interpretations. All variables should
be defined quantitatively in order to reduce the subjectivity.
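Where such a matrix is used in a risk register, it is often encoded directly so that ratings are assigned consistently. A sketch of a lookup based on Figure 5.15 (the encoding is ours):

```python
# Risk ratings from Figure 5.15: rows are likelihood A-E, columns are severity 1-5
MATRIX = {
    "A": "HHEEE",  # almost certain
    "B": "MHHEE",  # likely
    "C": "LMHEE",  # possible
    "D": "LLMHE",  # unlikely
    "E": "LLMHH",  # rare
}

def risk_rating(likelihood: str, severity: int) -> str:
    """Return E/H/M/L for a likelihood letter (A-E) and a severity level (1-5)."""
    return MATRIX[likelihood][severity - 1]

print(risk_rating("C", 4))  # E: a possible/major event is an extreme risk
```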
APPROACHES FOR RISK TO PEOPLE
Risk to people can be measured in terms of injury or fatality. The use of injuries as a basis
for risk evaluation may be less disturbing to some than the use of fatalities. However, when
risk is expressed in terms of injury rather than fatality, two key problems are introduced:
the type and extent of injury must be defined clearly, e.g. first or second degree burns
from fires, lung rupture from explosion overpressure, which means different injury
risks are not directly comparable.
historical fatality rate data are available for many industries and activities, but historical
injury rate data is less common, so if the risk is expressed in terms of injury, direct
comparison of performance within and across industries may not be possible.
Fatal accident rate
The fatal accident rate (FAR) is a measure of the average risk of fatality to employees in a
hazardous facility or industry. It is used extensively in industry as a measure of risk.
FAR is defined as the number of fatalities per 100 million worked (exposed) hours.
Historical FARs are normally calculated using a combination of fatality statistics over a
defined period and an estimate of the total number of hours worked by all employees over
this period:
FAR = [Number of fatalities over N years × 10⁸] / [Total number of hours worked (exposed) by employees over N years]   (5.27)
The fatal accident rates for several industries in Australia are listed in Table 5.10.
Table 5.10: Fatal accident rates in Australian industry
Industry category       FAR
Mining (non-coal)       27
Mining (coal)           17
Agriculture, forestry   11
Construction            9
Chemicals, petroleum    4
Other manufacturing     3
Source: Calculated from ABS data.
FAR is one of the risk measures used in quantitative risk assessment studies. The
calculation of expected FARs requires that the estimate of the total number of hours spent
by all personnel in the plant be weighted to account for 'time on site' variations between
process, maintenance, construction, etc. Therefore, the total number of hours per year all
personnel spend on site can be expressed as:
Total exposed hours/year = (x1M1 + x2M2 + ... + xnMn) × 8760 hours/year   (5.28)

where:
Mn = number of personnel in crew n
xn = fraction of time crew n spends on site or in area A
n = category of personnel (process, maintenance, construction, etc.)
The FAR for each category of personnel is normally calculated as:

FAR = [Σi (fi pi θi ni) / (number of exposed man-hours p.a.)] × 10⁸   (5.29)

where:
fi = frequency of incident i (p.a.)
pi = probability of fatality for incident i
θi = fractional exposure time
ni = number of fatalities for incident i
Example 5.14
For illustrative purposes, the FAR calculation for a single event is shown below.
Scenario: flange failure on the 20 mm ethylene feed gas line to the reactor, in the reactor area, resulting in a jet fire.
Frequency of loss event: 2.24 x 10^-4 p.a.
Probability of fatality: 1 (based on jet fire size in the area)
Fractional exposure time: 0.067 (based on the average time a person may spend in the area)
Number of fatalities: 2 (based on 2 persons present in the area at the time of the incident)
Number of personnel on site: 60
Average time spent by personnel on site: 8 hours/day
Number of man-hours on site p.a.: 60 x 8 x 365

Therefore, FAR = (2.24 x 10^-4 x 1 x 0.067 x 2 x 10^8) / (60 x 8 x 365) = 0.017
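The arithmetic of Example 5.14 can be reproduced in a few lines. The function name is ours; the figures are those of the example:

```python
# FAR for a single loss event: equation 5.29 with one term in the sum.
def far_single_event(freq, p_fatality, exposure_frac, n_fatalities, exposed_hours_pa):
    return (freq * p_fatality * exposure_frac * n_fatalities / exposed_hours_pa) * 1e8

far = far_single_event(
    freq=2.24e-4,                   # loss-event frequency, p.a.
    p_fatality=1.0,                 # jet fire engulfs the area
    exposure_frac=0.067,            # average fraction of time a person is in the area
    n_fatalities=2,                 # two persons present at the time of the incident
    exposed_hours_pa=60 * 8 * 365,  # 60 people x 8 h/day x 365 days
)
print(round(far, 3))  # 0.017
```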
Lost time injury rate
The measure conventionally used for lost time injuries is the lost time injury rate (LTIR). It is also sometimes referred to as the lost time injury frequency rate (LTIFR), even though the 'frequency' and the 'rate' refer to the same quantity.
LTIR is defined as the number of lost time injuries per million hours worked. It is calculated as:
LTIR = (Number of LTIs x 10^6) / (Number of hours worked)
Other similar measures for measuring safety performance are:
a) Major injury severity rate:
MISR = (Number of days lost x 10^6) / (Number of hours worked)
(i.e. days lost due to lost time injuries per million hours worked)
b) Lost time injury incidence rate:
LTIIR = (Number of LTIs x 100) / (Average number of employees)
(i.e. the percentage of the workforce that suffered a lost time injury in the given time period)
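These three measures are simple ratios. A minimal sketch, with an assumed workforce of 500 people, one million hours worked, 8 LTIs and 120 days lost:

```python
# Workforce safety performance measures defined above.
def ltir(n_lti, hours_worked):
    """Lost time injuries per million hours worked."""
    return n_lti * 1e6 / hours_worked

def misr(days_lost, hours_worked):
    """Days lost due to lost time injuries per million hours worked."""
    return days_lost * 1e6 / hours_worked

def ltiir(n_lti, avg_employees):
    """Percentage of the workforce that suffered an LTI in the period."""
    return n_lti * 100 / avg_employees

# Hypothetical site: 500 employees, ~1 million hours worked in the year.
print(ltir(8, 1_000_000), misr(120, 1_000_000), ltiir(8, 500))
```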
Data on lost time injuries for different industries are collected by government agencies
responsible for health and safety at work.
Individual risk
Individual risk is usually expressed as the probability that a person would be harmed in the
course of a year, due to major hazard(s). For example, this may be expressed as a risk of
one chance in a million per year that a person may sustain fatal injuries due to an incident at
a hazardous facility.
Individual risk is the most common form of risk measurement and presentation for land-
based hazardous industries. It is used by government authorities in a number of countries to
assess the risk levels from new and existing hazardous facilities as part of the decision-
making process for land-use safety planning. These government planning authorities are
mainly concerned with risks to the public.
The calculation of individual risk at a geographical location near a plant assumes that the
contributions of all loss events at the facility are additive. The total risk at each point is
therefore equal to the sum of the risks of all possible loss events at that point associated
with that plant.
The total risk at geographical location x, y is given by:
Individual risk(x,y) = Σ over all event outcomes [frequency of event outcome (p.a.) x probability of fatality from the event outcome x fractional exposure time]   (5.25)
Note that the calculation of individual risk requires the evaluation of all the possible
outcomes of each loss event and their corresponding probabilities using fault tree/event tree
analysis. For example, a flammable hydrocarbon release can result in a jet fire, pool fire,
BLEVE, vapour cloud explosion, flash fire or safe dispersal. Each outcome needs to be
accounted for in the above equation.
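A sketch of equation 5.25 for a single geographical location. The outcome frequencies and fatality probabilities are hypothetical, and fractional exposure is taken as 1, as is conventional for public risk:

```python
# Individual risk at a point, per equation 5.25: the sum over event outcomes of
# frequency x probability of fatality x fractional exposure time.
def individual_risk(outcomes):
    """outcomes: iterable of (frequency_pa, p_fatality, exposure_fraction) tuples."""
    return sum(f * p * theta for f, p, theta in outcomes)

# Hypothetical outcomes of a flammable release reaching location (x, y).
outcomes = [
    (2.0e-5, 0.5, 1.0),   # jet fire
    (5.0e-6, 0.2, 1.0),   # flash fire
    (1.0e-6, 1.0, 1.0),   # vapour cloud explosion
]
print(individual_risk(outcomes))
```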
Individual risk is normally presented in the form of risk contour plots. Risk contours show individual risk estimates at specific points on a map; each contour connects points of equal risk around a hazardous facility. An example of a risk contour plot is shown in Figure
5.16. For public risk around a land-based installation, the fractional exposure time is
generally taken conservatively as 1. That is, all members of the public are present
24 hours/day, 365 days/year.
In addition to risk contours, the risk level for a specific individual most exposed to a hazard
may sometimes need to be calculated. This is referred to as 'peak individual risk'. For
example, in the formula for individual risk given above, by substituting the probability of
fatality and the fractional exposure time for the most exposed individual, the peak
individual risk is obtained.
In estimating risk to people in a residential area, it is generally assumed that at least one
individual would be in the residential area for 100% of the time. Thus, the individual risk
contour becomes the peak individual risk as well. Some analysts make a distinction
between time spent indoors and time spent in open air for toxic exposures. In that case, the
number of air changes per minute in the building also has to be taken into account. While
in theory this is a correct approach, in practice too many assumptions have to be made at
each level, many of them difficult to substantiate. Therefore, risk to residential areas is
often estimated as peak individual risk to minimise uncertainty and maintain conservatism.
Figure 5.16: Typical risk contour plot for individual risk of fatality
Potential loss of life
A commonly used index for risk of fatality to personnel is the potential loss of life (PLL)
which is defined as the expected average number of fatalities over the life of the facility, or
over a given time period, e.g. one year. The event frequency, the probability of fatality and
the number of people affected are multiplied together to obtain the PLL.
The PLL is normally calculated as follows:
PLL = frequency of loss event (p.a.) x probability of fatality
x fractional exposure time x number of people exposed
x duration of activity phase (lifetime of facility). (5.26)
The sum of the PLLs for all events considered in the study gives the total risk.
Detailed information on projected population distribution on a plant is required for this
analysis. This includes the approximate fraction of the time spent in each plant section by
all employees, for example:
plant operators
construction contractors
maintenance personnel
transport personnel.
Where the consequences of events exceed the site boundary, the surrounding population
would also need to be considered.
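Equation 5.26 can be sketched as follows. The two loss events and the 25-year facility life are illustrative assumptions:

```python
# Potential loss of life per equation 5.26, summed over all loss events.
def pll(events, years=1):
    """events: (frequency_pa, p_fatality, exposure_fraction, n_exposed) tuples."""
    return sum(f * p * theta * n for f, p, theta, n in events) * years

# Hypothetical events over a 25-year facility life.
events = [
    (2.24e-4, 1.0, 0.067, 2),   # flange failure and jet fire
    (1.0e-5, 0.3, 0.5, 10),     # larger release affecting an occupied building
]
print(pll(events, years=25))
```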
F–N curves
F–N curves are also known as societal risk curves and have been extensively used in
quantitative risk assessment studies for land-based industries. F–N curves are cumulative
frequency-fatality plots, showing the cumulative frequencies (F) of events at which N or
more fatalities could occur. They are derived by sorting the frequency–fatality (F–N) pairs
from each outcome of each loss event and summing them to form cumulative
frequency–fatality (F–N) coordinates on a log–log plot. A typical F–N curve is shown in
Figure 5.17.
F–N curves for land-based facilities include fatalities outside plant boundaries involving the
public and employees in neighbouring industrial facilities. Therefore, the consequence
analysis has to estimate the number of fatalities that can result outside plant boundaries for
each event outcome. This requires detailed information on population densities in the
vicinity of the plant.
Unlike the aforementioned risk measures, F–N curves address two important issues. Firstly,
the public believes that the number of people exposed to a particular risk is important.
Secondly, the public is more alarmed at single loss events involving multiple fatalities than
a large number of smaller events causing the same number of fatalities over a period of
time. This aspect is discussed further in Topic 6.
F–N curves are typically determined and published by government authorities in relation to
land use planning.
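The sort-and-accumulate derivation of F–N coordinates can be sketched as follows, using hypothetical (frequency, fatality) pairs for the event outcomes:

```python
# F-N coordinates: cumulative frequency F of events causing N or more
# fatalities, built from (frequency, fatalities) pairs for each event outcome.
def fn_curve(pairs):
    ns = sorted({n for _, n in pairs if n > 0})
    return [(sum(f for f, n in pairs if n >= N), N) for N in ns]

# Hypothetical outcome pairs (frequency p.a., fatalities).
pairs = [(1e-4, 1), (3e-5, 1), (2e-5, 5), (1e-6, 50)]
for F, N in fn_curve(pairs):
    print(N, F)   # plot F against N on log-log axes
```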
Figure 5.17: Typical F–N curve
APPROACHES FOR RISK TO PROJECTS
Quantitative risk matrix
We have discussed the risk matrix technique for qualitative risk assessment. Vose (2000)
has described how the matrix can be used for assessing risk in a semi-quantitative fashion.
A value range is ascribed to the probability/severity scales to match the size of the project.
An example is shown in Table 5.11.

Table 5.11: Value ranges for use in a risk matrix for project risks

                         Impact on project
Scale  Probability (%)  Schedule delay (months)  Cost increase (%)  Performance
NIL    0                0                        0                  None
VLO    0-10             <1                       <5                 Does not meet a minor objective
LO     10-20            1-2                      5-10               Does not meet more than one minor objective
MED    20-30            3-4                      10-15              Shortfall in meeting objectives
HI     30-40            4-6                      15-30              Significant shortfall in meeting objectives
VHI    40-50            >6                       >30                General failure in meeting objectives
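The probability bands above can be applied programmatically. A minimal sketch: the band edges follow Table 5.11, but the function name and the treatment of boundary values are our assumptions:

```python
# Mapping an assessed probability (%) onto the semi-quantitative scale of
# Table 5.11. Boundary values are assigned to the lower band (an assumption).
PROB_BANDS = [(0, 'NIL'), (10, 'VLO'), (20, 'LO'), (30, 'MED'), (40, 'HI'), (50, 'VHI')]

def probability_scale(prob_pct):
    if prob_pct == 0:
        return 'NIL'
    for upper, label in PROB_BANDS[1:]:
        if prob_pct <= upper:
            return label
    raise ValueError('probability above 50% is outside the table')

print(probability_scale(25))  # MED
```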
Sensitivity analysis and probability contours
In Topic 4 we discussed how to use a sensitivity analysis to identify the effect a change in a
single 'risk' variable will have on the total project cost. The spider diagram technique we
examined can be extended to show the confidence limits (Flanagan and Norman, 1993).
The assessment focuses on how likely it is that the cost parameter will vary within a
particular range. The probability assessment, carried out as a separate exercise, forms an
input to the spider diagram.
The risk parameter is a variable subject to a defined statistical distribution. Vose (2000)
describes two methods for determining statistical distributions for the risk variables,
depending on whether data is available.
Where data is available, a table of discrete points in a distribution is used. Vose recommends either fitting a non-parametric distribution (i.e. without fitting a mathematical model to the data) or fitting standard distributions such as the Beta, normal or triangular. The fitted distribution should be subjected to a χ²-test of its statistical fit.
Where data is unavailable or sparse, expert opinion is required to 'fill in the holes'. This adds another element of uncertainty to an already random variable. Vose (2000) reports
that the triangular distribution is the most commonly used distribution for modelling expert
opinion. It is defined by its minimum, most likely and maximum values (three points in a
distribution). Vose also recommends combining three different expert opinions with
weights allocated to each opinion. Warnings against incorrect uses are given.
Let us say that the risk parameter's standard deviation for the selected distribution is available from an independent analysis such as a Monte Carlo simulation (see next section). We know from the central limit theorem that there is approximately a 95% probability that the parameter will lie within ±2σ of its expected value, where σ is the standard deviation. The 95% confidence limits are plotted on the spider diagram as two points on the sensitivity analysis curve. This is repeated for each risk parameter in turn. Finally, when all such points are connected, we have what is referred to as the probability contour. An example is shown in Figure 5.18. In this figure, point A indicates that there is a 95% probability that parameter A would lie within ± a1% of its expected value. The probability contour also shows that there is a 95%
probability that the life cycle cost would lie between the lower and upper limits.

Figure 5.18: Probability contour (a plot of % variation in each parameter against life cycle cost, with the 95% confidence points marked for parameters 1, 2 and 3)
Source: Flanagan & Norman, 1993: 100.
Monte Carlo method
The basic steps in the Monte Carlo method are described by Flanagan and Norman (1993).
Step 1: Determine the probability distribution for the risk variable. This has been
discussed in the previous section. The most popular is the triangular
distribution (due to paucity of data).
Step 2: Generate a random number to represent the variable, using a random number
generator, subject to the constraints of the probability distribution.
Step 3: Calculate an estimate of the final output (project cost, project schedule in
weeks, operating cost, etc.), using the random value of the variable generated in
Step 2.
Step 4: Repeat Steps 2 and 3 to generate a data set of output versus variable. Grey
(1995) recommends a minimum of 300 simulations and a maximum of 1000,
above which further simulations generally do not refine the results.
Step 5: Plot the N estimates as a cumulative frequency curve and as a histogram.
Step 6: Interpret the results carefully. Look for any interdependence between the
variables.
Step 7: Test the sensitivity of the data by performing a sensitivity analysis on the key
elements in the analysis.
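Steps 1 to 5 can be sketched for a small project model. The three tasks and their triangular durations (in weeks) are assumptions for illustration:

```python
import random
import statistics

# Step 1: task durations as triangular (minimum, most likely, maximum) weeks.
TASKS = [(4, 6, 12), (8, 10, 20), (2, 3, 6)]   # design, construct, commission

def simulate(n=1000, seed=1):
    rng = random.Random(seed)
    # Steps 2-4: sample every task duration, sum them, and repeat n times.
    return [sum(rng.triangular(lo, hi, mode) for lo, mode, hi in TASKS)
            for _ in range(n)]

runs = simulate()
# Step 5: summarise (a histogram and cumulative curve would be drawn from `runs`).
print(statistics.mean(runs), statistics.quantiles(runs, n=20)[18])   # mean and ~95th percentile
```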
The Monte Carlo method cannot realistically be performed by hand; software packages are available to carry out the study. Two popular packages are:
1. @Risk
This allows Monte Carlo simulations to be conducted within Microsoft Excel. It has
special functions to select probability distributions, fit probability distributions from
input data, and graphically display the results. Further information can be found on the
developers' website: http://www.palisade.com.au/risk.
2. Crystal Ball
This program performs the same function as @Risk, and is an alternative tool. Further
information can be found on the developers' website: http://www.decisioneering.com/.
Further software for project risk analysis is available at the Vose Consulting website:
http://www.risk-modelling.com/.
A project risk analysis is primarily concerned with the general uncertainty for the problem.
For instance, we may construct a model to estimate how long it will take to design,
construct and commission a gas turbine power generation facility. The model would be
broken down into key tasks and probabilistic estimates made for the duration of each task.
We would then run a simulation to find the total effect of all these uncertainties.
One question that arises is: Should we include rare events (high severity/low frequency) in
the risk analysis model? For instance, should we include the risk of a gas explosion and
major damage to the power station in the project risk analysis? According to Vose (2000),
one should not include rare events, as it tends to increase the standard deviation of the
simulation results significantly so that the expected value cannot be predicted within
reasonable confidence limits. Techniques such as fault tree and event tree analysis,
discussed earlier in this topic, are the appropriate tools for these rare events.
The final question is: Why go to the length of complex Monte Carlo simulations when most
of the time people stop with a deterministic analysis using single-point estimates for each
task duration and cost? Vose (2000) has compared the results of the deterministic versus
stochastic analysis in a number of cases and reports that the latter provides a mode and
mean that are nearly always greater than the deterministic model, and that sometimes the
output from a distribution does not even include the deterministic result. This indicates that
the risk is often underestimated in single-point deterministic methods, and that a Monte
Carlo simulation is a more reliable guide to the project risks.
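The underestimation can be seen even without simulation: for right-skewed task durations, a deterministic total built from 'most likely' values falls below the sum of the distribution means. The task figures below are illustrative:

```python
# Deterministic single-point total versus the expected total when each task
# duration is a right-skewed triangular distribution.
tasks = [(4, 6, 12), (8, 10, 20), (2, 3, 6)]   # (min, most likely, max), in weeks

deterministic_total = sum(mode for _, mode, _ in tasks)
# The mean of a triangular(min, mode, max) distribution is (min + mode + max) / 3.
expected_total = sum((lo + mode + hi) / 3 for lo, mode, hi in tasks)

print(deterministic_total, round(expected_total, 2))   # 19 23.67
```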
SUMMARY
In this topic we examined the fourth and fifth steps of the risk management framework:
estimating the likelihood of a loss event occurring and measuring and ranking the overall
risk.
We began by discussing the distinction between the two dimensions that need to be taken
into account in likelihood estimates: event probability and event frequency. We then
discussed three basic approaches to estimating the likelihood of loss events:
1. A simple qualitative approach that can be used beforehand to help decide which of the two quantitative approaches is more appropriate to a given scenario.
2. A quantitative approach using reliable statistical data to estimate the likelihood of loss
events caused by single failures.
3. A quantitative approach using analytical techniques such as a fault tree analysis, an
event tree analysis or a cause–consequence analysis to estimate the likelihood of loss
events caused by multiple failures, by breaking them down into their contributing
causes.
We concluded the topic with a discussion of a range of techniques that can be used to
measure and rank risks.
EXERCISES
5.1 FAILURE RATES
A factory bottles petroleum spirit using a bottling machine. There are eight independent
flexible lines/connections in the machine. The failure rate of a flexible line may be taken as
3.6 per million hours of operation. The bottling line operates six hours a day, five days a
week for 45 weeks a year. The rest of the time is spent on cleaning and equipment
maintenance.
Calculate the release frequency of petroleum spirit.
5.2 FRACTIONAL DEAD TIME
Following a fire in the bottling machine described in 5.1 above, the company decides to
install a remote operated shutdown valve in the product supply line to the machine. The
manufacturer assures the company that the valve is reliable and has a low failure rate, of the
order of 0.02 per year, based on past experience.
The company installs the valve and tests its operation once every six months in a
maintenance schedule. After a few years, people take the valve operation for granted and
discontinue the critical function test.
a) Calculate the release frequency of petroleum spirit, assuming the same failure rate for
flexible line failure as in Exercise 5.1, but with the emergency isolation valve operating
and tested.
b) Repeat the calculation in (a), but this time assuming that the valve function test has
been discontinued.
5.3 FAULT TREE ANALYSIS
A switch room has two light globes in a parallel circuit as shown in the wiring schematic
below. There are no windows in the room and the lights are left on all the time. If both
lights fail there will be total darkness in the room. Should such a failure occur, maintenance
access would be delayed, with corresponding plant downtime. The switch room is routinely
visited by maintenance personnel once a week, unless there is a need for a special visit.
a) Develop a fault tree for the situation of no light in the switch room (top event).
b) Calculate the frequency of the top event, given the following base data.
Power failure = 0.2 per year
Fuse failure from overload = 0.2 per year
Circuit breaker (switch) fails open = 0.01 per year
Light globe failure = 0.0001 per hour of operation
Figure 5.19: Wiring schematic (power source, fuse, switch and two light globes in parallel, shown within the room boundary)
5.4 EVENT TREE ANALYSIS
In a printing press for specialised printing, a solvent-based ink is used. A flammable
solvent is pumped from storage to an ink mixing tank. The frequency of pump motor
overheating is estimated to be 10^-3 per year. In such an event, in 1 out of 10 situations, an
electrical fire could result. In such situations, the following sequence of events can occur
(Wells, 1984).
If no fire occurred, the loss would be about $2500 and there would be a five-hour
production delay until a new motor was installed.
If a fire did occur, it is likely it would be extinguished quickly as there are generally people present in the area. The loss would be about $5000 and there would be a ten-hour production delay until a new motor was installed.

However, there is a 1% chance that the fire may not be extinguished immediately. In such a case, by the time the fire is brought under control and the plant is started up, there would be a 15-hour delay, but the cost of damage would be much higher, $25 000.

In the event of a prolonged fire, depending on the orientation of the flame, there is a 10% chance that a solvent line connection could rupture. This would result in a major fire causing a loss estimated at $250 000 and a delay of three months to allow for investigation, redesign to reduce risk, lead time for new equipment delivery, and so on.

There is also a 1% chance that solvent vapours could accumulate in congested areas and result in an explosion. This is a major loss event, with losses up to $2.5 million and delays of up to one year. There could also be fatalities on site.

a) Construct an event tree to describe the above sequence of events.
b) Quantify the event tree and calculate the probabilities of the various outcomes.

REFERENCES AND FURTHER READING
Publications
Bedford, T. & Cooke, R. (2001) Probabilistic Risk Analysis: Foundations and Methods,
Cambridge University Press, Cambridge, UK.
Blything, K.W. & Reeves, A.B. (1988) An Initial Prediction of BLEVE Frequency of a 100
Tonne Butane Storage Vessel, UKAEA/SRD.
Center for Chemical Process Safety (CCPS) (1989a) Guidelines for Process Equipment
Reliability Data: With Data Tables, CCPS, American Institute of Chemical Engineers,
New York.
Center for Chemical Process Safety (CCPS) (1989b) Guidelines for Chemical Process
Quantitative Risk Analysis, CCPS, American Institute of Chemical Engineers, New
York.
Center for Chemical Process Safety (CCPS) (2000) Guidelines for Chemical Process
Quantitative Risk Analysis, CCPS, American Institute of Chemical Engineers, New
York.
Center for Chemical Process Safety (CCPS) (2003) Guidelines for Analyzing and
Managing the Security Vulnerabilities of Fixed Chemical Sites, CCPS, American
Institute of Chemical Engineers, New York.
Cox, A.W., Lees, F.P. & Ang, M.L. (1990) Classification of Hazardous Locations,
IChemE, Rugby, UK.
Dougherty, E.M. & Fragola, J.R. (1988) Human Reliability Analysis: A System Engineering
Approach with Nuclear Power Plant Applications, John Wiley & Sons, New York.
Energy Institute (UK) (2005) Top Ten Human Factors Issues Facing Major Hazards
Sites—Definition, Consequences, and Resources, available at:
http://www.energyinst.org.uk/content/files/hftopten.doc, accessed 11 December 2006.
Fenton, N.E. & Pfleeger, S.L. (1997) Software Metrics: A Rigorous and Practical Approach, 2nd edn, PWS Publishing, Boston, Massachusetts.
Flanagan, R. & Norman, G. (1993) Risk Management and Construction, Blackwell
Scientific, Oxford, England.
Fullwood, R. (2000) Probabilistic Safety Assessment in the Chemical and Nuclear
Industries, Butterworth-Heinemann, Boston, Massachusetts.
Gertman, D.I., Blackman, H.S., Haney, L.N., Deidler, K.S. & Hahn, H.A. (1992)
'INTENT—A method for estimating human error probabilities for decision-based
errors', Reliability Engineering and System Safety, 35: 127–136.
Gertman, D.I. & Blackman, H.S. (1994) Human Reliability and Safety Analysis Data
Handbook, John Wiley & Sons, New York.
Grey, S. (1995) Practical Risk Assessment for Project Management, John Wiley & Sons,
Chichester.
Health and Safety Commission (HSC) (1991) Study Group on Human Factors. Second
Report: Human Reliability Assessment—A Critical Overview, HMSO, London.
Health and Safety Executive (HSE) (1990) Risk Criteria for Land-Use Planning in the
Vicinity of Major Industrial Hazards, HMSO, London.
International Atomic Energy Agency (IAEA) (1990) Human Error Classification and Data Collection, IAEA-TECDOC-538, IAEA, Vienna.
Kales, P. (1998) Reliability: For Technology, Engineering, and Management, Prentice
Hall, Upper Saddle River, New Jersey.
Kapur, P.K. & Verma, A.K. (2005) Quality, Reliability and Information Technology:
Trends and Future Directions, Narosa, New Delhi.
Kumamoto, H. & Henley, E.J. (1996) Probabilistic Risk Assessment and Management for
Engineers and Scientists, IEEE, New York.
Lees, F.P. (ed.) (1996) Loss Prevention in the Process Industries, 2nd edn, Butterworth-
Heinemann, Oxford.
Mancini, G. (1978) Data and Validation, C.E.C. Joint Research Centre, Ispra, Italy, RSA
12/78, June 6.
Modarres, M. (2005) Risk Analysis in Engineering: Techniques, Tools, and Trends, Taylor
& Francis, Boca Raton, Florida.
Moieni, P., Spurgin, A.J. & Singh, A., (1994) 'Advances in human reliability analysis
methodology. Part I: Frameworks, models and data', Reliability Engineering and
System Safety, 44: 27–55.
Murphy, D.M. & Paté-Cornell, M.E. (1996) 'The SAM framework—modeling the effects of
management factors on human behaviour in risk analysis', Risk Analysis, 16(4): 501–
515.
Nelson, W. (2004) Accelerated Testing: Statistical Models, Test Plans and Data Analyses,
Wiley-Interscience, Hoboken, New Jersey.
O'Connor, P.D.T. (1991) Practical Reliability Engineering, 3rd edn, John Wiley & Sons,
New York.
Ohring, M. (1998) Reliability and Failure of Electronic Materials and Devices, Academic
Press, San Diego.
OREDA (1992) OREDA Offshore Reliability Data Handbook, 2nd edn, Veritech, Norway,
distributed by Det Norske Veritas, Norway.
OREDA (2003) OREDA Offshore Reliability Data Handbook, 4th edn, prepared by
SINTEF Technology and Society and distributed by Det Norske Veritas, Norway,
http://www.sintef.no/static/TL/projects/oreda/handbook.html#Order, accessed 26
October 2006.
Pape, R.P. & Nussey, C. (1985) 'A basic approach for the analysis of risks from major toxic
hazards', IChemE Symposium Series No. 94, Institution of Chemical Engineers, Rugby,
UK.
Porter, A. (2004) Accelerated Testing and Validation: Testing, Engineering, and
Management Tools for Lean Development, Newnes, Boston, Massachusetts.
Reason, J. (1990) Human Error, Cambridge University Press, Cambridge, England.
Reason, J. (1997) Managing the Risks of Organizational Accidents, Ashgate, Aldershot.
Robinson, Richard M. et al. (2006) Risk & Reliability—An Introductory Text, 6th edn, Risk
& Reliability Associates Pty Ltd, Melbourne.
Rome Laboratory & Reliability Analysis Centre (1995, 2002, 2004) Reliability Toolkit,
http://quanterion.com/Publications/Toolkit/index.asp, accessed 26 October 2006.
Smith, A.M. (1993) Reliability Centered Maintenance, McGraw-Hill, New York.
Standards Australia/Standards New Zealand (1998) Risk Analysis of Technological
Systems—Applications Guide, Australian/New Zealand Standard AS/NZS 3931:1998.
Sträter, O. & Bubb, H. (1999) 'Assessment of human reliability based on evaluation of plant
experience: requirements and implementation', Reliability Engineering and System
Safety, 63(2): 199–219.
Swain, A.D. & Guttmann, H.E. (1983) A Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, US Nuclear Regulatory Commission, Washington, D.C. (prepared by Sandia National Laboratories).
Tweeddale, H.M. (1992) 'Balancing quantitative and non-quantitative risk assessment',
Proc. Safety and Environmental Protection, IChemE, May.
United States Department of Defense (1981) Reliability Modelling and Prediction, Military
Standard, MIL-STD-756B.
United States Department of Energy Quality Managers (2000) Software Risk Management:
A Practical Guide, US Department of Energy, available at:
http://cio.energy.gov/documents/sqas21_01.doc, accessed 13 December 2006.
Vose, D. (2000) Risk Analysis: A Quantitative Guide, 2nd edn, John Wiley & Sons,
Chichester.
Wasserman, G.S. (2002) Reliability Verification, Testing and Analysis in Engineering
Design, Marcel Dekker, New York.
Wells, G.L. (1984) Safety in Process Plant Design, John Wiley & Sons, New York.
Wells, G.L. (1991) Safety in Process Design, John Wiley & Sons, New York.
Whittingham, R.B. (2004) The Blame Machine: Why Human Error Causes Accidents,
Elsevier, Boston, Massachusetts.
Williams J.C. (1986) 'HEART—A Proposed Method for Assessing and Reducing Human
Error', in 9th Advances in Reliability Technology Symposium, University of Bradford,
England.
Yu, R-J, Hwang, S-L & Huang, Y.H. (1999) 'Task analysis for industrial work process from
aspects of human reliability and system safety', Risk Analysis, 19(3): 401–415.
Websites
Decisioneering (Crystal Ball) http://www.decisioneering.com
Det Norske Veritas http://www.dnv.com/technologyservices/handbooks
Exprosoft http://www.exprosoft.com
Government-Industry Data Exchange Program http://www.gidep.org
Norwegian University of Science and Technology ROSS website http://www.ntnu.no/ross/index.php
Palisade (@Risk) http://www.palisade.com.au/risk
RM Consultants http://www.rmclogan.co.uk/index2.htm
Vose Consulting http://www.risk-modelling.com
APPENDIX 5.1: GENERIC STATISTICAL DATA SOURCES FOR RISK AND RELIABILITY STUDIES
Advanced Mechanics and Engineering Ltd (AME) (1990) Research Data.
Alion (Annual) System and Part Integrated Data Resource (SPIDR), Alion System
Reliability Center, http://src.alionscience.com/spidr, accessed 26 October 2006.
Ayyub, B.M. (2003) Risk Analysis in Engineering and Economics, Chapman & Hall/CRC,
Boca Raton, Florida.
Blything, K.W. (1984) In Service Reliability Data for Underground Cross-Country Oil
Pipelines, UKAEA/SRD.
Blything, K.W. & Reeves, A.B. (1988) LPG Vessel and Equipment Failure Rates, based on
SRD Database.
British Telecom (1994) Handbook of Reliability Data for Components used in
Telecommunications Systems, HRD5.
Cannon, A.G. & Bendell, A. (eds) (1991) Reliability Data Banks, Elsevier Applied Science,
London.
Carderock Division of the Naval Surface Warfare Center (CDNSWC) (2006) Handbook of
Reliability Prediction Procedures for Mechanical Equipment, http://www.mechrel.com/products.php, accessed 26 October 2006.
Center for Chemical Process Safety (CCPS) (1989a) Guidelines for Process Equipment
Reliability Data: With Data Tables, CCPS, American Institute of Chemical Engineers,
New York.
Center for Chemical Process Safety (CCPS) (Annual) Process Equipment Reliability
Database (PERD), http://www.aiche.org/CCPS/ActiveProjects/PERD/index.aspx,
accessed 26 October 2006.
Cox, A.W., Lees, F.P. & Ang, M.L. (1990) Classification of Hazardous Locations,
IChemE, Rugby, UK.
Exida (2006) Safety Equipment Reliability Handbook, 2nd edn, Exida,
http://www.exida.com, accessed 26 October 2006.
Exprosoft (1999) Reliability of Well Completion Equipment—Phase III, Exprosoft,
http://www.exprosoft.com, accessed 26 October 2006.
Exprosoft (2002) Reliability of Well Completion Equipment—Phase IV, Exprosoft,
http://www.exprosoft.com, accessed 26 October 2006.
Exprosoft (2003) SubseaMaster: Experience Database for Subsea Production Systems—
Phase II, Exprosoft, http://www.exprosoft.com, accessed 26 October 2006.
Flamm, J. & Luisi, T. (eds) (1992) Reliability Data Collection and Analysis, Kluwer,
Dordrecht.
IEEE (1984) 'IEEE guide to the collection and presentation of electrical, electronic, sensing
component, and mechanical equipment reliability data for nuclear-power generating
stations', IEEE Std 500—1984, Institute of Electrical and Electronic Engineers Inc.
Institution of Electrical Engineers (IEE) (1981) Electronic Reliability Data: A Guide to
Selected Components, Institution of Electrical Engineers, UK.
International Atomic Energy Agency (IAEA) (1988) Component Reliability Data for Use in
Probabilistic Safety Assessment, TECDOC-478, IAEA, Vienna.
Kumamoto, H. & Henley, E.J. (1996) Probabilistic Risk Assessment and Management for
Engineers and Scientists, IEEE, New York. Includes data from Green & Bourne
(1972), Mancini (1978), Lees (1996) and WASH-1400.
Lees, F.P. (ed.) (1996) Loss Prevention in the Process Industries, Butterworth-Heinemann,
Oxford, Appendix 14, Vol. 3.
Maintenance 2000 (Annual) Failure Rate Data in Perspective (FARADIP), Maintenance
2000, http://www.maint2k.com/failure-rate-data-in-perspective.htm, accessed 26
October 2006.
Melvin J.G. & Maxwell R.B. (eds) (1974) Reliability and Maintainability Manual—
Process Systems, AECL–4607, Chalk River Nuclear Laboratories, Ontario, Canada.
Moss, T.R. (2005) The Reliability Data Handbook, Professional Engineering, London.
OREDA (1984) OREDA Offshore Reliability Data Handbook, 1st edn, Veritech, Norway,
distributed by Det Norske Veritas, Norway.
OREDA (1992) OREDA Offshore Reliability Data Handbook, 2nd edn, Veritech, Norway,
distributed by Det Norske Veritas, Norway.
OREDA (1997) OREDA Offshore Reliability Data Handbook, 3rd edn, distributed by Det
Norske Veritas, Norway.
OREDA (2003) OREDA Offshore Reliability Data Handbook, 4th edn, prepared by
SINTEF Technology and Society and distributed by Det Norske Veritas, Norway,
http://www.sintef.no/static/TL/projects/oreda/handbook.html#Order, accessed 26
October 2006.
Scarrone, M. & Piccinini, N. (1989) 'A reliability data bank for the natural gas distribution
industry', in Colombari, V. (ed.) Reliability Data Collection and Use in Risk and
Availability Assessment, Proceedings of the 6th Eurodata Conference, Sienna, Italy,
March: 90–103.
SINTEF (2006) Reliability Data for Safety Instrumented Systems—PDS Data Handbook,
distributed by Sydvest, http://www.sydvest.com/Products/pds%2Ddata/#Data_HB,
accessed 26 October 2006.
Smith, D.J. (2005) Reliability, Maintainability and Risk: Practical Methods for Engineers,
7th edn, Elsevier Butterworth-Heinemann, Amsterdam.
Telcordia (2006) Reliability Prediction Procedure for Electronic Equipment, SR-332,
Telcordia, http://telecom-info.telcordia.com/site-cgi/ido/index.html, accessed 26
October 2006.
TNO (1990) COMPI Component Failure Database, TNO Institute of Environmental and
Energy Technology, Apeldoorn, The Netherlands, June.
US Department of Defense (1986) Military Handbook—Reliability Prediction of Electronic
Equipment, MIL-HDBK-217E.
US Department of Defense (1991) Failure Mode/Mechanism Distributions, FMD-91,
Reliability Analysis Center, Griffiss AFB, New York.
US Department of Defense (1995) Non-Electronic Parts Reliability Data, NPRD-95,
Reliability Analysis Center, Griffiss AFB, New York.
US Nuclear Regulatory Commission (1975) Reactor Safety Study—An Assessment of
Accident Risks in US Commercial Nuclear Power Plants: Summary Report, United
States Nuclear Regulatory Commission, Washington, DC.
World Offshore Accident Data (WOAD) (1998) WOAD Statistical Report, DNV,
http://webshop.dnv.com/trainingus/offer.asp?order=1&id=650616&c0=2274&c1=2277
&c2=2293, accessed 26 October 2006.
READING 5.1
FAULT TREES
FRANK P. LEES
A fault tree is used to develop the causes of an event. It starts with the event of interest, the
top event, such as a hazardous event or equipment failure, and is developed from the top
down.
Accounts of fault trees are given in Reliability and Fault Tree Analysis (Barlow, Fussell and
Singpurwalla, 1975), Fault Tree Handbook (Vesely et al., 1981), Engineering Reliability
(Dhillon and Singh, 1981), Reliability Engineering and Risk Assessment (Henley and
Kumamoto, 1981), Designing for Reliability and Safety Control (Henley and Kumamoto,
1985) and Probabilistic Risk Assessment, Reliability Engineering, Design and Analysis
(Henley and Kumamoto, 1992), and by Vesely (1969, 1970a,b), Vesely and Narum (1970),
Fussell and Powers (1977a, 1979), Vesely and Goldberg (1977b) and Kletz and Lawley
(1982).
The fault tree is both a qualitative and a quantitative technique. Qualitatively it is used to
identify the individual paths which lead to the top event, while quantitatively it is used to
estimate the frequency or probability of that event.
The identification of hazards is usually carried out using a method such as a hazard and
operability (hazop) study. This may then throw up cases, generally small in number, where
a more detailed study is required, and fault tree analysis is one of the methods which may
then be used.
Fault tree analysis is also used for large systems where high reliability is required and where
the design is to incorporate many layers of protection, such as in nuclear reactor systems.
With regard to the estimation of the frequency of events, the first choice is generally to base
an estimate on historical data, and to turn to fault tree analysis only where data are lacking
and an estimate has to be obtained synthetically.
Fault tree analysis
The original concept of fault tree analysis was developed at the Bell Telephone
Laboratories in work on the safety evaluation of the Minuteman Launch Control System in
the early 1960s, and wider interest in the technique is usually dated from a symposium in
1965 in which workers from that company (e.g. Mearns) and from the Boeing Company
(e.g. Haasl, Feutz, Waldeck) described their work on fault trees (Boeing Company, 1965).
Developments in the methodology have been in the synthesis of the tree, the analysis of the
tree to produce minimum cut sets for the top event, and in the evaluation of the frequency or
probability of the top event. There have also been developments related to trees with
special features, including repair, secondary failures, time features, etc.
A general account of fault tree methods has been given by Fussell (1976). He sees fault tree
analysis as being of major value in
1. directing the analyst to ferret out failures deductively,
2. pointing out the aspects of the system important in respect of the failure of interest,
3. providing a graphical aid giving visibility to those in system management who are
removed from system design changes,
4. providing options for qualitative or quantitative system reliability analysis,
5. allowing the analyst to concentrate on one particular system failure at a time,
6. providing the analyst with genuine insight into system behaviour.
He also draws attention to some of the difficulties in fault tree work. Fault tree analysis is a
sophisticated form of reliability assessment and it requires considerable time and effort by
skilled analysts. Although it is the best tool available for a comprehensive analysis, it is not
foolproof and, in particular, it does not of itself assure detection of all failures, especially
common cause failures.
Basic fault tree concepts
A logic tree for system behaviour may be oriented to success or failure. A fault tree is of
the latter type, being a tree in which an undesired or fault event is considered and its causes
are developed. A distinction is made between a failure of and a fault in a component. A
fault is an incorrect state which may be due to a failure of that component or may be
induced by some outside influence. Thus fault is a wider concept than failure. All failures
are faults, but not all faults are failures.
A component of a fault tree has one of two binary states: essentially it is either in the correct
state or in a fault state. In other words, the continuous spectrum of states from total
integrity to total failure is reduced to just two states. The component state which constitutes
a fault is essentially that state which induces the fault that is being developed.
As a logic tree, a fault tree is a representation of the sets of states of the system which are
consistent with the top event at a particular point in time. In practice, a fault tree is
generally used to represent a system state which has developed over a finite period of time,
however short. This point is relevant to the application of Boolean algebra. Strictly, the
implication of the use of Boolean algebra is that the states of the system are
contemporaneous.
Faults may be classed as primary faults, secondary faults or command faults. A primary
fault is one which occurs when the component is experiencing conditions for which it is
designed, or qualified. A secondary fault is one which occurs when the component is
experiencing conditions for which it is unqualified. A command fault involves the proper
operation of the component at the wrong time or in the wrong place.
A distinction is made between failure mechanism, failure mode and failure effect. The
failure mechanism is the cause of the failure in a particular mode and the failure effect is the
effect of such failure. For example, failure of a light switch may occur as follows:
Failure mode—high contact resistance
Failure mechanism—corrosion
Failure effect—switch fails to make contact
Some components are passive and others active. Items such as vessels and pipes are
passive, whilst those such as valves and pumps are active. A passive component is a
transmitter or recipient in the fault propagation process, an active one can be an initiator. In
broad terms, the failure rate of a passive component is commonly two or three orders of
magnitude less than that of an active component.
There is a distinction to be made between the occurrence of a fault and the existence of a
fault. Interest may centre on the frequency with which, or the probability that, a fault occurs, i.e.
on the unreliability, or on the probability that at any given moment the system is in a fault
state, i.e. on the unavailability.
The simplest case is the determination of the reliability of a non-repairable system. This is
sometimes known as the 'mission problem': the system is sent on a mission in which
components that fail are not repaired. The obvious example is space missions, but there are
cases in the process industries which may approximate to this, such as remote pumping
stations or offshore subsea modules. The availability of a non-repairable system may also
be determined, but the long-term availability, which is usually the quantity of interest, tends
to zero.
Generally, however, process systems are repairable systems, and for these both reliability
and availability may be of interest. If concern centres on the frequency of realization of a
hazard, it is the reliability which is relevant. If, on the other hand, the concern is with the
fractional downtime of some equipment, it is the availability which is required.
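The distinction between reliability and availability can be made concrete with the standard constant-rate results (these formulas are textbook results assumed here, not given in the reading): for a non-repairable component, the reliability is R(t) = exp(−λt), and for a repairable one the steady-state availability is MTTF/(MTTF + MTTR). A minimal sketch with illustrative numbers:

```python
import math

def reliability_nonrepairable(lam, t):
    # Standard constant-failure-rate result: R(t) = exp(-lam * t).
    # This is the "mission problem" quantity: probability of surviving to t.
    return math.exp(-lam * t)

def steady_state_availability(mttf, mttr):
    # Standard repairable-system result: A = MTTF / (MTTF + MTTR),
    # the long-run fraction of time the equipment is not in a fault state.
    return mttf / (mttf + mttr)

# Illustrative values: failure rate 0.1/yr over a 10-year mission,
# and a repairable item with MTTF 990 h and MTTR 10 h.
print(round(reliability_nonrepairable(0.1, 10.0), 4))  # 0.3679
print(steady_state_availability(990.0, 10.0))          # 0.99
```

As the reading notes, for a non-repairable system the long-term availability tends to zero, whereas for a repairable system it settles at the steady-state value above.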
A fault tree may be analysed to obtain the minimum cut sets, or minimum sets of events
which can cause the top event to occur. Discussion of minimum cut sets occurs later but it
is necessary to mention them at this point since some reference to them in relation to fault
tree construction is unavoidable.
Fault tree elements and symbols
The basic elements of a fault tree may be classed as (1) the top event, (2) primary events,
(3) intermediate events and (4) logic gates.
The symbols most widely used in process industry fault trees are shown in Table 9.5. The
British Standard symbols are given in BS 5760 Reliability of Systems, Equipment and
Components, Part 7: 1991 Guide to Fault Tree Analysis. For the most part the symbols
shown in Table 9.5 correspond to those in the standard, but in several cases the symbols in
the table are the Standard's alternative rather than preferred symbols.
Table 9.5: Fault tree event and logic symbols
A Events (graphic symbols not reproduced)
Primary, or base, event—basic fault event requiring no
further development
Undeveloped, or diamond event—fault event which has
not been further developed
Intermediate event—fault event which occurs due to
antecedent causes acting through a logic gate
Conditioning event—specific condition which applies
to a logic gate (used mainly with PRIORITY AND and
INHIBIT gates)
External, or house, event—event which is normally expected to occur^a
B Logic gates, etc. (graphic symbols not reproduced)
AND gate—output exists only if all inputs exist
OR gate—output exists if one or more inputs exists
INHIBIT gate—output exists if input occurs in
presence of the specific enabling condition (specified
by conditioning event to right of gate)
PRIORITY AND gate—output exists if all inputs
occur in a specific sequence (specified by
conditioning event to right of gate)
Table 9.5: Continued
EXCLUSIVE OR gate—output exists if one, and only
one, input exists
VOTING gate—output exists if there exist r-out-of-n
inputs
TRANSFER IN—symbol indicating that the tree is
developed further at the corresponding TRANSFER
OUT symbol
TRANSFER OUT—symbol indicating that the portion
of the tree below the symbol is to be attached to the
main tree at the corresponding TRANSFER IN symbol
^a This is the definition given by Vesely et al. (1981). Other authors, such as Henley and
Kumamoto (1981), use this symbol for an event which is expected either to occur or not to
occur.
The top event is normally some undesired event. Typical top events are flammable or toxic
releases, fires, explosions and failures of various kinds.
Primary events are events which are not developed further. One type of primary event is a
basic event, which is an event that requires no further development. Another is an
undeveloped event, which is an event that could be developed, but has not been. One
common reason for not developing an event is that its causes lie outside the system
boundary. The symbol for such an undeveloped event is a diamond and this type is
therefore often called a 'diamond event'. A third type of primary event is a conditioning
event, which specifies conditions applicable to a logic gate. A fourth type of event is an
external event, which is an event that is normally expected to occur.
Intermediate events are the events in the tree between the top event and the primary events
at the bottom of the tree.
Logic gates define the logic relating the inputs to the outputs. The two principal gates are
the AND gate and the OR gate. The output of an AND gate exists only if all the inputs
exist. The output of an OR gate exists provided at least one of the inputs exists. The
probability relations associated with these two gates are shown in Table 9.6, Section A.
Other gates are the EXCLUSIVE OR gate, the PRIORITY AND gate and the INHIBIT
gate. The output of an EXCLUSIVE OR gate exists if one, and only one, input exists. The
output of a PRIORITY AND gate exists if the inputs occur in the sequence specified by the
associated conditioning event. The output of an INHIBIT gate exists if the (single) input
exists in the presence of the associated conditioning event. There are also symbols for
TRANSFER IN and TRANSFER OUT, which allow a large fault tree to be drawn as a set
of smaller trees.
Table 9.6: Probability and frequency relations for fault tree logic gates (output A; inputs B and C)

A Basic probability relations^a (logic symbols and reliability graphs not reproduced)

Gate   Boolean algebra relation   Probability relation
AND    A = BC                     P(A) = P(B)P(C)
OR     A = B + C                  P(A) = P(B) + P(C) – P(B)P(C)

B Relations involving frequencies and/or probabilities^a

Gate   Inputs        Output
OR     PB OR PC      PA = PB + PC – PBPC ≈ PB + PC
OR     FB OR FC      FA = FB + FC
OR     FB OR PC      Not permitted
AND    PB AND PC     PA = PBPC
AND    FB AND FC     Not permitted; reformulate
AND    FB AND PC     FA = FBPC

^a F, frequency; P, probability
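The combination rules of Table 9.6, Section B, are easily mechanised. The sketch below encodes them for two independent inputs; the function names are illustrative, not from the text:

```python
# Sketch of the Table 9.6 gate-combination rules for two independent
# inputs. P = probability, F = frequency; names are illustrative.

def or_gate_prob(p_b, p_c):
    """OR gate on two probabilities: PA = PB + PC - PB*PC."""
    return p_b + p_c - p_b * p_c

def or_gate_freq(f_b, f_c):
    """OR gate on two frequencies: FA = FB + FC."""
    return f_b + f_c

def and_gate_prob(p_b, p_c):
    """AND gate on two probabilities: PA = PB*PC."""
    return p_b * p_c

def and_gate_freq_prob(f_b, p_c):
    """AND gate on a frequency and a probability: FA = FB*PC."""
    return f_b * p_c

# A frequency OR'd with a probability, or two frequencies AND'ed,
# are "not permitted" in Table 9.6 and must be reformulated.
print(round(or_gate_prob(0.1, 0.2), 6))         # 0.28
print(round(and_gate_freq_prob(0.5, 0.01), 6))  # 0.005
```

Note that for small probabilities the OR result is close to the rare-event approximation PB + PC, as the table indicates.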
AND gates
One of the two principal logic gates in a fault tree is the AND gate. AND gates are used to
represent a number of different situations and therefore require further explanation. The
following typical situations can be distinguished:
1. output exists given an input and fault on a protective action;
2. output exists given an input and fault on a protective device;
3. output exists given faults on two devices operating in parallel;
4. output exists given faults on two devices, one operating and one on stand-by.
In constructing the fault tree the differences between these systems present no problem, but
difficulties arise at the evaluation stage.
As already described, the probability p0 that the output of a two-input AND gate exists,
given that the probabilities of the inputs are p1 and p2, is

p0 = p1p2
The occurrence of events may be expressed quantitatively in terms of frequency or of
probability. Failure of equipment is normally expressed as a frequency and failure of a
protective action or device as a probability.
A protective device is normally subject to unrevealed failure and needs therefore to be
given a periodic proof test. Data for the failure of such a device may be available either as
probability of failure on demand, or as frequency of failure. It can be shown that, subject to
certain assumptions, the relationship between the two is
p = λτp/2 (Equation 9.5.2)
where p is the probability of failure, λ is the failure rate, and τp is the proof test interval.
Then for a Type 1 situation the frequency λ0 of a fault is
λ0 = λp (Equation 9.5.3)
where p is the probability of failure of the protective action, λ is the frequency of the input
event, and λ0 is the frequency of the output event.
For a Type 2 situation, Equation 9.5.3 is again applicable, with the probability p of failure
of protective action in this case being obtained from Equation 9.5.2.
The evaluation of a Type 3 situation is less straightforward. For this, use may be made of
the appropriate parallel system model derived from either the Markov or joint density
function methods, described earlier. These give the probability of the output event given
the frequency of the input events. Where applicable, the rare event approximation may be
used to convert from probability to frequency:
λ = p / t
Similarly, for a Type 4 situation use may be made of the appropriate stand-by system model.
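For a Type 2 situation, the two equations above chain directly: the proof-tested protective device contributes a fractional dead time p = λτp/2, and the hazard frequency is the demand frequency multiplied by that probability. A minimal sketch, with illustrative names and example rates:

```python
# Minimal sketch of a Type 2 evaluation using Equations 9.5.2 and 9.5.3.
# All names and numbers are illustrative: a demand arriving at frequency
# lam_demand meets a protective device of failure rate lam_prot that is
# proof-tested every tau_p.

def probability_of_failure_on_demand(lam_prot, tau_p):
    # Equation 9.5.2: p = lambda * tau_p / 2 (valid when lam_prot * tau_p << 1)
    return lam_prot * tau_p / 2.0

def hazard_frequency(lam_demand, pfd):
    # Equation 9.5.3: lambda_0 = lambda * p
    return lam_demand * pfd

pfd = probability_of_failure_on_demand(lam_prot=0.2, tau_p=0.5)  # /yr and yr
print(pfd)                         # 0.05
print(hazard_frequency(2.0, pfd))  # 0.1 hazardous events per year
```

Halving the proof-test interval halves the fractional dead time, and hence the hazard frequency, which is why τp is a key design parameter for protective systems.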
Fault tree construction
The construction of a fault tree appears a relatively simple exercise, but it is not always as
straightforward as it seems and there are a number of pitfalls. Guidance on good practice in
fault tree construction is given in the Fault Tree Handbook. Other accounts include that in
the CCPS QRA Guidelines, and those by Lawley (1974b, 1980), Fussell (1976) and Doelp
et al. (1984).
An essential preliminary to construction of the fault tree is definition and understanding of
the system. Both the system itself and its bounds need to be clearly defined. Information
on the system is generally available in the form of functional diagrams such as piping and
instrument diagrams and more detailed instrumentation and electrical diagrams. There will
also be other information required on the equipment and its operation, and on the
environment. The quality of the final tree depends crucially on a good understanding of the
system, and time spent on this stage is well repaid.
It is emphasized by Fussell (1976) that the system boundary conditions should not be
confused with the physical bounds of the system. The system boundary conditions define
the situation for which the fault tree is to be constructed. An important system boundary
condition is the top event. The initial system configuration constitutes additional boundary
conditions. This configuration should represent the system in the unfailed state. Where a
component has more than one operational state, an initial condition needs to be specified for
that component. Furthermore, there may be fault events declared to exist and other fault
events not to be considered, these being termed by Fussell the 'existing system boundary
conditions' and the 'not-allowed system boundary conditions', respectively.
Fault trees for process plants fall into two main groups, distinguished by the top event
considered. The first group comprises those trees where the top event is a fault within the
plant, including faults which can result in a release or an internal explosion. In the second
group the top event is a hazardous event outside the plant, essentially fires and explosions.
If the top event of the fault tree is an equipment failure, it is necessary to decide whether it is
the reliability, availability, or both, which is of interest. Closely related to this is the extent to
which the components in the system are to be treated as non-repairable or repairable.
As already described, the principal elements in fault trees are the top event, primary events
and intermediate events, and the AND and OR gates. The Handbook gives five basic rules
for fault tree construction:
Ground Rule 1 : Write the statements that are entered in the event boxes as faults; state
precisely what the fault is and when it occurs.
Ground Rule 2: If the answer to the question, 'Can this fault consist of a component
failure?' is 'Yes', classify the event as a 'state-of-component fault'. If the answer is 'No',
classify the event as a 'state-of-system fault'.
No Miracles Rule: If the normal functioning of a component propagates a fault sequence,
then it is assumed that the component functions normally.
Complete-the-Gate Rule: All inputs to a particular gate should be completely defined before
further analysis of any one of them is undertaken.
No Gate-to-Gate Rule: Gate inputs should be properly defined fault events, and gates
should not be directly connected to other gates.
Each event in the tree, whether a top, intermediate or primary event, should be carefully
defined. Failure to observe a proper discipline in the definition of events can lead to
confusion and an incorrect tree.
The identifiers assigned to events are also important. If a single event is given two
identifiers, the fault tree itself may be correct, if slightly confusing, but in the minimum cut
sets the event will appear as two separate events, which is incorrect.
For a process system, the top event will normally be a failure mode of an equipment. The
immediate causes will be the failure mechanisms for that particular failure. These in turn
constitute the failure modes of the contributing subsystems, and so on.
The procedure followed in constructing the fault tree needs to ensure that the tree is
consistent. Two types of consistency may be distinguished: series consistency within one
branch and parallel consistency between two or more branches. Account needs also to be
taken of events which are certain to occur and those which are impossible.
The development of a fault tree is a creative process. It involves identification of failure
effects, modes and mechanisms. Although it is often regarded primarily as a means of
quantifying hazardous events, which it is, the fault tree is of equal importance as a means of
hazard identification. It follows also that fault trees created by different analysts will tend
to differ. The differences may be due to style, judgement and/or omissions and errors.
It is generally desirable that a fault tree have a well-defined structure. In many cases such a
structure arises naturally. It is common to create a 'demand tree', which shows the
propagation of the faults in the absence of protective systems, and then to add branches,
representing protection by instrumentation and by the process operator, which are
connected by AND gates at points in the demand tree. An example of a fault tree
constructed in this way has been given in Figure 2.2. Essentially the same fault tree may be
drawn in several different ways, depending particularly on the location of certain events
which appear under AND gates.
Dependence
A fundamental assumption in work on reliability generally, and on fault trees in particular,
is that the events considered are independent, unless stated otherwise. Formally, the events
are assumed to be statistically independent, or 's-independent'. In practice, there are many
types of situation where events are not completely independent. In fault tree work this
problem was originally known as 'common mode failure', then as 'common cause failure',
and now more usually as 'dependent failure'.
The problem is particularly acute in systems, such as nuclear reactor systems, where a very
high degree of reliability is sought. The method of achieving this is through the use of
protective systems incorporating a high degree of redundancy. On paper, the assessed
reliabilities of such systems are very high. But there has been a nagging worry that this
protection may be defeated by the phenomenon of dependent failure, which may take many
and subtle forms. Concern with dependent failure is therefore high in work on fault trees
for nuclear reactors.
Dependent failure takes various forms. In most cases it requires that there be a common
susceptibility in the component concerned. Some situations which can cause dependent
failure include: (1) a common utility; (2) a common defect in manufacture; (3) a common
defect in application; common exposure to (4) a degrading factor, (5) an external influence,
or (6) a hazardous event; (7) inappropriate operation; and (8) inappropriate maintenance.
Perhaps the most obvious dependency is supply from a common utility such as electric
power or instrument air. Equipment may suffer common defects either due to manufacture
or to specification and application. Common degrading factors are vibration, corrosion,
dust, humidity, and extremes of weather and temperature. External influences include such
events as vehicle impacts or earthquakes. An event such as a fire or explosion may disable
a number of equipments. Equipment may suffer abuse from operators or may be maintained
incorrectly. It will be clear that in such cases redundancy may be an inadequate defence.
Generally, a common location is a factor in dependent failure, interpreting this fairly broadly.
But it is by no means essential. In particular, incorrect actions by a maintenance fitter can
disable similar equipments even though the separation between the items is appreciable.
A type of dependent failure that is important in the present context is that resulting from a
process accident. A large proportion of equipments, including protective and fire fighting
systems, may be susceptible to a major fire or explosion, just at the time when they are
required.
There is some evidence that dependent failure is associated particularly with components
where the fault is unrevealed. Thus a study of nuclear reactor accident reports by J.R.
Taylor (1978b) showed that of the dependent failures considered only one was not
associated with a stand-by or intermittently operated system.
Not all dependent failure involves redundant equipment. Another significant type of
dependent failure is the overload which can occur when one equipment fails and throws a
higher load on another operating equipment. Failures caused by domino effects, and
escalation faults generally, may also be regarded as dependent failures.
Dependent failure, then, is a crucial problem in high reliability systems. A more detailed
account is therefore given later. Here further discussion is confined to fault tree aspects.
Dependent failure can be taken into account in a fault tree only if the potential for it is first
recognized. Given that this potential has been identified, there are two ways of representing it
in the tree. One is to continue to enter each fault separately as it occurs in the tree, but
ensuring that each such entry is assigned the same identifier, so that the minimum cut sets are
determined correctly. The other approach is to enter the effect as a single fault under an AND
gate higher up the tree. A further measure which may be taken to identify dependent failure is
to examine the minimum cut sets for common susceptibilities or common locations.
Illustrative example: instrument air receiver system. As an illustration of fault tree analysis,
consider the system shown in Figure 9.3(a). The vessel is an air receiver for an instrument
air supply system. Air is let down from the receiver to the supply through a pressure
reducing valve. The pressure in the receiver is controlled by a pneumatic control loop
which starts up an air compressor when the receiver pressure falls below a certain value.
The instrument air supply to the control loop is taken from the instrument air supply
described, and if the pressure in the supply system falls below a certain value this too causes
the control loop to start up the compressor. There is a pressure relief valve on the receiver.
There is also a pressure relief valve (not shown) on the instrument air supply system. The
design intent is that the pressure relief valve on the air receiver is sized to discharge the full
throughput of the compressor and is set to open at a pressure below the danger level and
that the pressure reducing valve is sized to pass the full throughput of the compressor if the
instrument air pressure downstream falls to a very low value. One of the main causes of
failure in the system is likely to be dirt.
The top event considered is the explosion of the air receiver due to overpressure. A fault
tree for the top event of 'Receiver explosion' is shown in Figure 9.3(b).
One fault event occurs in two places—'Pressure reducing valve partially or completely
seized shut or blocked'. This is drawn as a subtree. One primary failure event appears at
several points in the tree—'Dirt'. As shown, this is treated in the tree as separate primary
failures for the pressure reducing valve and the pressure relief valve.
Two of the events in the tree are mutually exclusive. These are 'Instrument air system
pressure abnormally high' and 'Instrument air pressure abnormally low'. These events are
denoted by B and B*, respectively.
The analysis of this fault tree to obtain the minimum cut sets and the probability of
occurrence of the top event is described below.
Minimum cut sets
A fault tree may be analysed to obtain the minimum cut sets. A cut set is a set of primary
events, that is of basic or undeveloped faults, which can give rise to the top event. A
minimum cut set is one which does not contain within itself another cut set. The complete
set of minimum cut sets is the set of principal fault modes for the top event.
The minimum cut sets may be determined by the application of Boolean algebra. The
procedure may be illustrated by reference to the fault tree shown in Figure 9.3(b). This may
be represented in Boolean form as:
T = (A + B + C + D) (B* + F) (G + H + I)
Then substituting
B* = C + D + E
and noting that:
BB* = 0
CC = C; DD = D
AC, CD, CE, CF ⊂ C
AD, DC, DE, DF ⊂ D
gives
T = (AE + AF + BF + C + D) · (G + H + I)    (Equation 9.5.6a)
  = [A · (E + F) + BF + C + D] · (G + H + I)    (Equation 9.5.6b)
and thus the minimum cut sets are:
AEG AEH AEI
AFG AFH AFI
BFG BFH BFI
CG CH CI
DG DH DI
A simplified fault tree which corresponds to Equation 9.5.6b is shown in Figure 9.3(c).
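The reduction above can be checked mechanically. The sketch below assumes only the Boolean expression given in the text, with B* already replaced by C + D + E and the exclusivity BB* = 0 enforced by discarding impossible combinations; it expands the product of sums and removes non-minimal sets, reproducing the 15 minimum cut sets listed:

```python
from itertools import product

# Expand T = (A+B+C+D)(C+D+E+F)(G+H+I), i.e. the tree of Figure 9.3(b)
# with B* replaced by C+D+E. BB* = 0 means B cannot coexist with C, D or E.
factors = [set("ABCD"), set("CDEF"), set("GHI")]
EXCLUSIVE = [("B", c) for c in "CDE"]

# One pick per factor gives a candidate cut set; duplicates collapse (CC=C).
candidates = {frozenset(t) for t in product(*factors)}

# Drop impossible sets containing mutually exclusive events.
candidates = {c for c in candidates
              if not any(x in c and y in c for x, y in EXCLUSIVE)}

# Minimization: discard any set that strictly contains another cut set.
minimal = {c for c in candidates if not any(o < c for o in candidates)}

min_cut_sets = sorted("".join(sorted(c)) for c in minimal)
print(len(min_cut_sets))   # 15
print(min_cut_sets[:3])    # ['AEG', 'AEH', 'AEI']
```

The absorption steps noted in the text (e.g. AC ⊂ C) correspond here to the strict-subset test that discards non-minimal sets.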
Figure 9.3: Instrument air receiver system: flow diagram and fault trees for the explosion of an air receiver: (a) instrument air receiver system; (b) fault tree for top event 'Receiver explodes' (see over); (c) equivalent but simplified fault tree for top event 'Receiver explodes'
[Graphics not reproduced. Figure 9.3(a) shows the air compressor, non-return valve, air receiver, pressure relief valve, pressure controller (PC), pressure reducing valve and instrument air system. Figure 9.3(c) shows the simplified tree for the top event 'Receiver explodes' in terms of events A to I.]
Figure 9.3: continued
Since fault trees for industrial systems are often large, systematic methods of determining
the minimum cut sets are necessary. One such method is that described by Fussell (1976).
As an illustration of the method, consider the motor system which is described by this
author and which is shown in Figure 9.4(a). The top event considered is the overheating of
the motor. The fault tree for this event is shown in Figure 9.4(b). The structure of the tree
is:

Gate   Gate type   No. of inputs   Input codes
A      OR          2               1, B
B      AND         2               C, 2
C      OR          2               4, 3
[Figure 9.3(b), continued: fault tree for top event T, 'Receiver explodes'. The top AND gate combines three branches: 'Air flow into receiver exceeds flow out at pressure danger level' (the pressure control loop causes the compressor to run through incorrect design A, instrument air system pressure abnormally high B, or the subtree causes C and D); 'Pressure reducing valve flow less than compressor flow' (instrument air system pressure abnormally low B*, or other causes F, with B* developed through the subtree causes C and D and the abnormal air-system demand/leakage E); and 'Pressure relief valve fails to give adequate discharge at pressure danger level' (incorrect design G, dirt H, other causes I). The subtree 'Pressure reducing valve partially or completely seized shut or blocked' comprises dirt (C) and other causes (D).]
The procedure is based on successive elimination of the gates. The analysis starts with a
matrix containing the first gate, gate A, in the top left-hand corner:
A
A is an OR gate and is replaced by its inputs listed vertically:
1
B
B is an AND gate and is replaced by its inputs listed horizontally:

1
C 2

C is an OR gate and is replaced by its inputs listed vertically:

1
4 2
3 2
It should be noted that when C is replaced by 4 and 3, the event 2, which is linked to C by
an AND gate, is listed with both events 4 and 3. The minimum cut sets are then:
(1); (4, 2); (3, 2)
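The gate-elimination procedure just described can be sketched in a few lines. In this minimal illustration (a MOCUS-style top-down substitution, not code from the source), the `gates` dictionary encodes the structure table for the motor fault tree; OR inputs split a row vertically, AND inputs extend it horizontally, and absorption removes non-minimal rows.

```python
# Top-down gate elimination applied to the motor fault tree of
# Figure 9.4(b). Gate structure follows the table in the text.

gates = {
    "A": ("OR",  ["1", "B"]),
    "B": ("AND", ["C", "2"]),
    "C": ("OR",  ["4", "3"]),
}

def mocus(top, gates):
    rows = [{top}]                       # matrix starts with the top gate
    while True:
        # Find a row still containing an unresolved gate.
        for i, row in enumerate(rows):
            gate = next((e for e in row if e in gates), None)
            if gate:
                break
        else:
            break                        # only basic events remain
        kind, inputs = gates[gate]
        rest = row - {gate}
        if kind == "AND":                # inputs listed horizontally
            rows[i] = rest | set(inputs)
        else:                            # OR: inputs listed vertically
            rows[i:i + 1] = [rest | {inp} for inp in inputs]
    # Absorption: drop rows that contain another row.
    sets = [frozenset(r) for r in rows]
    return {s for s in sets if not any(t < s for t in sets)}

print(sorted(sorted(s) for s in mocus("A", gates)))
# -> [['1'], ['2', '3'], ['2', '4']]
```

The output corresponds to the minimum cut sets (1); (4, 2); (3, 2) obtained above.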
There are now a large number of methods available for the determination of the minimum
cut sets of a fault tree. Methods include those described by Vesely (1969, 1970b),
Gangadharan, Rao and Sundararajan (1977), Zipf (1984) and Camarinopoulos and Yllera
(1985).
There are also a number of computer codes for minimum cut set determination. One of the
most commonly used is the code set PREP and KITT. Another widely used minimum cut
set code is FTAP.
Figure 9.4: Motor system: system diagram and fault tree for overheating of the motor (Fussell, 1976): (a) motor system; and (b) fault tree for top event 'Motor overheats'
Source: Sijthoff and Noordhoff International Publishing Company.
REFERENCES
Barlow, R.E., Fussell, J.B. and Singpurwalla, N.D. (eds) (1975). Reliability and Fault Tree
Analysis (Philadelphia, PA: Soc. for Ind. and Appl. Maths)
Boeing Company (1965). Systems Safety Symp. (Seattle, WA)
Camarinopoulos, L. and Yllera, J. (1985). An improved top-down algorithm combined with
modularization as a highly efficient method for fault tree analysis. Reliab. Engng, 11, 93
Dhillon, B.S. and Singh, C. (1981). Engineering Reliability. New Techniques and
Applications (New York: Wiley-Interscience)
Doelp, L.C., Lee, G.K., Linney, R.E. and Ormsby, R.M. (1984). Quantitative fault tree
analysis: gate-by-gate method. Plant/Operations Prog., 3, 227
[Figure 9.4: (a) motor system comprising power supply, switch, fuse, wire and motor; (b) fault tree for top event 'Motor overheats': OR gate A with inputs 'Primary motor failure (overheated)' (1) and 'Excessive current to motor' (gate B); AND gate B with inputs 'Excessive current in circuit' (gate C) and 'Fuse fails to open' via 'Primary fuse failure (closed)' (2); OR gate C with inputs 'Primary wiring failure (shorted)' (3) and 'Primary power supply failure (surge)' (4).]
Fussell, J.B. (1973a). A formal methodology for fault tree construction. Nucl. Sci. Engng,
52, 421
Fussell, J.B. (1975). How to hand-calculate system reliability and safety characteristics.
IEEE Trans. Reliab., R-24, 169
Fussell, J.B. (1976). Fault tree analysis: concepts and techniques. In Henley, E.J. and
Lynn, J.W. (1976), op. cit., p. 133
Fussell, J.B. (1978b). Phased Mission Systems. NATO Advanced Study Inst. on Synthesis
and Analysis Methods for Safety and Reliability Studies, Urbino, Italy
Gangadharan, A.C., Rao, M.S.M. and Sundararajan, C. (1977). Computer methods for
qualitative fault tree analysis. Failure Prev. Reliab., 251
Henley, E.J. and Kumamoto, H. (1981). Reliability Engineering and Risk Assessment
(Englewood Cliffs, NJ: Prentice-Hall)
Henley, E.J. and Kumamoto, H. (1985). Designing for Reliability and Safety Control
(Englewood Cliffs, NJ: Prentice-Hall)
Henley, E.J. and Kumamoto, H. (1992). Probabilistic Risk Assessment, Reliability
Engineering, Design and Analysis (Englewood Cliffs, NJ: Prentice-Hall) (rev. ed. of
Henley, E.J. and Kumamoto, H. (1981), op. cit.)
Kletz, T.A. and Lawley, H.G. (1982). Safety technology in industry. Chemical. In Green,
A.E. (1982b), op. cit., p. 317
Lapp, S.A. and Powers, G.J. (1977a). Computer-aided synthesis of fault trees. IEEE Trans
Reliab., R-26, 2
Lapp, S.A. and Powers, G.J. (1979). Update of the Lapp Powers fault-tree synthesis
algorithm. IEEE Trans Reliab., R-29, 12
Lawley, H.G. (1974b). Operability studies and hazard analysis. Loss Prevention, 8, 105
Lawley, H.G. (1980). Safety technology in the chemical industry: a problem in hazard
analysis with solution. Reliab. Engng., 1(2), 89
Vesely, W.E. (1969). Analysis of Fault Trees by Kinetic Tree Theory. Rep. IN-1330. Idaho
Nucl. Corp., Idaho Falls, ID
Vesely, W.E. (1970a). Reliability and Fault Tree Applications at NRTS (report). Idaho Nucl.
Corp., Idaho Falls, ID
Vesely, W.E. (1970b). A time-dependent methodology for fault tree evaluation. Nucl.
Engng Res., 13(2), 337
Vesely, W.E. and Goldberg, F.F. (1977b). Time-dependent unavailability analysis for
nuclear safety systems. IEEE Trans Reliab., R-26, 257
Vesely, W.E. and Narum, R.E. (1970). PREP and KITT: Computer Codes for the Automatic
Evaluation of a Fault Tree. Rep. IN-1349. Idaho Nucl. Corp., Idaho Falls, ID
Vesely, W.E. et al. (1981). Fault Tree Handbook. Rep. NUREG-0492. Nucl. Regul.
Comm., Washington, DC
Zipf, G. (1984). Computation of minimal cut sets of fault trees: experiences with three
different methods. Reliab. Engng, 7(2), 159
Source: Loss Prevention in the Process Industries: Hazard Identification, Assessment,
and Control, 2nd edn, Butterworth-Heinemann, Oxford, 1996: 9/13–9/22.
TOPIC 5 SUGGESTED ANSWERS
EXERCISES
5.1 Failure rates
Hours of operation per year = 6 x 5 x 45
= 1350 hours/year
Failure frequency = 3.6 x 10–6 x 8
= 2.88 x 10–5 / hour of operation
Petroleum spirit release frequency = 2.88 x 10–5 x 1350
= 0.039 per year (or 1 in 26 years on average)
5.2 Fractional dead time
From Eqn (5.10), the hazard rate is the product of the demand rate and the fractional dead
time of the emergency isolation valve.
We have D = 0.039 per year (from Exercise 5.1).
a) The FDT is obtained from Eqn (5.11).
λ = 0.02 per year (manufacturer data)
T = 0.5 year (half-yearly test interval)
λT = 0.02 x 0.5
= 0.01
This value is much smaller than 1, hence Eqn (5.12) can be used for simplicity.
FDT = 0.5 λT
= 0.005 (this is a probability and is dimensionless)
Hazard rate = 0.039 per year x 0.005
= 1.95 x 10–4 per year, or 1 in 5140 years on average.
This return period is far greater than the lifetime of the facility, hence the risk of a leak
may be considered acceptable. However, ignition prevention measures must be in place
both in design and in practice.
b) If testing of the protection system is no longer carried out, we use Eqn (5.14) for the
hazard rate.
HR = Dλ /(D +λ)
= 0.039 x 0.02 / (0.039 + 0.02)
= 0.013 per year or 1 chance in 75 years on average.
The hazard rate is about 70 times higher than when critical function testing of the
emergency isolation valve is carried out.
This exercise is very realistic, and many small companies have come to grief by not
understanding the importance of critical function testing of protection systems in
engineering risk management.
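The chain of calculations in Exercises 5.1 and 5.2 can be checked numerically. The sketch below simply re-runs the arithmetic above (all values and equation numbers are from the text; the variable names are mine).

```python
# Numerical check of Exercises 5.1 and 5.2 (values from the text).

hours_per_year = 6 * 5 * 45            # = 1350 hours of operation per year
D = 3.6e-6 * 8 * hours_per_year        # release (demand) rate, ~0.039 per year

# (a) Protection tested half-yearly: lambda*T << 1, so FDT ~ 0.5*lambda*T (Eqn 5.12)
lam = 0.02                             # valve failure rate, per year
T = 0.5                                # test interval, years
fdt = 0.5 * lam * T                    # = 0.005 (dimensionless probability)
hr_tested = D * fdt                    # ~1.95e-4 per year (~1 in 5100 years)

# (b) No testing: HR = D*lambda / (D + lambda)  (Eqn 5.14)
hr_untested = D * lam / (D + lam)      # ~0.013 per year (~1 in 76 years)

print(f"D = {D:.4f}/yr, tested HR = {hr_tested:.2e}/yr, "
      f"untested HR = {hr_untested:.4f}/yr, "
      f"ratio ~ {hr_untested / hr_tested:.0f}")
```

The ratio comes out near 70, confirming the penalty for abandoning critical function testing.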
5.3 Fault tree analysis
a) The top event can occur from the following:
Power failure OR
Fuse failure OR
Circuit breaker fails open OR
Light globe No.1 fails AND Light globe No.2 fails.
The fault tree is shown in Figure 5.20.
Figure 5.20
[Fault tree for top event 'No light in room' (0.4174/yr): an OR gate with inputs 'Power failure' (0.2/yr), 'Fuse failure' (0.2/yr), 'Circuit breaker failure' (0.01/yr) and 'Globes failure' (0.0074/yr); the globes branch is an AND gate with inputs 'Globe 1 fails' (0.876/yr) and 'Globe 2 fails' (FDT 0.0084).]
b) The light globe failure rate is given as a frequency/hour. The annual failure rate is
calculated as:
Failure frequency of one light = 0.0001 x 8760 (hours/year)
= 0.876 per year.
Since the failure of both light globes is combined through an AND gate, the two
frequencies cannot simply be multiplied (the product of two frequencies is not a
frequency). The frequency of failure of both globes is calculated as:
f (both globes) = f (one globe) x p (second globe fails before first globe is replaced)
where p is the conditional probability.
The globes are checked at least once a week during the maintenance visits to see if they
are functional. If not, they are replaced. Therefore the FDT for a globe becomes,
FDT = 0.5 x 0.876 x (1/52)
= 0.0084
Therefore, the frequency of both globes failing
= 0.876 x 0.0084
= 0.00736 per year.
The top event frequency is simply the sum of all the individual component failures.
f(top event) = 0.2(power) + 0.2(fuse) + 0.01(circuit breaker) + 0.0074 (both globes)
= 0.417 per year
If there is only a single light globe in the room, this value would become
0.2 + 0.2 + 0.01 + 0.876 = 1.29 per year.
The dual light globe system reduces this frequency roughly three-fold, but failures of the
other components then become the dominant contributors.
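The Exercise 5.3 arithmetic can be sketched directly: frequencies add through OR gates, while the AND gate for the two globes combines one globe's failure frequency with the other globe's fractional dead time under weekly checks. The variable names below are mine; the numbers are from the text.

```python
# Exercise 5.3: top event frequency for 'No light in room'.

globe_rate = 0.0001 * 8760                # 0.876 /yr per globe
fdt_globe = 0.5 * globe_rate * (1 / 52)   # ~0.0084, weekly inspection interval
both_globes = globe_rate * fdt_globe      # ~0.0074 /yr  (AND gate: freq x FDT)

# OR gate: power + fuse + circuit breaker + both globes
top = 0.2 + 0.2 + 0.01 + both_globes
single_globe_top = 0.2 + 0.2 + 0.01 + globe_rate   # single-globe variant

print(round(top, 3), round(single_globe_top, 2))   # -> 0.417 1.29
```

The dual-globe design cuts the top event frequency from 1.29 to 0.417 per year, matching the roughly three-fold reduction noted above.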
5.4 Event tree analysis
The event tree is shown in Figure 5.21 and the outcomes are summarised in Table 5.12.
Table 5.12

Event No.   Description                                      Frequency per year   Loss from event ($)
1           Explosion. Major damage. Fatality.               1.0 x 10–9           2.5M
2           Solvent line fails. Major fire.                  9.9 x 10–8           250 000
3           Delayed fire suppression. Solvent line intact.   9.0 x 10–7           25 000
4           Fire occurs. Controlled quickly.                 9.9 x 10–5           5000
5           No fire. Motor damage.                           9.0 x 10–4           2500
Figure 5.21
[Event tree for initiating event 'Motor overheats' (10–3 per year). Branch points, with yes/no probabilities: 'Electrical fire occurs' (0.1/0.9); 'Fire not extinguished immediately' (0.01/0.99); 'Solvent line connection ruptures' (0.1/0.9); 'Explosion' (0.01/0.99). Outcomes and consequences: 1. Explosion, major damage, fatality (1 yr delay, potential fatality, up to $2.5 million); 2. Solvent line fails, major fire (3 month delay, $250 000); 3. Delayed fire suppression, solvent line intact (15 hour delay, $25 000); 4. Fire occurs, controlled quickly (10 hour delay, $5000); 5. No fire, motor damage (5 hour delay, $2500).]
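Each outcome frequency in Table 5.12 is the initiating frequency multiplied along its branch of the event tree. The sketch below reproduces that arithmetic and, as an illustrative extra not in the source, sums a frequency-weighted expected annual loss.

```python
# Exercise 5.4: outcome frequencies from the event tree of Figure 5.21.

initiating = 1e-3                  # motor overheats, per year
p_fire, p_not_ext, p_rupture, p_expl = 0.1, 0.01, 0.1, 0.01

f1 = initiating * p_fire * p_not_ext * p_rupture * p_expl        # explosion
f2 = initiating * p_fire * p_not_ext * p_rupture * (1 - p_expl)  # major fire
f3 = initiating * p_fire * p_not_ext * (1 - p_rupture)           # delayed suppression
f4 = initiating * p_fire * (1 - p_not_ext)                       # controlled quickly
f5 = initiating * (1 - p_fire)                                   # no fire, motor damage

losses = [2.5e6, 250_000, 25_000, 5_000, 2_500]   # $ per event, Table 5.12
freqs = [f1, f2, f3, f4, f5]
expected_annual_loss = sum(f * l for f, l in zip(freqs, losses))

print([f"{f:.1e}" for f in freqs], f"EAL ~ ${expected_annual_loss:.2f}/yr")
```

The computed frequencies match Table 5.12; the expected annual loss (a few dollars per year) is dominated by the frequent low-consequence outcomes.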