Post on 25-Jun-2020
Moosung Jae
May 4, 2015
Quantitative Reliability Analysis
System Reliability Analysis
System reliability analysis is conducted in terms of probabilities
The probabilities of events can be modelled as logical
combinations or logical outcomes of other random events
Graphical methods include
Failure Modes and Effects Analysis (FMEA)
Reliability Block Diagrams (RBD)
Master Logic Diagrams etc
Two main methods used include:
Fault tree analysis
Event tree analysis
Failure Modes and Effects Analysis
Failure modes and effects analysis (FMEA) is a qualitative
technique for understanding the behaviour of components in
an engineered system
The objective is to determine the influence of component
failure on other components, and on the system as a whole
It is often used as a preliminary system reliability analysis to
assist the development of a more quantitative event tree/fault
tree analysis
FMEA can also be used as a stand-alone procedure for
relative ranking of failure modes that screens them according
to risk
i.e., as a screening tool
FMEA (cont’d)
As a risk evaluation technique, FMEA treats risk in its true
sense as the combination of likelihood and consequences
However, strictly speaking, it is not a probabilistic method
because it does not generally use quantified probability
statements
Rather, failure mode occurrences are described using
qualitative statements of likelihood (e.g., rare vs. frequent etc.)
Consequences are also ranked qualitatively using levels or
categories
e.g., ranging from safe to catastrophic
FMEA uses a rank-ordered scale of likelihood with respect to
failure mode occurrence, so that together with the
consequence categories, a rank-ordered level of relative risk
can be derived for each failure mode
FMEA (cont’d)
FMEA consists of sequentially tabulating each component
with
all associated possible failure modes
impacts on other components and the system
consequence ranking
failure likelihood
detection methods
compensating provisions
Failure modes, effects and criticality analysis (FMECA) is
similar to FMEA except that the criticality of failure is
analyzed in greater detail
Example: Consider the following water heater system used in a residential home.
The objective is to conduct a failure modes and effects analysis (FMEA) for the system.
Solution (cont’d)
Define consequence categories as
I. Safe – no effect on system
II. Marginal – failure will degrade system to some extent but will
not cause major system damage or injury to personnel
III. Critical – failure will degrade system performance and/or
cause personnel injury, and if immediate action is not taken,
serious injuries or deaths to personnel and/or loss of system will
occur
IV. Catastrophic – failure will produce severe system
degradation causing loss of system and/or multiple deaths or
injuries
The FMEA is shown in the following table
Fundamentals of Reliability
© M. Pandey, University of Waterloo
Solution
| Component | Failure Mode | Effects on other components | Effects on whole system | Consequence Category | Failure Likelihood | Detection Method | Compensating Provisions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pressure relief valve | Jammed open | Increased gas flow and thermostat operation | Loss of hot water, more cold water input and gas | I - Safe | Reasonably probable | Observe at pressure relief valve | Shut off water supply, reseal or replace relief valve |
| Pressure relief valve | Jammed closed | None | None | I - Safe | Probable | Manual testing | No conseq. unless combined with other failure modes |
| Gas valve | Jammed open | Burner continues to operate, pressure relief valve opens | Water temp. and pressure increase; water turns to steam | III - Critical | Reasonably probable | Water at faucet too hot; pressure relief valve open (obs.) | Open hot water faucet to relieve pres., shut off gas; pressure relief valve compensates |
| Gas valve | Jammed closed | Burner ceases to operate | System fails to produce hot water | I - Safe | Remote | Observe at faucet (cold water) | |
| Thermostat | Fails to react to temp. rise | Burner continues to operate, pressure relief valve opens | Water temp. rises; water turns to steam | III - Critical | Remote | Water at faucet too hot | Open hot water faucet to relieve pressure; pressure relief valve compensates |
| Thermostat | Fails to react to temp. drop | Burner fails to function | Water temperature too low | I - Safe | Remote | Observe at faucet (cold water) | |
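The relative ranking that FMEA produces can be sketched in a few lines of Python. The numeric scores below are assumptions made only for illustration: the slides give the consequence categories and likelihood labels but no numeric scale, and the product rule in `risk_index` is a hypothetical screening score, not something taken from the table.

```python
# Hypothetical ordinal scales (assumed; the slides define only the labels).
CONSEQUENCE = {"I - Safe": 1, "II - Marginal": 2, "III - Critical": 3, "IV - Catastrophic": 4}
LIKELIHOOD = {"Remote": 1, "Reasonably probable": 2, "Probable": 3}

# (component, failure mode, consequence category, failure likelihood) from the FMEA table.
FAILURE_MODES = [
    ("Pressure relief valve", "Jammed open", "I - Safe", "Reasonably probable"),
    ("Pressure relief valve", "Jammed closed", "I - Safe", "Probable"),
    ("Gas valve", "Jammed open", "III - Critical", "Reasonably probable"),
    ("Gas valve", "Jammed closed", "I - Safe", "Remote"),
    ("Thermostat", "Fails to react to temp. rise", "III - Critical", "Remote"),
    ("Thermostat", "Fails to react to temp. drop", "I - Safe", "Remote"),
]


def risk_index(consequence, likelihood):
    """Relative risk score: product of the two ordinal ranks (an assumed rule)."""
    return CONSEQUENCE[consequence] * LIKELIHOOD[likelihood]


def rank_failure_modes(modes):
    """Sort failure modes from highest to lowest relative risk (stable sort)."""
    return sorted(modes, key=lambda m: risk_index(m[2], m[3]), reverse=True)


if __name__ == "__main__":
    for comp, mode, cons, like in rank_failure_modes(FAILURE_MODES):
        print(f"{risk_index(cons, like)}  {comp}: {mode}")
```

Under these assumed scales the gas valve jammed open ranks first, which matches the qualitative reading of the table (critical consequence, reasonably probable). A different scale could reorder the lower entries.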
Reliability Block Diagrams
Most systems are defined through a combination of both
series and parallel connections of subsystems
Reliability block diagrams (RBD) represent a system using
interconnected blocks arranged in combinations of series
and/or parallel configurations
They can be used to analyze the reliability of a system
quantitatively
Reliability block diagrams can consider active and stand-by
states to get estimates of reliability, and availability (or
unavailability) of the system
Reliability block diagrams may be difficult to construct for
very complex systems
Series Systems
Series systems are also referred to as weakest link or chain
systems
System failure is caused by the failure of any one component
Consider two components in series
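Failure is defined as the union of the individual component
failures
$$Q_{SYS} = P(F_1 \cup F_2) = Q_1 + Q_2 - P(F_1 \cap F_2)$$
where F_i denotes the failure of component i and Q denotes the
probability of failure
For small failure probabilities
$$Q_{SYS} \approx Q_1 + Q_2$$
[Figure: components 1 and 2 connected in series]
Series Systems (cont’d)
For n components in series, the probability of failure is then
$$Q_{SYS} \approx \sum_{i=1}^{n} Q_i$$
Therefore, for a series system with small component failure
probabilities, the system probability of failure is approximately
the sum of the individual component probabilities
In case the component probabilities are not small, the system
probability of failure can be expressed as (assuming independence)
$$Q_{SYS} = Q_1 + Q_2 - Q_1 Q_2 = 1 - (1 - Q_1)(1 - Q_2)$$
For n components in series
$$Q_{SYS} = 1 - \prod_{i=1}^{n} (1 - Q_i)$$
Series Systems (cont’d)
Reliability is the complement of the probability of failure
$$R = 1 - Q$$
For the two components in series, the system reliability can
be expressed as
$$R_{SYS} = 1 - Q_{SYS} = 1 - Q_1 - Q_2 + P(F_1 \cap F_2)$$
Assuming independence
$$R_{SYS} = 1 - Q_1 - Q_2 + Q_1 Q_2 = (1 - Q_1)(1 - Q_2) = R_1 R_2$$
For n components in series
$$R_{SYS} = \prod_{i=1}^{n} R_i$$
Therefore, for a series system, the reliability of the system is
the product of the individual component reliabilities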
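A minimal Python sketch of the series rules, assuming independent components. It compares the exact system failure probability with the small-probability approximation (sum of component failure probabilities). The function names are illustrative only.

```python
from math import prod


def series_reliability(q):
    """Series system reliability: product of component reliabilities (1 - Q_i)."""
    return prod(1.0 - qi for qi in q)


def series_failure_exact(q):
    """Exact series failure probability for independent components."""
    return 1.0 - series_reliability(q)


def series_failure_approx(q):
    """Small-probability approximation: sum of component failure probabilities."""
    return sum(q)


if __name__ == "__main__":
    q = [0.01, 0.02]
    print(f"exact  = {series_failure_exact(q):.4f}")   # 0.0298
    print(f"approx = {series_failure_approx(q):.4f}")  # 0.0300
```

The approximation always overestimates slightly, because it ignores the overlap terms such as Q1 Q2; the gap grows as the component failure probabilities grow.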
Parallel Systems
Parallel systems are also referred to as redundant
The system fails only if all of the components fail
Consider two components in parallel
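Failure is defined by the intersection of the individual
(component) failure events
$$Q_{SYS} = P(F_1 \cap F_2)$$
where F_i denotes the failure of component i
Assuming independence
$$Q_{SYS} = Q_1 Q_2$$
[Figure: components 1 and 2 connected in parallel]
Parallel Systems (cont’d)
For n components in parallel, the probability of failure is then
$$Q_{SYS} = \prod_{i=1}^{n} Q_i$$
Therefore, for a parallel system, the system probability of
failure is the product of the individual component
probabilities
The reliability of the parallel system is
$$R_{SYS} = 1 - Q_{SYS} = 1 - Q_1 Q_2 = 1 - (1 - R_1)(1 - R_2)$$
For n components in parallel, the system reliability is
$$R_{SYS} = 1 - \prod_{i=1}^{n} (1 - R_i)$$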
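The parallel rule can be sketched the same way, again assuming independent components (no common-cause failures). The function names are illustrative.

```python
from math import prod


def parallel_failure(q):
    """Parallel system failure probability: product of component failure probabilities."""
    return prod(q)


def parallel_reliability(q):
    """Parallel system reliability: complement of the system failure probability."""
    return 1.0 - parallel_failure(q)


if __name__ == "__main__":
    # Effect of adding redundant identical components, each with Q = 0.1
    for n in range(1, 4):
        print(n, f"{parallel_failure([0.1] * n):.3f}")
```

Each added redundant component multiplies the system failure probability by that component's own failure probability, which is why redundancy is effective only while the independence assumption holds.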
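Example Problem
Example: Compute the reliability and probability of failure for the
following system. Assume the failure probabilities for the
components are Q1 = 0.01, Q2 = 0.02 and Q3 = 0.03.
[Figure: component 1 in series with the parallel pair of components 2 and 3]
Solution:
First combine the parallel components 2 and 3
The probability of failure is
$$Q_{2,3} = Q_2 Q_3 = (0.02)(0.03) = 0.0006$$
The reliability is
$$R_{2,3} = 1 - Q_{2,3} = 0.9994$$
Solution (cont’d)
Next, combine component 1 and the sub-system (2,3) in
series
The probability of failure for the system is then
$$Q_{SYS} = Q_1 + Q_{2,3} - Q_1 Q_{2,3}$$
The system reliability is
$$R_{SYS} = R_1 R_{2,3}$$
Solution (cont’d)
The system probability of failure is equal to
$$Q_{SYS} = 0.01 + 0.0006 - (0.01)(0.0006) = 0.010594$$
The system reliability is
$$R_{SYS} = (0.99)(0.9994) = 0.989406$$
which is also equal to RSYS = 1 – QSYS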
As shown in this example, the system probability of failure
and reliability are dominated by the series component 1
i.e. a series system is as good as its weakest link
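The example can be checked numerically with a short Python sketch, assuming independent components. The variable `share_of_component_1` is an illustrative measure, not from the slides, of how much of the system failure probability is due to the series component.

```python
# Component failure probabilities from the example
q1, q2, q3 = 0.01, 0.02, 0.03

# Step 1: parallel pair (2, 3)
q23 = q2 * q3
r23 = 1.0 - q23

# Step 2: component 1 in series with the sub-system (2, 3)
r_sys = (1.0 - q1) * r23
q_sys = 1.0 - r_sys

# Illustrative measure: fraction of the system failure probability due to component 1
share_of_component_1 = q1 / q_sys

print(f"Q_sys = {q_sys:.6f}")
print(f"R_sys = {r_sys:.6f}")
print(f"share of component 1 = {share_of_component_1:.2f}")
```

Component 1 accounts for roughly 94% of the system failure probability, which is the numerical content of the weakest-link remark.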
Things to Consider
Reliability block diagrams can also be used to assess
Voting systems (k-out-of-n logic)
Standby systems (load sharing or sequential operation)
Simple systems can be assessed by gradually reducing them
to equivalent series/parallel configurations
More complex systems would require the use of a more
comprehensive approach, such as conditional probabilities or
imaginary components
For complex systems, great effort is needed to identify the
ways in which the system fails or survives
Fault trees can be used to decompose the main failure event
into unions and intersections of sub-events
Event trees can be used to identify the possible sequence of
events (also failures)
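For the voting (k-out-of-n) systems mentioned above, a minimal sketch under the assumption of n identical, independent components, each with reliability r: the system works if at least k components work, so the reliability is a binomial sum. The function name is illustrative.

```python
from math import comb


def k_out_of_n_reliability(k, n, r):
    """P(at least k of n identical independent components work)."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(f"{k_out_of_n_reliability(2, 3, 0.8):.3f}")  # 2-out-of-3 system
```

Setting k = n recovers the series rule and k = 1 recovers the parallel rule, so voting systems sit between the two.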
Examples
Series systems
A series system is one which operates if and only if all of its components operate.
The equivalent circuit diagram is
[Figure: components 1, 2, ..., n connected in series]
Let Ri = P(component i works) and Rsys = P(system works)
Then, if the components operate independently,
$$P(\text{system operates}) = P(1 \text{ works} \cap 2 \text{ works} \cap \dots \cap n \text{ works}) = P(1 \text{ works}) \, P(2 \text{ works}) \cdots P(n \text{ works})$$
$$R_{sys} = R_1 R_2 \cdots R_n$$
Parallel systems
A parallel system is one which operates if and only if at least one of its components operates.
$$Q_i = 1 - R_i = P(\text{component } i \text{ fails}) = \text{unreliability of component } i$$
$$P(\text{system fails}) = P(1 \text{ fails} \cap \dots \cap n \text{ fails}) = P(1 \text{ fails}) \cdots P(n \text{ fails})$$
$$Q_{sys} = Q_1 Q_2 \cdots Q_n$$
$$R_{sys} = 1 - (1 - R_1)(1 - R_2) \cdots (1 - R_n)$$
Series/parallel systems
Example Find the reliability of the system shown below, if all components have reliability 0.8.
Solution. The system can be broken down into subsystems that are
series or parallel.
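Decomposition
Subsystem 1 (C and D in parallel):
$$Q_1 = Q_C Q_D = 0.2^2 = 0.04$$
$$R_1 = 1 - 0.04 = 0.96$$
Subsystem 2 (A, B and subsystem 1 in series):
$$R_2 = R_A R_B R_1 = 0.8 \times 0.8 \times 0.96 = 0.6144$$
System (subsystem 2 and E in parallel):
$$Q_{sys} = Q_2 Q_E = (1 - 0.6144) \times 0.2 = 0.077$$
$$R_{sys} = 1 - 0.077 = 0.923$$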
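The decomposition can be written as nested calls to two small helper functions. This is a sketch assuming independent components; the helper names `series` and `parallel` are illustrative, and the structure (C and D in parallel, then in series with A and B, then in parallel with E) is read from the worked numbers of the example.

```python
from math import prod


def series(*r):
    """Reliability of independent blocks in series: product of reliabilities."""
    return prod(r)


def parallel(*r):
    """Reliability of independent blocks in parallel: 1 - product of unreliabilities."""
    return 1.0 - prod(1.0 - ri for ri in r)


R = 0.8                         # reliability of every component
r_cd = parallel(R, R)           # subsystem 1: C and D in parallel
r_abcd = series(R, R, r_cd)     # subsystem 2: A, B and subsystem 1 in series
r_sys = parallel(r_abcd, R)     # system: subsystem 2 and E in parallel

print(f"{r_sys:.5f}")  # 0.92288
```

Reducing from the innermost subsystem outwards mirrors the hand calculation, and the same two helpers cover any system that is purely series/parallel.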
Conditional probability method
Some complex systems cannot be broken down into series and parallel subsystems. Several methods exist for analyzing such systems, including the conditional probability method, cut
sets, and fault trees.
Example. Find the reliability of the complex system shown below, if all components have reliability 0.8.
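The method conditions on the state of a carefully chosen keystone component. In this case we choose component E. Using the law of total probability gives
$$P(\text{system works}) = P(E \text{ works}) \, P(\text{system works} \mid E \text{ works}) + P(E \text{ fails}) \, P(\text{system works} \mid E \text{ fails})$$
$$R_{sys} = R_E R_{sys|E} + Q_E R_{sys|\bar{E}}$$
Conditional probability method
Given that E works (A and B in parallel, in series with C and D in parallel):
$$Q_{AB} = Q_A Q_B = 0.2^2 = 0.04 = Q_{CD}$$
$$R_{sys|E} = R_{AB} R_{CD} = 0.96 \times 0.96 = 0.9216$$
Given that E fails (A and C in series, in parallel with B and D in series):
$$R_{AC} = R_A R_C = 0.8^2 = 0.64 = R_B R_D = R_{BD}$$
$$Q_{sys|\bar{E}} = Q_{AC} Q_{BD} = (1 - 0.64)^2 = 0.1296$$
$$R_{sys|\bar{E}} = 1 - 0.1296 = 0.8704$$
Combining the two cases:
$$R_{sys} = R_E R_{sys|E} + Q_E R_{sys|\bar{E}} = 0.8 \times 0.9216 + 0.2 \times 0.8704 = 0.91136$$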
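A minimal Python sketch of the keystone calculation, assuming independent components. The two conditional structures are read from the worked equations: with E working, the pair (A, B) in parallel is in series with the pair (C, D) in parallel; with E failed, the path A-C is in parallel with the path B-D. The function name is illustrative.

```python
def bridge_reliability(r_a, r_b, r_c, r_d, r_e):
    """System reliability by conditioning on the keystone component E."""
    # E works: (A parallel B) in series with (C parallel D)
    r_given_e_works = (1 - (1 - r_a) * (1 - r_b)) * (1 - (1 - r_c) * (1 - r_d))
    # E fails: (A series C) in parallel with (B series D)
    r_given_e_fails = 1 - (1 - r_a * r_c) * (1 - r_b * r_d)
    # Law of total probability over the state of E
    return r_e * r_given_e_works + (1 - r_e) * r_given_e_fails


if __name__ == "__main__":
    print(f"{bridge_reliability(0.8, 0.8, 0.8, 0.8, 0.8):.5f}")  # 0.91136
```

Keeping the component reliabilities as separate arguments makes it easy to see how the result changes when the components are not identical.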
Cut set method
Find the reliability of the complex system shown below, if all components have reliability 0.8.
A cut set is a subset of the components with the property that if all components in the set
fail, then the system fails. For example, {A, B, E} is a cut set in the system above.
Definition. A minimal cut set is a cut set for which no subset is a cut set.
For example, in the system above, {A, B, E} is not a minimal cut set, but {A, B} is a minimal cut set.
The list of all minimal cut sets for the system above is
$$C_1 = \{A, B\}, \quad C_2 = \{C, D\}, \quad C_3 = \{A, D, E\}, \quad C_4 = \{B, C, E\}$$
Define C1 to be the event “all components in cut set C1 fail”, etc. Then
$$P(\text{system fails}) = Q_{sys} = P(C_1 \cup C_2 \cup C_3 \cup C_4)$$
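The probability of the union of the cut-set events can be evaluated exactly by enumerating all component states. The sketch below does so for the minimal cut sets {A, B}, {C, D}, {A, D, E}, {B, C, E}, assuming independent components; it also computes the simple sum of cut-set probabilities, which is an upper bound. The function names are illustrative.

```python
from itertools import product
from math import prod

COMPONENTS = "ABCDE"
MIN_CUT_SETS = [{"A", "B"}, {"C", "D"}, {"A", "D", "E"}, {"B", "C", "E"}]


def system_failure_probability(q, cut_sets=MIN_CUT_SETS):
    """Exact P(C1 u C2 u ...): sum over all states in which some cut set has fully failed."""
    total = 0.0
    for states in product([True, False], repeat=len(COMPONENTS)):  # True = failed
        failed = {c for c, is_failed in zip(COMPONENTS, states) if is_failed}
        if any(cs <= failed for cs in cut_sets):
            total += prod(q[c] if c in failed else 1.0 - q[c] for c in COMPONENTS)
    return total


def cut_set_sum(q, cut_sets=MIN_CUT_SETS):
    """Upper bound on the failure probability: sum of the cut-set probabilities."""
    return sum(prod(q[c] for c in cs) for cs in cut_sets)


if __name__ == "__main__":
    q = {c: 0.2 for c in COMPONENTS}  # all components have reliability 0.8
    print(f"exact Q_sys = {system_failure_probability(q):.5f}")  # 0.08864
    print(f"cut-set sum = {cut_set_sum(q):.5f}")                 # 0.09600
```

The exact value gives R_sys = 1 - 0.08864 = 0.91136, the same result as the conditional probability method. The cut-set sum overestimates here because the component failure probabilities are large; it becomes a close approximation when they are small.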
Standby systems
One example of a standby system is the power supply to a hospital. The primary
supply is from the electricity grid. The backup might be a diesel generator. A standby
system differs from a parallel system in two ways:
• the backup component is not in use while the primary component is operational
(and therefore is not susceptible to failure)
• there is a switching mechanism, which detects failure of the primary component
and activates the backup component. This switching mechanism may fail to operate
(e.g., the diesel generator may fail to start).
There may be more than one backup component, as shown below.
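We show the general case, with failure rates λ1 and λ2
$$R_{sys}(t) = R_1(t) + \int_0^t f_1(s) \, R_2(t - s) \, ds$$
$$= e^{-\lambda_1 t} + \int_0^t \lambda_1 e^{-\lambda_1 s} e^{-\lambda_2 (t - s)} \, ds$$
$$= e^{-\lambda_1 t} + \lambda_1 e^{-\lambda_2 t} \int_0^t e^{(\lambda_2 - \lambda_1) s} \, ds$$
Note that we need to distinguish here between two cases. If λ1 = λ2 then we will end
up with the result of the Example. Here we consider only the case λ1 ≠ λ2, which gives
$$R_{sys}(t) = e^{-\lambda_1 t} + \lambda_1 e^{-\lambda_2 t} \left[ \frac{e^{(\lambda_2 - \lambda_1) s}}{\lambda_2 - \lambda_1} \right]_{s=0}^{t}$$
$$= e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_2 - \lambda_1} e^{-\lambda_2 t} \left( e^{(\lambda_2 - \lambda_1) t} - 1 \right)$$
which, after some manipulation,
$$= \frac{\lambda_2 e^{-\lambda_1 t} - \lambda_1 e^{-\lambda_2 t}}{\lambda_2 - \lambda_1}$$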
Calculate the reliability function of a standby system in which the primary
component has a constant hazard rate of 0.01 per hour, while the backup component
has a hazard rate of 0.02 per hour
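A minimal Python sketch of the standby reliability function for two components with constant but different hazard rates. It assumes perfect switching and a backup that cannot fail while idle, and uses the closed form (λ2 e^{-λ1 t} - λ1 e^{-λ2 t}) / (λ2 - λ1). A midpoint-rule evaluation of R1(t) + ∫ f1(s) R2(t - s) ds is included as a cross-check; both function names are illustrative.

```python
import math


def standby_reliability(t, lam1, lam2):
    """Closed form for a two-unit standby system (perfect switching, lam1 != lam2)."""
    return (lam2 * math.exp(-lam1 * t) - lam1 * math.exp(-lam2 * t)) / (lam2 - lam1)


def standby_reliability_numeric(t, lam1, lam2, steps=2000):
    """Midpoint-rule evaluation of R1(t) + integral_0^t f1(s) R2(t - s) ds."""
    h = t / steps
    integral = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        # f1(s) = lam1 * exp(-lam1 * s); R2(t - s) = exp(-lam2 * (t - s))
        integral += lam1 * math.exp(-lam1 * s) * math.exp(-lam2 * (t - s)) * h
    return math.exp(-lam1 * t) + integral


if __name__ == "__main__":
    for t in (10.0, 50.0, 100.0):
        closed = standby_reliability(t, 0.01, 0.02)
        numeric = standby_reliability_numeric(t, 0.01, 0.02)
        print(f"t = {t:5.0f} hr  closed = {closed:.6f}  numeric = {numeric:.6f}")
```

For λ1 = 0.01 and λ2 = 0.02 per hour the closed form simplifies to 2e^{-0.01t} - e^{-0.02t}. If the switch itself can fail, the integral term would need to be multiplied by the switching success probability, which this sketch does not model.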
Example of Standby systems
The primary component has a constant hazard rate of 0.01 per hour, while the backup
component has hazard rate 0.02 per hour. Compare the mean times to failure if these
components are operated (a) in parallel, (b) in standby mode.
(a) For the parallel system,
$$E[T_{sys}] = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} - \frac{1}{\lambda_1 + \lambda_2} = \frac{1}{0.01} + \frac{1}{0.02} - \frac{1}{0.03} = 116.67 \text{ hr}$$
(b) For the standby system, we have
$$E[T_{sys}] = E[T_1] + E[T_2] = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} = \frac{1}{0.01} + \frac{1}{0.02} = 150 \text{ hr}$$
Thus the standby system gives the longer MTTF. This agrees with common sense,
since in the standby system the standby component begins its life later.
Suppose that the primary and backup components both have hazard rate 0.01 per
hour. Compare reliability functions if these components are operated (a) in parallel, (b)
in standby mode.
(a) For the parallel system, evaluated at λt = 0.1 (t = 10 hr),
$$R_{sys}(t) = 1 - (1 - e^{-\lambda t})(1 - e^{-\lambda t}) = 2 e^{-\lambda t} - e^{-2 \lambda t} = 0.991$$
(b) For the standby system, with N the number of failures up to time t,
$$R_{sys}(t) = P(T_{sys} > t) = P(\text{2nd failure occurs after time } t) = P(N \le 1) = 1.1 \, e^{-0.1} = 0.995$$
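Both comparisons in this example can be reproduced with a short Python sketch. It assumes constant hazard rates, independent components, and perfect switching for the standby case; the equal-rate standby reliability uses the Poisson probability of at most one failure, e^{-λt}(1 + λt). The function names are illustrative.

```python
import math


def mttf_parallel(lam1, lam2):
    """MTTF of two independent exponential components in parallel."""
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)


def mttf_standby(lam1, lam2):
    """MTTF of a standby pair with perfect switching: the two lifetimes add."""
    return 1.0 / lam1 + 1.0 / lam2


def reliability_parallel_equal(t, lam):
    """Parallel pair with equal rates: 2 exp(-lam t) - exp(-2 lam t)."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)


def reliability_standby_equal(t, lam):
    """Standby pair with equal rates: P(at most one failure by time t)."""
    return math.exp(-lam * t) * (1.0 + lam * t)


if __name__ == "__main__":
    print(f"MTTF parallel = {mttf_parallel(0.01, 0.02):.2f} hr")
    print(f"MTTF standby  = {mttf_standby(0.01, 0.02):.2f} hr")
    print(f"R parallel(10) = {reliability_parallel_equal(10.0, 0.01):.3f}")
    print(f"R standby(10)  = {reliability_standby_equal(10.0, 0.01):.3f}")
```

At t = 10 hr the two reliabilities differ only in the third decimal place; the advantage of standby operation becomes more visible at longer mission times.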
Review on Reliability and Failure Rate
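ET and FT
[Figure: event tree with headings IE, Sys-A, Sys-B and Result. IE frequency 10^-1/yr; Success/Failure branches for Sys-A and Sys-B; results OK, OK and CD, with CDF = 1.1×10^-7/yr. Fault trees: Sys-A Failure (10^-4/yr) from Pump-1 Failure (10^-2/yr) and Pump-2 Failure (10^-2/yr); Sys-B Failure (1.1×10^-2/yr) from Pump-1 Failure (10^-2/yr) and Pump-2 Failure (10^-3/yr).]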
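The numbers on this slide can be reproduced with a few lines of Python. The gate types are assumptions inferred from the values, not stated in the text: Sys-A is treated as an AND gate (10^-2 × 10^-2 = 10^-4), Sys-B as an OR gate with the rare-event approximation (10^-2 + 10^-3 = 1.1×10^-2), and the pump values are treated as failure probabilities even though the slide labels them per year.

```python
from math import prod

IE_FREQUENCY = 1e-1  # initiating event frequency, per year (from the slide)


def and_gate(*q):
    """AND gate with independent inputs: product of input probabilities."""
    return prod(q)


def or_gate_rare_event(*q):
    """OR gate, rare-event approximation: sum of input probabilities."""
    return sum(q)


q_sys_a = and_gate(1e-2, 1e-2)            # assumed AND of Pump-1 and Pump-2
q_sys_b = or_gate_rare_event(1e-2, 1e-3)  # assumed OR of Pump-1 and Pump-2

# Accident sequence: initiating event, then Sys-A fails, then Sys-B fails
cdf = IE_FREQUENCY * q_sys_a * q_sys_b

print(f"Sys-A = {q_sys_a:.1e}, Sys-B = {q_sys_b:.1e}, CDF = {cdf:.1e} /yr")
```

The event tree supplies the sequence logic (which systems must fail after the initiating event) and the fault trees supply the branch probabilities, so the sequence frequency is their product.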
Emergency Electric Power System
[Figure: emergency electric power system with elements E1 and E2 and generators G1, G2 and G3, each rated 30 KVA; at least 60 KVA is required]
Reliability: the probability that a device will perform successfully for
the intended period of time under the stated operating
conditions.
Emergency Diesel Generators
An Example
| Basic Event | Component Failure Rate (Demand Failure) |
| --- | --- |
| E1 | 3.18E-03 |
| E2 | 3.18E-03 |
| G1 | 7.72E-06 |
| G2 | 7.72E-06 |
| G3 | 7.72E-06 |
FT in KIRAP
Minimal cut sets (8 in total): {G1,G2} {G2,G3} {G1,G3} {E1,E2} {E1,G2} {E1,G3} {E2,G2} {E2,G1}
An Example Results
KCUT Version 4.8a(20) 1999.4.9 + Uncertainty
Boolean Equation Reduction Program + Uncertainty
Copyright Han, S.H. KAERI
Fri Feb 06 18:00:50 2004
>
>
LEVEL ( 0.000e+000 ) .
Reporting for FAIL
value = 1.053e-005
Final Cut Sets
no value f-v acc cut sets
1 1.043e-005 0.9905 0.9905 E1 E2
2 2.532e-008 0.0024 0.9929 E1 G3
3 2.532e-008 0.0024 0.9953 E1 G2
4 2.455e-008 0.0023 0.9977 G1 E2
5 2.455e-008 0.0023 1.0000 G2 E2
6 5.960e-011 0.0000 1.0000 G1 G2
7 5.960e-011 0.0000 1.0000 G1 G3
8 5.960e-011 0.0000 1.0000 G2 G3
Execution time 0 seconds (gen:0, exp:0, abs:0), Return Code = 1
End of CUT Run
System unavailability
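The system unavailability can be approximated by hand as the sum of the minimal cut set probabilities (rare-event approximation). The sketch below does this in Python with the tabulated basic event values and the eight minimal cut sets, assuming independent basic events. It is not a reimplementation of KCUT; the names are illustrative.

```python
from math import prod

# Basic event values from the table
BASIC_EVENTS = {"E1": 3.18e-3, "E2": 3.18e-3, "G1": 7.72e-6, "G2": 7.72e-6, "G3": 7.72e-6}

# The eight minimal cut sets
MIN_CUT_SETS = [
    ("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("E1", "E2"),
    ("E1", "G2"), ("E1", "G3"), ("E2", "G2"), ("E2", "G1"),
]


def cut_set_value(cut_set, p=BASIC_EVENTS):
    """Probability that every basic event in the cut set occurs (independence assumed)."""
    return prod(p[event] for event in cut_set)


def rare_event_sum(cut_sets=MIN_CUT_SETS, p=BASIC_EVENTS):
    """Rare-event approximation of the top event: sum of cut set values."""
    return sum(cut_set_value(cs, p) for cs in cut_sets)


if __name__ == "__main__":
    for cs in sorted(MIN_CUT_SETS, key=cut_set_value, reverse=True):
        print(f"{cut_set_value(cs):.3e}  {' '.join(cs)}")
    print(f"total = {rare_event_sum():.3e}")
```

With the tabulated values the total is about 1.021e-05, slightly below the 1.053e-005 in the KCUT report. The reported cut set values (for example 2.532e-008 for E1 G3 against 2.455e-008 for G1 E2) are consistent with E1 having been about 3.28E-03 in that run rather than the tabulated 3.18E-03. In both cases the cut set {E1, E2} contributes about 99% of the total.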
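KIRAP startup screen
Running KIRAP
KIRAP Menu
KIRAP menu description
| Name | Meaning |
| --- | --- |
| Name | Enter the name of the current event |
| Type | Enter the type of the current event |
| Description | Detailed description of the current event |
| Mean, Cal. Type, Lambda, Tau | Mean is the reliability value for the current event; it is not entered directly by the user but is calculated from the Lambda and Tau values |
| EF | Error factor of the Mean value given for each event |
| Dist. Type | Specifies the probability distribution of the reliability value given to the event |
| Transfer | Makes the current event a transfer gate |
| Module | Releases the current event from being a transfer gate |
| Remark | Notes or special remarks on the current event |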
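KIRAP menu description
| Cal. Type | Lambda (λ) | Tau (τ) | Calculation of Mean | Meaning |
| --- | --- | --- | --- | --- |
| 0 | Demand Failure Prob. | - | Mean = λ | Used when the failure probability is given directly. Most demand failures fall into this category. |
| 1 | Running Failure Rate | Mission Time | Mean = λ × τ | Probability of failing to operate during the given time after an accident |
| 2 | Running Failure Rate | Repair Time | Mean = λ × τ | Unavailability of a component that is monitored continuously and repaired as soon as it fails |
| 3 | Standby Failure Rate | Test Interval | Mean = λ × τ/2 | Unavailability of a standby component that is tested periodically |
| 4 | Failure Rate | - | Mean = λ | Used for events expressed in units of failure rate |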
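The Mean calculation rules in this table can be sketched as a small Python function. This only illustrates the table; the function name `mean_from_cal_type` and the example numbers are hypothetical and do not come from KIRAP itself.

```python
def mean_from_cal_type(cal_type, lam, tau=None):
    """Mean of a basic event according to the Cal. Type table (illustrative only)."""
    if cal_type in (0, 4):
        # Demand failure probability or failure rate: Mean = lambda, tau not used
        return lam
    if tau is None:
        raise ValueError("tau is required for Cal. Type 1, 2 and 3")
    if cal_type in (1, 2):
        # Running failure rate times mission time (1) or repair time (2)
        return lam * tau
    if cal_type == 3:
        # Standby failure rate times half the test interval
        return lam * tau / 2.0
    raise ValueError(f"unknown Cal. Type: {cal_type}")


if __name__ == "__main__":
    # Hypothetical numbers: standby failure rate 1e-4 per hour, tested every 720 hours
    print(f"{mean_from_cal_type(3, 1e-4, 720.0):.3f}")  # 0.036
```

The λ × τ forms are first-order approximations, so they are reasonable only while the product stays well below 1.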
KIRAP menu description
KIRAP menu description
Tree Display Option
KIRAP menu description
Run calculation - unavailability calculation - minimal cut sets (MCS) and importance calculation, etc.