Background on Reliability and Availability
Slides prepared by Wayne D. Grover and Matthieu Clouqueur
TRLabs & University of Alberta
© Wayne D. Grover 2002, 2003
E E 681 - Module 2
( Version for book website )
E E 681 Lecture #2 © Wayne D. Grover 2002, 2003 2
Overview of the lecture
• Concept of Reliability
  – Reliability function, failure density function, hazard rate
• Concept of Availability
  – Availability function, unavailability, availability of elements in series/parallel
• Methodology for Availability Analysis
  – Quick unavailability lower bound estimation
  – Cut sets method
  – Tie paths method
• Automatic Protection Switching (APS) Systems
  – Principle
  – Availability analysis of an APS system
Reliability is a mission-oriented question
Technical meaning of Reliability
• In everyday English:
  – “My car is very reliable”: it works well, it starts every time (even at -30°).
• Technical meaning:
  – Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions intended.
  – Example:
    • Reliability of a fuel pump during a rocket launch
Reliability
• The reliability function R(t):
  – R(t) = probability { no failure in interval [0,t] }
  – R(0) = 1, R(∞) = 0
  – $\frac{dR(t)}{dt} \le 0$ (R(t) is always a non-increasing function)
  – Q(t) = probability { at least one failure in interval [0,t] }
  – Q(t) = 1 - R(t)
[Figure: R(t) plotted against t, starting from 1 at t = 0]
Reliability
• R(t) = prob { no failure in [0,t] }
• Related function: failure density function, f(t)
  $\int_{t_1}^{t_2} f(t)\,dt$ = probability of at least one failure in interval $[t_1, t_2]$
  $Q(t) = \int_0^t f(u)\,du$
  $R(t) = 1 - \int_0^t f(u)\,du$
• f(t) can be seen as the pdf of time to next failure
Reliability
• Hazard rate λ(t) (age-specific failure rate): rate of failure of an element given that this element has survived this long
  $\lambda(t) = \frac{1}{R(t)} \frac{dQ(t)}{dt} = -\frac{1}{R(t)} \frac{dR(t)}{dt}$
  – $\frac{dQ(t)}{dt}$: rate of failures
  – $\frac{1}{R(t)}$: given that the element has survived this long
• Integration gives:
  $\int_0^t \lambda(u)\,du = -\ln R(t)$
  $R(t) = e^{-\int_0^t \lambda(u)\,du}$
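The integration step can be spelled out as a short worked derivation. It uses only the hazard rate definition and the boundary condition R(0) = 1 stated earlier in the lecture.

```latex
\begin{align*}
\lambda(t) &= -\frac{1}{R(t)}\frac{dR(t)}{dt} = -\frac{d}{dt}\ln R(t) \\
\int_0^t \lambda(u)\,du &= -\bigl[\ln R(u)\bigr]_0^t = -\ln R(t) + \ln R(0) = -\ln R(t) \\
R(t) &= \exp\!\left(-\int_0^t \lambda(u)\,du\right)
\end{align*}
```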
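Reliability
• Expected Time To Failure or Mean Time To Failure (MTTF):
  – It is the expected value of the random variable with pdf f(t):
  $\text{MTTF} = E\{f(t)\} = \int_0^\infty t\, f(t)\,dt = -\int_0^\infty t\, \frac{d}{dt}R(t)\,dt$
[Figure: time axis from 0 to the first failure, with the interval labelled TTF1]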
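Reliability
• Special case: constant hazard rate (memoryless)
  $\lambda(t) = \lambda_0$
  $R(t) = e^{-\lambda_0 t}$
  $\text{MTTF} = \frac{1}{\lambda_0}$
  – In this case we can apply the Poisson distribution:
  $\text{prob}\{\, k \text{ failures in } [0,t] \,\} = \frac{(\lambda_0 t)^k}{k!}\, e^{-\lambda_0 t}$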
Reliability
• Numerical example:
  – Poisson distribution with λ = 1 / (5 years)
    • Probability of exactly 1 failure in the first year: P = 16.4%
    • Probability of at least one failure in the first year: P = 18.1%
    • Probability of exactly 1 failure in the first 5 years: P = 36.8%
    • Probability of at least one failure in the first 5 years: P = 63.2%
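The four percentages in this example can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names `poisson_pmf` and `prob_at_least_one` are illustrative choices.

```python
import math

def poisson_pmf(k, rate, t):
    """Probability of exactly k failures in [0, t] for a constant hazard rate."""
    mean = rate * t
    return mean ** k * math.exp(-mean) / math.factorial(k)

def prob_at_least_one(rate, t):
    """Q(t) = 1 - R(t) = 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)

rate = 1 / 5  # failures per year, i.e. lambda = 1 / (5 years)
for years in (1, 5):
    print(f"first {years} year(s): "
          f"exactly one = {poisson_pmf(1, rate, years):.1%}, "
          f"at least one = {prob_at_least_one(rate, years):.1%}")
```

Note that "at least one failure" is simply Q(t) = 1 - R(t), so it needs no summation over k.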
Availability (Repairable systems)
• “What is the probability that the engine of a Formula 1 car will work during the whole race?”
  – This is a reliability question
• “How often do I hear the dial tone when I pick up the phone?”
  – This is an availability question
Availability is the probability of finding the system in the operating state at any arbitrary time in the future.
Unlike in the context of reliability, we now consider systems that can be repaired.
Availability
• Comparison of Availability and Reliability Functions:
[Figure: A(t) and R(t) plotted against t from t = 0, with regions 1, 2 and 3 marked and the steady-state level labelled “Asystem”]
  – Region 1: R(t) and A(t) are the same
  – Region 2: Repair actions begin to hold up A
  – Region 3: A reaches a steady state
Availability
[Figure: timeline along t with alternating Failure and Repair events, marking Time to Failure, Time to Repair and Time Between Failures]
MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MTTR: Mean Time To Repair
$A = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$
$A = \lim_{T_{obs} \to \infty} \frac{\text{Uptime}}{T_{obs}}$
Availability
• Unavailability:
  $U = \frac{\text{MTTR}}{\text{MTTF} + \text{MTTR}} = \frac{\text{MTTR}}{\text{MTBF}}$
  $U = 1 - A$
• In availability analysis we usually work with unavailability quantities, because of the simplifications that apply to the unavailability of elements in series and in parallel.
• FIT: unit corresponding to 1 failure in $10^9$ hours
  – 1 FIT = 1 failure in 114,155 years
  – 1 failure / year = 114,155 FITs (high!)
  – Typical value for telecom equipment: 1500 FITs (MTTF = 76 years)
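The FIT conversions and the unavailability formula are simple enough to sketch in Python. The helper names below are illustrative, and the 4-hour MTTR in the usage lines is an assumed value chosen only to show the call, not a figure from the slides. A year is taken as 8760 hours, which reproduces the 114,155 figure.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored

def fits_to_mttf_years(fits):
    """MTTF in years for a failure rate given in FITs (failures per 1e9 hours)."""
    return 1e9 / fits / HOURS_PER_YEAR

def failures_per_year_to_fits(failures_per_year):
    """Convert a failure rate in failures/year to FITs."""
    return failures_per_year * 1e9 / HOURS_PER_YEAR

def unavailability(mttf_hours, mttr_hours):
    """U = MTTR / (MTTF + MTTR)."""
    return mttr_hours / (mttf_hours + mttr_hours)

print(fits_to_mttf_years(1))         # years per failure at 1 FIT
print(fits_to_mttf_years(1500))      # MTTF of typical telecom equipment
print(failures_per_year_to_fits(1))  # FITs for one failure per year

# Assumed MTTR of 4 hours, for illustration only.
mttf_hours = 1e9 / 1500
print(unavailability(mttf_hours, mttr_hours=4.0))
```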
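Availability
• Series elements unavailability reduction:
  [Figure: elements 1, 2, 3, ..., n connected in series]
  $U_s \approx \sum_{i=1}^{n} U_i$
  Approximation based on the fact that $U_i \ll 1$
• Parallel elements unavailability reduction:
  [Figure: elements 1, 2, ..., n connected in parallel]
  $U_s = \prod_{i=1}^{n} U_i$
• Numerical example: $U_i = 10^{-3}$, n = 3
  – Series: $U_s = 3 \cdot 10^{-3}$
  – Parallel: $U_s = 10^{-9}$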
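To see how good the series approximation is, the sketch below compares the summed unavailabilities with the exact series value 1 - Π(1 - U_i), and also evaluates the parallel product. The function names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math

def series_exact(us):
    """Exact series unavailability: the system is down unless every element is up."""
    return 1.0 - math.prod(1.0 - u for u in us)

def series_approx(us):
    """Series approximation for U_i << 1: sum of the element unavailabilities."""
    return sum(us)

def parallel(us):
    """Parallel unavailability: the system is down only if every element is down."""
    return math.prod(us)

us = [1e-3] * 3  # U_i = 10^-3, n = 3, as in the numerical example
print(series_approx(us), series_exact(us), parallel(us))
```

For these values the approximation and the exact series result differ only in the sixth decimal place, which is why summing is acceptable when each U_i is small.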
Availability Analysis
• The reliability engineer can use different techniques to evaluate the availability of a system:
  – 1) Quick estimate of a lower bound for the unavailability
  – 2) Series and parallel unavailability reductions
  – 3) Cut set method
  – 4) Tie paths method
  – 5) Conditional decomposition
• The general methodology is explained next…
Availability Analysis
• General Methodology:
  1) Get unavailability values of all components and sub-systems.
  2) Draw parallel and series availability relationships.
  3) Reduce the system availability model by repeated applications of the parallel/series availability simplifications.
  4) If not completely reduced, do a quick unavailability lower bound estimation, or use the tie paths method, the cut sets method or the conditional decomposition.
Availability Analysis
• Lower bound on unavailability
  – The contributions of parallel elements to the unavailability are not taken into account
  [Figure: availability block diagram with elements A, B, C, D, E, F, G and H]
  Lower bound of Us: UA + UH
  – In some cases this quick evaluation of a lower bound on U can be enough to conclude that the system does not meet the availability requirements
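Availability Analysis
• Tie paths method:
  – We enumerate all the paths from I to O
  [Figure: network between input I and output O, with nodes A and B and elements 1 to 5]
  – The availability of each path is calculated:
  $A_{\text{path } i} = \prod_{v \in \text{path } i} A_v$
  – The availability of the system is:
  $A_{\text{syst}} = \bigcup_{i \in \text{tie paths}} A_{\text{path } i}$ (the probability that at least one tie path is fully working)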
Availability Analysis
• Cut sets method:
  – Which combinations of element failures can bring the system down?
  [Figure: network with nodes A and B and elements 1 to 5]
  – The probability of each cut is calculated:
  $P(\text{cut } i) = \prod_{v \in \text{cut } i} (1 - A_v)$
  – The availability of the system is:
  $A_{\text{syst}} \approx 1 - \sum_{i \in \text{cut sets}} P(\text{cut } i)$
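The tie paths and cut sets views can be compared numerically on a small example. The sketch below assumes a hypothetical five-element bridge network between I and O (elements 1-2 on one branch, 3-4 on the other, element 5 bridging the midpoints); this topology, the element availabilities and all function names are assumptions for illustration and are not read from the slide figure. The exact value is obtained by enumerating every up/down state and keeping those in which some tie path is fully working.

```python
from itertools import product
import math

# Hypothetical bridge network (assumed, not taken from the slide figure).
TIE_PATHS = [{1, 2}, {3, 4}, {1, 5, 4}, {3, 5, 2}]
CUT_SETS = [{1, 3}, {2, 4}, {1, 5, 4}, {3, 5, 2}]

def path_availability(path, avail):
    """Product of the availabilities of the elements on one tie path."""
    return math.prod(avail[v] for v in path)

def cut_probability(cut, avail):
    """Probability that every element of one cut set is down."""
    return math.prod(1.0 - avail[v] for v in cut)

def exact_availability(tie_paths, avail):
    """Probability that at least one tie path is fully up, by state enumeration."""
    elements = sorted(avail)
    total = 0.0
    for state in product((True, False), repeat=len(elements)):
        up = {e for e, ok in zip(elements, state) if ok}
        if any(path <= up for path in tie_paths):
            total += math.prod(
                avail[e] if e in up else 1.0 - avail[e] for e in elements
            )
    return total

def cut_set_availability(cut_sets, avail):
    """Cut sets estimate: 1 minus the sum of the cut probabilities."""
    return 1.0 - sum(cut_probability(c, avail) for c in cut_sets)

avail = {e: 0.999 for e in range(1, 6)}  # assumed element availabilities
print([path_availability(p, avail) for p in TIE_PATHS])
print(exact_availability(TIE_PATHS, avail))
print(cut_set_availability(CUT_SETS, avail))
```

Summing the cut probabilities counts states in which several cuts occur at once more than once, so the cut sets estimate is slightly pessimistic; with small element unavailabilities the gap is negligible.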
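Availability Analysis
• Conditional decomposition (High Unavailability Elements):
  – When some elements have high U, it becomes less acceptable to sum unavailabilities.
  – Solution: Conditional decomposition:
  [Figure: system of elements A1, A2, A3 and A4 plus an element Ad with low availability (“Ad low”), decomposed into two sub-systems, Asyst 1 and Asyst 2, each built from A1, A2, A3 and A4]
  – The availability of the system is:
  $A_{\text{syst}} = A_d \cdot A_{\text{syst 1}} + (1 - A_d) \cdot A_{\text{syst 2}}$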
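A minimal sketch of conditional decomposition follows. It assumes that element d bridges the midpoints of two branches (A1-A2 and A3-A4); that layout is an assumption for illustration, since the figure itself is not recoverable here. Conditioning on d working gives sub-system 1, conditioning on d failed gives sub-system 2, and both reduce by ordinary series/parallel rules.

```python
def parallel_avail(a, b):
    """Availability of two elements in parallel."""
    return 1.0 - (1.0 - a) * (1.0 - b)

def bridge_availability(a1, a2, a3, a4, ad):
    """A_syst = A_d * A_syst1 + (1 - A_d) * A_syst2 for the assumed bridge layout."""
    # d working: the midpoints are joined, so (1 || 3) is in series with (2 || 4).
    syst_1 = parallel_avail(a1, a3) * parallel_avail(a2, a4)
    # d failed: two independent branches, (1 then 2) in parallel with (3 then 4).
    syst_2 = parallel_avail(a1 * a2, a3 * a4)
    return ad * syst_1 + (1.0 - ad) * syst_2

# Assumed values: element d is deliberately given a low availability.
print(bridge_availability(0.99, 0.99, 0.99, 0.99, ad=0.5))
```

No approximation is involved: the result is exact whatever the value of A_d, which is the point of using the decomposition for high-unavailability elements.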
Automatic Protection Switching (APS) Systems
• Basic idea:
  – to provide a standby transmission channel that is kept in fully operating condition and used to replace any of the other traffic-bearing channels in the event of their failure
• Characteristics of an APS system:
  – spare to working ratio:
    • ‘1-to-1’ or ‘1-to-N’
  – co-routed / diversely routed:
    • ‘1-to-1’ or ‘1-to-1 /DP’
    • ‘1-to-N’ or ‘1-to-N /DP’
  – 1+1 or 1:1:
    • ‘1+1’: Signal always sent on the spare channel
    • ‘1:1’: Signal sent on spare channel upon failure of the working channel
Automatic Protection Switching (APS) Systems
• 1:N APS system:
[Figure: N working channels (unavailability Uw each) and one spare channel (Us) between a Head End Bridge (Ub1, Ub2) and a Tail End Transfer (Ut1, Ut2)]
For Head End Bridge (HEB) and Tail End Transfer (TET):
  – Mode 1 failure: working signal is not relayed
  – Mode 2 failure: no bridging or transfer to/from spare channel
Automatic Protection Switching (APS) Systems
• Cut sets approach to 1:N APS availability analysis:
  – Combinations creating outage for a specific channel (cut sets):
    • Cut set 1: Failure of that channel with prior failure of at least one other working channel
    • Cut set 2: Failure of that working channel plus the spare channel or head end bridge or tail end transfer in mode 2
    • Cut set 3: Failure of head end bridge or tail end transfer in mode 1
  – The probability of each cut set is:
    • Cut set 1: $U_w \cdot (N-1)\,U_w \cdot 0.5$
    • Cut set 2: $U_w \cdot (U_s + U_{b2} + U_{t2})$
    • Cut set 3: $U_{b1} + U_{t1}$
Automatic Protection Switching (APS) Systems
• 1:N APS Unavailability
  – The unavailability of a channel is:
  $U_{\text{channel}} = \frac{N-1}{2}\, U_w^2 + U_w (U_s + U_{b2} + U_{t2}) + U_{b1} + U_{t1}$
  – The term in O(U), $U_{b1} + U_{t1}$, reflects the irreducible series-availability elements: the HEB and the TET in their mode 1 failure.
    • It is impossible to make a perfectly redundant system. There is always some parallelism-accessing device c that brings a series unavailability contribution.
  [Figure: elements A and B in parallel, accessed through a device c at each end]
  $U_A = U_B = 10^{-3}$, $U_C = 10^{-5}$
  $U_S = U_A U_B + 2U_C \approx 2U_C$, which is O(Uc)
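The channel unavailability can be evaluated directly as the sum of the three cut set probabilities. The sketch below uses illustrative names, and the numerical inputs in the usage lines are assumed values chosen only to show the relative size of the terms; they are not figures from the slides, apart from the A/B/c example.

```python
def aps_channel_unavailability(n, u_w, u_s, u_b1, u_b2, u_t1, u_t2):
    """Unavailability of one working channel in a 1:N APS system."""
    cut_1 = u_w * (n - 1) * u_w * 0.5    # this channel fails after another working channel
    cut_2 = u_w * (u_s + u_b2 + u_t2)    # this channel fails and the spare path is unusable
    cut_3 = u_b1 + u_t1                  # HEB or TET fails in mode 1 (series term)
    return cut_1 + cut_2 + cut_3

# Assumed values, for illustration only.
u = aps_channel_unavailability(n=3, u_w=1e-3, u_s=1e-3,
                               u_b1=1e-6, u_b2=0.0, u_t1=1e-6, u_t2=0.0)
print(u)

# The A/B/c example: two parallel elements accessed through device c at each end.
u_a = u_b = 1e-3
u_c = 1e-5
print(u_a * u_b + 2 * u_c)  # dominated by the 2 * u_c series term
```

In both cases the series terms are first order in U while the protected terms are second order, so the accessing devices set the floor on the achievable unavailability.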
Summary
• Reliability is a mission-oriented question for non-repairable systems
• In telecom engineering we are interested in the availability of the system designed
• There are several techniques that can be used for availability analysis. The one we will use in the rest of the course is the algebraic approach (equivalent to cut sets)
• APS is a protection scheme that enhances availability by providing a spare channel for restoration of failed working channels