Safety vs. Availability of Complex Systems
Safety vs. Availability of Complex Systems: Dilemma or Opportunity?
Outline
Reliability and Availability
Failure Modes of Complex Systems
Dilemma
Opportunities & Recommendations
Reliability Model
1.) Function provided
2.) Fault: event that can lead to a failure
3.) Failure: state of the system when the function is not provided
4.) Failure rate: statistical mean for the occurrence of failures per unit of time
5.) Assumption: the failure rate is constant (flat middle section of the bathtub curve).
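With a constant failure rate, the time to failure follows an exponential distribution, so the reliability over an operating time t is R(t) = exp(-λt) and the mean time between failures is 1/λ. A minimal sketch of that relationship follows; the function names and the example rate of 1e-4 failures per hour are illustrative assumptions, not values from the talk.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of running `hours` without failure, assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


def mtbf(failure_rate_per_hour: float) -> float:
    """Mean time between failures for a constant failure rate (exponential model)."""
    return 1.0 / failure_rate_per_hour


if __name__ == "__main__":
    lam = 1e-4  # assumed example value: failures per operating hour
    print(f"MTBF: {mtbf(lam):.0f} h")
    print(f"R(1000 h): {reliability(lam, 1000):.3f}")
```

Note that the probability of surviving one full MTBF without failure is only exp(-1), roughly 37 %, which is why MTBF alone says little about mission success.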
Model for Maintainability
Repairable Systems
Corrective Maintenance – Preventative Maintenance – Logistics Support => AVAILABILITY
Availability
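For a repairable system, steady-state availability is commonly computed as A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to repair. The sketch below assumes that definition; the function names and example figures are illustrative only.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the function is provided."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(a: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by availability `a`."""
    return (1.0 - a) * hours_per_year


if __name__ == "__main__":
    a = availability(mtbf_hours=10_000, mttr_hours=8)  # assumed example values
    print(f"Availability: {a:.5f}")
    print(f"Downtime per year: {annual_downtime_hours(a):.1f} h")
```

The formula shows why maintenance and logistics support matter: shortening the repair time raises availability even when the failure rate stays the same.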
Safety
• Hazard identification
• Risk assessment
• Define safety goal
• Define tolerable hazard rate
• Define tolerable functional failure rate
• Allocate the Safety Integrity Level (SIL)
• Design the system according to the SIL and make sure that the tolerable functional failure rate is not exceeded.
• Risks are acceptable => System is safe.
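The step from a tolerable functional failure rate to a SIL can be sketched as a simple band lookup. The bands below are the per-hour ranges commonly tabulated for continuously operating or high-demand safety functions (in the style of IEC 61508 and EN 50129); treat them as an assumption and check them against the standard that applies to your project. The function name is hypothetical.

```python
from typing import Optional

# (lower bound inclusive, upper bound exclusive, SIL), failures per hour.
# Assumed bands; verify against the applicable standard.
SIL_BANDS = [
    (1e-9, 1e-8, 4),
    (1e-8, 1e-7, 3),
    (1e-7, 1e-6, 2),
    (1e-6, 1e-5, 1),
]


def allocate_sil(tolerable_failure_rate_per_hour: float) -> Optional[int]:
    """Return the SIL for a tolerable functional failure rate.

    Returns 0 if the rate is 1e-5 /h or higher (no SIL needed) and None if
    the target is below 1e-9 /h, i.e. stricter than the SIL 4 band.
    """
    rate = tolerable_failure_rate_per_hour
    if rate <= 0:
        raise ValueError("rate must be positive")
    if rate >= 1e-5:
        return 0
    for lower, upper, sil in SIL_BANDS:
        if lower <= rate < upper:
            return sil
    return None


if __name__ == "__main__":
    for rate in (5e-9, 3e-7, 1e-4):
        print(f"{rate:.0e} /h -> SIL {allocate_sil(rate)}")
```

A result of None signals that a single function cannot carry the target on its own and that the hazard has to be reduced by other means, for example by changing the architecture.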
Recapitulation
MTBF: Mean Time Between Failures
MTTR: Mean Time To Repair
FRACAS: Failure Reporting, Analysis, and Corrective Action System
Analysis of a System’s Architecture
Operation Analysis Failure Modes
Real systems DO have more than just a single mode for system failure!
For example…
Functional Failure Mode Effects (and Criticality) Analysis
Cause, Effect, Functional Failure Mode
Severity, Probability => Criticality
Mishap Risk Index (MRI) Matrix
Severity Classification: (1) Negligible, (2) Marginal, (3) Critical, (4) Catastrophic
Probability Classification: (5) Frequent, (4) Probable, (3) Occasional, (2) Remote, (1) Extremely Remote
Initial Mishap Risk Index / Criticality Matrix
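The matrix combines one severity class with one probability class per functional failure mode. The cell values of the original matrix are not given here, so the sketch below assumes a simple scoring rule (index = severity class times probability class) and hypothetical acceptance bands; a real project defines both in its safety plan.

```python
SEVERITY = {1: "Negligible", 2: "Marginal", 3: "Critical", 4: "Catastrophic"}
PROBABILITY = {
    5: "Frequent",
    4: "Probable",
    3: "Occasional",
    2: "Remote",
    1: "Extremely Remote",
}


def mishap_risk_index(severity: int, probability: int) -> int:
    """Assumed scoring rule: index = severity class x probability class."""
    if severity not in SEVERITY or probability not in PROBABILITY:
        raise ValueError("unknown severity or probability class")
    return severity * probability


def risk_level(index: int) -> str:
    """Hypothetical acceptance bands, for illustration only."""
    if index >= 12:
        return "unacceptable"
    if index >= 6:
        return "undesirable"
    if index >= 3:
        return "tolerable"
    return "acceptable"


if __name__ == "__main__":
    # Probability classes as rows (5 down to 1), severity classes as columns (1 to 4).
    for p in sorted(PROBABILITY, reverse=True):
        cells = [f"{mishap_risk_index(s, p):>3}" for s in sorted(SEVERITY)]
        print(f"({p}) {PROBABILITY[p]:<17}" + " ".join(cells))
```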
Dilemma
Example
Safety First
• Automatic Train Protection (ATP) System fails.
• The fault of the ATP increases the risk to the environment.
• Advice: Bring the system to a standstill (Safety first).
Availability First
• Bringing the system to a standstill can lead to knock-on threats in dynamic systems.
• Advice: Keep the system running (Availability first).
Occurrence of Dilemma

Component Reliability   System Availability   System Safety   Occurrence of Dilemma   Example
Decrease                Decrease              Decrease        No Dilemma              Simple/Complicated Systems
Increase                Increase              Increase        No Dilemma              Simple/Complicated Systems
Decrease                Decrease              Increase        Dilemma                 Complex Systems: “No movement, no risk.”
Increase                Increase              Decrease        Dilemma                 Complex Systems: “Deteriorated increases risk”
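The rule behind the four cases above is that a dilemma occurs exactly when system availability and system safety move in opposite directions. A minimal sketch of that rule, with the type and function names chosen here for illustration:

```python
from enum import Enum


class Trend(Enum):
    DECREASE = -1
    INCREASE = 1


def is_dilemma(availability: Trend, safety: Trend) -> bool:
    """A dilemma exists when availability and safety move in opposite directions."""
    return availability is not safety


# The four cases: (component reliability, system availability, system safety).
ROWS = [
    (Trend.DECREASE, Trend.DECREASE, Trend.DECREASE),
    (Trend.INCREASE, Trend.INCREASE, Trend.INCREASE),
    (Trend.DECREASE, Trend.DECREASE, Trend.INCREASE),
    (Trend.INCREASE, Trend.INCREASE, Trend.DECREASE),
]

if __name__ == "__main__":
    for reliability, availability, safety in ROWS:
        verdict = "Dilemma" if is_dilemma(availability, safety) else "No Dilemma"
        print(reliability.name, availability.name, safety.name, verdict)
```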
Opportunities
• A functional FMECA on system level gives the overview.
• Not all system functions are essential in all modes/states of the system => Define the state machine of your system.
• Define the availability/safety targets and go confidently to the limits => The closer you get to them, the more extensive the analysis must be.
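Defining the state machine can start as a table of modes, the functions that are essential in each mode, and the allowed transitions. The sketch below is purely hypothetical: the modes, function names, and transitions are invented for illustration, and whether a given function may be non-essential in a degraded mode is a project-specific safety decision.

```python
from enum import Enum, auto
from typing import Set


class Mode(Enum):
    NORMAL = auto()
    DEGRADED = auto()
    STANDSTILL = auto()


# Hypothetical example: which functions are essential in which mode.
ESSENTIAL = {
    Mode.NORMAL: {"train_protection", "traction", "braking"},
    Mode.DEGRADED: {"traction", "braking"},
    Mode.STANDSTILL: {"braking"},
}

# Allowed transitions of the assumed state machine.
TRANSITIONS = {
    Mode.NORMAL: {Mode.DEGRADED, Mode.STANDSTILL},
    Mode.DEGRADED: {Mode.NORMAL, Mode.STANDSTILL},
    Mode.STANDSTILL: {Mode.NORMAL},
}


def next_mode(current: Mode, failed: Set[str]) -> Mode:
    """Stay in the current mode if none of its essential functions has failed;
    otherwise move to the first reachable mode that tolerates the failures."""
    if not (ESSENTIAL[current] & failed):
        return current
    for candidate in (Mode.DEGRADED, Mode.STANDSTILL):
        if candidate in TRANSITIONS[current] and not (ESSENTIAL[candidate] & failed):
            return candidate
    raise RuntimeError("no reachable mode tolerates these failures")


if __name__ == "__main__":
    print(next_mode(Mode.NORMAL, {"train_protection"}))  # falls back to DEGRADED
    print(next_mode(Mode.NORMAL, {"traction"}))          # falls back to STANDSTILL
```

Writing the table down makes the trade-off explicit: a component failure only forces a standstill when it removes a function that every running mode needs.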
Recommendations
Set Availability and Safety Targets
Functional FMECA on system level
Actively Manage component failures
Diagnosis Systems with integrity
Functional Failure Mode Effects (and Criticality) Analysis
FMECA
• Work from the component level and identify all possible fault modes at the component level (a team effort and bottom-up approach)
• Assess severity of each component fault and its effects on overall system performance
• Build a ‘table’ with all fault modes, assign severity, probabilities, determine interactions, possible actions, etc.
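The 'table' from the last step can be kept as a list of records and ranked so that the most critical fault modes are reviewed first. The sketch below is a minimal worksheet under stated assumptions: the entries are hypothetical, and criticality is taken as severity class times probability class, which is one common convention rather than a rule stated in this talk.

```python
from dataclasses import dataclass


@dataclass
class FaultMode:
    component: str
    fault: str
    system_effect: str
    severity: int     # 1 (negligible) to 4 (catastrophic)
    probability: int  # 1 (extremely remote) to 5 (frequent)
    action: str = ""

    @property
    def criticality(self) -> int:
        # Assumed ranking rule: severity class times probability class.
        return self.severity * self.probability


def ranked(table):
    """Return the worksheet sorted from most to least critical."""
    return sorted(table, key=lambda row: row.criticality, reverse=True)


# Hypothetical entries, for illustration only.
WORKSHEET = [
    FaultMode("display", "backlight failure", "driver information degraded", 2, 3, "replace at depot"),
    FaultMode("speed sensor", "no signal", "speed cannot be supervised", 4, 2, "redundant sensor"),
    FaultMode("door contact", "stuck 'closed' signal", "departure with open door possible", 4, 1),
]

if __name__ == "__main__":
    for row in ranked(WORKSHEET):
        print(f"{row.criticality:>2}  {row.component}: {row.fault} -> {row.system_effect}")
```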
What is missing for the FMECA?
SEVERITY PROBABILITY CRITICALITY
Abstract
We will briefly discuss the inherent dilemma between
system safety and system availability from a functional
perspective. Based on that discussion, a few strategies and
methods will be presented that can help you manage
this dilemma. Moreover, the 'digital age' gives rise to new
opportunities. Plenty of presentations praise these
opportunities for improving system availability. This
talk will show that you first need to understand the
balance between safety and availability; only then can you
successfully improve your system's availability.