Impala competence center asset management - cost driven risk management
2016-05-30 risk driven design
Uploaded by: jaap-van-ekris
Category: Technology
Risk Driven Development
My Projects
Reliability
Availability
Maintainability
Safety
IEC 61508: Required activities for safety related systems
Risk and the design process
• Each design step includes the refinement of the risk analysis
• Each design solution has to be measured against the risk analysis
• Constant design questions:
– Is the design balanced?
– Can it be made?
– Can it be done simpler?
Simplicity is prerequisite for reliability
Edsger W. Dijkstra
Risk management process
Slide 7 15 June 2016
Failure definitions
• What can go wrong exactly?
• When do we consider the system to be failed?
An example…
• Landing gear does not extend when commanded, without any error indication
• Spontaneous, irreversible landing gear extension while travelling overseas
Top-down vs. Bottom-Up analysis
• Bottom-up: structured brainstorm about everything that could happen given a specific scope
• Top-down: think about your biggest fears first, then find out what could cause them.
FME(C)A: bottom-up thinking
• Failure Mode and Effect (Criticality) Analysis
• Reasoning from failure of the components, thinking about the consequences
Risk: System does not perform trick?
Guide words…
Look at every component and investigate what happens if:
– It doesn’t work
– It is very slow
– It does the wrong thing
– It sends messages spontaneously
– It loses messages/state
– It leaks information
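The guide words above lend themselves to a mechanical first pass: cross every component with every guide word and use the result as a brainstorm checklist. The following is a minimal sketch of that idea; the component names are hypothetical and only serve as illustration.

```python
from itertools import product

# Guide words taken from the list above.
GUIDE_WORDS = [
    "doesn't work",
    "is very slow",
    "does the wrong thing",
    "sends messages spontaneously",
    "loses messages/state",
    "leaks information",
]


def fmea_prompts(components):
    """Return one brainstorm question per (component, guide word) pair."""
    return [
        f"What happens if {component} {guide_word}?"
        for component, guide_word in product(components, GUIDE_WORDS)
    ]


# Hypothetical component names, not taken from the slides.
prompts = fmea_prompts(["the sensor", "the controller"])
for prompt in prompts:
    print(prompt)
```

Each generated question becomes a candidate row in the FMECA worksheet; rows that turn out to be irrelevant can be dropped during the session.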
Structured FMECA approach
Function | Failure mode | Causes | Local effects | System effects | Criticality
Inwin | Wrong output | Logical error | Unjustified open | No closure | Catastrophic
Inwin | Delayed output | PLC error | Delayed closure | Closure delayed | Limited
Inwin | No output | Application hang | No closure | No closure | Catastrophic
Inwin | Spontaneous output | Switching error | Unjustified open | Unjustified closure | False positive
Process | Wrong output | Logical error | Unjustified open | No closure | Catastrophic
Process | Delayed output | PLC error | Delayed closure | Closure delayed | Limited
Process | No output | Application hang | No closure | No closure | Catastrophic
Process | Spontaneous output | Switching error | Power failure | No closure | Catastrophic
… | … | … | … | … | …
Certainty…
Rank beliefs not according to their plausibility but by the harm they may cause.
Nassim Nicholas Taleb
Identifying measures
• Risk = Chance * Impact
• Moments allowing measures:
– Prevention
– Detection
– Repression
– Correction
– Ignore
– Accept
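The formula Risk = Chance * Impact can be used to rank risks before deciding which of the measures above to apply. The sketch below is illustrative only: the risk names echo risks mentioned elsewhere in this deck, but every chance and impact figure is an invented assumption.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    chance_per_year: float  # assumed probability of occurrence per year
    impact: float           # assumed damage in arbitrary cost units

    @property
    def exposure(self) -> float:
        """Risk = Chance * Impact."""
        return self.chance_per_year * self.impact


# All numbers below are hypothetical.
risks = [
    Risk("power failure", 0.1, 50_000),
    Risk("software failure", 0.001, 10_000_000),
    Risk("network failure", 0.5, 2_000),
]

# Highest exposure first: these are the first candidates for measures.
ranked = sorted(risks, key=lambda r: r.exposure, reverse=True)
for risk in ranked:
    print(f"{risk.name}: {risk.exposure:.0f} cost units per year")
```

Note that a rare event with a very large impact can outrank a frequent but cheap one, which is in line with the Taleb quote above.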
You can’t mitigate everything…
• You can’t prevent everything
• You can’t plan for everything
• You can’t predict everything
• Otherwise you couldn’t do any business
• But, you can’t ignore everything either
Structured FMECA approach
Function | Failure mode | Causes | Local effects | System effects | Criticality | Detection | Mitigating measures
Inwin | Wrong output | Logical error | Unjustified open | No closure | Catastrophic | None | Multiprogramming
Inwin | Delayed output | PLC error | Delayed closure | Closure delayed | Limited | None | -
Inwin | No output | Application hang | No closure | No closure | Catastrophic | None | Failsafe behaviour Process
Inwin | Spontaneous output | Switching error | Unjustified open | Unjustified closure | False positive | None | -
Process | Wrong output | Logical error | Unjustified open | No closure | Catastrophic | None | Multiprogramming
Process | Delayed output | PLC error | Delayed closure | Closure delayed | Limited | None | -
Process | No output | Application hang | No closure | No closure | Catastrophic | None | Deadlock detection
Process | Spontaneous output | Switching error | Power failure | No closure | Catastrophic | None | Safety relay
… | … | … | … | … | … | … | …
New functional and design requirements!
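Once the worksheet carries a measures column, it can be queried for gaps and for the requirements it generates. The sketch below encodes a subset of the columns of the worksheet above (function, failure mode, criticality, mitigating measure); the type and helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class FmecaRow(NamedTuple):
    function: str
    failure_mode: str
    criticality: str
    measure: Optional[str]  # None when the worksheet lists no measure


ROWS = [
    FmecaRow("Inwin", "Wrong output", "Catastrophic", "Multiprogramming"),
    FmecaRow("Inwin", "Delayed output", "Limited", None),
    FmecaRow("Inwin", "No output", "Catastrophic", "Failsafe behaviour Process"),
    FmecaRow("Inwin", "Spontaneous output", "False positive", None),
    FmecaRow("Process", "Wrong output", "Catastrophic", "Multiprogramming"),
    FmecaRow("Process", "Delayed output", "Limited", None),
    FmecaRow("Process", "No output", "Catastrophic", "Deadlock detection"),
    FmecaRow("Process", "Spontaneous output", "Catastrophic", "Safety relay"),
]


def unmitigated(rows, criticality=None):
    """Rows without a measure, optionally restricted to one criticality."""
    return [
        row for row in rows
        if row.measure is None
        and (criticality is None or row.criticality == criticality)
    ]


def derived_requirements(rows):
    """Distinct measures: each one is a new functional or design requirement."""
    return sorted({row.measure for row in rows if row.measure})


print(unmitigated(ROWS, "Catastrophic"))
print(derived_requirements(ROWS))
```

In this data every catastrophic failure mode has a measure, while the limited and false-positive ones are left without, which matches the idea that not everything has to be mitigated.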
Disadvantages of FME(C)A
• It is impossible to calculate an overall risk exposure
• Relations between risks are missing
– Common-mode failures usually aren’t modelled
• Complex scenarios are hard to model
– Multiple failures aren’t modelled
– Are there root causes that could trigger multiple failures?
• Usually identifies irrelevant risks
Top-Down Risk analysis
• Start with a dominant concern
• Identify potential causes
• Detail further
A small FTA
Typical risks identified
• Components making the wrong decisions
• Power failure
• Hardware failure of PLCs/servers
• Software failures
• Network failure
• External factors
• Human maintenance error
Breaking a cut-set
Alternate component
Alternate service
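A minimal cut set is a smallest combination of basic events that causes the top event. A cut set with a single event is a single point of failure; adding an alternate component turns it into a two-event cut set, so both must fail together. The sketch below shows the effect with AND/OR gates. The probabilities are hypothetical, and the calculation assumes independent failures, so it ignores common-mode failures.

```python
from math import prod


def or_gate(*probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(*probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


# Hypothetical yearly failure probabilities of two basic events.
P_PLC = 1e-2
P_POWER = 1e-3

# Before: a single PLC, so {PLC} is a cut set on its own.
before = or_gate(P_PLC, P_POWER)

# After: an alternate PLC is added; the cut set becomes {PLC A, PLC B}.
after = or_gate(and_gate(P_PLC, P_PLC), P_POWER)

print(f"before: {before:.5f}, after: {after:.5f}")
```

After the change the power failure dominates the top event, which indicates where the next measure would have the most effect.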
Measures and FTA
Before After
Design decisions…
• Every design decision is accompanied by a risk analysis focusing on RAMS aspects
• In the end, the cost, the RAMS effects and other trade-off aspects determine which design option is used
Option 1
FTA Option 1
Option 2
FTA Option 2
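Design Option 1 (diagram)
Components: Info, Height determination (Hoogtebepaling), Control (Aansturing), Height measurement (Hoogtemeting), Flood barrier (Waterkering), Diesels, Measurement a (Meeta), Measurement b (Meetb), Control a (Stuura), Control b (Stuurb)
Failure chances annotated in the diagram, in order of appearance:
• Software failure: 1/1,000 per year
• Measurement error: (1/1,000,000 per year)^3
• Software failure: 1/1,000,000 per year
• Software failure: 1/1,000 per year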
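Design Option 2 (diagram)
Components: Info, Height determination (Hoogtebepaling), Control (Aansturing), Height measurement (Hoogtemeting), Flood barrier (Waterkering), Diesels, Measurement a (Meeta), Measurement b (Meetb), Control a (Stuura), Control b (Stuurb)
Failure chances annotated in the diagram, in order of appearance:
• Software failure: 1/10,000 per year
• Measurement error: (1/1,000,000 per year)^3
• Software failure: 1/100 per year
• Software failure: 1/10,000 per year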
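To illustrate how the annotated chances of the two design options could be compared, the sketch below combines them into one figure per option. The transcript does not preserve the fault-tree structure or which component each chance belongs to, so the calculation makes a strong assumption: every listed failure independently leads to system failure (a pure OR gate). With a different tree the outcome can differ.

```python
from math import prod


def system_failure_chance(chances):
    """Chance per year that at least one independent failure occurs."""
    return 1.0 - prod(1.0 - c for c in chances)


# Chances per year as listed for each design option, in order of appearance.
OPTION_1 = [1 / 1_000, (1 / 1_000_000) ** 3, 1 / 1_000_000, 1 / 1_000]
OPTION_2 = [1 / 10_000, (1 / 1_000_000) ** 3, 1 / 100, 1 / 10_000]

chance_1 = system_failure_chance(OPTION_1)
chance_2 = system_failure_chance(OPTION_2)

print(f"Option 1: {chance_1:.6f} per year")
print(f"Option 2: {chance_2:.6f} per year")
```

Under this assumption the single 1/100 software failure dominates Option 2, even though its other software components are ten times better than those of Option 1. Cost and the other RAMS aspects would still have to be weighed before choosing.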
IEC 61508: Required activities for safety related systems
Testing
Function | Impact of wrong/not functioning | Impact of spontaneous functioning
Function 1 | Small | Medium
Function 2 | Disastrous | Huge
Function 3 | Serious | Huge
Function 3 | Serious | Small
Function 4 | Serious | Serious
Function 5 | Serious | Small
Function 6 | Huge | Huge
…
Test depth and acceptable risk
• Level A: Thorough endurance test aiming to prove function reliability with high accuracy.
• Level B: Thorough endurance test aiming to prove function reliability with medium accuracy.
• Level C: Thorough endurance test aiming to prove function reliability with low accuracy.
• Level D: Test to verify that the function works once.
• Level E: Function tested alongside other functions; might leave paths untested.
Test effort
Level | #Tests | Effort
Level A | 50,000 | 120 hours
Level B | 10,000 | 24 hours
Level C | 1,000 | 4 hours
Level D | 1 | 1 hour
Level E | - | PM
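The slides do not state how the test counts were derived. One common way to relate a number of failure-free tests to a demonstrated reliability is the zero-failure (success-run) bound: after n independent tests without failure, the per-demand failure probability is below 1 - (1 - C)^(1/n) with confidence C. The sketch below applies it to the counts in the table; the 95% confidence level is an assumption.

```python
def demonstrated_failure_bound(n_tests: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-demand failure probability after n_tests
    independent, failure-free tests, at the given confidence level."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)


# Test counts taken from the table above.
for level, n_tests in [("A", 50_000), ("B", 10_000), ("C", 1_000)]:
    bound = demonstrated_failure_bound(n_tests)
    print(f"Level {level}: {n_tests} tests -> failure probability below {bound:.2e}")
```

The bound shrinks roughly in proportion to 1/n, which is why each step up in accuracy costs several times more tests.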
Test depth…
Function | Not functioning | Spontaneous functioning
Function 1 | Level E | NOT
Function 2 | Level A | Level A
Function 3 | Level A | Level A
Function 3 | Level B | Level B
Function 4 | Level A | Level A
Function 5 | Level E | NOT
Function 6 | Level A | Level A
… | … | …
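Combining the test depth per function with the effort per level gives a rough test budget. The sketch below sums the hours for the rows shown above. It assumes that "NOT" means the failure mode is not tested and that Level E ("PM") adds no separately budgeted hours; both readings are assumptions, and the elided rows are left out.

```python
# Hours per test level, from the test effort table.
HOURS_PER_LEVEL = {"A": 120, "B": 24, "C": 4, "D": 1}

# (function, level for "not functioning", level for "spontaneous functioning")
PLAN = [
    ("Function 1", "E", "NOT"),
    ("Function 2", "A", "A"),
    ("Function 3", "A", "A"),
    ("Function 3", "B", "B"),
    ("Function 4", "A", "A"),
    ("Function 5", "E", "NOT"),
    ("Function 6", "A", "A"),
]


def planned_hours(plan):
    """Sum the hours of all Level A-D tests; 'E' and 'NOT' count as zero
    here (assumption, see the text above)."""
    return sum(
        HOURS_PER_LEVEL.get(level, 0)
        for _function, *levels in plan
        for level in levels
    )


print(planned_hours(PLAN), "hours")
```

Almost all of the effort sits in the Level A tests, so downgrading a single failure mode from Level A to Level B saves 96 hours.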