Risk-Informed Verification Planning
Transcript of Risk-Informed Verification Planning
FADAT 2, 2013 @ Noordwijk, Netherlands
This document and its content is the property of Astrium [Ltd/SAS/GmbH] and is strictly confidential. It shall not be communicated to any third party without the written consent of Astrium [Ltd/SAS/GmbH].
2013/06/20
RAMS Exploitation of In‐Orbit Data: RIDE+ Study
JF GAJEWSKI, Astrium Satellites
Summary
1. Introduction
2. Failure events
3. Definitions of failures
4. Random Failures
5. Systematic Failures
6. SF Modelling
7. Outputs of the study
8. Conclusion
1‐ Introduction
RIDE+ is a CCN to the RIDE1 study “RAMS exploitation of in-orbit data” (contract n°21167/07/NL/EM)
RIDE1 objective (reminder)
• to enhance the efficiency and the pertinence of the RAMS analyses and associated failure risk management by introducing a feedback loop between in-orbit operational events (failures, …) recorded in ESTEC’s and ESOC’s operational database and ESA RAMS engineering, risk assessment and engineering process
RIDE+ objective
• part of the Risk Informed Planning initiative
• to develop a reliability prediction model taking into account Random Failures as well as Systematic Failures
• to derive additional user requirements for the RIDE tool
2‐ Failure events
Classification of the FAILURE EVENTS
• Anomaly / Mishap
• Failure or Undesired Outcome
• Fault or error
• Initiating Event or Proximate Cause
• Contributing Factor
Source: Astrium Satellites
2‐ Failure events
* In case the Rad / SEU predictions are not in accordance with reality, the fault is a design error
** In case the wear-out occurs before the end of the mission lifetime, i.e. the lifetime qualification is insufficient, the fault is considered as a design error
3‐ Definitions of failures
RANDOM FAILURE (RF)
A failure is considered as “RANDOM” when it is due to an intrinsic defect such that
• Defect relates to a physical defect of a HW elementary part
• Probability of occurrence is LOW (less than 1% of the parts within a batch may be affected)
• Defect probability remains compliant with the part procurement standards and the reliability prediction
• Physical root cause (defect) is either not identified or considered as a ‘one‐off’ after investigations
• RFs from one S/C to another are always mutually independent
Some attributes
• Always a HW failure
• Leads to the failure of an equipment, either electrical or mechanical or of any other nature
[Figure: STRESS versus STRENGTH]
The area where the stress exceeds the strength (i.e. failure) is defined by the programme assumptions (derating rules, parts selection)
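As a hedged illustration of the stress-versus-strength picture: if stress and strength are assumed to be independent and normally distributed (an assumption made here, not stated on the slide), the failure probability is the probability that the stress exceeds the strength. The function name and all numerical values below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stress_strength_reliability(mu_stress, sd_stress, mu_strength, sd_strength):
    """Return R = P(stress < strength) for independent normal stress and strength."""
    # The margin (strength - stress) is itself normal; failure is a negative margin.
    margin = NormalDist(mu_strength - mu_stress, sqrt(sd_stress**2 + sd_strength**2))
    return 1.0 - margin.cdf(0.0)


# Hypothetical values, for illustration only.
r = stress_strength_reliability(mu_stress=5.0, sd_stress=1.0,
                                mu_strength=10.0, sd_strength=1.5)
print(f"R = {r:.5f}, P(failure) = {1 - r:.5f}")
```

Derating rules and parts selection act on this picture by moving the two distributions apart (larger mean margin) or by narrowing them, which shrinks the overlap area.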
3‐ Definitions of failures
SYSTEMATIC FAILURE (SF)
A failure is considered as “SYSTEMATIC” when it is due to an intrinsic defect such that
• Defect relates either to an elementary part defect or to a ‘system’ defect
• Probability of occurrence is SIGNIFICANT (more than 1% of the parts within a batch may be affected)
• Defect is not detected before launch by procurement & system tests on ground
• Root cause (defect) is identified after investigations
• SFs from one S/C to another may be dependent (same batch, similar conditions…)
Fault/Error: event, fact or action which is not in conformance with a planned procedure
• Leads to a weakness (unknown consequences while respecting the planned actions)
• Design or manufacturing error within an elementary part
• HW defective part
• SW bug
• Design or manufacturing (Assembly & Test) error at system level
• WO occurring below the specified lifetime (degradation, rad) is considered as a design error
• SE with a frequency of occurrence higher than the prediction is considered as a design error
• Operations error (wrong procedure, erroneous data, …) at satellite level during operational life
3‐ Definitions of failures
SINGLE EVENT (SE)
A Single Event is an abnormal event due to
• EEE parts sensitivity to Single Event Phenomena (SEU, SET, …)
• HW transient mishap
A SE is not permanent but may occur several times depending on the conditions
A SE is reversible (no physical damage on event occurrence)
SE is counted when the probability of occurrence is compliant with the prediction (otherwise it is considered as a SF)
3‐ Definitions of failures
WEAR‐OUT EVENT (WO)
An event is considered as a “WEAR‐OUT” event when it results from
A progressive degradation (up to the definitive failure) of the performances due to the applied lifetime stresses
• O/O cycling
• Thermal cycling
• Radiation effects (cumulated dose)
• Wear‐out (mechanical part, monoatomic O, …)
Only impacts physical parts
WO is counted when occurring beyond the design lifetime (otherwise it is a SF), since ‘safe life’, as applied in Space design, means that the limited lifetime items are qualified (by tests) over the specified lifetime
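The four definitions (RF, SF, SE, WO) can be read as a decision rule. The sketch below is one possible, simplified encoding of that rule; the `Event` fields and the `classify` function are hypothetical names introduced for illustration. The "one‐off" case and the "always a HW failure" attribute of an RF are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical description of an in-orbit event (field names are illustrative)."""
    transient: bool                 # reversible, no physical damage
    rate_within_prediction: bool    # observed frequency compliant with the prediction
    progressive_degradation: bool   # wear-out type mechanism
    beyond_design_lifetime: bool    # occurred after the specified lifetime
    root_cause_identified: bool     # deterministic cause found by investigation
    batch_fraction_affected: float  # share of parts in the batch that may be affected


def classify(e: Event) -> str:
    """Return 'SE', 'WO', 'RF' or 'SF' following the slide definitions."""
    if e.transient:
        return "SE" if e.rate_within_prediction else "SF"
    if e.progressive_degradation:
        return "WO" if e.beyond_design_lifetime else "SF"
    # Permanent failure: random only if rare and with no identified root cause.
    if e.batch_fraction_affected < 0.01 and not e.root_cause_identified:
        return "RF"
    return "SF"


# A wear-out seen before the end of the specified lifetime counts as systematic.
early_wear_out = Event(False, True, True, False, False, 0.0)
print(classify(early_wear_out))
```

The ordering of the tests reflects the slides: SE and WO are only counted as such when they stay within the prediction or beyond the design lifetime; otherwise they fall back to SF.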
4 – Random failures
• Reliability prediction
– In time: system, units, parts
• Weibull, exponential distribution
• Data = Standard (Mil‐217, UTE‐C‐80810, FIDES …), tests, in‐orbit feedback
– Stress / Strength (mechanical items mainly)
• R = P (stress < strength)
• Data = test results, design & manufacturing sizing
– Proportion (one‐shot device)
• R = P (success)
• Data = test results, design characteristics
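A minimal sketch of the "in time" and "proportion" predictions listed above, using the standard textbook forms of the exponential and Weibull reliability functions. The function names, the mission duration and the failure rate are hypothetical values chosen for illustration, not figures from the study.

```python
from math import exp


def reliability_exponential(t, failure_rate):
    """R(t) = exp(-lambda * t), constant failure rate."""
    return exp(-failure_rate * t)


def reliability_weibull(t, scale, shape):
    """R(t) = exp(-(t / scale) ** shape); shape = 1 reduces to the exponential law."""
    return exp(-((t / scale) ** shape))


def reliability_one_shot(successes, trials):
    """Point estimate R = P(success) from test results of a one-shot device."""
    return successes / trials


# Hypothetical numbers: 15-year mission, 500 FIT unit (1 FIT = 1e-9 failures/hour).
hours = 15 * 8760
lam = 500e-9
print(reliability_exponential(hours, lam))
print(reliability_weibull(hours, scale=1 / lam, shape=1.0))  # same law as above
print(reliability_one_shot(successes=98, trials=100))
```

A Weibull shape below 1 models a decreasing failure rate (infant mortality) and a shape above 1 an increasing one (ageing), which is why it is offered alongside the exponential law.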
• Relates to 10% of the in‐orbit anomalies!
• Pessimistic versus operational reliability!!
• There is a need for a new approach
• In‐orbit correlation
• Estimation (λ, R …)
• Corrective factor
• …
• New database (FIDES?)
• …
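The slide does not define the corrective factor. One plausible reading, sketched below under that assumption, is the ratio between the failure rate estimated from in‐orbit feedback and the predicted one, using the usual constant-rate estimate (failures divided by cumulated operating time). The function names and the fleet figures are hypothetical.

```python
def observed_failure_rate(failures, cumulated_hours):
    """Maximum-likelihood estimate of a constant failure rate: failures / exposure."""
    return failures / cumulated_hours


def corrective_factor(failures, cumulated_hours, predicted_rate):
    """Ratio observed / predicted; below 1 means the prediction was pessimistic."""
    return observed_failure_rate(failures, cumulated_hours) / predicted_rate


# Hypothetical fleet: 40 units, 8 years each in orbit, 1 failure, 500 FIT predicted.
exposure = 40 * 8 * 8760
k = corrective_factor(failures=1, cumulated_hours=exposure, predicted_rate=500e-9)
print(f"corrective factor = {k:.2f}")
```

With few observed failures the estimate is very uncertain, so in practice a confidence bound on the observed rate would accompany the point value.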
5 – Systematic Failure
• A Systematic Failure (SF) is a failure event impacting
(a) the system itself or
(b) a part of the system (HW as well as SW)
which is recoverable (use of redundancy, new SW up‐load, workaround) or not (defect affecting both channels of a redundancy, loss of spacecraft due to misuse) and is due to:
– Design Error
– Manufacturing Error
– Operator (or operations) Error
• This failure is called “systematic” as long as a deterministic physical defect or a deterministic cause is identified through investigations, which means the failure was predictable though not detected on‐ground or before any operational sequences.
5 – Systematic Failure
• SFs are mainly due to Human Error (design / manufacturing / operation phase)
• HUMAN ERROR
inappropriate or undesirable human decision or behaviour with potential impact on safety, dependability or system performance.
• About Human Error Modelling (rate …)
– No real HE modelling during development phase
– Only the Nuclear domain (TMI …) uses quantitative HE modelling (THERP = Technique for Human Error Rate Prediction …)
– For other domains only QUALITATIVE methods are used (in operations)
“However, quantitative assessments of the probabilities of crew or maintenance errors are not currently considered feasible. If the failure indications are considered to be recognizable and the required actions do not cause an excessive workload, then for the purposes of the analysis, the probability that the corrective action will be accomplished, can be considered to be one. If the necessary actions cannot be satisfactorily accomplished, the tasks and/or the systems need to be modified.”
CS25, amendment 11
5 – Systematic Failure
NO PERTINENT HUMAN ERROR MODEL
CONTRIBUTING FACTORS (DRIVERS) to HE
• e.g. Task complexity, skills, training …
• Classified as Low / Medium / High
• Defined per phase: Design / Manufacturing / Operation
• Basis for regression (in‐orbit feedback)
6 – SF Modelling
BASIS
• Drivers per phase contributing to HE
• Filters: Reviews / Tests / Validation
• Severity scale (impacts in‐orbit)
SF MODELLING
• Based on REGRESSION: DRIVERS vs. In‐Orbit SF
• 3 Models with increasing complexity
6 – Model 1
INPUTS
• Launched S/C features
• In‐orbit observed SFs
• New S/C features
Basis
• Pure Regression
OUTPUTS
• Expected number of SFs per Severity level based on regression (in‐orbit feedback vs drivers) for the new S/C
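The slides do not say which regression form Model 1 uses. As a hedged sketch, the example below encodes the Low / Medium / High driver levels as 1 / 2 / 3, sums them into a single score per spacecraft, and fits a straight line by ordinary least squares for one severity level. The encoding, the driver names, the helper functions and the data are all hypothetical; a real model would more likely keep the drivers separate and use a count-data regression.

```python
LEVEL = {"Low": 1, "Medium": 2, "High": 3}  # assumed ordinal encoding of risk contribution


def driver_score(drivers):
    """Sum the ordinal levels of the drivers of one spacecraft."""
    return sum(LEVEL[level] for level in drivers.values())


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical launched spacecraft: driver levels and observed SF counts.
launched = [
    ({"task_complexity": "Low", "skills": "Low", "training": "Medium"}, 1),
    ({"task_complexity": "Medium", "skills": "Medium", "training": "Medium"}, 3),
    ({"task_complexity": "High", "skills": "Medium", "training": "High"}, 5),
    ({"task_complexity": "High", "skills": "High", "training": "High"}, 6),
]
xs = [driver_score(d) for d, _ in launched]
ys = [n_sf for _, n_sf in launched]
a, b = fit_line(xs, ys)

# Prediction for a hypothetical new spacecraft, floored at zero.
new_sc = {"task_complexity": "Medium", "skills": "Low", "training": "High"}
expected_sf = max(0.0, a + b * driver_score(new_sc))
print(f"expected number of SFs for the new S/C: {expected_sf:.2f}")
```

Four spacecraft are used here only to keep the example short; each additional driver adds parameters to estimate, so the amount of in‐orbit feedback needed grows with the number of drivers.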
6 – Model 2
INPUTS
• Launched S/C features
• In‐orbit SFs
• New S/C features
• Model 1 outputs
Basis
• Regression
• Rayleigh / Exponential fit
OUTPUTS
• Number of SFs per Severity level in TIME (Rayleigh and Exponential mixture), and per phase
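A minimal sketch of how a Rayleigh and exponential mixture could spread the expected number of SFs over time. It assumes the cumulative number at time t is the Model 1 total multiplied by a weighted sum of the two cumulative distribution functions. The mixture weight, the Rayleigh scale, the exponential rate and the total are hypothetical values, not fitted results from the study.

```python
from math import exp


def rayleigh_cdf(t, sigma):
    """Rayleigh CDF: 1 - exp(-t^2 / (2 sigma^2))."""
    return 1.0 - exp(-(t ** 2) / (2.0 * sigma ** 2))


def exponential_cdf(t, rate):
    """Exponential CDF: 1 - exp(-rate * t)."""
    return 1.0 - exp(-rate * t)


def expected_sf_by_time(t, n_total, weight, sigma, rate):
    """Expected cumulative number of SFs revealed by time t (mixture of both laws)."""
    mix = weight * rayleigh_cdf(t, sigma) + (1.0 - weight) * exponential_cdf(t, rate)
    return n_total * mix


# Hypothetical parameters: 3 SFs expected in total, time in years since launch.
for year in (1, 2, 5, 10, 15):
    n = expected_sf_by_time(year, n_total=3.0, weight=0.6, sigma=1.5, rate=0.2)
    print(year, round(n, 2))
```

The Rayleigh term concentrates SFs early in life and then dies out, while the exponential term provides a slowly decaying tail over the rest of the mission.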
6 – Model 3
INPUTS
• Launched S/C features
• In‐orbit SFs
• New S/C features
• Model 2 outputs
• Test on‐ground feedback
Basis
• SMERFS^3 tool for development
• Reliability growth
• Rayleigh / Exponential
OUTPUTS
• Residual number of SFs per Severity level in “real time” along the S/C development phase
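The slides do not state which reliability-growth model was applied. As an illustration only, the sketch below uses the classical Goel-Okumoto form, in which the expected number of defects found by time t is a(1 - exp(-bt)) and the residual number is a·exp(-bt). The total a is taken as known (for example from an earlier prediction), and b is calibrated from the on‐ground test feedback. All names and figures are hypothetical.

```python
from math import exp, log


def calibrate_detection_rate(found_so_far, total_expected, elapsed):
    """Solve a * (1 - exp(-b * t)) = found for b, with a taken as known."""
    return -log(1.0 - found_so_far / total_expected) / elapsed


def residual_defects(total_expected, rate, t):
    """Expected number of defects still undetected at time t: a * exp(-b * t)."""
    return total_expected * exp(-rate * t)


# Hypothetical campaign: 20 latent defects expected, 12 found after 10 months of tests.
a = 20.0
b = calibrate_detection_rate(found_so_far=12, total_expected=a, elapsed=10.0)
for month in (10, 14, 18):
    print(month, round(residual_defects(a, b, month), 2))
```

Re-running the calibration each time new test feedback arrives gives the "real time" view of the residual number along the development phase.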
7 ‐ Outputs
• SF modelling (statistical approach in 3 steps)
• Human Error survey per domain: nuclear, aeronautics, automotive…
• SW reliability models analysis
• Open Source tools: R, OpenTURNS…
• RIDE+ Study outputs
– TN#1
– MiniTool (Excel-based)
– User’s Manual
– Use Case
– Final report
8 – Study Conclusion
• Most SFs are caused by Human Error
• There is no Human Error probability quantification except in the nuclear domain, where it is limited to operations & maintenance only
• Generally, Human Error during the Design phase is not formally addressed
• SF modelling is based on regression
• Need for in-orbit feedback (> 50 S/C, depending on the number of drivers to be considered!)
• Rayleigh fit is quite good
• SMERFS use is questionable (pertinence, tool maturity …)