Risk-Informed Verification Planning

FADAT 2, 2013, Noordwijk, Netherlands

This document and its content is the property of Astrium [Ltd/SAS/GmbH] and is strictly confidential. It shall not be communicated to any third party without the written consent of Astrium [Ltd/SAS/GmbH].

2013/06/20

RAMS Exploitation of In‐Orbit Data: RIDE+ Study

Astrium Satellites

JF GAJEWSKI, Astrium Satellites



Summary

1. Introduction

2. Failure events

3. Definitions of failures

4. Random Failures

5. Systematic Failures

6. SF Modelling

7. Outputs of the study

8. Conclusion



1‐ Introduction

RIDE+ is a CCN to the RIDE1 study “RAMS exploitation of in‐orbit data” (contract n°21167/07/NL/EM)

RIDE1 objective (reminder)

to enhance the efficiency and the pertinence of the RAMS analyses and associated failure risk management by introducing a feedback loop between in‐orbit operational events (failures, …) recorded in ESTEC’s and ESOC’s operational database and ESA RAMS engineering, risk assessment and engineering process.

RIDE+ objective

part of the Risk Informed Planning initiative

to develop a reliability prediction model taking into account Random Failures as well as Systematic Failures

to derive additional user requirements for the RIDE tool



2‐ Failure events

Classification of the FAILURE EVENTS

Anomaly / Mishap

Failure or Undesired Outcome

Fault or error

Initiating Event or Proximate Cause

Contributing Factor


Source : Astrium satellites



2‐ Failure events

* In the case the Rad / SEU predictions are not in accordance with reality, the fault is a design error

** In the case the wear‐out occurs before the end of the mission lifetime, i.e. lifetime qualification is insufficient, the fault is considered a design error



3‐ Definitions of failures

RANDOM FAILURE (RF)

A failure is considered “RANDOM” when it is due to an intrinsic defect such that:

Defect relates to a physical defect of an elementary HW part

Probability of occurrence is LOW (less than 1% of the parts within a batch may be affected)

Defect probability remains compliant with the part procurement standards and the reliability prediction

Physical root cause (defect) is either not identified or considered a ‘one‐off’ after investigations

RFs from one S/C to another are always mutually independent

Some attributes

Always a HW failure

Leads to the failure of an equipment, whether electrical, mechanical or of any other nature

[Figure: STRESS versus STRENGTH distributions]

The area where the stress exceeds the strength (i.e. failure) is defined by the programme assumptions (derating rules, parts selection)
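The stress versus strength picture can be made concrete with a small calculation. The sketch below assumes that stress and strength are independent and normally distributed, which the slide does not state; the function names and the numerical values are hypothetical and serve only as illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_reliability(mu_stress: float, sd_stress: float,
                             mu_strength: float, sd_strength: float) -> float:
    """R = P(stress < strength) for independent normal stress and strength.

    The margin (strength - stress) is itself normal, so R is the
    probability that the margin is positive.
    """
    margin_mean = mu_strength - mu_stress
    margin_sd = math.sqrt(sd_stress ** 2 + sd_strength ** 2)
    return normal_cdf(margin_mean / margin_sd)


# Hypothetical values in arbitrary units (not taken from the study)
r = interference_reliability(mu_stress=5.0, sd_stress=1.0,
                             mu_strength=9.0, sd_strength=1.0)
print(f"R = P(stress < strength) = {r:.5f}")
```

Moving the two means closer together, or widening either distribution, enlarges the overlap area and lowers R, which is the effect that derating rules and parts selection are meant to control.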




SYSTEMATIC FAILURE (SF)

A failure is considered “SYSTEMATIC” when it is due to an intrinsic defect such that:

Defect relates either to an elementary part defect or to a ‘system’ defect

Probability of occurrence is SIGNIFICANT (more than 1% of the parts within a batch may be affected)

Defect is not detected before launch by procurement & system tests on ground

Root cause (defect) is identified after investigations

SFs from one S/C to another may be dependent (same batch, similar conditions…)

Fault/Error: event, fact or action which is not in conformance with a planned procedure

Leads to a weakness (unknown consequences while respecting the planned actions)

Design or manufacturing error within an elementary part

HW defective part

SW bug

Design or manufacturing (Assembly & Test) error at system level

WO occurring below the specified lifetime (degradation, rad) is considered as a design error

SE with a frequency occurrence higher than the prediction is considered as a design error

Operations error (wrong procedure, erroneous data, …) at satellite level during operational life





SINGLE EVENT (SE)

A Single Event is an abnormal event due to

EEE parts sensitivity to Single Event Phenomenon (SEU, SET, …)

HW transient mishap

A SE is not permanent but may occur several times depending on the conditions

A SE is reversible (no physical damage on event occurrence)

An SE is counted when the probability of occurrence is compliant with the prediction (otherwise it is considered an SF)




WEAR‐OUT EVENT (WO)

An event is considered a “WEAR‐OUT” event when it results from a progressive degradation (up to the definitive failure) of the performances due to the applied lifetime stresses:

O/O cycling

Thermal cycling

Radiation effects (cumulated dose)

Wear‐out (mechanical part, monoatomic oxygen, …)

Only impacts physical parts

A WO is counted when occurring beyond the design lifetime (otherwise it is an SF), since ‘safe life’, as applied in space design, means that the limited‐lifetime items are qualified (by tests) over the specified lifetime
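The counting rules for Single Events and Wear‐Out events can be read as a small decision procedure. The sketch below is one possible reading, under two assumptions that are not in the slides: “compliant with the prediction” is taken to mean that the observed rate does not exceed the predicted rate, and “beyond the design lifetime” is taken as a strict comparison. The `Event` fields and the `classify` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str                            # "single_event" or "wear_out"
    observed_rate: float = 0.0           # single events: observed events per year
    predicted_rate: float = 0.0          # single events: predicted events per year
    time_of_event_years: float = 0.0     # wear-out: mission time at occurrence
    design_lifetime_years: float = 0.0   # wear-out: specified (qualified) lifetime


def classify(event: Event) -> str:
    """Return "SE", "WO" or "SF" following the counting rules of the slides."""
    if event.kind == "single_event":
        # Assumption: compliant means observed rate <= predicted rate
        return "SE" if event.observed_rate <= event.predicted_rate else "SF"
    if event.kind == "wear_out":
        # Wear-out before the end of the design lifetime is a design error (SF)
        if event.time_of_event_years > event.design_lifetime_years:
            return "WO"
        return "SF"
    raise ValueError(f"unknown event kind: {event.kind}")


# Hypothetical examples
print(classify(Event(kind="single_event", observed_rate=3.0, predicted_rate=1.0)))
print(classify(Event(kind="wear_out", time_of_event_years=16.0,
                     design_lifetime_years=15.0)))
```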



4 – Random failures

• Reliability prediction

– In time: system, units, parts

• Weibull, exponential distribution

• Data = standards (MIL‐HDBK‐217, UTE‐C‐80810, FIDES …), tests, in‐orbit feedback

– Stress / Strength (mechanical items mainly)

• R = P(stress < strength)

• Data = test results, design & manufacturing sizing

– Proportion (one‐shot device)

• R = P(success)

• Data = test results, design characteristics
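The “in time” and “proportion” predictions listed above can be sketched with the textbook reliability functions. The two‐parameter Weibull form R(t) = exp(−(t/η)^β) and the point estimate successes/trials are standard formulas rather than something quoted from the study; the parameter values below are hypothetical.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Two-parameter Weibull: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def exponential_reliability(t: float, failure_rate: float) -> float:
    """Constant failure rate: R(t) = exp(-failure_rate * t)."""
    return math.exp(-failure_rate * t)


def one_shot_reliability(successes: int, trials: int) -> float:
    """Point estimate of R = P(success) for a one-shot device."""
    return successes / trials


# Hypothetical unit: 500 FIT (failures per 1e9 hours) over a 15-year mission
mission_hours = 15 * 8760
print(exponential_reliability(mission_hours, 500e-9))

# With beta = 1 the Weibull law reduces to the exponential law
print(weibull_reliability(mission_hours, eta=1 / 500e-9, beta=1.0))

# Hypothetical test campaign: 198 successful firings out of 200
print(one_shot_reliability(198, 200))
```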

• Relates to 10% of the in‐orbit anomalies!

• Pessimistic versus operational reliability!

• There is a need for a new approach

• In‐orbit correlation

– Estimation (λ, R …)

– Corrective factor

– …

• New database (FIDES?)

– …



5 – Systematic Failure

• A Systematic Failure (SF) is a failure event that impacts

(a) the system itself or

(b) a part of the system (HW as well as SW)

and which is either recoverable (use of redundancy, new SW up‐load, workaround) or not (defect affecting both channels of redundancy, loss of spacecraft due to misuse), and is due to:

– Design Error

– Manufacturing Error

– Operator (or operations) Error

• This failure is called “systematic” as long as a deterministic physical defect or a deterministic cause is identified through investigations, which means the failure was predictable though not detected on‐ground or before any operational sequences.




• SFs are mainly due to Human Error (design / manufacturing / operation phase)

• HUMAN ERROR

Inappropriate or undesirable human decision or behaviour with potential impact on safety, dependability or system performance.

• About Human Error modelling (rate …)

– No real HE modelling during development phase

– Only the nuclear domain (TMI …) uses quantitative HE modelling (THERP = Technique for Human Error Rate Prediction …)

– For other domains only QUALITATIVE methods are used (in operations)

“However, quantitative assessments of the probabilities of crew or maintenance errors are not currently considered feasible. If the failure indications are considered to be recognizable and the required actions do not cause an excessive workload, then for the purposes of the analysis, the probability that the corrective action will be accomplished, can be considered to be one. If the necessary actions cannot be satisfactorily accomplished, the tasks and/or the systems need to be modified.” (CS‐25 Amendment 11)




NO PERTINENT HUMAN ERROR MODEL

CONTRIBUTING FACTORS (DRIVERS) to HE

• e.g. task complexity, skills, training …

• Classified as Low / Medium / High

• Defined per phase: Design / Manufacturing / Operation

• Basis for regression (in‐orbit feedback)



6 – SF Modelling

BASIS

• Drivers per phase contributing to HE

• Filters: Reviews / Tests / Validation

• Severity scale (impacts in‐orbit)

SF MODELLING

• Based on REGRESSION (drivers vs in‐orbit SFs)

• 3 models with increasing complexity



6 – Model 1

INPUTS

• Launched S/C features

• In‐orbit observed SFs

• New S/C features

Basis: pure regression

OUTPUTS

• Expected number of SFs per severity level for the new S/C, based on regression (in‐orbit feedback vs drivers)
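A minimal sketch of the Model 1 idea follows: regress the SF counts observed on launched S/C against their driver levels, then evaluate the fit for a new S/C. Everything specific here is an assumption made for illustration: the 1/2/3 scoring of Low / Medium / High, the summing of drivers into one score, the ordinary least squares fit, and the data. The slides do not say which regression form the study used, and a real application would be run per severity level.

```python
LEVEL_SCORE = {"Low": 1, "Medium": 2, "High": 3}  # assumed ordinal coding


def driver_score(levels):
    """Aggregate driver levels into one number (assumption: simple sum)."""
    return sum(LEVEL_SCORE[level] for level in levels)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical launched S/C: (driver levels, observed in-orbit SFs of one severity)
launched = [
    (["Low", "Low", "Medium"], 2),
    (["Medium", "Medium", "Medium"], 4),
    (["High", "Medium", "High"], 6),
    (["High", "High", "High"], 7),
]

xs = [driver_score(levels) for levels, _ in launched]
ys = [count for _, count in launched]
intercept, slope = fit_line(xs, ys)

# Hypothetical new S/C; the prediction is clipped at zero since a count cannot be negative
new_sc_levels = ["Medium", "High", "Medium"]
expected_sfs = max(0.0, intercept + slope * driver_score(new_sc_levels))
print(f"expected SFs for the new S/C: {expected_sfs:.2f}")
```

With only one aggregated score, four data points are enough for a fit; with one coefficient per driver the number of launched S/C needed grows accordingly, and a count model such as Poisson regression may be a better choice than a straight line.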



6 – Model 2

INPUTS

• Launched S/C features

• In‐orbit SFs

• New S/C features

• Model 1 outputs

Basis: regression, Rayleigh / Exponential fit

OUTPUTS

• Number of SFs per severity level in TIME (Rayleigh and Exponential mixture), and per phase
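One way to picture the Model 2 output is to spread a total expected number of SFs (for instance a Model 1 result) over mission time with a Rayleigh and exponential mixture. The parameterisation below, with a mixing weight and the cumulative distribution functions of both laws, is an assumption; the slides do not give the exact form or the fitted values, and all numbers are hypothetical.

```python
import math


def rayleigh_cdf(t: float, sigma: float) -> float:
    """Rayleigh CDF: 1 - exp(-t**2 / (2 * sigma**2))."""
    return 1.0 - math.exp(-(t * t) / (2.0 * sigma * sigma))


def exponential_cdf(t: float, rate: float) -> float:
    """Exponential CDF: 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


def expected_sfs_by_time(t: float, total: float, weight: float,
                         sigma: float, rate: float) -> float:
    """Expected cumulative number of SFs at time t.

    total  : expected number of SFs over an unlimited time horizon
    weight : share of SFs following the Rayleigh law (0..1),
             the remainder follows the exponential law
    """
    mixture = (weight * rayleigh_cdf(t, sigma)
               + (1.0 - weight) * exponential_cdf(t, rate))
    return total * mixture


# Hypothetical parameters, time in years
for year in (0, 1, 2, 5, 10, 15):
    n = expected_sfs_by_time(year, total=5.0, weight=0.6, sigma=2.0, rate=0.1)
    print(f"year {year:2d}: {n:.2f} SFs expected so far")
```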



6 – Model 3

INPUTS

• Launched S/C features

• In‐orbit SFs

• New S/C features

• Model 2 outputs

• Test on‐ground feedback

Basis: SMERFS^3 tool for development, reliability growth, Rayleigh / Exponential

OUTPUTS

• Residual number of SFs per severity level in “real time” along the S/C development phase
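The notion of a residual number of SFs during development can be illustrated with a generic reliability growth curve. The sketch below uses the Goel‐Okumoto form m(t) = a(1 − e^(−bt)) purely as an example; the slides do not say which growth model was applied, and the parameters `a` (total expected SFs) and `b` (detection rate during ground testing) are hypothetical.

```python
import math


def expected_detected(t: float, a: float, b: float) -> float:
    """Goel-Okumoto mean value function: defects detected by test time t."""
    return a * (1.0 - math.exp(-b * t))


def expected_residual(t: float, a: float, b: float) -> float:
    """Defects expected to remain undetected after test time t."""
    return a * math.exp(-b * t)


# Hypothetical: 8 SFs expected in total, detection rate 0.15 per month of testing
a, b = 8.0, 0.15
for month in (0, 6, 12, 24):
    print(f"month {month:2d}: detected {expected_detected(month, a, b):.2f}, "
          f"residual {expected_residual(month, a, b):.2f}")
```

Detected and residual counts always sum to `a`, so updating `a` and `b` as on‐ground test feedback arrives gives a running estimate of what is likely to be left at launch.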



7 ‐ Outputs

• SF modelling (statistical approach in 3 steps)

• Human Error survey per domain : nuclear, aeronautics, automotive…

• SW reliability models analysis

• Open Source tools: R, OpenTURNS …

• RIDE+ Study outputs

– TN#1

– MiniTool (Excel‐based)

– User’s Manual

– Use Case

– Final report



8 – Study Conclusion

• Most SFs are caused by Human Error

• There is no Human Error probability quantification except in the nuclear domain, and it is limited to operations & maintenance only

• Generally, Human Error during the Design phase is not formally addressed

• SF modelling is based on regression

• Need for in‐orbit feedback (> 50 S/C, dependent on the number of drivers to be considered!)

• Rayleigh fit is quite good

• SMERFS use is questionable (pertinence, tool maturity …)
