1 Advances in methods for uncertainty and sensitivity analysis Nicolas Devictor CEA Nuclear Energy...

28
1 Advances in methods for uncertainty and sensitivity analysis Nicolas Devictor CEA Nuclear Energy Division [email protected] in co-operation with: Nadia PEROT, Michel MARQUES and Bertrand IOOSS (CEA) Julien JACQUES (INRIA Rhône-Alpes, PhD student), Christian LAVERGNE (Montpellier 2 University & INRIA). International Workshop on level 2 PSA and Severe Accident Management Koln, Germany, March 2004

Transcript of 1 Advances in methods for uncertainty and sensitivity analysis Nicolas Devictor CEA Nuclear Energy...

1

Advances in methods for uncertainty and sensitivity analysis

Nicolas DevictorCEA Nuclear Energy Division

[email protected]

in co-operation with:Nadia PEROT, Michel MARQUES and Bertrand IOOSS (CEA)

Julien JACQUES (INRIA Rhône-Alpes, PhD student),Christian LAVERGNE (Montpellier 2 University & INRIA).

International Workshop on level 2 PSA and Severe Accident Management Koln, Germany, March 2004

March 2004 2OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Introduction (1/2)•In the framework of the study of the influence of uncertainties on the results of

severe accidents computer codes, and then on results of Level 2 PSA (responses, hierarchy of important inputs…)

•Why taken account uncertainty ?– A lot of sources of uncertainty ;– To show explicitly and tracebly their impact decision process that could be

robust against uncertainties.

•Probabilistic framework is one of the tools for a coherent and rational treatment of uncertainties in a decision-making process.

•Some applications of treatment of uncertainty by probabilistic methods– For a best understanding of a phenomenon

• To evaluate the most influential input variables. To steer R&D. – For an improvement of a modelling or a code

• Calibration, Qualification…– In a risk decision-making process

• Hierarchy of contributors interest for actions to reduce uncertainty or to define a mitigation mean (for example a SAM measure)

• Confidence intervals or probabilistic density functions or margins…•In any analysis, we must keep in mind the choice in modelling and the

assumptions.– Case : a variable has a big influence on the response variability, but we have

a low confidence on his value…

March 2004 3OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Sources of uncertainties

Real phenomenon

Theory

Human understandingSimplified model…

Equations

« mathematics »

Code

Numerical schemesConvergence criteria

Input variables

Model parameters

OutputMeaning ?

Variability ?

March 2004 4OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Introduction (2/2)•A lot of methods exist, but these methods are often not suitable, from a

theoretical point of view, when– the phenomena that are modelled by the computer code are discontinuous

in the variation range of influent parameters;

– input variables are statistically dependent.

•For an overview of the method see paper

•The talk will mainly speak about:– Sensitivity analysis in the case of dependent input variables.– The validation of response surfaces.– The estimation of the additional error that is introduced by the use of a

response surface on the results of the uncertainty and sensitivity analysis. – Clustering methods, that could be useful when we want apply statistical

methods based on Monte-Carlo simulation.

March 2004 5OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Uncertainties on: - physical variables,- model parameters,- models…

Uncertainty on outputs Probability Y > Ytarget

Most influential variables

Process (codes,experiences…)

Inputs for the studyProbabilistic models of the uncertainties on physical variables and parameters ;Mathematical model of the ageing or failure phenomenon ;Acceptance criterion

Propagation of uncertaintiesProbability to exceed athresholdSensitivity analysis

(in this talk) “influence of uncertainties” means:

March 2004 6OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Sensitivity analysesy = f(x1, … , xp) (where y could be a probability)

•1st Question : what is the impact of a variation of the value of an input variable on the value of the response Y ?

– Gradient, differential analysis– Often deterministic approach

•2nd Question : what is the part of the variance of Y that comes from the variance of Xi (or a set {Xi}) ?

– Usual sensitivity indices• Pearson’s correlation coefficient, Spearman’s correlation coefficient,

Coefficients from a linear regression, PRCC…– In the case of non linear or non monotonous : Sobol’s method or FAST

• with very time consuming code ( use of response surface),• problems with correlated uncertainties.

– All these indices are defined under the assumptions that the variables inputs are satistically independent.

YVXYEV

March 2004 7OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Sensitivity analyses – dependent inputs•The problem of sensitivity analysis for model with dependant inputs is a real one,

and concerns the interpretation of sensitivity indices values.

•Inputs are statistically independent the sum of these sensitivity indices = 1. •Inputs are statistically dependent

– the terms of model function decomposition (Sobol’s method) are not orthogonal, so it appears a new term in the variance decomposition.

the sum of all order sensitivity indices is not equal to 1. – Effectively, variabilities of two correlated variables are linked, and so when

we quantify sensitivity to one of this two variables we quantify too a part of sensitivity to the other variable. And so, in sensitivity indices of two variables the same information is taken into account several times, and sum of all indices is thus greatest than 1.

•We have studied the natural idea: to define multidimensional sensitivity indices for groups of correlated variables.

– We can also define higher order indices and total sensitivity indices.– If all input variables are independent, those sensitivity indices are the same

than in case of independant variables. – The assessment is often time consuming (extension of Sobol’s method)

some computational improvements are in progress and very promising.

YVXYEV

Sj

j/

March 2004 8OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Response surface method

•Interest for a response surface (or meta-model or surrogated model):

– Good capability in approximation (study on the training sample) ;

– Good capability in prediction ;

– Low CPU time for a calculation.

•Data needed in a Response Surface Method (RSM) :– a training sample D of points (x(i), z(i)), where P(X,Z) the probability law of

the random vector (X,Z) (unknown in practice) ;– a family F of function f(x,c), where c is either a parameter vector or a index

vector that identifies the different elements of F.

•The best function in the family F is then the function f0 that minimized a

risk function :

•In practice, often use of an empirical risk function :

ydPcfzLfR ,,, xx

N

iE ifiz

NfR

1

2,1

cx

March 2004 9OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Examples of response surface•Polynomial models

•Generalized Linear Models (GLM) – Regression models (assumption : continuous function).– Other possibility : discriminant function (logit, probit models).– Qualitative and quantitative inputs.

•Thin plate spline– Regression models (assumption : continuous function).

•PLS (Partial Least Squares)– Regression models (assumption : continuous function).– Qualitative and quantitative inputs.

•Neural networks– Regression models (assumption : continuous function).– Other possibility : discriminant function (logit, probit models).

•A simplified « physical » model (3D 1D, …)

March 2004 10OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

With regard to the validation step• The characteristic « good approximation » is subjective and

depends on the use of the response surface. – What is the future use of the built response surface ?

– What are the constraints that are forced by the use ?

– How to define the validity domain of a response surface ?

• Calibration, modelling, prediction, probability computation…

• Specific criteria in the decision making process

– Conservatism / A bound on the remainder / Better accuracy in a interest area (distribution tail…).

• How defines the expected accuracy ?– Ratio “residual deviance / null deviance” ?

– Calibration : representativeness of the most influential parameters,

– Prediction : robustness : bias/variance compromise,

– The quality of the response surface should be compatible with the accuracy of the studied code.

March 2004 11OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Validation of a response surfaceStatistics(often under assumptions like Gauss-Markov assumptions…)

– Variance analysis– Estimator of the variance ²– R² statistics– Confidence area 1- for coefficients c...

Prediction : test base (bias), cross validation

Bootstrap method– to improve the

estimation of the bias between learning and generalization error,

– to estimate the sensitivity of the trained model f in relation to available data.

Comparison of results – Pdf of the output,

Confidence interval…

Values computed by the function g

Values computed by a polynomial response

surface

6,6 6,9 7,2 7,5 7,8 8,1 8,4 6,8

7,1

7,4

7,7

8

8,3

0,0000

0,2000

0,4000

0,6000

0,8000

1,0000

1,2000

1,4 000

30 40

50 60

70 80

90

Database size

Mean

Standard deviation

Minimum

Maximum

March 2004 12OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Example : The direct containment heating (DCH)•In the framework of a contract with the PSA Level 2 project at IRSN (in 2000).•Code : RUPUICUV module of Escadre ( Model has changed since 2000)•The calculations have been performed with the in 2000. A database of 300

calculations is available. The inputs vectors for these calculations have been generated randomly in the variation domain.

•Responses– maximum pressure in the containment;– the presence of corium in the containment outside the reactor pit; it is a

discrete response with value 0 (no corium) or 1 (presence). •Inputs variables

– MCOR : mass of corium, uniformly distributed between 20 and 80 tons,– FZRO : fraction of oxyded Zr, uniformly distributed between 0,5 and 1,– PVES : primary pressure, uniformly distributed between 1 and 166 bars,– DIAM : break size, uniformly distributed between 1 cm and 1 m,– ACAV : section de passage dans le puits de cuve (varie entre 8 and 22 m 2 )– FRAC : fraction of corium directly ejected in the containment, uniformly

distributed between 0 and 1,– CDIS : discharge coefficient at the break, uniformly distributed between 0,1

and 0,9,– KFIT : adjustment parameter, uniformly distributed between 0,1 and 0,3,– HWAT : water height in the reactor pit, discrete random variable (0 or 3

meter)

March 2004 13OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Example : maximum pressure (1/2)

•Use of the empirical risk function

•Approximation capabilities : all the RS seems good

•Prediction capabilities :

– Non negligible residues

Response surface

Training set Test base Sensitivity indices comparison

Max. residue

Pdf comparison

RS statistics Pdf comparison RS statistics Partial correlations

Spearman correlations

(bars)

Polynôme degré 2

Similar R²= 90,3%

CC=0,95

Similar R²= 86,8%

CC=0,93

Very good Very good -2,94 ;

+ 1,83

Thin plate spline =3

Interpolation Interpolation Similar R²= 82,2%

CC=0,90

Not same hierarchy

Not same hierarchy

-1,80 ;

+ 2,81

RN-1 Similar R²=97,0 %

CC=0,98

Similar R²= 74,7%

CC=0,86

Not same hierarchy

Not same hierarchy

-2,81 ;

+ 3,68

Mean of 3 RN

Similar R²= 97,3%

CC=0,98

Similar R²= 82,3%

CC=0,91

Very good Very good -1,81 ;

+ 2,70

GLM3 Similar R²= 91,1%

CC=0,95

Similar R²= 87,9%

CC=0,94

Very good Very good -2,99 ;

+ 1,52

March 2004 14OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Example : maximum pressure (2/2)

Box-and-Whisker Plot

0 2 4 6 8 10

po2

Pression

Variablespo2Pression

Density Traces

0 2 4 6 8 100

0,04

0,08

0,12

0,16

0,2

dens

ity

Plot of Fitted Model

0 2 4 6 8 10

Pression

0

2

4

6

8

10

po2

Training sample Test sample

Box-and-Whisker Plot

0 2 4 6 8 10

po2

Pression

Variablespo2Pression

Density Traces

0 2 4 6 8 100

0,04

0,08

0,12

0,16

0,2

0,24

dens

ity

Plot of Fitted Model

0 2 4 6 8 10

Pression

0

2

4

6

8

10

po2

March 2004 15OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

About the impact of response surface « error » •Use of a RS in an UASA a bias or an error on the results of the uncertainty and

sensitivity analysis.•Usual questions are:

– What is the impact of this “error” on the results of an uncertainty and sensitivity analysis made on a response surface?

– Can we deduce results on the “true” function from results obtained from a response surface?

“residual function” (x1, …, xp) = RS(x1, …, xp) - f(x1, …, xp)

•Assume that all Xi are independent, and sensitivity analysis have been done on the two function RS and , and we note SRS,i and S,i the computed sensitivity analysis.

V(E(f(X1, …, Xp)/Xi)) from

SRS,i and S,i is: 

•Problem of the computation of the covariance term generally impossible to deduce results on the “true” function from results obtained from a RS.

•Only cases where results can be deduce are :– SR is a truncated model obtained from a decomposition in a orthogonal basis; is not very sensitive of the variables X1, …, Xp

SSR,i / (V((x1, …, xp))+V(SR(x1, …, xp)))

p

ipip

p

pi

p

piSRif

XXfV

XXXEXXXRSE

XXfV

XXVS

XXfV

XXRSVSS

,,

,,,,,cov2

,,

,,

,,

,,

1

11

1

1,

1

1,,

March 2004 16OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Discontinuous model •No usual response surface family is suitable.•In practice, discontinuous behaviour means generally that more than one

physical phenomenon is implemented in the code. •To avoid misleading in interpretation of results of uncertainty and

sensitivity analysis, discriminant analysis should be used to define areas where the function is continuous. Analysis are led on each continuous area.

•Possible methods:– neural networks with sigmoid activation function,– GLM models with a logit link or logistic regression,– Vector support machine – Decision tree,

and variants like random forest…

•Practical problems are often encountered if the sample is « linearly separable ».

•Support vector machines and methods based on Decision Trees are very promising for that case.

1st « continuous » set

2nd « continuous » set

March 2004 17OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Example : presence of corium in the containment•First tool generalized linear model with a logit link.

– It exists always a model that explains 100% of the dispersion of the results for the training set.

– But there is some drawbacks :• the list of the terms that are statistically significant varies strongly with

the training set;• the prediction error is around 20%.

•Use of neural networks similar problems.

•Other methods SVM, decision trees and random forest

•Conclusion (for that example)– The most efficient method is the Random Forest method.– The methods J48 and Random Forest are faster than the algorithms based

on optimisation step (like Naïve Bayes, SVM, Neural Network…).– The principle of decision trees and random forest is simple and based on the

building of a set of logical combination of decision rules. They are often very readable, and have very prediction capabilities (like shown by the example).

March 2004 18OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Example : presence of corium in the containment Naive Bayes SVM SMO J48 Random Forest Description of the method

Optimisation of the classification probability in a

Bayesian framework

under Gaussian assumptions

Support vector machine method

with SMO algorithm for

the optimisation step

Decision Tree method also named C4.5 algorithm.

Algorithm based on a set of

decision trees predictor.

Method parameters -D -C2-E2 -S -I40 -G0.01 -C0.25 -K0 -A1000003 M2 -S1 -T0.001 -A -P10-10 -N0 -R -M -V-1 -W 1 Training set Badly classified 7.5% 8.5% 5% 0% Error matrix 10 11 12 15 17 10 27 0 5 168 2 171 0 173 0 173 Test set Badly classified 9% 9% 7% 5% Error matrix 11 3 7 7 11 3 10 4 6 80 2 84 4 82 1 85 Cross validation (300 calculations)

Badly classified 8.67% 9.33% 9.67% 7.33% Error matrix 26 15 18 23 21 20 25 16 11 248 5 254 9 250 6 253

A more global indicator of the quality (approximation + prediction capabilities) of the model is obtained by cross validation method.

March 2004 19OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Conclusions

•A lot of methods exist for UASA in the framework of level 2 PSA and severe accident codes.

•As these methods are often not suitable, from a theoretical point of view, when

– the phenomena that are modelled by the computer code are discontinuous in the variation range of influent parameters;

– input variables are statistically dependent,

new results and ideas to overcome these problems have been described in the paper.

•Practical interest of these “new” methods should be confirmed, by application on « real » problems.

March 2004 20OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

March 2004 21OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

• Probability distribution– Simulation + fit + statistical tests (asymptotical)

• First statistical moments– Statistics on a sample (convergence, Bootstrap)– Approximation of the standard deviation

• Confidence interval– From the density function– Wilks formula

MYmPP

111 NN N

2/1

1 1

22

1 ;cov2, ,

n

i

n

ijji

jiii

in xxxxxsxxx

ffffs

Response uncertainty

March 2004 22OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Monte-Carlo Simulations

–Variance reduction methods: conditional MC, stratified MC, Hypercube Latin

–More suitable for the computation of a probability : importance sampling, directional simulation

–Practical problem with very time consuming codeResponse surface

March 2004 23OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

FORM/SORM Methods

• Probabilistic transformation Z U

(Ui is N(0,1)-distributed and are independents)

• In U-space, a new failure surface G(U)=H(T(Z))=0

• Design point and Hasofer-Lind index U*

• FORM approximation

• SORM approximation (Breitung)

• Sensitivity factors

HLG u

tu u

min0

F HLP

iHL

iu

*

0

Failuredomain

U1

U2

U*

G(U) = 0

HL

Safedomain

n

iiHLHLFP

1

211

March 2004 24OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

FORM – simple case

•Ramdom variables : N(0,1)-distributed and are independents

•Limit state function : hyper plane

0U1

U2

P*

G(U) = 0

HL

Domaine dedéfaillance

f HL

HL

P ob Y

Pr 1

March 2004 25OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Validation of the FORM/SORM results

• Sets of results : FORM, SORM, Conditional importance sampling, etc.• Comparison of FORM, SORM and Conditional Importance Sampling (CIS)

results– Coherence of all these results ?

• If yes, a good confidence is obtained in FORM result and geometrical assumption of FORM method.

– Coherence of FORM and CIS results ? • If yes, a good confidence is obtained in FORM result and the

geometrical assumption of FORM method.– Coherence of SORM and CIS results ?

• If yes, a good confidence is obtained in SORM result, and the geometrical assumption of FORM method is false.

– If no coherence • Geometrical assumptions for FORM and SORM are false. • Existence of other minima ?• Monte-Carlo simulation or a variance reduction method (with or

without a response surface).• New tests have been developed to check that the computed minimum is a global

minimum (non negligible costs).

March 2004 26OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Conditional importance sampling

0

Domaine dedéfaillance

U

2

1

U

P*

G(u)=0

March 2004 27OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Simulations FORM/SORM RESULTS

FAILURE PROBABILITY ERROR ON THE ESTIMATION PROBABILITY DISTRIBUTION OF THE RESPONSE

ASSUMPTIONS

NO ASSUMPTION ON THE RANDOM VARIABLES (DISCRETE, CONTINOUS, DEPENDANCY…) NO ASSUMPTION ON THE LIMIT STATE FUNCTION

DRAWBACKS

COMPUTATION COSTS (depends on the probability level)

RESULTS

FAILURE PROBABILITY MOST INFLUENTIAL VARIABLES (PROBABILITY) EFFICIENCY (depends on the number of random variables)

ASSUMPTIONS

CONTINOUS RANDOM VARIABLES CONTINOUS LIMIT STATE FUNCTION

DRAWBACKS

NO ERROR ON THE ESTIMATION GLOBAL MINIMUM

Comparison of methods

March 2004 28OCDE/NEA/CSNI WGRISK WS on Level 2 PSA and SAM CEA/Cadarache - Plant Operation and Reliability Laboratory

Examples of response surface

•Polynomial models•Generalized Linear Models (GLM)

– Regression models (assumption : continuous function).

– Other possibility : discriminant function (logit, probit models).

– Qualitative and quantitative variables.

•Thin plate spline– Regression models (assumption : continuous function).

– Qualitative (if 2 factors) and quantitative variables.

•PLS (Partial Least Squares)– Regression models (assumption : continuous function).

– Qualitative and quantitative variables.

•Neural networks– Regression models (assumption : continuous function).

– Other possibility : discriminant function (logit, probit models).

– Qualitative (if 2 factors) and quantitative variables.