Global sensitivity analysis of computer models with functional inputs

B. Iooss (CEA Cadarache)

M. Ribatet (CEMAGREF Lyon)

Conference SAMO 2007, Budapest, Hungary

B. Iooss – SAMO 2007 - 22/06/07

Functional data ?

•The classical model is written Y = f(X), where Y is a scalar output variable and X is a vector of scalar input variables.

X is considered as a vector of random variables ⇒ Y is a random variable.

•The model with functional variables is written Y(v) = f(X1(u1), …, Xp(up)), where

– v and ui are parameters (scalar or multidimensional),

– Y(v) is an output function,

– Xi(ui) is an input function (possibly constant).

Examples for u and v: time t, spatial coordinates (x, y, z), temperature T, …

The Xi(ui) are considered as random functions ⇒ Y(v) is a random function.


An example of a functional input problem

First study:

•20 random input variables (permeability, porosity, Kd, …),

•20 scalar outputs (concentrations at piezometers),

•LH sample (N=300) ⇒ 300 model evaluations (3 days),

•Construction of metamodels,

•Global sensitivity analysis (Sobol) via the use of metamodels.

Result: the permeability of the second layer is the most influential variable.
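A Latin hypercube (LH) design of the size used in this first study can be sketched as follows. This is only an illustration on the unit hypercube with 300 points and 20 inputs: the function name `latin_hypercube` and the seed are assumptions, and the actual input distributions of the study are not reproduced.

```python
import random


def latin_hypercube(n_points, n_dims, rng):
    """Return n_points rows in [0, 1)^n_dims, one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_points equal-width strata.
        column = [(k + rng.random()) / n_points for k in range(n_points)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per model evaluation.
    return [list(row) for row in zip(*columns)]


rng = random.Random(0)
design = latin_hypercube(300, 20, rng)
```

Each row of `design` would then be mapped to the physical input distributions (for instance through their inverse cumulative distribution functions) before running the code.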

[Figure: concentration maps for August 2002 and December 2010.]

Pollutant (90Sr) transport simulation in porous media [Volkova et al., SERRA 07]


Second study :

• We want to take into account the spatial heterogeneity of the permeability.

• We represent it by a random field ε(x,y).

Realisations of this random field are obtained via geostatistical simulation techniques.

Classical methods of global sensitivity analysis or metamodel construction are no longer applicable.

[Figure: 2 possible realisations of the permeability field.]


Some recent works (not exhaustive)

Functional input:

– Tarantola et al., SERRA 02: environmental assessment problem.

Some inputs represent the errors in spatially distributed maps (random fields), obtained by simulations.

– Ruffo et al., RESS 06: hydrocarbon exploration risk evaluation.

The basin and petroleum system models are very complex random fields.

The authors treat one scenario variable (32 basin models) as a categorical variable.

– Zabalza-Mezghani et al., JPSE 04: hydrocarbon production optimization.

The random field is considered as an uncontrollable input variable ("stochastic uncertainty parameter").

The other scalar inputs are the controllable variables.


Our problem and some possible solutions

Compute the Sobol indices when some input variables are functional.

•Complete discretization: infeasible (several thousands of parameters).

•Expansion in an appropriate function basis: impracticable in some cases (e.g. if the functional input is a temporal white noise).

•Consider the functional input as a single multidimensional parameter.

Multidimensional sensitivity indices (Sobol, MCS 01; Jacques et al., RESS 06) are estimated via algorithms which use independent samples (simple Monte Carlo).

FAST, RBD and quasi-MC methods are not applicable.

•Replace the functional input by a scalar parameter distributed as U[0,1]: it governs whether or not the functional input is simulated (Tarantola et al., SERRA 02).

The Sobol index of this scalar parameter can be calculated by any method. It quantifies the sensitivity of the output to the presence/absence of the functional input, but not to its variability.
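The "single multidimensional parameter" option can be illustrated with a pick-freeze Monte Carlo estimator, which only needs independent redraws of the whole functional input. The sketch below is a toy case under stated assumptions: the model `model`, the input distributions, the white-noise length of 20 and all function names are hypothetical and do not come from the cited papers.

```python
import math
import random


def model(x1, x2, eps):
    """Toy code (an assumption): eps is a discretised white noise acting through its scaled mean."""
    z = sum(eps) / math.sqrt(len(eps))  # ~ N(0, 1) for standard Gaussian white noise
    return x1 + x2 * z


def draw_scalars(rng):
    """Scalar inputs: X1 ~ U[-1, 1], X2 ~ U[0, 2] (assumed for the toy case)."""
    return rng.uniform(-1.0, 1.0), rng.uniform(0.0, 2.0)


def draw_field(rng, size=20):
    """One realisation of the functional input, redrawn as a whole block."""
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def covariance(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def block_indices(n, rng):
    """Pick-freeze estimates of the first-order and total indices of the whole field."""
    y, y_same_eps, y_same_x = [], [], []
    for _ in range(n):
        x, x_new = draw_scalars(rng), draw_scalars(rng)
        eps, eps_new = draw_field(rng), draw_field(rng)
        y.append(model(x[0], x[1], eps))
        y_same_eps.append(model(x_new[0], x_new[1], eps))  # field frozen, scalars redrawn
        y_same_x.append(model(x[0], x[1], eps_new))        # scalars frozen, field redrawn
    var_y = covariance(y, y)
    first_order = covariance(y, y_same_eps) / var_y
    total = 1.0 - covariance(y, y_same_x) / var_y
    return first_order, total


s_eps, st_eps = block_indices(20000, random.Random(1))
```

For this toy model the indices can be worked out by hand (first-order 0.6, total 0.8), which gives a quick sanity check of the estimator. Each estimate costs two extra code runs per sample point, which is why this route quickly becomes too expensive for a time-consuming code.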


Moreover, in our case, we need metamodels

We deal with complex computer codes: nonlinear effects, time-consuming runs, large number of inputs (>10).

The Sobol indices cannot be estimated by running the code directly; an intermediate metamodel is required.

Zabalza-Mezghani et al., JPSE 04, propose to consider the functional input as an uncontrollable parameter.

With scalar inputs X and functional input ε(u), the metamodel splits into a mean component E(Y|X) and a variance component Var(Y|X).

Uncertainty propagation is then carried out via this joint model.

[Figure: Y plotted against X, showing the mean curve E(Y|X) and the band E(Y|X) ± σ(Y|X).]


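Sobol indices of the joint model

$$\mathrm{Var}[Y(X,\varepsilon)] = \mathrm{Var}[E(Y|X)] + E[\mathrm{Var}(Y|X)] = \mathrm{Var}[Y_m(X)] + E[Y_d(X)]$$

Variance decomposition of Y:

$$\mathrm{Var}(Y) = \sum_{i=1}^{p} V_i(Y) + \sum_{i<j}^{p} V_{ij}(Y) + \dots + V_{1 \dots p}(Y) + E(Y_d)$$

Variance decomposition of Ym:

$$\mathrm{Var}(Y_m) = \sum_{i=1}^{p} V_i(Y_m) + \sum_{i<j}^{p} V_{ij}(Y_m) + \dots + V_{1 \dots p}(Y_m)$$

Then, Sobol indices of X on Y are obtained by:

$$S_{X_i} = \frac{\mathrm{Var}[E(Y_m|X_i)]}{\mathrm{Var}(Y)}$$

E[Yd(X)] contains all the terms including effects of ε.

Total Sobol index of ε:

$$S_{T_\varepsilon} = \frac{E(Y_d)}{\mathrm{Var}(Y)}$$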
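These formulas can be checked numerically on a case where Ym and Yd are known in closed form. The sketch below takes the Ishigami function with X3 playing the role of the uncontrollable input, so that Ym = E(Y|X1,X2) and Yd = Var(Y|X1,X2) can be written exactly; in a real study both components would be fitted metamodels. The function names and the pick-freeze estimation of Var[E(Ym|Xi)] are choices made for this illustration.

```python
import math
import random

A, B = 7.0, 0.1  # Ishigami coefficients


def y_mean(x1, x2):
    """Ym(X) = E(Y | X1, X2), using E(X3^4) = pi^4 / 5 for X3 ~ U[-pi, pi]."""
    return math.sin(x1) * (1.0 + B * math.pi ** 4 / 5.0) + A * math.sin(x2) ** 2


def y_disp(x1, x2):
    """Yd(X) = Var(Y | X1, X2) = B^2 sin^2(X1) Var(X3^4)."""
    var_x3_pow4 = math.pi ** 8 / 9.0 - (math.pi ** 4 / 5.0) ** 2
    return (B * math.sin(x1)) ** 2 * var_x3_pow4


def covariance(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def joint_model_indices(n, rng):
    """Return (S_X1, S_X2, S_T_eps) computed from the mean and dispersion components."""
    ym, ym_x1_frozen, ym_x2_frozen, yd = [], [], [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi)
        x1_new, x2_new = rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi)
        ym.append(y_mean(x1, x2))
        ym_x1_frozen.append(y_mean(x1, x2_new))  # same X1, fresh X2
        ym_x2_frozen.append(y_mean(x1_new, x2))  # same X2, fresh X1
        yd.append(y_disp(x1, x2))
    mean_yd = sum(yd) / n
    var_y = covariance(ym, ym) + mean_yd  # Var(Y) = Var(Ym) + E(Yd)
    s_x1 = covariance(ym, ym_x1_frozen) / var_y  # Var[E(Ym|X1)] / Var(Y)
    s_x2 = covariance(ym, ym_x2_frozen) / var_y  # Var[E(Ym|X2)] / Var(Y)
    s_t_eps = mean_yd / var_y                    # E(Yd) / Var(Y)
    return s_x1, s_x2, s_t_eps


s_x1, s_x2, s_t_eps = joint_model_indices(50000, random.Random(2))
```

The three estimates should fall close to the exact Ishigami values S1 = 0.314, S2 = 0.442 and ST3 = 0.244. Note that the code itself is never run on X3: everything is obtained from the two components Ym and Yd.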


Modeling the mean Ym and dispersion Yd

Dual modeling by 2 polynomials (Taguchi 86; Vining & Myers, JQT 90).

Joint modeling by 2 Generalized Linear Models (McCullagh & Nelder 89):

– more general theoretical framework (exponential family distributions),

– models the mean and the variance simultaneously, through iterative fits,

– no replications needed (requires fewer computations).

Mean:

$$E(Y_i) = \mu_i, \quad \eta_i = g(\mu_i) = \sum_j x_{ij}\beta_j, \quad \mathrm{Var}(Y_i) = \phi_i\, v(\mu_i)$$

Dispersion:

$$E(d_i) = \phi_i, \quad \zeta_i = \log(\phi_i) = \sum_j u_{ij}\gamma_j, \quad \mathrm{Var}(d_i) = 2\phi_i^2$$

•For the dispersion d, we take the deviance contribution.

•Deviance analysis, Student and Fisher tests, residual analyses, … allow term selection and the choice of the functions g and v.
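The iterative fit of the two linked models can be sketched in a deliberately simplified setting: a Gaussian mean model with identity link and v(μ) = 1, a single covariate used in both components, and a gamma model with log link for the deviance contributions d_i = (y_i − μ_i)². This is an illustration of the alternating scheme only, not the algorithm of any particular package; the synthetic data, function names and iteration count are assumptions.

```python
import math
import random


def wls(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    return (sy - b * sx) / sw, b


def fit_joint(x, y, n_iter=25):
    """Alternate between the dispersion fit and the weighted mean fit."""
    ones = [1.0] * len(x)
    beta = wls(x, y, ones)  # start from ordinary least squares
    d = [(yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y)]
    gamma = (math.log(sum(d) / len(d)), 0.0)  # start from a constant dispersion
    for _ in range(n_iter):
        # Deviance contributions of the current Gaussian mean model.
        d = [(yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y)]
        # One Fisher-scoring step of the gamma GLM with log link:
        # unit weights, working response z = eta + (d - phi) / phi.
        eta = [gamma[0] + gamma[1] * xi for xi in x]
        z = [e + di / math.exp(e) - 1.0 for e, di in zip(eta, d)]
        gamma = wls(x, z, ones)
        # Refit the mean with prior weights 1 / phi_i.
        w = [math.exp(-(gamma[0] + gamma[1] * xi)) for xi in x]
        beta = wls(x, y, w)
    return beta, gamma


# Synthetic heteroscedastic data (assumed): mean 1 + 2x, log-variance -1 + 1.5x.
rng = random.Random(3)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
y = [1.0 + 2.0 * xi + math.sqrt(math.exp(-1.0 + 1.5 * xi)) * rng.gauss(0.0, 1.0) for xi in x]
beta, gamma = fit_joint(x, y)
```

On such data `beta` should come out near (1, 2) and `gamma` near (−1, 1.5). No replicated runs at the same x are needed: the dispersion model is informed only by the residuals of the mean model.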


Joint modeling with Generalized Additive Models

The drawback of GLM is its parametric form, which leads to limitations when modeling complex computer codes.

Replace it by popular nonparametric models: GAM (Hastie & Tibshirani).

Mean:

$$E(Y) = \mu; \quad g(\mu) = \sum_{i=1}^{p} s_i(X_i) + \sum_{i<j}^{p} s_{ij}(X_i, X_j)$$

Dispersion:

$$E(d) = \phi; \quad \log(\phi) = \sum_{i} s_i(U_i) + \sum_{i<j} s_{ij}(U_i, U_j)$$

The si's are obtained by fitting a smoother to the data: penalized regression splines (integrated model selection via Generalized Cross Validation).

Deviance analysis, statistical tests on coefficients, residual analyses, … allow term selection.

Compared to other metamodels (kriging, neural networks):

– GAM offers a direct interpretation of the model,

– the drawback lies in the additive effect hypothesis.


Simple example: Ishigami function with Xi ~ U[-π, π]

$$Y = \sin(X_1) + 7\sin^2(X_2) + 0.1\,X_3^4 \sin(X_1)$$

To test our joint models, X3 is considered as an uncontrollable input.

Models are fitted on 1e3 data. The predictivity coefficient Q2 is computed on 1e4 test data.
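The predictivity coefficient can be sketched as follows, assuming the usual definition Q2 = 1 − (sum of squared prediction errors) / (total sum of squares) on a test sample. To keep the example self-contained, the predictor used here is the exact conditional mean E(Y|X1,X2) rather than a fitted metamodel; the names `q2` and `mean_predictor` are hypothetical.

```python
import math
import random


def ishigami(x1, x2, x3):
    return math.sin(x1) + 7.0 * math.sin(x2) ** 2 + 0.1 * x3 ** 4 * math.sin(x1)


def q2(observed, predicted):
    """Predictivity coefficient: 1 - squared prediction error / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def mean_predictor(x1, x2):
    """Exact E(Y | X1, X2); stands in for a fitted mean component."""
    return math.sin(x1) * (1.0 + 0.1 * math.pi ** 4 / 5.0) + 7.0 * math.sin(x2) ** 2


rng = random.Random(4)
test_inputs = [[rng.uniform(-math.pi, math.pi) for _ in range(3)] for _ in range(10000)]
y_test = [ishigami(x1, x2, x3) for x1, x2, x3 in test_inputs]
y_pred = [mean_predictor(x1, x2) for x1, x2, _ in test_inputs]
q2_value = q2(y_test, y_pred)
```

Because this predictor ignores X3, its Q2 cannot be expected to exceed roughly 1 − ST3, that is about 76 % here: the part of the variance carried by the uncontrollable input is out of reach of any mean component.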

Joint GLM (Q2 = 61 %):

$$Y_m = 1.92 + 2.69 X_1 + 2.17 X_2^2 - 0.29 X_1^3 - 0.29 X_2^4 \quad \text{and} \quad Y_d = 5.7$$

Simple GAM (Q2 = 75 %):

$$Y = 3.76 - 2.67 X_1 + s(X_1) + s(X_2)$$

Joint GAM (Q2 (mean) = 76 %; explained deviance: 93 % (mean), 37 % (dispersion)):

$$Y_m = 3.75 - 3.06 X_1 + s(X_1) + s(X_2) \quad \text{and} \quad Y_d = \exp(0.59 + s(X_1))$$

Indices   Exact   Joint GLM   Joint GAM   Simple GAM
S1        0.314   0.314       0.325       0.333
S2        0.442   0.318       0.414       0.441
ST3       0.244   0.366       0.261       0.25
S13       0.244   0           > 0         unknown
S23       0       0           0           unknown


A hydrogeological application

Pollutant (90Sr) transport simulation in porous media

•16 scalar input variables: sorption coefficients (kd) and permeabilities (per) of different hydrogeologic layers, porosity, infiltration rate, …

•1 functional input: the permeability

•LH sample (N=300) for the 16 inputs ⇒ 300 model evaluations (8 days)

•1 output: the concentration at a specified location

•Joint GAM: Devexp(mean) = 98%, Devexp(dispersion) = 29%

Explanatory terms: mean [ s(kd1), s(kd2), s(per3), s(per2,kd2) ]; dispersion [ kd1, kd2 ]

S(kd2) = 52%, S(per2) = 8%, S(kd2,per2) = 6%, S(kd1) = 4%

ST(ε) = 28%, S(kd1,ε) > 0 and S(kd2,ε) > 0


Conclusions

•This approach, based on joint models to compute Sobol sensitivity indices, is useful in the following situations:

– model with "complex" functional inputs,

– time-consuming model (so a metamodel is needed),

– heteroscedasticity (functional input interacts with scalar inputs).

•Another major benefit: uncertainty propagation.

•Current limitations:

– It cannot distinguish the effects of different functional inputs.

– It yields only qualitative sensitivity indices for the interactions between the functional input and the other inputs.


Useful software

R packages:

– "JointModeling"

– "sensitivity" by G. Pujol
