Global sensitivity analysis of computer models with functional inputs
B. Iooss (CEA Cadarache)
M. Ribatet (CEMAGREF Lyon)
SAMO 2007 Conference, Budapest, Hungary
B. Iooss – SAMO 2007 - 22/06/07
Functional data?
• The classical model is written Y = f(X), where Y is a scalar output variable and X is a vector of scalar input variables.
X is considered as a vector of random variables ⇒ Y is a random variable.
• The model with functional variables is written Y(v) = f(X1(u1), …, Xp(up)), where
– v and ui are parameters (scalar or multidimensional),
– Y(v) is an output function,
– Xi(ui) is an input function (possibly constant).
Examples for u and v: time t, spatial coordinates (x,y,z), temperature T, …
The Xi(ui) are considered as random functions ⇒ Y(v) is a random function.
An example of a functional input problem
Pollutant (90Sr) transport simulation in porous media [Volkova et al., SERRA 07]
[Figure: concentration maps, August 2002 and December 2010]
First study:
• 20 random input variables (permeability, porosity, Kd, …),
• 20 scalar outputs (concentrations at piezometers),
• Latin hypercube (LH) sample (N = 300) ⇒ 300 model evaluations (3 days),
• Construction of metamodels,
• Global sensitivity analysis (Sobol) via the use of metamodels.
Result: the permeability of the second layer is the most influential variable.
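As a reminder of what the LH sample of the first study involves, here is a minimal sketch of Latin hypercube sampling on the unit hypercube. The function name `latin_hypercube` and the seed are illustrative assumptions. The sizes (N = 300, 20 inputs) are those quoted on the slide. The talk does not say which implementation was used, and the mapping from [0, 1) to each physical input distribution is left out.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) Latin hypercube design on [0, 1)^n_dims."""
    # One uniformly drawn point inside each of the n_samples equal-width strata.
    strata = np.arange(n_samples)[:, None]
    design = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in every dimension.
    for j in range(n_dims):
        design[:, j] = rng.permutation(design[:, j])
    return design

rng = np.random.default_rng(0)           # hypothetical seed
design = latin_hypercube(300, 20, rng)   # N = 300 runs, 20 scalar inputs
```

Each column of `design` contains exactly one point per stratum of width 1/300, which is the property that distinguishes an LH sample from simple random sampling.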
An example of a functional input problem
Second study:
• We want to take into account the spatial heterogeneity of the permeability.
• We represent it by a random field ε(x,y).
Realisations of this random field are obtained via geostatistical simulation techniques.
Classical methods of global sensitivity analysis or metamodel construction are no longer applicable.
[Figure: 2 possible realisations of the permeability]
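The slide does not say which geostatistical simulation technique produced the permeability realisations. Purely as an illustration, the sketch below draws two realisations of a Gaussian random field on a small grid by Cholesky factorisation of an exponential covariance matrix. The grid size, the covariance model, its variance and correlation length, and the reading of the field as a log-permeability are all assumptions made for this example.

```python
import numpy as np

def exponential_covariance(coords, variance, corr_length):
    """Covariance matrix C[i, j] = variance * exp(-|s_i - s_j| / corr_length)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return variance * np.exp(-dist / corr_length)

# Hypothetical coarse grid (the figure on the slide is much finer).
nx, ny = 20, 15
xs, ys = np.meshgrid(np.arange(nx, dtype=float), np.arange(ny, dtype=float))
coords = np.column_stack([xs.ravel(), ys.ravel()])

cov = exponential_covariance(coords, variance=1.0, corr_length=5.0)
# A small jitter on the diagonal keeps the factorisation numerically stable.
chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))

rng = np.random.default_rng(1)                    # hypothetical seed
white_noise = rng.standard_normal((len(coords), 2))
# Two correlated fields, reshaped to (realisation, y, x).
log_perm = (chol @ white_noise).T.reshape(2, ny, nx)
```

Every call with fresh white noise gives a new map, which is why the functional input cannot be summarised by a handful of scalar parameters.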
Some recent works (not exhaustive)
Functional input:
– Tarantola et al., SERRA 02: environmental assessment problem.
Some inputs represent the errors in spatially distributed maps (random fields), obtained by simulations.
– Ruffo et al., RESS 06: hydrocarbon exploration risk evaluation.
The basin and petroleum system models are very complex random fields.
One scenario variable (32 basin models) is treated as a categorical variable.
– Zabalza-Mezghani et al., JPSE 04: hydrocarbon production optimization.
The random field is considered as an uncontrollable input variable ("stochastic uncertainty parameter").
The other scalar inputs are the controllable variables.
Our problem and some possible solutions
Compute the Sobol indices when some input variables are functional.
• Complete discretization: infeasible (several thousand parameters).
• Expansion in an appropriate function basis: impracticable in some cases (e.g. if the functional input is a temporal white noise).
• Consider the functional input as a single multidimensional parameter. Multidimensional sensitivity indices (Sobol, MCS 01; Jacques et al., RESS 06) are estimated via algorithms that use independent samples (simple Monte Carlo). FAST, RBD and quasi-MC methods are not applicable.
• Replace the functional input by a scalar parameter distributed as U[0,1] that governs whether the functional input is simulated or not (Tarantola et al., SERRA 02).
Calculate the Sobol index of this parameter by any method. This quantifies the sensitivity of the output to the presence/absence of the functional input, but not to its variability.
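To make the third option concrete, the sketch below estimates first-order Sobol indices by simple Monte Carlo with two independent samples, treating a whole functional input as one multidimensional parameter. Everything in it is an assumption for illustration: the toy model, the white-noise input of length 30, the sample size, and the particular "pick-freeze" covariance estimator, which is one common choice and not necessarily the one used in the cited papers.

```python
import numpy as np

def model(x, eps):
    """Toy code: scalar input x and functional input eps (one row per run)."""
    z = eps.sum(axis=1) / np.sqrt(eps.shape[1])   # scalar summary, N(0, 1)
    return x + 2.0 * z + x * z

def first_order_index(y_ref, y_frozen):
    """Estimate Var(E[Y | U]) / Var(Y), where U is the input shared by both runs."""
    cov = np.mean(y_ref * y_frozen) - np.mean(y_ref) * np.mean(y_frozen)
    return cov / np.var(y_ref)

rng = np.random.default_rng(0)        # hypothetical seed
n, m = 50_000, 30                     # sample size, length of the functional input
x_a, x_b = rng.standard_normal(n), rng.standard_normal(n)
eps_a, eps_b = rng.standard_normal((n, m)), rng.standard_normal((n, m))

y_a = model(x_a, eps_a)
s_eps = first_order_index(y_a, model(x_b, eps_a))   # eps frozen, x resampled
s_x = first_order_index(y_a, model(x_a, eps_b))     # x frozen, eps resampled
```

For this toy model Var(Y) = 6, so the estimates should be close to 4/6 for the functional input and 1/6 for the scalar input. The estimator only needs independent draws of the functional input, never a parametrisation of it, which is consistent with the remark that FAST, RBD and quasi-MC methods do not apply.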
Moreover, in our case, we need metamodels
We deal with complex computer codes: nonlinear effects, time-consuming, large number of inputs (> 10).
The Sobol indices cannot be estimated by running the code directly; they are estimated through a metamodel.
Zabalza-Mezghani et al., JPSE 04, propose to consider the functional input as an uncontrollable parameter.
With scalar inputs X and functional input ε(u), the metamodel becomes a mean component E(Y|X) and a variance component Var(Y|X).
Uncertainty propagation via this joint model.
[Figure: Y versus X, with the curves E(Y|X) and E(Y|X) + σ(Y|X)]
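Sobol indices of the joint model
$$\mathrm{Var}[Y(X,\varepsilon)] = \mathrm{Var}[E(Y \mid X)] + E[\mathrm{Var}(Y \mid X)] = \mathrm{Var}[Y_m(X)] + E[Y_d(X)]$$
Variance decomposition of Y:
$$\mathrm{Var}(Y) = \sum_{i=1}^{p} V_i(Y) + \sum_{i<j}^{p} V_{ij}(Y) + \dots + V_{12\dots p}(Y) + E(Y_d)$$
Variance decomposition of Ym:
$$\mathrm{Var}(Y_m) = \sum_{i=1}^{p} V_i(Y_m) + \sum_{i<j}^{p} V_{ij}(Y_m) + \dots + V_{12\dots p}(Y_m)$$
Since E(Y | Xi) = E(Ym | Xi), each term V(Y) above equals the corresponding term V(Ym).
Then, the Sobol indices of X on Y are obtained by:
$$S_i = \frac{\mathrm{Var}_{X_i}[E(Y_m \mid X_i)]}{\mathrm{Var}(Y)}$$
E[Yd(X)] contains all the terms including effects of ε.
Total Sobol index of ε:
$$S_{T_\varepsilon} = \frac{E(Y_d)}{\mathrm{Var}(Y)}$$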
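The expression for the total index of the functional input follows from the law of total variance. A short derivation is sketched here, writing ε for the functional input, Ym(X) = E(Y | X) and Yd(X) = Var(Y | X), and using the usual definition of a total index as one minus the share of variance explained by all the other inputs:

```latex
\begin{align*}
S_{T_\varepsilon}
  &= 1 - \frac{\mathrm{Var}[E(Y \mid X)]}{\mathrm{Var}(Y)} \\
  &= \frac{\mathrm{Var}(Y) - \mathrm{Var}[Y_m(X)]}{\mathrm{Var}(Y)} \\
  &= \frac{E[\mathrm{Var}(Y \mid X)]}{\mathrm{Var}(Y)}
   = \frac{E[Y_d(X)]}{\mathrm{Var}(Y)} .
\end{align*}
```

The joint model therefore gives the total effect of ε (its main effect plus all its interactions with X) without ever parametrising ε, but it cannot separate those contributions from one another.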
Modeling the mean Ym and dispersion Yd
Dual modeling by 2 polynomials (Taguchi 86; Vining & Myers, JQT 90).
Joint modeling by 2 Generalized Linear Models (McCullagh & Nelder 89):
– more general theoretical framework (exponential family distribution),
– models the mean and the variance simultaneously: iterative fits,
– no replications needed (requires fewer computations).
Mean:
$$E(Y_i) = \mu_i, \qquad \eta_i = g(\mu_i) = \sum_j x_{ij}\,\beta_j, \qquad \mathrm{Var}(Y_i) = \phi_i\, v(\mu_i)$$
Dispersion:
$$E(d_i) = \phi_i, \qquad \zeta_i = \log(\phi_i) = \sum_j u_{ij}\,\gamma_j, \qquad \mathrm{Var}(d_i) = 2\phi_i^2$$
• For the dispersion d, we take the deviance contribution.
• Deviance analysis, Student and Fisher tests, residual analyses, … allow term selection and the choice of the functions g and v.
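The "iterative fits" of the joint model can be sketched as follows. This is a simplified illustration, not the code of the JointModeling package. It assumes a Gaussian mean model with identity link (so the deviance contribution d is the squared residual) and a gamma dispersion model with log link, fitted by iteratively reweighted least squares. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gamma_log(U, d, n_iter=50):
    """Gamma GLM with log link via IRLS (the IRLS weights are constant here)."""
    gamma = np.zeros(U.shape[1])
    gamma[0] = np.log(d.mean())            # assumes column 0 of U is the intercept
    for _ in range(n_iter):
        eta = U @ gamma
        mu = np.exp(eta)
        z = eta + (d - mu) / mu            # working response
        gamma, *_ = np.linalg.lstsq(U, z, rcond=None)
    return gamma

def fit_joint(X, U, y, n_outer=10):
    """Alternate between the mean fit (weights 1/phi) and the dispersion fit."""
    phi = np.ones_like(y)
    for _ in range(n_outer):
        w = np.sqrt(1.0 / phi)
        beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        d = (y - X @ beta) ** 2            # Gaussian deviance contribution
        gamma = fit_gamma_log(U, d)
        phi = np.exp(U @ gamma)
    return beta, gamma

# Synthetic heteroscedastic data: mean 1 + 2x, log-variance 0.5 + x.
rng = np.random.default_rng(0)             # hypothetical seed
x = rng.uniform(-1.0, 1.0, 20_000)
design = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + np.sqrt(np.exp(0.5 + x)) * rng.standard_normal(x.size)
beta_hat, gamma_hat = fit_joint(design, design, y)
```

No replicated runs are needed: each observation contributes one squared residual to the dispersion fit, which matches the "no replications needed" remark on the slide.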
Joint modeling with Generalized Additive Models
The drawback of the GLM is its parametric form, which leads to limitations when modeling complex computer codes.
Replace it by a popular nonparametric model: the GAM (Hastie & Tibshirani).
Mean:
$$E(Y) = \mu; \qquad g(\mu) = \sum_{i=1}^{p} s_i(X_i) + \sum_{i<j}^{p} s_{ij}(X_i, X_j)$$
Dispersion:
$$E(d) = \phi; \qquad \log(\phi) = \sum_{i} s_i(U_i) + \sum_{i<j} s_{ij}(U_i, U_j)$$
The si are obtained by fitting a smoother to the data: penalized regression splines (with integrated model selection via generalized cross-validation).
Deviance analysis, statistical tests on coefficients, residual analyses, … allow term selection.
Compared to other metamodels (kriging, neural networks):
– GAM offers a direct interpretation of the model,
– the drawback lies in the additive effect hypothesis.
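To illustrate what "penalized regression splines with model selection via generalized cross-validation" means for a single smooth term, here is a minimal one-dimensional sketch. It is a simplified stand-in and not the smoother used by the authors. The truncated-linear basis, the ridge penalty on the knot coefficients, the grid of smoothing parameters and the test signal are all assumptions chosen to keep the example short.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated-linear basis: 1, x, and (x - k)_+ for every knot k."""
    return np.column_stack([np.ones_like(x), x] +
                           [np.clip(x - k, 0.0, None) for k in knots])

def fit_penalized_spline(x, y, knots, lambdas):
    """Return (lambda, coefficients) minimising the GCV score."""
    B = spline_basis(x, knots)
    n = len(y)
    penalty = np.eye(B.shape[1])
    penalty[0, 0] = penalty[1, 1] = 0.0          # leave the linear part unpenalised
    best = None
    for lam in lambdas:
        A = np.linalg.solve(B.T @ B + lam * penalty, B.T)
        coef = A @ y
        edf = np.trace(A @ B)                    # trace of the hat matrix
        rss = np.sum((y - B @ coef) ** 2)
        gcv = n * rss / (n - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, coef)
    return best[1], best[2]

rng = np.random.default_rng(0)                   # hypothetical seed
x = rng.uniform(-np.pi, np.pi, 400)
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)
knots = np.linspace(-np.pi, np.pi, 22)[1:-1]     # 20 interior knots
lam_hat, coef_hat = fit_penalized_spline(x, y, knots, np.logspace(-3, 3, 25))

grid = np.linspace(-np.pi, np.pi, 200)
rmse = np.sqrt(np.mean((spline_basis(grid, knots) @ coef_hat - np.sin(grid)) ** 2))
```

The smoothing parameter is chosen from the data alone, so no separate validation runs of the expensive code are needed for this step.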
Simple example: Ishigami function with Xi ~ U[−π, π]
$$Y = \sin(X_1) + 7\sin^2(X_2) + 0.1\,X_3^4\,\sin(X_1)$$
To test our joint models, X3 is considered as an uncontrollable input.
Models are fitted on 1e3 data points. The predictivity coefficient Q2 is computed on 1e4 test data points.
Joint GLM (Q2 = 61%):
$$Y_m = 1.92 + 2.69\,X_1 + 2.17\,X_2^2 - 0.29\,X_1^3 - 0.29\,X_2^4 \quad\text{and}\quad Y_d = 5.7$$
Simple GAM (Q2 = 75%):
$$Y = 3.76 - 2.67\,X_1 + s(X_1) + s(X_2)$$
Joint GAM: Q2 (mean) = 76%; explained deviance: 93% (mean), 37% (dispersion)
$$Y_m = 3.75 - 3.06\,X_1 + s(X_1) + s(X_2) \quad\text{and}\quad Y_d = \exp[0.59 + s(X_1)]$$
Indices   Exact   Joint GLM   Joint GAM   Simple GAM
S1        0.314   0.314       0.325       0.333
S2        0.442   0.318       0.414       0.441
ST3       0.244   0.366       0.261       0.25
S13       0.244   0           > 0         unknown
S23       0       0           0           unknown
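The "Exact" column can be checked from the joint-model formulas, because for the Ishigami function the mean and dispersion components are available in closed form when X3 is the uncontrollable input: Ym = (1 + 0.1 E[X3^4]) sin(X1) + 7 sin²(X2) and Yd = 0.01 Var(X3^4) sin²(X1). The sketch below evaluates S1, S2 and ST3 with a midpoint rule on [−π, π]. The variable names and the grid size are illustrative choices.

```python
import numpy as np

n = 2000
# Midpoint grid on [-pi, pi]; averaging over it integrates against U[-pi, pi].
x = -np.pi + (np.arange(n) + 0.5) * (2.0 * np.pi / n)

# Moments of X3^4 for X3 ~ U[-pi, pi].
mean_x3_4 = np.pi**4 / 5
var_x3_4 = np.pi**8 / 9 - mean_x3_4**2

# Mean component Ym(X1, X2) = g1(X1) + g2(X2); dispersion component Yd(X1).
g1 = (1.0 + 0.1 * mean_x3_4) * np.sin(x)
g2 = 7.0 * np.sin(x) ** 2
yd = 0.01 * var_x3_4 * np.sin(x) ** 2

var_ym = g1.var() + g2.var()          # X1 and X2 are independent
var_y = var_ym + yd.mean()            # Var(Y) = Var(Ym) + E(Yd)

s1 = g1.var() / var_y
s2 = g2.var() / var_y
st3 = yd.mean() / var_y
```

Rounded to three decimals, these values reproduce the 0.314, 0.442 and 0.244 of the "Exact" column.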
A hydrogeological application
Pollutant (90Sr) transport simulation in porous media
• 16 scalar input variables: sorption coefficients (kd) and permeabilities (per) of different hydrogeologic layers, porosity, infiltration rate, …
• 1 functional input: the permeability
• LH sample (N = 300) for the 16 inputs ⇒ 300 model evaluations (8 days)
• 1 output: the concentration at a specified location
• Joint GAM: Devexp(mean) = 98%, Devexp(dispersion) = 29%
Explanatory terms: mean [ s(kd1), s(kd2), s(per3), s(per2,kd2) ]; dispersion [ kd1, kd2 ]
S(kd2) = 52%, S(per2) = 8%, S(kd2,per2) = 6%, S(kd1) = 4%
ST(ε) = 28%, S(kd1,ε) > 0 and S(kd2,ε) > 0
[Figure: map of the functional input (permeability)]
Conclusions
• This approach, based on joint models to compute Sobol sensitivity indices, is useful in the following situations:
– model with "complex" functional inputs,
– time-consuming model (so a metamodel is needed),
– heteroscedasticity (the functional input interacts with the scalar inputs).
• Another major benefit: uncertainty propagation.
• Current limitations:
– It cannot distinguish the effects of different functional inputs.
– We obtain only qualitative sensitivity indices for the interactions between the functional input and the other inputs.
Useful software
R packages:
"JointModeling"
"sensitivity" by G. Pujol