A Local-Global Pattern Matching Method for Subsurface...
Transcript of A Local-Global Pattern Matching Method for Subsurface...
A Local-Global Pattern Matching Method for SubsurfaceStochastic Inverse Modeling
Liangping Lia,∗, Sanjay Srinivasana, Haiyan Zhoua, J. Jaime Gomez-Hernandezb
aCenter for Petroleum and Geosystems Engineering Research, University of Texas at Austin,78712, Austin, USAbResearch Institute of Water and Environmental Engineering, Universitat Politecnica de Valencia, 46022, Valencia,
Spain
Abstract
Inverse modeling is an essential step for reliable modeling of subsurface flow and transport, which is impor-
tant for groundwater resource management and aquifer remediation. Multiple-point statistics (MPS) based
reservoir modeling algorithms, beyond traditional two-point statistics-based methods, offer an alternative to
simulate complex geological features and patterns, conditioning to observed conductivity data. Parameter
estimation, within the framework of MPS, for the characterization of conductivity fields using measured dy-
namic data such as piezometric head data, remains one of the most challenging tasks in geologic modeling.
We propose a new local-global pattern matching method to integrate dynamic data into geological models.
The local pattern is composed of conductivity and head values that are sampled from joint training images
comprising of geological models and the corresponding simulated piezometric heads. Subsequently, a global
constraint is enforced on the simulated geologic models in order to match the measured head data. The
method is sequential in time, and as new piezometric head become available, the training images are up-
dated for the purpose of reducing the computational cost of pattern matching. As a result, the final suite of
models preserve the geologic features as well as match the dynamic data. This local-global pattern matching
method is demonstrated for simulating a two-dimensional, bimodally-distributed heterogeneous conductivity
field. The results indicate that the characterization of conductivity as well as flow and transport predictions
are improved when the piezometric head data are integrated into the geological modeling.
Keywords: multiple-point geostatistics, conditional simulation, inverse modeling, global matching,
uncertainty assessment
∗Corresponding authorEmail addresses: [email protected] (Liangping Li), [email protected] (Sanjay Srinivasan),
[email protected] (Haiyan Zhou), [email protected] (J. Jaime Gomez-Hernandez)
Preprint submitted to Environmental Modeling & Software April 8, 2015
1. Introduction1
Inverse modeling is a mathematical approach to identify parameters such as permeability or hydraulic2
conductivity at unsampled locations such that flow and transport modeling using the estimated parameters3
match observed state variables such as piezometric head or concentration data. Predictions for groundwater4
flow and solute transport made using the estimated parameters would then be more accurate. The fact that5
the number of observed state variables is much smaller than the number of unknown parameters implies that6
the solution of inverse problem will be non-unique (Carrera and Neuman, 1986) especially when heterogeneous7
subsurface systems are considered. In order to represent this non-uniqueness, stochastic inverse modeling8
seeks to generate multiple likely representations of parameter fields that are all conditioned to both direct9
measurements of the parameters at specific locations and dynamic data (Gomez-Hernandez et al., 1997).10
The multiple calibrated models obtained by applying stochastic inversion methods could be used to assess11
the uncertainty in predictions based on the available data. Reliable models for uncertainty are required by12
decision-makers For a review of the evolution and recent trends of inverse methods in hydrogeology, the13
reader is referred to Zhou et al. (2014).14
In cross-bedded aquifers or fluvial geologies, aquifer properties such as hydraulic conductivity exhibit15
connectivity along curvilinear paths. This complex connectivity significantly affects the flow and transport16
of fluids and chemical species (Gomez-Hernandez and Wen, 1998; Renard and Allard, 2011). Reproduction of17
the curvilinear geometry can be achieved using Multiple-Point Statistics (MPS) based stochastic simulation18
methods (Strebelle, 2002). MPS simulation was developed to overcome the limitation of traditional two-19
point variogram-based methods, which cannot capture strong connectivities in the subsurface aquifer. The20
higher moments (i.e., multiple-point statistics) are introduced into the simulation by borrowing patterns21
from a training image (Guardiano and Srivastava, 1993). Although MPS provides an avenue to simulate22
complex formations, stochastic inverse modeling within the framework of MPS simulations is extremely23
challenging because of the difficulty in maintaining the complex curvilinear connectivity geological structures24
while simultaneously honoring dynamic data that are related to conductivity through a strongly non-linear25
transfer function.26
In the literature, stochastic inverse methods can be classified into two groups. In the first group, an27
objective function is first constructed based on the discrepancy between observed data and simulated values.28
This objective function is subsequently minimized by iteratively perturbing the parameter values until a suf-29
ficiently close match is attained. Preservation of the prior geological structures is not explicitly considered30
during this process of optimization. Examples of this data-driven stochastic inverse method are sequential31
2
self-calibration (Gomez-Hernandez et al., 1997; Hendricks Franssen et al., 2003), the pilot-point method32
(de Marsily, 1978) and the ensemble Kalman filter (EnKF) (Evensen, 2003). It has been proven that these33
methods yield optimal estimates for multiGaussian conductivity fields. Some variants were proposed to34
handle non-multiGaussian conductivity fields. For example, Capilla et al. (1999) proposed the application35
of self-calibration method to local conditional probabilities defining the uncertainty in conductivity, instead36
of calibrating the conductivities directly. Later, Capilla and Llopis-Albert (2009) coupled the gradual de-37
formation method and the optimization of the probability fields in order to improve the efficiency of the38
previous proposal. In a similar way, Hu et al. (2013) proposed to consider the uniform random number used39
to draw the MPS realizations as part of the state variable set in EnKF. Sun et al. (2009) coupled Gaussian40
mixture models and EnKF to handle non-Gaussian conductivity fields. Jafarpour and Khodabakhshi (2011)41
proposed to first update the ensemble of MPS-generated conductivities to derive local probabilities, and42
then, to re-simulate the conductivities using the probability maps as soft data. Zhou et al. (2011) developed43
a normal-score EnKF to handle non-Gaussianity within the ensemble Kalman filtering framework.44
In the second group of inverse modeling approaches, data integration is achieved using Bayes’ theorem.45
The posterior models are sampled from the prior models by assessing first a likelihood function. A typical46
example of this model-driven stochastic inverse method is rejection sampling (Tarantola, 2005). The likeli-47
hood of a model sampled from a prior set is assessed, and the model is rejected depending on a likelihood48
threshold. The prior geological structures will be preserved in this process, because the posterior set of49
models is simply a subset of the prior set. However, like the particle filtering approach, this method is com-50
putationally expensive and is inapplicable in most practical cases because tens of thousands of models need51
to be evaluated. To improve the computational efficiency, Mariethoz et al. (2010a) proposed an iterative52
spatial resampling method in which the candidate models are generated by conditioning to data sampled53
from previous accepted models, thus resulting in less computational cost because of faster convergence to54
a posterior set that exhibits the desired dynamic characteristics. Another popular Bayesian approach to55
inverse modeling is the Markov chain Monte Carlo method (McMC) (Metropolis et al., 1953; Oliver et al.,56
1997) in which the parameter model is first locally perturbed for a gridblock or for a set of gridblocks (i.e.,57
the transition kernel) and then the forecast model is run to judge whether the new candidate model will be58
accepted (e.g., the Metropolis-Hastings rule). The problems with these McMC methods are: (1) the accep-59
tance rate of new models is dependent on the transition kernel used; (2) a long chain is usually required60
before the posterior distribution can be correctly sampled, and (3) a large number of perturbed models have61
to be generated and evaluated. An extensive description of the mathematical framework for the McMC62
3
method and recent advances can be found in the review paper by Liu et al. (2010).63
The Ensemble PATtern matching (EnPAT) stochastic inverse method was first proposed by Zhou et al.64
(2012) with the aim to create multiple conductivity fields honoring both measured conductivity and piezo-65
metric head data as well as the prior geological structures. The EnPAT is inspired by the Direct Sampling66
(DS) MPS method developed by Mariethoz et al. (2010b). In DS, the conductivity patterns are directly67
sampled from a training image without storing the entire pattern database in memory. This results in fast68
simulation and the possibility to simulate continuous variables such as hydraulic conductivity. Zhou et al.69
(2012) borrows the concept of DS and expands the conductivity pattern to include the pattern of piezometric70
heads for the purpose of inverse modeling. Correspondingly, multiple MPS-simulated conductivity models71
and the corresponding head models obtained by running the forward simulator are jointly used as the training72
images for learning during the simulation. Conductivities are simulated by matching joint patterns from the73
training image sets. As a result, the simulated conductivity models are not only conditioned to the measured74
conductivity and piezometric data, but also preserve the prior geological structures. Li et al. (2013a) devel-75
oped a hybrid of the EnPAT and the pilot point/self-calibration method (Gomez-Hernandez et al., 1997) to76
reduce the computational cost and to improve the characterization of conductivity connectivity during the77
dynamic data assimilation process.78
In this paper, we propose a local-global pattern matching method to integrate dynamic data into geologic79
models. In the previous implementation of the EnPAT, a local pattern is considered for ensemble matching,80
but that does not guarantee that the updated model matches the observed global dynamic data because81
of the non-linearity of the forecast function as well as the existence of complex boundary conditions. To82
address this issue, we implement an additional step in which we simulate the global response of the updated83
models and select those that best fit the observed data after the process of local pattern matching. As84
a consequence, updated models will preserve the geological structures and the dynamic data, although85
at a computational cost because of the additional forward simulations in the rejected models. In order to86
mitigate the computational demand and to accelerate the learning process, the training image sets are refined87
by progressively replacing the worst models in the prior training set with the newly accepted models. The88
method therefore borrows the concept of iterative resampling proposed by Mariethoz et al. (2010a). A ranking89
scheme is implemented to identify the poor initial models. The proposed methodology is demonstrated on a90
synthetic example for which predictions of flow and transport are considered.91
The remainder of the paper will be organized as follows. In section 2, the implementation of the ensem-92
ble pattern matching method is described, with emphasis on the significance of global constraints on the93
4
predictions of flow and transport. In section 3, a synthetic example is used to demonstrate the effectiveness94
of the proposed method. Then, in section 4, we discussed the computational efficiency of the EnPAT by95
continuously refining the training images. In section 5, there is a general discussion. The paper ends with a96
summary and conclusions.97
2. Methodology98
In the EnPAT method two steps are performed at each time step: the forecast step (i.e., solving the flow99
equation based on the current hydraulic conductivities to derive the piezometric head) and updating step100
(i.e., updating both conductivity and head through a pattern matching approach).101
During the updating step, patterns are constructed for the updating of each gridblock in each realization102
by searching within a predefined search neighborhood for static parameter values such as conductivities and103
dynamic variable values such as heads. Suppose that for the updating of the conductivity and the head104
at gridblock i and realization j for time t, we have found conductivities (K = k1, k2, · · · , kn) and heads105
(H = h1, h2, · · · , hm). Denote this as the conditioning pattern (Pt,j,i) (see Fig 1),106
Pt,j,i =
[K
H
]t,j,i
(1)
within the context of sequential simulation (see, for instance, Gomez-Hernandez and Journel (1993)) the107
conductivity and head components in the pattern can be the observed data and/or the previously estimated108
values. The number of conductivity (n) and head data (m) in the pattern must be less than a maximum109
conditioning data specified by the user and fall within a predefined maximum search radius around the110
simulation node. The conditioning pattern is dependent on the location of the gridlock, the particular stage111
within the simulation, and the time step. EnPAT extends traditional MPS method in two important ways,112
first, the patterns contain not only the parameters such as conductivities, but also state variables, and113
second, an ensemble of joint training images is used. When head data are included in the pattern, the MPS114
method becomes multi-variable co-simulation. In other words, the simulated conductivity is constrained by115
the surrounding conductivities and heads, and thus the multipoint cross-correlation between both variables116
can be preserved in the simulation.117
Pattern matching is initialized by generating an ensemble of prior conductivity fields and the correspond-118
ing ensemble of simulated heads. In this paper, the initial ensemble of conductivity fields is generated using119
the direct sampling MPS method, using a common training image for a fluvial aquifer. In the forecast step,120
5
the flow simulator is run for each conductivity realization until the time step for which new measured head121
data (htobs) are available. The ensemble of head realizations obtained by running the forward model, plus the122
ensemble of conductivities will be used as the joint training images to update both conductivity and head,123
given any observed head data. The pattern matching scheme has the following steps:124
• Build the conditioning pattern P at the first node to be simulated, conditioned to the measured n125
conductivity data and m piezometric head data.126
• Search for candidate patterns in the joint training images. Calculate the distance between the candidate127
pattern P found in the joint training images and the conditioning pattern around the simulation node128
P (see Fig 1). In this research, distance is measured by computing a weighted Euclidean distance129
function:130
d(P,P) =
[1∑p
i=1 h−1i
p∑i=1
h−1i
(P − P)2
d2max
]1/2(2)
where p is the number of data in the pattern; h is the Euclidean distance between the gridblock to be131
simulated and the conditioning data; and dmax is the maximum absolute difference of conductivities132
or heads observed in the pattern. The standardized distance between the candidate and conditional133
patterns lies within the range of 0 to 1. The searching process for candidate patterns is limited to a134
small area around the gridblock to be estimated because the calculated head depends on boundary135
conditions and sources. If the search radius is specified to be large then the influence of global boundary136
conditions becomes more pronounced.137
• If the resulting distances, computed independently for conductivities and for heads are both smaller138
than predefined threshold values (dk(P,P)
< ζk and dh(P,P)
< ζh ), the conductivity value and the139
piezometric head value of the matching pattern at the location of the simulation node is retained; if140
no pattern is found meeting these criteria, the values from the closest pattern are retained (see Fig 1).141
The retained conductivity and head values become conditioning data for the simulation of the next142
gridblocks.143
• Repeat the three previous steps until all nodes in the domain are simulated.144
The simulated conductivity and head values represent a realization of the updated conductivity and head145
field, conditioned to the measured conductivity and piezometric head data. Multiple realizations of updated146
conductivity and head values can be obtained by visiting the unknown gridblocks along different random147
6
paths. Furthermore, as more observation data become available at the next time step, the simulated models148
at the previous time get updated.149
The EnPAT procedure can be combined with the use of pilot points to reduce the computational cost.150
More specifically, the pattern search step is only applied to a set of predefined pilot point locations, and then151
traditional MPS is used to complete the non-simulated gridblocks. This variation of the EnPAT has been152
implemented in the study by Li et al. (2013a).153
The EnPAT algorithm can use any flow simulator as a black box because only the outputs, such as154
piezometric heads, are required. Compared to most of the traditional gradient-based stochastic inverse155
methods that require access to quantities such as Jacobian of the flow model (i.e., the self calibration156
method (Gomez-Hernandez et al., 1997)), the coding of the EnPAT is much simpler (Li et al., 2013b).157
Another advantage of the EnPAT is that, like the EnKF, the updated ensemble of conductivities can be158
used to assess the residual uncertainty associated with predictions. In addition, the updated conductivity159
fields preserve the curvilinear geometry exhibited by the training image. In fluvial deposits, for instance, it160
is very significant to preserve the connectivity of the geology in order to make accurate predictions of flow161
and transport (Gomez-Hernandez and Wen, 1998).162
However, there is a potential problem in the updating of conductivity and head values using the pattern163
matching approach. The updated conductivity might be inconsistent with the updated piezometric head164
because the transfer function relating the two is not explicitly accounted for. In other words, the simul-165
taneously simulated conductivity and head might not honor Darcy’s law (i.e., the mass balance equation).166
To handle this problem, in this paper, a global constraint is enforced on the updated conductivity. The167
updated conductivity is evaluated by running the flow model and only those models that match the observed168
global responses (such as piezometric head or concentration data) at the corresponding time step will be169
retained. By doing so, we ensure that the updated conductivities not only preserve complex connectivity170
but also honor the global dynamic data. The cost of the local-global pattern matching will be higher than171
the original implementation of the EnPAT, however to alleviate that cost, we will propose a learning scheme172
that is described next. This global match step makes the new algorithm similar to the iterative EnKF or173
confirmed EnKF used in petroleum engineering (Wen and Chen, 2006).174
In order to mitigate the computational cost, a learning process is integrated into the pattern matching175
scheme after the global matching step. Specifically, the mismatch between the observed and simulated176
response data will be used to rank the accepted models. The worst models in the training images in terms177
of the mismatch between predicted and observed heads will be replaced with the new accepted models. The178
7
set of training images will thus be refined using the ranking scheme (Bayer et al., 2010), which will result in179
a faster matching during the next local pattern searching process.180
Fig 2 is the flowchart of the improved EnPAT algorithm accounting for global constraints that incorporates181
the process of refining the training image sets. The ensemble of conductivity training images and the182
corresponding simulated heads as well as the observed head data are the inputs to the algorithm. The pattern183
matching is performed at a few randomly selected pilot points first, and then the results are extrapolated184
using a multiple point simulation technique such as the direct sampling technique. The simulation at the185
pilot point locations starts with a search of conditioning data in the vicinity of the simulation node. The186
conditioning pattern comprises both the pattern of conductivity data as well as the pattern of head data.187
The conditioning data pattern is determined by the search radius and the maximum number of conditioning188
nodes specified by the user. The training image ensemble is searched in order to find the matching pattern.189
The distance between the conditioning data pattern and the pattern in the training image is calculated, and,190
if the distance is lower than a tolerance value, the outcome at the node corresponding to the simulation node191
is retained as the simulated value. After all the pilot point locations are simulated, the remaining nodes are192
simulated using the MPS technique. Once the simulated realization is complete, flow simulation is performed193
and the global match to the observed head data is assessed. If the match is within a tolerance value, the194
updated realization is assimilated into the training image set. The training image with the worst match195
to the observed data is dropped from the training image set. It is evident that the computational cost is196
mainly dependent on the predefined tolerance value used to judge if the response of the updated conductivity197
model matches the history. The tolerance value can be linked to the likelihood function describing the head198
measurement error (Mariethoz et al., 2010a). In the subsequent example, the computational efficiency of the199
training image refining scheme will be evaluated.200
The EnPAT algorithm is coupled with the groundwater flow modeling program MODFLOW (Harbaugh201
et al., 2000) and the direct sampling MPS method (Mariethoz et al., 2010b), and programmed in C++.202
3. Synthetic Example203
3.1. Example Setup204
Given the training image shown in Fig 3A, the reference logconductivity field is generated using the direct205
sampling MPS method (Mariethoz et al., 2010b) (see Fig 3B). The model is discretized into 50 × 50 × 1206
gridblocks with cell size 1m× 1m× 1m. The logconductivity values follow a bimodal histogram with mean207
and standard deviation of 0.12m/d and 2.51m/d, respectively. The reference conductivity field exhibits208
8
curvilinear features with high conductivity sand channels and low conductivity mudstone zones. We assume209
that an injection well (Q = 25m3/d) is located at the center of the aquifer and there are 8 observation210
wells (Fig 3B). This reference logconductivity model will be regarded as the true model, and the aim of the211
stochastic inverse simulation is to generate a suite of models that are as close to the true model as possible,212
conditioned to the observed piezometric head data.213
The aquifer is assumed confined with constant head boundaries on the eastern and western sides and no214
flow boundaries at the remaining faces (Fig 3B). The specific storage is set constant and equal to 0.01. The215
total simulation time is 30 days discretized into 10 time steps that follow a geometric series with ratio of216
1.2. The observed data of the first five time steps will be used as the conditioning data. We assume that217
there are 9 measured conductivity data at the locations shown in Fig 3B. Five hundred initial conductivity218
models are generated using the direct sampling MPS method with the same training image and simulation219
parameters (for example, the distance threshold and the number of conditioning data in the pattern) as the220
reference, conditioned to the measured conductivities. The initial head is assumed to be zero in the whole221
aquifer. We only consider the uncertainty in conductivity. The boundary conditions and other parameters222
are assumed known with certainty.223
The parameters used in the EnPAT are listed as follows. The search radius is set as 25 m for both224
conductivity and head. The maximum number of elements in the pattern for conductivity and head are225
specified to be 10 for both. The weighted Euclidean distance is used to calculate the distance between226
the conditional and candidate patterns. Distance tolerances for the head and conductivity are both set227
to zero. The number of pilot points is specified to be 300. The threshold value used to judge the global228
convergence of updated models is set at 0.1. Apart from the threshold parameter, the sensitivity of results229
to other parameters are extensively investigated either in the context of the direct sampling MPS method230
(Meerschman et al., 2013) or in the pattern matching scheme (Li et al., 2013b) because both these methods231
have some parameters in common.232
3.2. Results233
In order to assess the uncertainty of the updated models before and after integrating the observed234
piezometric head data, the multidimensional scaling method is used to visualize the geological models in235
metric space, although a range of other approaches is proposed in the paper by Bennett et al. (2013).236
Multidimensional scaling (Borg and Groenen, 2005) is a data analysis method used to segment the model237
space on the basis of dissimilarity between models. For example, the difference in piezometric head between238
any pair of models can be used to construct the distance matrix of head data. Applying the multidimensional239
9
scaling method, the distance matrix is projected to an equivalent metric space. In this transformed space,240
closer points imply similar response behaviors, which might imply similar geological structures.241
Figure 4 shows the geological models in metric space for the different cases. For the case of the prior242
models (i.e., before conditioning to piezometric head data) (see Fig 4A), the geological models exhibit large243
variations in terms of simulated head. Specifically, two selected models far from the true model in metric244
space have distinctively different spatial pattern of hydraulic conductivity, compared with the reference.245
When the piezometric head data are integrated into the geological models using EnPAT (i.e., the local246
pattern matching approach) (see Fig 4B), the geological models converge to the reference in the metric247
space. It is evident that the posterior uncertainty of the geological models is smaller. However, a few models248
still have a minor deviation from the reference model in terms of the simulated head. It can be interpreted249
that the local pattern matching does not guarantee a global history match. This might be because of the250
limited size of the training ensemble that is insufficient to estimate the multipoint correlations between251
parameter and state accurately. The arbitrarily specified distance tolerance values might also contribute to252
the incorrect global match. For the case of the updated models using EnPAT with the global constraint, all253
the geological models are close to the reference (see Fig 4C). If we look at the two individual models, they254
show very similar spatial geologic patterns to the reference.255
Figure 5 displays the simulated head by rerunning the forward simulator from time zero for wells #3 and256
#6 for the different cases. When the observed head data are not integrated, the simulated head values for257
different models exhibit a large spread and the ensemble average of head values departs from the reference.258
When the head measurements are integrated into the geological models using EnPAT, the uncertainty (i.e.,259
spread) of simulated heads is reduced and the ensemble average is also close to the reference. As is evident260
from the updated geological models shown in Fig 4, the simulated head values for some models still deviate261
significantly from the reference. In order to remove such models, the global constraint is enforced and the262
resulting set of models yield simulated head values in a tighter range.263
3.3. Flow and transport predictions264
In channelized aquifers, the connectivity of high conductivities impacts flow and transport of solutes.265
Here, we use the updated conductivity models obtained previously to predict the flow and transport behaviors266
by subjecting then to modified boundary conditions. Specifically, the western boundary condition is changed267
from a constant head h = 0 to h = 5 m and the injection well is removed. Flow in the aquifer is simulated at268
steady-state. The longitudinal and transverse dispersion coefficients are set as 0.5 m and 0.1 m, respectively.269
A conservative tracer is injected at the left side of the aquifer (see Fig 6), and a control plane located270
10
at x = 45 m is used to record the travel time of particles. The random walk particle tracking algorithm271
(Salamon et al., 2006) is utilized to solve the transport equation.272
Figure 7 displays the cumulative breakthrough curves (BTCs) for different cases. In models that are273
not constrained to the observed piezometric head data (Fig 7A), the BTCs show a large spread and the274
reference curve is close to the 5th percentile of ensemble BTCs. This indicates that the initial models do not275
exhibit adequate connectivity of high conductivity regions in the vicinity of the injection face resulting in276
later arrival times to the control plane. When the models are conditioned to head data (Fig 7B), the spread277
of BTCs is reduced and the ensemble average of BTCs is much closer to the reference. We also observe that278
the breakthrough profile of the reference is more than the 5th percentile of BTCs, which implies that the279
ensemble connectivity of the updated geological models is no longer underestimated in the vicinity of the280
injection face. When a global constraint on the updated geological models is enforced (Fig 7C) the BTCs281
have a smaller spread and the ensemble average is close to the reference.282
4. Computational Efficiency283
One of improvements of the EnPAT algorithm presented in this work is the introduction of the learning284
process by refining the set of training images by replacing the worst models in the prior set with the newly285
accepted models. In this way, the training images will be close to the “true” model when more and more286
accepted models are reached. Additionally, the local pattern search process becomes faster for finding the287
matched candidate pattern because the uncertainty of training images is correspondingly reduced.288
Fig 8 shows the evolution of maximum root mean square error (RMSE) of the training image models in289
terms of the mismatch between the simulated head and the observation data, at the first time step. As we290
see, after 500 models are generated, the maximum RMSE is close to the predefined tolerance value for the291
global constraint. In other words, the training image models will reflect the observed data at this stage.292
Fig 9 displays the comparison of the computational efficiency of the original EnPAT with global constraint293
but without the training image replacement and the improved algorithm. It clearly shows that after the294
training image is refined using newly accepted models, the number of evaluations for each new generated295
model is reduced significantly, which results in lower computational cost.296
5. Discussion297
In this paper, we propose a local-global pattern matching method to characterize heterogeneous hydraulic298
conductivity field using observed piezometric head data. In previous studies (e.g., Zhou et al., 2012), only299
11
a local pattern matching approach is considered and the updated geological models might not be consistent300
with the observed piezometric head data because the relationship between the parameter and state variables301
may be inaccurately represented by the pattern searching procedure. This typically occurs when the ensemble302
size is not large enough to explore the non-linear relationship between the parameter and state variables.303
Another source of error might be the distance tolerance values that may be specified to be too large in304
order to find the matched pattern. To address this issue, a global constraint is carried out to select the305
updated models which fit the observed data by running the forward simulator. Specifically, the updated306
conductivity model will be accepted if the root mean square of absolute error between the simulated head307
and observed data is smaller than the predefined tolerance value. A key issue associated with this approach308
is the increased computational cost incurred to ensure that the global constraint is satisfied. It is evident309
that the computational expense is dependent on the magnitude of the defined tolerance value. In order to310
reduce the computational cost, a second level learning process is integrated into the EnPAT. Specifically, the311
training image models are refined by replacing the worst models in the training set with the newly accepted312
model. By doing so, the matching process is much faster and the number of evaluation of forward model is313
reduced.314
There are some similarities between EnPAT with global constraint and rejection sampling (Tarantola,315
2005). In rejection sampling, the likelihood of the data given a particular model is computed and the316
model is accepted based on that likelihood exceeding a threshold. In the current implementation of EnPAT317
with the global constraint, a particular updated model is accepted based on the mismatch between the318
observation and the simulation being below a threshold. If the threshold is large, more models are accepted.319
The correspondence between the posterior set of models obtained using the scheme outlined in this paper320
and a classical implementation of rejection sampling using the likelihood function will be assessed in a321
later publication. The new candidate model is generated by MPS method using the training set that is322
composed of both the conductivity models and the corresponding simulated piezometric heads. In this way,323
the acceptance ratio could be much higher and the sampling scheme could be more effective, which is similar324
with the iterative spatial resampling proposed by Mariethoz et al. (2010a) for which the new candidate model325
is regenerated by conditioning on a set of hard data sampled from the previous accepted model.326
The local-global pattern matching approach could be extended to integrate other sources of data such as327
flow-rate and concentration data within the same framework. Including additional variables into the joint328
pattern could make the pattern matching process more challenging, because the patterns found from the329
training set may not represent the relationship between the parameter and state variables accurately. An330
12
alternative is that, the joint pattern composed of conductivity and head is locally matched through the331
pattern searching scheme, and then the flow-rate or concentration data could be matched in a global match332
step. This multi-level patten matching could be more effective than a one-step implementation and will be333
extensively investigated in the future.334
In the current implementation, we assume that the training image is known and there is no uncertainty335
about the training image. In practice, the training image may have some degree of uncertainty. It is336
straightforward to integrate the uncertainty of training image into the EnPAT. Specifically, the uncertainty337
of training images could be handled by assembling multiple realizations generated by different training images338
in the ensemble used for the local pattern match.339
The performance of the EnPAT is dependent on the information available. If the conditioning data could340
not reflect the potential geological structures, the EnPAT could not identify them, accordingly.341
6. Conclusions342
In complex geological systems such as fluvial aquifers, carbonate systems and naturally fractured aquifers,343
multiple-point statistics-based modeling methods are required to characterize complex and curvilinear fea-344
tures. Parameter identification with MPS requires an effective inverse method that yields models that not345
only honor the observed dynamic data, but also preserve curvilinear geological features that impact hydro-346
carbon recovery and aquifer remediation.347
In this paper, a hybrid of EnPAT and global matching method is developed for developing models348
that honor multiple point statistics defining reservoir connectivity as well as the observed dynamic data.349
Specifically, the updated models through the local pattern matching approach are forward simulated to350
verify if they match the observed dynamic data. In other words, global pattern matching is conducted after351
the local pattern matching (i.e., the EnPAT) so that the resultant models will be conditioned to dynamic data352
and the curvilinear geometry will be preserved as well. In addition, to accelerate the local and global match,353
the training image models are refined by integrating the new matched models. We tested the local-global354
pattern matching approach to characterize a bimodally distributed heterogeneous conductivity field. The355
results indicate that the characterization of conductivity and flow and transport predictions are improved356
after the integration of the global constraint into the EnPAT algorithm. Also, the computational cost is357
reduced when a ranking scheme is introduced into the algorithm.358
Acknowledgements. The authors gratefully acknowledge the financial support by DOE through projects359
13
DE-FE0004962 and DE-SC0001114. The last author acknowledges the support of the Spanish Ministry of360
Economy and Competitiveness through project CGL2011-23295. We greatly thank the three anonymous361
reviewers for their comments, which substantially improved the manuscript.362
References363
Bayer, P., de Paly, M., Burger, C. M., 2010. Optimization of high-reliability-based hydrological design364
problems by robust automatic sampling of critical model realizations. Water Resources Research 46 (5).365
Bennett, N. D., Croke, B. F., Guariso, G., Guillaume, J. H., Hamilton, S. H., Jakeman, A. J., Marsili-Libelli,366
S., Newham, L. T., Norton, J. P., Perrin, C., et al., 2013. Characterising performance of environmental367
models. Environmental Modelling & Software 40, 1–20.368
Borg, I., Groenen, P. J., 2005. Modern multidimensional scaling: Theory and applications. Springer.369
Capilla, J., Llopis-Albert, C., 2009. Gradual conditioning of non-Gaussian transmissivity fields to flow and370
mass transport data: 1. Theory. Journal of Hydrology 371 (1-4), 66–74.371
Capilla, J. E., Rodrigo, J., Gomez-Hernandez, J. J., 1999. Simulation of non-gaussian transmissivity fields372
honoring piezometric data and integrating soft and secondary information. Math. Geology 31 (7), 907–927.373
Carrera, J., Neuman, S., 1986. Estimation of aquifer parameters under transient and steady state conditions:374
1. Maximum likelihood method incorporating prior information. Water Resources Research 22 (2), 199–375
210.376
de Marsily, G., 1978. De l’identification des systemes hydrogeologiques. Ph.D. thesis.377
Evensen, G., 2003. The ensemble Kalman filter: Theoretical formulation and practical implementation.378
Ocean dynamics 53 (4), 343–367.379
Gomez-Hernandez, J., Wen, X., 1998. To be or not to be multi-Gaussian? a reflection on stochastic hydro-380
geology. Advances in Water Resources 21 (1), 47–61.381
Gomez-Hernandez, J. J., Journel, A. G., 1993. Joint simulation of multiGaussian random variables. In:382
Soares, A. (Ed.), Geostatistics Troia ’92, volume 1. Kluwer, pp. 85–94.383
Gomez-Hernandez, J. J., Sahuquillo, A., Capilla, J. E., 1997. Stochastic simulation of transmissivity fields384
conditional to both transmissivity and piezometric data, 1, Theory. Journal of Hydrology 203 (1–4), 162–385
174.386
14
Guardiano, F., Srivastava, R., 1993. Multivariate geostatistics: beyond bivariate moments. In: Soares, A.387
(Ed.), Geostatistics-Troia. Kluwer Academic Publ, Dordrecht, pp. 133–144.388
Harbaugh, A. W., Banta, E. R., Hill, M. C., McDonald, M. G., 2000. MODFLOW-2000, the U.S. Geological389
Survey modular ground-water model. U.S. Geological Survey, Branch of Information Services, Reston, VA,390
Denver, CO.391
Hendricks Franssen, H., Gomez-Hernandez, J., Sahuquillo, A., 2003. Coupled inverse modelling of groundwa-392
ter flow and mass transport and the worth of concentration data. Journal of Hydrology 281 (4), 281–295.393
Hu, L. Y., Zhao, Y., Liu, Y., Scheepens, C., Bouchard, A., 2013. Updating multipoint simulatings using the394
ensemble Kalman filter. Computers & Geosciences 51, 7–15.395
Jafarpour, B., Khodabakhshi, M., 2011. A probability conditioning method (PCM) for nonlinear flow data396
integration into multipoint statistical facies simulation. Mathematical Geosciences 43 (2), 133–164.397
Li, L., Srinivasan, S., Zhou, H., Gomez-Hernandez, J. J., 2013a. A pilot point guided pattern matching398
approach to integrate dynamic data into geological modeling. Advances in Water Resources 62, 125–138.399
Li, L., Srinivasan, S., Zhou, H., Gomez-Hernandez, J. J., 2013b. Simultaneous estimation of both geologic400
and reservoir state variables within an ensemble-based multiple-point statistic framework. Mathematical401
Geosciences 46, 597–623.402
Liu, X., Cardiff, M. A., Kitanidis, P. K., 2010. Parameter estimation in nonlinear environmental problems.403
Stochastic Environmental Research and Risk Assessment 24 (7), 1003–1022.404
Mariethoz, G., Renard, P., Caers, J., 2010a. Bayesian inverse problem and optimization with iterative spatial405
resampling. Water Resources Research 46 (11).406
Mariethoz, G., Renard, P., Straubhaar, J., 2010b. The direct sampling method to perform multiple-point407
geostatistical simulaitons. Water Resources Research 46 (11).408
Meerschman, E., Pirot, G., Mariethoz, G., Straubhaar, J., Meirvenne, M., Renard, P., 2013. A practical409
guide to performing multiple-point statistical simulations with the direct sampling algorithm. Computers410
& Geosciences 52, 307–324.411
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., Teller, E., 1953. Equation of state412
calculations by fast computing machines. The journal of chemical physics 21 (6), 1087–1092.413
15
Oliver, D., Cunha, L., Reynolds, A., 1997. Markov chain Monte Carlo methods for conditioning a perme-414
ability field to pressure data. Mathematical Geology 29 (1), 61–91.415
Renard, P., Allard, D., 2011. Connectivity metrics for subsurface flow and transport. Advances in Water416
Resources 51, 168–196.417
Salamon, P., Fernandez-Garcia, D., Gomez-Hernandez, J. J., 2006. A review and numerical assessment of418
the random walk particle tracking method. Journal of Contaminant Hydrology 87 (3-4), 277–305.419
Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics.420
Mathematical Geology 34 (1), 1–21.421
Sun, A. Y., Morris, A. P., Mohanty, S., 2009. Sequential updating of multimodal hydrogeologic parameter422
fields using localization and clustering techniques. Water Resources Research 45 (7).423
Tarantola, A., 2005. Inverse problem theory and methods for model parameter estimation. siam.424
Wen, X., Chen, W., 2006. Real-time reservoir model updating using ensemble kalman filter with confirming425
option. SPE Journal 11 (4), 431–442.426
Zhou, H., Gomez-Hernandez, J., Hendricks Franssen, H., Li, L., 2011. An approach to handling non-427
gaussianity of parameters and state variables in ensemble kalman filtering. Advances in Water Resources428
34 (7), 844–864.429
Zhou, H., Gomez-Hernandez, J., Li, L., 2012. A pattern-search-based inverse method. Water Resources430
Research 48 (3).431
Zhou, H., Gomez-Hernandez, J., Li, L., 2014. Inverse methods in hydrogeology: evolution and recent trends.432
Advances in Water Resources 63, 22–37.433
16
K1(2.5)
K3(12.1)
K2(1.3)
H1(-0.4)
H2(-1.3)
H3(0.1)
K/H?
K1(4.3)
K3(11.6)
K2(1.8)
H1(-1.6)
H2(-1.1)
H3(-0.4)
Kc2/Hc2
K1(11.5)
K3(1.1)
K2(3.8)
H1(-0.9)
H2(-0.7)
H3(-0.2)
Kc1/Hc1
K1(2.7)
K3(11.8)
K2(1.2)
H1(-0.3)
H2(-1.5)
H3(-0.2)
Kc3/Hc3
Conditional Pattern
Candidate Patterns:
dk = 0.63
dh = 0.42
dk = 0.13
dh = 0.23
dk = 0.02
dh = 0.01
K=Kc3; H=Hc3
Real.1 Real.2 Real.3
Figure 1: Scheme of pattern matching. The gridblock conductivity and head are sampled as the estimated values if its patternhas distance values smaller than thresholds or minimum distance values.
17
t ++
r ++
Yes
Conductivity ensemble Simulation head ensemble Observation head
Find a conditioning data pattern around estimated node
Accept the value
Flow Model
Calculate distance between candidate pattern and conditioning pattern
dk < threshold_k
Yes
No
n < Np
pdate of the conductivity ensembleU
dh < threshold_h
Start of simulation on a realization
No
Start of simulation on a pilot point gridblock
r < Nr
n ++
Run flow simulation
Yes
No
Calculate RMSE
Extrapolate the pilot point values using MPS method
RMSE<ToleranceNo
Refine Training Images
Figure 2: Flowchart of the EnPAT. dk and dh indicate the threshold values of distance for the conductivity and head, respec-tively; n means the number of gridlock simulated in a realization; Np is the number of pilot points; r denotes the number ofrealization in the ensemble; Nr is the total number of realizations; t is the number of time step for the simulation.
18
(A) Training image (B) Reference lnK field
1
2
3
4
6
7
8
9
5
Q=25m/d3
LnK (m/d)
H=0 H=0
No flow
No flow
Figure 3: (A) Training image (B) Reference conductivity, boundary conditions of flow model and observation wells.
19
Dimension 1
Dimension
2
(B) After Conditioning by EnPAT
Dimension 1
Dimension
2
(C) After Conditioning by EnPAT
with Global Constraint
Dimension 1
Dimension
2
(A) Before Conditioning
Figure 4: Visualization of geological models in terms of the simulated heads of the last time step at the well locations using themultidimensional scaling method. The open circle denotes geological model, and the triangle indicates the true model.
20
Time [day]P
iezo
me
tric
he
ad
[m]
0 5 10 15 20 25 301
0.5
0
0.5
1
1.5
2
Realization
AverageReference
Well #6
Before Conditioning
Time [day]
Pie
zo
me
tric
he
ad
[m]
0 5 10 15 20 25 301
0.5
0
0.5
1
1.5
2
Realization
AverageReference
Well #3
After Conditioning by EnPAT
Time [day]
Pie
zo
me
tric
he
ad
[m]
0 5 10 15 20 25 301
0.5
0
0.5
1
1.5
2
Realization
AverageReference
Well #6
After Conditioning by EnPAT
Time [day]
Pie
zo
me
tric
he
ad
[m]
0 5 10 15 20 25 301
0.5
0
0.5
1
1.5
2
Realization
AverageReference
Well #3
After Conditioning by EnPAT with Global Constraint
Time [day]
Pie
zo
me
tric
he
ad
[m]
0 5 10 15 20 25 301
0.5
0
0.5
1
1.5
2
Realization
AverageReference
Well #6
After Conditioning by EnPAT with Global Constraint
Figure 5: The simulated head at two wells before and after the data conditions using the EnPAT with and without globalconstraint.
21
No flow
No flow
H=0H=5
Control Plane
Source
Particle Path
Figure 6: Transport configuration
22
0
0.2
0.4
0.6
0.8
1
1 10 100 1000Time [Day]
No
rma
lize
dC
on
ce
ntr
atio
n
95% Confidence IntervalReference
Average
(A) Before Conditioning
0
0.2
0.4
0.6
0.8
1
1 10 100 1000Time [Day]
No
rma
lize
dC
on
ce
ntr
atio
n
95% Confidence IntervalReference
Average
(B) After Conditioning by EnPAT
0
0.2
0.4
0.6
0.8
1
1 10 100 1000Time [Day]
No
rma
lize
dC
on
ce
ntr
atio
n
95% Confidence IntervalReference
Average
(C) After Conditioning by EnPAT with Global Constraint
Figure 7: The simulated cumulative breakthrough curves before and after the data conditions using the EnPAT with andwithout global constraint
23
0. 01
0. 1
1
10
100
0 100 200 300 400 500
#th of simulated models (t=1)
Maxim
um
of
RM
SE
Figure 8: The maximum of RMSE for the training image models (t = 1)
24
0
5
10
15
20
0 100 200 300 400 500
#th of models (t=1)
#of
evalu
ati
on
s
( A)
0
5
10
15
20
0 100 200 300 400 500
#th of models (t=1)
#of
evalu
ati
on
s
(B)
Figure 9: The number of evaluations for each simulated models using EnPAT (A) and improved EnPAT (B) (t = 1)
25