
Society of Petroleum Engineers

SPE 49026

Stochastic Reservoir Simulation Using Neural Networks Trained on Outcrop Data

Jef Caers, Stanford University, and Andre G. Journel, SPE, Stanford University

Copyright 1998, Society of Petroleum Engineers, Inc.

This paper was prepared for presentation at the 1998 SPE Annual Technical Conference and Exhibition held in New Orleans, USA, 27-30 September 1998.

This paper was selected for presentation by an SPE Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Papers presented at SPE meetings are subject to publication review by Editorial Committees of the Society of Petroleum Engineers. Permission to copy is restricted to an abstract of not more than 300 words. Illustrations may not be copied. The abstract should contain conspicuous acknowledgement of where and by whom the paper was presented. Write Librarian, SPE, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A., fax 01-214-952-9435.

Abstract

Extensive outcrop data, photographs of present-day depositions, or even simple drawings from expert geologists contain precious structural information about spatial continuity that is beyond the present tools of geostatistics, essentially limited to two-point statistics (histograms and covariances). A neural net can be trained to collect multiple-point statistics from various training images; these statistics are then used to generate stochastic models conditioned to actual data. In petroleum applications, the methodology developed can be a substitute for object-based algorithms when facies geometry and reservoir continuity are too complex to be modelled by simple objects such as channels. The performance of the neural net approach is illustrated using training images of increasing complexity, and an attempt at explaining that performance is provided in each case. The neural net builds local probability distributions for the facies types (categorical case) or for petrophysical properties (continuous case). These local probabilities include multiple-point statistics learned from training images and are sampled with a Metropolis-Hastings sampler which also ensures reproduction of statistics coming from the actual subsurface data such as locally varying facies proportions.

Introduction

Stochastic simulation is a tool that has been driven by the need for more "representative" images than the smoothed maps produced by regression techniques such as kriging. Although locally accurate in a minimum error variance sense, kriging maps are poor representations of reality with various artefacts. Kriging produces conditional bias in the sense that, through smoothing, small values are overestimated and large values are underestimated. Moreover, this smoothing is not uniform since it is minimal near the data locations and increases as estimation is performed further from the data locations. Smoothed maps should not be used where the spatial pattern of values is important, such as in the assessment of travel times or production rates in flow simulations. Reproduction of the spatial connectivity of extremes is critical in any study involving the determination of the risk of a rare event happening.

Most current methods of simulation rely on the modeling of two-point statistics such as covariances and variograms and the subsequent reproduction of these second moments. Yet, many phenomena are more complex in the sense that their spatial patterns cannot be captured by two-point statistics only. In particular, identification of two-point statistics only is not sufficient to reproduce patterns of connectivity of extreme values over long ranges. Strings of extreme values, curvilinear or wedge-shaped clusters of similar values are common occurrences in earth sciences, and their importance in water or oil flow simulations cannot be overstated.

The most common way of performing stochastic simulation is through some form of Gaussian simulation (Deutsch and Journel, 1997, p. 139). This class of simulation algorithms is based on the congenial properties of multi-Gaussian models, which import some severe restrictions not always fully appreciated:

● The marginal distribution is normal, which is rather thin-tailed. This is the least restrictive aspect of the multi-Gaussian assumption since we can always "scale" the simulated Gaussian field by employing an appropriate long-tailed marginal back-transformation (program trans in GSLIB, Deutsch and Journel, 1997, p. 132).

● The multivariate and thus all conditional distributions are normal, which leads to extremes which in the asymptotic limit, i.e. as the random function is taken to higher threshold levels, become independent. The location of extremes of a stationary isotropic Gaussian field is equivalent to a spatial Poisson process.

● Both the shape and the variance of conditional distributions are homoscedastic, i.e., they are independent of the conditioning data values.




It is sometimes naively thought that a mere univariate back-transformation of a normal distribution into a non-normal one renders the original Gaussian stochastic simulation a non-Gaussian one; see most major publications on stochastic simulation, except for Goovaerts (1997, p. 266). Actually, a univariate transformation merely scales the simulated field back to the target histogram but does not correct for the last two drawbacks cited above.

Improvement in connecting extremes could be obtained by using an indicator simulation method (Deutsch and Journel, 1997, p. 149). In this case, specific transition probabilities of extremes can be imposed, but the method is still restricted to two locations only and therefore to two-point statistics. Also, the modeling of multiple indicator covariances can be difficult due to lack of data, particularly at extreme thresholds.

The quest for "better" models or methods is motivated by the fact that in many circumstances additional structural information on the spatial distribution is actually available. This information need not be hard data. It can be densely sampled training images of a deemed similar field, e.g. outcrop photographs, or secondary data exhibiting similar spatial variation. Training images can be generated using physical dynamical models or they may be hand-drawn by an expert geologist. Every bit of information gives us more intelligence about the actual spatial distribution, particularly that of extreme values. Also, by drawing, producing or making training images, one explicitly states one's prior belief about what the spatial model looks like, without hiding it behind some mathematical formulation such as a multi-Gaussian distribution. Using training images, the whole concept of stochastic imaging is turned around: one first selects a realistic image and subsequently tries to construct a model that captures the essence of that image. Thus, instead of imaging an analytical model, one models an image deemed representative.

A framework for direct incorporation of additional multi-point information has been introduced with the concept of multi-point statistics and extended normal equations (Journel and Alabert, 1989; Guardiano and Srivastava, 1992; Srivastava, 1992). Information is now pooled at more than two locations at the same time, not being restricted to pairs of data such as in the variogram. The problem is one of inference of the multipoint (more than two) covariances involved and ensuring their consistency. Such inference requires extensive training images and considerable RAM to store the multiple-point covariances obtained by scanning.

In the literature one can find various other methods for incorporating multi-point statistics:

● Incorporating multi-point statistics using simulated annealing: the technique tolerates some amount of inconsistency (non-positive definite situations); however, as the number of classes becomes high and the order of the multi-point statistics rises, CPU and RAM quickly become a problem.

● Iterative methods for conditional simulation (Srivastava, 1992; Wang, 1996). CPU is less of a problem but RAM is, since all conditional distributions (which are multi-point statistics) scanned from the training image must be stored. There is no explicit modeling of these multi-point statistics, so no data expansion is made. Hence, the training image may have to be very large, or there must be many of them, to allow for enough DEVs to be recognized.

● Markov random fields (Besag, 1974; Tjelmeland, 1996). In this case one explicitly states a model for the multivariate distribution of the values at all spatial locations, typically a non-Gaussian model. The method is internally consistent and not too CPU intensive. Yet, the model contains hundreds of parameters which are difficult to interpret and hence difficult to fit using actual data or a training image. Therefore one typically restricts the model calibration to a few parameters, the other parameters being set to arbitrary values.

Methodology outline

This paper aims at demonstrating that neural networks applied to stochastic simulation can overcome many of the previously mentioned problems. A neural network is in essence a multi-parameter non-linear regression model that defines a general non-linear mapping from any space into any other space, both of arbitrary dimension. In our case, the neural network is trained to determine conditional distributions from a training image, then it proceeds with simulation using these distributions. Conditional distributions relate the value at any location to neighboring data values within a template centred at that location, see Figure 1. The neighbourhood template can be anisotropic (its anisotropy could be associated with variogram ranges) and there could be different templates of different sizes if a multiple grid approach is used (Goodman and Sokal, 1989; Tran, 1994). The limitation to a finite neighborhood is similar to the screening effect assumption commonly used in kriging. Within such data neighbourhoods, the neural network is trained to determine the value y at any given location given all neighboring data values x, as sketched in the code below.
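For illustration only, the scanning of a training image with such a neighbourhood template could be sketched as follows (a minimal Python sketch; the NumPy array layout, the rectangular template and all function names are illustrative assumptions, not the implementation used in the paper):

```python
import numpy as np

def make_template(half_x, half_y):
    """Relative (dx, dy) offsets of a rectangular neighbourhood template
    centred at the visited node; the centre itself is excluded."""
    return [(dx, dy)
            for dy in range(-half_y, half_y + 1)
            for dx in range(-half_x, half_x + 1)
            if (dx, dy) != (0, 0)]

def scan_training_image(image, template):
    """Scan a 2-D training image with the template, returning the target
    values y and their neighbouring data vectors x (edge nodes are skipped)."""
    ny, nx = image.shape
    mx = max(abs(dx) for dx, _ in template)
    my = max(abs(dy) for _, dy in template)
    targets, neighbours = [], []
    for j in range(my, ny - my):
        for i in range(mx, nx - mx):
            targets.append(image[j, i])
            neighbours.append([image[j + dy, i + dx] for dx, dy in template])
    return np.asarray(targets), np.asarray(neighbours)
```

An anisotropic template is obtained simply by choosing different half-widths in the two directions.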

More precisely, the neural network is trained to model the local conditional probability density function (cpdf)

$$f(y|\mathbf{x})\,dy = \Pr\{y \le Y < y + dy \mid \mathbf{x}\}$$

or its integral, the conditional cumulative distribution function (ccdf). The neural network maps the multi-point input (the training image) into a one-dimensional output, the previous conditional distribution. In the training phase, we input the conditioning data y and x, that is, the training image, and the net determines the model f(y|x). In the generalization phase we input x and draw values of y from these cpdf's, see Figure 2. The network serves as a non-linear connection between input and output.

In the training phase the network is learned by scanning a training image using a certain template, retrieving multi-point information in the form of conditional distributions. This approach is somewhat different from the classical training phase of a neural network, where the network output is compared with true or target output values; the learning then proceeds by back-propagating the mean square difference between network output and target output through adjusting the network parameters.




The output in our case consists of probability distributions which are not available beforehand. We will show that an error function can still be defined to fine-tune the training.

In the generalization phase, one visits each node i of the N nodes of the simulation grid (except the edges) and draws a simulated value y_i conditioned to the data x_i found within the prescribed template. The term generalization refers to the fact that, in this phase, the neural network is applied to assess distributions conditioned to point information different from those found in the training data. Generalization thus entails data expansion.

The generalization phase consists of generating stochastic simulations (images). The problem is that, initially, we do not have all the neighboring values of y; typically only a few conditioning data are available. The idea is to initialize the whole field with random values (or according to some prior marginal distribution) and then update each of these values using the network-generated cpdf's f(y|x). A random path is defined that visits each node or location, and one keeps cycling through the whole field until some convergence is obtained. We will define later the criteria for such convergence. This procedure is tantamount to setting up a Markov chain (Metropolis-Hastings sampler) sampling a stationary distribution. All original conditioning data values remain frozen. The general outline of our method is as follows:

1. Training (= modeling the image)

   (a) Define a neighbourhood template with which the training image is scanned.

   (b) Train the neural net with the scanned data, extracting only the "essential" features of the image. The result is a model for the cpdf f(y|x) known for all values of y and x.

2. Generalization (= simulating an image)

   (a) Define a random path visiting all non-frozen nodes.

   (b) Initialize the simulation by freezing the original data values at their locations and filling in the remaining nodes with values drawn from the global target histogram (prior cdf).

   (c) Start loop: visit all nodes, and at each node do:

      ● Draw a new value for that node (either at random or according to the prior cdf).

      ● Retrieve from the neural net the cpdf of the old and new value.

      ● Change the old value into the new value according to some probabilistic criterion (e.g. Metropolis criterion).

   (d) Stop the loop when convergence is reached; if not, go to (c).

   (e) If another image is required, start all over from (a).

First, we will develop the general methodology for stochastic simulation using local conditional distributions and a Metropolis-Hastings sampler. Next, we elaborate on how the neural network models these conditional distributions.
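A minimal sketch of the generalization loop above is given here, assuming Python/NumPy; the function names, the fixed number of sweeps used as a stand-in for a convergence check, and the callable arguments (draw_prior for the prior cdf, update_node for the probabilistic acceptance step) are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def simulate(shape, draw_prior, cond_data, update_node, n_sweeps=50, rng=None):
    """Skeleton of the generalization phase, steps 2(a)-(e) above."""
    rng = np.random.default_rng() if rng is None else rng
    field = draw_prior(shape, rng)                 # step (b): draws from the prior cdf
    frozen = np.zeros(shape, dtype=bool)
    for (j, i), v in cond_data.items():            # step (b): freeze the original data
        field[j, i], frozen[j, i] = v, True
    path = [(j, i) for j in range(shape[0])
            for i in range(shape[1]) if not frozen[j, i]]
    for _ in range(n_sweeps):                      # steps (c)-(d): cycle over the field
        for idx in rng.permutation(len(path)):     # step (a): random path
            j, i = path[idx]
            field[j, i] = update_node(field, j, i, rng)
    return field

# Toy usage: uniform prior, one conditioning datum, and a do-nothing update rule.
field = simulate((20, 20), lambda s, r: r.random(s), {(0, 0): 0.3},
                 lambda f, j, i, r: f[j, i])
```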

Conditional simulation with Markov Chains

Sequential simulation Sequential conditional simulation methods rely on the conditional distribution of the value at each node to be simulated given the original conditioning data and previously simulated values. Points on a regular grid are visited along some random path and their values are drawn. In the case of Gaussian sequential simulation, the parameters of the normal conditional distributions are identified with the kriging means and variances. The dimension of the kriging system rapidly becomes unwieldy if all previously simulated values are retained. An approximation is obtained by retaining only those conditioning values that are most "consequential". In most cases only the "closest" data are retained, where closest is defined with respect to some variogram distance measure. Retaining the closest data amounts to assuming that the RF features some Markov behaviour, i.e. the closest data screen the information provided by further-away data.

In the case of Direct Sequential Simulation, only the means and variances of the conditional distributions (not necessarily Gaussian) are identified with the kriging results (Xu and Journel, 1994).

Metropolis sampling Suppose now that, through some algorithm, we have obtained the probability distribution of the attribute value at any specific location to be simulated conditional to the L neighboring data values, that distribution being generally non-Gaussian. We also assume a Markov property, i.e. the conditioning is restricted to the L neighbors. We could then implement a sequential simulation method with such conditional distributions.

A powerful yet simple method for enforcing such conditional distributions is the Metropolis sampler. Denote the set of N pixel locations in the simulation grid by S = {1, ..., N}. The simulated values over that grid then form the array y = {y_1, ..., y_N}. Denote the neighbourhood data of a grid node i by a vector x_i, i.e. the values closest to y_i in some sense. Denote by f(y) the joint multivariate density of all N pixel values.

Conditional simulation with a Metropolis sampler calls for iterative visits of each grid node (Figure 3). The initial grid is filled with random values. At each successive visit of a grid location i with present value y_i, a new value y_i^* is randomly drawn from the set of possible values and the original y_i is replaced by y_i^* with probability

$$P(\mathbf{y} \to \mathbf{y}^*) = \min\left\{1,\ \frac{f(\mathbf{y}^*)}{f(\mathbf{y})}\right\}$$

where f(y^*) is the joint density of the new pixel grid, y^* differing from y only by the value y_i^* at the current grid location i. Using the conditional probability paradigm this can be rewritten as

$$P(\mathbf{y} \to \mathbf{y}^*) = \min\left\{1,\ \frac{f(y_i^* \mid y_j,\ j \ne i)}{f(y_i \mid y_j,\ j \ne i)}\right\}$$




If only conditioning on the local neighboring values is considered, the transition probability from y to y^* reduces further to

$$P(\mathbf{y} \to \mathbf{y}^*) = \min\left\{1,\ \frac{f(y_i^* \mid \mathbf{x}_i)}{f(y_i \mid \mathbf{x}_i)}\right\} \qquad (1)$$

The local conditional distributions f(y_i|x_i) are required; they were determined by the neural net in the training phase. The original conditioning data always remain frozen at their locations.

The Metropolis sampler thus amounts to a Markov chain with known transition probability matrix P(y → y^*) defined by expression (1). P is zero whenever y differs by more than one pixel from y^*. The stationary distribution of the Markov chain is the joint distribution f(y).
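A minimal sketch of one such Metropolis update is given below (Python; the conditional density cpdf would be supplied by the trained neural net, and the dummy density used in the usage example is purely illustrative):

```python
import numpy as np

def metropolis_step(y_old, y_new, x_neigh, cpdf, rng):
    """One Metropolis update of Eq. (1): the candidate y_new replaces the
    current value y_old with probability min{1, f(y_new|x) / f(y_old|x)}."""
    ratio = cpdf(y_new, x_neigh) / max(cpdf(y_old, x_neigh), 1e-300)
    return y_new if rng.random() < min(1.0, ratio) else y_old

# Toy usage with a dummy conditional density centred on the neighbourhood mean.
rng = np.random.default_rng(0)
dummy_cpdf = lambda y, x: np.exp(-0.5 * (y - np.mean(x)) ** 2)
print(metropolis_step(0.9, 0.2, [0.1, 0.3, 0.2], dummy_cpdf, rng))
```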

Honouring global statistics with the Metropolis-Hastings sampler Many geostatistical simulation methods honour global statistics such as the histogram and a global variogram. The Metropolis-Hastings sampler, a more general form of the Metropolis sampler, leaves the flexibility to explicitly honour a global histogram. The general form of the Metropolis-Hastings probability is written as (see Hastings, 1970; Ripley, 1987, p. 115):

$$P(\mathbf{y} \to \mathbf{y}^*) = \min\left\{1,\ \frac{f(y_i^* \mid \mathbf{x}_i)\, t(y_i^* \to y_i)}{f(y_i \mid \mathbf{x}_i)\, t(y_i \to y_i^*)}\right\}$$

where we have now introduced a new but completely arbitrary transition probability t. t describes a transition of one pixel value into another pixel value that is independent of the neighboring values in x_i. We find back the original Metropolis criterion (1) when t is symmetric, i.e. t(y_i^* → y_i) = t(y_i → y_i^*). To honour a specific histogram we will define an objective function that measures the difference between the target histogram and the histogram at the current iteration in the Markov chain. If the global histogram is described by C quantiles q_c, then the objective function becomes

$$O(\mathbf{y}) = \sum_{c=1}^{C} \left(q_c - q_c^{(s)}(\mathbf{y})\right)^2$$

where q_c^{(s)} are the quantiles of the histogram at the current simulation step, depending on the current set of pixels y. The objective function is then used to define the ratio of transition probabilities as follows:

$$\frac{t(y_i^* \to y_i)}{t(y_i \to y_i^*)} = \exp\left(-\left(O(\mathbf{y}^*) - O(\mathbf{y})\right)/k_B\right)$$

where k_B is a positive constant defined such that the previous ratio is of the same order of magnitude as f(y_i^*|x_i)/f(y_i|x_i). The Metropolis-Hastings sampling criterion then becomes

$$P(\mathbf{y} \to \mathbf{y}^*) = \min\left\{1,\ \frac{f(y_i^* \mid \mathbf{x}_i)}{f(y_i \mid \mathbf{x}_i)}\ \exp\left(-\left(O(\mathbf{y}^*) - O(\mathbf{y})\right)/k_B\right)\right\} \qquad (2)$$

Note that the objective function O(y) is easily updatable after each iteration, since only one grid node value changes; hence the honouring of a target histogram can be obtained at virtually no extra cost. Experience has shown that the easiest histogram to honour is the uniform histogram. It is therefore advisable to make a uniform score transform of the training image.
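The acceptance rule of Eq. (2) could be sketched as follows (Python; for clarity the objective O(y) is recomputed from scratch here, whereas the text notes it can be updated incrementally; all names and the quantile levels are illustrative assumptions):

```python
import numpy as np

def objective(field, target_quantiles, probs):
    """O(y): squared differences between the target quantiles q_c and the
    quantiles of the current realization at the same probability levels."""
    return float(np.sum((target_quantiles - np.quantile(field, probs)) ** 2))

def mh_step(field, j, i, y_new, x_neigh, cpdf, target_quantiles, probs, k_b, rng):
    """Metropolis-Hastings acceptance of Eq. (2): the conditional-density ratio
    is multiplied by exp(-(O(y*) - O(y)) / k_B) so that the target histogram is
    progressively honoured."""
    y_old = field[j, i]
    o_old = objective(field, target_quantiles, probs)
    field[j, i] = y_new
    o_new = objective(field, target_quantiles, probs)
    ratio = (cpdf(y_new, x_neigh) / max(cpdf(y_old, x_neigh), 1e-300)
             * np.exp(-(o_new - o_old) / k_b))
    if rng.random() >= min(1.0, ratio):
        field[j, i] = y_old            # reject: restore the old value
    return field[j, i]
```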

Multiple grid approach Many images have large-scale features that cannot be represented with a template of limited size. Large-scale trends may be present, or some structures such as channels may run over the entire image. In such cases we can define multiple grids, see Figure 4: simulation starts on the coarser grid, allowing the large-scale structure to be captured, then proceeds onto the finer grids to depict the smaller-scale details. The values simulated on the coarse grid are frozen as conditioning data in the subsequent finer grids. In our approach, we need a set of cpdf's for each nested grid, which means that we have to train different neural nets for different grids, with data scanned using templates of different sizes, see Figure 4. It is advisable to use multiple grids when the number of simulation nodes vastly exceeds the original sample size (number of conditioning data).
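One simple way to implement this nesting, sketched under our own assumptions (the spacing values and the helper name are illustrative), is to rescale the base template for each nested grid and simulate from coarse to fine:

```python
def expand_template(template, spacing):
    """Rescale a base template of relative offsets to a coarser nested grid
    by multiplying the offsets by the grid spacing (e.g. 4, 2, 1)."""
    return [(dx * spacing, dy * spacing) for dx, dy in template]

# For each spacing, a separate neural net is trained with the rescaled
# template, the nodes of that sub-grid are simulated, and the results are
# frozen as conditioning data for the next (finer) grid:
# for spacing in (4, 2, 1):
#     tmpl = expand_template(base_template, spacing)
#     ...
```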

A network architecture for building local conditional distributions

A one layer network The neural network is used to model all local conditional distributions needed in the generalization phase of the conditional simulation method. Markov chain algorithms ask for the conditional distribution f(y_i|x_i) at the visited grid location i given any current realization x_i of the neighbouring data. The neural network utilizes a general non-Gaussian cpdf model, which is "fitted" or "trained" on the training set. There is no closed-form mathematical representation for the corresponding multivariate distribution model; the net delivers the table of all required conditional distributions once their parameters are established. Before describing the parameter-fitting algorithm, we must define the network architecture. We first consider a network with one hidden layer (Bishop, 1995).

Consider again gridded values which take continuous outcomes, the y's in Figure 2. We assume a Markov screening to hold in the 2- or 3-dimensional space. Target value is typical neural network language for the value y_i for which the conditional distribution f(y_i|x_i) has to be derived.

A neural network consists of neural nodes which contain mathematical operations, and branches which direct the results of these operations to other neural nodes¹. The most common operations consist of summation, identity functions (input = output) and non-linear transforms using some sigmoidal function. The neural nodes are arranged in layers which are connected. The simplest neural network is a network that connects the input to a middle layer and then to the output layer. The middle layer is called the hidden layer. A single hidden layer feed-forward neural network therefore consists of three layers. If the hidden layer has K internal nodes, the ccdf is modeled by mapping the input y|x into the output, see Figure 5:

$$F(y|\mathbf{x}) = \sum_{k=1}^{K} o_k\, T\!\left(\sum_{l=1}^{L} \tilde{w}_{k,l}\, x_l + v_k\, y\right) \qquad (3)$$

where o_k are the K output weights, w̃_{k,l} are the K × L input weights linking the hidden layer, and v_k are weights between the input target value y and the hidden layer. K is the number of neural nodes in the hidden layer and L is the number of data values x_l in the template x. T is a non-linear transfer function, typically a sigmoid with some known mathematical expression.

In our case, there is a further restriction on the function T in that it should be a licit cdf. Since a linear combination of cdf's also is a cdf, if o_k > 0 and Σ_{k=1}^{K} o_k = 1, the resulting network output (3) is a legitimate cdf.

For convenience, without changing its architecture, we can rewrite the network as

$$F(y|\mathbf{x}) = \sum_{k=1}^{K} o_k\, T\!\left(v_k\left(y - \sum_{l=1}^{L} w_{k,l}\, x_l\right)\right) \qquad (4)$$

with w̃_{k,l} = −v_k w_{k,l}. The network architecture is drawn in Figure 5.

¹To avoid confusion with nodes of the simulation grid we will use the terms neural node and grid node to differentiate wherever necessary.

From expression (4), it is clear that the ccdf appears as a linear combination of "kernel" cdf's. One can calculate the mean of this ccdf, i.e. the conditional mean, as

$$E[Y|\mathbf{x}] = \int_{-\infty}^{+\infty} y\, dF(y|\mathbf{x}) = \sum_{k=1}^{K} o_k \int_{-\infty}^{+\infty} y\, dT\!\left(v_k\left(y - \sum_{l=1}^{L} w_{k,l}\, x_l\right)\right)$$

If T(y) is a cdf with zero mean and unit variance, then

$$\int_{-\infty}^{+\infty} y\, dT\!\left(v_k\left(y - \sum_{l=1}^{L} w_{k,l}\, x_l\right)\right) = \sum_{l=1}^{L} w_{k,l}\, x_l = m_k(\mathbf{x})$$

and the conditional mean becomes

$$E[Y|\mathbf{x}] = \sum_{k=1}^{K} o_k\, m_k(\mathbf{x}) = m(\mathbf{x}) \qquad (5)$$

Similarly, the conditional variance is derived as

$$\mathrm{Var}[Y|\mathbf{x}] = \int_{-\infty}^{+\infty} \left(y - m(\mathbf{x})\right)^2\, dF(y|\mathbf{x}) = \sum_{k=1}^{K} o_k\left(\frac{1}{v_k^2} + \left(m_k(\mathbf{x}) - m(\mathbf{x})\right)^2\right) \qquad (6)$$

which is a classical result from variance analysis. Thus the one hidden layer network architecture results in a ccdf which is a mixture of K cdf kernels of the same type (i.e. a cdf T with zero mean and unit variance) but centred at different values m_k(x). For this one layer network the parameter vector is θ = {o_k, v_k, w_{k,l}; k = 1, ..., K, l = 1, ..., L}. The conditional variance (6) is heteroscedastic in that it is dependent on the data values x. The conditional mean (5) is a linear function of the data x in that each of the K kernel means m_k(x) is linear in x. This linear relation of m(x) with x may be too restrictive to model a general image, hence a second hidden layer should be added to obtain a more general non-linear approximator to the conditional distribution function. Correspondingly, the computational effort will increase considerably and, as will be shown, the learning process becomes more difficult.
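The single-hidden-layer model of Eqs. (4)-(6) can be sketched numerically as follows (Python/NumPy with SciPy; taking T as the standard normal cdf is our illustrative assumption, since any licit cdf with zero mean and unit variance would do, and the function names are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def ccdf_one_layer(y, x, o, v, w):
    """Eq. (4): F(y|x) = sum_k o_k T(v_k (y - sum_l w_{k,l} x_l)), with T the
    standard normal cdf.  o: (K,) mixing weights summing to 1, v: (K,) scale
    parameters, w: (K, L) input weights, x: (L,) neighbouring data."""
    m = w @ x                                  # kernel means m_k(x)
    return float(np.sum(o * norm.cdf(v * (y - m))))

def cond_mean_var(x, o, v, w):
    """Conditional mean of Eq. (5) and heteroscedastic variance of Eq. (6)."""
    m = w @ x
    mean = float(np.sum(o * m))
    var = float(np.sum(o * (1.0 / v**2 + (m - mean) ** 2)))
    return mean, var

# Toy usage with K = 2 kernels and L = 3 neighbouring data values.
o, v = np.array([0.4, 0.6]), np.array([2.0, 1.0])
w, x = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]]), np.array([0.5, 0.2, 0.9])
print(ccdf_one_layer(0.7, x, o, v, w), cond_mean_var(x, o, v, w))
```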

A two layer network The fact that the conditional mean in Eq. (5) is linearly dependent on the conditioning data values is a restriction on the flexible approximation capabilities of the neural network. To remove this restriction, we add a second hidden layer of neural nodes (Bishop, 1995). A preliminary non-linear transform S of the data x is applied prior to applying the T transform. In neural net jargon, the S transform constitutes the first layer and the cdf-type transform is then called the second hidden layer, see Figure 6. Thus, a first hidden layer containing K1 neural nodes is inserted between the input and the original hidden layer (now called second layer) with K2 neural nodes. This results in the following conditional distribution:

$$F(y|\mathbf{x}) = \sum_{k_2=1}^{K_2} o_{k_2}\, T\!\left(v_{k_2}\left(y - \sum_{k_1=1}^{K_1} w_{k_2,k_1}\, S\!\left(\sum_{l=1}^{L} u_{k_1,l}\, x_l\right)\right)\right) \qquad (7)$$

where u_{k_1,l} are weights associated with the first layer and S is any non-decreasing function. Based on the same notation as for the one layer neural network, we can simplify Eq. (7):

$$F(y|\mathbf{x}) = \sum_{k_2=1}^{K_2} o_{k_2}\, T\!\left(v_{k_2}\left(y - m_{k_2}(\mathbf{x})\right)\right)$$

where we replace the linear regression m_k(x) of expression (5) by a combination of K1 non-linear functions of type S:

$$m_{k_2}(\mathbf{x}) = \sum_{k_1=1}^{K_1} w_{k_2,k_1}\, S\!\left(\sum_{l=1}^{L} u_{k_1,l}\, x_l\right) \qquad (8)$$

The following comments can be made about this two layer network:

● For F(y|x) to be a permissible ccdf it suffices that Σ_{k_2=1}^{K_2} o_{k_2} = 1, o_{k_2} > 0, and T is a permissible cdf.

● S must be valued within the domain of the cdf T, which in this case is the whole real line (−∞, +∞).

● The two layer network appears as a mixture of K2 kernel cdf's of type T, each with zero mean and unit variance, and centred on

$$m_{k_2}(\mathbf{x}) = m_{k_2}(\mathbf{x}, \mathbf{w}, \mathbf{u}) = \sum_{k_1=1}^{K_1} w_{k_2,k_1}\, S\!\left(\sum_{l=1}^{L} u_{k_1,l}\, x_l\right)$$


Note that m_{k_2}(x) is non-linear in x. Finally:

$$E[Y|\mathbf{x}] = \sum_{k_2=1}^{K_2} o_{k_2}\, m_{k_2}(\mathbf{x}, \mathbf{w}, \mathbf{u}) = m(\mathbf{x})$$

$$\mathrm{Var}[Y|\mathbf{x}] = \sum_{k_2=1}^{K_2} o_{k_2}\left(\frac{1}{v_{k_2}^2} + \left(m_{k_2}(\mathbf{x}, \mathbf{w}, \mathbf{u}) - m(\mathbf{x})\right)^2\right)$$

It appears that the sole function of the added first layer of K1 nodes is to make the regression E[Y|x] non-linear.
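A sketch of the corresponding two-layer forward pass is given below (with S taken as tanh and T as the standard normal cdf, both of which are only illustrative choices consistent with the requirements above):

```python
import numpy as np
from scipy.stats import norm

def ccdf_two_layer(y, x, o, v, w, u):
    """Eq. (7): the kernel means are now non-linear in the data,
    m_{k2}(x) = sum_{k1} w_{k2,k1} S(sum_l u_{k1,l} x_l) as in Eq. (8).
    o: (K2,) mixing weights, v: (K2,) scales, w: (K2, K1), u: (K1, L)."""
    s = np.tanh(u @ x)            # first hidden layer: K1 non-linear features
    m = w @ s                     # kernel means m_{k2}(x, w, u)
    return float(np.sum(o * norm.cdf(v * (y - m))))
```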

Understanding the network parameters It is essential for the implementation of a fast-learning neural net to have a better insight into the approximation capabilities of the network and the meaning of each network parameter. We proceed as follows. From the cumulative conditional distribution in expression (7), the conditional density is calculated as the derivative

$$f(y|\mathbf{x}) = \sum_{k_2=1}^{K_2} o_{k_2}\left(v_{k_2}\, T'\!\left(v_{k_2}\left(y - m_{k_2}(\mathbf{x})\right)\right)\right) \qquad (9)$$

where T' is the derivative of T with respect to its argument. If one introduces the function h as

$$h\!\left(y - m_{k_2}(\mathbf{x}),\, v_{k_2}\right) = v_{k_2}\, T'\!\left(v_{k_2}\left(y - m_{k_2}(\mathbf{x})\right)\right)$$

then

$$f(y|\mathbf{x}) = \sum_{k_2=1}^{K_2} o_{k_2}\, h\!\left(y - m_{k_2}(\mathbf{x}),\, v_{k_2}\right) \qquad (10)$$

The function h is a bell-shaped pdf if T is a sigmoidal cdf. For example, if T is the Gaussian cdf, h is a Gaussian pdf.

The parameter v_{k_2} is a shape parameter controlling the width of the bell. Expression (10) can be interpreted as a decomposition of the output density into various "kernel densities" or "nodal densities" h. The neural network outputting the conditional density is drawn in Figure 7.

The function h is seen as the conditional density of the target y given the conditioning data x and node k_2 in the neural network:

$$h\!\left(y - m_{k_2}(\mathbf{x}, \mathbf{w}, \mathbf{u}),\, v_{k_2}\right) = h(y \mid \mathbf{x}, k_2) \qquad (11)$$

Since the set {o_{k_2}} is a valid probability mass, the nodal densities h(y|x, k_2) are compounded with the weights o_{k_2} into the final cpdf:

$$f(y|\mathbf{x}) = \sum_{k_2=1}^{K_2} o_{k_2}\, h(y \mid \mathbf{x}, k_2) \qquad (12)$$

Here k_2 stands for all the network parameters that define node k_2; it thus includes all the parameters of θ except the mixing parameters o_{k_2}. In statistical jargon, expression (12) is termed a Gaussian mixture. The parameters o_{k_2} thus appear as prior probabilities or relative contributions of each of the K2 nodes of the second hidden layer. The parameters v_{k_2} are scale parameters controlling the width of the radial basis functions h, and the parameters w, u define the non-linear connection between the means of these radial basis functions and the input data x.
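Under the same illustrative choices as before (S = tanh, T = standard normal cdf), the mixture density of Eq. (12) and the corresponding draw of a simulated value can be sketched as:

```python
import numpy as np
from scipy.stats import norm

def cpdf_mixture(y, x, o, v, w, u):
    """Eq. (10)/(12): a mixture of K2 nodal densities
    h(y - m_{k2}(x), v_{k2}) = v_{k2} T'(v_{k2}(y - m_{k2}(x))),
    weighted by the prior node probabilities o_{k2}."""
    m = w @ np.tanh(u @ x)
    return float(np.sum(o * v * norm.pdf(v * (y - m))))

def draw_from_cpdf(x, o, v, w, u, rng):
    """Sample y|x by picking a node k2 with probability o_{k2}, then drawing
    from its Gaussian kernel centred at m_{k2}(x) with scale 1/v_{k2}."""
    m = w @ np.tanh(u @ x)
    k = rng.choice(len(o), p=o)     # o must sum to one
    return rng.normal(m[k], 1.0 / v[k])
```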

Determining the parameters of the neural net

To determine the values of the various parameters θ = {o_{k_2}, v_{k_2}, w, u; k_2 = 1, ..., K_2}, the network must be learned from training data. The training data can be obtained by scanning a training image for all available combinations {y, x}. Edge effects are ignored. The classical method of training neural nets, where some mean square deviation of present versus desired output is iteratively minimized, cannot be used since the output here is a conditional probability which is not known. However, we can use the principle of maximum likelihood to obtain the optimal parameters θ. In the maximum likelihood paradigm, one uses the joint density of all grid node values of the training image given the network parameters as the conditional likelihood for the training image given the model. This joint density depends on the network parameters and can be maximized, yielding a maximum likelihood estimate for θ.

Application of the likelihood principle can intuitively be justified as follows and is symbolically depicted in Figure 8. We assume that our training image is one sample of an ensemble of images that is generated by a theoretical model defined by the set of parameters θ. This assumption need not be true, yet it is the basis of many methods of statistical inference. Such a model can be a Metropolis-Hastings sampler using a neural network to model the conditional distribution. The question now arises about where the training image should fit into the distribution of images defined by the model, see Figure 8. It is argued that the training image should be situated somewhere in the middle of that distribution since among all images it is the most likely to occur, hence the maximum likelihood principle. Note that in the likelihood principle we do not use the actual conditioning data; we do use, though, the training image.

Suppose we have a training image consisting of the set of pixel values y^{(0)} = {y_i^{(0)}, i = 1, ..., N}. Maximizing the likelihood of drawing the specific training image y^{(0)} requires knowledge of the joint density f(y|θ) for any set y of all N pixel values given any choice for the parameters θ. Although this joint density can in theory be calculated from knowledge of the local cpdf's f(y_i|x_i, θ), that calculation is not practical (see Caers, 1998). Thus we need to compromise and replace the joint density by, for example, the product of all N known local cpdf's f(y_i|x_i, θ). More precisely, the set of parameters θ retained is that which maximizes the pseudo-likelihood defined as (see Besag, 1975; Gidas, 1993)

$$\ell(\boldsymbol{\theta}, D) = \prod_{i=1}^{N} f\!\left(y_i^{(0)} \mid \boldsymbol{\theta}, \mathbf{x}_i^{(0)}\right) \qquad (13)$$

Expression (13) is called the "likeliness" of the training image D constituted by the N observed pixel values {y_i^{(0)}, i = 1, ..., N}. It can be shown that in many cases, maximizing the likeliness yields a consistent estimate of the network parameters. Actually, instead of maximizing the likeliness, we will


minimize minus the log-likeliness, defined as the error function:

$$Er(\boldsymbol{\theta}, D) = -\log\left(\ell(\boldsymbol{\theta}, D)\right) = -\sum_{i=1}^{N} \log\left(f\!\left(y_i^{(0)} \mid \boldsymbol{\theta}, \mathbf{x}_i^{(0)}\right)\right) \qquad (14)$$

where Er stands for error function. Many methods could be considered to minimize this error function, such as steepest descent or conjugate gradient methods. However, these usually take many iterations, rendering the training procedure slow and tedious. There is also a risk of ending up in a local minimum. In Caers and Journel (1998), an iterative minimization of error function (14) is presented based on the EM-algorithm (Expectation-Maximization). It is important to note that the numbers K1 and K2 are fixed a priori; they are not determined through our optimization procedure.
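The error function of Eq. (14) can be evaluated straightforwardly from the scanned training patterns; the following minimal sketch assumes a callable cpdf(y, x, theta) standing in for the network conditional density (the name and signature are our illustrative assumptions):

```python
import numpy as np

def pseudo_log_likelihood_error(targets, neighbours, cpdf, theta):
    """Eq. (14): minus the log 'likeliness' of Eq. (13), i.e. the sum over all
    scanned training patterns of -log f(y_i | theta, x_i)."""
    eps = 1e-300                    # guard against log(0)
    return -sum(np.log(max(cpdf(y, x, theta), eps))
                for y, x in zip(targets, neighbours))
```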

Obtaining the essential features of an image

The neural network architecture is defined by the numbers (K1, K2) of internal nodes, the input, the output and how all these are linked. Training on an image is performed through minimization of the error function (14) by iterative EM-steps. One cannot, however, simply fit the data extracted from an image using the EM-algorithm, since the fitting error will gradually decrease and the neural network may end up fitting the training data exactly, beyond their essential and exportable spatial features. Indeed, by using enough nodes (K1, K2 large) one could fit exactly all training patterns. A way to overcome this problem is to stop the training early, also denoted in neural language as early stopping. However, when to stop is not clear.

Consider a second image or, if not available, divide up the training image such that an independent set can be obtained. The idea is to evaluate the same likelihood error function on this second, cross-validation set using the parameters fitted on the first training set. We then define a second error function which is calculated at each step of the EM-algorithm:

$$Er_{cv}(\boldsymbol{\theta}_{tr}, D^{cv}) = -\log\left(\ell(\boldsymbol{\theta}_{tr}, D^{cv})\right) = -\sum_{i=1}^{N^{cv}} \log\left(f\!\left(y_i^{cv} \mid \boldsymbol{\theta}_{tr}, \mathbf{x}_i^{cv}\right)\right) \qquad (15)$$

where y_i^{cv} are the cross-validation data. Note that the cross-validation error is calculated exclusively with the cross-validation data D^{cv} but using the training parameter values θ_tr. The cross-validation data set is not used to determine the parameters θ_tr in the minimization, yet the cross-validation error (15) is updated at each step of the EM-algorithm. The cross-validation error in (15) will tend to have a minimum as the EM-algorithm proceeds, see Figure 9. This can be understood by viewing the training with maximum likelihood as a result of two opposing tendencies: on one hand, the goodness of fit of the training patterns by the neural net will increase as the likelihood is maximized; on the other hand, the amount of learning will decrease.

The cross-validation error will tend to decrease at first as the network learns the essential features of the first training image, but will increase as the network starts overfitting the first training data, modeling features which are not shared by the second cross-validation image. We will show examples of training and cross-validation error functions in the examples.
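A minimal early-stopping loop consistent with this procedure is sketched below; em_step and error_fn stand in for the EM update and the error function of Eqs. (14)-(15), and stopping at the first increase of the cross-validation error is a simplification of our own:

```python
def train_with_early_stopping(em_step, error_fn, theta0, train_data, cv_data,
                              max_iter=200):
    """Run EM steps on the training patterns while monitoring the error on a
    held-out cross-validation set; keep the parameters with the lowest
    cross-validation error and stop once that error starts to rise."""
    theta, best_theta = theta0, theta0
    best_cv = float("inf")
    for _ in range(max_iter):
        theta = em_step(theta, train_data)     # one EM update on training data
        cv_err = error_fn(theta, cv_data)      # cross-validation error, Eq. (15)
        if cv_err < best_cv:
            best_cv, best_theta = cv_err, theta
        else:
            break                              # overfitting has begun
    return best_theta
```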

Example training images

A range of example training images (constructed images and a real outcrop) is shown in order to assess the performance of the stochastic simulation method proposed.

Sand dunes. A first image (see Figure 10) selected is a digitized section of aeolian sandstone with black and white pixels. The black pixels correspond to the fine sediments with good horizontal connectivity. The white pixels correspond to coarser, higher-permeability sediments. The most important feature of this image is the curvilinear shape of the fine sediments, which would act as vertical flow barriers. Such curved shapes are difficult to reproduce using traditional two-point simulation algorithms. The image is scanned with the template shown in Figure 1 (left). Due to the small size of the image, early stopping (after 60 iterations of the EM-algorithm) was preferred to cross-validation. The neural network trained contains 20 nodes in the first and 10 nodes in the second hidden layer. Figure 10 shows three conditional simulations: the connectivity of black pixels with their curvilinear feature is reasonably reproduced.

Walker Lake data set. The Walker Lake data set is a set of 78,000 (260 × 300) topographic measurements (altitudes in feet) of a mountain range near Walker Lake. Figure 11 gives the exhaustive data set, which features the mountain ridges and flat salt lakes in the valleys. A normal score transform of the data was taken. To obtain a training and a cross-validation set we split the image in two halves (bottom and top). The aim of this study is to produce unconditional simulations on a small 130 × 150 grid and conditional simulations on a larger 260 × 300 grid. To be able to reproduce the long-range structure along the NS direction we used the multiple grid approach. For conditional simulation, three successive grids were retained of size 65 × 75, 130 × 150 and 260 × 300. For each grid a neural network was trained with 20 nodes in the first and 10 nodes in the second hidden layer. For each of these grids we used the same template, see Figure 1 (left). 390 regularly gridded sample locations (0.5% of the total number of pixels) were selected on the reference Walker Lake data set to provide local conditioning data. Figure 11 shows two resulting conditional simulations. The large-scale feature of mountain versus lake area is reproduced, whereas smaller details within the mountain are not. Two unconditional simulations on a 130 × 150 grid are also shown in Figure 11. The unconditional simulations reproduce the transition between mountain and lake area characteristic of the exhaustive data set.

Channels. Fluvial reservoirs are characterized by sinuous sand-filled channels within a background of mudstone. A realistic modeling of the curvilinear aspect of such channels is critical for reliable connectivity assessment and for flow simulation. The modeling of such channels usually occurs within well-defined stratigraphic horizons. An object-based method (fluvsim, Deutsch and Wang, 1996) was used to generate a training and a cross-validation image, see Figure 12. The corresponding petrophysical properties were generated as a random texture within both the mudstone and the channel. The training and cross-validation images clearly exhibit long-range structure due to the channeling. To reproduce such long-range structures, we must simulate on multiple grids. Three grids were selected of sizes 25 × 25, 50 × 50 and 100 × 100. For each of these grids we used the same template, see Figure 1 (left). For each grid, a neural network is learned from the scanned data using the corresponding template. In each case we opted for a net with 30 nodes in the first and 15 nodes in the second hidden layer. Figure 12 shows two unconditional simulations and two conditional simulations, each with a different amount of conditioning data, in order to assess the influence of conditioning data on the location of the channels. When using conditioning data, the meandering and thickness of the channels is better reproduced. The conditioning data locations were taken randomly from the training image.

Conclusions

In this paper we propose a stochastic simulation method based on sequential modeling of local conditional distributions (cpdf's) using neural networks. The conditional distribution of a set of unknown variables given a set of known variables is at the root of many statistical prediction algorithms, and the methodology developed here carries over to other estimation problems, not necessarily spatial ones. The key aspect of the methodology proposed consists of integrating into the cpdf's, right from the start, essential textures read from training images. This approach is markedly different from most geostatistical methods, where the texture is embedded into the random function model (such as multi-Gaussian or multiple indicator models), which borrows from reality only two-point statistics. It is also different from the simulated annealing approach, where an objective function is defined to impose reproduction of various multi-point statistics scanned from the training image. Simulated annealing does not provide a method to decide which multi-point statistics are essential and thus should be retained in the objective function. The use of a cross-validation data set allowed the neural network approach to avoid overfitting the multi-point patterns scanned from the training image. In the case of simulated annealing, if too many multi-point statistics borrowed from the training image are imposed, the training image would be reproduced exactly, which is never the ultimate goal. The neural network approach allows for data expansion, a concept that is of critical importance in any imaging technique within the earth sciences.

The proposed method does not involve any of the classical steps of a geostatistical study such as kriging, modeling of variograms or making univariate transforms, nor does it involve any kind of consistency condition and correction. At the same time this simplification entails its major disadvantage: we do not know what textures are being borrowed from the training images. From the examples it appears that the neural network approach has the capability of reproducing curvilinear features, fractures or repetitive patterns.

Many practical issues still need to be addressed: how large should the template(s) be? How many nodes should each hidden layer in the net contain? When should we stop the visit of the whole image in the Metropolis-Hastings sampler? How can we speed up the learning phase? Many of the answers lie in a better understanding of the inner workings of the neural network and of how the two hidden layers interact with each other during the learning and generalization phases.

Nomenclature

y = value at any node
y_i = value at node i; the image has N nodes
y = {y_1, ..., y_N} = values at all nodes = one image
x = set of neighbourhood data in the template centred at a node
x_l = one specific neighbourhood datum in any template
x_i = set of neighbourhood data in the template centred at node i
θ = set of parameters
f = density (conditional, marginal or joint)
F = cumulative distribution (conditional, marginal or joint)
K2 = number of neural nodes in the second hidden layer
K1 = number of neural nodes in the first hidden layer
T, S = non-linear functions
u = set of parameters feeding into the first hidden layer
w = set of parameters feeding into the second layer
v = scale parameters of the radial basis function
o = weight into the output layer
ℓ = likeliness
Er = error function

Acknowledgements

The authors would like to acknowledge the Stanford Center for Reservoir Forecasting, the BAEF (Belgian American Educational Foundation) and NATO for support of this research.

References

1. BESAG, J., 1974. Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Stat. Soc. B, 36, 192-236.

2. BESAG, J., 1975. Statistical analysis of non-lattice data. The Statistician, 24, 179-195.

3. BESAG, J., GREEN, P., HIGDON, D. AND MENGERSEN, K., 1995. Bayesian computation and stochastic systems (with discussion). Statistical Science, 10, 3-66.

4. BISHOP, C.M., 1995. Neural networks for pattern recognition. Oxford University Press.

5. DEMPSTER, A.P., LAIRD, N.M. AND RUBIN, D.B., 1977. Maximum likelihood from incomplete data via the EM-algorithm (with discussion). J. Roy. Stat. Soc. B, 39, 1-38.

6. DEUTSCH, C., 1992. Annealing techniques applied to reservoir modeling and the integration of geological and engineering (well test) data. Ph.D. thesis, Department of Petroleum Engineering, Stanford University, Stanford.

7. DEUTSCH, C. AND WANG, L., 1996. Hierarchical object-based stochastic modeling of fluvial reservoirs. Math. Geol., 28, 857-880.

8. DEUTSCH, C. AND JOURNEL, A.G., 1997. GSLIB: Geostatistical Software Library. Oxford University Press.

9. CAERS, J., 1998. Stochastic simulation using neural networks. 11th Annual Report, Stanford Center for Reservoir Forecasting, Stanford, USA, 66 p.

10. CAERS, J. AND JOURNEL, A.G., 1998. A neural network approach to stochastic simulation. GOCAD ENSMSP meeting, Nancy, France, June 4-5.

11. GUARDIANO, F. AND SRIVASTAVA, M., 1992. Borrowing complex geometries from training images: the extended normal equations algorithm. SCRF Report, 1992.

12. IGELNIK, B. AND PAO, Y.H., 1995. Stochastic choice of basis functions in adaptive functional approximation and the Functional Link net. IEEE Trans. on Neural Networks, 6, 1320-1329.

13. GOODMAN, J. AND SOKAL, A.D., 1989. Multigrid Monte Carlo method. Conceptual foundations. Physical Review D, Particles and Fields, 40, 2035-2071.

14. GOOVAERTS, P., 1997. Geostatistics for natural resources evaluation. Oxford University Press.

15. HUSMEIER, D. AND TAYLOR, J.G., 1997. Neural networks for predicting conditional probability densities: improved training scheme combining EM and RVFL. King's College, Department of Mathematics, London. Pre-print KCL-MTH-97-47.

16. JORDAN, M.I. AND JACOBS, R.A., 1994. Hierarchical mixtures of experts and the EM-algorithm. Neural Computation, 6, 181-214.

17. JOURNEL, A.G. AND ALABERT, F., 1989. Non-Gaussian data expansion in the earth sciences. Terra Nova, 1, 123-134.

18. HASTINGS, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109.

19. PRESS, W.H., TEUKOLSKY, S.A., VETTERLING, W.T. AND FLANNERY, B.P., 1996. Numerical Recipes in Fortran 90, volume 2 of Fortran Numerical Recipes. Cambridge University Press.

20. RIPLEY, B.D., 1987. Stochastic simulation. Wiley, New York.

21. RIPLEY, B.D., 1996. Pattern recognition and neural networks. Cambridge University Press.

22. SRIVASTAVA, M., 1992. Iterative methods for spatial simulation. SCRF Report, 1992.

23. TJELMELAND, H., 1996. Stochastic models in reservoir characterization and Markov random fields for compact objects. Ph.D. Thesis, Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim.

24. TRAN, T., 1994. Improving variogram reproduction on dense simulation grids. Computers and Geosciences, 20, 1161-1168.

25. WANG, L., 1995. Modeling complex reservoir geometries with multi-point statistics. SCRF Report, 1995.

26. XU, W. AND JOURNEL, A.G., 1994. DSSIM: A general sequential simulation algorithm. SCRF Report, 1994.

[Figure 1: Defining a neighbourhood or scanning template. Isotropic and anisotropic templates; x: multipoint data around the central node y.]

[Figure 2: General network configuration for stochastic simulation for a binary case. Training = calibration of θ; generalization = simulation, outputting f(0|x) and f(1|x). y: value of the visited node (could be continuous); x: values of the neighbors of the visited node; θ: network parameters; f: conditional distribution.]

[Figure 3: Metropolis sampling: the field is initially random. Each node is visited and updated according to the ratio p_new/p_old, e.g. with transition probability min(1, p_new/p_old).]

[Figure 4: Using multiple grids in stochastic simulation: coarse, intermediate and fine grids with their corresponding coarse-grid and fine-grid templates.]

[Figure 5: Single hidden layer network: inputs x and y, K hidden nodes with weights w_{k,l}, output ccdf F(y|x).]

[Figure 6: Adding a second layer: first hidden layer with K1 nodes, second hidden layer with K2 nodes, output ccdf F(y|x).]

[Figure 7: Neural network that outputs a density instead of a cdf: K1 and K2 nodes, radial basis functions of y − m_{k2}(x|w, u), output f(y|x).]

[Figure 8: Likelihood principle in spatial statistics: a model defined by local probabilities (Gibbs sampler + neural net) generates an ensemble of images; maximization of the likelihood f(training image^(0) | θ), starting from an initial guess θ^(0), amounts to training the neural net and yields the maximum likelihood estimate.]

[Figure 9: Training error and cross-validation error. The training error (error of reproducing the patterns of the first image) keeps decreasing with the iterations, while the cross-validation error passes through a minimum, which defines the stopping point.]

[Figure 10: Dunes: results (training image, 85 conditioning data, and conditional simulations).]

[Figure 11: Walker Lake: simulation results (conditional and unconditional simulations; panel shown: "Unconditional simulation 2").]

[Figure 12: Channel simulation results; grey scale from 0.0 to 1.0.]