
This article was downloaded by: [Moskow State Univ Bibliote]
On: 30 December 2013, At: 07:14
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Marine Georesources & Geotechnology
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/umgt20

An Ensemble of Geostatistical Simulated Realizations Using a Clustering Algorithm to Reproduce Model-Based Statistics: Case Study of a Gravel Reserve at Kivalina, Alaska
Snehamoy Chatterjee (a) & Sukumar Bandopadhyay (b)
(a) Department of Mining Engineering, National Institute of Technology, Rourkela, India
(b) College of Engineering & Mines, University of Alaska Fairbanks, Fairbanks, AK, USA
Published online: 30 Apr 2013.

To cite this article: Snehamoy Chatterjee & Sukumar Bandopadhyay (2013) An Ensemble of Geostatistical Simulated Realizations Using a Clustering Algorithm to Reproduce Model-Based Statistics: Case Study of a Gravel Reserve at Kivalina, Alaska, Marine Georesources & Geotechnology, 31:3, 225-241, DOI: 10.1080/1064119X.2013.774685

To link to this article: http://dx.doi.org/10.1080/1064119X.2013.774685


Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Downloaded by [Moskow State Univ Bibliote] at 07:14 30 December 2013

An Ensemble of Geostatistical Simulated Realizations Using a Clustering Algorithm to Reproduce Model-Based Statistics: Case Study of a Gravel Reserve at Kivalina, Alaska

SNEHAMOY CHATTERJEE¹ AND SUKUMAR BANDOPADHYAY²

¹Department of Mining Engineering, National Institute of Technology, Rourkela, India
²College of Engineering & Mines, University of Alaska Fairbanks, Fairbanks, AK, USA

This paper demonstrates how statistics computed from multiple simulated realizations converge to model-based statistics. Theoretically, the convergence of realization statistics is guaranteed when averaging over a number of mutually independent realizations. The rate at which realization-based statistics converge to model-based statistics is important and must be assessed. However, because of a poor choice of random number generator, the generated realizations may be far from mutually independent. We use the k-means clustering algorithm to select nearly independent realizations from a set of realization models. We apply the proposed algorithm to a coastal erosion problem in Alaska to estimate the amount of gravel.

Keywords clustering algorithm, geostatistics, gravel reserve, Kivalina Alaska

Introduction

Coastal erosion is a serious problem in Alaska. A 1971 U.S. Army Corps of Engineers study (U.S.A.C.E. 1971) showed that approximately 11% (5,100 miles) of Alaska’s coastline is undergoing significant erosion.

Coastal erosion is the wearing away of land, caused by natural activity or human influences, that results in loss of beach, shoreline, or dune material. It is measured as the rate of change in the position, or horizontal displacement, of the shoreline over time. Natural coastal erosion can be driven by wind, waves, multi-year impacts, and long-term climatic change such as sea level rise, lack of sediment supply, and subsidence. Erosion also occurs because of long-term human factors, such as construction of shore protection structures and dams, or aquifer depletion. A report by the U.S. Environmental Protection Agency (EPA 2004) and a report on Arctic climate impact assessment (ACIA 2004) indicate a significant likelihood of rising mean sea level and other consequences of global warming. Thus, long-term planning is required to solve Alaska’s coastal erosion problem.

Received 2 February 2009; accepted 20 May 2011.
We would like to thank our esteemed reviewer Eric Grunsky for providing valuable comments to improve our manuscript.
Address correspondence to Sukumar Bandopadhyay, College of Engineering & Mines, PO Box 755800, Fairbanks, AK 99775-5800. E-mail: [email protected]

Marine Georesources & Geotechnology, 31:225–241, 2013
Copyright © Taylor & Francis Group, LLC
ISSN: 1064-119X print/1521-0618 online
DOI: 10.1080/1064119X.2013.774685

Kivalina, a community situated on a barrier island in the southeast Chukchi Sea, is exposed to natural hazards. Review of aerial photos taken since the 1980s indicates loss of beach width over the last few years, from the mouth of the Wulik River north towards the airport, with a rapid increase in erosion into specific upland areas of the community. To tackle the problem of coastal erosion, the Kivalina Village Council resolved to relocate the village from the barrier island to an adjacent onshore site (The Associated Press 2001). The plain identified for village relocation is underlain by continuous permafrost and will require large amounts of aggregate material before any foundation can be built. Because the relocation process will take considerable time, a short-term solution is needed to protect the town from coastal erosion.

Aggregate material in large amounts will be required for both long- and short-term solutions to Kivalina’s coastal erosion problem. Gravel is considered a cost-effective material for foundations, and the continental shelf adjacent to Kivalina has been targeted as a potential source of large volumes of gravel.

The amount of gravel at unsampled locations in the Kivalina area must be estimated, and such prediction requires a model of the spatial behavior of gravel. Because the model is built from few sample observations, the analysis will carry some degree of uncertainty.

Geostatistical methods are widely accepted techniques for spatial analysis (Journel and Huijbregts 1978; Isaaks and Srivastava 1989; Goovaerts 1997). They have been applied in various areas of spatial analysis, such as rainfall prediction (Hengl et al. 2010; Hiemstra et al. 2010), topography mapping (Hengl et al. 2008), radioactivity mapping (Pebesma 2005), and mining resource estimation (Journel and Huijbregts 1978). The main concept of geostatistics is to quantify spatial correlation through variogram modeling. An interpolation technique such as kriging is then used to estimate the value at an unsampled location as a weighted average of known neighboring sample data, with the weights calculated from the variogram structure. Kriging estimation variances are independent of the estimated value; they depend only on the spatial arrangement of the sample data and on the model variogram (Isaaks and Srivastava 1989). Therefore, the kriging variance is not a true representation of uncertainty.

Stochastic simulation, an alternative modeling technique, is particularly suited to applications where global statistics are more important than local accuracy (Journel and Alabert 1989; Gomez-Hernandez and Srivastava 1990; Deutsch and Journel 1998; Goovaerts 2000, 2001). Simulation aims to reproduce the first- and second-order statistics of the data. Stochastic simulation techniques generate a number of alternative stochastic images of the random process by choosing a random path and changing the seed number of the random number generator. The technique therefore produces a number of equi-probable realizations that can be used for uncertainty analysis.

Because stochastic simulation generally tries to reproduce the global statistics of the data by developing a number of realization maps, the rate at which realization-based statistics (i.e., histogram and variogram) converge to model-based statistics is important and must therefore be assessed. From a theoretical point of view, reproduction of the data histogram and variogram is guaranteed only on average over a number of independent realizations (De Iaco and Palma 2002). The convergence criterion is strongly affected by the random number generator used in the geostatistical software. Furthermore, it is well known that simulated realizations of a random process are affected by the random number generator used in the simulation process (De Iaco and Palma 2002). Appropriate random number generation is a mathematical problem that practitioners who use geostatistical software daily find difficult to understand.

In this study, an attempt has been made to select an optimum number of simulated realizations to form an ensemble of simulated models. Members of the ensemble were selected from the set of equi-probable realizations using the k-means clustering technique. The optimum number of clusters was selected by minimizing an objective function. E-type (expected) and variance maps were then generated from the ensemble members corresponding to the optimum number of clusters.

This paper is organized as follows: Section 2 describes the methodology of the study, which includes a brief overview of the sequential Gaussian simulation technique and the procedure used to select the ensemble members from multiple realizations. Section 3 describes the Kivalina gravel case study. Section 4 presents the results obtained from the study, and Section 5 contains the conclusions.

Methodology

Sequential Gaussian Simulation

The stochastic simulation technique used in this study is sequential Gaussian simulation (SGS). The main aim of geostatistical simulation is to generate the local conditional probability distribution function of the variable of interest from the known observations. In SGS, the random function is assumed to be multivariate Gaussian. Under the stationarity assumption, a multivariate Gaussian random function is fully characterized by its mean value and its covariance matrix, which entails that all conditional distributions obtained under the sequential decomposition are also Gaussian. Moreover, the mean of each conditional distribution is expressed as a linear combination of previously simulated nodes:

$$E[Z(u_i) \mid Z(u_{i-1}) = z_{i-1}, \ldots, Z(u_1) = z_1] = \sum_{j=1}^{i-1} \lambda_j(u_i)\, z(u_j) = z^{*}_{SK}(u_i) \tag{1}$$

where the weights $\lambda_j(u_i)$ are determined by simple kriging, and the variance equals the simple kriging variance, $\mathrm{Var}[Z(u_i) \mid Z(u_{i-1}) = z_{i-1}, \ldots, Z(u_1) = z_1] = \sigma^2_{SK}(u_i)$. Given these mathematical results, the SGS algorithm can be defined as follows:

• Transform the sample data to standard normal scores.

• Assign the n data to the grid.

• Define a random path visiting all nodes u.

• Loop over all nodes u_i:

  • Construct a conditional Gaussian distribution:

$$G(u_i; z \mid (n+i-1)) = G\left(\frac{z - z^{*}_{SK}(u_i)}{\sigma_{SK}(u_i)}\right) \tag{2}$$

    where G(·) is the standard normal cumulative distribution function.

  • Draw a simulated value z(u_i) from the conditional distribution G(u_i; z | (n + i − 1)).

  • Add the simulated value to the conditioning data set, which grows from n + i − 1 to n + i values.

• End of the loop.

• Transform the entire simulation back to the original data histogram.
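The steps above can be sketched in a few lines. This is a minimal 1-D illustration under stated assumptions, not the authors' implementation: the conditioning data are assumed to be normal scores already located on grid nodes, the mean is assumed known and equal to zero (simple kriging), the covariance is a unit-sill spherical model with a hypothetical range `a`, and `max_neighbors` is a hypothetical search limit. The normal score and back-transform steps are omitted.

```python
import numpy as np

def spherical_cov(h, sill=1.0, a=10.0):
    """Spherical covariance: sill * (1 - 1.5 r + 0.5 r^3) with r = min(|h|/a, 1)."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def sgs_1d(grid_x, data_x, data_z, a=10.0, seed=12345, max_neighbors=12):
    """One SGS realization on a 1-D grid, in normal-score space (mean 0, sill 1)."""
    rng = np.random.default_rng(seed)
    sim = np.full(len(grid_x), np.nan)
    # Assign each conditioning datum to its nearest grid node.
    for x, z in zip(data_x, data_z):
        sim[np.argmin(np.abs(grid_x - x))] = z
    # Random path through the nodes that are still uninformed.
    path = rng.permutation(np.flatnonzero(np.isnan(sim)))
    for i in path:
        known = np.flatnonzero(~np.isnan(sim))
        # Condition on the nearest data and previously simulated nodes only.
        order = np.argsort(np.abs(grid_x[known] - grid_x[i]))
        known = known[order[:max_neighbors]]
        C = spherical_cov(grid_x[known, None] - grid_x[None, known], a=a)
        c0 = spherical_cov(grid_x[known] - grid_x[i], a=a)
        lam = np.linalg.solve(C, c0)              # simple kriging weights lambda_j
        mean_sk = lam @ sim[known]                # z*_SK(u_i), Eq. (1)
        var_sk = max(1.0 - lam @ c0, 0.0)         # sigma^2_SK(u_i)
        sim[i] = rng.normal(mean_sk, np.sqrt(var_sk))  # draw from Eq. (2)
    return sim
```

Calling `sgs_1d` with different `seed` values yields different equi-probable realizations, each of which honors the conditioning data exactly; restricting the kriging to nearby nodes keeps each linear system small, as is usual in practice.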

Simulation Model Ensemble

The convergence of realization-based statistics is a problem of ensembling multiple realizations of a simulated variable. Ensembling multiple models is a well-accepted technique in regression and classification problems (Breiman 1996; Bauer and Kohavi 1999). A simple approach to combining equi-probable realizations is to average them. The basic ensemble model (BEM) output is defined by

$$f_{BEM} = \frac{1}{n} \sum_{i=1}^{n} f_i(x) \tag{3}$$

where $f_{BEM}$ is the ensemble of realizations, $f_i$ is the ith realization, and n is the number of realizations.
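Equation (3) amounts to a node-by-node average over the selected realizations. The sketch below assumes the realizations are stored as rows of a 2-D array (one column per grid node); the optional `members` argument, a hypothetical name, restricts the average to a chosen subset. The per-node variance is returned as well, since variance maps are computed from the same members.

```python
import numpy as np

def basic_ensemble(realizations, members=None):
    """Eq. (3) E-type average over realizations, plus the per-node variance.

    realizations: array of shape (n_realizations, n_nodes).
    members: optional indices of the realizations to include (default: all).
    """
    r = np.asarray(realizations, dtype=float)
    if members is not None:
        r = r[list(members)]
    etype = r.mean(axis=0)       # f_BEM = (1/n) * sum_i f_i(x), equal weights
    variance = r.var(axis=0)     # spread of the members at each node
    return etype, variance
```

Because all realizations are equi-probable, every member receives the same weight 1/n; no weighting scheme is needed.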

Equation (3) provides a valid result when the generated realizations are independent of each other. However, due to an inappropriate choice of random number generator, the realizations are not always independent. Krogh and Vedelsby (1995) demonstrated that the generalization ability of an ensemble model depends on both the average generalization ability (accuracy) and the average ambiguity (divergence) of the individual members of the ensemble.
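For a weighted-average ensemble evaluated with squared error, the Krogh and Vedelsby (1995) result can be written as $E = \bar{E} - \bar{A}$: the ensemble error equals the weighted average error of the members minus their weighted average ambiguity (spread around the ensemble mean). The sketch below checks this identity numerically on synthetic predictions against a known target; a pointwise target is an assumption made only for illustration and is not available for equi-probable simulated realizations.

```python
import numpy as np

def ambiguity_decomposition(preds, target, weights=None):
    """Krogh-Vedelsby decomposition for a convex-weighted ensemble.

    Returns (ensemble_error, mean_member_error, mean_ambiguity); for squared
    error these satisfy ensemble_error == mean_member_error - mean_ambiguity.
    """
    preds = np.asarray(preds, dtype=float)           # (n_members, n_points)
    n = len(preds)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # convex weights
    f_bar = w @ preds                                # ensemble output
    ens_err = np.mean((f_bar - target) ** 2)
    member_err = w @ np.mean((preds - target) ** 2, axis=1)
    ambiguity = w @ np.mean((preds - f_bar) ** 2, axis=1)
    return ens_err, member_err, ambiguity
```

The identity shows why diversity helps: for fixed member accuracy, a larger ambiguity (less correlated members) gives a smaller ensemble error.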

The accuracy of an ensemble model can be improved by combining those realizations that perform better than others. However, since the simulated realizations are equi-probable, there is no valid measure to check the superiority of one realization over another. How many, and which, realizations should be combined to achieve fast convergence of realization-based statistics is an open problem. The ambiguity of an ensemble model can be increased by selecting only those realizations that are weakly correlated with one another (Cho and Ahn 2001; Rosen 1996).

Although diversity (independence) of realizations is considered an important concern in ensemble modeling, measuring independence among realizations is difficult. In regression and classification problems, many approaches have been developed to construct ensembles with increased diversity (Breiman 1996; Schapire 1990; Krogh and Vedelsby 1995; Rosen 1996). In those settings, however, the problem is comparatively easy to solve, since the objective is to minimize the regression or classification error by least squares in a supervised manner, using the known observed output value or class. In spatial simulation, the aim is convergence towards model-based statistics. The objective function has therefore been chosen so that the ensemble output reproduces the first- and second-order statistics of the data, that is, the histogram and variogram. The objective function used in this study is:

$$E(n) = w_1 \cdot \sum_{i=1}^{k} \left(1 - \frac{q_i}{p_i}\right)^2 + w_2 \cdot \sum_{j=1}^{m} \left(1 - \frac{s_j}{r_j}\right)^2 \tag{4}$$


where $p_i$ is the ith percentile value of the observed data, $q_i$ is the ith percentile value of the ensemble of n realizations, $r_j$ is the variogram value at lag j of the observed data, and $s_j$ is the variogram value at lag j of the ensemble of n realizations; k and m are the numbers of percentiles and lags considered. The weights $w_1$ and $w_2$ are associated with histogram reproduction and variogram reproduction, respectively. Both weights are set equal (0.5) in this study.
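Equation (4) can be sketched as below. The sketch assumes that the percentiles ($p_i$, $q_i$) and variogram values ($r_j$, $s_j$) have already been computed at matching probabilities and lags; `variogram_1d` is a hypothetical helper for a regularly spaced 1-D series, included only to show where $r_j$ and $s_j$ could come from. Because each term divides by an observed statistic, the zero lag and any percentile at which the observed value is zero (the gravel minimum is 0) must be excluded.

```python
import numpy as np

def variogram_1d(z, lags):
    """Experimental semivariogram of a regularly spaced 1-D series at integer lags >= 1."""
    z = np.asarray(z, dtype=float)
    return np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2) for h in lags])

def objective(p, q, r, s, w1=0.5, w2=0.5):
    """Eq. (4): w1 * sum_i (1 - q_i/p_i)^2 + w2 * sum_j (1 - s_j/r_j)^2.

    p, q: percentiles of the observed data and of the ensemble (p_i nonzero).
    r, s: observed and ensemble variogram values at the same lags (r_j nonzero).
    """
    p, q, r, s = (np.asarray(a, dtype=float) for a in (p, q, r, s))
    return w1 * np.sum((1.0 - q / p) ** 2) + w2 * np.sum((1.0 - s / r) ** 2)
```

Percentiles for both data sets can be obtained with `np.percentile(values, probs)` using the same `probs`; the relative form (1 − q/p) makes the histogram and variogram terms dimensionless, so equal weights are meaningful.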

Several methods for selecting diversified simulation realizations are available in the literature; here we used a clustering-based method. A number of simulated realizations are first generated from the same seed number, which is a large odd number (Deutsch and Journel 1998), and k-means clustering is then applied to divide the generated realizations into different clusters. The k-means clustering algorithm creates k clusters with maximum inter-cluster distance. For clustering, we used the Mahalanobis distance, which considers inter-cluster variance. The centers of the k clusters have maximum diversity among themselves. A single realization is then selected from each cluster to form the ensemble model. Since the simulated realizations are equi-probable, the realization with the minimum distance from the center of its cluster is selected, using the same distance function as in the clustering step.
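The selection step can be sketched as follows. The paper does not state how the covariance for the Mahalanobis distance is estimated; this sketch assumes a single global covariance estimated from the realizations themselves. Because 200 realizations of many grid nodes give a singular covariance, the realizations are first projected onto a hypothetical number `n_components` of leading principal components and scaled to unit variance, so that Euclidean distance in that space equals Mahalanobis distance in the reduced space. Plain Lloyd iterations with random initial centers stand in for k-means.

```python
import numpy as np

def whiten(X, n_components=5):
    """Leading principal-component scores scaled to unit variance."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, int(np.sum(S > 1e-12 * S[0])))
    return U[:, :k] * np.sqrt(len(X) - 1)    # = scores / component std. dev.

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd k-means; returns labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(Y[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(Y[:, None] - centers[None], axis=2).argmin(axis=1)
    return labels, centers

def select_members(realizations, k, n_components=5, seed=0):
    """Index of the realization nearest to each non-empty cluster center."""
    Y = whiten(np.asarray(realizations, dtype=float), n_components)
    labels, centers = kmeans(Y, k, seed=seed)
    members = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            d = np.linalg.norm(Y[idx] - centers[c], axis=1)
            members.append(int(idx[d.argmin()]))
    return members
```

Keeping only a few components matters here: whitening all n − 1 components of n centered realizations places every realization at the same distance from every other one, leaving nothing for k-means to separate.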

The next step is to determine the ideal number of clusters forming the ensemble. For this purpose, ensembles are developed with an increasing number of clusters, and the objective function in Eq. (4) is calculated for each ensemble. When no significant improvement occurs in the objective function, the algorithm stops, and the cluster number that gives the minimum objective function value is selected.
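The stopping rule can be sketched as below. The paper does not quantify "no significant improvement"; the tolerance `tol` and the `patience` count (how many consecutive non-improving cluster numbers to tolerate, which allows for a non-monotone objective) are hypothetical parameters. `evaluate` stands for any function that builds the k-cluster ensemble and returns E from Eq. (4).

```python
def choose_cluster_number(evaluate, k_max, tol=1e-3, patience=3):
    """Grow k from 1 until E(k) stops improving by more than tol for `patience` steps.

    evaluate: callable mapping k to the objective E of the k-cluster ensemble.
    Returns the k with the smallest E seen and the {k: E} history.
    """
    best_k, best_e = 1, evaluate(1)
    history = {1: best_e}
    stall = 0
    for k in range(2, k_max + 1):
        e = evaluate(k)
        history[k] = e
        significant = e < best_e - tol      # improvement larger than the tolerance
        if e < best_e:
            best_k, best_e = k, e
        stall = 0 if significant else stall + 1
        if stall >= patience:
            break
    return best_k, history
```

With a curve that falls until k = 13 and then only oscillates, the search stops a few steps later and returns 13.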

Case Study

Kivalina is located at the tip of a 13-km-long barrier island between the Chukchi Sea and the Wulik River (Figure 1), approximately 130 km north of the Arctic Circle on the Chukchi Sea coast. This low-lying island is subject to flooding during occasional storm surges and to erosion due to wave action (Scheffner and Miller 1998).

Acoustic-reflection studies within the Hope Valley (part of the shelf off Kivalina) indicate that sediments overlying the basement rocks are up to 10 m thick (Moore 1964). The Cape Thompson–Kivalina area contains gravel along the nearshore (Stauffer 1987; Creager and McManus 1966). The gravels are probably relict glacial deposits sourced from the De Long Mountains.

Data for this study were obtained from research conducted by the Mineral Industry Research Laboratory (MIRL) at the University of Alaska Fairbanks. Raw data were obtained by seismic survey and sediment sampling. Data were processed and grain size was analyzed at MIRL.

Geostatistical modeling of gravel deposits is difficult because there are no standardized modeling rules. Bliss (1998) showed that size models and other types of aggregate models can serve as a good basis for modeling gravel resources. In this study, size modeling was performed to evaluate gravel deposits. The first step was to compute basic statistics of the sample data, followed by geostatistical analysis of the data set. The geostatistical analysis led to a resource estimate describing the size of the deposit and the volume of gravel at various cutoff grades.


Results

Exploratory Data Analysis

The sample data set consists of 149 data points obtained from a previous study (Bandopadhyay et al. 2007; D’Souza et al. 2009). The sample locations are shown in Figure 2, and the histogram and descriptive statistics of the 149 sample values are presented in Figure 3 and Table 1, respectively. Figure 3 shows that the sample distribution is highly skewed, with most values concentrated at the low end. Table 1 shows that the skewness and kurtosis values are far from those of a normal distribution. Table 1 also shows that the data are highly erratic, as indicated by the sample coefficient of variation (1.23). To perform Gaussian simulation on the Kivalina gravel data, the data must therefore be transformed to normal score space. The gravel data were transformed using the normal score transformation (Verly 1994). The histogram of the normal score data (Figure 4) shows that the transformed data closely match the Gaussian distribution shape, with a mean of zero and a standard deviation of 1.
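The normal score transformation can be sketched as a rank-based quantile mapping onto the standard normal distribution, together with the matching back-transform. This is a common textbook form assumed here; the authors' implementation details (tie handling, declustering weights, tail extrapolation) are not given. With many zero gravel values, ties are broken simply by their order in the input, which is a simplification.

```python
from statistics import NormalDist
import numpy as np

_STD_NORMAL = NormalDist()  # standard normal G with mean 0, sd 1

def normal_score(values):
    """Rank-based normal scores y = G^-1((rank - 0.5) / n); ties broken by position."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    ranks = np.empty(n)
    ranks[np.argsort(v, kind="stable")] = np.arange(1, n + 1)
    return np.array([_STD_NORMAL.inv_cdf((r - 0.5) / n) for r in ranks])

def back_transform(scores, values):
    """Map normal scores back to data units by quantile matching against the data.

    Linear interpolation between sorted data; results beyond the data range are
    clamped to the data minimum and maximum (no tail extrapolation).
    """
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    p_data = (np.arange(1, n + 1) - 0.5) / n
    p = np.array([_STD_NORMAL.cdf(s) for s in np.atleast_1d(scores)])
    return np.interp(p, p_data, v)
```

The (rank − 0.5)/n plotting position keeps every probability strictly between 0 and 1, so the inverse normal CDF is always defined.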

The SGS algorithm assumes that the simulated spatial domain follows a multivariate Gaussian distribution, but checking multivariate normality is difficult with few sample observations. It is therefore assumed that if the data set satisfies bivariate normality, it also satisfies multivariate normality (Goovaerts 1997). Although the normal score transform guarantees only univariate normality, multivariate normality is expected to be respected by the transformed data set. To examine bivariate normality, however, the h-scatterplots of the transformed data set should be inspected: they should appear elliptical, with the long axis of the ellipse oriented along the one-to-one line (Tabachnick and Fidell 1989). Figure 5 shows the h-scatterplot of the normal score data set. The h-scatterplot is almost elliptical and aligned with the main diagonal, which supports our assumption of multivariate normality of the data set.

Figure 1. Map of the Cape Thompson–Kivalina area. (Color figure available online.)
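An h-scatterplot simply pairs each value with the value a lag h away. The sketch below assumes a regularly spaced 1-D transect for simplicity; the Kivalina samples are irregularly spaced, so in practice pairs would be collected with distance and angle tolerances, as in variogram computation. `h_scatter` and `h_correlation` are hypothetical helper names.

```python
import numpy as np

def h_scatter(z, lag):
    """Tail/head pairs (z(u), z(u + h)) of a regularly spaced 1-D series at integer lag."""
    z = np.asarray(z, dtype=float)
    return z[:-lag], z[lag:]

def h_correlation(z, lag):
    """Correlation coefficient of the h-scatterplot at the given lag."""
    tail, head = h_scatter(z, lag)
    return np.corrcoef(tail, head)[0, 1]
```

Plotting `tail` against `head` gives the h-scatterplot; for bivariate Gaussian data the cloud should look elliptical about the one-to-one line, becoming rounder as the lag grows and the correlation drops.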

Figure 3. Histogram of sample data. (Color figure available online.)

Figure 2. Data location in the Kivalina area. (Color figure available online.)


To model the spatial variability of the normal score data, a variogram model was constructed. Prior to variogram modeling, directional anisotropy was checked by plotting a rose diagram. The rose diagram generated from the normal score data (Figure 6) revealed distinct anisotropy along the 120° direction. Therefore, two directional variograms were computed, along the 30° and 120° directions.
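A directional experimental variogram keeps only the pairs whose separation vector points, within a tolerance, along the chosen direction. The sketch below assumes azimuths measured clockwise from north, a hypothetical angular tolerance of ±22.5°, and lag classes of fixed width; the paper does not report its lag or tolerance settings.

```python
import numpy as np

def directional_variogram(xy, z, azimuth_deg, lag, n_lags, angle_tol_deg=22.5):
    """Experimental semivariogram along an azimuth (degrees clockwise from north).

    A pair is used when its separation direction is within +/- angle_tol_deg of the
    azimuth (in either sense); pairs are binned into lag classes of width `lag`.
    Returns mean lag distance, semivariance, and pair count for non-empty classes.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # every pair once
    dx, dy = xy[j, 0] - xy[i, 0], xy[j, 1] - xy[i, 1]
    dist = np.hypot(dx, dy)
    az = np.degrees(np.arctan2(dx, dy)) % 180.0    # pair azimuth folded to [0, 180)
    diff = np.abs(az - azimuth_deg % 180.0)
    diff = np.minimum(diff, 180.0 - diff)          # smallest angular difference
    keep = (diff <= angle_tol_deg) & (dist > 0)
    d = dist[keep]
    sq = 0.5 * (z[j][keep] - z[i][keep]) ** 2      # half squared differences
    cls = np.floor(d / lag + 0.5).astype(int)      # nearest lag class
    out = [(d[cls == b].mean(), sq[cls == b].mean(), int(np.sum(cls == b)))
           for b in range(1, n_lags + 1) if np.any(cls == b)]
    if not out:
        return np.array([]), np.array([]), np.array([], dtype=int)
    h, g, n = map(np.array, zip(*out))
    return h, g, n
```

Computing this at 30° and at 120° gives the two directional curves; a longer range along 120° than along 30° would reflect the anisotropy seen in the rose diagram.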

Directional variograms of the normal score data are presented in Figure 7. The experimental variograms were fitted with a model variogram: an anisotropic spherical variogram model with a nugget effect was selected to represent the data, and its parameters were chosen by a cross-validation study. The cross-validation results (Figure 8) were satisfactory for the normal score data: the scatterplot of normal score data against estimated values falls close to the diagonal line (Figure 8a), and the standardized errors have a mean close to zero and a variance close to 1 (Figure 8b).
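An anisotropic spherical model with a nugget effect, and the cross-validation check on standardized errors, can be sketched as below. Geometric anisotropy is assumed (one spherical structure whose range differs between a major and a perpendicular minor axis); the nugget, sill, and ranges are placeholders, since the fitted parameters are not reported here.

```python
import numpy as np

def aniso_spherical(dx, dy, nugget, sill, a_major, a_minor, azimuth_deg):
    """Nugget + spherical variogram with geometric anisotropy in 2-D.

    azimuth_deg: direction of the major range, clockwise from north.
    sill: partial sill of the spherical structure (total sill = nugget + sill).
    """
    t = np.radians(azimuth_deg)
    h_major = dx * np.sin(t) + dy * np.cos(t)    # lag component along the major axis
    h_minor = dx * np.cos(t) - dy * np.sin(t)    # lag component along the minor axis
    h = np.hypot(h_major / a_major, h_minor / a_minor)   # range-normalized distance
    r = np.minimum(h, 1.0)
    gamma = nugget + sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, gamma, 0.0)           # gamma(0) = 0 by definition

def standardized_error_stats(z_true, z_est, kriging_var):
    """Mean and variance of cross-validation errors scaled by the kriging std. dev."""
    e = (np.asarray(z_est, dtype=float) - np.asarray(z_true, dtype=float)) / np.sqrt(kriging_var)
    return e.mean(), e.var()
```

A well-calibrated model should give standardized errors with a mean near 0 and a variance near 1, which is the criterion applied in Figure 8b.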

Gaussian Simulation Results

After the experimental variograms were fitted with the theoretical model variogram, sequential Gaussian simulation was performed. Two different SGS realizations are presented in Figure 9. Unlike kriging, stochastic simulation does not aim at minimizing a local error variance but rather at reproducing the statistics of the data, that is, the sample histogram and variogram. The performance of the SGS model was therefore tested by computing the histogram, variogram, and cumulative distribution function of the realizations and comparing them with the sample statistics. Figures 10 through 12 show the histogram, variogram, and cumulative distribution function of the sample data and of one simulated realization. The results show that the simulated data statistics reproduce the sample data statistics.

Figure 4. Normal score histogram. (Color figure available online.)

Table 1. Basic statistics of the entire data set

| Statistic | No. of samples | Minimum | Maximum | Mean | St. Dev. | Coefficient of variation | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| Gravel | 149 | 0 | 97.78 | 32.82 | 34.59 | 1.23 | 1.8 | 7.6 |

Figure 5. h-scatterplot of the normal score data. (Color figure available online.)

Figure 6. Rose diagram of the normal score data. (Color figure available online.)

Selection of Ensemble Members

After running the simulation algorithm and generating a number of realizations, we selected ensemble members from the generated realizations. Because the simulated realizations are equi-probable, there is no need to calculate weights for the selected realizations in the ensemble; they are assigned equal weights. Two hundred realizations were generated to verify our approach. Figure 13 shows scatterplots of three different realizations against one another: realization 1 vs. 2, realization 1 vs. 3, and realization 2 vs. 3. The figure shows that the generated realizations are not independent of each other, even though independence is the main assumption of E-type estimation. To select less-correlated ensemble members, diversified realizations were selected using the k-means clustering algorithm. The main purpose of any clustering algorithm is to maximize the inter-cluster distance in an iterative manner, which ultimately leads to k diversified clusters. Members drawn from different clusters are expected to be nearly independent. Since 200 realizations were available, the maximum number of clusters could be 200, with each individual realization acting as a cluster center.

Figure 7. Experimental and fitted directional variograms of normal score data. (Color figure available online.)

Figure 8. Cross-validation results for variogram fitting. (Color figure available online.)

Figure 9. Two different simulated realizations of sequential Gaussian simulation. (Color figure available online.)

Figure 10. Sample histogram and realization histogram. (Color figure available online.)
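Dependence between realizations, which the scatterplots in Figure 13 show visually, can also be summarized numerically by a correlation matrix computed across grid nodes. The sketch assumes realizations are stored as rows of a 2-D array; `mean_abs_offdiag` is a hypothetical summary statistic, not one used in the paper. Since all conditional realizations honor the same data, some correlation near the data locations is expected even with an ideal generator.

```python
import numpy as np

def realization_correlation(realizations):
    """Pearson correlation between every pair of realizations (rows = realizations)."""
    return np.corrcoef(np.asarray(realizations, dtype=float))

def mean_abs_offdiag(corr):
    """Average absolute correlation between distinct realizations."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[mask]).mean()
```

Comparing `mean_abs_offdiag` for all realizations with the value for a chosen subset of rows (for example, the selected ensemble members) gives a simple check that the selection has reduced mutual correlation.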

First, we assumed that all 200 realizations belong to one single cluster, and a single best realization out of the 200 was selected from that cluster based on the objective function [Eq. (4)]. The same objective function was then used to evaluate ensembles as the number of clusters was increased incrementally from 2 to 200. From each cluster, the realization with the minimum Mahalanobis distance to the cluster center was chosen as the ensemble member. The objective function values for all cluster numbers, i.e., the weighted sum of the squared relative deviations of the ensemble's first- and second-order statistics from the data statistics, were then calculated and plotted (see Figure 14). The figure shows that beyond 13 clusters, no significant reduction of the objective function value occurs. Therefore, 13 clusters were used in the final ensemble model. For clarity, only the first 28 cluster numbers are plotted in Figure 14. Up to a point, the objective function values decrease with increasing cluster number; beyond it, they follow a zigzag pattern. This happens because certain clusters in the ensemble are not diversified enough, which ultimately increases the ensemble error. For example, with 14 clusters, when realizations from 14 different clusters are selected as ensemble members, the value of the objective function increases due to the high correlation between ensemble members. When the cluster number is increased further, the objective function decreases due to a different set of ensemble members with low correlation.

Figure 12. Reproduction of cumulative distribution. (Color figure available online.)

Figure 11. Reproduction of directional variograms by different realizations. (Color figure available online.)

Figure 13. Scatterplots of (a) realization 1 vs. 2; (b) realization 1 vs. 3; and (c) realization 2 vs. 3.

Figure 14. Value of the objective function for different cluster numbers. (Only the first 28 cluster values are plotted; no improvement in objective function value occurs after this point.) (Color figure available online.)


Uncertainty Analysis

After selecting ensemble members from the generated realizations, uncertainty analysis was carried out to determine the confidence in our estimate of the gravel deposit. The most widely used method for risk assessment is the generation of probability maps at different thresholds. The three maps in Figure 15 are probability maps of exceeding gravel values of 15, 45, and 75. These maps show that the probability of obtaining gravel is highest near the coast and decreases with distance from the coast.

Figure 15. Probability maps above three different cutoff grades. (Color figure available online.)
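A probability map of this kind is, at each node, the fraction of realizations whose value exceeds the threshold. The sketch assumes realizations already back-transformed to gravel units and stored as rows of a 2-D array, with each realization weighted equally, consistent with the equal weighting of ensemble members.

```python
import numpy as np

def exceedance_probability(realizations, cutoff):
    """Per-node fraction of realizations whose value exceeds `cutoff`."""
    r = np.asarray(realizations, dtype=float)
    return (r > cutoff).mean(axis=0)
```

Evaluating it at the three cutoffs, for example `{c: exceedance_probability(R, c) for c in (15, 45, 75)}`, yields one map per threshold; with only a few ensemble members the probabilities take coarse steps of 1/n.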

The expected map and variance map from the ensemble of 14 clusters are presented in Figure 16.

Conclusion

This paper has presented an approach to assessing the convergence of the statistics of simulated realizations to sample statistics. The convergence of realization statistics is theoretically guaranteed over a number of independent realizations. However, because of a poor choice of random number generator, generating independent realizations may not always be possible. We have proposed an approach for selecting diversified (nearly independent) realizations using a k-means clustering algorithm. The results revealed that the ensemble average of simulated realizations selected by clustering reproduces the sample statistics much better than the usual expected (E-type) map. The algorithm was applied successfully to offshore gravel data for the evaluation of a deposit.

We minimized the objective function in Eq. (4) by assigning equal weights to the histogram and variogram reproduction terms. Because we minimized a weighted average of two different objectives, we cannot guarantee that both terms of the objective function are minimized simultaneously. Although combining the two objectives reduces computational complexity, it does not guarantee a truly optimal ensemble model. A multi-objective optimization technique could be applied to minimize both objectives together. The reproduction of higher-order statistics, such as skewness and kurtosis, could also be incorporated in the objective function.
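The weighted-sum limitation discussed above can be illustrated with a small sketch. The exact form of Eq. (4) is not reproduced in this section, so the objective below is a hypothetical stand-in: each term is a sum of squared differences between precomputed ensemble and reference statistics (for example, histogram quantiles, and variogram values at a set of lags), combined with equal weights of 0.5. The `pareto_front` helper illustrates the multi-objective alternative by keeping every candidate that no other candidate beats on both terms. All names and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_mismatch(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length statistic vectors."""
    if len(a) != len(b):
        raise ValueError("statistic vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_objective(
    hist_ens: Sequence[float],
    hist_ref: Sequence[float],
    vario_ens: Sequence[float],
    vario_ref: Sequence[float],
    w_hist: float = 0.5,
    w_vario: float = 0.5,
) -> float:
    """Hypothetical weighted-sum objective: histogram term plus variogram term."""
    return w_hist * squared_mismatch(hist_ens, hist_ref) + w_vario * squared_mismatch(
        vario_ens, vario_ref
    )


def pareto_front(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of candidates whose (histogram, variogram) scores are not
    dominated: no other candidate is at least as good on both terms and
    strictly better on one."""
    front: List[int] = []
    for i, (h_i, v_i) in enumerate(scores):
        dominated = any(
            h_j <= h_i and v_j <= v_i and (h_j < h_i or v_j < v_i)
            for j, (h_j, v_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


if __name__ == "__main__":
    # Hypothetical (histogram mismatch, variogram mismatch) of four candidate ensembles.
    scores = [(0.0, 1.0), (0.4, 0.4), (1.0, 0.0), (0.5, 0.6)]
    sums = [0.5 * h + 0.5 * v for h, v in scores]
    best = min(range(len(scores)), key=sums.__getitem__)
    print("weighted-sum choice:", best)
    print("Pareto front:", pareto_front(scores))
```

In this toy run the equal-weight sum selects candidate 1, which lies on the Pareto front but is at the individual minimum of neither term; candidates 0 and 2 are each best for one term. Adding skewness or kurtosis terms would mean either extra weighted terms in the sum or extra coordinates in the dominance test.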

Figure 16. Ensemble E-type and variance map. (Color figure available online.)


References

Arctic Climate Impact Assessment. 2004. Arctic Climate Impact Assessment. New York: Cambridge University Press.

Associated Press. 2001. Feds, state must work together on village erosion. October 31, Fairbanks Daily News-Miner, Fairbanks, AK.

Bandopadhyay, S., S. Naidu, J. Kellety, and A. D’Souza. 2007. Exploration and estimation of gravel resource potential in southeast Chukchi Sea continental shelf off Kivalina, Alaska. Final report: Minerals Management Services, U.S. Dept. of Interior Cooperative Agreement, 1435-01-02-CA-85124, 84 pp.

Bauer, E. and R. Kohavi. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning 36(1/2): 105–139.

Bliss, J. D. 1998. Aggregate modeling and assessment. In: Aggregate Resources: A Global Perspective. Bobrowsky, P. T. (ed.), 255–274. Rotterdam, Netherlands: A. A. Balkema.

Breiman, L. 1996. Bagging predictors. Machine Learning 24(2): 123–140.

Cho, S.-B. and J.-H. Ahn. 2001. Speciated neural networks evolved with fitness sharing technique. Congress on Evolutionary Computation 1: 390–396.

Creager, J. S. and D. A. McManus. 1966. Geology of the southeastern Chukchi Sea. In: Environment of the Cape Thompson Region. Wilomovsky, N. J. and Wolf, J. N. (eds.), 755–786. Alaska: U. S. Atomic Energy Comm. Report, Chapter 26.

De Iaco, S. and M. Palma. 2002. Convergence of realization-based statistics to model-based statistics for the LU unconditional simulation algorithm: Some numerical tests. Stochastic Environmental Research and Risk Assessment 16: 333–341.

Deutsch, C. V. and A. G. Journel. 1998. GSLIB: Geostatistical Software Library and User’s Guide. New York: Oxford University Press, 340 pp.

D’Souza, A., S. Bandopadhyay, S. Naidu, R. Ganguli, and D. Misra. 2009. Exploration and estimation of gravel resource potential in Southeast Chukchi Sea Continental Shelf off Kivalina, Alaska. Marine Georesources & Geotechnology 27(4): 255–272.

Environmental Protection Agency. 2004. Fiscal Year 2004 Annual Report, U.S. Environmental Protection Agency, EPA-190-R-04-001.

Gomez-Hernandez, J. J. and R. M. Srivastava. 1990. ISIM3D: An ANSI-C three dimensional multiple indicator conditional simulation program. Computers and Geosciences 16: 395–440.

Goovaerts, P. 1997. Geostatistics for Natural Resources Evaluation. New York: Oxford University Press, 483 pp.

Goovaerts, P. 2000. Estimation or simulation of soil properties? An optimization problem with conflicting criteria. Geoderma 97(3/4): 165–186.

Goovaerts, P. 2001. Geostatistical modelling of uncertainty in soil science. Geoderma 103(1/2): 3–26.

Hengl, T., A. AghaKouchack, and M. Percec Tadic. 2010. Methods and data sources for spatial prediction of rainfall. In: Rainfall: Microphysics, Measurement, Estimation, and Statistical Analyses. Testik, F. Y. and M. Gebremichael (eds.), Washington, D.C.: AGU Books.

Hengl, T., B. Bajat, H. I. Reuter, and D. Blagojevic. 2008. Geostatistical modelling of topography using auxiliary maps. Computers & Geosciences 34: 1886–1899.

Hiemstra, P. H., E. J. Pebesma, G. B. M. Heuvelink, and C. J. W. Twenhofel. 2010. Using rainfall radar data to improve interpolated maps of dose rate in the Netherlands. Science of the Total Environment 409(1): 123–133.

Isaaks, E. H. and R. M. Srivastava. 1989. Applied Geostatistics. New York: Oxford University Press, 561 pp.

Journel, A. G. and F. Alabert. 1989. Non-Gaussian data expansion in the earth sciences. Terra Nova 1: 123–134.

Journel, A. G. and C. J. Huijbregts. 1978. Mining Geostatistics. New York: Academic Press, 600 pp.


Krogh, A. and J. Vedelsby. 1995. Neural network ensembles, cross validation, and active learning. Neural Inf. Process. Syst. 7: 231–238.

Moore, D. G. 1964. Acoustic-reflection reconnaissance of continental shelves: Eastern Bering Sea and Chukchi Seas. In: Papers in Marine Geology: Shepard Commemorative Volume. Miller, R. L. (ed.), 319–362. New York: Macmillan.

Pebesma, E. J. 2005. Mapping radioactivity from monitoring data: Automating the classical geostatistical approach. Applied GIS 1(2).

Rosen, B. E. 1996. Ensemble learning using de-correlated neural networks. Connection Science 8(3/4): 373–384.

Schapire, R. E. 1990. The strength of weak learnability. Machine Learning 5(2): 197–227.

Scheffner, N. W. and M. C. Miller. 1998. Development of water surface elevation frequency-of-occurrence relationships for Kivalina, Alaska. Department of the Army, U.S. Army Engineer District, Anchorage, AK.

Stauffer, P. H. 1987. Quaternary depositional history and potential sand and gravel resources of the Alaskan Continental margins. In: Geology and Resource Potential of the Continental Margin of Western North America and Adjacent Ocean Basins: Beaufort Sea to Baja California. Scholl, D. W., Grantz, A., and J. G. Vedder (eds.), 649–690. Houston, TX: Circum-Pacific Council for Energy and Mineral Resources.

Tabachnick, B. G. and L. S. Fidell. 1989. Using Multivariate Statistics (2nd ed.). New York: Harper Collins.

U. S. A. C. E. Division, North Atlantic. 1971. National Shoreline Study Regional Inventory Report North Atlantic Region, Vol. 1. New York: Corps of Engineers, 1–4.

Verly, G. W. 1994. Sequential Gaussian cosimulation: A simulation method integrating several types of information. In: Geostatistics Troia. Soares, A. (ed.), 85–94. Dordrecht: Kluwer Academic Publishers.
