What is a Semivariogram?

  • Apuntes de Geoestadística 1

    CONTENTS

    6.1: What is a Semivariogram
    6.2: Calculating the Experimental Semivariogram
      6.2.1: Components of the Experimental Semivariogram Calculation
      6.2.2: Semivariogram Equation Types
      6.2.3: Sample Size
    6.3: Identifying the Appropriate Semivariogram Calculation Method
    6.4: Calculating the Isotropic Experimental Semivariogram
    6.5: Calculating Anisotropic Experimental Semivariogram
    6.6: Fitting a Model Semivariogram to an Individual Experimental Semivariogram
      6.6.1: Rules of Thumb
      6.6.2: Nested Model Semivariogram Structures
      6.6.3: Fitting the Model Semivariogram
        6.6.3.2: Fitting the Maximum Range
        6.6.3.3: Fitting the Variance (Ci)
          6.6.3.3.1: Single Structure
          6.6.3.3.2: Multiple Structures / Nests
      6.6.4: Clues That the Data Set Requires Re-Evaluation
    6.7: Anisotropic Semivariogram Modeling
      6.7.1: Selecting the Principal Experimental Semivariograms
      6.7.2: Defining Model Semivariogram Anisotropy
      6.7.3: Rules of Thumb for Developing Anisotropic Semivariogram Models
    6.8: Summary: Typical Steps in Evaluating an Experimental and a Model
    6.9: Strategies for Large Data Sets
    6.10: Other Semivariogram Requirements: Zones, Indicators and Thresholds, Covariances, and Soft Data
      6.10.1: Indicator and Class Kriging and Simulation
      6.10.2: Covariance
      6.10.3: Imprecise Data and Indicator Kriging / Simulation
      6.10.4: Data Variability: Non-Stationarity
    6.12: References

    6.1: What is a Semivariogram?

    In traditional statistics, the variance is used to define the variation of the sample values from the sample mean. A semivariogram measures the variation of samples with distance and direction; it describes the spatial relationship between the sample values. Traditionally, the semivariogram is presented as a graph of variance (γ(h)) vs. distance, which represents the variation of sample values with distance. A set of semivariograms can be used to describe the sample variation with direction.

    An example model semivariogram is shown in Figure 6-1, where the principal components are identified. The Sill is equal to the data set variance. For distances less than the Range, the estimate of γ(h) is less than the Sill. The Sill is composed of two components: the Nugget and C.

    The Nugget is the expected variance when two different samples are separated by a zero distance (too small to measure, or from a split sample). Normally one would expect this to be 0.0, but in practice this is often not the case. The term Nugget comes from the gold mining industry, and arises when a single sample is divided into multiple sub-samples for quality control analysis. The sub-samples are identified with the same location, and thus have a zero separation distance. If the identical point could be re-sampled with no measurement error, the variation would actually be 0.0 (REF - Clark), but this reproducibility is not normally possible. Because gold often forms in nuggets of pure gold, it is possible that one sub-sample will have a high gold assay and the remaining sub-samples will have little or no gold. As a result, there can be significant variation over very small, or zero, distances, and thus there is a non-zero variance (note: as the sample size increases, the variance decreases, because average values for larger volumes are less influenced by small-scale variations). In evaluating field data, a nugget may also exist because the sample spacing is not small enough to define the short-range features of the semivariogram.

    The remainder of the Sill, not defined by the Nugget, is defined by C. The Nugget, C, and Range are a function of the model equation used. There are restrictions on acceptable functions, mainly that the function must yield positive definite kriging matrices. With the Nugget, C, Range, and model equation type defined, the spatial statistics of the site are defined and the data can be kriged.

    Figure 6-1: Basic components of a semivariogram.

    [Note: mention anisotropy at this point, or at least directional variation.]

    A number of steps are involved in fully defining the model semivariogram. The basic steps for developing an experimental and a model semivariogram are shown in the flowchart in Figure 6-2. The differences between the experimental semivariogram and the model semivariogram are:

    - The experimental semivariogram is discrete.
    - The model semivariogram uses a mathematical equation to describe the spatial variability defined by the experimental semivariogram. This model is used by the kriging algorithm. A model function must be positive definite to ensure that matrix solutions are non-singular.

    Figure 6-2: Flow chart for calculating experimental and model semivariograms.

    6.2: Calculating the Experimental Semivariogram

    The first step in the semivariogram analysis is to describe the spatial variation of the sample data by developing an experimental semivariogram. The quality of the experimental semivariogram depends largely on the number of samples available, the spacing of the samples, the range of structures that influence the samples at the site compared to the sample spacing, the degree of heterogeneity, and the values used as input parameters in the experimental semivariogram equation (lag, bandwidths, half-angles, search directions). If good quality semivariograms cannot be developed, the quality of the kriged maps or simulated realizations will be poor and likely misleading.

    6.2.1: Components of the Experimental Semivariogram Calculation

    Before proceeding through the steps of calculating the experimental semivariogram, it is important to understand the calculation and what the parameters mean. Fundamentally, the experimental semivariogram represents the variance of the sample values at various separation distances. The general form of the equation is:

    $$\gamma^*(h) = \frac{1}{2N} \sum_{i=1}^{N} \left[ z(x_i) - z(x_i + h) \right]^2 \qquad \text{(EQ. 6-1)}$$

    where N is the number of data sample pairs separated by the distance h, z(x_i) are the sample values, z(x_i + h) are all the sample values a distance h away from the sample x_i, and γ*(h) is one-half the sample population variance. More simply:

    (EQ. 6-2)

    From a practical point of view, h is a distance interval, which is increased step-wise (0.0 - h, h - 2h, 2h - 3h, ...) over the area covered by the field data. For each interval, pairs of points separated by that distance interval are used to calculate γ*(h). A commonly selected value for h is the sampling interval, or an approximate average nearest-sample spacing. For example, if samples are collected on a 10m x 10m grid, a good lag distance would be 10m. Once the lag is defined, the variance at that lag and at integer multiples of that lag is calculated (1h, 2h, 3h, ... , nh).

    γ*(h) can be estimated for distances across the width of the field site, but due to boundary effects, estimates for lags greater than one-half the width of the sampled zone are suspect (REF). For large data sets, it may also be practical to calculate fewer lag intervals than the maximum possible. If the range can be identified using only the initial lag intervals, calculating the additional lags just requires more computer time with minimal benefit. On large data sets, the time saving can be significant (hours or days) if the model parameters are carefully constrained.

    Lag sampling is conceptually displayed in Figure 6-3 for a one-dimensional sampling pattern. For this example, the number of sample pairs and the γ*(h) values for each lag interval are shown in Table 6-1. The experimental semivariogram is shown in Figure 6-4. Notice that the sample data repeat in sequences (1,1,1; 3,3; 2,2,2; ...) of two or three values; this is accurately captured by the experimental semivariogram. The γ*(h) lag estimates reach the data set variance at about two and one-half lag spacings. Also note that the γ*(h) lag estimates drop markedly past five lag spacings. This occurs because, starting at seven lags, the middle data points are no longer used to estimate γ*(h), and overall, very few pairs are used to calculate γ*(h). γ*(h) estimates for lags greater than one-half the maximum search distance should be treated as suspect. This response reflects boundary effects. In this example the γ*(h) lag estimates dropped dramatically; in other data sets they may rise or fluctuate erratically.

    Figure 6-3: One-dimensional example of twelve equally spaced data points. For this example the logical lag spacing would be even increments of the sample spacing. With increasing lag spacing, the existence or lack of long-range structures is identified. NOTE: As the lag spacing increases, the support (number of pairs) decreases. Also note that many points are not used at all in the calculations of the longer lags (lags 7 - 11). See Table 6-1 and Figure 6-4 for tabulated and graphical results.

    Figure 6-4: Experimental semivariogram of the 1D data presented in Figure 6-3. NOTE: As the lag approaches 1/2 the maximum data range, the estimates of γ*(h) drop quickly. This is due to the reduction in the number of samples contributing to the calculation, because of the size of the sampled zone, and has little to do with the data samples themselves.

    Table 6-1: The variance, or sill, for this data set is 1.00. This suggests the range is between two and three lag spacings.

    Lag interval differences squared (columns 1 - 11), with the sum, γ*(h), and number of pairs for each lag

    Lag     1   2   3   4   5   6   7   8   9  10  11   Sum   γ*(h)   Pairs
    1x      0   0   4   0   1   4   0   0   1   0   1    11    0.50      11
    2x      0   4   4   1   1   4   0   1   1   1        17    0.85      10
    3x      4   4   9   1   1   4   1   1   0            25    1.39       9
    4x      4   9   1   1   1   9   1   0                26    1.62       8
    5x      9   1   1   1   4   9   0                    25    1.79       7
    6x      1   1   1   4   4   4                        15    1.25       6
    7x      1   1   0   4   1                             7    0.70       5
    8x      1   0   0   1                                 2    0.25       4
    9x      0   0   1                                     1    0.17       3
    10x     0   1                                         1    0.25       2
    11x     1                                             1    0.50       1
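    The tabulated calculation can be sketched in a few lines of Python. This is a minimal illustration, not the original program: the function name is hypothetical, and the twelve sample values are an assumed series chosen so that its squared differences agree with the sums in Table 6-1 (the values plotted in Figure 6-3 are not reproduced in this text).

```python
def experimental_semivariogram(values, max_lag=None):
    """Return (lag, pairs, gamma) tuples for equally spaced 1D samples.

    gamma*(h) = sum((z[i] - z[i + h])**2) / (2 * N), where N is the
    number of sample pairs separated by h sample spacings.
    """
    n = len(values)
    if max_lag is None:
        max_lag = n - 1
    results = []
    for lag in range(1, max_lag + 1):
        # Squared differences of every pair separated by `lag` spacings.
        sq_diffs = [(values[i] - values[i + lag]) ** 2 for i in range(n - lag)]
        pairs = len(sq_diffs)
        results.append((lag, pairs, sum(sq_diffs) / (2 * pairs)))
    return results


# Assumed series (hypothetical): consistent with the sums in Table 6-1.
samples = [1, 1, 1, 3, 3, 4, 2, 2, 2, 1, 1, 2]

for lag, pairs, gamma in experimental_semivariogram(samples):
    print(f"lag {lag:2d}x  pairs {pairs:2d}  gamma* {gamma:.2f}")
```

    The number of pairs falls by one with every additional lag, which is why the estimates at the longest lags rest on very little support.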

    In addition to the lag, several other parameters are often used in calculating the experimental semivariogram: the search direction and plunge; horizontal and vertical bandwidths; and horizontal and vertical half-angles. These parameters are shown diagrammatically in Figure 6-5.

    The search direction refers to the azimuth direction (0° = North, 90° = East) of the principal search. The plunge (not shown) refers to the search direction relative to the horizontal ((-) below horizontal, (+) above horizontal) in the search direction. In application, points are not likely to lie exactly along the search direction; therefore, half-angles and bandwidths are used. The half-angle is the number of degrees, to either side of the search direction and plunge, that a point may lie from the point being evaluated before the sample is ignored. Note that a 90° half-angle is used for isotropic models (all points within a given lag interval will be used).

    If the scope of the search cannot be constrained sufficiently using half-angles, bandwidths may also be used. This technique is useful when data are widely spaced but come from distinctive layers. Use of the appropriate layer thickness or bandwidth allows use of much longer ranges than can be achieved if only small half-angles are used (Figure 6-6). Use of narrow bandwidths and large half-angles has the advantage that points within a layer can be used without including nearby points from adjacent layers that are not of interest. Points near layer boundaries will still affect results, and layers do need to be linear.
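    The way a search direction, half-angle, and bandwidth jointly accept or reject a sample pair can be sketched as follows. This is a simplified two-dimensional illustration under stated assumptions: the function name and argument names are hypothetical, the plunge and the vertical half-angle and bandwidth are ignored, and the direction is treated as axial (a pair and its mirror image are equivalent).

```python
import math


def pair_in_search_window(dx, dy, azimuth_deg, half_angle_deg, bandwidth=None):
    """Decide whether a pair separated by (dx, dy) falls in the search window.

    dx is the easting difference and dy the northing difference.
    The azimuth is measured clockwise from North (0 = North, 90 = East).
    """
    if dx == 0.0 and dy == 0.0:
        return True  # coincident points have no direction to test
    az = math.radians(azimuth_deg)
    ux, uy = math.sin(az), math.cos(az)   # unit vector along the search direction
    along = abs(dx * ux + dy * uy)        # separation measured along the direction
    across = abs(dx * uy - dy * ux)       # perpendicular offset from the direction line
    angle = math.degrees(math.atan2(across, along))  # deviation, 0 to 90 degrees
    if angle > half_angle_deg + 1e-9:
        return False                      # outside the half-angle
    if bandwidth is not None and across > bandwidth:
        return False                      # inside the half-angle but beyond the bandwidth
    return True


# A pair lying due East is accepted by an East-directed search with a 10 degree half-angle.
print(pair_in_search_window(10.0, 0.0, 90.0, 10.0))
# A 90 degree half-angle accepts every direction (the isotropic case).
print(pair_in_search_window(0.0, 10.0, 90.0, 90.0))
# A distant pair within the half-angle can still be rejected by a narrow bandwidth.
print(pair_in_search_window(100.0, 10.0, 90.0, 20.0, bandwidth=5.0))
```

    The last call shows why a bandwidth matters at long separations: the angular deviation is small, but the perpendicular offset is larger than the layer thickness being targeted.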

    Figure 6-5: Parameters used to calculate experimental semivariogram.

    The figure shows the experimental semivariogram and lag pairs for different search direction and half-angle combinations. Note: Although there is a great deal of variability in the different experimental semivariograms, there is very little variation at distances less than 1.5 km, where the variance is generally reached. For this data set, it would be appropriate to use an isotropic semivariogram model.

    Figure 6-6: To demonstrate half-angles, ...

    Figure 6-7 (panels a, b, and c): To demonstrate anisotropy, half-angles, and bandwidths, the indicator semivariograms (SH vs. SH-SS and SS) are calculated for the Yorkshire cross-section data (a). The isotropic γ*(h) values reach the variance at about 20m. If the search direction is horizontal, semi-parallel to bedding, and a 5° half-angle is used, the γ*(h) values reach the variance at about 200m. Using the same half-angle, if a 3m bandwidth (about 1/2 the SS/SS-SH lens thickness) is used, the early lags (< 40m) are identical, but the γ*(h) values reach the variance at about 500m. Use of a bandwidth here improves the identification of correlated structures.

    6.2.2: Semivariogram Equation Types

    There are a limited number of mathematical functions that can be used to model the experimental semivariogram, because the function must ensure that the kriging matrices will be positive definite (Journel and Huijbregts, 1978; Isaaks and Srivastava, 1989). A non-positive definite function will allow singular and nearly singular matrices that have no solution or are numerically unstable. Several functions are commonly used, and although this may appear to be limiting, experience has shown that they satisfy the needs of most projects and most natural systems. Selecting the appropriate model function is based on the shape of the experimental semivariogram (the short lags that have not intercepted the sill) and on prior experience with the functions and the type of data being evaluated.

    Common functions (Figure 6-7) are spherical, Gaussian, and exponential. Linear models and linear models with a sill are also used, though they are only semi-positive definite. In special cases, the hole-effect model is used; this model is useful when the samples exhibit a cyclic nature. Even with these models, instability problems may still occur; Gaussian models are typically unstable when the nugget is less than about 10% of the variance (Posa, 1989).

    Figure 6-7: Shape of various positive definite and semi-positive definite model equations. Notice that the linear model is not asymptotic to the data set variance.

    A misleading term in these functions is the range term. Often the range is assumed to reflect the physical distance at which samples cease to be correlated. This is true for the range term in the spherical model, but not for the exponential and Gaussian models. For the exponential model, the physical range is approximately three times the range term used in the exponential function. The exponential model approaches the sill asymptotically without ever reaching it, so the range is arbitrarily defined as the distance at which the function reaches 95% of the data set variance (Journel and Huijbregts, 1978). The range term used in the Gaussian model is similarly misleading: the physical range equals approximately sqrt(3) times the range term defined in the Gaussian model (Journel and Huijbregts, 1978).
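    The relationship between the range term and the physical range can be checked numerically. The sketch below assumes the commonly used forms of the three models (the equations themselves are not given in this text), omits the nugget for brevity, and uses hypothetical function names; `a` is the range term and `sill` is the variance contribution C.

```python
import math


def spherical(h, sill, a):
    """Spherical model: reaches the sill exactly at h = a."""
    if h >= a:
        return sill
    r = h / a
    return sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, sill, a):
    """Exponential model: approaches the sill asymptotically."""
    return sill * (1.0 - math.exp(-h / a))


def gaussian(h, sill, a):
    """Gaussian model: parabolic near the origin, asymptotic to the sill."""
    return sill * (1.0 - math.exp(-(h / a) ** 2))


a = 100.0  # range term (assumed units of distance)
print(spherical(a, 1.0, a))                  # the full sill at h = a
print(exponential(a, 1.0, a))                # well below the sill at h = a
print(exponential(3.0 * a, 1.0, a))          # about 95% of the sill at h = 3a
print(gaussian(math.sqrt(3.0) * a, 1.0, a))  # about 95% of the sill at h = sqrt(3) * a
```

    Under these assumed forms, both the exponential and Gaussian models evaluate to 1 - exp(-3) of the sill at their physical ranges, which is where the 95% figure comes from.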

    6.2.3: Sample Size

    In the simple example shown in Figures 6-3 and 6-4, a well-behaved experimental semivariogram was generated (it has a steadily rising limb up to the data set variance). This, however, was a contrived example, and small samples such as this rarely yield such well-behaved semivariograms. From a statistical viewpoint, estimating a population variance based on twelve data points would be highly suspect (REF). Typically, data sets with fewer than 50 data points should not be expected to yield sound statistical analyses (REF). γ*(h) values can also be ranked in importance by the number of sample pairs available for each lag: values for lags based on more pairs are thought to be more accurate than those based on a few pairs (REF). Having more data points does not guarantee a well-behaved semivariogram, but the results are likely to be statistically repeatable, and it is appropriate to continue with the semivariogram analysis.

    If the data set has fewer than 50 data points, it may be appropriate to 1) stop and use non-geostatistical techniques to evaluate the data, 2) stop and collect more data, or 3) continue, but recognize that the results are suspect. If the semivariogram analysis is to continue, it may be appropriate to evaluate the uncertainty associated with the experimental semivariogram models using a jackknifing technique (Wingle and Poeter, 1993). One reason for continuing the semivariogram analysis may be to guide future data collection. Even though the data are limited, it may be possible to identify site anisotropy, identify areas of large uncertainty (kriging estimation error), and determine whether the range of the data has been captured.

    It is possible that, even at the shortest lag spacing, the γ*(h) values equal or exceed the data set variance. A model of this type is said to be pure nugget. This effect implies that neighboring data at the spacing interval being collected offer no useful information about unsampled data locations, other than to identify the mean value of the parameter in the area. This behavior suggests that samples need to be collected at smaller spacings. With a pure nugget semivariogram, kriging methods yield no useful information, other than identifying the mean and indicating that extremely little is known about the site.

    6.3: Identifying the Appropriate Semivariogram Calculation Method

    Semivariogram analysis refers to a specific function for evaluating how sample values vary spatially, but when used more generally, it refers to a group of methods that use similar techniques for defining the spatial variation of the data. Methods for describing spatial variation include semivariogram, covariogram, indicator semivariogram, log semivariogram, and soft indicator semivariogram analysis:

    - Semivariogram analysis examines the difference between samples based on a single variable. Semivariograms are used for simple and ordinary kriging.
    - Covariogram (covariance) analysis examines the difference between samples where each sample is associated with more than one variable. Covariograms are used for cokriging.
    - Indicator semivariograms are useful for evaluating discrete data (e.g. sand, gravel, and clay), or continuous data which can be divided into discrete groups, i.e. a sample value is less than, or greater than, a cut-off or threshold value. Samples with values less than or equal to the cut-off value are assigned a value of 1. Samples with values greater than the cut-off value are assigned a value of 0 (there are minor variations on this theme if classes instead of thresholds are used (Wingle, 1997)). Once the data have been reclassified into 0s and 1s, traditional semivariogram techniques are applied. For a particular data set, there may be several thresholds (cut-offs) or classes, and an indicator semivariogram will be needed for each. If poor quality indicator semivariograms are being generated, it has been argued that using the median indicator semivariogram (values < median data set value = 1, values > median = 0) will still yield reasonable results (Isaaks and Srivastava, 1989). Indicator semivariograms are used with indicator and Bayesian kriging.
    - Log semivariograms are used in the same manner as semivariograms, except that the data are log transformed before the differences are calculated. This technique can be useful for data which exhibit a log-normal, rather than a normal, distribution. Log semivariograms are used for disjunctive kriging.
    - Soft indicator semivariograms are a combination of indicator and covariance techniques, and are used when there is a combination of precise (hard) data and imprecise (soft) data available. With this technique, only one soft indicator semivariogram needs to be calculated for each threshold or class. Evaluating soft data imprecision requires considerable effort. Soft indicator semivariograms are used in indicator kriging where hard and soft data are available.

    There are other experimental measures of spatial variation, but the methods above are the ones commonly used, and others are not discussed here.
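    The indicator reclassification described above is simple to express in code. The following is a minimal sketch with hypothetical sample values and cut-offs; each resulting list of 0s and 1s would then be passed to an ordinary experimental semivariogram calculation, one per threshold.

```python
def indicator_transform(values, cutoff):
    """Return 1 for values <= cutoff and 0 for values > cutoff."""
    return [1 if v <= cutoff else 0 for v in values]


# Hypothetical sample values and thresholds, for illustration only.
concentrations = [0.2, 1.5, 3.1, 0.9, 2.4, 0.4]
thresholds = [0.5, 1.0, 2.5]

# One indicator data set (and later one indicator semivariogram) per threshold.
indicators = {c: indicator_transform(concentrations, c) for c in thresholds}

for cutoff, coded in indicators.items():
    print(cutoff, coded)
```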

    6.4: Calculating the Isotropic Experimental Semivariogram

    When initiating experimental semivariogram analysis, it is good practice to start by testing isotropic conditions. Isotropic analysis assumes data variability is independent of direction. Site conditions may not be isotropic, but testing isotropic conditions provides insight into general site conditions (approximate range, sill, nugget), and facilitates selection of a reasonable lag spacing. Also, if an acceptable isotropic experimental semivariogram cannot be calculated, it is unlikely that acceptable anisotropic semivariograms can be calculated (REF), although there are exceptions.

    Selecting an appropriate lag is important in experimental semivariogram analysis. For regularly spaced or gridded data, the lag should equal the data or grid spacing, or a multiple thereof. Shorter lags will be of little use because the shortest estimate possible for γ*(h) will be at the distance equal to the shortest data spacing. If the selected lag is too large, the correlation structures may be missed, the range may be lost, or a pure nugget effect may be found. If the lag is too small, each γ*(h) lag estimate may be defined by so few pairs that it is statistically meaningless.

    When developing experimental semivariograms, try several different lags when calculating the isotropic semivariogram:

    - If an experimental semivariogram is reasonably well behaved and a range is identifiable, continue with the next step of the analysis.
    - If the initial lags are at or above the variance, the data spacing is too large (you have a pure nugget effect). Data should be resampled at shorter intervals in this case.
    - If the γ*(h) values at small lags do not follow a trend, or fluctuate erratically, try:
        - Different lag spacings. Lags near the average spacing between data, or multiples thereof, are usually the most useful.
        - Anisotropic experimental semivariograms along directions of smallest spacing. Vertical spacings, for example, are usually much shorter than horizontal spacings. The results may yield insights into appropriate parameters in other directions.
        - Collecting more data.
    - Evaluate the relative number of pairs used to calculate each lag γ*(h) estimate. If the number of pairs alternates between a few and many pairs, this indicates a poorly selected lag interval.
    - If the experimental semivariogram rises indefinitely (Figure 6-8) after reaching the variance, or if the curve of the lags is concave upward until the variance is reached, consider the possibility that there is a trend in the data. If there is a trend, remove it using a trend surface analysis algorithm (Chapter 5). The semivariogram analysis on the residuals should produce a more conventional experimental semivariogram (Figure 6-8). For the rest of the geostatistical analysis (kriging, simulation), the residual data need to be used. The trend is added back into the final estimation results (after the kriging or simulation has been completed).
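    The remove-then-restore workflow for a trend can be sketched as follows. This is a deliberately simplified stand-in for the trend surface analysis of Chapter 5: it fits a first-order (straight-line) trend along a single coordinate by ordinary least squares, whereas a real site would normally need a surface in two coordinates and possibly a higher order. The function name and the data values are hypothetical.

```python
def fit_linear_trend(x, z):
    """Ordinary least-squares fit of z = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    b0 = mean_z - b1 * mean_x
    return b0, b1


# Hypothetical transect: position (ft) and a measured surface elevation.
x = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0]
z = [10.2, 12.1, 13.8, 16.3, 17.9, 20.1]

b0, b1 = fit_linear_trend(x, z)
trend = [b0 + b1 * xi for xi in x]

# Step 1: remove the trend; the semivariogram analysis and kriging use the residuals.
residuals = [zi - ti for zi, ti in zip(z, trend)]

# Step 2: after kriging or simulation, add the trend back to the estimates.
# Here the residuals themselves stand in for the estimated values.
restored = [ri + ti for ri, ti in zip(residuals, trend)]

print(residuals)
```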

    Figure 6-8: These experimental semivariograms are based on the same data (RMA data (Appendix A.2): lag = 500 ft., search direction = 135°, half-angle = 20°), except that the second-order trend was removed from the data on the right. The bedrock surface dipped to the Northwest. The continuously increasing γ*(h) values and the concave upward nature of the left semivariogram, even above the variance, indicate that a trend is present in the data.

    6.5: Calculating Anisotropic Experimental Semivariogram

    Once the parameters have been defined for calculating a good isotropic semivariogram, it is time to evaluate whether anisotropic conditions can be identified at the site. If the isotropic model is useful, it may be possible to identify spatial anisotropy (generally, if the isotropic model is poor, the anisotropic models will be worse). Judgement on how many anisotropic semivariograms to calculate depends on several factors:

    - If the data set is large, computation time may limit the number of anisotropic semivariograms that can be calculated. This is a practical constraint.
    - If data are limited, the quality of the anisotropic models is apt to deteriorate (too few pairs per lag).
    - If the isotropic model is very good, it will probably be possible to get more good anisotropic models using small half-angles than if the isotropic model is relatively poor.
    - If the model is three-dimensional, aligning the X and Y search directions to be parallel to the stratigraphy, and the Z search direction perpendicular to it, may be adequate. Care needs to be taken here: internal structures such as cross-bedding could be more influential on the parameter of concern than the stratigraphic boundary.
    - If a wide half-angle is used, probably not as many directions need to be calculated. Enough search directions need to be used, however, so that no samples are missed. For example, if a 10° half-angle is used, every search direction covers a 20° arc; to evaluate all directions (180° in 2D), a minimum of nine search directions is needed (0°, 20°, 40°, 60°, 80°, 100°, 120°, 140°, 160°; note that 180° is the mirror of 0°, 200° mirrors 20°, etc.).
    - Sometimes visually inspecting the data can lead to identification of initial estimates for the principal and minor search directions. This interpretation should be confirmed, but it can reduce computational effort.
    - The smaller the half-angle, the more pronounced the anisotropy. Large half-angles effectively average or smooth the anisotropy (Figure 6-6).
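    The count of search directions needed for a given half-angle follows directly from the arc each direction covers. A minimal sketch, assuming whole-degree half-angles and a 2D search (the function name is hypothetical):

```python
def search_directions(half_angle_deg, total_arc_deg=180):
    """List the azimuths needed so that adjacent search arcs leave no gap.

    Each direction covers an arc of 2 * half_angle_deg, and directions
    beyond 180 degrees mirror those already listed.
    """
    step = 2 * half_angle_deg
    return list(range(0, total_arc_deg, step))


directions = search_directions(10)
print(len(directions), directions)
```

    Halving the half-angle doubles the number of directions, which is the computational trade-off against the sharper view of anisotropy that small half-angles provide.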

    Interactive Example 6.1: Calculate isotropic and anisotropic experimental semivariogram.

    6.6: Fitting a Model Semivariogram to an Individual Experimental Semivariogram

    Once the experimental semivariogram has been calculated, a model semivariogram must be fit to it before it can be used by a kriging or simulation program. A model semivariogram is a special function which mathematically describes the spatial variation shown by the experimental semivariogram. Defining the model semivariogram is often accomplished in two stages. First, models are fit to individual experimental semivariograms with different search directions. These are then merged to define site anisotropy. This final model, for 2D and 3D data sets, is usually defined with a single semivariogram model describing the major axis, and range anisotropy factors to describe the minor axes. Only recent advances allow the individual axes to be defined separately (Wingle, 1997). This section focuses strictly on fitting a single model semivariogram to an individual experimental semivariogram.

    Fitting model semivariograms to experimental semivariograms is a qualitative process. The goal is straightforward: match the mathematical equation as closely as possible with the relevant lag points in the experimental semivariogram. There are subtle problems, however. A series of guidelines and examples to help geostatisticians model the simple, and not so simple, experimental semivariograms are presented in the following sections.

    6.6.1: Rules of Thumb

    A number of considerations need to be kept in mind during the model fitting process. Some rules of thumb include:

    • The model semivariogram should closely fit all lag set points. If all points were equally important (not a good assumption), the model that minimizes the sum of squared errors would be the best solution.
    • Not all lag points are equally accurate. Lag set points defined by more sample pairs should be honored more closely than lag set points defined by fewer pairs. If one point is based on three sample pairs and another on 100, the point with more pairs should be honored more strongly.
    • If the γ*(h) values, or the number of pairs, for consecutive lags oscillate (γ*(h)_n > γ*(h)_(n+1), γ*(h)_n < γ*(h)_(n+2)), a poor lag spacing may have been selected (Figure 6-9). Often a minor change in the lag spacing will reduce or remove the oscillation (γ*(h)_n < γ*(h)_(n+1) < γ*(h)_(n+2)).


    Figure 6-9: Experimental semivariogram of regularly gridded Yorkshire data. The grid spacing is 2m, but a lag of 7m was selected. Because the lag is not an even multiple of the grid spacing, the bouncing of the γ*(h) estimates, and of the number of pairs per lag, is expected (the line connecting the γ*(h) estimates emphasizes the fluctuation; it is not a model semivariogram). A more appropriate lag spacing will remove this fluctuation (Figure 6-6). Notice that neighboring estimates tend to have larger γ*(h) values if they are based on fewer pairs.

    • Matching γ*(h) for lags less than the range is more important than matching those beyond the range. Points beyond the range are nearly irrelevant. The purpose of semivariogram analysis is to evaluate the distance over which neighboring points influence one another. Samples separated by distances larger than the range are not correlated (an exception occurs in areas with repeating structures, where hole-effect models are used).
    • If the first lag point exceeding the variance is represented by a reasonable number of sample pairs, it should be used to define the maximum limit of the model range. If it is based on only a few pairs, and other lag points suggest a longer range, it is reasonable to use the longer range.
    • Boundary effects can adversely affect the usefulness of the γ*(h) values at larger lags (Figure 6-4). This is usually due to a lack of sample pairs, but in larger data sets unusual behavior may occur because the long lags are based on data points that are not necessarily representative of the rest of the site.
    • If the first lag point is near or greater than the data set variance, the model should be defined as pure nugget. This result suggests that the minimum lag spacing is too large to identify the short range spatial correlation of the data. If possible, recalculate the semivariogram with a shorter lag. If the lag spacing already reflects the minimum sample spacing, more data, at closer spacings, will have to be collected before useful semivariograms can be generated.
    • Non-zero nuggets are often appropriate, because the parameter of interest has an effective random component, or because there is a shorter range structure that cannot be identified with the current sample spacing.
    • The lag interval should approximately equal the average spacing between neighboring data. This is a practical rule for gridded data, but may not be useful for scattered data. Shorter lags may also be necessary to identify short range structures, but the lag should be related to the sample spacing if possible.
    • If compromises are required, it is more important to honor the short lag portion of the experimental semivariogram than the later portion, because only the nearest samples are used in kriging estimation. If data are abundant within the maximum range, longer ranges may never be sampled, and the long range model fit may be irrelevant. Samples close to the location being estimated will also be weighted more heavily; therefore, errors at short lag distances will have a greater negative influence on the resulting estimate than errors at larger distances.

    Again, these are only rules of thumb and can, at times, be contradictory. Semivariogram model fitting involves balancing these contradictory considerations.
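
    Two of the rules above (honor lags with more pairs, and ignore lags beyond the range) can be combined into a single misfit score. The sketch below is one possible formulation, not a prescribed method: it assumes a spherical model in its standard form, weights each squared error by the number of pairs, and uses the hypothetical names `spherical` and `weighted_sse`.

```python
def spherical(h, nugget, c, a):
    """Spherical model with nugget, structure variance c, and range a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + c  # the sill is reached at the range
    r = h / a
    return nugget + c * (1.5 * r - 0.5 * r ** 3)


def weighted_sse(lags, gammas, pairs, model, max_lag):
    """Sum of squared errors weighted by pair count, for lags up to max_lag."""
    total = 0.0
    for h, g, n in zip(lags, gammas, pairs):
        if h <= max_lag:  # lags beyond the range are ignored
            total += n * (g - model(h)) ** 2
    return total


# Illustrative values only: three lags, the last one beyond the model range.
model = lambda h: spherical(h, nugget=0.0, c=10.0, a=30.0)
score = weighted_sse([10.0, 30.0, 40.0], [model(10.0), 11.0, 20.0],
                     [50, 10, 2], model, max_lag=30.0)
```

    A lower score indicates a closer fit to the well-supported lags; it does not replace visual inspection of the short lag portion.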

    6.6.2: Nested Model Semivariogram Structures

    Often, it is not possible to accurately model an experimental semivariogram using a single C component and range. Multiple correlation structures can contribute different amounts of variability over different distances and in different directions. These structures are reflected in the experimental semivariogram and can be modeled using a series of nested model structures. To accomplish this, the component C is divided into multiple components, and each Ci describes the data variability over a different range. In fact, the basic nugget model is itself a two-nest structure: the nugget (C0) describes the variability at zero distance, and C (C1) describes the variability over the remaining range. The Ci components are additive, and each can be based on a different semivariogram equation type (e.g. spherical, Gaussian, exponential). In summary:

    (EQ. 6-3)

    Each of the structures represents a different influence on the data and may have a different orientation (Figure 6-10). In practice, it is generally not possible, or justifiable, to identify more than two or three nests (plus the nugget) in an experimental semivariogram.
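
    Because the nugget and the Ci components are additive, a nested model can be evaluated by summing the individual structures. The sketch below assumes each structure is given as (Ci, range, shape function), with shape functions normalized to rise from 0 to 1; the exponential form uses the practical-range convention (factor of 3), which is an assumption and differs between packages. All names and numerical values are hypothetical.

```python
import math


def spherical(h, a):
    """Normalized spherical shape: 0 at h = 0, 1 at and beyond the range a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3


def exponential(h, a):
    """Normalized exponential shape, practical-range convention (assumed)."""
    return 1.0 - math.exp(-3.0 * h / a)


def nested_model(h, nugget, structures):
    """Sum the nugget and every (c, range, shape) structure at lag h."""
    if h <= 0:
        return 0.0  # by definition the semivariogram is zero at zero distance
    return nugget + sum(c * shape(h, a) for c, a, shape in structures)


# Illustrative two-nest model: a short structure (20 m) and a long one (500 m).
structures = [(3.0, 20.0, spherical), (6.0, 500.0, spherical)]
sill = nested_model(600.0, 1.0, structures)  # nugget + C1 + C2
```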

    Figure 6-10: A multi-nested structure (left) is defined by a series of individual structures summed together (right). The nugget is nearly zero, but the best fit to the short lags suggests there is a small, random component at zero distance. The break in slope of the experimental semivariogram suggests a change in the processes controlling the data distribution at about 20m. The remainder of the variance is more difficult to model. It has a slight sinusoidal pattern, suggesting that a hole effect may be present. The γ*(h) values reach the variance at about 500m. This suggests a maximum range of about 500m, but because the maximum range of the data set is about 600m, this could be due to boundary effects. In this case, the trend of the earlier portion of the structure was honored.


    6.6.3: Fitting the Model Semivariogram

    Fitting a model semivariogram to an experimental semivariogram is straightforward, but there are a number of items to consider. Some rules of thumb for fitting semivariogram models were given in Section 6.6.1. Basically, the model semivariogram should come as close as possible to all points in the experimental semivariogram at lags less than the range, and then level out (if appropriate for the selected model function) once the range is reached. For well behaved experimental semivariograms, this process is straightforward. For less well behaved experimental semivariograms, the task involves a series of compromises. It is usually easiest to estimate the nugget and maximum range, then model and plot the resulting function (C equals the variance minus the nugget). Adjustments can then be made to this initial model. Basic steps for fitting the model semivariogram are discussed in the following sections and displayed in Figure 6-11.

    Figure 6-11: This animation shows the basic steps involved in fitting a model semivariogram to an experimental semivariogram.


    6.6.3.2: Fitting the Maximum Range

    The maximum range defines the maximum distance over which neighboring samples improve estimates, compared with using the sample mean. Ideally, the fitted maximum range will be less than the lag distance where the first γ*(h) estimate exceeds the data set variance, and more than the previous lag. Semivariogram models, however, are often not ideal, and the trend of the γ*(h) values at short lags may suggest a more natural intercept of the model semivariogram with the variance (Figure 6-11b).

    Figure 6-11b: A good initial estimate of the range often is where the first lag value crosses or equals the variance. Determining the precise range requires fine tuning the sill(C), nugget, and range parameters.
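
    The initial range estimate described above can be bracketed automatically. This is a minimal sketch under the assumption that the lags are sorted in increasing order; the function name `bracket_range` is hypothetical, and the result is only a starting point for manual fine tuning.

```python
def bracket_range(lags, gammas, variance):
    """Return (previous_lag, crossing_lag) around the first lag whose
    estimate reaches the data set variance, or None if none does."""
    previous = 0.0
    for h, g in zip(lags, gammas):
        if g >= variance:
            return previous, h
        previous = h
    return None  # the variance is never reached; extend the search distance


# Illustrative values: the variance (10) is first reached at the 40 m lag.
bracket = bracket_range([10, 20, 30, 40], [2, 5, 9, 11], 10)
```

    If the very first lag already reaches the variance, the bracket starts at zero, which corresponds to the pure nugget situation discussed in Section 6.6.4.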

    6.6.3.3: Fitting the Variance (Ci)

    Once the nugget and maximum range have been approximately defined, the Ci terms, the rangei terms, and the model equation types can be defined (Figure 6-11c).


    Figure 6-11c: For a single nested model, C = variance - nugget. Determining the precise sill requires fine tuning the sill(C), nugget, and range parameters.

    6.6.3.3.1: Single Structure

    If modeling the semivariogram requires only a single structure (not including the nugget), C is quickly calculated; it is the difference between the data set variance and the nugget (C = variance - nugget). By examining the shape of the experimental semivariogram, an equation type can be selected that best fits the rising limb of the experimental semivariogram. From this point, fine tuning of the nugget, C, and range will yield a satisfactory fit. If the model still does not fit well, here are several things to consider:

    • The model will not pass exactly through all the lag estimates. The model equation should minimize the squared deviations (some points should fall above and some below the model curve).
    • Lags based on a relatively large number of sample pairs should be more closely honored.
    • Model fits with smaller root mean square errors (RMSE, the square root of the sum of squared weighted deviates, for lags less than the range) are generally better fits (Figure 6-12).
    • Another model equation type may be more appropriate (Figures 6-11b, 6-11c, and 6-12).
    • A multi-nested structure may be more appropriate. An obvious break in slope at a lag less than the range suggests that multiple nests may be appropriate (Figure 6-10).


    Figure 6-12 a,b: At first, the spherical model (a) appears to fit the experimental semivariogram well. However, use of a Gaussian model (b) provides a better fit and identifies that there is a non-negligible random component (nugget) to the data. (RMA residual bedrock data: lag = 400 ft., direction = 90°, half-angle = 30°.) Using a least-squares method to rate the models (ignoring the number of pairs per lag), the MSE for lags less than 6600 ft. (the model range) is 446 for the spherical model (a) and 89.3 for the Gaussian model (b).
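
    A comparison of this kind can be sketched by scoring each candidate equation type with the mean squared error over lags less than the model range. The code below is an illustration only: it uses the standard spherical form and a Gaussian form with the practical-range convention (an assumption that varies between packages), synthetic lag values rather than the RMA data, and hypothetical function names.

```python
import math


def spherical_model(h, nugget, c, a):
    """Spherical model; c is the variance minus the nugget."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + c
    r = h / a
    return nugget + c * (1.5 * r - 0.5 * r ** 3)


def gaussian_model(h, nugget, c, a):
    """Gaussian model, practical-range convention (assumed)."""
    if h <= 0:
        return 0.0
    return nugget + c * (1.0 - math.exp(-3.0 * (h / a) ** 2))


def mse_below_range(lags, gammas, model, model_range, pairs=None):
    """Mean squared error for lags below the range; optionally pair-weighted."""
    if pairs is None:
        pairs = [1] * len(lags)  # unweighted, as in a plain least-squares rating
    num = den = 0.0
    for h, g, n in zip(lags, gammas, pairs):
        if h < model_range:
            num += n * (g - model(h)) ** 2
            den += n
    if den == 0:
        raise ValueError("no lags fall below the model range")
    return num / den


# Synthetic example: variance 100, nugget 10, so C = 100 - 10 = 90.
variance, nugget, a = 100.0, 10.0, 50.0
c = variance - nugget
lags = [5.0, 15.0, 25.0, 35.0, 45.0]
gammas = [gaussian_model(h, nugget, c, a) for h in lags]
mse_gauss = mse_below_range(lags, gammas, lambda h: gaussian_model(h, nugget, c, a), a)
mse_sph = mse_below_range(lags, gammas, lambda h: spherical_model(h, nugget, c, a), a)
```

    Here the synthetic lag values were generated from the Gaussian form, so it scores a lower MSE than the spherical form by construction.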

    6.6.3.3.2: Multiple Structures / Nests

    If multiple correlation structures comprise the experimental semivariogram, modeling is more difficult. Each structure has its own range and C component, and can be defined with a different equation type. Each structure is defined independently, and all are summed together for the final model semivariogram (Figure 6-10). Two conventions apply to developing nested semivariogram models: the physical range of a nest should be less than the physical range of the next nest, and the sum of the nugget and the Ci terms should equal the data set variance.

    For nested structures, estimating the intermediate physical ranges is reasonably easy; they will coincide roughly with the break(s) in slope of the experimental semivariogram (Figure 6-10). It is more difficult to estimate the Ci terms, because the nested structures are cumulative, but they are constrained by the fact that the nugget plus the Ci terms must equal the data set variance. Once initial estimates are made, a trial-and-error approach is used to match the experimental semivariogram. The nugget, Ci, and range terms are adjusted one at a time. Some software packages include automated fitting algorithms, which may yield good results. Even if the results aren't satisfactory, they often provide a good initial estimate.
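
    The two conventions for nested models can be checked mechanically during trial-and-error fitting. The following sketch assumes the structures are listed from shortest to longest as (Ci, range) pairs; the function name `check_nested` and the tolerance are hypothetical choices.

```python
def check_nested(nugget, structures, variance, tol=1e-6):
    """Return a list of convention violations for a nested model.

    structures: list of (c_i, range_i) tuples, shortest nest first.
    """
    problems = []
    ranges = [a for _, a in structures]
    # Convention 1: each nest's range is less than the next nest's range.
    for shorter, longer in zip(ranges, ranges[1:]):
        if shorter >= longer:
            problems.append(f"range {shorter} is not less than the next range {longer}")
    # Convention 2: nugget plus all Ci terms equals the data set variance.
    total = nugget + sum(c for c, _ in structures)
    if abs(total - variance) > tol:
        problems.append(f"nugget + sum(Ci) = {total}, but the variance is {variance}")
    return problems


# Illustrative model: nugget 1, nests of 3 (20 m) and 6 (500 m), variance 10.
issues = check_nested(1.0, [(3.0, 20.0), (6.0, 500.0)], 10.0)
```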

    6.6.4: Clues That the Data Set Requires Re-Evaluation

    When an experimental semivariogram is calculated, several things should cause concern and lead to re-evaluation. One of the most common problems is that the model exhibits a pure nugget effect, the short lag values all exceed the data set variance (Figure 6-13), or successive lag values are so unrelated that it is not possible to identify a model. Another problem arises when the γ*(h) estimates at each lag are based on a statistically insignificant number of pairs. The final problem discussed here is monotonically increasing γ*(h) lag estimates, at least in some directions; this suggests there is a trend in the data (Figure 6-8).

    Figure 6-13: Sometimes the minimum lag spacing is too large to capture the data variance. When this occurs, the first lag γ*(h) estimate will be very near or above the data variance. In such cases, the experimental semivariogram is said to be pure nugget. If possible, the lag spacing should be reduced. If it cannot be reduced because it is already at the minimum sample spacing, additional data need to be collected, or non-geostatistical techniques should be used.

    If the experimental semivariogram exhibits a pure nugget effect, the short lag estimates are greater than or equal to the variance, or consecutive lag estimates bear little relation to each other, there are a few things that can be tried. The likely cause of the problem is that too few samples were collected, or the sample spacing is too large. It is possible that using a different lag spacing will improve the semivariogram. Sometimes increasing the half-angles (and bandwidth) will help, because the number of pairs will increase for each lag. At times the opposite, decreasing the half-angle (and bandwidth), may help, because the materials are similar along one orientation but random in other directions (Figure 6-6: given a larger lag (Figure 6-13), the isotropic model is pure nugget, but along bedding the range is about 500m). This is contrary to the general rule that, if a good isotropic semivariogram cannot be developed, it is unlikely that good anisotropic semivariograms can be generated.

    If the lag estimates are based on too few pairs, there are three options: 1) lengthen the lag, 2) increase the half-angles and bandwidths, or 3) collect more data.

    Sometimes visual inspection of the data indicates the presence of long range structures, but when the semivariograms are calculated the long ranges are not confirmed. In data sets where thin parallel units are common (e.g. cross-sections of stratigraphic units), use of a larger half-angle, in conjunction with a bandwidth equal to the approximate thickness of the average unit, will produce good results (Figure 6-6).

    When evaluating a full suite of experimental semivariograms, some may look traditional, while others have γ*(h) lag estimates that continuously increase with larger lags. This suggests that there is a trend in the data. To correct this, the trend must be removed from the data (Figure 6-8). The semivariograms are recalculated based on the residuals, the residuals are kriged, and the trend is added back into the kriged results.

    If indicator semivariograms are being generated, it is possible that redefining the thresholds will improve results; however, the regrouping must be physically meaningful. It has also been suggested that using a median threshold semivariogram model for all the thresholds, or at least for the poorly behaved thresholds, is appropriate (Isaaks and Srivastava, 1989).
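
    The trend-removal workflow described above (remove the trend, analyze and krige the residuals, add the trend back) can be outlined as follows. This sketch assumes the simplest possible case, a linear trend along a single coordinate fit by ordinary least squares; real trends are often planar or higher order, and the kriging step itself is not shown. All function names are hypothetical.

```python
def fit_linear_trend(x, z):
    """Least-squares fit of z = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


def detrend(x, z):
    """Return the residuals and the fitted (intercept, slope) trend."""
    intercept, slope = fit_linear_trend(x, z)
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    return residuals, (intercept, slope)


def restore_trend(x_est, kriged_residuals, trend):
    """Add the trend back to residual estimates made at locations x_est."""
    intercept, slope = trend
    return [r + intercept + slope * xi for xi, r in zip(x_est, kriged_residuals)]


# Illustrative data with a pure trend z = 2 + 3x, so the residuals vanish.
residuals, trend = detrend([0.0, 1.0, 2.0, 3.0], [2.0, 5.0, 8.0, 11.0])
```

    The experimental semivariogram would then be recalculated from `residuals` rather than from the original values.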

    6.7: Anisotropic Semivariogram Modeling

    Defining the model semivariogram in two or three dimensions requires additional work and usually some compromise. The key steps are defining the principal experimental semivariograms and defining the model semivariogram anisotropy.

    6.7.1: Selecting the Principal Experimental Semivariograms

    When working strictly with isotropic semivariograms, go to the next section. If the data suggest the presence of anisotropy, this section provides guidance for defining anisotropic models.

    When calculating anisotropic experimental semivariograms, note which directions have longer and shorter ranges. Also view the sample data (posted in 2D or 3D space) and qualitatively evaluate whether there are any obvious preferred orientations of continuity. If there are obvious visual structures, they generally will be identified by the experimental semivariograms. The longest range model is parallel to the orientation of the structure, the shortest range is perpendicular to the structure, and intermediate ranges occur between these angles. For the ideal case (Figure 6-14), the ranges will correspond to an ellipse (2D) or ellipsoid (3D). If the structure is apparent when viewing the data, it should be possible to get a good alignment of the major axis without having to perform many experimental semivariogram calculations.

    For 2D data sets, once the major axis of anisotropy is defined, the minor axis direction is also defined: it is 90° from the major axis. The minor axis of the experimental semivariogram is usually based on fewer lag sets and is often of poorer quality (due to the shorter range, it is more difficult to model accurately).

    Three-dimensional models are more complex, but are conceptually similar. It is important to identify the orientation of the minor axis accurately, because it and the major axis define the orientation of the third, mid-axis. The mid-axis is perpendicular to both the major and minor axes. Defining the orientation of the different axes is complex and can vary with the kriging package that is used. A common method is shown in Figure 6-15 (Deutsch and Journel, 1992).


    Figure 6-14: Ideal 2D range ellipse generated from anisotropic semivariogram models. Note that the modeled ranges do not fit the ideal ellipse exactly. This could be because the principal axis was not specified correctly (maybe 47° or 48° would have improved the fit). Another possibility is that another structure is influencing the data. While the Gaussian model fits the shorter lags well, for some of the longer range models a multi-nested model may be a better choice.


    Figure 6-15: These diagrams describe the process used to define the search angles and anisotropy factors for semivariogram models.

    6.7.2: Defining Model Semivariogram Anisotropy

    Once the principal experimental semivariograms have been identified, they can be modeled, and the model anisotropy can be mathematically defined:

    Step 1): Identify the principal direction of anisotropy from the experimental semivariograms as the direction with the longest range (Figures 6-14 and 6-16). This is the major-axis of anisotropy. Identify the direction, perpendicular to the principal direction, with the shortest range (Figures 6-14 and 6-16). This is the minor-axis of anisotropy. If the model is three-dimensional, the mid-axis of anisotropy will be perpendicular to the major and minor axes.


    Figure 6-16 a,b: These models define the primary semivariogram models for a 2D data set. If anisotropy factors are used (a), the nugget, Ci, and nested model structures must be identical; only the range terms may vary. If directional semivariograms are used (b), the axes can be modeled independently, except for the nugget and the total data set variance. Data are residual bedrock elevations from the RMA; major axis direction = 0°, minor = 90°, and half-angle = 30°.

    Step 2): Fit the best model semivariogram to the major-, minor-, and mid- (if 3D) axes of anisotropy. The nugget for each model must be the same. Depending on the kriging algorithm used, these model semivariograms may be used directly (Wingle, 1997; Wingle et al., 1995), or the mid- and minor-axes may need to be described in terms of anisotropy factors of the major axis.

    If anisotropy factors are used:

    • One model (usually the major-axis model) must be selected as the key model. This is usually the most important model for describing spatial variability and is likely the most accurate (but this is not always the case). The parameters of this model are used to define the nugget, C, range, number of nests, and model types for all three axes. It is important to note that the perpendicular models will have the same nugget, the same number of structures, the same model equations, and the same Ci terms as the key model; only the range will vary.
    • Once the key model is defined, the remaining axes should be fit to this model by adjusting the ranges. This is very restrictive, and it is likely that the fit will not be very satisfactory. If the results are very poor:
        - Consider adjusting the key model. Compromising on all three models may help.
        - Adding structures may improve the minor axis fit. This adds a level of complexity and can make the model fitting difficult. It also increases the kriging processing time slightly (each nest requires additional calculations).
    • Once satisfactory models are defined for all three axes, the anisotropy factors can be determined. By convention, they are based on the principal axis of anisotropy. To define the anisotropy ratios, the range of the principal axis and the range of the other axis are divided for each structure (which value is taken as the numerator or denominator depends on the kriging package). All other semivariogram parameters are identical.

    Note that if the model semivariograms suggest that the data set is isotropic, the anisotropy factors will be 1.0.
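
    The anisotropy ratio for each nested structure is a simple division of ranges. Because the choice of numerator depends on the kriging package, the sketch below exposes it as a flag; the function name `anisotropy_factors`, the flag `other_over_major`, and the range values are hypothetical.

```python
def anisotropy_factors(major_ranges, other_ranges, other_over_major=True):
    """Per-structure anisotropy factors between the major axis and another axis.

    Both lists hold one range per nested structure, in the same order.
    """
    if len(major_ranges) != len(other_ranges):
        raise ValueError("both axes must have the same number of structures")
    if other_over_major:
        return [o / m for m, o in zip(major_ranges, other_ranges)]
    return [m / o for m, o in zip(major_ranges, other_ranges)]


# Illustrative two-nest model: major ranges 20 and 500, minor ranges 10 and 125.
factors = anisotropy_factors([20.0, 500.0], [10.0, 125.0])
```

    Note that the two structures yield different factors here, which is permitted: the range ratios need not be the same for every nest.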


    6.7.3: Rules of Thumb for Developing Anisotropic Semivariogram Models

    • The nugget for all search directions must be the same. Since the nugget is defined as the spatial variation at zero distance, direction is irrelevant. As a result, a well defined nugget from a good experimental semivariogram in one search direction can be used to define the nugget for poorly defined semivariograms with different orientations.
    • The sill theoretically does not have to be the same for all directions, but most kriging algorithms require that it be the same (Wingle, 1997). In some cases, this condition can be easily satisfied by adding an extra nest with an extremely large range to the semivariogram model. This assigns the balance of the variance to a fictitious nested structure. Because the last structure's range is so long, it has relatively little impact on the kriged estimates over the range of interest.
    • The ratios of range1 / range2, range2 / range3, etc. do not have to be the same for the different axes. Different structures can have different levels of influence in different directions.
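
    The device of adding a fictitious long-range nest to equalize sills can be sketched as follows. The function name `add_balancing_nest`, the spherical shape used to gauge the nest's influence, and all numerical values are assumptions for illustration.

```python
def spherical_shape(h, a):
    """Normalized spherical shape: 0 at h = 0, 1 at and beyond the range a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3


def add_balancing_nest(nugget, structures, target_sill, long_range):
    """Append a nest that carries the balance of the variance.

    structures: list of (c_i, range_i) tuples for one direction.
    """
    current = nugget + sum(c for c, _ in structures)
    balance = target_sill - current
    if balance < 0:
        raise ValueError("the model sill already exceeds the target sill")
    if balance == 0:
        return list(structures)
    return list(structures) + [(balance, long_range)]


# Illustrative case: this direction reaches a sill of 6, the others reach 10.
padded = add_balancing_nest(1.0, [(5.0, 100.0)], 10.0, 1.0e6)
# Contribution of the fictitious nest at a lag of 100, the range of interest.
extra_at_100 = padded[-1][0] * spherical_shape(100.0, padded[-1][1])
```

    With a range of 1.0e6, the added nest contributes less than 0.001 at a lag of 100, which illustrates why it has little effect on estimates over the range of interest.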

    6.8: Summary: Typical Steps in Evaluating an Experimental and a Model Semivariogram

    Step 1) Data and Initial Parameters

    Acquire spatially located sample data and define values for the following items:

    • Maximum search distance
    • Lag interval
    • Horizontal and vertical bandwidth
    • Horizontal and vertical half-angles
    • Horizontal search direction and vertical plunge

    Step 2) Experimental Semivariogram

    Calculate the experimental semivariograms. Depending on the size of the data set and the number of lags, this process can take a considerable amount of computer time.

    • Evaluate an isotropic experimental semivariogram. Experiment with various lags to determine a satisfactory lag interval.
    • Evaluate anisotropic experimental semivariograms. Determine the principal and minor anisotropy axes (longest and shortest ranges). Typically, narrower half-angles and bandwidths will maximize the observed anisotropy.

    Step 3) Model Semivariogram

    Fit the model semivariograms for at least the major-, minor-, and if appropriate (3D) mid-axes. If the model is isotropic, only the major axis needs to be defined (it is identical to the mid- and minor-axes).

    • If an automated fitting algorithm is available, apply it, evaluate the results, and fine tune the model.
    • Estimate the nugget.
    • Estimate the maximum range.
    • Identify and define nested structures.
    • Experiment with, and select, the components of the model semivariogram (nugget, Ci, rangei, structure equation type, number of structures).
    • If anisotropy factors rather than directional semivariogram models will be used by the kriging algorithm:
        - Define the primary model semivariogram axis.
        - Re-model the remaining semivariogram axes using the nugget, Ci terms, nests, and equation types from the primary model axis. The only variables that can be modified are the range terms for each nest.
        - Calculate anisotropy factors.
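
    The isotropic calculation in Step 2 can be sketched with the usual estimator, half the mean squared difference of all pairs falling in a lag class. This is a minimal 2D illustration: the binning rule (class k covers distances from k times the lag up to, but not including, k+1 times the lag) is an assumption, since lag tolerance conventions differ between packages, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def experimental_semivariogram(points, lag, max_dist):
    """Isotropic experimental semivariogram.

    points: list of (x, y, value) tuples.
    Returns {lag_class: (mean_distance, gamma, n_pairs)}.
    """
    # Per class: [sum of distances, sum of squared differences, pair count]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i in range(len(points)):
        xi, yi, vi = points[i]
        for j in range(i + 1, len(points)):  # every pair is visited once
            xj, yj, vj = points[j]
            d = math.hypot(xj - xi, yj - yi)
            if d > max_dist:
                continue  # beyond the maximum search distance
            acc = sums[int(d // lag)]
            acc[0] += d
            acc[1] += (vi - vj) ** 2
            acc[2] += 1
    return {k: (s[0] / s[2], s[1] / (2.0 * s[2]), s[2])
            for k, s in sorted(sums.items())}


# Four samples on a line with a steadily increasing value.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
result = experimental_semivariogram(samples, lag=1.0, max_dist=3.5)
```

    The pair count returned for each class is what the rules of thumb in Section 6.6.1 use to decide how closely a lag estimate should be honored.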


    6.9: Strategies for Large Data Sets

    Thus far, the mechanics of calculating and modeling the semivariogram, and of defining model anisotropy, have been discussed. For small and non-complex data sets, these steps are sufficient. Some strategies for dealing with larger, more complex data sets are discussed below.

    The time required for semivariogram computation depends on the square of the number of samples (N²). If it takes five seconds to evaluate a data set with 1000 samples, it will take about 20 seconds for 2000 samples and about 75 minutes for 30,000 samples, if every sample is compared with every other sample. If the sampled area is large, many samples may be beyond the range of influence of one another, and there is little reason to compare them. By efficiently limiting comparisons, processing time can be substantially reduced. This is particularly important when anisotropy is evaluated, because a large number of experimental semivariograms need to be calculated (particularly in three-dimensional data sets). Consider the following options to reduce the processing time. Not all of the options are applicable to all data sets:

    • Visually examine the sample data. Set the maximum search range to something near the expected data correlation range. It may be appropriate to set the maximum search range to two or three times the expected range, to ensure the range is captured. If the range is not captured, the experimental lag γ*(h) estimates will not reach the data set variance, and the maximum search range will have to be extended. If there are 20,000 points, the search area is 10,000m by 10,000m, and the maximum search range can be limited to 3000m (about 2000 points in the search neighborhood), a 60% reduction in processing time would be expected.
    • Reduce the size of the data set by discarding some of the data. It may be that the semivariograms are well defined, and more data have been collected than are needed to model the spatial variation. This can dramatically reduce processing time, and may only marginally reduce the experimental semivariogram quality. However, if the experimental semivariogram quality declines to unsatisfactory levels, some data will have to be reincorporated into the data set. Several subsets of the data set should be evaluated using the same search parameters to ensure the results do not change significantly.
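
    Under strict N² scaling, a timing measured on one data set can be extrapolated to another size. The sketch below assumes that every sample is compared with every other sample and that nothing else (memory, I/O) limits performance; the function names are hypothetical.

```python
def scaled_time(t_ref_seconds, n_ref, n):
    """Extrapolate run time assuming it grows with the square of the sample count."""
    return t_ref_seconds * (n / n_ref) ** 2


def pair_count(n):
    """Number of distinct sample pairs when every sample is compared with every other."""
    return n * (n - 1) // 2


# A 5-second run at 1000 samples, extrapolated to larger data sets.
t_2000 = scaled_time(5.0, 1000, 2000)     # 20 seconds
t_30000 = scaled_time(5.0, 1000, 30000)   # 4500 seconds, i.e. 75 minutes
```

    Limiting the maximum search range reduces the number of pairs actually evaluated, which is why it shortens run time without discarding any data.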

    6.10: Other Semivariogram Requirements: Zones, Indicators and Thresholds, Covariances, and Soft Data

    Discussion thus far has focused on calculating experimental and model semivariograms using sample values with one measured parameter. Other measures of spatial variability are available for data sets that include a variety of types of data, and for those in which the statistical quality of the data varies across the site of interest.

    6.10.1: Indicator and Class Kriging and Simulation

    Calculation of indicator or class semivariograms proceeds in exactly the same manner as the semivariogram analysis described above, once the data set is converted to 0s and 1s. The full semivariogram analysis needs to be repeated for each threshold (cut-off) or class (indicator). As the number of indicators increases, the amount of work required to define the model semivariograms increases. Also, as the number of indicators increases, it is likely that the first and last indicator sets will be small. This generally has the undesirable effect of reducing the quality of the experimental semivariograms, resulting in less accurate models.
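
    The conversion to 0s and 1s can be sketched as below. The coding convention (1 when the value is at or below the threshold) is an assumption, since some packages use the opposite coding; the function names and values are hypothetical. The second function counts the samples in each class, which shows how the first and last classes shrink as thresholds move toward the extremes.

```python
def indicator_transform(values, threshold):
    """Code each value as 1 if it is at or below the threshold, else 0 (assumed convention)."""
    return [1 if v <= threshold else 0 for v in values]


def class_sizes(values, thresholds):
    """Count samples in each class bounded by the sorted thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        # The class index is the number of thresholds the value exceeds.
        k = sum(1 for t in thresholds if v > t)
        counts[k] += 1
    return counts


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
indicators = indicator_transform(values, 5)
sizes = class_sizes(values, [1.5, 5, 9.5])
```

    Each indicator series produced this way is then analyzed exactly like a continuous variable, one threshold at a time.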


    6.10.2: Covariance

    If cokriging techniques are to be used, covariograms between all the different variables need to be calculated. The same techniques that are used to model the experimental semivariograms are used to calculate covariograms, but there are many more models to evaluate.

    6.10.3: Imprecise Data and Indicator Kriging / Simulation

    Calculation of experimental semivariograms is more difficult when imprecise data are used. However, once calculated, fitting models to the semivariograms follows the same procedures. The fitting process can be less troublesome, because of the increase in information at short lags (Figure 6-17). The experimental models are often better behaved, but the nugget may be larger than if only hard data are used (Figure 6-18). The increased nuggets are due to the imprecise nature of the soft data. As with any indicator approach, the semivariogram analysis must be repeated for each indicator or class, depending on the methods used.

    Figure 6-17 a,b: The first semivariogram (a) is based on ten wells sampled at 30m intervals (hard data) from the Yorkshire cross-section (Figure 6-6). Samples at 2m intervals were derived from the actual values by adding a random noise component to simulate soft seismic data. Using only the hard data experimental semivariogram (a), it would be difficult to argue that this was not a pure nugget model, and defending any other model would be difficult. By incorporating the soft data, the rising limb of the semivariogram is clearly defined (b), although the nugget is still large and reflects the imprecision of the soft data.

    Figure 6-18: Even when a reasonable model can be fit using only hard data (solid line), longer range structures may be identified using hard and soft data (dashed line). The model using the soft data is apt to have a larger nugget due to the data imprecision.

    The manner in which the hard (precise) and soft (imprecise) data are defined will vary for different kriging algorithms. Hard data are treated as before, but the imprecision of the soft data must be defined. Some data (Type-A) are evaluated as a group using the sample values. Using p1-p2 analysis (Section 3.3.1; Alabert, 1987; McKenna and Poeter, 1994), the probability that the value is correctly or incorrectly classified by the soft data can be defined. Some samples (Type-B) are known to belong to one of several indicator classes, but the probability of belonging to each class is unknown. Finally, multivariate analysis techniques (Chapter 4) can be used to determine the probability that a sample belongs to a particular indicator group (Type-C data, Table 6-8).

    The process of setting up the soft data for semivariogram analysis is highly dependent on the available software and can require considerable effort. However, soft data are invaluable in defining small scale features that could not otherwise be sampled. Because the spatial analysis algorithm is more complex and the data sets are often much larger, processing time for semivariogram analysis is also longer. If the analysis is done correctly, though, the results will be more accurate than those obtained using only limited hard data.

    6.10.3: Imprecise Data and Indicator Kriging / Simulation

    Calculation of experimental semivariograms is more difficult when imprecise data are used. However, once calculated, fitting models to the semivariograms follows the same procedures. The fitting process can be less troublesome, because of the increase in information at short lags (Figure 6-17). The experimental models are often better behaved, but the nugget may be larger than if only hard data are used (Figure 6-18). The increased nuggets are due to the imprecise nature of the soft data. As with any indicator approach, the semivariogram analysis must be repeated for each indicator or class, depending on the methods used.
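    The link between soft-data imprecision and a larger nugget can be made explicit under a simple noise model. The derivation below is a sketch under assumptions that are not stated in the text: each soft value Y(x) equals the true value Z(x) plus a zero-mean error ε(x) with constant variance σ_ε², and the errors are uncorrelated with Z and with each other. The symbols Y, Z, ε, and σ_ε² are introduced here only for illustration.

```latex
\begin{aligned}
\gamma_Y(h) &= \tfrac{1}{2}\,E\!\left[\bigl(Y(x+h)-Y(x)\bigr)^2\right] \\
            &= \tfrac{1}{2}\,E\!\left[\bigl(Z(x+h)-Z(x)\bigr)^2\right]
             + \tfrac{1}{2}\,E\!\left[\bigl(\varepsilon(x+h)-\varepsilon(x)\bigr)^2\right] \\
            &= \gamma_Z(h) + \sigma_\varepsilon^2 , \qquad h > 0 .
\end{aligned}
```

    The cross term vanishes because the errors are assumed uncorrelated with Z. Under these assumptions the whole semivariogram is shifted upward by the error variance, which shows up as an added nugget while the shape of the rising limb is preserved.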


    6.10.4: Data Variability: Non-Stationarity

    A key assumption of geostatistical analysis is second-order stationarity (Journel and Huijbregts, 1978). This assumption implies that the mean difference between data samples may vary spatially, but the variance of the differences is constant (Figure 6-19). For many data sets this assumption is not appropriate, and regions of the data set should be modeled separately. Figure 6-20 shows an example where the semivariogram models in the shallow and deep zones of a site differ significantly from one another and from the semivariogram describing the entire data set. It is more difficult to identify these differences in small data sets, but knowledge of site conditions may suggest the possibility. For example, there may be a formation contact dividing the site, where the materials are similar but were deposited by different processes. One example is a location that includes eolian sands and beach sands: though similar in material composition, they do not have a similar depositional structure, so their spatial statistics are likely to differ.

    Figure 6-19: One assumption in kriging is that the data are second-order stationary. In other words, while it is expected that the local mean will vary, the variance about the mean is assumed to be a constant. Significant deviations from this assumption can invalidate the model.

    To identify this variability, statistical analysis can be performed on various portions of the site. Some variability can be identified by examining histograms of the data (Figure 3-13), and semivariogram analysis of the data sub-sets can confirm the non-stationarity (Figure 6-20). If non-stationarity exists, special steps need to be taken before proceeding to kriging or simulation (Dagdelen and Turner, 1996; Wingle and Poeter, 1996; Kushnir and Yarus, 1992). If multiple zones with different geostatistical characteristics are identified at the site, the semivariogram analysis needs to be repeated for each zone.

    Figure 6-20: The geostatistics of the data values can change across a site. If they do, the assumption of second-order stationarity is violated. The top semivariograms define the spatial variability for the entire Yorkshire cross-section (Figure 6-6a; Appendix A.4). When the upper and lower portions of the model are evaluated independently (Shallow and Deep sets), it is clear that the spatial variances are significantly different: the horizontal range is 70 m in the shallow sediments and 370 m in the deep sediments.
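    The zone-by-zone check described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce Figure 6-20: it assumes samples stored as (x, depth, value) tuples, a hypothetical contact at 20 m depth, a horizontal-only semivariogram, and made-up values chosen so that the two zones behave differently. The function name and all numbers are assumptions for the example.

```python
def experimental_semivariogram(x, z, lag, n_lags):
    """Classical estimator gamma(h) = sum((z_i - z_j)^2) / (2 N(h)).

    Pairs are assigned to the nearest lag class; classes with no pairs
    return None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(x[j] - x[i])
            k = int(h / lag + 0.5) - 1  # nearest lag class, 0-based
            if 0 <= k < n_lags:
                sums[k] += (z[j] - z[i]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical data: (x in m, depth in m, value).
# Shallow values alternate rapidly; deep values change smoothly.
samples = [(10.0 * i, 5.0, float(i % 2)) for i in range(10)]
samples += [(10.0 * i, 40.0, 0.1 * i) for i in range(10)]

contact_depth = 20.0  # assumed formation contact separating the zones
zones = {
    "shallow": [s for s in samples if s[1] < contact_depth],
    "deep": [s for s in samples if s[1] >= contact_depth],
}

gam = {}
for name, pts in zones.items():
    xs = [p[0] for p in pts]
    zs = [p[2] for p in pts]
    gam[name] = experimental_semivariogram(xs, zs, lag=10.0, n_lags=3)
    print(name, gam[name])
```

    If the semivariograms for the zones differ markedly in sill or range, as they do for these made-up values, that is a hint that each zone should be modeled separately.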

    Interactive Example 6.3: Evaluate second order stationarity on an example data set.

    6.12: References

    Ababou, R., A.C. Bagtzoglou, and E.F. Wood, 1994, On the Condition Number of Covariance Matrices in Kriging, Estimation, and Simulation of Random Fields. Mathematical Geology, Vol. 26, No. 1, pp. 99-133.

    Clark, I., 1979, Practical Geostatistics. Elsevier Applied Science, New York.

    Dagdelen, K., and A.K. Turner, 1996, Importance of Stationarity for Geostatistical Assessment of Environmental Contamination. Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R.M. Srivastava, S. Rouhani, M.V. Cromer, and A.I. Johnson, Eds., Philadelphia, American Society for Testing and Materials.

    Deutsch, C.V., and A.G. Journel, 1992, GSLIB: Geostatistical Software Library and User's Guide. New York, Oxford University Press.

    Isaaks, E.H., and R.M. Srivastava, 1989, An Introduction to Applied Geostatistics. New York, Oxford University Press.

    Journel, A.G., and Ch.J. Huijbregts, 1978, Mining Geostatistics. Academic Press, New York.

    Kushnir, G., and J.M. Yarus, 1992, Modeling Anisotropy in Computer Mapping of Geologic Data. Computer Modeling of Geologic Surfaces and Volumes, AAPG Computer Applications in Geology 1, Tulsa, The American Association of Petroleum Geologists, pp. 75-92.

    McKenna, S.A., 1994, Utilization of Soft Data for Uncertainty Reduction in Groundwater Flow and Transport Modeling. Ph.D. Dissertation T-4291, Department of Geology and Geological Engineering, Golden, Colorado School of Mines.

    Wingle, W.L., and E.P. Poeter, 1993, Uncertainty Associated with Semivariograms Used for Site Simulation. Ground Water, Vol. 31, No. 5, pp. 725-734.

    Wingle, W.L., and E.P. Poeter, 1996, Evaluating Subsurface Uncertainty Using Zonal Kriging. Uncertainty '96 (ASCE), University of Wisconsin, Madison, Wisconsin, August 1-3, 1996, Vol. 2, pp. 1318-1330.

    Wingle, W.L., S.A. McKenna, and E.P. Poeter, 1995, UNCERT User's Guide: A Geostatistical Uncertainty Analysis Package Applied to Groundwater Flow and Contaminant Transport Modeling (software and user's guide), Colorado School of Mines.

    Wingle, W.L., 1997, Evaluating Subsurface Uncertainty Using Modified Geostatistical Techniques. Ph.D. Dissertation T-4595, Department of Geology and Geological Engineering, Colorado School of Mines.