Geo479/579: Geostatistics Ch15. Cross Validation.

Why is Cross Validation Useful?

Cross validation (CV) allows us to compare estimated and true values using only the information available in the sample data set

CV may help us to choose between different weighting procedures, search strategies, variogram models, or estimation methods

Why is Cross Validation Useful..

In practice, CV results are often used simply to compare the distributions of the estimation errors (residuals) from different estimation procedures and to choose the one that performs best

A careful study of the spatial distribution of cross validated residuals (estimated minus true values) can provide insights into where an estimation procedure may run into trouble

Cross Validation Method

The sample value at a particular location is temporarily removed from the sample data set

Cross Validation Method..

The value at the same location is then estimated using the remaining samples

Once the estimate is calculated, we can compare it to the true sample value that was initially removed from the sample data set

This procedure is repeated for all available samples
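The remove, re-estimate, compare loop can be sketched in Python as below. This is a minimal sketch, assuming samples are stored as `(x, y, value)` tuples; it uses inverse distance weighting (IDW) as a simple stand-in estimator, not the ordinary kriging used in the chapter, and the names `idw_estimate`, `leave_one_out`, `radius` and `power` are hypothetical.

```python
import math

def idw_estimate(x, y, samples, radius, power=2.0):
    """Hypothetical IDW estimate at (x, y) from samples [(x, y, value), ...]
    lying within `radius`. Returns None if no sample is inside the search."""
    num = den = 0.0
    for sx, sy, v in samples:
        d = math.hypot(sx - x, sy - y)
        if d > radius:
            continue  # outside the search neighbourhood
        if d == 0.0:
            return v  # coincident sample: use its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den if den > 0 else None

def leave_one_out(samples, radius, power=2.0):
    """Remove each sample in turn, estimate it from the remaining samples and
    return (x, y, true, estimate, residual), with residual = estimate - true."""
    results = []
    for i, (x, y, true) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]  # sample i temporarily removed
        est = idw_estimate(x, y, others, radius, power)
        if est is None:
            continue  # no neighbours within the search radius
        results.append((x, y, true, est, est - true))
    return results
```

For example, `leave_one_out([(0, 0, 1.0), (10, 0, 2.0), (0, 10, 3.0), (10, 10, 4.0)], radius=25.0)` re-estimates the first sample as 2.8, giving a residual of 1.8. Swapping in a different estimator only requires replacing `idw_estimate`.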

CV as a Quantitative Tool

Table 15.2 shows that ordinary kriging performs better: its estimation errors have a mean closer to 0 and less spread
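A comparison in the spirit of Table 15.2 boils down to univariate statistics of each method's residuals. The sketch below is an assumption about which statistics to report (mean, standard deviation, median, interquartile range); `residual_summary` is a hypothetical name and the numbers in the usage line are toy values, not the Walker Lake results.

```python
import statistics

def residual_summary(residuals):
    """Summary of cross validated residuals (estimate - true).
    A mean near 0 suggests global unbiasedness; a small std/IQR means less spread."""
    q1, median, q3 = statistics.quantiles(residuals, n=4)  # needs >= 2 values
    return {
        "n": len(residuals),
        "mean": statistics.fmean(residuals),
        "std": statistics.pstdev(residuals),
        "median": median,
        "iqr": q3 - q1,
    }

# Toy residuals for two hypothetical methods (illustrative only)
method_a = [-3.0, 1.0, 2.0, 4.0]
method_b = [-1.0, 0.5, 0.5, 1.0]
summary_a = residual_summary(method_a)
summary_b = residual_summary(method_b)
```

Here `method_b` would be preferred by the slide's criteria, since its mean is closer to 0 and its spread is smaller.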

CV as a Quantitative Tool..

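Smoothing effect: the estimates are less variable than the true values (low values tend to be overestimated and high values underestimated)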
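One way to put a number on the smoothing effect is to compare the spread of the estimates with the spread of the true values, and to look at the slope of the regression of true values on estimates. The following is a minimal sketch under that assumption; `smoothing_check` is a hypothetical helper, not a function from the text.

```python
import statistics

def smoothing_check(true_vals, estimates):
    """Return (spread ratio, regression slope) for cross validated pairs.

    spread ratio = std(estimates) / std(true values); below 1 means smoothed.
    slope = slope of the regression of true values on estimates; 1 means
    conditionally unbiased, below 1 means high estimates tend to be too high.
    """
    sd_true = statistics.pstdev(true_vals)
    sd_est = statistics.pstdev(estimates)
    if sd_true == 0 or sd_est == 0:
        raise ValueError("values have no spread")
    m_true = statistics.fmean(true_vals)
    m_est = statistics.fmean(estimates)
    cov = statistics.fmean(
        [(t - m_true) * (e - m_est) for t, e in zip(true_vals, estimates)]
    )
    return sd_est / sd_true, cov / sd_est ** 2

# Toy pairs: estimates squeezed to half the spread of the true values
ratio, slope = smoothing_check([1, 2, 3, 4, 5], [2.0, 2.5, 3.0, 3.5, 4.0])
```

In the toy case the ratio is 0.5 and the slope is 2: the estimates are smoothed, so high values are underestimated and low values overestimated.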

CV as a Quantitative Tool..

One of the factors that limits the conclusions that can legitimately be drawn from a cross validation exercise is the recurring problem of clustering

=> If our original sample data set is spatially clustered, then so, too, are our cross validated residuals. Therefore, some conclusions drawn from them may apply to the entire map area while others may not

CV as a Qualitative Tool

Figure 15.4 shows a map of the ordinary kriging residuals from the cross validation study. A “+” symbol indicates an overestimate and a “-” symbol an underestimate.

We would like the residuals to be conditionally unbiased with respect to location: on this type of display we hope to see the “+” and “-” symbols well mixed, without large patches of a single sign.
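A text version of such a residual map, plus a simple check for unmixed patches, can be sketched as follows. It assumes cross validation results stored as `(x, y, true, estimate, residual)` tuples; grouping residuals into square blocks of size `cell` is a hypothetical way to spot patches, and `residual_signs` and `patch_report` are illustrative names.

```python
from collections import defaultdict

def residual_signs(cv_results):
    """Map each (x, y, true, est, residual) to a '+' (overestimate),
    '-' (underestimate) or '0' (exact) symbol at its location."""
    out = []
    for x, y, _true, _est, r in cv_results:
        sym = "+" if r > 0 else "-" if r < 0 else "0"
        out.append((x, y, sym))
    return out

def patch_report(cv_results, cell):
    """Fraction of positive residuals in each cell x cell block.
    Fractions near 0 or 1 flag unmixed patches worth a closer look."""
    counts = defaultdict(lambda: [0, 0])  # block -> [positives, total]
    for x, y, _true, _est, r in cv_results:
        key = (int(x // cell), int(y // cell))
        counts[key][1] += 1
        if r > 0:
            counts[key][0] += 1
    return {k: pos / tot for k, (pos, tot) in counts.items()}
```

The block size is a judgment call: too small and every block holds one sample, too large and real patches are averaged away.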

Type 1 and Type 2 Samples

• These are the two values of an indicator variable, T. This variable is explained on pp. 4-6; its statistical and spatial distributions are displayed on pp. 73-75

CV as a Qualitative Tool..

In Figure 15.4 there is a fairly large patch of positive residuals around 110E, 180N

Most of the samples in this area are type 1 samples (type 1: T=1; type 2: T=2), so we need to consider how the ordinary kriging approach performs for the other type 1 samples
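Checking how a method performs for one sample type amounts to grouping the residuals by the indicator T. A minimal sketch, assuming each residual has been paired with its sample's T value; `residuals_by_type` is a hypothetical helper.

```python
import statistics
from collections import defaultdict

def residuals_by_type(pairs):
    """Group (t, residual) pairs by the indicator value t and return
    {t: (count, mean residual)}; a mean far from 0 flags biased estimates."""
    groups = defaultdict(list)
    for t, r in pairs:
        groups[t].append(r)
    return {t: (len(rs), statistics.fmean(rs)) for t, rs in groups.items()}
```

If the type 1 group shows a clearly positive mean, the patch of positive residuals is not just a local accident but a systematic overestimation of type 1 samples.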

CV as a Qualitative Tool..

We focus on type 1 samples because of the specific goal. To improve the estimation, we expand the search radius from 25 m to 30 m. The residuals improve, as shown in Figure 15.6
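Trying a larger search radius and re-running cross validation only on the samples of interest can be sketched as below. It is a sketch under stated assumptions: samples are `(x, y, value, t)` tuples, IDW is a stand-in for ordinary kriging, coincident samples are skipped, and `loo_residuals` and `compare_radii` are hypothetical names.

```python
import math
import statistics

def loo_residuals(samples, radius, target_type=None, power=2.0):
    """Leave-one-out IDW residuals (estimate - true) for samples
    [(x, y, value, t), ...]. If target_type is given, only samples with
    t == target_type are cross validated (all samples still act as data)."""
    res = []
    for i, (x, y, v, t) in enumerate(samples):
        if target_type is not None and t != target_type:
            continue
        num = den = 0.0
        for j, (sx, sy, sv, _st) in enumerate(samples):
            if j == i:
                continue  # sample i is temporarily removed
            d = math.hypot(sx - x, sy - y)
            if 0 < d <= radius:  # coincident samples are skipped in this sketch
                w = d ** -power
                num += w * sv
                den += w
        if den > 0:
            res.append(num / den - v)
    return res

def compare_radii(samples, radii, target_type):
    """{radius: (mean residual, mean absolute residual)} or None if no estimates."""
    out = {}
    for r in radii:
        res = loo_residuals(samples, r, target_type)
        out[r] = (statistics.fmean(res),
                  statistics.fmean(abs(e) for e in res)) if res else None
    return out
```

A call such as `compare_radii(samples, (25.0, 30.0), target_type=1)` mirrors the 25 m versus 30 m comparison; whether the larger radius helps depends on the data.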

CV can also bring frustration since it often reveals problems that do not have straightforward solutions

CV as a Goal-Oriented Tool

Imagine that the Walker Lake data set is an ore deposit, and suppose that the economic cutoff is 300 ppm: material with a grade greater than 300 ppm is classified as ore, and material below 300 ppm is classified as waste.

Figure 15.7: There are two types of misclassification, false negative errors and false positive errors (figure regions labeled Ore and Waste)
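Counting the two kinds of misclassification from cross validated pairs can be sketched as follows. Taking ore as the "positive" class is an assumed convention: a false positive is waste estimated as ore, a false negative is ore estimated as waste. Treating a value exactly at the cutoff as waste is also an assumption, and `classify_errors` is a hypothetical name.

```python
def classify_errors(true_vals, estimates, cutoff=300.0):
    """Count correct classifications and the two misclassification types.
    Ore: value > cutoff; waste: value <= cutoff (assumed for ties)."""
    counts = {"correct": 0, "false_positive": 0, "false_negative": 0}
    for t, e in zip(true_vals, estimates):
        true_ore = t > cutoff
        est_ore = e > cutoff
        if true_ore == est_ore:
            counts["correct"] += 1
        elif est_ore:
            counts["false_positive"] += 1  # waste sent to the mill as ore
        else:
            counts["false_negative"] += 1  # ore lost to the waste dump
    return counts
```

For true values `[350, 250, 400, 100]` and estimates `[320, 310, 280, 90]`, this gives 2 correct, 1 false positive and 1 false negative.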

CV as a Goal-Oriented Tool..

For applications in which misclassification has important consequences, minimizing misclassification may be a much more relevant criterion than the various statistical criteria

The magnitude of misclassification is less important than the misclassification itself
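The point that a misclassification count can rank methods differently from an error-size statistic can be illustrated with a tiny sketch. The two "methods" and their estimates are toy values chosen for illustration; `misclassified` and `rank_methods` are hypothetical helpers, and mean absolute error stands in for "the various statistical criteria".

```python
import statistics

def misclassified(true_vals, estimates, cutoff=300.0):
    """Number of samples whose ore/waste class differs between true and estimate."""
    return sum((t > cutoff) != (e > cutoff) for t, e in zip(true_vals, estimates))

def rank_methods(true_vals, methods, cutoff=300.0):
    """{method name: (misclassified count, mean absolute error)}."""
    return {
        name: (misclassified(true_vals, est, cutoff),
               statistics.fmean(abs(e - t) for t, e in zip(true_vals, est)))
        for name, est in methods.items()
    }

# Toy data: two samples just either side of the 300 ppm cutoff
scores = rank_methods([290.0, 310.0], {"A": [305.0, 295.0], "B": [250.0, 350.0]})
```

Method A has the smaller errors (15 vs. 40 ppm) but misclassifies both samples, while method B misclassifies none; for an ore/waste decision, B is the better choice.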

Limitations of Cross Validation

CV can generate pairs of true and estimated values only at sample locations

Clustering problem in the sample data set: in practice, the residuals may be representative of only certain regions or particular ranges of values

Limitations of Cross Validation..

The clustering problem can be addressed either by calculating a declustered mean of the residuals or by performing CV at a selected subset of locations that is representative of the entire study area
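A declustered mean of the residuals can be sketched with cell declustering, where each sample is weighted by one over the number of samples sharing its cell. Cell declustering is one possible choice here, not necessarily the method intended on the slide; the cell size and the name `declustered_mean` are assumptions.

```python
from collections import Counter

def declustered_mean(points, residuals, cell):
    """Cell-declustered mean of residuals: each sample gets weight
    1 / (number of samples in its cell x cell block), so densely sampled
    areas do not dominate the average."""
    keys = [(int(x // cell), int(y // cell)) for x, y in points]
    n_in_cell = Counter(keys)
    weights = [1.0 / n_in_cell[k] for k in keys]
    return sum(w * r for w, r in zip(weights, residuals)) / sum(weights)

# Three clustered samples with residual 3 and one isolated sample with residual -3
points = [(1, 1), (2, 2), (3, 3), (50, 50)]
residuals = [3.0, 3.0, 3.0, -3.0]
dm = declustered_mean(points, residuals, cell=10)
```

The naive mean of these residuals is 1.5, whereas the declustered mean is 0, because the three clustered residuals share a single cell's weight.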

If very close samples will not be available in the actual estimation, it makes little sense to include them in CV
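One way to follow this advice is to withhold, in addition to the sample itself, every sample closer than some minimum distance. The sketch below assumes `(x, y, value)` samples, IDW as a stand-in estimator, and a hypothetical `min_dist` parameter that might be set to a typical distance between an estimation grid node and its nearest sample; `loo_with_exclusion` is an illustrative name.

```python
import math

def loo_with_exclusion(samples, radius, min_dist, power=2.0):
    """Leave-one-out IDW residuals (estimate - true) where, besides the sample
    itself, any sample closer than `min_dist` is also withheld.
    Returns None for a sample with no usable neighbours."""
    res = []
    for i, (x, y, v) in enumerate(samples):
        num = den = 0.0
        for j, (sx, sy, sv) in enumerate(samples):
            d = math.hypot(sx - x, sy - y)
            # skip the sample itself, coincident samples, too-close and too-far ones
            if j == i or d == 0.0 or d < min_dist or d > radius:
                continue
            w = d ** -power
            num += w * sv
            den += w
        res.append(num / den - v if den > 0 else None)
    return res
```

With samples `[(0, 0, 1.0), (1, 0, 9.0), (10, 0, 3.0)]`, a `min_dist` of 5 forces the first two samples to be estimated from the distant third one, which is closer to what an actual grid estimate would face.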

The problem areas identified by cross validation may warrant additional sampling, especially when there are major consequences