Spatial Data Uncertainty in Geostatistics and …...Spatial Data Uncertainty in Geostatistics and...
Transcript of Spatial Data Uncertainty in Geostatistics and …...Spatial Data Uncertainty in Geostatistics and...
Spatial Data Uncertainty in Geostatistics and Machine Learning: A Case Study
MINERAL RESOURCES
Franky Fouedjio and Jens Klump | Science Leader Earth Science Informatics30 May 2018
Introduction• Machine learning can be a great tool, but how does it compare
with established, well-understood tools?
• Specific questions:• Reporting and interpretation of uncertainty
• Spatially correlated variables
• Methods:• Real-world case of geochemical and auxiliary data
• Simulation of geochemical and auxiliary data
Spatial Data Uncertainty | Fouedjio & Klump2 |
Case Study: Soil Geochemistry• Data: soil geochemistry of southwest England (source: C. Kirkwood, BGS
G-BASE)
• Elements used in this study: Al, Ba, Br, Ca, Ce, Co, Cr, Cs, Fe, Ga, Ge, Hf, K, La, Mg, Mn, Mo, Na, Nb, Nd, Ni, P, Rb, Sc, Se, Si, Sm, Sr, Ta, Th, Ti, U, V, Y, Zr
• Other elements were excluded due to their hydrothermal mobility or concentrations below detection limits.
• Auxiliary data: Gravity, geomorphology, radiometrics, IR (LANDSAT)
• Geographically sparse data
• Aim: geochemical exploration (outliers)
Spatial Data Uncertainty | Fouedjio & Klump3 |
Study Area
Spatial Data Uncertainty | Fouedjio & Klump4 |
Auxiliary Variables
Spatial Data Uncertainty | Fouedjio & Klump5 |
Prediction – Kriging vs. Random Forest
Spatial Data Uncertainty | Fouedjio & Klump6 |
Pre
dic
tio
nU
nce
rtai
nty
Kriging Random Forest
KED vs. QRF in real-world data• Prediction uncertainty maps provided by KED vary less across the
study area than QRF.
• The largest prediction uncertainties given by KED are concentrated in those areas not surveyed or where the sampling was too sparse.
• Prediction uncertainty provided by KED depends mainly on the data configurations.
• Prediction uncertainty maps provided by QRF show spatial patterns not related to the density of sampling locations but to the distribution of particular auxiliary variables.
Spatial Data Uncertainty | Fouedjio & Klump7 |
Real-world vs. simulated data• Geological data are
notoriously sparse and incomplete because data are not easily obtained.
• To overcome the shortcomings of real-word geological data we decided to test KED and QRF on synthetic data sets.
Spatial Data Uncertainty | Fouedjio & Klump8 |
Simulated Data
Spatial Data Uncertainty | Fouedjio & Klump9 |
Simulated Data• Case 1: linear relationship between Y and X, where Y has a weak
spatial correlation.
• Case 2: linear relationship between the Y and X, where Y has a strong spatial correlation.
• Case 3: complex nonlinear relationship between Y and X, where Y displays a weak spatial correlation.
• Case 4: complex nonlinear relationship between Y and X, where the target variable displays a strong spatial correlation.
Spatial Data Uncertainty | Fouedjio & Klump10 |
Spatial Data Uncertainty | Fouedjio & Klump
Simulations of the auxiliary variables
11 |
An example of simulated auxiliary variables using Gaussian Random Fields.
Simulations of the target variables
Spatial Data Uncertainty | Fouedjio & Klump12 |
An example of the simulated target variables under different cases.
Case 2: Linear response, spatial correlation• As expected, Kriging with
External Drift (KED) outperforms Quantile Regression Random Forest (QRF).
Spatial Data Uncertainty | Fouedjio & Klump13 |
Case 3: non-linear, no spatial correlation• As expected, the Root Mean
Square Error (RMSE) shows that QRF performs better than KDE.
• The goodness of fit, accuracy and probability interval width show that KED has a better prediction uncertainty that QRF.
Spatial Data Uncertainty | Fouedjio & Klump14 |
Conclusions• We compared the predictive capacity and uncertainty of KED and
QRF using both real-world and synthetic data.• In a direct comparison of KED and QRF, both methods produced
similar predictive maps.• As expected, KED is able to exploit linear spatial dependencies
while QRF is better at handling non-linear dependencies.• Surprisingly, KED always outperformed QRF with respect to
measures of uncertainty.• Applications must weigh the benefits of better uncertainty in KED
against better handling of non-linearity in QRF.
Spatial Data Uncertainty | Fouedjio & Klump15 |
Mineral ResourcesJens KlumpScience Leader
t +61 8 6436 8828e [email protected] people.csiro.au/Jens-Klump
Stanford UniversityFrancky Fouedjio
t +61 2 9123 4567e [email protected] www.csiro.au/lorem
Thank you
MINERAL RESOURCES