An Introduction to Model-based Geostatistics - Home | · PDF file · 2009-09-18An...

An Introduction to Model-based Geostatistics

Peter J Diggle

School of Health and Medicine, Lancaster University

and

Department of Biostatistics, Johns Hopkins University

September 2009

Outline

• What is geostatistics?

• What is model-based geostatistics?

• Two examples

– constructing an elevation surface from sparse data

– tropical disease prevalence mapping

Example: surface elevation data

[Figure: plot of the surface elevation data, with axes X, Y and Z]

Geostatistics

• traditionally, a self-contained methodology for spatial prediction, developed at Ecole des Mines, Fontainebleau, France

• nowadays, that part of spatial statistics which is concerned with data obtained by spatially discrete sampling of a spatially continuous process

Kriging: find the linear combination of the data that best predicts the value of the surface at an arbitrary location x

Model-based Geostatistics

• the application of general principles of statistical modelling and inference to geostatistical problems

– formulate a statistical model for the data

– fit the model using likelihood-based methods

– use the fitted model to make predictions

Kriging: minimum mean square error prediction under Gaussian modelling assumptions

Gaussian geostatistics (simplest case)

Model

• Stationary Gaussian process $S(x) : x \in \mathbb{R}^2$

$E[S(x)] = \mu$, $\mathrm{Cov}\{S(x), S(x')\} = \sigma^2 \rho(\|x - x'\|)$

• Mutually independent $Y_i \mid S(\cdot) \sim N(S(x_i), \tau^2)$

Point predictor: $\hat{S}(x) = E[S(x) \mid Y]$

• linear in $Y = (Y_1, \ldots, Y_n)$

• interpolates $Y$ if $\tau^2 = 0$

• called simple kriging in classical geostatistics
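The point predictor can be sketched in base R as follows. This is a minimal illustration rather than geoR's implementation: the exponential correlation function, the function name simple_krige, and every numerical value are assumptions chosen for the example, and the model parameters are treated as known.

```r
# Simple kriging sketch in base R. The exponential correlation function
# and all numerical values are illustrative assumptions.
rho <- function(u, phi) exp(-u / phi)

simple_krige <- function(coords, y, x0, mu, sigma2, phi, tau2) {
  n <- nrow(coords)
  D <- as.matrix(dist(coords))                 # distances between data locations
  V <- sigma2 * rho(D, phi) + tau2 * diag(n)   # Var(Y)
  d0 <- sqrt(colSums((t(coords) - x0)^2))      # distances from x0 to each data location
  r <- sigma2 * rho(d0, phi)                   # Cov{S(x0), Y}
  w <- solve(V, r)                             # kriging weights: the predictor is linear in Y
  list(mean = mu + sum(w * (y - mu)),          # E[S(x0) | Y]
       var = sigma2 - sum(w * r))              # Var[S(x0) | Y]
}

# Hypothetical data: three locations, predicting at the first of them
coords <- cbind(c(0, 1, 0), c(0, 0, 1))
y <- c(1.0, 2.0, 0.5)
p <- simple_krige(coords, y, x0 = c(0, 0), mu = 1,
                  sigma2 = 2, phi = 0.5, tau2 = 0)
```

With tau2 = 0 and x0 equal to a data location, the predictor reproduces the observed value there and the prediction variance is zero, which is the interpolation property listed above.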

Predictive distribution

• choose the target for prediction, $F(S)$, where $S = \{S(x) : x \in A\}$

• draw samples $S_i : i = 1, \ldots, N$ from $[S \mid Y]$

• then $F_i = F(S_i) : i = 1, \ldots, N$ is a sample from the required predictive distribution $[F(S) \mid Y]$
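The three steps can be sketched in base R under the Gaussian model with known parameters. Everything specific here is an assumption made for illustration: the exponential correlation function, the parameter values, the simulated locations and placeholder data, and the choice of target F(S) as the maximum of S over a small prediction grid.

```r
set.seed(1)
rho <- function(u, phi) exp(-u / phi)            # assumed correlation function
mu <- 0; sigma2 <- 1; phi <- 0.3; tau2 <- 0.1    # illustrative parameter values

xy_data <- cbind(runif(20), runif(20))           # 20 hypothetical data locations
xy_pred <- as.matrix(expand.grid(seq(0, 1, 0.25), seq(0, 1, 0.25)))  # 25 grid points
y <- rnorm(20)                                   # placeholder data, illustration only

n <- nrow(xy_data); m <- nrow(xy_pred)
C <- sigma2 * rho(as.matrix(dist(rbind(xy_data, xy_pred))), phi)
Vyy <- C[1:n, 1:n] + tau2 * diag(n)              # Var(Y)
Csy <- C[(n + 1):(n + m), 1:n]                   # Cov(S, Y)
Css <- C[(n + 1):(n + m), (n + 1):(n + m)]       # Var(S)

# [S | Y] is multivariate Gaussian with this mean and variance
cond_mean <- mu + Csy %*% solve(Vyy, y - mu)
cond_var <- Css - Csy %*% solve(Vyy, t(Csy))
cond_var <- (cond_var + t(cond_var)) / 2         # symmetrise against rounding error

# Draw N samples S_i from [S | Y]; column i of S_samples is S_i
N <- 1000
L <- t(chol(cond_var + 1e-10 * diag(m)))         # small jitter for numerical stability
S_samples <- as.vector(cond_mean) + L %*% matrix(rnorm(m * N), m, N)

# F_i = F(S_i), here the maximum of S over the prediction grid
F_samples <- apply(S_samples, 2, max)
quantile(F_samples, c(0.025, 0.5, 0.975))        # summary of [F(S) | Y]
```

The same samples serve any other target: replacing max by another function of S gives a sample from the corresponding predictive distribution.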

Interpolating the elevation surface

Under Gaussian modelling assumptions, we need to:

• identify a parametric family of correlation functions

• fit the model

• use the model for prediction


The empirical variogram

Data $(x_i, Y_i) : i = 1, \ldots, n$; $u_{ij} = \|x_i - x_j\|$, $v_{ij} = \frac{1}{2}(y_i - y_j)^2$

The theoretical variogram

$V(u) = \frac{1}{2}\mathrm{Var}\{Y(x) - Y(x - u)\} = \tau^2 + \sigma^2\{1 - \rho(u)\}$

Exploratory analysis

$E[v_{ij}] = V(u_{ij})$ ⇒ smoothed scatterplot of $(u_{ij}, v_{ij})$ identifies rough shape of $\rho(u)$ and initial estimates of model parameters

geoR code:

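library(geoR)
data(elevation)                      # surface elevation data supplied with geoR
summary(elevation)
# empirical variogram, with distance bins defined by uvec
vario <- variog(elevation, uvec = 0.2 * (0:25))
plot(vario)
?variog                              # help page for variog
# variogram of residuals after removing a first-order trend surface
vario2 <- variog(elevation, uvec = 0.2 * (0:25), trend = "1st")
plot(vario2)
# overlay the two variograms: black = no trend removed, red = trend removed
plot(vario$u, vario$v, type = "l", xlim = c(0, 5), ylim = c(0, 7000),
     xlab = "u", ylab = "V(u)")
lines(vario2$u, vario2$v, col = "red")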
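For readers who want to see the calculation behind a binned empirical variogram, here is a base R sketch that follows the definitions of u_ij and v_ij directly. It is not the code inside variog, whose binning details may differ; the simulated locations, the placeholder responses, and the bin width of 0.1 are assumptions for illustration.

```r
set.seed(2)
n <- 50
coords <- cbind(runif(n), runif(n))    # hypothetical locations in the unit square
y <- rnorm(n)                          # placeholder responses

# All pairs i < j
idx <- which(upper.tri(diag(n)), arr.ind = TRUE)
i <- idx[, 1]; j <- idx[, 2]

u <- sqrt(rowSums((coords[i, ] - coords[j, ])^2))   # u_ij = ||x_i - x_j||
v <- 0.5 * (y[i] - y[j])^2                          # v_ij = (y_i - y_j)^2 / 2

# Smooth the variogram cloud (u_ij, v_ij) by averaging within distance bins
bins <- cut(u, breaks = seq(0, 1.5, by = 0.1))
vario_binned <- data.frame(u = tapply(u, bins, mean),
                           v = tapply(v, bins, mean),
                           n = as.vector(table(bins)))
plot(vario_binned$u, vario_binned$v, xlab = "u", ylab = "V(u)")
```

Bins at large distances contain few pairs and are correspondingly noisy, which is one reason to restrict the range of u when reading off initial parameter estimates.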


• fit the model

1. Classical: compute maximum likelihood estimates $\hat{\theta}$

2. Bayesian: prior $[\theta]$ implies posterior $[\theta \mid Y]$

geoR code for option 1:

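# maximum likelihood fit of a Matern model with kappa = 1;
# ini.cov.pars gives initial values for (sigma^2, phi)
mlfit <- likfit(elevation, ini.cov.pars = c(5000, 2.0),
                cov.model = "matern", kappa = 1)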
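For reference, the log-likelihood being maximised in the classical option follows from the Gaussian model stated earlier. The notation below is an assumption for this note: $\theta = (\mu, \sigma^2, \phi, \tau^2)$, with $\phi$ standing for the parameter(s) of the correlation function and $R(\phi)$ for the resulting correlation matrix of $S$ at the data locations.

```latex
Y \sim \mathrm{MVN}\bigl(\mu \mathbf{1},\; \sigma^2 R(\phi) + \tau^2 I\bigr),
\qquad R(\phi)_{ij} = \rho(\|x_i - x_j\|; \phi)

\ell(\theta) = -\tfrac{1}{2}\Bigl\{ n \log(2\pi)
  + \log\bigl|\sigma^2 R(\phi) + \tau^2 I\bigr|
  + (y - \mu\mathbf{1})^{\top}\bigl(\sigma^2 R(\phi) + \tau^2 I\bigr)^{-1}(y - \mu\mathbf{1}) \Bigr\}
```

The covariance matrix is $\tau^2 + \sigma^2$ on the diagonal and $\sigma^2 \rho(u_{ij})$ off it, which matches the theoretical variogram $V(u) = \tau^2 + \sigma^2\{1 - \rho(u)\}$.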


• fit the model

• use the model for prediction

1. Plug-in: $[S \mid Y; \hat{\theta}]$

2. Bayesian: $[S \mid Y] = \int [S \mid Y; \theta][\theta \mid Y]\,d\theta$

geoR code for option 1:

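# prediction region: a 6.4 x 6.4 square, one vertex per row
region <- matrix(c(0, 0, 6.4, 0, 6.4, 6.4, 0, 6.4), 4, 2, byrow = TRUE)
grid <- pred_grid(region, by = 0.2)          # prediction grid with spacing 0.2
KC <- krige.control(obj.model = mlfit)       # plug in the fitted model
OC <- output.control(n.predictive = 100)     # 100 draws from the predictive distribution
set.seed(24367)
predictions <- krige.conv(geodata = elevation, locations = grid,
                          borders = region, krige = KC, output = OC)
image(predictions)                           # map of the point predictions
points(elevation, add.to.plot = TRUE)        # overlay the data locations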

Tropical disease prevalence mapping

• “river blindness” – an endemic disease in wet tropics

• donation programme of mass treatment with ivermectin

• approximately 50 million people treated to date (target is 80 million by 2015)

• serious adverse reactions experienced by some patients highly co-infected with Loa loa parasites

• precautionary measures put in place before mass treatment in areas of high Loa loa prevalence

http://www.who.int/pbd/blindness/onchocerciasis/en/

Diggle et al., Annals of Tropical Medicine and Parasitology, 101, 499–509.

The Loa loa prediction problem

Ground-truth survey data

• random sample of subjects in each of a number of villages

• blood-samples test positive/negative for Loa loa

Environmental data (satellite images)

• measured on regular grid to cover region of interest

• elevation, green-ness of vegetation

Objectives

• predict local prevalence throughout study-region (Cameroon)

• compute local exceedance probabilities,

P(prevalence > 0.2|data)
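Given samples from the predictive distribution of prevalence at each location, an exceedance probability is just the proportion of samples above the threshold. A minimal base R sketch, in which the predictive samples are simulated stand-ins (the four locations, their logit-scale means, and the standard deviation are all hypothetical):

```r
set.seed(3)
n_loc <- 4; N <- 2000
eta_mean <- c(-3, -1.5, -1.2, 0)     # hypothetical predictive means, logit scale

# Row k holds N hypothetical predictive draws of the linear predictor at location k
eta_samples <- matrix(rnorm(n_loc * N, mean = eta_mean, sd = 0.5), n_loc, N)
p_samples <- plogis(eta_samples)     # transform to the prevalence scale

# Monte Carlo estimate of P(prevalence > 0.2 | data) at each location
exceed <- rowMeans(p_samples > 0.2)
round(exceed, 3)
```

Mapping these values over a prediction grid gives an exceedance probability map.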

Loa loa: a generalised linear model

• Latent spatial process

$S(x) \sim \mathrm{SGP}\{0, \sigma^2, \rho(u)\}$

$\rho(u) = \exp(-|u|/\phi)$

• Linear predictor

$d(x)$ = environmental variables at location $x$

$\eta(x) = d(x)'\beta + S(x)$

$p(x) = \exp\{\eta(x)\}/[1 + \exp\{\eta(x)\}]$

• Conditional distribution for positive proportion $Y_i/n_i$

$Y_i \mid S(\cdot) \sim \mathrm{Bin}\{n_i, p(x_i)\}$
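To make the model concrete, here is a base R sketch that simulates one data set from it. All specifics are assumptions for illustration: the village locations, the single covariate named elev, the values of beta, sigma2 and phi, and the sample size of 50 per village.

```r
set.seed(4)
n <- 30
coords <- cbind(runif(n), runif(n))        # hypothetical village locations
elev <- runif(n, 0, 1.5)                   # hypothetical environmental covariate
beta <- c(-2, 0.5)                         # illustrative regression parameters
sigma2 <- 0.5; phi <- 0.2                  # illustrative spatial parameters

# Latent process S(x): zero mean, variance sigma2, rho(u) = exp(-|u|/phi)
R <- exp(-as.matrix(dist(coords)) / phi)
S <- as.vector(t(chol(sigma2 * R)) %*% rnorm(n))

eta <- beta[1] + beta[2] * elev + S        # linear predictor eta(x) = d(x)'beta + S(x)
p <- plogis(eta)                           # p(x) = exp(eta) / (1 + exp(eta))

n_i <- rep(50, n)                          # people sampled per village
Y <- rbinom(n, size = n_i, prob = p)       # Y_i | S ~ Bin(n_i, p(x_i))
head(cbind(Y, n_i, prevalence = Y / n_i))
```

Conditional on S the counts are independent; the spatial dependence between villages enters only through the shared latent process.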

The modelling strategy

• use relationship between environmental variables and ground-truth prevalence to construct preliminary predictions via logistic regression

• use local deviations from regression model to estimate smooth residual spatial variation

• use fitted model for predictive inference

• use fitted model for predictive inference
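The first two steps can be sketched in base R with glm. This is an illustration on simulated data, not the analysis reported in the slides: the covariate elev, the true coefficients used to simulate, the sample sizes, and the use of empirical logits with a 0.5 continuity correction are all assumptions.

```r
set.seed(5)
n <- 40
elev <- runif(n, 0, 1500)                         # hypothetical elevations
n_i <- rep(60, n)                                 # people sampled per village
Y <- rbinom(n, n_i, plogis(-1 - 0.001 * elev))    # simulated positive counts

# Step 1: preliminary, non-spatial predictions via logistic regression
fit <- glm(cbind(Y, n_i - Y) ~ elev, family = binomial)
prelim <- predict(fit, type = "response")         # predicted prevalence per village

# Step 2: local deviations from the regression on the logit scale;
# in a spatial analysis these are the input for estimating residual spatial variation
emp_logit <- log((Y + 0.5) / (n_i - Y + 0.5))     # empirical logit
resid_logit <- emp_logit - predict(fit, type = "link")
summary(resid_logit)
```

A variogram of resid_logit against village location would then indicate whether residual spatial structure remains after accounting for the environmental variables.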

[Figure: logit prevalence vs elevation]

[Figure: logit prevalence vs max NDVI ("Max Greenness")]

Comparing non-spatial and spatial predictions in Cameroon

[Figure: two panels plotting observed prevalence (%) against predicted prevalence (%). Non-spatial panel: predicted prevalence "without ground truth data". Spatial panel: predicted prevalence "with ground truth data".]

Probabilistic prediction in Cameroon

Take-home message

• model-based approach:

– makes assumptions explicit

– makes choice of analysis strategy less subjective

– emphasises uncertainty

• exceedance probability maps are often more useful than point predictions and standard errors

• text-book linked to geoR software

Diggle, P.J. and Ribeiro, P.J. (2007). Model-based Geostatistics. New York: Springer.
