Statistical tools for European biodiversity risk assessment
Adam Butler, Stijn Bierman, Glenn Marion Biomathematics & Statistics Scotland
With: Alex Cook & Gavin Gibson (Heriot-Watt), Ruth Doherty (Edinburgh), Ingolf Kuehn (UFZ), Phil Hulme (CEH)
2nd annual NCSE workshop, UKC, June 2007
The ALARM project
• Assessing Large-scale risks to biodiversity with tested methods
• Project of the 6th framework programme of the European Union
• Runs from 2004-2009, involves 200+ scientists and social scientists, working in 67 organisations in 35 countries
• Main website: www.alarmproject.net
• BioSS is a partner, with three staff currently working on the project
Key objectives
• Develop an integrated risk assessment for biodiversity in
terrestrial and freshwater ecosystems at the European scale
• Focus on four key pressures – climate change, invasive species,
chemical pollution, pollinator loss – and their interactions
• Contribute to the dissemination of scientific knowledge and to
the development of evidence-based policy
Scenarios
Assessments relate to six scenarios of climate & land use change
• GRAS: deregulation, free trade, growth, globalization
• BAMBU: “Business as might be usual”
• SEDG: Sustainable European Development Goal
• CUT: collapse of the thermohaline circulation
• SEL: energy price shock, mass growth in biofuels
• DEATH: global pandemic
The role of BioSS
• Research-consultancy: develop & apply novel quantitative
methods to support scientific research within ALARM
• Training: Development of an online training course on statistical
methods for environmental risk assessment
• Dissemination: Contribute to the construction of a risk
assessment toolkit for European biodiversity
Research themes
1 Statistical analysis of species atlas data
2 Quantification of uncertainty in complex mechanistic models
3 Elicitation of expert opinion regarding environmental risk
Species atlas data
[Figure: map for Galium pumilum (slender bedstraw) alongside mean annual temperature 1960-1990 (°C), legend classes from <10 to >18]
Species atlas data record the presence/absence of species, for each cell
on a regular grid – e.g. Florkart database of German vascular plants
• Atlas data are often used to analyze relationships between
environmental variables & the spatial distribution of a
particular species
• Aim is often predictive:
e.g. climate envelope modeling
• Crude statistical analyses are based on multiple regression
• Analyses should be modified to account for spatial
autocorrelation & non-detection
Distribution of individual species
Spatial autocorrelation
Zi = I(species present in cell i)
xi = covariates for cell i
dij = distance between cells i and j
Ni = neighbourhood of cell i
Autologistic model (Augustin et al., 1996):
$$\operatorname{logit} P(Z_i = 1 \mid \alpha, \beta, \gamma) = \alpha + \beta^{\top} x_i + \gamma n_i, \qquad n_i = \frac{\sum_{j \in N_i} Z_j / d_{ij}}{\sum_{j \in N_i} 1 / d_{ij}}$$
where α, β and γ denote regression coefficients
Zi is a latent random variable
Bierman, S.M., Wilson, I.J., Elston, D.A., Marion, G., Butler, A. & Kühn, I. (in preparation) Bayesian image restoration techniques to analyze species atlas data with spatially varying non-detection probabilities.
Mit = 1 if Oit = 1
Mit = 0 or 1 if Oit = 0
Set up a Markov chain Monte Carlo sampler on Mit such that Oit = 0/1
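The autologistic model above adds a neighbourhood term to an ordinary logistic regression. A minimal sketch of that term follows, assuming a distance-weighted mean of neighbouring presences, a neighbourhood defined by a cut-off radius, and a single covariate; the function names, the radius rule and the coefficient names alpha, beta, gamma are illustrative assumptions, not taken from the slides.

```python
import math

def autocovariate(z, coords, radius):
    """Distance-weighted mean of neighbouring presences, n_i, for every cell.

    z      : list of 0/1 presence indicators Z_i
    coords : list of (row, col) cell centres
    radius : cells j with 0 < d_ij <= radius form N_i (an assumed rule)
    """
    n = []
    for i, (ri, ci) in enumerate(coords):
        num = den = 0.0
        for j, (rj, cj) in enumerate(coords):
            d = math.hypot(ri - rj, ci - cj)
            if i != j and d <= radius:
                num += z[j] / d      # Z_j / d_ij
                den += 1.0 / d       # 1 / d_ij
        n.append(num / den if den > 0 else 0.0)
    return n

def presence_probability(x_i, n_i, alpha, beta, gamma):
    """Inverse logit of alpha + beta * x_i + gamma * n_i (one covariate)."""
    eta = alpha + beta * x_i + gamma * n_i
    return 1.0 / (1.0 + math.exp(-eta))

# 3 x 3 toy grid with the species present in the top row only
coords = [(r, c) for r in range(3) for c in range(3)]
z = [1, 1, 1, 0, 0, 0, 0, 0, 0]
n = autocovariate(z, coords, radius=1.0)
```

With a positive gamma, cells next to occupied cells receive a higher presence probability than isolated cells with the same covariate value, which is how the model absorbs spatial autocorrelation.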
Non-recording
yi = I(species recorded present in cell i)
zi = I(species actually present in cell i); a latent random variable
[Equations garbled in this transcript. Recoverable structure: a likelihood for the record yi given the true state zi; a Beta(A, B) prior; and a posterior conditional on A, B, yi and zi.]
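The distinction between recorded presence yi and actual presence zi can be illustrated with a standard detection calculation. This is a sketch under stated assumptions rather than the authors' model: it assumes no false positives, a prior presence probability psi, a detection probability p, and a Beta(a, b) prior on p that is updated only from cells where the species is truly present. All function and argument names are hypothetical.

```python
def prob_present_given_not_recorded(psi, p):
    """P(z_i = 1 | y_i = 0) when P(z_i = 1) = psi and P(y_i = 1 | z_i = 1) = p.

    Assumes no false positives: P(y_i = 1 | z_i = 0) = 0.
    """
    missed = psi * (1.0 - p)        # present but not recorded
    absent = 1.0 - psi              # truly absent
    return missed / (missed + absent)

def beta_update(a, b, y, z):
    """Conjugate Beta update for a detection probability, given the true states.

    Only cells with z_i = 1 are informative: a record (y_i = 1) counts as a
    detection, a missing record (y_i = 0) as a miss.
    """
    detected = sum(yi for yi, zi in zip(y, z) if zi == 1)
    missed = sum(1 - yi for yi, zi in zip(y, z) if zi == 1)
    return a + detected, b + missed
```

In an MCMC scheme of the kind mentioned above, the two steps would alternate: the latent zi are sampled given the current detection probability, and the detection probability is sampled given the current zi.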
Detection effort
[Figure: maps for (a) Galium pumilum (slender bedstraw) and (b) Papaver argemone, legend from <0.05 to 1; bar charts (a, b) of number of grid cells by control group 1-9]
[Figure: panels a-e showing maps and plots of pollination types (axis label "insect", scale 0.0-0.9), with legends for proportion of pollination types (low, medium, high), wind speed, and topography (altitude classes from 0 - <75 to >= 2100)]
Distribution of functional traits
Pollination types in Germany
Kühn, I., Bierman, S.M., Durka, W. & Klotz, S. (2006) Relating geographical variation in pollination types to environmental and spatial factors using novel statistical methods. New Phytologist, 172(1), 127-139.
Spread of invasive species
• Species atlas data for invasive species may also contain information on time of arrival, establishment or naturalization
• We can, with care, use such data to draw inferences about the spatio-temporal spread of a species across a landscape, and thereby to assess the risks associated with future expansion
• Need to deal with environmental heterogeneity: land use & climate
Spread of Giant Hogweed (Heracleum mantegazzianum)
Data: National Biodiversity Network
[Figure: sequence of maps showing the recorded distribution by 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990 and 2000]
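• Dispersal rate from cell j to cell i is modelled using a symmetric power law kernel in the distance dji
• Arrival rate at cell i is treated as additive: the sum of the dispersal rates from the cells j in Ni that were colonized before cell i
• Colonization rate of cell i is the product of its suitability and its arrival rate
• Colonization suitability for each site is a function of land-cover & climatic covariates
Key methodological challenge: estimate covariate effects
dji = distance from cell i to cell j
xi = covariates for cell i
Ni = neighborhood of cell i
Ti = year of colonization
$$\lambda_i(t) = S(x_i)\,\phi_i(t), \qquad \phi_i(t) = \sum_{j \in N_i,\; T_j < T_i} \phi_{ji}(t)$$
[The power law kernel formula for the dispersal rate, a function of the squared distance, is garbled in this transcript; λ and φ stand in for symbols lost in extraction.]
Cook, A., Marion, G., Butler, A. and Gibson, G. (2007). Bayesian inference for the spatio-temporal invasion of alien species. Bulletin of Mathematical Biology, in press.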
• Extend previous work by inclusion of “suitability”, S(xi)
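The structure of the spread model (power law dispersal, additive arrival, and a suitability multiplier) can be sketched as follows. The kernel form (1 + (d/scale)^2)^(-exponent), the parameter names, and the use of all colonized cells rather than a restricted neighbourhood are assumptions made for illustration; the exact kernel used by Cook et al. is not recoverable from this transcript.

```python
import math

def power_law_kernel(d, scale, exponent):
    """Symmetric power law dispersal kernel; this particular form is an assumption."""
    return (1.0 + (d / scale) ** 2) ** (-exponent)

def colonization_rate(i, t, coords, col_year, suitability, scale, exponent):
    """Suitability of cell i times the summed dispersal from colonized cells.

    col_year[j] is the colonization year T_j, or None if cell j is uncolonized.
    Every cell colonized before time t contributes (no neighbourhood cut-off).
    """
    xi, yi = coords[i]
    arrival = 0.0
    for j, (xj, yj) in enumerate(coords):
        if j != i and col_year[j] is not None and col_year[j] < t:
            d = math.hypot(xi - xj, yi - yj)
            arrival += power_law_kernel(d, scale, exponent)   # additive arrival rate
    return suitability[i] * arrival

# Three cells on a line; the middle one is not yet colonized
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
col_year = [1900, None, 1950]
suitability = [1.0, 0.5, 1.0]
rate_1920 = colonization_rate(1, 1920, coords, col_year, suitability, 1.0, 1.0)
rate_1960 = colonization_rate(1, 1960, coords, col_year, suitability, 1.0, 1.0)
```

The rate for the middle cell rises once the second source cell is colonized, and it is scaled down throughout by the cell's lower suitability.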
[Figure: maps of colonization suitability and of colonization probability (10 year prediction); posterior mean. Cook, A., Marion, G., Butler, A. and Gibson, G. (2007)]
[Figure: cumulative rate of colonization, with covariates and without covariates. Cook, A., Marion, G., Butler, A. and Gibson, G. (2007)]
Further work
• Deal with inhomogeneities in the recording process: e.g. could analyse as three atlas surveys
• Allow for decolonisation
• Allow for time-varying covariates: e.g. land use change
Complex models
• Complex mechanistic models provide a valuable tool for
generating projections of large-scale environmental change
• Models are typically deterministic, but with uncertain inputs
(parameter values, initial values & boundary conditions)
• Models are evaluated across a regular spatio-temporal lattice
• Model outputs tend to exhibit systematic bias, e.g. because
sub-gridscale processes are not represented
• Use the Lund-Potsdam-Jena dynamic vegetation model to generate projected trends in global vegetation for the 21st century
• Control run: use climate inputs provided by observational data
• Other runs: inputs provided by simulations from one of nine General Circulation Models
Scenario SRES A2
“A future world of very rapid economic growth, low population growth and rapid introduction of new and more efficient technology. Major underlying themes are economic and cultural convergence and capacity building, with a substantial reduction in regional differences in per capita income. In this world, people pursue personal wealth rather than environmental quality…”
Doherty, R., Butler, A. & Marion, G. (in prep.) title to be decided
Data: PCMDI (www-pcmdi.llnl.gov), CRU (www.cru.uea.ac.uk)
Global annual net primary productivity
“Net primary production is the rate at which new biomass accrues in an ecosystem” (Wikipedia)
Statistical post-processing
• Regression (Allen et al., 2002):
$x = \sum_{m \in M} \beta_m y_m + e$
• Hierarchical Bayesian modeling (Tebaldi et al., 2005):
Model each of x and y1,…,y|M| as independent realisations of “reality”, which is a latent variable
• Bayesian model averaging (Raftery et al., 2005):
$f(x) = \sum_{m \in M} w_m g(y_m)$
g(ym) estimated from a simple statistical model
ym = output from model m ∈ M
x = corresponding data
1) Assign weights w1,…,w|M|
2) Calculate zm = ym – x
3) Fit a set of possible statistical models, hn(zm), where n ∈ N
4) Apply a simple form of Bayesian model averaging,
$g_m(z_m) = \sum_{n \in N} v_n h_n(z_m)$, where $v_n \propto \exp(-\mathrm{BIC}_n / 2)$
5) Apply a second level of model averaging,
$f(x) = \sum_{m \in M} w_m g_m(y_m - x)$
ym = output from model m ∈ M
x = corresponding data
Butler, A., Marion, G. & Doherty, R. (in prep.) Statistical averaging of long-term projections generated by a set of environmental models
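The two levels of averaging in steps 4 and 5 can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: each candidate error model hn is taken to be a normal density with a given mean and standard deviation, the BIC values are supplied rather than computed from a fit, and the function names are hypothetical.

```python
import math

def bic_weights(bics):
    """Normalised weights v_n proportional to exp(-BIC_n / 2)."""
    best = min(bics)                                  # shift by the minimum for numerical stability
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def normal_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def averaged_density(x, outputs, w, error_models, bics):
    """f(x) = sum_m w_m g_m(y_m - x), with g_m = sum_n v_n h_n.

    error_models[m] is a list of (mean, sd) pairs, one normal density h_n per
    candidate statistical model; bics[m] holds the matching BIC values.
    """
    total = 0.0
    for y_m, w_m, models, bic_m in zip(outputs, w, error_models, bics):
        v = bic_weights(bic_m)
        g_m = sum(v_n * normal_pdf(y_m - x, mu, sd) for v_n, (mu, sd) in zip(v, models))
        total += w_m * g_m
    return total
```

Subtracting the smallest BIC before exponentiating leaves the normalised weights unchanged but avoids underflow when the BIC values are large.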
Statistical methods: an overview
• Single deterministic model: SACCO (Statistical Analysis of Computer Code Output)
• Single stochastic model: ABC (Approximate Bayesian Computation)
• Multiple deterministic models: statistical post-processing
SACCO methods
R bundle: http://cran.r-project.org/src/contrib/Descriptions/BACCO.html
Generate a set of ensembles, y(θ1),…, y(θM)
Emulation: construct a statistical model, η(·), which describes the relationship between inputs & outputs – a basis for interpolation
Calibration (Kennedy & O’Hagan, 2001): relate output to reality, via x = η(·) + δ(·) + e, where η and δ are Gaussian processes
y(θ) = output from model given inputs θ
x = corresponding data
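Emulation as described above amounts to interpolating between a small number of model runs. The following is a minimal sketch, assuming a one-dimensional input, a zero-mean Gaussian process with a squared-exponential covariance of fixed length scale, and a cheap stand-in function in place of an expensive simulator; all names and settings are illustrative.

```python
import math

def rbf(a, b, length):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def emulate(theta_new, thetas, ys, length=1.0, jitter=1e-8):
    """Posterior mean of a zero-mean Gaussian process fitted to the model runs."""
    K = [[rbf(a, b, length) + (jitter if i == j else 0.0)
          for j, b in enumerate(thetas)] for i, a in enumerate(thetas)]
    alpha = solve(K, ys)
    return sum(rbf(theta_new, t, length) * a for t, a in zip(thetas, alpha))

def simulator(theta):
    return math.sin(theta)   # cheap stand-in for an expensive mechanistic model

thetas = [0.0, 1.0, 2.0, 3.0, 4.0]          # design points: the ensemble of runs
ys = [simulator(t) for t in thetas]
prediction = emulate(1.5, thetas, ys)       # emulator output at an untried input
```

The emulator reproduces the model output at the design points and interpolates between them; a full analysis would also estimate the length scale and report the predictive variance.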
ABC methods
Assign informative prior distribution π(θ) to parameters θ
Simulate from the prior, θm ~ π(θ), and then from the model, y(θm) ~ Y(θm)
Accept θm if and only if D(y(θm), x) < ε, where D is a suitable distance metric and ε is small
The accepted θm are samples from an approximation to the posterior of θ | x, Y
Y(θ) = output from model given inputs θ
x = corresponding data
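The rejection scheme above is short enough to write out in full. This is a minimal sketch on a toy problem, assuming a Beta(2, 2) prior on an unknown success probability and data summarised as a count of successes out of 20 trials; the function name, the toy model and the tolerance are illustrative choices.

```python
import random

def abc_rejection(x_obs, prior_sample, simulate, distance, eps, n_draws, seed=0):
    """Keep the prior draws whose simulated output lies within eps of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)            # draw a parameter from the prior
        y = simulate(theta, rng)             # run the stochastic model once
        if distance(y, x_obs) < eps:         # accept if and only if D(y, x) < eps
            accepted.append(theta)
    return accepted

# Toy example: 14 successes observed in 20 trials
x_obs = 14
posterior_draws = abc_rejection(
    x_obs,
    prior_sample=lambda rng: rng.betavariate(2.0, 2.0),
    simulate=lambda th, rng: sum(rng.random() < th for _ in range(20)),
    distance=lambda y, x: abs(y - x),
    eps=1,                                   # integer counts, so this demands an exact match
    n_draws=20000,
)
posterior_mean = sum(posterior_draws) / len(posterior_draws)
```

Because the tolerance here forces an exact match on a discrete summary, the accepted draws come from the exact posterior; with continuous output a larger ε trades accuracy for a higher acceptance rate.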
Integrated risk assessment
• One of the key tasks of ALARM is to produce a risk assessment
toolkit (RAT) for European biodiversity
• The RAT requires us to link detailed, often quantitative, scientific
assessments about risk with the requirements of policy-makers
• This involves the integration of observational data, output from
mechanistic models, and expert knowledge
Expert elicitation
A process of representing expert beliefs and opinions about the properties of a system in the form of one or more probability distributions
“The goal of elicitation, as we see it, is to make it as easy as possible for subject-matter experts to tell us what they believe, in probabilistic terms, while reducing how much they need to know about probability theory to do so…” (Kadane & Wolfson, 1998)
The elicitation process
Kadane & Wolfson (1998), O’Hagan (1998)
Elicit basic quantities: means, quantiles
Produce a graphical representation
Fit a statistical model
Negative feedback
Potential dangers
Availability
Overconfidence
Anchoring
Inconsistency
Hindsight bias
Principles
Focus on observables
Quantiles rather than moments
Avoid tail probabilities
Focus on prediction
Interactive process
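The step from elicited quantiles to a fitted statistical model can be illustrated with the simplest case. This is a sketch assuming the expert supplies two quantiles (here the lower and upper quartiles, in keeping with the advice to avoid tail probabilities) and that a normal distribution is an acceptable family; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def normal_from_quantiles(q_lo, p_lo, q_hi, p_hi):
    """Normal distribution whose p_lo and p_hi quantiles equal q_lo and q_hi."""
    z_lo = NormalDist().inv_cdf(p_lo)
    z_hi = NormalDist().inv_cdf(p_hi)
    sigma = (q_hi - q_lo) / (z_hi - z_lo)
    mu = q_lo - sigma * z_lo
    return NormalDist(mu, sigma)

# Expert judges the quantity equally likely to fall below 10, between 10 and 20
# in either half, or above 20 (lower quartile 10, upper quartile 20)
fitted = normal_from_quantiles(10.0, 0.25, 20.0, 0.75)

# Feedback for the expert: quantities implied by the fit that were not elicited
implied_median = fitted.inv_cdf(0.5)
implied_p_below_5 = fitted.cdf(5.0)
```

Showing the expert the implied values, and revising the inputs if they disagree, is one way to realise the interactive feedback loop in the process above.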
• Within ALARM, we are assisting Koos Biesmeijer (Leeds) in using expert opinion to identify the primary cause of decline in threatened European bee species
Threats to Bees
Potential threats
• Habitat loss
• Intrinsic factors
• Climate change
• Native species dynamics
• Low densities
• Restricted range
• Resource changes: host plants, hosts (cleptoparasitic bees)
Primary threat: species on UK red list