Advanced analytical approaches in ecological data analysis
The world comes in fragments
Species abundance matrix M
Site GPS location matrix D
Environmental variable matrix V
[Figure: the three data matrices, with axes labelled Species, Sites and Variables]
Multivariate approaches to biodiversity: spatial regression, co-occurrence mapping, regression tree, impact analysis
Species                     G6-3    A2-2    C4-4    J4-4    D2-4    K7-2    K7-4    F1-3    M7-2
Achillea_pannonica          0       0.1     2       0.5     0       0       0       0       0.5
Agrostis_capillaris         0.5     0.5     0.5     0.5     0       0.5     0       0.5     0.5
Agrostis_stolonifera_agg.   0       0       0       0       0       0       0       0       0
Agrostis_vinealis           0       0       0.5     0       0       0       0       0       0
Ajuga_genevensis            0       0       0       0       0       0       0       0       0.5

Variable                    G6-3    A2-2    C4-4    J4-4    D2-4    K7-2    K7-4    F1-3    M7-2
CaCO3                       0.95    0.11    0.85    1.53    1.93    0.58    0.58    0.38    0.63
Sand                        85.66   81.31   74.42   74.24   74.24   83.45   83.45   78.45   82.15
pH                          8.69    8.01    7.97    8.05    8.08    8.23    8.23    8.25    8.4

Plot                        G6-3    A2-2    C4-4    J4-4    D2-4    K7-2    K7-4    F1-3    M7-2
Longitude                   317.78  187.24  237.32  322.62  217.79  388.38  382.38  226.3   412.75
Latitude                    266.85  307.27  299.92  188.9   259.69  209.6   209.6   221.79  177.88
The raw data
Basic questions:
• Do soil characteristics influence species abundances and diversity?
• How do these relationships change in time?
Starting hypotheses:
• Neighbouring plots are similar in species composition.
• CaCO3 is of major importance for plant diversity.
• Species occurrences are not random with respect to soil characteristics.
Neighbouring plots are similar in species composition
We calculate the Sørensen (Dice) index of species similarity S and transform it into a distance matrix (D = 1 − S).
We calculate the distance matrix of the GPS data.
We compare the two distance matrices with a Mantel test.
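The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure used for the slides: the presence/absence matrix and the site coordinates are hypothetical toy values, the function names are made up for this example, and the Mantel test is the simple permutation variant (Pearson correlation between the upper triangles, rows and columns of one matrix permuted together). It assumes every site holds at least one species.

```python
import numpy as np

def sorensen_distance(presence):
    """Sites x species array -> site x site distance matrix D = 1 - S."""
    p = (np.asarray(presence) > 0).astype(float)
    shared = p @ p.T                          # species shared by sites i and j
    richness = p.sum(axis=1)                  # number of species per site
    s = 2.0 * shared / (richness[:, None] + richness[None, :])
    return 1.0 - s

def euclidean_distance(coords):
    """Site x site Euclidean distances from an n x 2 coordinate array."""
    c = np.asarray(coords, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def mantel(d1, d2, n_perm=999, seed=0):
    """Permutation Mantel test; returns (r, one-sided p for r >= observed)."""
    rng = np.random.default_rng(seed)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)              # upper triangle without diagonal
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        d1_perm = d1[np.ix_(perm, perm)]      # permute rows and columns together
        if np.corrcoef(d1_perm[iu], d2[iu])[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 6 sites along a transect, 5 species.
presence = [[1, 1, 0, 0, 0],
            [1, 1, 1, 0, 0],
            [0, 1, 1, 0, 0],
            [0, 0, 1, 1, 0],
            [0, 0, 1, 1, 1],
            [0, 0, 0, 1, 1]]
coords = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]]

d_species = sorensen_distance(presence)
d_space = euclidean_distance(coords)
r_obs, p_value = mantel(d_species, d_space)
print(f"Mantel r = {r_obs:.2f}, p = {p_value:.3f}")
```

A positive Mantel r with a small p would support the hypothesis that neighbouring plots are similar in species composition.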
CaCO3 is of major importance for plant diversity
Plot  Long    Lat     Year  Species  Abundance  CaCO3  Sand   pH
A3-2  203.09  319.46  2006  3        0.7        0.95   85.66  8.69
A3-3  197.09  325.46  2006  6        1          0.11   81.31  8.01
A3-4  197.09  319.46  2006  4        0.8        0.85   74.42  7.97
A4-2  218.95  331.64  2006  4        0.8        1.53   74.24  8.05
A4-3  212.95  337.64  2006  4        1.2        1.93   74.24  8.08
B3-3  209.28  309.6   2006  3        0.7        0.58   83.45  8.23
B4-2  231.14  315.78  2006  4        0.8        0.58   83.45  8.23
B4-4  225.14  315.78  2006  3        1.5        0.38   78.45  8.25
B5-2  247     327.97  2006  3        0.7        0.63   82.15  8.4
B5-4  241     327.97  2006  3        0.7        2.21   80.01  7.78
C1-1  195.75  269.37  2006  6        1.8        1.51   79.16  8.02
C2-2  211.61  275.55  2006  3        0.3        0.1    84.09  7.9
The SAM input file
[Figure: CaCO3 and species richness]
[Figure: sites labelled with species richness values 5, 1, 5, 7, 17, 41, 35]
Species richness at sites of different area
Area  Species
31    35
5     5
9     5
15    7
22    17
50    41
5     1
We did not include spatial distance in the regression.
Spatial autocorrelation is inevitable in ecology.
[Figure: richness plotted against area with the fitted line y = 0.94x − 2.47; r² = 0.93, P < 0.01]
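The reported regression line can be checked with an ordinary least-squares fit. The sketch below assumes the area and richness table reads as the seven pairs (31, 35), (5, 5), (9, 5), (15, 7), (22, 17), (50, 41), (5, 1); that reading is a reconstruction of the flattened table, not something the slides state explicitly.

```python
import numpy as np

# Area and species richness of the seven sites (assumed reading of the table).
area = np.array([31, 5, 9, 15, 22, 50, 5], dtype=float)
richness = np.array([35, 5, 5, 7, 17, 41, 1], dtype=float)

# Ordinary least squares: richness = slope * area + intercept.
slope, intercept = np.polyfit(area, richness, 1)
r2 = np.corrcoef(area, richness)[0, 1] ** 2

print(f"richness = {slope:.2f} * area {intercept:+.2f}, r2 = {r2:.2f}")
```

With these values the fit agrees with the line shown on the slide to two decimals. Note that this fit treats the seven sites as independent, which is exactly the assumption the slide questions.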
General linear models in the face of spatial autocorrelation
Temperature  Precipitation  Aridity  Abundance
8.9          56.5           0.15     28.3
10.9         799.5          0.94     17.7
8.4          343.5          0.94     13.5
1.2          305.2          0.00     16.1
8.3          952.3          0.75     26.2
15.0         286.3          0.69     29.0
5.6          651.5          0.59     11.7
3.2          572.1          0.11     17.4
0.5          836.6          0.83     3.7
3.4          399.0          0.45     10.1
0.2          984.3          0.56     1.5
5.7          655.6          0.11     3.2
13.7         269.6          0.26     21.2
9.0          561.8          0.56     14.4
18.5         457.8          0.94     0.7
[Figure: sites labelled with values 5, 1, 5, 7, 17, 41, 35]
Spatial autocorrelation
Spatial autocorrelation is inevitable. All ecological field data sets have a spatial structure.
Collinearity
Autocorrelation
[Figure: r² plotted against N on logarithmic axes; fitted curve y = 5.0x^(−0.96)]
$$F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k}$$
Bivariate case (k = 1):
$$F = \frac{r^2}{1 - r^2} (n - 2)$$
F increases in proportion to the degrees of freedom, that is, to the number of data points n.
P decreases with increasing number of data points (sample size).
For any nonzero correlation, a statistical test will eventually become significant if you only increase the sample size.
Statistical significance at the 1% error level
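The growth of F with sample size follows directly from the formula. A small sketch, assuming a hypothetical weak correlation of r² = 0.05 that stays fixed while n grows (the function name and the chosen sample sizes are illustrative only):

```python
def f_statistic(r2, n, k=1):
    """F = r2 / (1 - r2) * (n - k - 1) / k for k predictors and n data points."""
    return r2 / (1.0 - r2) * (n - k - 1) / k

r2 = 0.05                                  # hypothetical, deliberately weak effect
sample_sizes = [10, 100, 1000, 10000]
f_values = [f_statistic(r2, n) for n in sample_sizes]

for n, f in zip(sample_sizes, f_values):
    print(f"n = {n:>5}: F = {f:.2f}")
```

The explained variance never changes, yet F grows roughly tenfold with every tenfold increase in n, so the associated P value shrinks accordingly. Turning F into P requires the F distribution with k and n − k − 1 degrees of freedom, which is left out here.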
Plot  Species  CaCO3  Sand   pH
A3-2  3        0.95   85.66  8.69
A3-3  6        0.11   81.31  8.01
A3-4  4        0.85   74.42  7.97
A4-2  4        1.53   74.24  8.05
A4-3  4        1.93   74.24  8.08
[Figure: sites labelled with values 5, 1, 5, 7, 17, 41, 35]
Spatial autocorrelation
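[Figure: sites labelled with values 7, 17 and 35]
$$F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k} = \frac{r^2}{1 - r^2} \cdot \frac{18 - 3 - 1}{3} = 4.5$$
$$F = \frac{r^2}{1 - r^2} \cdot \frac{n - k - 1}{k} = \frac{r^2}{1 - r^2} \cdot \frac{4 - 3 - 1}{3} = 0$$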
Spatial autocorrelation reduces the effective degrees of freedom.
Using spatially autocorrelated data, we artificially increase the degrees of freedom and the F-score. We obtain statistically significant results too often.
What to do??? First, test for spatial autocorrelation: Moran’s I
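Moran's I can be computed directly from the site coordinates and the variable of interest. The sketch below uses the ten plots and richness values (S) from the SAM input table in this deck; the inverse-distance weights with a zero diagonal are an assumption made for the example, since the slides do not say which weighting was used for the test.

```python
import numpy as np

def morans_i(values, coords, alpha=1.0):
    """Moran's I with inverse-distance weights w_ij = 1 / d_ij**alpha, w_ii = 0."""
    x = np.asarray(values, dtype=float)
    c = np.asarray(coords, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    w = np.zeros_like(d)
    off = d > 0                                # skip the diagonal (d_ii = 0)
    w[off] = 1.0 / d[off] ** alpha
    z = x - x.mean()                           # deviations from the mean
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

# Longitude, latitude and species richness S of ten plots (from the SAM input table).
coords = [[317.78, 266.85], [187.24, 307.27], [237.32, 299.92], [322.62, 188.90],
          [217.79, 259.69], [388.38, 209.60], [382.38, 209.60], [226.30, 221.79],
          [412.75, 177.88], [300.57, 198.58]]
richness = [15, 17, 16, 16, 13, 21, 15, 16, 13, 12]

i_obs = morans_i(richness, coords)
expected = -1.0 / (len(richness) - 1)          # expectation without autocorrelation
print(f"Moran's I = {i_obs:.3f}, expected under no autocorrelation = {expected:.3f}")
```

Values clearly above the expectation indicate positive spatial autocorrelation. A significance test would need a permutation step or the analytical variance, which this sketch omits.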
Reduce the degrees of freedom
N = 15
N_eff = 4
$$F_{eff} = F \cdot \frac{4 - 4}{4} = 0$$
$$t_{eff} = t \sqrt{\frac{4 - 4}{4}} = 0$$
Neighbor joining cluster analysis
What to do???
UPGMA cluster analysis
Correct for the effects of spatial autocorrelation
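$$\mathbf{Y} = \mathbf{X}b + c\,\mathbf{Lat} + d\,\mathbf{Long} + \mathbf{E}$$
$$\mathbf{Y} = \mathbf{X}b + c_1\mathbf{Lat} + c_2\mathbf{Lat}^2 + d_1\mathbf{Long} + d_2\mathbf{Long}^2 + \mathbf{E}$$
Trend surface analysis is able to capture broad-scale trends.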
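A trend surface model is an ordinary regression with the coordinates (and optionally their powers) added as extra predictors. The following sketch fits it by least squares; the function name, the explicit intercept column and the synthetic noise-free data are assumptions made for illustration and are not taken from the slides or from SAM.

```python
import numpy as np

def trend_surface_fit(y, X, lat, lon, degree=1):
    """Least-squares fit of y on X plus polynomial terms of latitude and longitude.

    Coefficient order: intercept, one per column of X, then for each power p
    up to `degree` the coefficients of lat**p and lon**p.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    cols = [np.ones(len(y)), *X.T]
    for p in range(1, degree + 1):
        cols.append(lat ** p)
        cols.append(lon ** p)
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical noise-free data with a linear spatial trend.
rng = np.random.default_rng(1)
n = 30
lat = rng.uniform(170, 340, n)
lon = rng.uniform(180, 420, n)
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + 0.5 * lat - 0.2 * lon

coef = trend_surface_fit(y, x, lat, lon, degree=1)
print(np.round(coef, 3))
```

Setting `degree=2` adds the squared coordinate terms of the second equation. With raw coordinates in the hundreds, centring them first would improve numerical conditioning.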
What to do???
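$$\boldsymbol{\Sigma}\mathbf{U} = \lambda\mathbf{U}$$
$$(\boldsymbol{\Sigma} - \lambda\mathbf{I})\mathbf{U} = 0$$
Eigenvector regression or eigenvector mapping:
$$\mathbf{Y} = \mathbf{X}b + \mathbf{U}c + \mathbf{E}$$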
Euclidean distances
      G6-3   A2-2   C4-4   J4-4   D2-4
G6-3  0.0    136.7  87.0   78.1   100.3
A2-2  136.7  0.0    50.6   179.8  56.5
C4-4  87.0   50.6   0.0    140.0  44.7
J4-4  78.1   179.8  140.0  0.0    126.5
D2-4  100.3  56.5   44.7   126.5  0.0

Eigenvalues
2140.4  -938.7

Eigenvectors
0.191   0.046
0.307   0.394
0.244   0.316
0.176  -0.146
0.236   0.284
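The distance table can be reproduced from the GPS coordinates of the five plots, and its eigenvectors extracted with a symmetric eigen-solver. This sketch takes the slide literally and decomposes the raw distance matrix; whether SAM truncates or centres the matrix first is not stated here, so treat that as an open assumption. The eigenvalues of this five-plot excerpt will not match the ones listed, which presumably come from the full set of plots.

```python
import numpy as np

plots = ["G6-3", "A2-2", "C4-4", "J4-4", "D2-4"]
lon = np.array([317.78, 187.24, 237.32, 322.62, 217.79])
lat = np.array([266.85, 307.27, 299.92, 188.90, 259.69])

# Pairwise Euclidean distances between plots.
coords = np.column_stack([lon, lat])
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Eigen-decomposition of the symmetric distance matrix, largest eigenvalue first.
eigenvalues, eigenvectors = np.linalg.eigh(dist)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

ev1 = eigenvectors[:, 0]          # dominant spatial eigenvector, one value per plot
print(np.round(dist, 1))
print(np.round(eigenvalues, 1))
```

In eigenvector mapping, `ev1` (and possibly further eigenvectors) would be appended to the predictor matrix as additional columns U before fitting the regression.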
Autocorrelation models
Multiply Y and X by a spatial corrective C:
$$\mathbf{C}\mathbf{Y} = \mathbf{C}\mathbf{X}b$$
$$b = (\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{C}\mathbf{Y}$$
Spatial weights of C:
$$w_{ij} = \frac{1}{d_{ij}^{\alpha}}$$
Often the whole variance goes into the spatial component, leaving no room for the predictors.
The larger α, the more variance goes into space. α = … means no spatial effect (OLS).
r is an additional weight factor (r ≤ 1). r = 0 means no spatial effect (OLS). r = 1 means all variance goes into space.
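The estimator b = (XᵀCX)⁻¹XᵀCY can be written out in a few lines. The sketch below builds C from inverse-distance weights w_ij = 1/d_ij^α; setting the diagonal of C to 1 is an assumption needed to avoid dividing by zero, and the grid coordinates, predictors and function names are hypothetical. The additional weight factor r is not modelled here.

```python
import numpy as np

def spatial_weights(coords, alpha=1.0):
    """C with w_ij = 1 / d_ij**alpha off the diagonal and 1 on it (assumption)."""
    c = np.asarray(coords, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    w = np.ones_like(d)
    off = d > 0
    w[off] = 1.0 / d[off] ** alpha
    return w

def weighted_fit(y, X, C):
    """Solve (X^T C X) b = X^T C y for b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtc = X.T @ C
    return np.linalg.solve(xtc @ X, xtc @ y)

# Hypothetical example: 20 sites on a 4 x 5 grid with 10-unit spacing.
coords = [[10 * (i // 5), 10 * (i % 5)] for i in range(20)]
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])   # intercept + 2 predictors
b_true = np.array([1.0, -0.5, 2.0])
y = X @ b_true                                                 # noise-free response

C = spatial_weights(coords, alpha=2.0)
b_spatial = weighted_fit(y, X, C)
b_ols = weighted_fit(y, X, np.eye(20))                         # C = I reduces to OLS
print(np.round(b_spatial, 3), np.round(b_ols, 3))
```

With noise-free data both fits recover the same coefficients. On real data the two estimates diverge as more weight is placed on neighbouring sites.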
Plot  Longitude  Latitude  EV1       S   CaCO3  Sand   pH
G6-3  317.78     266.85    0.01639   15  0.95   85.66  8.69
A2-2  187.24     307.27    -0.03864  17  0.11   81.31  8.01
C4-4  237.32     299.92    -0.02015  16  0.85   74.42  7.97
J4-4  322.62     188.9     0.04304   16  1.53   74.24  8.05
D2-4  217.79     259.69    -0.01349  13  1.93   74.24  8.08
K7-2  388.38     209.6     0.05755   21  0.58   83.45  8.23
K7-4  382.38     209.6     0.05561   15  0.58   83.45  8.23
F1-3  226.3      221.79    0.001452  16  0.38   78.45  8.25
M7-2  412.75     177.88    0.0756    13  0.63   82.15  8.4
I3-1  300.57     198.58    0.03283   12  2.21   80.01  7.78
The tab-delimited input text file for SAM
No clear spatial trend in species richness
OLS
Variables  Coeff.  Std.err.  t      p     R^2
Constant   6.14    5.62      1.09   0.28  0.00
CaCO3      -0.10   0.37      -0.27  0.79  0.00
Sand       -0.02   0.04      -0.42  0.67  0.00
pH         1.27    0.50      2.52   0.01  0.02
r2 = 0.02, P = 0.08

Trend surface analysis
Variables  Coeff.  Std.err.  t      p     R^2
Constant   6.13    5.58      1.10   0.27  0.00
Longitude  0.01    0.00      3.00   0.00  0.02
Latitude   0.01    0.00      1.75   0.08  0.00
CaCO3      -0.35   0.38      -0.91  0.36  0.00
Sand       -0.06   0.05      -1.31  0.19  0.00
pH         1.00    0.51      1.97   0.05  0.02
r2 = 0.04, P = 0.007

Eigenvector mapping
Variables  Coeff.  Std.err.  t      p     R^2
Constant   6.34    5.62      1.13   0.26  0.00
EV1        8.08    5.72      1.41   0.16  0.01
CaCO3      -0.19   0.38      -0.51  0.61  0.00
Sand       -0.01   0.04      -0.26  0.80  0.00
pH         1.16    0.51      2.27   0.02  0.02
r2 = 0.02, P = 0.06
The dependence of richness on pH weakens after accounting for spatial structure (p = 0.01 under OLS, p = 0.05 under trend surface analysis, p = 0.02 under eigenvector mapping).
Do soil properties influence species richness?
The Hühnerwasser catchment is divided into a western and an eastern part that differ in soil sand content and pH. Trend surface analysis captures this gradient.