Managing and analysing (next-generation) multivariate ecological data:
new concepts and tools
Steve C. Walker
McMaster University
Department of Mathematics and Statistics
Bolker lab
November 21, 2012
EEB seminar, McMaster University
Introduction
  Motivation from observational community ecology
  Illustrating the basic issue
Previous work on the data management-analysis interface, both inside and outside of community ecology
  The ‘old’ school
  The ‘middle’ school
  The ‘new’ school
The R multitable package
Thermocline deepening experiment
Bythotrephes longimanus
Wisconsin Dept. of Natural Resources
Bythotrephes longimanus
Yan et al. 2002
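Fourth-corner problem
[Diagram: three tables with shared dimensions: abundance (sites × species), site properties (sites × properties) and species properties (species × properties); the remaining properties × properties block is the ‘fourth corner’]
Legendre et al. 1997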
Statistical methods for analyzing ‘fourth-corner’-esque data
- Chessel et al. (1996): RLQ analysis
- Legendre et al. (1997): coined the term ‘fourth-corner’
- Ives and Godfray (2006): mixed models of phylogenetically-structured food webs
- Dray and Legendre (2008): extends Legendre et al.
- Pillar and Duarte (2010): phylogenetic null models
- Leibold et al. (2010): semi-partial correlations
- Ives and Helmus (2011): phylogenetic generalized linear mixed models (PGLMMs)
- ter Braak et al. (2012): multiple comparison tests
The data frame
[Diagram: a single table with replicates as rows and variables as columns]
Thermocline manipulation experiment
[Diagram: several tables sharing the dimensions time, basin and species, holding abundance, environmental variables (env.vars.), species traits and time scales]
Cantin et al. (2011)
How do we convert this into a data frame?
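[Diagram: the fourth-corner layout again: abundance (sites × species), site properties (sites × properties), species properties (species × properties) and the ‘fourth corner’]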
Summarisation → lost information
[Diagram: (sites × mean species properties) = (sites × species abundance) × (species × species properties)]
[Diagram: the sites × mean species properties table placed beside the sites × site properties table]
[Diagram: a sites × functional diversity indices table placed beside the sites × site properties table]
e.g. Leibold et al. 2010
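The matrix product in the first summarisation diagram can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the toy abundances, the single trait and the names `abundance`, `traits`, `rel_abund` and `mean_traits` are hypothetical and do not come from the talk, and weighting by relative abundance is one common choice of summary rather than necessarily the one used in the cited work.

```r
# Hypothetical toy data: 3 sites x 2 species, and one trait per species.
abundance <- matrix(c(0, 4,
                      2, 2,
                      6, 0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(sites = c("s1", "s2", "s3"),
                                    species = c("spA", "spB")))
traits <- matrix(c(1.0, 3.0), ncol = 1,
                 dimnames = list(species = c("spA", "spB"),
                                 traits = "bodysize"))

# Relative abundance within each site, so that every row sums to 1.
rel_abund <- abundance / rowSums(abundance)

# (sites x species) %*% (species x traits) = (sites x mean traits)
mean_traits <- rel_abund %*% traits
mean_traits
```

Site `s2` ends up with a mean trait of 2 even though no species has that trait value, which is one way to see what the summary discards: the species dimension is gone, so only site-level relationships can be analysed afterwards.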
Repetition → redundant information
[Table: one row per species-site pair (species 1 to 3 crossed with sites 1 to 6, 18 rows), with columns Abundance, Environment and Traits]
When converting a fourth-corner problem into a single data frame you’ve got two choices:
- Summarisation → lost information
- Repetition → redundant information
Linear algebra as data management
Ancient Chinese text (∼150 BCE)
Linear algebra as data management
Hart (2009)
Linear algebra as data management
Solve for the b’s
$$
\begin{aligned}
y_1 &= b_1 x_{11} + b_2 x_{12} + \dots + b_m x_{1m} \\
y_2 &= b_1 x_{21} + b_2 x_{22} + \dots + b_m x_{2m} \\
&\;\;\vdots \\
y_n &= b_1 x_{n1} + b_2 x_{n2} + \dots + b_m x_{nm}
\end{aligned}
\tag{1}
$$
Linear algebra as data management
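$$
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1m} \\
x_{21} & x_{22} & \dots & x_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nm}
\end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
$$

$$
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} \\
\mathbf{X}^T \mathbf{y} &= \mathbf{X}^T \mathbf{X} \mathbf{b} \\
(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} &= \mathbf{b}
\end{aligned}
\tag{2}
$$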
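The three steps in (2) translate almost directly into R. The sketch below is illustrative only: the simulated data, the seed and the names `b_true`, `b_hat` and `b_qr` are assumptions made for the example, and a little noise is added to y so that the system is solved in the least-squares sense.

```r
set.seed(1)
n <- 50   # number of equations (rows of X)
m <- 3    # number of unknown b's (columns of X)
X <- matrix(rnorm(n * m), nrow = n, ncol = m)
b_true <- c(2, -1, 0.5)
y <- X %*% b_true + rnorm(n, sd = 0.1)

# b = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly:
# crossprod(X) is X^T X and crossprod(X, y) is X^T y.
b_hat <- solve(crossprod(X), crossprod(X, y))

# The same least-squares solution from R's QR-based solver.
b_qr <- qr.solve(X, y)

cbind(b_true, b_hat, b_qr)
```

Once the data are held as `y` and `X`, the bookkeeping over the n equations and m unknowns never has to be written out by hand, which is the sense in which the matrix notation manages the data.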
The importance of data management to science
- Good theories of data management (e.g. matrix algebra) allow us to think at a higher level of abstraction, thereby allowing us to focus on the interesting new parts of the problem (e.g. the meaning of Y, X, B).
- This is because the uninteresting old details (e.g. how to solve the linear equation) are automatically correct if the theory is correctly applied (e.g. because it has been previously learned).
- Therefore, we don’t need to actively think about such details until we step outside of the domain of the theory.
Ihaka and Gentleman 1996
The data frame
[Diagram: a single table with replicates as rows and variables as columns]
The R framework for data management
data frame (replicates × variables: den, temp, precip)
+ formula: den ~ temp + precip
+ function: lm / glmer / plot / xyplot
= analysis: plot of den against temp (p < 0.0001) and a coefficient table:
           coef  s.e.
(intcpt)   -1.2   0.4
temp        2.1   0.1
precip     -0.1   0.1
Chambers and Hastie 1991
> dataset
  den  temp precip
1 0.2  24.5   36.5
2 0.5 -26.4   36.0
3 0.8   4.9   15.5
4 1.5  12.2   34.8
5 0.6  18.7   99.3

> dataset[1:2, ]
  den  temp precip
1 0.2  24.5   36.5
2 0.5 -26.4   36.0

> lm(den ~ temp + precip, data = dataset)
Coefficients:
(Intercept)         temp       precip
   0.837385     0.001930    -0.002937
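The session above can be re-created as a self-contained script, which also shows how the formula connects the data frame to the matrix algebra from earlier. The five rows are copied from the printed `dataset`; the names `form`, `X`, `fit` and `b` are introduced here purely for illustration.

```r
# The five replicates printed on the slide.
dataset <- data.frame(
  den    = c(0.2, 0.5, 0.8, 1.5, 0.6),
  temp   = c(24.5, -26.4, 4.9, 12.2, 18.7),
  precip = c(36.5, 36.0, 15.5, 34.8, 99.3)
)

# data frame + formula ...
form <- den ~ temp + precip

# The formula picks columns out of the data frame and builds the design
# matrix X: an intercept column followed by temp and precip.
X <- model.matrix(form, data = dataset)

# ... + function = analysis
fit <- lm(form, data = dataset)

# The same coefficients from the normal equations, for comparison.
b <- solve(crossprod(X), crossprod(X, dataset$den))

coef(fit)
```

Swapping `lm` for another formula-aware function (for example a plotting function) reuses the same data frame and formula, which is what makes the three-part framework convenient.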
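The data frame
[Diagram: a single table with replicates as rows and variables as columns]
Fourth corner problem
[Diagram: abundance (sites × species), site properties (sites × properties), species properties (species × properties) and the ‘fourth corner’]
Legendre et al. 1997
Thermocline manipulation experiment
[Diagram: several tables sharing the dimensions time, basin and species, holding abundance, environmental variables (env.vars.), species traits and time scales]
Cantin et al. (2011)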
Goal: Analyze next-generation multiple-table data sets using this framework
Problem: R doesn’t do multiple tables ‘out-of-the-box’
Strategy: Develop some theory to better understand multiple-table data management and then use that theory to extend the R framework to allow multiple-table data sets
data sources → data list → data frame + formula + function = analysis
Rock star: Hadley Wickham
reshape2
plyr
ggplot2
...
reshape2
[Diagram: melting turns an abundance array (time × space × species) into a data frame (replicates × variables); casting turns the data frame back into the array]
reshape2
> X
, , capybara
midlatitude subtropical tropical equatorial arctic subarctic
2009 4 0 8 0 0 0
2008 0 10 0 7 0 0
1537 0 0 0 0 0 0
, , moss
midlatitude subtropical tropical equatorial arctic subarctic
2009 0 0 9 0 5 0
2008 6 0 0 3 0 0
1537 0 0 0 0 0 0
, , vampire
midlatitude subtropical tropical equatorial arctic subarctic
2009 0 0 0 0 0 0
2008 0 0 0 0 0 0
1537 0 1 0 0 0 0
reshape2
> Xmelt <- melt(X, varnames = c('year','biome','species'),
    value.name = 'abundance')
> Xmelt
year biome species abundance
1 2009 midlatitude capybara 4
2 2008 midlatitude capybara 0
3 1537 midlatitude capybara 0
4 2009 subtropical capybara 0
5 2008 subtropical capybara 10
6 1537 subtropical capybara 0
7 2009 tropical capybara 8
...
48 1537 equatorial vampire 0
49 2009 arctic vampire 0
50 2008 arctic vampire 0
51 1537 arctic vampire 0
52 2009 subarctic vampire 0
53 2008 subarctic vampire 0
54 1537 subarctic vampire 0
reshape2
> acast(Xmelt, year ~ biome ~ species)
, , capybara
arctic equatorial midlatitude subarctic subtropical tropical
1537 0 0 0 0 0 0
2008 0 7 0 0 10 0
2009 0 0 4 0 0 8
, , moss
arctic equatorial midlatitude subarctic subtropical tropical
1537 0 0 0 0 0 0
2008 0 3 6 0 0 0
2009 5 0 0 0 0 9
, , vampire
arctic equatorial midlatitude subarctic subtropical tropical
1537 0 0 0 0 1 0
2008 0 0 0 0 0 0
2009 0 0 0 0 0 0
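For readers without reshape2 installed, the melt/cast round trip can be approximated in base R. Note that this swaps in `as.data.frame(as.table(...))` and `xtabs()` as stand-ins for `melt()` and `acast()`; it is a sketch of the idea rather than of reshape2 itself, and the small array below is a cut-down, hypothetical version of the `X` shown above.

```r
# A 2 x 3 x 2 abundance array with named dimensions.
X <- array(0, dim = c(2, 3, 2),
           dimnames = list(year    = c("2009", "2008"),
                           biome   = c("midlatitude", "subtropical", "tropical"),
                           species = c("capybara", "moss")))
X["2009", "midlatitude", "capybara"] <- 4
X["2008", "subtropical", "capybara"] <- 10
X["2009", "tropical", "moss"] <- 9

# "Melting": one row per array cell, one column per dimension plus the value.
Xmelt <- as.data.frame(as.table(X), responseName = "abundance")

# "Casting": rebuild the three-way array from the long data frame.
Xcast <- xtabs(abundance ~ year + biome + species, data = Xmelt)

head(Xmelt)
```

Because the factor levels created while melting keep the original order of the dimnames, the cast array here has the same layout as `X`, whereas the `acast()` output above is sorted alphabetically.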
Thermocline manipulation experiment
[Diagram: several tables sharing the dimensions time, basin and species, holding abundance, environmental variables (env.vars.), species traits and time scales]
Cantin et al. (2011)
Peter Solymos
mefa / mefa4
vegan
dclone
...
mefa / mefa4
- count data matrix (x$xtab)
- segments (x$segm)
- data frame for samples (x$samp)
- data frame for taxa (x$taxa)
Solymos 2009
Thermocline manipulation experiment
[Diagram: several tables sharing the dimensions time, basin and species, holding abundance, environmental variables (env.vars.), species traits and time scales]
Cantin et al. (2011)
multitable
data sources → data list → data frame + formula + function = analysis
The central distinction of multitable
Variables
- Things that can be related
- Axes on a scatterplot
- Columns in a data frame (or database)
Replicates
- Information about relationships
- Points on a scatterplot
- Rows in a data frame (or database)
The scatterplot
[Figure: scatterplot of points; axes labelled ‘x variable’ and ‘y variable’]
The data frame
[Diagram: a single table with replicates as rows and variables as columns]
The data frame — bipartite graph
replicates variables
Variables and replicates in the fourth corner problem?
[Diagram: abundance (sites × species), site properties (sites × properties), species properties (species × properties) and the ‘fourth corner’]
Legendre et al. 1997
Fourth corner problem — bipartite graph
[Bipartite graph: replication dimensions (sites, species) on one side, variables (environment, abundance, traits) on the other]
Thermocline manipulation experiment
[Diagram: several tables sharing the dimensions time, basin and species, holding abundance, environmental variables (env.vars.), species traits and time scales]
Cantin et al. (2011)
Thermocline manipulation experiment — bipartite graph
[Bipartite graph: replication dimensions (sites, time, species) on one side, variables (environment, abundance, time scales, traits) on the other]
Biadjacency matrices
[Bipartite graph: replication dimensions (sites, species) on one side, variables (environment, abundance, traits) on the other]

          abundance  environment  traits
sites         1           1          0
species       1           0          1
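A biadjacency matrix like the one above can be derived mechanically when each variable is stored as an array whose dimnames are named after its replication dimensions. The sketch below uses base R only and is not taken from the multitable source; the placeholder values and the names `env`, `vars`, `rep_dims` and `biadjacency` are assumptions made for illustration.

```r
site_names    <- paste0("site", LETTERS[1:5])
species_names <- paste0("spp", LETTERS[1:3])

# Each variable carries named dimnames saying what it is replicated along.
abundance <- matrix(0, nrow = 5, ncol = 3,
                    dimnames = list(sites = site_names, species = species_names))
env    <- array(0, dim = 5, dimnames = list(sites = site_names))
traits <- array(0, dim = 3, dimnames = list(species = species_names))

vars     <- list(abundance = abundance, environment = env, traits = traits)
rep_dims <- c("sites", "species")

# Entry [d, v] is TRUE when variable v is replicated along dimension d.
biadjacency <- sapply(vars, function(v) rep_dims %in% names(dimnames(v)))
rownames(biadjacency) <- rep_dims

biadjacency * 1   # print as 0/1, as on the slide
```

Read row-wise, the matrix says which variables can be related within a replication dimension; read column-wise, it says which dimensions each variable must be repeated over to fit into a single data frame.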
> install.packages('multitable')
> library(multitable)

> dl
abundance:
---------
      sppA sppB sppC
siteA    0    1   10
siteB    0    2   12
siteC    2    1    1
siteD    0    7    0
siteE    2    0    0
Replicated along: || sites || species ||

temperature:
-----------
siteA siteB siteC siteD siteE
-0.24  0.40  2.12 -0.72  5.95
Replicated along: || sites ||

bodysize:
--------
sppA sppB sppC
0.87 1.52 2.67
Replicated along: || species ||

REPLICATION DIMENSIONS:
  sites species
      5       3
> summary(dl)
        abundance temperature bodysize
sites        TRUE        TRUE    FALSE
species      TRUE       FALSE     TRUE
> dl[1:3, ]
abundance:
---------
      sppA sppB sppC
siteA    0    1   10
siteB    0    2   12
siteC    2    1    1
Replicated along: || sites || species ||

temperature:
-----------
siteA siteB siteC
-0.24  0.40  2.12
Replicated along: || sites ||

bodysize:
--------
sppA sppB sppC
0.87 1.52 2.67
Replicated along: || species ||

REPLICATION DIMENSIONS:
  sites species
      3       3
> df <- as.data.frame(dl)
> df
           abundance temperature bodysize
siteA.sppA         0       -0.24     0.87
siteB.sppA         0        0.40     0.87
siteC.sppA         2        2.12     0.87
siteD.sppA         0       -0.72     0.87
siteE.sppA         2        5.95     0.87
siteA.sppB         1       -0.24     1.52
siteB.sppB         2        0.40     1.52
siteC.sppB         1        2.12     1.52
siteD.sppB         7       -0.72     1.52
siteE.sppB         0        5.95     1.52
siteA.sppC        10       -0.24     2.67
siteB.sppC        12        0.40     2.67
siteC.sppC         1        2.12     2.67
siteD.sppC         0       -0.72     2.67
siteE.sppC         0        5.95     2.67
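To make the coercion concrete, the same long data frame can be assembled by hand in base R. This is a sketch of the repetition idea, not of how multitable implements `as.data.frame`; the values are copied from the printed `dl`, while the names `site_names`, `species_names` and `grid` are introduced only for this example.

```r
site_names    <- paste0("site", LETTERS[1:5])
species_names <- paste0("spp", LETTERS[1:3])

# Values copied from the printed data list; the matrix is filled by column,
# so each group of five numbers is one species.
abundance <- matrix(c(0, 0, 2, 0, 2,      # sppA
                      1, 2, 1, 7, 0,      # sppB
                      10, 12, 1, 0, 0),   # sppC
                    nrow = 5,
                    dimnames = list(sites = site_names, species = species_names))
temperature <- c(-0.24, 0.40, 2.12, -0.72, 5.95)   # one value per site
bodysize    <- c(0.87, 1.52, 2.67)                 # one value per species

# One row per site-species pair, with sites varying fastest.
grid <- expand.grid(sites = site_names, species = species_names)
df <- data.frame(
  abundance   = as.vector(abundance),
  temperature = temperature[match(grid$sites, site_names)],      # repeated per species
  bodysize    = bodysize[match(grid$species, species_names)],    # repeated per site
  row.names   = paste(grid$sites, grid$species, sep = ".")
)

fit <- lm(abundance ~ temperature + bodysize, data = df)
coef(fit)
```

Each temperature appears three times and each body size five times, which is the redundancy the repetition approach accepts in exchange for keeping every abundance value.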
> lm(abundance ~ temperature + bodysize, data = df)

Coefficients:
(Intercept)  temperature     bodysize
    -0.3613      -0.4403       2.1083

> lm(abundance ~ temperature * bodysize, data = df)

Coefficients:
         (Intercept)           temperature              bodysize  temperature:bodysize
             -2.1612                0.7580                3.1755               -0.7105
> df <- as.data.frame(dims_to_vars(dl))
> df
           abundance temperature bodysize sites species
siteA.sppA         0       -0.24     0.87 siteA    sppA
siteB.sppA         0        0.40     0.87 siteB    sppA
siteC.sppA         2        2.12     0.87 siteC    sppA
siteD.sppA         0       -0.72     0.87 siteD    sppA
siteE.sppA         2        5.95     0.87 siteE    sppA
siteA.sppB         1       -0.24     1.52 siteA    sppB
siteB.sppB         2        0.40     1.52 siteB    sppB
siteC.sppB         1        2.12     1.52 siteC    sppB
siteD.sppB         7       -0.72     1.52 siteD    sppB
siteE.sppB         0        5.95     1.52 siteE    sppB
siteA.sppC        10       -0.24     2.67 siteA    sppC
siteB.sppC        12        0.40     2.67 siteB    sppC
siteC.sppC         1        2.12     2.67 siteC    sppC
siteD.sppC         0       -0.72     2.67 siteD    sppC
siteE.sppC         0        5.95     2.67 siteE    sppC
> library(lme4)
> form <- abundance ~ (temperature * bodysize) +
    (-1 + temperature | species)
> glmer(form, data = df, family = 'poisson')

Generalized linear mixed model fit by maximum likelihood:
Random effects:
 Groups  Name        Variance Std.Dev.
 species temperature 0.003439 0.05864
Number of obs: 15, groups: species, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8456     0.5988  -1.412   0.1579
tmprt         0.2845     0.2491   1.142   0.2535
bdys          1.0077     0.2566   3.928 8.57e-05 ***
tmprtr:bdys  -0.2848     0.1391  -2.049   0.0405 *
Bates, Maechler, and Bolker
library(ggplot2)
ggplot(df) +
  facet_wrap(~ species) +
  aes(x = temperature, y = abundance, size = bodysize) +
  geom_point()
[Figure: abundance against temperature, one panel per species (sppA, sppB, sppC), with point size scaled by bodysize]
> cor(as.data.frame(dl))
             abundance temperature  bodysize
abundance    1.0000000  -0.2839765 0.4176379
temperature -0.2839765   1.0000000 0.0000000
bodysize     0.4176379   0.0000000 1.0000000
> summary(dl)
        abundance temperature bodysize
sites        TRUE        TRUE    FALSE
species      TRUE       FALSE     TRUE
> dlmelt(dl)
$sites.species
abndnc sites species
siteA.sppA 0 siteA sppA
siteB.sppA 0 siteB sppA
siteC.sppA 2 siteC sppA
siteD.sppA 0 siteD sppA
siteE.sppA 2 siteE sppA
siteA.sppB 1 siteA sppB
siteB.sppB 2 siteB sppB
siteC.sppB 1 siteC sppB
siteD.sppB 7 siteD sppB
siteE.sppB 0 siteE sppB
siteA.sppC 10 siteA sppC
siteB.sppC 12 siteB sppC
siteC.sppC 1 siteC sppC
siteD.sppC 0 siteD sppC
siteE.sppC 0 siteE sppC
$sites
temp sites
siteA -0.24 siteA
siteB 0.40 siteB
siteC 2.12 siteC
siteD -0.72 siteD
siteE 5.95 siteE
$species
bodysize species
sppA 0.87 sppA
sppB 1.52 sppB
sppC 2.67 sppC
> identical(dl, dlcast(dlmelt(dl)))
TRUE
> dlapply(dl, 2, mean)
abundance:
---------
sppA sppB sppC
 0.8  2.2  4.6
Replicated along: || species ||

bodysize:
--------
sppA sppB sppC
0.87 1.52 2.67
Replicated along: || species ||

REPLICATION DIMENSIONS:
species
      3
[Figure: Thermocline.Depth against week (roughly weeks 200 to 280), with points distinguished by basin (B1, B2, B3)]
[Figure: sqrt(abundance) against Thermocline.Depth, one panel per taxon, with legends for Length and basin (B1, B2, B3). Panel titles, each with a value in parentheses that appears to be the taxon's Length: (0.18) armoured rot, (0.18) nauplii, (0.18) unprotected rot, (0.33) Bosmina, (0.36) colonial rot, (0.45) Cycl adults, (0.75) Cal cope, (0.77) Holopedium, (0.80) Daphnia l&d, (0.96) Daphnia cat, (1.23) Cycl cope, (1.31) Cal adults]
[Figure: a second version of the same plot, with the same panels, axes and legends but different y-axis ranges]
Conclusion
- The fundamental distinction between variables and replicates that unifies most statistical software also applies to multiple-table next-generation data
- Therefore, we may not need all of the new statistical techniques being developed specifically for next-generation data in community ecology
- Although my field is observational community ecology, I think that many fields may benefit from more systematic and formal treatment of the distinction between variables and replicates
- Current limitations:
  - multitable only deals with arrays (not phylogenies, distance matrices, etc.)
  - although data lists can be coerced to data frames, which can be used in virtually any R analysis function, it may be more efficient to pass data lists directly
Acknowledgements
- Ben Bolker (for being my new postdoc supervisor... and for extremely useful and encouraging discussions on this topic long before that)
- Collaborators on the multitable project:
  - Pierre Legendre (previous postdoc supervisor)
  - Guillaume Guénard (Université de Montréal)
  - Peter Solymos (University of Alberta)
  - Beatrix Beisner (Université du Québec à Montréal)
- Contributors to the free software I use
- Collectors of the free data I use
- Funding (NSERC, OGS, Pierre, Don, and the U of T)
- Laura Timms (McGill / ROM)