Pennell-Evolution-2014-talk

Post on 23-Aug-2014

description

Talk on assessing the adequacy of phylogenetic trait models. Presented at Evolution 2014.

Transcript of Pennell-Evolution-2014-talk

The adequacy of phylogenetic trait models

Matthew Pennell @mwpennell

In collaboration with Rich FitzJohn, Will Cornwell, Luke Harmon

[Figure: Anscombe's quartet (Anscombe 1973). Four scatterplots, each with R² = 0.67; p = 0.002]

Is the model appropriate?

If not, what are we missing?

For simple regression models

[Figure: Cook's distance against observation]

[Figure: residuals against fitted values]

Statistical tests of model adequacy complement visual intuition

For phylogenetic trait models

Plotting the relevant data is challenging

No general methods for assessing model adequacy

Especially for complex models

[Figure labels: θ1, θ2, θ3]

Our approach

Establishing scope

Quantitative traits

Univariate trait models

Tip states assumed to be ~ multivariate Gaussian

The general idea

Fit a model to comparative data

Use fitted parameters to simulate data

Compare observed to simulated data

Old statistical idea

[Figure, two panels: Pr(D | θ) against θ, labeled "Parametric bootstrapping"; Pr(θ | D) against θ, labeled "Posterior predictive simulation"]

If we re-ran evolution, how likely are we to see a dataset like ours?

Simulated data similar to observed: model likely adequate

Simulated data very different from observed: model likely inadequate

Comparing observed to simulated data

No two datasets are exactly alike

Use test statistics to summarize data in meaningful ways

Species are not independent data points

Calculate test statistics on contrasts

Independent contrasts

[Figure: three-tip tree (A, B, C) with contrasts Ci and Cj marked]

n - 1 contrasts for n tips

Under BM model, C ~ Gaussian(0, σ)
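The contrast calculation can be sketched by hand for a three-tip tree. This is a minimal base R illustration of Felsenstein's standardized contrasts, assuming the topology ((A, B), C); the trait values, branch lengths and the function name `three_tip_contrasts` are made up for the example and are not from the talk.

```r
# Standardized independent contrasts for the tree ((A:vA, B:vB):vAB, C:vC).
# x: named trait values for tips A, B, C (illustrative numbers).
# v: named branch lengths; "AB" is the branch above the ancestor of A and B.
three_tip_contrasts <- function(x, v) {
  # Contrast between A and B, scaled by the sd expected under BM.
  c1 <- (x[["A"]] - x[["B"]]) / sqrt(v[["A"]] + v[["B"]])
  # Weighted estimate of the ancestral value of A and B.
  anc <- (x[["A"]] / v[["A"]] + x[["B"]] / v[["B"]]) /
    (1 / v[["A"]] + 1 / v[["B"]])
  # The branch above that ancestor is lengthened to account for
  # the uncertainty in the ancestral estimate.
  v_anc <- v[["AB"]] + v[["A"]] * v[["B"]] / (v[["A"]] + v[["B"]])
  # Contrast between the (A, B) ancestor and C.
  c2 <- (anc - x[["C"]]) / sqrt(v_anc + v[["C"]])
  c(c1, c2)
}

x <- c(A = 1.0, B = 3.0, C = 6.0)
v <- c(A = 1, B = 1, AB = 1, C = 2)
contrasts <- three_tip_contrasts(x, v)
print(contrasts)
```

Three tips give two contrasts, matching the n - 1 count on the slide. For real trees an existing implementation such as `pic()` in the ape package would be the usual choice.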

When model is not Brownian motion

Contrasts no longer expected to be ~ Gaussian

Rescale branch lengths of phylogeny

For models that predict tip states to be multivariate Gaussian

$$\ln L = -\frac{1}{2}\left[ n \ln(2\pi) + \ln|\Sigma| + (Y - \mu X)^{\top} \Sigma^{-1} (Y - \mu X) \right]$$

Y is the observed tip states for the n species

μ is the mean of observed data

X is a column vector of 1s

Σ is the expected variance-covariance matrix for the tip states under the model
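The log-likelihood above translates almost term by term into base R. The sketch below is only an illustration: the three-tip Σ (Brownian motion with σ² = 0.5 on a small ultrametric tree), the trait values, and the choice to estimate μ by generalized least squares are all assumptions made for the example, and `mvn_loglik` is a hypothetical name.

```r
# Multivariate Gaussian log-likelihood, following the formula term by term.
mvn_loglik <- function(y, mu, Sigma) {
  n <- length(y)
  r <- y - mu * rep(1, n)                      # Y - mu * X, with X a vector of 1s
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  quad <- drop(t(r) %*% solve(Sigma, r))       # (Y - mu X)' Sigma^-1 (Y - mu X)
  -0.5 * (n * log(2 * pi) + logdet + quad)
}

# Illustrative inputs: shared branch lengths for the tree ((A:1, B:1):1, C:2).
tips <- c("A", "B", "C")
S <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2), nrow = 3, byrow = TRUE, dimnames = list(tips, tips))
Sigma <- 0.5 * S                               # BM: Sigma = sigma^2 * S
y <- c(1, 3, 6)

# Assumption: mu is estimated by generalized least squares under Sigma.
ones <- rep(1, length(y))
mu_hat <- drop(crossprod(ones, solve(Sigma, y)) /
               crossprod(ones, solve(Sigma, ones)))

ll <- mvn_loglik(y, mu_hat, Sigma)
print(ll)
```

With Σ set to the identity matrix the function reduces to a sum of independent normal log-densities, which is a convenient sanity check.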

The Σ matrix

If we fit an Ornstein-Uhlenbeck model

$$\Sigma_{ij} = \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha T}\right) e^{-\alpha C_{ij}}$$

σ² rate of diffusion

α pull towards optimum

T tree height

Cij shared branch length between tips i and j
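As a rough illustration of building such a matrix, the sketch below computes an OU covariance matrix in base R for a small ultrametric tree. It uses one common parameterization, not necessarily the one on the slide: the root state is treated as fixed, so the variance after time t is σ²/(2α)·(1 − e^(−2αt)), and tips that share s units of history out of a total height T have covariance σ²/(2α)·e^(−2α(T − s))·(1 − e^(−2αs)). The function name `ou_vcv_fixed_root` and the example tree are assumptions for the example.

```r
# OU covariance for an ultrametric tree, assuming a fixed root state.
# S: matrix of shared branch lengths between tips (diagonal = tree height).
ou_vcv_fixed_root <- function(S, sigma2, alpha) {
  T_height <- max(diag(S))
  sigma2 / (2 * alpha) *
    exp(-2 * alpha * (T_height - S)) *   # decay since the tips diverged
    (1 - exp(-2 * alpha * S))            # variance accumulated while shared
}

# Illustrative tree ((A:1, B:1):1, C:2): A and B share 1 unit, C shares none.
tips <- c("A", "B", "C")
S <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2), nrow = 3, byrow = TRUE, dimnames = list(tips, tips))

Sigma_ou <- ou_vcv_fixed_root(S, sigma2 = 0.5, alpha = 1)
print(round(Sigma_ou, 4))
```

As α approaches zero this matrix approaches σ²·S, the Brownian motion covariance, which is one way to check the implementation.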

Building a unit tree

Rescale branch lengths by the amount of (co)variance we expect to accumulate under the model

[Figure: three-tip tree (A, B, C) with internal branch vi]

$$v_i' = \Sigma_{AB} - \Sigma_{AC}$$
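The rescaling rule can be sketched for the three-tip case: each new branch length is the covariance reached at the bottom of the branch minus the covariance already present at its top. The base R code below assumes the topology ((A, B), C) and uses a Brownian motion Σ with σ² = 0.5 as made-up input; the function name `unit_branches_three_tip` is hypothetical.

```r
# Unit-tree branch lengths for the tree ((A, B), C), read off a fitted Sigma.
unit_branches_three_tip <- function(Sigma) {
  c(A  = Sigma["A", "A"] - Sigma["A", "B"],   # tip branch to A
    B  = Sigma["B", "B"] - Sigma["A", "B"],   # tip branch to B
    AB = Sigma["A", "B"] - Sigma["A", "C"],   # internal branch, as on the slide
    C  = Sigma["C", "C"] - Sigma["A", "C"])   # tip branch to C
}

# Illustrative input: BM with sigma^2 = 0.5 on the tree ((A:1, B:1):1, C:2).
tips <- c("A", "B", "C")
S <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2), nrow = 3, byrow = TRUE, dimnames = list(tips, tips))
sigma2 <- 0.5

branches <- unit_branches_three_tip(sigma2 * S)
print(branches)
```

Under Brownian motion the result is just the original branch lengths multiplied by σ², so a rate of 1 on the rescaled tree reproduces the fitted model.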

Unit tree example

Ornstein-Uhlenbeck model: σ² = 0.5 | α = 1

[Figure: three-tip tree (A, B, C) before and after rescaling to the unit tree]

The nice thing about unit trees

Transformation applies to most* models of continuous trait evolution

If model is adequate, contrasts on unit tree will be I.I.D. ~ Gaussian(0, 1)

Also applies to PGLS-style models

Create unit tree from parameter estimates

Compute contrasts on the residuals

If model is adequate, contrasts of residuals will be Gaussian(0, 1) - same test statistics apply

Can compute test statistics on unit tree contrasts to assess adequacy

[Figure: six test-statistic panels. Density of Contrasts²; density of Contrasts; |Contrasts| against Var(contrasts); |Contrasts| against ancestral state; |Contrasts| against node height; cumulative Pr against X]
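Summaries of this kind can be computed with a few lines of base R. The sketch below is only loosely modeled on the panel labels; the transcript does not give the exact definitions used in the talk, so the four statistics, their names, and the simulated stand-in data are all assumptions.

```r
set.seed(1)

# Stand-ins for real inputs: contrasts from a unit tree and the height of
# the node at which each contrast was computed (both simulated here).
contrasts <- rnorm(50)
node_height <- runif(50)

stats <- c(
  # Mean of squared contrasts: should be near 1 if the rate is right.
  mean_sq = mean(contrasts^2),
  # Coefficient of variation of absolute contrasts.
  cv_abs = sd(abs(contrasts)) / mean(abs(contrasts)),
  # Slope of absolute contrasts against node height: a trend suggests
  # the rate changes through time.
  slope_hgt = unname(coef(lm(abs(contrasts) ~ node_height))[2]),
  # Kolmogorov-Smirnov distance between the contrasts and Gaussian(0, 1).
  ks_d = unname(ks.test(contrasts, "pnorm")$statistic)
)
print(round(stats, 3))
```

Analogous slopes against the expected variance of each contrast or against the inferred ancestral state would follow the same `lm()` pattern.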

Simulating new datasets

Tree has already been transformed

Simulate m new datasets under BM with σ2 = 1

Calculate test statistics on contrasts of simulated data

Compare observed test statistics to distribution of simulated test statistics
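A minimal sketch of this comparison in base R follows. It leans on one fact: standardized contrasts of Brownian motion data with σ² = 1 are independent Gaussian(0, 1) draws, so the simulation step is shortcut here by drawing n - 1 standard normal contrasts directly rather than simulating on a tree. The number of tips, the "observed" contrasts (deliberately too variable), the choice of test statistic and the two-tailed p-value are all assumptions for the example.

```r
set.seed(42)

n_tips <- 30   # illustrative tree size
m <- 1000      # number of simulated datasets

# Test statistic: mean of squared contrasts (an illustrative choice).
test_stat <- function(contrasts) mean(contrasts^2)

# Stand-in for observed unit-tree contrasts, made up with sd = 2 so that
# the model looks inadequate.
obs <- rnorm(n_tips - 1, sd = 2)
obs_stat <- test_stat(obs)

# Simulated datasets: BM with rate 1 on the unit tree gives n - 1
# independent standard normal contrasts.
sim_stats <- replicate(m, test_stat(rnorm(n_tips - 1)))

# Two-tailed p-value: how extreme is the observed statistic relative to
# the simulated distribution?
p_lower <- mean(sim_stats <= obs_stat)
p_value <- 2 * min(p_lower, 1 - p_lower)

cat("observed:", round(obs_stat, 2), " p-value:", p_value, "\n")
```

A histogram of `sim_stats` with a vertical line at `obs_stat` gives the visual version of the same comparison.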

Putting it all together

Estimate θ

Build unit tree

Test statistics (obs data)

Simulate BM data

Test statistics (sim data)

Compare sim to obs test statistics

arbutus R package

Designed to interact with other R packages

Object-oriented

New models and test statistics can easily be added

library(arbutus)

# Fit BM with diversitree, then assess adequacy
library(diversitree)
lik <- make.bm(phy, data)
div.fit <- find.mle(lik, x.init=1)
arbutus(div.fit)

# Fit BM with geiger, then assess adequacy
library(geiger)
g.fit <- fitContinuous(phy, data, model = "BM")
arbutus(g.fit)

E.g.: seed mass evolution in Fagaceae

Ornstein-Uhlenbeck model

Are common trait models adequate for real comparative data?

Analysis of 337 comparative datasets

Three important plant functional traits

72 datasets (20-2,200 spp.) for specific leaf area

226 datasets (20-22,817 spp.) for seed mass

39 datasets (20-936 spp.) for leaf nitrogen

Wright et al. 2004; Kleyer et al. 2008; Kew SID 2014; Zanne et al. 2014

For each dataset

Fit three simple models of trait evolution (Brownian Motion, Ornstein-Uhlenbeck, Early Burst)

Compared model fit using AIC

Assessed the adequacy of the best-supported model
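The AIC comparison step can be illustrated with Akaike weights, which rescale AIC differences into relative support across the candidate models. The AIC values below are invented for the example and do not come from any of the 337 datasets.

```r
# Made-up AIC scores for the three candidate models on one dataset.
aic <- c(BM = 210.4, OU = 205.1, EB = 212.3)

# Akaike weights: relative support for each model within the candidate set.
delta <- aic - min(aic)
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

best <- names(which.max(w))
print(round(w, 3))
cat("best-supported model:", best, "\n")
```

The best-supported model is only the best of the set considered; whether it is adequate in an absolute sense is the separate question the talk addresses.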

Model comparison using AIC

[Figure: AIC weights (AICw) for each dataset (1-337); legend: Brownian motion, Ornstein-Uhlenbeck, Early Burst]

Here’s the dark side

Best model rejected (p < 0.05) - ML

72/72 specific leaf area datasets

185/226 seed mass datasets

39/39 leaf nitrogen datasets

p-values -- REML est. of σ²

[Figure: density of p-values (x-axis from 0 to 0.80) for specific leaf area, seed mass and leaf nitrogen]

Models get worse as trees get bigger

[Figure: Dist(sim, obs) against Log(Tree Size), from 20 to 11,000, for specific leaf area, seed mass and leaf nitrogen]

Simple, commonly used models are often woefully inadequate

But...we already knew that

We are (often) here

This is how we learn about biology!

Learn about issues with the data

Common issues with data

Phylogenetic error (topology & branch lengths)

Measurement error

Biologically interesting ‘outlier’ species

Learn about evolutionary processes

Many ways to add complexity

Time heterogeneous models

Different models for different parts of the tree

Biologically motivated models

Test statistics can help us make informed decisions

May suggest types of models that have not even been developed yet

Does it matter if a model is inadequate?

It depends on the question...

What is the rate of seed mass evolution?

Single optimum OU model is very misleading

It depends on the question...

Was there an “early burst” in seed mass evolution?

Inadequate OU model likely doesn’t affect inference

Model adequacy is not binary

Whether the model is “good enough” depends on what questions you are asking

Some concluding thoughts

Understanding how a model fails can provide interesting biological insights

Pay attention to parameter estimates

Look carefully at the data

Plot the test statistics

Keep the question in mind

Advice and encouragement: Josef Uyeda, Daniel Caetano, Paul Joyce, Graham Slater, Amy Zanne, Roxana Hickey, Anahi Espindola, Simon Uribe-Convers

Funding: NSF, NSERC, NESCent, University of Idaho

NESCent Tempo & mode working group