Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003...

Page 1: Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003 mbhebsgaard@zi.ku.dk.

Phylogeny Estimation: Traditional and Bayesian Approaches

Molecular Evolution, 2003

mbhebsgaard@zi.ku.dk

Phylogeny Estimation

• Traditional approaches
  – Neighbour-joining algorithm
  – Tree searches with optimality criterion
    • Parsimony
    • Maximum likelihood
• Bayesian approaches

Traditional approaches

• Neighbour-joining algorithm
  – extremely popular.
  – relatively fast.
  – performs well when the divergence between sequences is low.

• The first step in the algorithm is converting the DNA or protein sequences into a distance matrix that represents the evolutionary distance between sequences.

           1        2        3        4        5
1 H959     -
2 H3847    0.00752  -
3 H5539    0.00809  0.01069  -
4 H1067    0.00681  0.01593  0.01547  -
5 H3368    0.00855  0.01126  0.01706  0.01505  -
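A minimal sketch of the first step, assuming the sequences are already aligned and that the distance is the plain proportion of differing sites (p-distance). The slide does not say which distance measure produced the matrix above; the function names and the toy sequences are illustrative only.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances, as a dict of dicts."""
    names = list(seqs)
    d = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d[n][m] = d[m][n] = p_distance(seqs[n], seqs[m])
    return d


# Toy alignment (hypothetical data, not the sequences from the slide).
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA"}
matrix = distance_matrix(toy)
```

A neighbour-joining implementation would take a matrix like `matrix` as its only input.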

Traditional approaches

• Neighbour-joining algorithm
  – A serious weakness of distance methods is that the observed differences between sequences are not accurate reflections of the evolutionary distances between them.
    • Multiple substitutions at the same site hide earlier changes, so the observed differences underestimate the true number of changes.
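One standard way to compensate for multiple substitutions is a model-based correction of the observed proportion of differing sites. The sketch below uses the Jukes-Cantor (JC69) correction as an example; the slide does not name a specific correction, so the choice of JC69 and the function name are assumptions.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of
    differing sites: d = -3/4 * ln(1 - 4p/3).

    The correction is undefined for p >= 0.75 (saturated sequences).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


observed = 0.30
corrected = jc69_distance(observed)  # larger than the observed value
```

The corrected distance is always at least as large as the observed one, and the gap widens as the sequences diverge.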

Traditional approaches

• Parsimony
  – In contrast to distance-based approaches, parsimony and ML map the history of gene sequences onto a tree.
  – In parsimony, the score is simply the minimum number of mutations that could possibly produce the data.
  – Parsimony has a few obvious disadvantages.
    • The score of a tree is completely determined by the minimum number of mutations among all of the reconstructions of ancestral sequences.
    • Another serious drawback of parsimony arises because it fails to account for the fact that the number of changes is unlikely to be equal on all branches in the tree.
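The parsimony score can be sketched with Fitch's algorithm, which counts the minimum number of changes per site on a fixed rooted binary tree. This is an illustration, not the method of any particular program: trees are written as nested tuples of leaf names, and the alignment is made up.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a leaf name or a pair (left, right); `states` maps each
    leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical four-taxon alignment and two competing trees.
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
score1 = parsimony_score(tree1, alignment)
score2 = parsimony_score(tree2, alignment)
```

Here `tree1` needs fewer changes than `tree2`, so parsimony prefers it. Note that the score ignores branch lengths entirely, which is the drawback named above.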

Traditional approaches

• Maximum likelihood
  • In ML, a hypothesis is judged by how well it predicts the observed data; the tree that has the highest probability of producing the observed sequences is preferred.
  • To use this approach, we must be able to calculate the probability of a data set given a phylogenetic tree.
    – This requires a model of sequence evolution that describes the relative probability of various events.
    – These probabilities take into account the possibility of unseen events.
  • From many perspectives, ML is the most appealing way to estimate phylogenies. All possible mutational pathways that are compatible with the data are considered, and the likelihood function is known to be a consistent and powerful basis for statistical inference in general.
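To make "probability of a data set given a tree" concrete, the sketch below computes the likelihood for the simplest possible case: two sequences joined by a single branch of length d, under the JC69 model. A real analysis sums over ancestral states on a full tree; the two-taxon case, the grid search, and the toy sequences are simplifying assumptions.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d
    (expected substitutions per site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # site shows the same base in both
    p_diff = 0.25 - 0.25 * e   # site shows one specific different base
    logl = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the base in seq_a.
        logl += math.log(0.25 * (p_same if x == y else p_diff))
    return logl


# Toy data: 1 difference in 10 sites.
a = "ACGTACGTAC"
b = "ACGTACGTAT"

# Crude grid search for the branch length that maximizes the likelihood.
grid = [i / 1000 for i in range(1, 1000)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(a, b, d))
```

The transition probabilities already allow for unseen events: `p_same` includes the chance that a site changed and then changed back.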

Bayesian phylogenetics

• Bayesian approaches
  • Bayesian approaches to phylogenetics are relatively new, but they are already generating a great deal of excitement because the primary analysis produces a tree estimate and measures of uncertainty for the groups on the tree.

• The field of Bayesian statistics is closely allied with ML.

Bayesian vs ML

• Maximum likelihood vs. Bayesian estimation
  – Maximum likelihood
    • search for the tree that maximizes the probability of the data given the tree, P(Data | Tree)
  – Bayesian inference
    • search for the tree that maximizes the probability of the tree given the data, P(Tree | Data)
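The two quantities are linked by Bayes' theorem. Written for a tree τ and data D, and with the prior P(τ) and the sum over all candidate trees τ′ introduced here as notation that is not on the slide:

```latex
P(\tau \mid D) \;=\; \frac{P(D \mid \tau)\, P(\tau)}{\sum_{\tau'} P(D \mid \tau')\, P(\tau')}
```

The numerator contains the same likelihood that ML maximizes; the Bayesian posterior additionally weights it by the prior and normalizes over all trees.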

The phylogenetic inference process

• Data collection is the first step.

• Typically, a few outgroup sequences are included to root the tree.

• Insertions and deletions obscure which of the sites are homologous.

• Multiple-sequence alignment is the process of adding gaps to a matrix of data so that the nucleotides in one column of the matrix are related to each other by descent from a common ancestral residue.

The phylogenetic inference process

• In addition to the data, the scientist must choose a model of sequence evolution.

• Increasing model complexity improves the fit to the data but also increases variance in estimated parameters.

• Model selection strategies attempt to find the appropriate level of complexity on the basis of the available data.

• Model complexity can often lead to computational intractability.

Model        Free parameters
GTR          8
TN93         5
HKY85, F84   4
F81          3
K80          1
JC69         0
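One common way to trade fit against complexity is the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of free parameters. The slide does not name a selection criterion, so AIC is used here only as an example, and the log-likelihood values are invented for illustration.

```python
# Free substitution-model parameters, taken from the table above.
FREE_PARAMS = {"JC69": 0, "K80": 1, "F81": 3, "HKY85": 4, "TN93": 5, "GTR": 8}


def aic(log_likelihood, k):
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * log_likelihood


# Hypothetical maximized log-likelihoods for one data set on one tree.
hypothetical_logl = {
    "JC69": -2050.0,
    "K80": -2010.0,
    "HKY85": -1995.0,
    "GTR": -1993.5,
}

scores = {m: aic(logl, FREE_PARAMS[m]) for m, logl in hypothetical_logl.items()}
best = min(scores, key=scores.get)
```

With these made-up numbers GTR fits slightly better than HKY85, but not by enough to justify four extra parameters. Branch lengths are left out of k because they are the same for every model on a fixed tree, so they do not change the ranking.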

Bootstrapping

• The bootstrapping approach involves the generation of pseudoreplicate data sets by re-sampling, with replacement, the sites in the original data matrix.

• When optimality-criterion methods are used, a tree search is performed for each data set, and the resulting tree is added to the final collection of trees.
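A minimal sketch of generating pseudoreplicates, assuming the alignment is held as a dict of equal-length strings. The tree search that would follow each replicate is omitted; the function name and the toy alignment are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by sampling alignment columns
    (sites) with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in cols)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical data).
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA"}

rng = random.Random(1)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

Whole columns are sampled, never individual characters, so each site in a replicate keeps the same pattern across taxa that it had in the original matrix.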

Markov chain Monte Carlo

• The Markov chain Monte Carlo (MCMC) methodology is similar to the tree-searching algorithm.

• From an initial tree, a new tree is proposed. The moves that change the tree must involve a random choice.

• The MCMC algorithm also specifies the rules for when to accept or reject a tree.
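The accept/reject rule can be sketched with the Metropolis algorithm on a toy problem. Here the "tree space" is just three labelled topologies with invented unnormalized posterior weights, and the proposal picks one of the other two topologies at random (a symmetric proposal). Real programs use tree-rearrangement moves and also update branch lengths and model parameters.

```python
import math
import random


def metropolis(log_post, propose, start, n_steps, rng):
    """Metropolis sampler for a symmetric proposal distribution."""
    current = start
    samples = []
    for _ in range(n_steps):
        candidate = propose(current, rng)
        log_ratio = log_post(candidate) - log_post(current)
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the posterior ratio.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = candidate
        samples.append(current)  # one sampled tree per cycle
    return samples


# Hypothetical unnormalized posterior weights for three topologies.
weights = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}


def log_post(tree):
    return math.log(weights[tree])


def propose(tree, rng):
    return rng.choice([t for t in weights if t != tree])


samples = metropolis(log_post, propose, "((B,C),A)", 20000, random.Random(42))
freq_best = samples.count("((A,B),C)") / len(samples)
```

Over a long run, the fraction of cycles spent on each topology approximates its posterior probability, which for the weights above is 0.6, 0.3 and 0.1.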

Markov chain Monte Carlo

• Metropolis-coupled Markov chain Monte Carlo

[Figure: heated landscape]
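In Metropolis-coupled MCMC, several chains run in parallel. One "cold" chain samples the posterior itself, while "heated" chains sample the posterior raised to a power β < 1, which flattens the landscape. Periodically two chains propose to swap states. The sketch below gives the standard swap acceptance probability; the function and parameter names are illustrative and not tied to any program.

```python
import math


def swap_acceptance(logp_i, logp_j, beta_i, beta_j):
    """Probability of accepting a state swap between chains i and j.

    logp_i, logp_j: log posterior density of each chain's current state.
    beta_i, beta_j: each chain's heating power (1.0 for the cold chain).
    The acceptance ratio is
        [p(x_j)^beta_i * p(x_i)^beta_j] / [p(x_i)^beta_i * p(x_j)^beta_j],
    which simplifies on the log scale to the expression below.
    """
    log_ratio = (beta_i - beta_j) * (logp_j - logp_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


# The heated chain has found a better state than the cold chain,
# so handing that state to the cold chain is always accepted.
p_up = swap_acceptance(logp_i=-10.0, logp_j=-5.0, beta_i=1.0, beta_j=0.5)

# The reverse situation is accepted only occasionally.
p_down = swap_acceptance(logp_i=-5.0, logp_j=-10.0, beta_i=1.0, beta_j=0.5)
```

Only the cold chain's states are kept as samples; the heated chains exist to help it cross valleys between peaks.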

Bootstrapping and MCMC generate a sample of trees

• Note that MCMC yields a much larger sample of trees in the same computational time, because it produces one tree for every proposal cycle versus one tree per tree search in the traditional approach.

• However, the sample of trees produced by MCMC is highly auto-correlated.

• As a result, millions of cycles through MCMC are usually required, whereas many fewer (of the order of 1,000) bootstrap replicates are sufficient for most problems.
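Whichever method produced it, a sample of trees is usually summarized by how often each group of taxa appears. The sketch below counts clade frequencies in a sample of rooted trees written as nested tuples; the representation, the function names and the four-tree sample are assumptions made for illustration.

```python
from collections import Counter


def clades(tree):
    """Return (all leaves under this node, list of clades below it).

    A clade is recorded as a frozenset of leaf names.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def clade_support(trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        # Skip the trivial clade that contains every taxon.
        counts.update(c for c in set(found) if c != all_leaves)
    return {clade: n / len(trees) for clade, n in counts.items()}


# Hypothetical sample: three trees agree, one differs.
sample = [(("A", "B"), ("C", "D"))] * 3 + [(("A", "C"), ("B", "D"))]
support = clade_support(sample)
```

For a bootstrap sample these fractions are bootstrap proportions; for an MCMC sample they estimate posterior probabilities of the groups. Because MCMC samples are autocorrelated, many more of them are needed for a stable estimate.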

Bayesian phylogenetics

• Bayesian approaches
  • Bayesian methods are exciting because they allow complex models of sequence evolution to be implemented.
    – estimating divergence times
    – finding the residues that are important to natural selection
    – detecting recombination points

Comparison of Methods

Conclusions

• The estimation of phylogenies has become a regular step in the analysis of new gene sequences.

• It is still too early to tell whether Bayesian approaches will revolutionize tree estimation in general, but it is already clear that MCMC-based approaches are extending the field by answering previously intractable questions.