Phylogeny. Reconstructing a phylogeny The phylogenetic tree (phylogeny) describes the evolutionary...

Date posted: 20-Dec-2015

Page 1: Phylogeny. Reconstructing a phylogeny  The phylogenetic tree (phylogeny) describes the evolutionary relationships between the studied data  The data.

Phylogeny

Page 2:

Reconstructing a phylogeny

• The phylogenetic tree (phylogeny) describes the evolutionary relationships between the studied data
• The data must be comprised of homologous types
• In molecular evolution, the studied data are homologous DNA/AA sequences
• Phylogeny reconstruction explicitly assumes that the sequences are aligned

INPUT = MSA

Page 3:

Reminder: MSA and phylogeny are dependent

[Figure: diagram with the labels Unaligned sequences, Inaccurate guide tree, Sequence alignment, MSA, Phylogeny reconstruction; tree scale bar 0.4]

Page 4:

Phylogeny representation

Visual representation: [Figure: tree with leaves A, C, B, D]

Textual representation (Newick format):

• Each pair of parentheses () encloses a clade in the tree
• A comma “,” separates the members of the corresponding clade
• A semicolon “;” is always the last character

((A,C),(B,D));
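
The three Newick rules can be illustrated with a minimal parser sketch. It assumes leaf names only (no branch lengths, support values, or quoted labels, which real Newick files often contain), and the function name `parse_newick` is made up for this illustration.

```python
def parse_newick(text):
    """Parse a simple Newick string (leaf names only) into nested tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "(" that opens a clade
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume "," between clade members
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")" that closes the clade
            return tuple(children)
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a leaf name at position {pos}")
        return text[start:pos].strip()

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


tree = parse_newick("((A,C),(B,D));")
print(tree)
```

Each tuple in the result corresponds to one pair of parentheses, i.e. one clade.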

Page 5:

Some terminology

[Figure: tree annotated with the following terms]

• root
• internal branches (splits)
• internal nodes
• external nodes (leaves)
• external branches
• monophyletic group (clade)
• neighbors

Page 6:

Swapping neighbors is meaningless

[Figure: four drawings of the same Human, Chimp, Gorilla tree with the leaves in different left-to-right order]

(Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human)) = ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
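
One way to check that two such trees are the same is to sort the children of every node before comparing. The sketch below assumes trees are written as nested Python tuples of leaf names, and `canonical` is a name chosen for this illustration.

```python
def canonical(tree):
    """Return an order-independent form of a tree given as nested tuples."""
    if isinstance(tree, str):
        return tree
    # Sort the children by their printed form so sibling order no longer matters.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


a = ("Gorilla", ("Human", "Chimp"))
b = (("Chimp", "Human"), "Gorilla")
c = ("Human", ("Gorilla", "Chimp"))  # a genuinely different grouping

print(canonical(a) == canonical(b))
print(canonical(a) == canonical(c))
```

The first comparison holds because only the drawing order differs. The second does not, because Human and Chimp are no longer neighbors.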

Page 7:

Rooted vs. unrooted

[Figure: an unrooted tree with leaves A, B, C and branches numbered 1, 2, 3, next to three rooted trees labeled 1, 2, 3 with leaf orders C B A, B C A, A B C]

Page 8:

In Newick format

[Figure: an unrooted tree with leaves A, B, C and branches numbered 1, 2, 3, next to three rooted trees labeled 1, 2, 3]

Unrooted: (A,B,C)

Rooted: ((C,B),A) ((A,B),C) ((A,C),B)
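
The one-unrooted, three-rooted pattern for three leaves is an instance of a general count: a binary unrooted tree with n leaves has 2n-3 branches, and placing the root on any one of them gives a different rooted tree. A short sketch of the standard counting formulas (the function names are chosen for this illustration):

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n):
    """Number of binary unrooted topologies on n >= 3 labeled leaves."""
    return double_factorial(2 * n - 5)


def num_rooted_trees(n):
    """Number of binary rooted topologies on n >= 2 labeled leaves."""
    return double_factorial(2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For n = 3 this gives one unrooted and three rooted topologies, matching the trees listed above.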

Page 9:

How can we root a tree?

Page 10:

Rooting the tree based on a priori knowledge: using an outgroup

[Figure: an unrooted tree of Human, Chimp, Gorilla, and Chicken, and the rooted tree with Chicken as the OUTGROUP and Human, Chimp, Gorilla as the INGROUP]

The outgroup should be close enough for detecting sequence homology, but far enough to be a clear outgroup
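
Outgroup rooting can be sketched as follows: store the unrooted tree as a graph, then place the root on the branch leading to the outgroup. The internal node labels `n1` and `n2` and the function names are hypothetical, and the unrooted topology used here (Human and Chimp as neighbors) is an assumption for the example.

```python
def build_adjacency(edges):
    """Turn a list of undirected branches into a neighbor dictionary."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def subtree(adj, node, parent):
    """Nested-tuple form of the part of the tree hanging below `node`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a leaf
    return tuple(subtree(adj, child, node) for child in children)


def root_with_outgroup(adj, outgroup):
    """Place the root on the branch that connects the outgroup leaf to the rest."""
    (neighbor,) = adj[outgroup]  # a leaf has exactly one neighbor
    return (outgroup, subtree(adj, neighbor, outgroup))


# Hypothetical unrooted tree: n1 and n2 are unnamed internal nodes.
edges = [("n1", "Human"), ("n1", "Chimp"), ("n1", "n2"),
         ("n2", "Gorilla"), ("n2", "Chicken")]
adj = build_adjacency(edges)
print(root_with_outgroup(adj, "Chicken"))
```

The result groups the ingroup (Human, Chimp, Gorilla) on one side of the root and leaves Chicken alone on the other.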

Page 11:

The gene tree is not always identical to the species tree

[Figure: a gene tree and a species tree over Human, Chimp, Gorilla, and Chicken]

Page 12:

Phylogeny reconstruction approaches

Distance based methods: Neighbor Joining

[Figure: a star tree over the leaves A, B, C, D, E, and the tree after A and B are joined into the node "A,B"]

Distance matrix before joining:

       A     B     C     D     E
A      0     2     3     4     4
B            0     3     4     5
C                  0     3     4
D                        0     5
E                              0

Distance matrix after joining A and B:

       A,B   C     D     E
A,B    0     2.5   4.5   3.5
C            0     3     4
D                  0     5
E                        0

The Minimum Evolution (ME) criterion: in each iteration we separate the two sequences that result in the minimal sum of branch lengths
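
The choice of which pair to join can be sketched with the standard neighbor-joining selection criterion, Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r is the row sum of the distance matrix. The slide does not spell out this formula, so treat it as an assumption brought in from the usual textbook formulation. The distances are the ones read from the first matrix above, and the sketch stops after picking the first pair (it does not update the matrix).

```python
from itertools import combinations

taxa = ["A", "B", "C", "D", "E"]
# Upper triangle of the distance matrix, as read from the slide.
upper = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 5,
    ("C", "D"): 3, ("C", "E"): 4,
    ("D", "E"): 5,
}


def dist(x, y):
    """Symmetric lookup into the upper-triangle dictionary."""
    if x == y:
        return 0
    return upper[(x, y)] if (x, y) in upper else upper[(y, x)]


def pick_pair(taxa):
    """Return the pair with the smallest Q value, plus the full Q table."""
    n = len(taxa)
    row_sum = {t: sum(dist(t, u) for u in taxa) for t in taxa}
    q = {
        (i, j): (n - 2) * dist(i, j) - row_sum[i] - row_sum[j]
        for i, j in combinations(taxa, 2)
    }
    return min(q, key=q.get), q


pair, q = pick_pair(taxa)
print(pair, q[pair])
```

Under these assumptions the pair selected in the first iteration is A and B, the same pair that is joined in the figure.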

Page 13:

Phylogeny reconstruction approaches

Topology search methods: MP, ML

Maximum Parsimony: finds the most parsimonious topology

[Figure: an alignment of Seq 1 to Seq 4 and the three possible topologies for four sequences, grouping 1,3 vs. 2,4; 1,4 vs. 2,3; and 1,2 vs. 3,4]

Maximum Likelihood: finds the most likely topology, i.e. the topology T that maximizes P(Data|T)
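
For maximum parsimony, the comparison of the three four-sequence topologies can be sketched with Fitch's algorithm, which counts the minimum number of changes each alignment column needs on a given tree. The slide does not show the actual sequences, so the toy alignment below is invented for illustration, as are the function names.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, number of changes) for one column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost  # children agree: no extra change
    return left_set | right_set, left_cost + right_cost + 1  # one more change


def parsimony_score(tree, msa):
    """Sum the Fitch change counts over all alignment columns."""
    length = len(next(iter(msa.values())))
    total = 0
    for col in range(length):
        states = {name: seq[col] for name, seq in msa.items()}
        total += fitch(tree, states)[1]
    return total


# Hypothetical aligned sequences (not taken from the slide).
msa = {"1": "ACGTA", "2": "ACGTT", "3": "ATGAA", "4": "ATGAT"}
topologies = [
    (("1", "3"), ("2", "4")),
    (("1", "4"), ("2", "3")),
    (("1", "2"), ("3", "4")),
]
scores = {t: parsimony_score(t, msa) for t in topologies}
best = min(scores, key=scores.get)
print(scores)
print(best)
```

With only three candidate topologies an exhaustive comparison is feasible. For larger trees the number of topologies grows too fast, which is why MP and ML are described as topology search methods.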

Page 14:

Phylogeny reconstruction approaches: summary

Distance based methods
• Neighbor Joining (e.g., using ClustalX): fast, inaccurate

Topology search methods
• Maximum parsimony (e.g., using MEGA): crude, questionable statistical basis
• Maximum likelihood (e.g., using RAxML, PhyML): accurate, slow

Bayesian methods
• Markov Chain Monte Carlo (MCMC) (e.g., using MrBayes): most accurate, very slow

Page 15:

How robust is our tree?

[Figure: tree of Human, Gorilla, and Chimp]

Page 16:

Bootstrap for estimating robustness

• We need some statistical way to estimate the confidence in the tree topology
• But we don’t know anything about the distribution of tree topologies
• The only data source we have is our data (MSA)
• So, we must rely on our own resources: “pull up by your own bootstraps”

Page 17:

Bootstrap

1. Create n (100-1000) new MSAs (pseudo-MSAs) by randomly sampling K positions from our original MSA with replacement

Original MSA (positions 1 2 3 4 5 … K):
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T

Pseudo-MSA (sampled positions 1 1 2 4 4 … 3):
1 : AATTT…C
2 : AATTT…C
3 : AACTT…T
4 : AACTT…C

Pseudo-MSA (sampled positions 9 7 4 7 8 … 10):
1 : TTTTA…T
2 : CATAC…A
3 : CATAC…T
4 : AGTGG…A

Pseudo-MSA (sampled positions 5 1 5 7 8 … 12):
1 : GAGTA…T
2 : GAGAC…G
3 : AAAAC…A
4 : AAAGG…C

[Figure: tree over Sp1, Sp2, Sp3, Sp4]
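
The resampling step can be sketched in a few lines. The toy alignment uses only the first five columns that are visible on the slide (the elided columns are not reconstructed), and the function name `bootstrap_msa` and the fixed random seed are choices made for this illustration.

```python
import random


def bootstrap_msa(msa, rng):
    """Sample as many columns as the MSA has, with replacement.

    The same sampled column indices are applied to every sequence,
    so each pseudo-MSA column is an intact column of the original.
    """
    length = len(next(iter(msa.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    pseudo = {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}
    return pseudo, columns


# First five visible columns of the alignment shown on the slide.
msa = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
for _ in range(3):
    pseudo, columns = bootstrap_msa(msa, rng)
    print([c + 1 for c in columns], pseudo)  # 1-based positions, as on the slide
```

Sampling whole columns, rather than individual characters, keeps the homology between sequences intact in every pseudo-MSA.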

Page 18:

Bootstrap

2. Reconstruct a pseudo-tree from each pseudo-MSA with the same method used for reconstructing the original tree

[Figure: the three pseudo-MSAs from step 1, each with its pseudo-tree over Sp1, Sp2, Sp3, Sp4]

Page 19:

Bootstrap

3. For each split in our original tree, we count the number of times it appeared in the pseudo-trees

[Figure: the original tree over Sp1, Sp2, Sp3, Sp4 with the bootstrap values 67% and 100% on its splits, next to three pseudo-trees]

In 67% of the pseudo-trees, the split between Sp1+Sp2 and the rest of the tree was found

In general, bootstrap support (bp) < 80% is considered low
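
The counting step can be sketched by reducing each tree to its set of splits (bipartitions of the leaves) and checking how often each split of the original tree recurs. Trees are assumed to be nested tuples, the three pseudo-trees are invented for illustration, and the function names are not from any particular package.

```python
def leaf_sets(tree, collected):
    """Return the leaves under `tree`, recording the leaf set of every clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, collected) for child in tree))
    collected.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial splits of a tree, each stored as an unordered pair of leaf sets."""
    collected = []
    all_leaves = leaf_sets(tree, collected)
    result = set()
    for side in collected:
        other = all_leaves - side
        if len(side) > 1 and len(other) > 1:  # skip single-leaf and whole-tree sets
            result.add(frozenset([side, other]))
    return result


def bootstrap_support(original, pseudo_trees):
    """Percentage of pseudo-trees that contain each split of the original tree."""
    pseudo_splits = [splits(p) for p in pseudo_trees]
    return {
        s: 100 * sum(s in ps for ps in pseudo_splits) / len(pseudo_trees)
        for s in splits(original)
    }


original = (("Sp1", "Sp2"), ("Sp3", "Sp4"))
# Hypothetical pseudo-trees: two agree with the original, one does not.
pseudo_trees = [
    (("Sp1", "Sp2"), ("Sp3", "Sp4")),
    (("Sp2", "Sp1"), ("Sp4", "Sp3")),
    (("Sp1", "Sp3"), ("Sp2", "Sp4")),
]
for split, pct in bootstrap_support(original, pseudo_trees).items():
    print(sorted(sorted(side) for side in split), round(pct))
```

With two of three pseudo-trees containing the Sp1+Sp2 split, the support rounds to 67%. In practice the percentage is computed over 100-1000 pseudo-trees rather than three.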

Page 20:

ClustalX: NJ phylogeny reconstruction

Page 21:

ClustalX: NJ phylogeny reconstruction

Page 22:

http://phylobench.vital-it.ch/raxml-bb//

Page 23:
Page 24:

Viewing the tree with njPlot

Page 25:

Note: unrooted tree

Page 26:

Defining an outgroup

Page 27:

Swapping nodes

Page 28:

Bootstrap support

Page 29:

FigTree: tree visualization and figure creation
http://tree.bio.ed.ac.uk/software/figtree/

Page 30:

Reconstructing the tree of life

Page 31:

Darwin’s vision of the tree of life from the Origin of Species

Page 32:

The three-domain tree of life based on SSU rRNA MSA

Page 33:

But the branching of several kingdoms remains in dispute

Page 34:

Lateral Gene Transfer (LGT) challenges the conceptual basis of phylogenetic classification

Page 35:
Page 36:

Methodology

• Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified
• Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases)
• Constructed an MSA for each of the 31 orthogroups
• Concatenated all 31 MSAs to a super-MSA of 8090 columns
• The phylogeny was reconstructed based on the super-MSA using the maximum likelihood approach
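
The concatenation step can be sketched as joining, for every species, its aligned row from each per-gene MSA in a fixed gene order. The two tiny gene alignments and the species names below are invented for illustration (the study used 31 MSAs over 191 species), and `concatenate_msas` is a hypothetical helper name.

```python
def concatenate_msas(msas, species):
    """Join per-gene alignments into one super-MSA, row by row."""
    for index, msa in enumerate(msas):
        lengths = {len(msa[s]) for s in species}
        if len(lengths) != 1:
            raise ValueError(f"rows of MSA {index} differ in length")
    # Every species keeps the same gene order, so columns stay homologous.
    return {s: "".join(msa[s] for msa in msas) for s in species}


# Hypothetical per-gene alignments; "-" marks a gap.
gene1 = {"sp_a": "ATG-CA", "sp_b": "ATGGCA", "sp_c": "ATG-CT"}
gene2 = {"sp_a": "TTAC", "sp_b": "TTGC", "sp_c": "T-GC"}

super_msa = concatenate_msas([gene1, gene2], ["sp_a", "sp_b", "sp_c"])
for name, row in super_msa.items():
    print(name, row)
```

The super-MSA has as many columns as the per-gene MSAs combined, which is how 31 alignments add up to the 8090 columns mentioned above.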

Page 37:

[Figure: the reconstructed tree of life, with the domains Archaea, Eukaryota, and Bacteria labeled]

http://itol.embl.de

Page 38:

Tree support

• 81.7% of the splits show bootstrap support of over 80%
• 65% of the splits show bootstrap support of 100%
• However, several deep splits show low support

Page 39:

Still, the debate goes on

Page 40:

“Tree of one percent of life”

• On the one hand, Ciccarelli et al. favor the claim that bacteria adhere to a bifurcating tree of life, given that the small number of LGT genes is filtered out
• On the other hand, their filtering process left only 31 proteins, which represent ~1% of an average prokaryotic proteome and ~0.1% of a large eukaryotic proteome
• “If throwing out all non-universally distributed genes and all LGT suspects leaves a 1% tree, then we should probably abandon the tree as a working hypothesis”