Phylogeny and Molecular Evolution - BGUbccg161/wiki.files/phylogeny...• Bioinformatics Algorithms...

Phylogeny and Molecular Evolution

Distance Based Phylogeny

Credit

• Serafim Batzoglou (UPGMA slides) http://www.stanford.edu/class/cs262/Slides

• Notes by Nir Friedman, Dan Geiger, Shlomo Moran, Ron Shamir, Sagi Snir, Michal Ziv-Ukelson

• Durbin et al.

• Jones and Pevzner’s lecture notes

• Bioinformatics Algorithms book by Phillip Compeau and Pavel Pevzner: all book photos shown in this lecture are from there.

Phylogenetic Trees

Phylogenetic Trees are Unordered

Phylogenetic Trees could be Rooted or Unrooted

Types of Tree Reconstruction

• Character-based

  • Input is a multiple alignment of the sequences at the leaves (find the topology that best explains the evolution of the leaf sequences via mutations).

• Distance-based

  • Input is a matrix of distances between species.

Distance Based Tree Reconstruction

Distances in Trees

• Edges may have weights, which reflect:

  • The number of mutations on the evolutionary path from one species to another

  • Or, a time estimate for the evolution of one species into another

• In a tree T with n leaves, we often compute the length of the path between leaves i and j, dij(T)

• dij refers to the distance between i and j, and is the sum of the weights of the edges on the path between i and j
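
The path-length definition above can be sketched in a few lines of Python. This is only an illustration: the tree, its internal node names ("a", "b"), and its edge weights are made up for the example and are not taken from the lecture figures.

```python
from collections import defaultdict

def path_length(edges, i, j):
    """Sum of edge weights on the unique path between nodes i and j of a tree."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Depth-first search from i; in a tree there is exactly one simple path to j.
    stack = [(i, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == j:
            return dist
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise ValueError("no path between the given nodes")

# Hypothetical weighted tree: leaves 1..4, internal nodes "a" and "b".
EDGES = [(1, "a", 2), (2, "a", 3), ("a", "b", 4), (3, "b", 1), (4, "b", 5)]

d_14 = path_length(EDGES, 1, 4)  # edges 1-a, a-b, b-4
```

Calling `path_length` for every pair of leaves yields the full matrix of tree distances dij(T).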

Distance in Trees (cont’d)

For i = 1, j = 4, dij is:

d(1,4) = 12 + 13 + 14 + 17 + 13 = 69

Additive Distance Matrices

A distance matrix D is called ADDITIVE if there exists a tree T with dij(T) = Dij for every pair of leaves i, j, and NONADDITIVE otherwise.

[Figures: two example distance matrices, each posed with the question "Is this matrix additive?"]
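
One standard way to test additivity, not covered on the slides, is the four-point condition: for every four leaves, of the three sums Dij + Dkl, Dik + Djl, Dil + Djk, the two largest must be equal. The sketch below assumes D is symmetric with a zero diagonal and satisfies the triangle inequality; both example matrices are made up for illustration.

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Four-point condition check on a symmetric distance matrix D (list of lists)."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        # The two largest of the three pairings must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical matrix generated by a tree ((A,B),(C,D)) with pendant edges
# A=1, B=5, C=1, D=5 and an internal edge of weight 1.
TREE_LIKE = [
    [0, 6, 3, 7],
    [6, 0, 7, 11],
    [3, 7, 0, 6],
    [7, 11, 6, 0],
]

# Hypothetical matrix for which the three pairing sums are 5, 7 and 9.
NOT_TREE_LIKE = [
    [0, 3, 4, 3],
    [3, 0, 4, 5],
    [4, 4, 0, 2],
    [3, 5, 2, 0],
]
```

The check enumerates all quartets, so it costs O(n^4) time; that is fine for small examples but not meant as an efficient method.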

Additive Matrices have a Simple Tree Fitting

Distance Based Phylogeny Problem

• Goal: Reconstruct an evolutionary tree from a distance matrix

• Input: n x n distance matrix D = (Dij)

• Output: weighted unrooted (or rooted) tree T with n leaves fitting D

• If D is additive, this problem has a solution, and there are simple algorithms to solve it (we will not learn them in class)

• However, D is usually not additive

Rooted Ultrametric Trees

UPGMA: Unweighted Pair Group Method with Arithmetic Mean

• UPGMA is a clustering algorithm that:

  • Computes the distance between clusters using the average pairwise distance

  • Assigns a height to every vertex in the tree, effectively assuming the presence of a molecular clock and dating every vertex

  • Assumes the matrix D is ultrametric (additive and consistent with a molecular clock), so that the generated tree fits D

    • If D is not ultrametric, UPGMA generates a heuristic solution that does not fit D

  • Produces ultrametric trees: all leaves are equidistant from the root
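
Because UPGMA always outputs an ultrametric tree, it can fit D exactly only when D itself is ultrametric. A standard test, not stated on the slides, is the three-point condition: in every triple of leaves, the two largest of the three pairwise distances are equal. The sketch below assumes a symmetric matrix with a zero diagonal; both example matrices are hypothetical.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Three-point condition check on a symmetric distance matrix D (list of lists)."""
    for i, j, k in combinations(range(len(D)), 3):
        _, middle, largest = sorted([D[i][j], D[i][k], D[j][k]])
        # The two largest distances in every triple must coincide.
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical clock-like matrix: (A,B) join at height 1, (C,D) at height 2, root at height 3.
CLOCK_LIKE = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]

# Hypothetical additive matrix with unequal rates (two long pendant edges).
UNEQUAL_RATES = [
    [0, 6, 3, 7],
    [6, 0, 7, 11],
    [3, 7, 0, 6],
    [7, 11, 6, 0],
]
```

The second matrix comes from a tree, so it is additive, yet it fails the three-point condition; this is the kind of input on which UPGMA returns a tree that does not fit D.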

Clustering in UPGMA

Given two disjoint clusters Ci, Cj of sequences,

$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\; q \in C_j} d_{pq}$$

Note that if Ck = Ci ∪ Cj, then the distance to another cluster Cl is:

$$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$$
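
The two formulas above agree with each other: the size-weighted update gives the same value as averaging over all pairs directly. A small numerical check, using a made-up 4 x 4 matrix and hypothetical function names, might look like this:

```python
import math

def average_distance(D, Ci, Cj):
    """Average pairwise distance between two disjoint clusters of indices."""
    total = sum(D[p][q] for p in Ci for q in Cj)
    return total / (len(Ci) * len(Cj))

def merged_distance(d_il, d_jl, size_i, size_j):
    """Distance from Ck = Ci U Cj to Cl, computed from d_il, d_jl and the cluster sizes."""
    return (d_il * size_i + d_jl * size_j) / (size_i + size_j)

# Hypothetical distance matrix over four sequences 0..3.
D = [
    [0, 4, 8, 10],
    [4, 0, 6, 12],
    [8, 6, 0, 2],
    [10, 12, 2, 0],
]

Ci, Cj, Cl = [0], [1, 2], [3]
direct = average_distance(D, Ci + Cj, Cl)
updated = merged_distance(
    average_distance(D, Ci, Cl), average_distance(D, Cj, Cl), len(Ci), len(Cj)
)
agree = math.isclose(direct, updated)
```

The update rule is what lets an implementation avoid revisiting the original sequences after every merge.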

UPGMA Algorithm

Initialization:

  Assign each sequence xi to its own cluster Ci (clusters of size 1)

  Define one leaf per sequence, at height 0

Iteration:

  Find two clusters Ci, Cj such that dij is minimal

  Let Ck = Ci ∪ Cj

  Define a node connecting Ci and Cj, and place it at height dij/2

  Delete Ci, Cj

Termination:

  When two clusters i, j remain, place the root at height dij/2
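
The steps above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the nested-tuple tree representation, the function name `upgma`, and the example matrix are assumptions made for the example, and ties between equally close pairs are broken arbitrarily.

```python
def upgma(D, labels):
    """Return (tree, root_height); tree is a nested tuple of leaf labels."""
    n = len(labels)
    # Cluster id -> (subtree, number of leaves, height of the cluster's node).
    clusters = {i: (labels[i], 1, 0.0) for i in range(n)}
    # Unordered pair of cluster ids -> average pairwise distance.
    dist = {frozenset((i, j)): float(D[i][j])
            for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Find the two closest clusters Ci, Cj.
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        tree_i, size_i, _ = clusters.pop(i)
        tree_j, size_j, _ = clusters.pop(j)
        # Size-weighted update of the distance from Ck to every other cluster Cl.
        for l in clusters:
            d_il = dist.pop(frozenset((i, l)))
            d_jl = dist.pop(frozenset((j, l)))
            dist[frozenset((next_id, l))] = (
                (d_il * size_i + d_jl * size_j) / (size_i + size_j)
            )
        # New node connecting Ci and Cj, placed at height d_ij / 2.
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j, d_ij / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height

# Hypothetical ultrametric matrix over four sequences.
D = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
tree, root_height = upgma(D, ["A", "B", "C", "D"])
```

Here the final merge is handled by the same loop as every other merge, so the termination step needs no special case. Scanning all pairs for the minimum each round makes this sketch O(n^3).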

UPGMA Algorithm (cont’d)

[Figure, repeated over five slides: UPGMA run step by step on five points labeled 1 to 5; the leaves of the growing tree appear in the order 1, 4, 2, 3, 5]

UPGMA’s Weakness

• The algorithm produces an ultrametric tree: the distance from the root to any leaf is the same

• UPGMA assumes a constant molecular clock: all species represented by the leaves in the tree are assumed to accumulate mutations (and thus evolve) at the same rate. This is one of the major pitfalls of UPGMA.

UPGMA’s Weakness: Example

[Figure: the correct tree compared with the tree produced by UPGMA, over leaves 1, 2, 3, 4]
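
A small numerical illustration of this weakness, using a made-up tree rather than the one in the figure: take the tree ((A,B),(C,D)) with pendant edges A=1, B=5, C=1, D=5 and an internal edge of weight 1, so B and D evolve faster than A and C. The first merge UPGMA performs is always the closest pair in the matrix, and here that pair is not a pair of siblings in the true tree.

```python
LABELS = ["A", "B", "C", "D"]

# Tree distances for the hypothetical tree described above
# (for example, A to C = 1 + 1 + 1 = 3 and B to D = 5 + 1 + 5 = 11).
D = [
    [0, 6, 3, 7],
    [6, 0, 7, 11],
    [3, 7, 0, 6],
    [7, 11, 6, 0],
]

TRUE_SIBLINGS = {frozenset("AB"), frozenset("CD")}

def closest_pair(D, labels):
    """Labels of the pair with the smallest distance, i.e. UPGMA's first merge."""
    n = len(labels)
    pairs = [(D[i][j], labels[i], labels[j])
             for i in range(n) for j in range(i + 1, n)]
    _, a, b = min(pairs)
    return a, b

first_merge = closest_pair(D, LABELS)
merge_is_correct = frozenset(first_merge) in TRUE_SIBLINGS
```

The matrix is additive, so a tree fitting it exactly exists, but UPGMA joins A with C at its first step and therefore cannot recover that tree.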
