Building phylogenetic trees
![Page 1: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/1.jpg)
Building phylogenetic trees
Jurgen Mourik & Richard Vogelaars
Utrecht University
![Page 2: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/2.jpg)
Overview
• Background
• Making a tree from pairwise distances;
• Parsimony;
  – <break>;
• Assessing the trees: the bootstrap;
• Simultaneous alignment and phylogeny;
• Application: Phylip
![Page 3: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/3.jpg)
Background
• Phylogenetic tree: diagram showing evolutionary lineages of species/genes
• Trees are used:
  – To understand lineage of various species
  – To understand how various functions evolved
  – To inform multiple alignments
![Page 4: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/4.jpg)
Phylogenetic tree approaches
• Distance:
  – UPGMA
  – Neighbour-joining
• Parsimony:
  – Traditional parsimony
  – Weighted parsimony
![Page 5: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/5.jpg)
Making a tree from pairwise distances
• Given a set of sequences you want to build a tree.
• Compute the distances dij between each pair i, j of the sequences.
• There are many different distance measures.
• Between clusters of sequences: use the average distance between pairs of sequences from each cluster.
![Page 6: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/6.jpg)
UPGMA
• Unweighted Pair Group Method using arithmetic Averages.
• It works by clustering the sequences, at each stage combining two clusters and at the same time creating a new node in a tree, using a distance measure.
![Page 7: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/7.jpg)
Distance between points
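• The distance between clusters Ci and Cj is defined as:
$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\ q \in C_j} d_{pq}$$
• |Ci| and |Cj| denote the number of sequences in clusters i and j.
[Figure: clusters i, j and l with labelled distances; worked example for two single-sequence clusters: $d_{il} = \frac{1}{1 \cdot 1}\,(d_{il}) = 4$.]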
![Page 8: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/8.jpg)
Distance between clusters
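• Let Ck be the union of clusters Ci and Cj, then
$$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$$
• Where Cl is any other cluster.
• Example (figure: clusters i and j merged into k, with $d_{il} = 4$, $d_{jl} = 3$ and $|C_i| = |C_j| = 1$):
$$d_{kl} = \frac{4 \cdot 1 + 3 \cdot 1}{1 + 1} = \frac{7}{2} = 3.5$$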
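The update rule on this slide is a size-weighted average, which can be sketched in a few lines. The function name `merged_distance` and its parameter names are illustrative assumptions, not part of the slides.

```python
def merged_distance(d_il, d_jl, size_i, size_j):
    """Distance from the merged cluster k (union of i and j) to another cluster l.

    d_il, d_jl: distances from clusters i and j to cluster l.
    size_i, size_j: number of sequences in clusters i and j.
    """
    return (d_il * size_i + d_jl * size_j) / (size_i + size_j)


# The slide's example: d_il = 4, d_jl = 3, both clusters hold one sequence.
print(merged_distance(4, 3, 1, 1))  # 3.5
```

Weighting by cluster size is what makes the result equal to the plain average over all sequence pairs, so larger clusters are not under-represented after a merge.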
![Page 9: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/9.jpg)
Building the tree: UPGMA
Initialisation:
  – Assign each sequence i to its own cluster Ci.
  – Define one leaf of T for each sequence, and place it at height zero.
Iteration:
  – Determine the two clusters i, j for which dij is minimal.
  – Define a new cluster k by $C_k = C_i \cup C_j$, and define dkl for all l.
  – Define a node k with daughter nodes i and j, and place it at height dij/2.
  – Add k to the current clusters and remove i and j.
Termination:
  – When only two clusters i, j remain, place the root at height dij/2.
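The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `upgma`, the nested-tuple tree representation and the `frozenset`-keyed distance dictionary are all assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster sequences with UPGMA.

    names: list of leaf labels (strings).
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree, root_height); the tree is a nested tuple of leaf labels.
    """
    size = {n: 1 for n in names}        # |C_i| for every cluster
    height = {n: 0.0 for n in names}    # leaves are placed at height zero
    d = dict(dist)
    clusters = list(names)

    while len(clusters) > 1:
        # Determine the two clusters i, j for which d_ij is minimal.
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        dij = d[frozenset((i, j))]
        k = (i, j)                      # new node with daughters i and j
        size[k] = size[i] + size[j]
        height[k] = dij / 2
        # Define d_kl for every other cluster l (size-weighted average).
        for l in clusters:
            if l not in (i, j):
                d[frozenset((k, l))] = (
                    d[frozenset((i, l))] * size[i]
                    + d[frozenset((j, l))] * size[j]
                ) / size[k]
        clusters = [c for c in clusters if c not in (i, j)] + [k]

    root = clusters[0]
    return root, height[root]


pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))
# (('D', ('C', ('A', 'B'))), 5.0)
```

The loop runs until a single cluster is left, which covers the termination step: the last merge of the two remaining clusters places the root at height dij/2.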
![Page 10: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/10.jpg)
UPGMA: Initialisation
![Page 11: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/11.jpg)
UPGMA: Iteration 1
![Page 12: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/12.jpg)
UPGMA: Iteration 2
![Page 13: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/13.jpg)
UPGMA: Iteration 3
![Page 14: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/14.jpg)
UPGMA: Termination
![Page 15: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/15.jpg)
Properties of UPGMA
• Molecular clock & ultrametric property of distances
• Additivity
![Page 16: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/16.jpg)
Properties of UPGMA: Molecular clock & ultrametric
• The molecular clock assumption: divergence of sequences is assumed to occur at the same rate at all points in the tree.
• If this holds, then the data is said to be ultrametric.
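One standard way to test whether a distance table is ultrametric is the three-point condition: for any three sequences, the two largest of the three pairwise distances must be equal. The sketch below checks that condition; the function name `is_ultrametric`, the tolerance parameter and the `frozenset`-keyed dictionary are assumptions made for illustration.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """Check the three-point condition on every triple of sequences.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted(
            (dist[frozenset((a, b))],
             dist[frozenset((a, c))],
             dist[frozenset((b, c))])
        )
        # The two largest distances of the triple must coincide.
        if abs(largest - middle) > tol:
            return False
    return True
```

If the check fails, the molecular clock assumption is violated for that data and the UPGMA tree should be treated with caution.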
![Page 17: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/17.jpg)
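Properties of UPGMA: Additivity
• Given a tree, its edge lengths are said to be additive if the distance between any pair of leaves is the sum of the lengths of the edges on the path connecting them.
[Figure: leaves i, j and m joined through the internal node k.]
$$d_{im} = d_{ik} + d_{km}$$
$$d_{jm} = d_{jk} + d_{km}$$
$$d_{ij} = d_{ik} + d_{jk}$$
$$d_{km} = \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})$$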
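The last relation follows from the first three by substitution. A short derivation, using only the symbols on this slide:

```latex
\begin{align*}
d_{im} + d_{jm} - d_{ij}
  &= (d_{ik} + d_{km}) + (d_{jk} + d_{km}) - (d_{ik} + d_{jk}) \\
  &= 2\,d_{km} \\
\Rightarrow\quad d_{km} &= \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})
\end{align*}
```

This is why the distance from a new internal node k to any other node m can be computed from leaf distances alone, without knowing the edge lengths in advance.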
![Page 18: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/18.jpg)
Neighbour-joining
• N-j constructs a tree by iteratively joining subtrees (like UPGMA).
• Produces an unrooted tree.
• Doesn’t make the molecular clock assumption, so the distances are not required to be ultrametric.
![Page 19: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/19.jpg)
Distances in Neighbour-joining
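• Given a new internal node k, the distance to another node m is given by:
$$d_{km} = \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})$$
• The lengths of the edges joining k to i and j are:
$$d_{ik} = \tfrac{1}{2}\,(d_{ij} + d_{im} - d_{jm})$$
$$d_{jk} = d_{ij} - d_{ik}$$
[Figure: leaves i and j joined at the new internal node k, which connects to node m.]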
![Page 20: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/20.jpg)
Distances in Neighbour-joining
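• Generalizing this so that the distances to all other leaves are taken into account:
$$d_{ik} = \tfrac{1}{2}\,(d_{ij} + r_i - r_j)$$
• Where
$$r_i = \frac{1}{|L| - 2} \sum_{m \in L} d_{im}$$
• And |L| denotes the size of the set L of leaves.
[Figure: leaves i and j joined at the new internal node k, which connects to node m.]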
![Page 21: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/21.jpg)
Building the tree: Neighbour-joining
Initialisation:
  – Define T to be the set of leaf nodes, one for each given sequence, and put L = T.
Iteration:
  – Pick a pair i, j in L for which $D_{ij}$, defined by $D_{ij} = d_{ij} - (r_i + r_j)$ with $r_i = \frac{1}{|L| - 2} \sum_{m \in L} d_{im}$, is minimal.
  – Define a new node k and set $d_{km} = \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})$, for all m in L.
  – Add k to T with edges of lengths $d_{ik} = \tfrac{1}{2}\,(d_{ij} + r_i - r_j)$ and $d_{jk} = d_{ij} - d_{ik}$, joining k to i and j, respectively.
  – Remove i and j from L and add k.
Termination:
  – When L consists of two leaves i and j, add the remaining edge between i and j, with length dij.
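The neighbour-joining steps can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `neighbour_joining`, the edge-list output, the generated internal node names (`node0`, `node1`, ...) and the `frozenset`-keyed distance dictionary are not from the slides, and leaf names are assumed not to clash with the generated names.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Build an unrooted tree by neighbour-joining.

    dist maps frozenset({a, b}) to the distance between a and b.
    Returns a list of edges (node_u, node_v, length).
    """
    d = dict(dist)
    L = list(names)
    edges = []
    next_id = 0

    while len(L) > 2:
        # r_i: summed distance from i to all other leaves, divided by |L| - 2.
        r = {i: sum(d[frozenset((i, m))] for m in L if m != i) / (len(L) - 2)
             for i in L}
        # Pick the pair minimising D_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(L, 2),
                   key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]))
        dij = d[frozenset((i, j))]
        k = f"node{next_id}"
        next_id += 1
        # Distances from the new node k to every remaining node m.
        for m in L:
            if m not in (i, j):
                d[frozenset((k, m))] = 0.5 * (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - dij
                )
        # Edge lengths joining k to i and j.
        dik = 0.5 * (dij + r[i] - r[j])
        edges.append((k, i, dik))
        edges.append((k, j, dij - dik))
        L = [m for m in L if m not in (i, j)] + [k]

    # Termination: join the two remaining nodes.
    i, j = L
    edges.append((i, j, d[frozenset((i, j))]))
    return edges
```

On additive input, for example distances read off a four-leaf tree with leaf edges 1, 2, 3, 4 and an internal edge of length 5, this sketch recovers exactly those edge lengths.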
![Page 22: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/22.jpg)
Rooting trees
• Finding a root in an unrooted tree is sometimes accomplished by using an outgroup:
  – A species known to be more distantly related to the remaining species than they are to each other.
• The point where the outgroup joins the rest of the tree is the best candidate for the root position.
[Figure: unrooted tree with nodes i, j, k, l and m plus an outgroup; the candidate root is marked where the outgroup joins the tree.]
![Page 23: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/23.jpg)
Comments on distance based methods
• If the given data is ultrametric (and these distances represent real distances), then UPGMA will identify the correct tree.
• If the data is additive (and these distances represent real distances), then Neighbour-joining will identify the correct tree.
• Otherwise, the methods may not recover the correct tree, but they may still be reasonable heuristics.
![Page 24: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/24.jpg)
Phylogenetic tree approaches
• Distance:
  – UPGMA
  – Neighbour-joining
• Parsimony:
  – Traditional parsimony
  – Weighted parsimony
![Page 25: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/25.jpg)
Parsimony
• Most widely used tree building algorithm(?).
• Finds the tree that explains the data with a minimal number of changes.
• Instead of building a tree, it assigns a cost to a given tree.
• Two components of the parsimony algorithm can be distinguished:
  – The computation of a cost for a given tree;
  – A search through all trees, to find the overall minimum of this cost.
![Page 26: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/26.jpg)
Parsimony example
• Given the following sequences: AAG, AAA, GGA, AGA.
• Several trees could explain the phylogeny.
![Page 27: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/27.jpg)
Traditional Parsimony
• Count the number of substitutions
• At each node keep:
  – a list of minimal cost residues
  – the current cost
• Post-order traversal of the tree
![Page 28: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/28.jpg)
Traditional Parsimony
Initialisation:
  – Set current cost C = 0 and k = 2n−1, the number of the root node.
Recursion: To obtain the set $R_k$:
  – If k is a leaf node: set $R_k = \{x_u^k\}$ (the residue at site u of leaf sequence k).
  – If k is not a leaf node: compute $R_i$, $R_j$ for the daughters i, j of k, and set $R_k = R_i \cap R_j$ if this intersection is not empty, or else set $R_k = R_i \cup R_j$ and increment C.
Termination:
  – Minimal cost of tree = C.
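A minimal sketch of this recursion, applied one site at a time and summed over sites. The nested-tuple tree representation and the function names `fitch_site` and `parsimony_cost` are assumptions made for illustration; the sequences are the ones from the parsimony example slide.

```python
def fitch_site(tree, site):
    """Post-order pass for one alignment column.

    tree: a leaf sequence (str) or a tuple (left_subtree, right_subtree).
    Returns (R_k, cost): the minimal-cost residue set and the substitution count.
    """
    if isinstance(tree, str):                 # leaf: R_k = {x_u^k}
        return {tree[site]}, 0
    left, right = tree
    r_i, c_i = fitch_site(left, site)
    r_j, c_j = fitch_site(right, site)
    common = r_i & r_j
    if common:                                # non-empty intersection: no new change
        return common, c_i + c_j
    return r_i | r_j, c_i + c_j + 1           # union, and one more substitution


def parsimony_cost(tree, length):
    """Total number of substitutions over all sites."""
    return sum(fitch_site(tree, u)[1] for u in range(length))


print(parsimony_cost((("AAG", "AAA"), ("GGA", "AGA")), 3))  # 3
print(parsimony_cost((("AAG", "GGA"), ("AAA", "AGA")), 3))  # 4
```

The two calls score two different topologies for the same four sequences; the first needs fewer substitutions and is therefore preferred under parsimony.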
![Page 29: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/29.jpg)
Weighted Parsimony
• Extension of the traditional parsimony.
• Adds a cost function S(a,b) for each substitution of a by b.
• Post-order traversal of the tree
• Aim is now to minimize the cost.
![Page 30: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/30.jpg)
Weighted Parsimony
Initialisation:
  – Set k = 2n−1, the number of the root node.
Recursion: Compute $S_k(a)$ for all a as follows:
  – If k is a leaf node: set $S_k(a) = 0$ for $a = x_u^k$, and $S_k(a) = \infty$ otherwise.
  – If k is not a leaf node: compute $S_i(a)$, $S_j(a)$ for all a at the daughters i, j, and define
$$S_k(a) = \min_b\,\bigl(S_i(b) + S(a,b)\bigr) + \min_b\,\bigl(S_j(b) + S(a,b)\bigr)$$
Termination:
  – Minimal cost of tree = $\min_a S_{2n-1}(a)$.
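The weighted recursion can be sketched in the same style as traditional parsimony. The alphabet, the nested-tuple tree representation, the function names and the two example cost functions (`unit_cost`, and `ts_tv_cost` with transitions at 1 and transversions at 2) are assumptions chosen for illustration only.

```python
import math

ALPHABET = "ACGT"  # assumed residue alphabet


def sankoff_site(tree, site, cost):
    """Return {a: S_k(a)} for one alignment column of the subtree `tree`."""
    if isinstance(tree, str):                 # leaf: 0 for the observed residue
        return {a: (0 if a == tree[site] else math.inf) for a in ALPHABET}
    left, right = tree
    s_i = sankoff_site(left, site, cost)
    s_j = sankoff_site(right, site, cost)
    return {
        a: min(s_i[b] + cost(a, b) for b in ALPHABET)
           + min(s_j[b] + cost(a, b) for b in ALPHABET)
        for a in ALPHABET
    }


def weighted_parsimony_cost(tree, length, cost):
    """Sum over sites of min_a S_root(a)."""
    return sum(min(sankoff_site(tree, u, cost).values()) for u in range(length))


def unit_cost(a, b):
    """Every substitution costs 1 (reduces to traditional parsimony)."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Hypothetical weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    return 1 if {a, b} in ({"A", "G"}, {"C", "T"}) else 2


tree = (("AAG", "AAA"), ("GGA", "AGA"))
print(weighted_parsimony_cost(tree, 3, unit_cost))  # 3
```

With `unit_cost` the result matches the substitution count of traditional parsimony, which is a quick sanity check for any weighted implementation.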
![Page 31: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/31.jpg)
Break
• Questions so far?
• After the break:
  – Assessing the trees: the bootstrap;
  – Simultaneous alignment and phylogeny;
  – Application: Phylip
![Page 32: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/32.jpg)
Branch and bound
• Parsimony itself cannot build a tree!
• Using simple enumeration methods, the number of trees becomes very large very fast.
• How to build the trees?
  – Stochastically
  – Branch and bound
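To make "very large very fast" concrete: a standard counting result is that n labelled leaves admit (2n−5)!! = 3·5·…·(2n−5) distinct unrooted binary trees. The sketch below evaluates that product; the function name `count_unrooted_trees` is an assumption.

```python
def count_unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labelled leaves: (2n-5)!!."""
    count = 1
    for factor in range(3, 2 * n - 4, 2):   # 3, 5, ..., 2n-5
        count *= factor
    return count


for n in (4, 5, 10):
    print(n, count_unrooted_trees(n))
# 4 3
# 5 15
# 10 2027025
```

Already at 10 sequences there are over two million candidate topologies, which is why exhaustive enumeration is replaced by stochastic search or branch and bound.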
![Page 33: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/33.jpg)
Branch and bound
• B&B uses the parsimony algorithm.
• It is guaranteed to find the overall best tree.
• It systematically builds trees by increasing the number of leaves.
• Abandons a particular avenue of tree building whenever the current incomplete tree (T*) has cost(T*) > cost(Tmin), where Tmin is the cheapest complete tree found so far.
![Page 34: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/34.jpg)
The Bootstrap
• A measure of how much a tree should be trusted.
• Use the bootstrap as a method of assessing the significance of some phylogenetic feature.
![Page 35: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/35.jpg)
The Bootstrap (2)
• The bootstrap works as follows:
  – Given a dataset consisting of an alignment of sequences.
  – Generate an artificial dataset of the same size as the original dataset by picking columns from the alignment at random with replacement.
  – Apply the tree building algorithm to this artificial dataset.
  – Repeat the selection and tree building procedure n times.
  – The frequency with which a chosen phylogenetic feature appears is taken to be a measure of the confidence we can have in this feature.
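The resampling procedure can be sketched as follows. The function names, the `build_tree` and `has_feature` callbacks and the fixed seed are assumptions: any tree building method and any test for a feature (for example "A and B form a clade") could be plugged in.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original size.

    alignment: list of equal-length sequences (strings).
    rng: a random.Random instance.
    """
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_support(alignment, build_tree, has_feature, n=100, seed=0):
    """Fraction of n bootstrap replicates whose tree shows the chosen feature."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        replicate = bootstrap_alignment(alignment, rng)
        if has_feature(build_tree(replicate)):
            hits += 1
    return hits / n
```

Whole columns are resampled, never individual residues, so every replicate keeps the same sequences and the same alignment length as the original data.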
![Page 36: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/36.jpg)
Simultaneous alignment and phylogeny
• Simultaneously aligning sequences and finding a plausible phylogeny:
  – Sankoff & Cedergren’s gap-substitution algorithm;
  – Hein’s affine cost algorithm.
• Both find an optimal alignment given a tree.
![Page 37: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/37.jpg)
Sankoff & Cedergren’s gap-substitution algorithm
• Is guaranteed to find ancestral sequences, and alignments of them and the leaf sequences.
• It uses a character-substitution model of gaps.
• Together these minimize a tree-based parsimony-type cost.
• The algorithm is a combination of two known methods:
  – Dynamic programming method (Chapter 6);
  – Weighted Parsimony algorithm.
![Page 38: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/38.jpg)
Hein’s affine cost algorithm
• It uses affine gap penalties.
• Faster than the Sankoff & Cedergren algorithm.
• The aim is to find sequences z at a given node aligned to both of the sequences x and y at the daughter nodes satisfying:
$$S(x,z) + S(z,y) = S(x,y)$$
• Where S is the total cost for a given alignment of two sequences (a mismatch costs 1, a match costs 0).
![Page 39: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/39.jpg)
Hein’s affine cost algorithm
• Compared to equation (2.16) (alignment with affine gap scores) here the algorithm searches for the minimal cost path.
• The affine gap cost for a gap of length k is d + (k−1)e, where e ≤ d.
$$V^M(i,j) = \min \begin{cases} V^M(i-1,j-1) + S(x_i,y_j) \\ V^X(i-1,j-1) + S(x_i,y_j) \\ V^Y(i-1,j-1) + S(x_i,y_j) \end{cases}$$
$$V^X(i,j) = \min \begin{cases} V^M(i-1,j) + d \\ V^X(i-1,j) + e \end{cases}$$
$$V^Y(i,j) = \min \begin{cases} V^M(i,j-1) + d \\ V^Y(i,j-1) + e \end{cases}$$
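The three recurrences can be sketched as a small dynamic programme that returns the minimal alignment cost S(x, y). The function name `affine_cost`, the boundary initialisation (a leading gap of length k costs d + (k−1)e) and the default values d = 2, e = 1, mismatch = 1 are assumptions; the defaults are taken from the example used on the following slides.

```python
import math


def affine_cost(x, y, d=2, e=1, mismatch=1):
    """Minimal cost of aligning x and y with affine gap cost d + (k-1)e."""
    n, m = len(x), len(y)
    inf = math.inf
    # VM: x_i aligned to y_j; VX: x_i aligned to a gap; VY: y_j aligned to a gap.
    VM = [[inf] * (m + 1) for _ in range(n + 1)]
    VX = [[inf] * (m + 1) for _ in range(n + 1)]
    VY = [[inf] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = 0
    for i in range(1, n + 1):                 # leading gap opposite x[:i]
        VX[i][0] = d + (i - 1) * e
    for j in range(1, m + 1):                 # leading gap opposite y[:j]
        VY[0][j] = d + (j - 1) * e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0 if x[i - 1] == y[j - 1] else mismatch
            VM[i][j] = min(VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1]) + s
            VX[i][j] = min(VM[i - 1][j] + d, VX[i - 1][j] + e)
            VY[i][j] = min(VM[i][j - 1] + d, VY[i][j - 1] + e)

    return min(VM[n][m], VX[n][m], VY[n][m])


print(affine_cost("CAC", "CTCACA"))  # 5, i.e. 1 + d + 2e with d = 2, e = 1
```

As in the recurrences above, a gap in one sequence cannot be followed directly by a gap in the other; such a move would have to pass through a match state.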
![Page 40: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/40.jpg)
Dynamic programming matrix for two sequences
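[Figure: dynamic programming matrix indexed by i and j, showing the values $V^M$, $V^X$ and $V^Y$ in each cell, for d = 2 and e = 1.]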
![Page 41: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/41.jpg)
Hein’s affine cost algorithm
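• Find the z satisfying the following, so that the total cost is minimal:
$$S(x,z) + S(z,y) = S(x,y)$$
• From the matrix follows:
  – C - - A C -
  – C A C - - -
• CAC could be a possible z.
[Figure: tree with the candidate ancestor CAC(?) above the leaves CAC and CTCACA.]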
![Page 42: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/42.jpg)
Hein’s affine cost algorithm
[Figure: three candidate trees, each with leaves CAC and CTCACA and a different candidate ancestor z: CAC(?), CACACA(?) and CACAC(?).]
Which z could serve best as ancestor?
![Page 43: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/43.jpg)
Hein’s affine cost algorithm
Costs for each candidate z, compared with $S(CAC, CTCACA) = 1 + d + 2e$:
• z = CAC: $S(CAC, CAC) = 0$ and $S(CAC, CTCACA) = 1 + d + 2e$
• z = CACACA: $S(CACACA, CAC) = d + 2e$ and $S(CACACA, CTCACA) = 1$
• z = CACAC: $S(CACAC, CAC) = d + e$ and $S(CACAC, CTCACA) = 1 + d$
![Page 44: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/44.jpg)
Sequence graph
• Follow a path through the dynamic programming matrix.
• Derive a graph from this matrix.
• Whenever a cell is used by an optimal path a vertex is added to the graph.
![Page 45: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/45.jpg)
Sequence graph
Graph 1
![Page 46: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/46.jpg)
Sequence graph: line arrangement
Graph 1
Graph 2
![Page 47: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/47.jpg)
Sequence graph: replacing the dummy edges
Graph 2
Graph 3
![Page 48: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/48.jpg)
Dynamic Programming matrix: TAC – Graph 3
![Page 49: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/49.jpg)
Ancestors
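• Possible ancestral sequences for the leaf sequences TAC, CAC and CTCACA given the tree shown.
• Derived from the sequence graphs.
[Figure: tree with leaves CAC, CTCACA and TAC; the internal nodes are labelled with the candidate ancestor CAC, and the labels 1 and 5 also appear.]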
![Page 50: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/50.jpg)
Limitations of Hein’s model
• Hein’s algorithm takes the minimal cost sequences at each node upward.
• This can fail to give the overall optimum.
• Suppose the cost for a gap of length k is 13 + 3(k−1).
• Suppose a mismatch costs 4.
• Suppose the leaves are G and GTT.
![Page 51: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/51.jpg)
Limitations of Hein’s model
• The eligible ancestors of G and GTT would be G and GTT themselves, since each gives a total cost of 13 + 3 = 16.
• GT would not be eligible because its total cost is 2·13 = 26.
• Now we move up to the parent of the ancestor of G and GTT, where there is a third leaf GT.
  – The total cost for the ineligible GT would be lower than for either G or GTT.
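The numbers on this slide can be worked through explicitly. The sketch below assumes the tree ((G, GTT), GT), writes a for the ancestor of G and GTT, and lets the node above a take whichever sequence minimises its two edges, so those edges together cost S(a, GT); this tree shape is an assumption read from the slide text.

```latex
\begin{align*}
S(G, GTT) &= 13 + 3(2 - 1) = 16, \qquad S(G, GT) = S(GT, GTT) = 13 \\[4pt]
\text{Lower node only:}\quad
a = G   &: \; S(G, G) + S(G, GTT) = 0 + 16 = 16 \\
a = GTT &: \; S(GTT, G) + S(GTT, GTT) = 16 + 0 = 16 \\
a = GT  &: \; S(GT, G) + S(GT, GTT) = 13 + 13 = 26 \\[4pt]
\text{Including the third leaf } GT:\quad
a = G   &: \; 16 + S(G, GT) = 16 + 13 = 29 \\
a = GTT &: \; 16 + S(GTT, GT) = 16 + 13 = 29 \\
a = GT  &: \; 26 + S(GT, GT) = 26 + 0 = 26
\end{align*}
```

So the locally worse choice a = GT gives the cheaper tree overall (26 against 29), which is exactly the case that a bottom-up choice of minimal cost sequences misses.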
![Page 52: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/52.jpg)
Application: PHYLIP (Phylogeny Inference Package)
• Many features, among them:
  – Traditional (unrooted) parsimony
  – Branch and bound to find all most parsimonious trees
![Page 53: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/53.jpg)
Application: PHYLIP
• Test dataset:
Jurgen    AACGUGGCCAAAU
Alpha     ACCGCCGCCAAAU
Beta      AAGGUCGCCAAAC
Gamma     CAUUUCGUCACAA
Delta     GGUAUCUCGGCCU
Epsilon   GAAAUCUCGAUCC
Richard   GGGCUCUCGGCUC
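To feed such a dataset to PHYLIP it has to be written as an input file. The sketch below assumes PHYLIP's sequential format as described in its documentation: a first line with the number of species and the number of characters, then one line per species starting with a name padded to ten characters. The function name `to_phylip` is hypothetical, and the exact layout the demo used is not given in the slides.

```python
def to_phylip(records):
    """Format (name, sequence) pairs as a PHYLIP sequential-format string.

    Assumes all sequences have the same length; names are cut or padded
    to ten characters.
    """
    n_chars = len(records[0][1])
    lines = [f"{len(records):5d} {n_chars:5d}"]
    for name, seq in records:
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


dataset = [
    ("Jurgen", "AACGUGGCCAAAU"),
    ("Alpha", "ACCGCCGCCAAAU"),
    ("Beta", "AAGGUCGCCAAAC"),
    ("Gamma", "CAUUUCGUCACAA"),
    ("Delta", "GGUAUCUCGGCCU"),
    ("Epsilon", "GAAAUCUCGAUCC"),
    ("Richard", "GGGCUCUCGGCUC"),
]
print(to_phylip(dataset))
```

The resulting text could be saved to a file and given to one of the PHYLIP parsimony programs as input.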
![Page 54: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/54.jpg)
Demo
![Page 55: Building phylogenetic trees](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813177550346895d97f127/html5/thumbnails/55.jpg)
Questions?