Lecture 30. Phylogeny methods, part 2 (Searching tree space)
Joe Felsenstein
Department of Genome Sciences and Department of Biology
Lecture 30. Phylogeny methods, part 2 (Searching tree space) – p.1/22
All possible trees
[Figure: trees of species a, b, c, and d, built up one species at a time]
Forming all 4-species trees by adding the next species in all possible places
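The construction in this figure can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: it assumes rooted bifurcating trees are written as nested tuples, and the function names `add_everywhere` and `all_rooted_trees` are invented for the example.

```python
def add_everywhere(tree, new_tip):
    """Yield every tree made by attaching new_tip to one branch of tree.

    A tree is a tip label (a string) or a pair (left, right).
    The branch above the root counts as one of the possible places.
    """
    # Attach the new tip as sister to the whole tree.
    yield (tree, new_tip)
    if isinstance(tree, tuple):
        left, right = tree
        # Attach the new tip somewhere inside the left subtree.
        for new_left in add_everywhere(left, new_tip):
            yield (new_left, right)
        # Attach the new tip somewhere inside the right subtree.
        for new_right in add_everywhere(right, new_tip):
            yield (left, new_right)


def all_rooted_trees(tips):
    """Build all rooted bifurcating trees by adding tips one at a time."""
    trees = [tips[0]]
    for tip in tips[1:]:
        trees = [new for old in trees for new in add_everywhere(old, tip)]
    return trees


for n in range(2, 6):
    print(n, len(all_rooted_trees(list("abcde")[:n])))
```

Each tree with k tips offers 2k - 1 places for the next species, which is why the counts grow as 1, 3, 15, 105.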
The number of rooted bifurcating trees:
$$1 \times 3 \times 5 \times 7 \times \cdots \times (2n-3) = \frac{(2n-3)!}{(n-2)!\, 2^{n-2}}$$
which is:
species   number of trees
1         1
2         1
3         3
4         15
5         105
6         945
7         10,395
8         135,135
9         2,027,025
10        34,459,425
11        654,729,075
12        13,749,310,575
13        316,234,143,225
14        7,905,853,580,625
15        213,458,046,676,875
16        6,190,283,353,629,375
17        191,898,783,962,510,625
18        6,332,659,870,762,850,625
19        221,643,095,476,699,771,875
20        8,200,794,532,637,891,559,375
30        4.9518 × 10^38
40        1.00985 × 10^57
50        2.75292 × 10^76
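These counts can be checked numerically. The sketch below evaluates the product of odd numbers and the factorial form of the same count; the function names are invented for the example, and the factorial form is only evaluated for n of 2 or more, where (n - 2)! is defined.

```python
from math import factorial


def rooted_tree_count(n):
    """Product 1 x 3 x 5 x ... x (2n - 3); equals 1 for n = 1 and n = 2."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):  # odd numbers 3, 5, ..., 2n - 3
        count *= odd
    return count


def rooted_tree_count_closed(n):
    """Closed form (2n - 3)! / ((n - 2)! 2^(n - 2)), valid for n >= 2."""
    return factorial(2 * n - 3) // (factorial(n - 2) * 2 ** (n - 2))


for n in (5, 10, 20):
    print(n, rooted_tree_count(n), rooted_tree_count_closed(n))
```

Python integers have arbitrary precision, so the exact values for 20 or 50 species come out without overflow.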
Rooting an unrooted tree
[Figure: a tree of species 1 to 8, drawn unrooted and rooted]
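One way to picture rooting is as choosing a branch of the unrooted tree and hanging the two sides from a new root placed on it. The sketch below assumes the unrooted tree is stored as an adjacency dictionary and returns a nested tuple; the representation, the node names, and the function name `root_on_branch` are all assumptions made for this example.

```python
def root_on_branch(adj, u, v):
    """Root an unrooted tree on the branch joining nodes u and v.

    adj maps each node to the list of its neighbors. Tips have one
    neighbor; interior nodes of a bifurcating tree have three.
    """
    def subtree(node, parent):
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node  # a tip
        return tuple(subtree(child, node) for child in children)

    # The new root sits on the branch, with one side hanging from each end.
    return (subtree(u, v), subtree(v, u))


# A 4-species unrooted tree: tips 1 and 2 join at x, tips 3 and 4 join at y.
quartet = {
    "1": ["x"], "2": ["x"], "3": ["y"], "4": ["y"],
    "x": ["1", "2", "y"], "y": ["x", "3", "4"],
}
print(root_on_branch(quartet, "x", "y"))  # root on the interior branch
print(root_on_branch(quartet, "1", "x"))  # root on the branch leading to tip 1
```

An unrooted bifurcating tree with n tips has 2n - 3 branches, so it can be rooted in 2n - 3 different ways.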
A global maximum is not easy to find
[Figure labels: if the search starts here, it ends up here, but the global maximum is here]
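The same difficulty shows up in a toy example. The sketch below runs a greedy uphill search on a made-up one-dimensional landscape; the scores and the function name `hill_climb` are hypothetical, chosen only to show that the end point depends on the starting point.

```python
def hill_climb(scores, start):
    """Move to the better neighboring position until no neighbor is better."""
    position = start
    while True:
        neighbors = [j for j in (position - 1, position + 1)
                     if 0 <= j < len(scores)]
        best = max(neighbors, key=lambda j: scores[j])
        if scores[best] <= scores[position]:
            return position  # no neighbor improves the score
        position = best


# Hypothetical landscape: a local peak at index 2, the global peak at index 6.
landscape = [1, 3, 5, 4, 2, 6, 9, 7]
print(hill_climb(landscape, 0))
print(hill_climb(landscape, 4))
```

Starting at index 0 the search stops on the lower peak; starting at index 4 it reaches the global maximum.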
[Figure: a tree in which four subtrees, S, T, U, and V, meet at an interior branch]
is rearranged by dissolving the connections to an interior branch
and reforming them in one of the two possible alternative ways.
[Figure legend: a triangle is a subtree]
Nearest-neighbor interchange (NNI) rearrangement of a tree (the triangles are subtrees)
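An NNI can be written down very compactly once the four subtrees around an interior branch are named. The sketch below assumes the starting arrangement groups S with T and U with V; that grouping, the tuple representation, and the function name `nni_neighbors` are assumptions made for illustration.

```python
def nni_neighbors(arrangement):
    """Return the two alternative arrangements around one interior branch.

    arrangement is ((S, T), (U, V)): S and T hang from one end of the
    interior branch, U and V from the other end.
    """
    (s, t), (u, v) = arrangement
    # Swap one subtree across the branch, in each of the two possible ways.
    return [((s, u), (t, v)), ((s, v), (t, u))]


print(nni_neighbors((("S", "T"), ("U", "V"))))
```

An unrooted bifurcating tree with n tips has n - 3 interior branches, so it has 2(n - 3) NNI neighbors; a 5-species tree therefore has four.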
all 15 trees, connected by NNIs
[Figure: the graph of all 15 5-species unrooted trees (species A to E), connected by NNIs]
with parsimony scores
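[Figure: the same graph with each tree replaced by its parsimony score; one tree scores 8, five score 9, two score 10, and seven score 11]
The same graph with parsimony scores (try a "greedy" search with NNIs)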
Subtree pruning and regrafting (SPR) rearrangement
[Figure: a tree with tips A to M]
Break each branch, remove a subtree
[Figure: the subtree containing D, I, J, and M, removed from the rest of the tree]
Add it in, attaching to one (*) of the other branches
Here is the result:
[Figure: the rearranged tree]
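The prune-and-regraft step can be sketched on small trees. The example below is a simplified version that works on rooted trees written as nested tuples, whereas the slide shows an unrooted tree; the representation and every function name here are assumptions made for illustration.

```python
def add_everywhere(tree, subtree):
    """Yield every tree made by attaching subtree to one branch of tree."""
    yield (tree, subtree)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_everywhere(left, subtree):
            yield (new_left, right)
        for new_right in add_everywhere(right, subtree):
            yield (left, new_right)


def proper_subtrees(tree):
    """Yield every tip and clade below the root."""
    if isinstance(tree, tuple):
        for child in tree:
            yield child
            yield from proper_subtrees(child)


def prune(tree, target):
    """Remove target from tree and collapse the node it hung from."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if left == target:
        return right
    if right == target:
        return left
    return (prune(left, target), prune(right, target))


def canonical(tree):
    """Sort the two children at every node so equal trees compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def spr_neighbors(tree):
    """All distinct trees reachable by one prune-and-regraft move."""
    found = set()
    for sub in proper_subtrees(tree):
        rest = prune(tree, sub)
        for candidate in add_everywhere(rest, sub):
            found.add(canonical(candidate))
    found.discard(canonical(tree))  # regrafting in the old place changes nothing
    return found


print(spr_neighbors((("A", "B"), "C")))
```

With three tips the two neighbors are simply the other two rooted trees; for larger trees the SPR neighborhood is much bigger than the NNI neighborhood, which is what makes SPR a more thorough search.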
Tree bisection and reconnection (TBR) rearrangement
[Figure: a tree with tips A to M]
Break each branch, separate the subtrees
[Figure: the two subtrees, one containing D, I, J, and M and the other containing the remaining tips]
Connect a branch of one to a branch of another
Here is that result:
[Figure: the rearranged tree]
Greedy search by sequential addition
[Figure: trees built by adding D and then E to the tree of A, B, and C; the parsimony scores shown are 8, 7, and 9 for the three 4-species trees and 11, 9, 9, 9, and 9 for the five 5-species trees]
Greedy search by addition of species in a fixed order (A, B, C, D, E) in the best place each time.
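Sequential addition can be sketched with a small parsimony scorer. The example below counts changes with the Fitch algorithm on rooted nested-tuple trees (the count does not depend on where the root is placed) and adds each species in the best-scoring place. The data set is made up, and all function names are assumptions for the example rather than anything from the lecture.

```python
def fitch(tree, states):
    """Return (possible root states, number of changes) for one character."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, data):
    """Sum the Fitch change counts over all characters (sites)."""
    n_sites = len(next(iter(data.values())))
    return sum(
        fitch(tree, {tip: seq[i] for tip, seq in data.items()})[1]
        for i in range(n_sites)
    )


def add_everywhere(tree, new_tip):
    """Yield every tree made by attaching new_tip to one branch of tree."""
    yield (tree, new_tip)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_everywhere(left, new_tip):
            yield (new_left, right)
        for new_right in add_everywhere(right, new_tip):
            yield (left, new_right)


def sequential_addition(order, data):
    """Add species in the given order, each in its best-scoring place."""
    tree = (order[0], order[1])
    for tip in order[2:]:
        tree = min(add_everywhere(tree, tip),
                   key=lambda t: parsimony_score(t, data))
    return tree


# Hypothetical data: two binary characters for four species.
data = {"A": "00", "B": "00", "C": "11", "D": "11"}
best = sequential_addition(["A", "B", "C", "D"], data)
print(best, parsimony_score(best, data))
```

Because each species is placed without revisiting earlier choices, the result can depend on the order of addition and need not be the most parsimonious tree.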
Goloboff’s time-saving trick
[Figure: a tree divided into subtrees labeled by ranges of tips (A, B-G, H-K, L, M-R, S-U, V-Z)]
Goloboff's economy in computing scores of rearranged trees
Star decomposition
[Figure: trees of species A to F at successive stages of star decomposition]
"Star decomposition" search for best tree can happen in multiple ways
Disk-covering
[Figure: a tree of species A to F with branch lengths between 0.02 and 0.1]
"Disk covering": assembly of a tree from overlapping estimated subtrees
Shortest Hamiltonian path problem
[Figure: four panels, (a) to (d)]
Search tree for this problem
[Figure: a search tree that begins at "start" and adds one point at each level (add 1, add 2, add 3, ...); its tips are complete orderings such as (1,2,3,4,5,6,7,8,9,10), (1,2,3,4,5,6,7,8,10,9), (1,2,3,4,5,6,7,9,8,10), (1,2,3,4,5,6,7,9,10,8), (1,2,3,4,5,6,7,10,8,9), (1,2,3,4,5,6,7,10,9,8), etc.]
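A search tree like this one can be explored with branch and bound: extend a partial ordering one point at a time, and abandon it as soon as its length is no better than the best complete path found so far. The sketch below is a minimal version for points in the plane; the function name and the example points are invented for illustration.

```python
from math import dist, inf


def shortest_hamiltonian_path(points):
    """Branch and bound over orderings of points; returns (length, order)."""
    best_length, best_order = inf, None

    def extend(order, length, remaining):
        nonlocal best_length, best_order
        # Bound: adding points can only lengthen the path, so give up here.
        if length >= best_length:
            return
        if not remaining:
            best_length, best_order = length, order
            return
        # Branch: try each unused point as the next one on the path.
        for p in sorted(remaining):
            step = dist(points[order[-1]], points[p]) if order else 0.0
            extend(order + [p], length + step, remaining - {p})

    extend([], 0.0, frozenset(range(len(points))))
    return best_length, best_order


# Hypothetical points on a line, listed out of order.
points = [(0, 0), (3, 0), (1, 0), (2, 0)]
print(shortest_hamiltonian_path(points))
```

The bound is valid because distances are never negative. The same idea carries over to trees when the score of a partial tree cannot decrease as more species are added.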
Search tree of trees
[Figure: a search tree whose nodes are trees, built up by adding species D and then E to the tree of A, B, and C in every possible place]
same, with parsimony scores in place of trees
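[Figure: the same search tree with each tree replaced by its parsimony score: 3 at the root; 9, 7, and 8 at the next level; and fifteen scores at the tips (one 8, five 9s, two 10s, and seven 11s)]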
Some references
Camin, J. H. and R. R. Sokal. 1965. A method for deducing branching sequences in phylogeny. Evolution 19: 311-326. [Early parsimony paper includes rearrangement of trees]
Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. Phylogenetic analysis: models and estimation procedures. American Journal of Human Genetics 19: 233-257. Also Evolution 21: 550-570. [Includes counting and tree shapes]
Farris, J. S. 1970. Methods for computing Wagner trees. Systematic Zoology 19: 83-92. [Early parsimony algorithms paper is one of first to mention sequential addition strategy]
Felsenstein, J. 1978. The number of evolutionary trees. Systematic Zoology 27: 27-33. (Correction, vol. 30, p. 122, 1981) [Review of counting tip-labelled trees, recursion for counting multifurcating case]
Foulds, L. R. and R. L. Graham. 1982. The Steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics 3: 43-49. [Parsimony is NP-hard]
Graham, R. L. and L. R. Foulds. 1982. Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time. Mathematical Biosciences 60: 133-142. [ ... and more]
Hendy, M. D. and D. Penny. 1982. Branch and bound algorithms to determine minimal evolutionary trees. Mathematical Biosciences 59: 277-290. [Introduced branch-and-bound for phylogenies]
continued
Huson, D., S. Nettles, L. Parida, T. Warnow, and S. Yooseph. 1998. The disk-covering method for tree reconstruction. Pp. 62-75 in Proceedings of "Algorithms and Experiments" (ALEX98), Trento, Italy, Feb. 9-11, 1998, ed. R. Battiti and A. A. Bertossi. ["Disk-covering method" for long stringy trees]
Maddison, D. R. 1991. The discovery and importance of multiple islands of most-parsimonious trees. Systematic Zoology 40: 315-328. [Discusses heuristic search strategy involving ties, multiple starts]
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406-425. [First mention of star-decomposition search for best trees, sort of]
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13: 964-969. [Assembles trees out of quartets]
Swofford, D. L. and G. J. Olsen. 1990. Phylogeny reconstruction. Chapter 11, pp. 411-501 in Molecular Systematics, ed. D. M. Hillis and C. Moritz. Sinauer Associates, Sunderland, Massachusetts. [Review that discusses strategies, names SPR and TBR rearrangement methods]
Waterman, M. S. and T. F. Smith. 1978. On the similarity of dendrograms. Journal of Theoretical Biology 73: 789-800. [Defines NNIs. Uses them to get a distance between trees.]
How it was done
This projection was produced
using the prosper style in LaTeX,
using LaTeX to make a .dvi file,
using dvips to turn this into a Postscript file,
using ps2pdf to mill it into a PDF file, and
displaying the slides in Adobe Acrobat Reader.
Result: nice slides using freeware.