The number of fully resolved trees taxa (with a root): n...


Page 1: The number of fully resolved trees taxa (with a root): n ...sites.fas.harvard.edu/~bio181/lectures/Lecture 11.pdf · Heuristic implementations • Initial tree (random, Wagner, etc.)

Number of taxa    Number of unrooted bifurcating trees
3                 1
4                 3
5                 15
10                2 x 10^6
20                2 x 10^20
50                2 x 10^74
100               2 x 10^182
1,000             2 x 10^2,860
10,000            2 x 10^38,658

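• The number of fully resolved trees for n taxa (with a root): $(2n-3)!! = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$, for $n \ge 2$

• The number of fully resolved trees for n taxa (without a root): $(2n-5)!! = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$, for $n \ge 3$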

Chapter 2 in: Felsenstein J. (2004). Inferring phylogenies. Sinauer Associates, Sunderland. 664 pp.
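
The counts in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the standard double-factorial formulas, (2n-5)!! for unrooted and (2n-3)!! for rooted bifurcating trees; the function names are illustrative and do not come from the lecture.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees on n >= 3 labeled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


def num_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees on n >= 2 labeled taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Rooting an unrooted tree amounts to adding one more "taxon" (the root), which is why the rooted count for n taxa equals the unrooted count for n + 1.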


Tree “building” methods

• Purely algorithmic methods (e.g., neighbor-joining)

• Methods based on an optimality criterion

– Maximum likelihood

– Parsimony

– Bayesian phylogenetics


Basic concepts: tree searches based on optimality criteria

• Exact algorithms
  – Exhaustive searches
    All possible trees are evaluated
  – Branch-and-bound (Hendy & Penny 1982)
    Does not require evaluating all possible trees, but guarantees an optimal solution

• Heuristic algorithms
  Do not guarantee finding an optimal solution
  – Initial tree building (Wagner addition)
    • Single starting points
      – Stepwise addition of taxa
      – Star decomposition methods
    • Multiple starting points: random addition sequence (RAS)
  – Branch swapping
    • Subtree pruning and regrafting (SPR)
    • Tree bisection and reconnection (TBR)
  – Other methods for refining trees
    • Sectorial searches (Goloboff 1999)
    • Tree-drifting (Goloboff 1999)
    • Tree-fusing (Goloboff 1999)
    • Ratchet (Nixon 1999)

ISLANDS


Islands of trees

Maddison DR (1991) The discovery and importance of multiple islands of most-parsimonious trees. Systematic Zoology 40:315-328


Exhaustive enumeration of all possible (unrooted) trees for 5 taxa (15 trees)
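
An exhaustive enumeration like this one can be sketched by attaching each new taxon to every branch of every tree built so far. The code below is an illustration only: trees are written as nested tuples, the first taxon is held at the root position so that each unrooted tree appears exactly once, and the names `insert_everywhere` and `all_unrooted_trees` are hypothetical.

```python
def insert_everywhere(tree, taxon):
    """Yield each tree made by attaching `taxon` to one branch of `tree`."""
    # Attach on the branch leading to this node.
    yield (tree, taxon)
    # For an internal node, also try every branch below it.
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def all_unrooted_trees(taxa):
    """Return every unrooted bifurcating tree on `taxa` (at least 3 taxa).

    Each tree is written as (first_taxon, rooted_subtree_on_the_rest).
    """
    taxa = list(taxa)
    if len(taxa) < 3:
        raise ValueError("need at least 3 taxa")
    anchor, second, *rest = taxa
    subtrees = [second]
    for taxon in rest:
        subtrees = [new_tree
                    for tree in subtrees
                    for new_tree in insert_everywhere(tree, taxon)]
    return [(anchor, subtree) for subtree in subtrees]


trees = all_unrooted_trees("ABCDE")
print(len(trees))  # 15
for tree in trees:
    print(tree)
```

Each added taxon multiplies the number of trees by the number of branches available, which is where the factorial-like growth in the table of tree counts comes from.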


Branch-and-bound algorithm
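
A rough sketch of the branch-and-bound idea, assuming parsimony as the optimality criterion and Fitch's counting procedure for tree length (both are choices made for this example, not details taken from the slide). Taxa are added one at a time; because adding a taxon can never shorten a tree, any partial tree already longer than the best complete tree is abandoned together with all of its descendants. The data set `TOY_DATA` is made up.

```python
def insert_everywhere(tree, taxon):
    """Yield each tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def fitch_steps(tree, states):
    """Fitch pass for one character: return (state set, number of steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, data):
    """Parsimony length of `tree`, summed over all aligned sites."""
    n_sites = len(next(iter(data.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: seq[site] for taxon, seq in data.items()}
        total += fitch_steps(tree, states)[1]
    return total


def branch_and_bound(data):
    """Return (best length, list of all optimal trees) for >= 3 taxa."""
    taxa = list(data)
    best = {"length": float("inf"), "trees": []}

    def extend(subtree, next_index):
        length = tree_length((taxa[0], subtree), data)
        if length > best["length"]:
            return  # bound: no completion of this partial tree can win
        if next_index == len(taxa):
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append((taxa[0], subtree))
            return
        for bigger in insert_everywhere(subtree, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend(taxa[1], 2)
    return best["length"], best["trees"]


# Made-up alignment: taxa a and b share states, as do c and d.
TOY_DATA = {"a": "AAG", "b": "AAG", "c": "CCG", "d": "CCT"}
print(branch_and_bound(TOY_DATA))
```

In practice the bound is most effective when a good tree is found early, so implementations usually start from a heuristic solution rather than from an infinite bound as done here.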


Heuristic searches

• Wagner tree (or other starting point)

• Branch swapping

– Subtree pruning and regrafting (SPR)

– Tree bisection and reconnection (TBR)

– Tree fusing (genetic algorithms)

– Tree drifting (simulated annealing)

– Sectorial searches (divide and conquer techniques)

– Ratcheting


Wagner algorithm

Random seed derived from system time: -985792416

default outgroup: Lineus_bilineatus

addition sequence 'as is'

outgroup is Lineus_bilineatus, add node Neocrania_anomala ->41

add node 3/128 Phascolion_strombi discrepancy: real cost 75 calculated cost 66 ->75 1 tree

add node 4/128 Chaetoderma_nitidulum ->110 1 tree

add node 5/128 Loxosomella_murmanica ->154 1 tree

add node 6/128 Lepidopleurus_cajetanus discrepancy: real cost 182 calculated cost 187 ->182 1 tree

add node 7/128 Leptochiton_asellus discrepancy: real cost 209 calculated cost 210 ->209 1 tree

add node 8/128 Callochiton_septemvalvis ->259discrepancy: real cost 252 calculated cost 255 ->252 1 tree

add node 9/128 Chaetopleura_apiculata discrepancy: real cost 271 calculated cost 274 ->271 1 tree

add node 10/128 Callistochiton_antiquus ->287 1 tree

add node 11/128 Lorica_volvox ->314->309 1 tree

add node 12/128 Chiton_olivaceus discrepancy: real cost 317 calculated cost 318 ->317 1 tree

add node 13/128 Mopalia_muscosa ->355->350->339 1 tree

add node 14/128 Tonicella_lineata ->373->368->367discrepancy: real cost 351 calculated cost 352 ->351 1 tree

add node 15/128 Acanthochitona_crinita ->381->373->366 2 trees

add node 16/128 Cryptochiton_stelleri ->399->392discrepancy: real cost 374 calculated cost 373

->374discrepancy: real cost 374 calculated cost 373 2 trees

add node 17/128 Rhabdus_rectius ->425->421discrepancy: real cost 418 calculated cost 420 ->418 1 tree

add node 18/128 Dentalium_inaequicostatum ->458discrepancy: real cost 447 calculated cost 448

->447->445discrepancy: real cost 443 calculated cost 437 ->443->442discrepancy: real cost 439 calculated cost 440 ->439 1 tree

add node 19/128 Antalis_entalis ->496->492discrepancy: real cost 491 calculated cost 492 ->491->483 1 tree

add node 20/128 Alcadia_dyssonia discrepancy: real cost 528 calculated cost 529

->528discrepancy: real cost 521 calculated cost 524

->521discrepancy: real cost 512 calculated cost 515

->512discrepancy: real cost 509 calculated cost 512 ->509 1 tree
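
The log above shows taxa being added one at a time, each in the position that gives the lowest cost. A much simplified sketch of that stepwise-addition idea follows. It assumes Fitch parsimony as the cost and rescores every candidate tree from scratch, unlike the program that produced the log, which evidently estimates costs incrementally. All names and the data set `TOY_DATA` are made up for illustration.

```python
def insert_everywhere(tree, taxon):
    """Yield each tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def fitch_steps(tree, states):
    """Fitch pass for one character: return (state set, number of steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, data):
    """Parsimony length of `tree`, summed over all aligned sites."""
    n_sites = len(next(iter(data.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: seq[site] for taxon, seq in data.items()}
        total += fitch_steps(tree, states)[1]
    return total


def stepwise_addition(data, order):
    """Greedily build one tree, adding taxa in `order` (>= 3 taxa).

    The first taxon in `order` plays the role of the outgroup at the root.
    """
    outgroup, subtree, *rest = order
    for taxon in rest:
        # Keep only the cheapest placement of the new taxon.
        subtree = min(insert_everywhere(subtree, taxon),
                      key=lambda t: tree_length((outgroup, t), data))
    tree = (outgroup, subtree)
    return tree, tree_length(tree, data)


# Made-up alignment; the addition sequence is "as is".
TOY_DATA = {"a": "AAG", "b": "AAG", "c": "CCG", "d": "CCT"}
print(stepwise_addition(TOY_DATA, list(TOY_DATA)))
```

Because only the best placement is kept at each step, the result depends on the addition sequence and is not guaranteed to be optimal, which is why the tree is normally refined afterwards by branch swapping.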


SPR branch swapping TBR branch swapping

Allen, B. L. and M. Steel. 2001. Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics 5: 1-15.
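
To make the SPR move concrete, here is a small sketch that lists the SPR neighbors of a rooted tree written as nested tuples: each subtree is pruned in turn and regrafted on every branch of what remains. This is an assumption-laden simplification. It works on rooted trees only, whereas on an unrooted tree either side of the cut branch can act as the pruned piece, and TBR (which also re-roots the pruned piece before reconnecting it) is not covered. All function names are hypothetical.

```python
def insert_everywhere(tree, piece):
    """Yield each tree made by attaching `piece` to one branch of `tree`."""
    yield (tree, piece)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, piece):
            yield (new_left, right)
        for new_right in insert_everywhere(right, piece):
            yield (left, new_right)


def proper_subtrees(tree, is_root=True):
    """Yield every subtree (leaves included) except the whole tree."""
    if not is_root:
        yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from proper_subtrees(child, is_root=False)


def prune(tree, target):
    """Return `tree` with subtree `target` removed, or None if absent."""
    if not isinstance(tree, tuple):
        return None
    left, right = tree
    if left == target:
        return right  # the sibling takes the place of the parent node
    if right == target:
        return left
    pruned = prune(left, target)
    if pruned is not None:
        return (pruned, right)
    pruned = prune(right, target)
    if pruned is not None:
        return (left, pruned)
    return None


def canonical(tree):
    """Order-independent form, so (x, y) and (y, x) compare equal."""
    if isinstance(tree, tuple):
        return frozenset(canonical(child) for child in tree)
    return tree


def spr_neighbors(tree):
    """Return the distinct trees one rooted SPR move away from `tree`."""
    seen = {canonical(tree)}
    neighbors = []
    for piece in proper_subtrees(tree):
        remainder = prune(tree, piece)
        for candidate in insert_everywhere(remainder, piece):
            key = canonical(candidate)
            if key not in seen:
                seen.add(key)
                neighbors.append(candidate)
    return neighbors


start = (("a", "b"), ("c", "d"))
for neighbor in spr_neighbors(start):
    print(neighbor)
```

A branch-swapping search would score each neighbor, move to a better one when found, and repeat until no neighbor improves the score.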


Search strategies

• Number of starting trees

• Number of trees to swap per replicate

• Number of trees to swap in total

• Algorithms to use

• “Stopping rules”

• Parallelism in systematics


Traditional searches

• Random addition sequence followed by some sort of tree refining technique (e.g., SPR and/or TBR)

• Cannot deal with problems of composite optima, as in large data sets (> 150 taxa)

“Large phylogeny estimation is a combinatorial optimization problem that no future computer will ever be able to solve exactly in practical computing time. The difficulty of the problem is amplified by the need to use complex evolutionary models and large taxon samplings. Hence, many heuristic approaches have been developed, with varying degrees of success.”

Lemmon & Milinkovitch 2002


Tree estimation is an NP-hard problem

Heuristic implementations

• Initial tree (random, Wagner, etc.)

• A tree-refining technique (SPR, TBR, ratchet, tree-fusing, sectorial searches, tree-drifting, DCM…)

• Repeat the process multiple times (ideally converging towards a solution)

Applications of parallel computing

• Multiple starting points (replicates)

– Sequential

– Parallel

• Refining techniques

– Sequential

– Parallel

• Spawning the jobs

• Fault tolerance

• Achieving linearity
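
Independent replicates from multiple starting points are the easiest part to parallelize, since they do not need to communicate. The sketch below runs several random-addition-sequence replicates through a worker pool and keeps the shortest tree. It assumes Fitch parsimony as the score, omits the refining step, and uses a thread pool only to show the structure; for CPU-bound Python code a process pool or separate jobs would be needed to get a real speedup. All names and the data set `TOY_DATA` are made up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def insert_everywhere(tree, taxon):
    """Yield each tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def fitch_steps(tree, states):
    """Fitch pass for one character: return (state set, number of steps)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, data):
    """Parsimony length of `tree`, summed over all aligned sites."""
    n_sites = len(next(iter(data.values())))
    return sum(
        fitch_steps(tree, {t: seq[site] for t, seq in data.items()})[1]
        for site in range(n_sites)
    )


def replicate(data, seed):
    """One replicate: shuffle the taxa, then add them greedily."""
    order = list(data)
    random.Random(seed).shuffle(order)  # seeded, so runs are repeatable
    anchor, subtree, *rest = order
    for taxon in rest:
        subtree = min(insert_everywhere(subtree, taxon),
                      key=lambda t: tree_length((anchor, t), data))
    tree = (anchor, subtree)
    return tree_length(tree, data), tree


def parallel_search(data, seeds, workers=4):
    """Run one replicate per seed in a pool; return the best (length, tree)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda seed: replicate(data, seed), seeds))
    return min(results, key=lambda result: result[0])


TOY_DATA = {"a": "AAG", "b": "AAG", "c": "CCG", "d": "CCT"}
print(parallel_search(TOY_DATA, seeds=range(8)))
```

Since each replicate is independent, a failed worker costs only its own replicate, and adding workers should shorten the wall-clock time almost proportionally as long as there are more replicates than workers.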


Levels of parallelism in phylogenetic reconstruction

Technique            Efficiency
Tree building
• “Sneaker”          100% linearity
• Multibuild         100% linearity
• Parallel build     communication tradeoff


P, NP, and NPC

• Easy problems (P): problems for which a polynomial-time algorithm exists, that is, an algorithm that solves the problem in time O(n^k) for some constant k.

• Hard problems (NP, or nondeterministic polynomial): a proposed solution can be verified in polynomial time, but for the hardest of these problems no polynomial-time algorithm for finding a solution is known. NP-complete problems (NPC) exist in a nether world where no polynomial-time solution is known, but there is no proof that none exists either. These problems are frequently combinatorially explosive, with solution spaces increasing at a factorial pace. Examples include traveling salesman, circuit design, scheduling, and phylogenetic tree search and alignment.