Conducting efficient tree searches - Harvard...


Harvard University, sites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdf

Conducting efficient tree searches

• Algorithms
  – Hill climbing “traditional” algorithms
    • SPR (subtree pruning and regrafting), TBR (tree bisection and reconnection)
    • Ratchet
    • Optimizing branches in ML analyses
  – Genetic algorithms
  – Divide and conquer algorithms
  – Simulated annealing algorithms

• Strategies
  – Tree buffers
  – Constrained searches
  – Previous searches (“pre-processed searches” or “jumpstarting phylogenetics”)
  – SATF (sensitivity analysis tree fusing)


Parsimony ratchet (island hopper)

1. Generate a starting tree (e.g. a Wagner tree followed by some level of branch swapping).

2. Re-weight a randomly selected subset of characters (e.g. give a weight of 2 to 50% of the characters and 1 to the remaining 50%).

3. Search from the current tree under these perturbed weights, holding only one tree. Any kind of swapping strategy may be used.

4. Reset all characters to their original weights and swap on the tree found in step 3.

5. Return to step 2 to begin another iteration, starting from the tree found in step 4. Continue this cycle for N iterations (e.g. 20, 50, 100…).
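The ratchet loop can be sketched in Python. This is a minimal, unoptimized sketch under stated assumptions, not the implementation in any program named in this lecture: trees are rooted binary nested tuples of taxon names, the score is weighted Fitch parsimony, the caller supplies the starting tree (no Wagner tree is built), and a simple SPR hill climber stands in for “any kind of swapping strategy” (real programs typically use TBR). All function names are hypothetical.

```python
import random

# Assumed representation: a rooted binary tree is a nested tuple of taxon
# names, e.g. (("A", "B"), ("C", "D")); `matrix` maps taxon -> state string.

def fitch_length(tree, matrix, weights):
    """Weighted Fitch parsimony length (the root position does not affect it)."""
    total = 0
    for c, w in enumerate(weights):
        cost = 0

        def state_set(node):
            nonlocal cost
            if isinstance(node, str):
                return {matrix[node][c]}
            left, right = state_set(node[0]), state_set(node[1])
            if left & right:
                return left & right
            cost += 1  # a union is needed: one extra step for this character
            return left | right

        state_set(tree)
        total += w * cost
    return total


def all_subtrees(node):
    yield node
    if not isinstance(node, str):
        yield from all_subtrees(node[0])
        yield from all_subtrees(node[1])


def prune(node, target):
    """Remove `target`; its sibling takes the place of their parent."""
    if node == target:
        return None
    if isinstance(node, str):
        return node
    left, right = prune(node[0], target), prune(node[1], target)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def graft_all(node, sub):
    """Yield every tree obtained by attaching `sub` to one edge of `node`."""
    yield (node, sub)
    if not isinstance(node, str):
        for left in graft_all(node[0], sub):
            yield (left, node[1])
        for right in graft_all(node[1], sub):
            yield (node[0], right)


def spr_neighbors(tree):
    for sub in list(all_subtrees(tree))[1:]:  # skip the whole tree
        yield from graft_all(prune(tree, sub), sub)


def hill_climb(tree, matrix, weights):
    """Take improving SPR moves until none is left, holding a single tree."""
    best = fitch_length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for cand in spr_neighbors(tree):
            score = fitch_length(cand, matrix, weights)
            if score < best:
                tree, best, improved = cand, score, True
                break
    return tree, best


def ratchet(start, matrix, n_iter=20, frac=0.5, seed=0):
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    original = [1] * n_chars
    tree, score = hill_climb(start, matrix, original)  # step 1: swap on the start tree
    best_tree, best_score = tree, score
    for _ in range(n_iter):
        # Step 2: weight 2 for a random fraction of characters, 1 for the rest.
        chosen = set(rng.sample(range(n_chars), int(frac * n_chars)))
        perturbed = [2 if c in chosen else 1 for c in range(n_chars)]
        tree, _ = hill_climb(tree, matrix, perturbed)     # step 3
        tree, score = hill_climb(tree, matrix, original)  # step 4
        if score < best_score:
            best_tree, best_score = tree, score
        # Step 5: the next iteration starts from the tree found in step 4.
    return best_tree, best_score
```

The perturbed search in step 3 is what lets the single held tree leave its current island: under the temporary weights, a different topology may be better, and step 4 then re-optimizes it under the real weights.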

Nixon, K. C. 1999. The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414.

Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology 52: 368-373.


• Tree-Fusing (TF)
  • Exchanges subgroups (e.g. of 5 taxa) between different trees. The subgroups must be of identical taxon composition.

1. Obtain several trees via some kind of tree search.

2. Randomly select a tree (the “target” tree).

3. Randomly select one of the remaining trees (the “source” tree).

4. Evaluate the effect of moving each clade of the source tree into the target tree (in place of the target's clade of identical composition), keeping the exchanges that improve the score.

5. Repeat several times (“rounds”), e.g. 3 to 5.

• Example strategy: 10 RAS (random addition sequence) replicates + TBR + TF
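A minimal Python sketch of the fusing steps, under stated assumptions: trees are rooted binary nested tuples, all rooted the same way so that clades are comparable; the score is unweighted Fitch parsimony; and an exchange is kept only if it shortens the target. Function names are hypothetical, and this is not the TNT or POY implementation.

```python
import random


def fitch_length(tree, matrix):
    """Unweighted Fitch length of a rooted binary nested-tuple tree."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for c in range(n_chars):
        def walk(node):
            if isinstance(node, str):
                return {matrix[node][c]}, 0
            (left, lc), (right, rc) = walk(node[0]), walk(node[1])
            if left & right:
                return left & right, lc + rc
            return left | right, lc + rc + 1
        total += walk(tree)[1]
    return total


def leafset(node):
    if isinstance(node, str):
        return frozenset([node])
    return leafset(node[0]) | leafset(node[1])


def clades(node):
    """Yield every internal node of the tree, root first."""
    if not isinstance(node, str):
        yield node
        yield from clades(node[0])
        yield from clades(node[1])


def replace_clade(node, taxa, new_sub):
    """Copy of `node` with the subtree whose taxon set equals `taxa` swapped for `new_sub`."""
    if leafset(node) == taxa:
        return new_sub
    if isinstance(node, str):
        return node
    return (replace_clade(node[0], taxa, new_sub), replace_clade(node[1], taxa, new_sub))


def fuse(trees, matrix, rounds=3, seed=0):
    rng = random.Random(seed)
    trees = list(trees)                    # step 1: trees from earlier searches
    for _ in range(rounds):                # step 5: repeat for several rounds
        ti = rng.randrange(len(trees))     # step 2: target tree
        si = rng.choice([i for i in range(len(trees)) if i != ti])  # step 3: source
        target, best = trees[ti], fitch_length(trees[ti], matrix)
        for clade in list(clades(trees[si]))[1:]:  # step 4: each non-root source clade
            taxa = leafset(clade)
            # Exchange only subgroups of identical composition.
            if any(leafset(c) == taxa for c in clades(target)):
                candidate = replace_clade(target, taxa, clade)
                score = fitch_length(candidate, matrix)
                if score < best:           # keep improving exchanges only
                    target, best = candidate, score
        trees[ti] = target
    return min(trees, key=lambda t: fitch_length(t, matrix))
```

For example, if one tree resolves {A, B, C} well and another resolves {D, E, F} well, one exchange can yield a tree shorter than both inputs.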

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.

Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree search. Cladistics 17: S12-S25.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds), Techniques in Molecular Systematics and Evolution. Birkhäuser Verlag, Basel, pp. 70-79.

Lemmon, A. R. & Milinkovitch, M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.


• Sectorial Searches (SS)
  • Start from an existing tree and reanalyze sectors (portions of the tree) separately. Sectors can be selected randomly or based on a consensus:
    – Random Sectorial Searches (RSS)
    – Consensus-Based Sectorial Searches (CSS)
    – Mixed Sectorial Searches (MSS)
  • Example strategy: RAS + TBR + SS

• Tree-Drifting (DFT)
  • Accepts suboptimal solutions with a certain probability (similar in spirit to simulated annealing)

• Combined strategies: RAS + TBR + SS + DFT + TF
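The idea of accepting suboptimal solutions with some probability can be illustrated with a generic Metropolis rule as used in simulated annealing. This is a swapped-in technique for illustration: Goloboff's tree drifting uses its own acceptance criterion, which is not reproduced here. `neighbors` and `score` are hypothetical callables; in a tree search they would return SPR/TBR rearrangements and tree length.

```python
import math
import random


def accept(delta, temperature, rng):
    """Metropolis rule: always accept equal or better moves; accept a move that is
    worse by `delta` with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def drift(start, neighbors, score, n_steps=1000, temperature=1.0, seed=0):
    """Random walk that sometimes accepts worse solutions,
    while remembering the best solution seen so far."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(n_steps):
        candidate = rng.choice(neighbors(current))
        candidate_score = score(candidate)
        if accept(candidate_score - current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score
```

Higher temperatures let the walk cross wider valleys between tree islands; tracking `best` separately means accepting worse trees never loses the best one found.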

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds), Techniques in Molecular Systematics and Evolution. Birkhäuser Verlag, Basel, pp. 70-79.

Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees. In Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004): 12.


Strategies

• Tree buffers

• Constrained searches

• “Pre-processed searches”

• Sensitivity analysis output + TF
  – Generate a diversity of cladograms under different parameters/models (these need not be full searches)
  – Collect all trees in a file
  – Submit the trees to tree fusing and other refining algorithms

• Other strategies: bootstrapping or jackknifing trees
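Bootstrapping or jackknifing is one cheap way to obtain a diverse set of trees to feed into tree fusing. A minimal sketch, assuming characters are resampled by giving them integer weights; the removal probability of 0.36 and all function names are illustrative assumptions. The quick search run under each weight set, and the fusing step itself, are not shown.

```python
import random


def bootstrap_weights(n_chars, rng):
    """Draw n_chars characters with replacement; weight = number of times drawn."""
    weights = [0] * n_chars
    for _ in range(n_chars):
        weights[rng.randrange(n_chars)] += 1
    return weights


def jackknife_weights(n_chars, rng, p_remove=0.36):
    """Delete each character independently with probability p_remove (weight 0)."""
    return [0 if rng.random() < p_remove else 1 for _ in range(n_chars)]


def perturbed_weight_sets(n_chars, n_reps, method="jackknife", seed=0):
    """One weight vector per replicate; each would drive a quick (not full) search."""
    rng = random.Random(seed)
    make = jackknife_weights if method == "jackknife" else bootstrap_weights
    return [make(n_chars, rng) for _ in range(n_reps)]
```

The trees found under these weight sets would then be collected in one file and submitted to tree fusing under the original weights.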


Multiple trees or multiple hits?

• Driven searches

– Minimum number of hits to optimal trees

– Achieving a stable consensus

– Consensus techniques
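A driven search based on “minimum number of hits” can be sketched as: keep running independent replicates (e.g. RAS + TBR) until the best score found so far has been reached a set number of times, resetting the count whenever a better score appears. A minimal sketch; `replicate` is a hypothetical callable returning `(tree, score)`, and real programs' stopping rules may differ in detail.

```python
import random


def driven_search(replicate, min_hits=5, max_reps=1000, seed=0):
    """Run replicates until the best score has been hit `min_hits` times."""
    rng = random.Random(seed)
    best_score, hits, trees = None, 0, []
    for rep in range(1, max_reps + 1):
        tree, score = replicate(rng)
        if best_score is None or score < best_score:
            best_score, hits, trees = score, 1, [tree]  # new best: reset the count
        elif score == best_score:
            hits += 1
            if tree not in trees:  # collect distinct optimal trees
                trees.append(tree)
        if hits >= min_hits:
            break
    return best_score, trees, rep
```

Several hits to the same score suggest (but do not prove) that the optimum has been found; the distinct trees collected can then be summarized with a consensus.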


Software implementations

• Ratchet: WinClada, TNT, POY, PAUP scripts (PRAP or PAUPRat)

• Tree fusing: TNT, POY, MetaPIGA

• Tree drifting: TNT

• Sectorial searches/DCM: TNT, POY, Rec-I-DCM3

• Constrained searches: most software packages

• Driven searches:
  – Hits to optimal trees: TNT, POY
  – Stabilize consensus: TNT

Page 13: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Conducting efficient tree searches

• Algorithms– Hill climbing “traditional” algorithms

• SPR, TBR

• Ratchet

• Optimizing branches in ML analyses

– Genetic algorithms

– Divide and conquer algorithms

– Simulated annealing algorithms

• Strategies– Tree buffers

– Constrained searches

– Previous searches “pre-processed searches” or “jumpstartingphylogenetics”

– SATF (sensitivity analysis tree fusing)

Page 14: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 15: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 16: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Parsimony ratchet (island hopper)

1. Generate a starting tree (e.g. a Wagner tree followed by somelevel of branch swapping)

2. Re-weight a randomly selected subset of characters (e.g. givea weight of 2 to 50% of the characters, and 1 to the remaining50%)

3. Search on the current tree (holding only one tree). Any kind ofswapping strategy may be used.

4. Re-weight all the characters back to the original weights, andswap on the tree found in step 3.

5. Return to step 2 to begin another iteration starting with the treefound in step 4. Continue this cycle for N iterations (e.g., 20,50, 100…)

Nixon, K. C. 1999 The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15,

407-414.

Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology

52: 368-373.

Page 17: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 18: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree-Fusing (TF)• Exchanges subgroups (e.g. 5 taxa) between different trees. The

subgroups must be of identical composition.

1. Obtain several trees via some sort of tree search

2. Randomly select a tree (the “target” tree)

3. Randomly select one of the remaining trees (the “source” tree)

4. Evaluate the results of moving each clade in the source tree to the target tree

5. Repeat several times (“rounds”) (e.g. 3 to 5)

• 10 RAS + TBR + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree

search. Cladistics 17: S12-S25.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Lemmon, A. R. & Milinkovitch M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the

problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.

Page 19: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 20: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Sectorial Searches (SS)

• Need a tree as starting point and reanalyze sectors separately. Sectors can be

selected randomly or based on a consensus

– Random Sectorial Searches (RSS)

– Consensus-Based Sectorial Searches (CSS)

– Mixed Sectorial Searches (MSS)

• RAS + TBR + SS

• Tree-Drifting (DFT)

• Accepts suboptimal solutions with a certain probability (simulated annealing)

• Combined strategies: RAS + TBR + SS + DFT + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for

reconstructing large phylogenetic trees In Proceedings of the 2004 IEEE Computational Systems

Bioinformatics Conference (CSB 2004): 12.

Page 21: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree buffers

• Constrained searches

• “Pre-processed searches”

• Sensitivity Analysis output + TF

– Generate a diversity of cladograms under different parameters/models

(don’t need to be full searches)

– Collect all trees in a file

– Submit the trees to tree fusing and other refining algorithms

• Other strategies: bootstrapping or jackknifing trees

Strategies

Page 22: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Multiple trees or multiple hits?

• Driven searches

– Minimum number of hits to optimal trees

– Achieving a stable consensus

– Consensus techniques

Page 23: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 24: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Software implementations

• Ratchet: WinClada, TNT, POY, PAUP scripts:PRAP or PAUPRat

• Tree Fusing: TNT, POY, MetaPIGA

• Tree Drifting: TNT

• Sectorial Searches/DCM: TNT, POY, Rec-I-DCM3

• Constrained searches: Most softwarepackages

• Driven searches:– Hits to optimal trees: TNT, POY

– Stabilize consensus: TNT

Page 25: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Conducting efficient tree searches

• Algorithms– Hill climbing “traditional” algorithms

• SPR, TBR

• Ratchet

• Optimizing branches in ML analyses

– Genetic algorithms

– Divide and conquer algorithms

– Simulated annealing algorithms

• Strategies– Tree buffers

– Constrained searches

– Previous searches “pre-processed searches” or “jumpstartingphylogenetics”

– SATF (sensitivity analysis tree fusing)

Page 26: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 27: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 28: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Parsimony ratchet (island hopper)

1. Generate a starting tree (e.g. a Wagner tree followed by somelevel of branch swapping)

2. Re-weight a randomly selected subset of characters (e.g. givea weight of 2 to 50% of the characters, and 1 to the remaining50%)

3. Search on the current tree (holding only one tree). Any kind ofswapping strategy may be used.

4. Re-weight all the characters back to the original weights, andswap on the tree found in step 3.

5. Return to step 2 to begin another iteration starting with the treefound in step 4. Continue this cycle for N iterations (e.g., 20,50, 100…)

Nixon, K. C. 1999 The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15,

407-414.

Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology

52: 368-373.

Page 29: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 30: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree-Fusing (TF)• Exchanges subgroups (e.g. 5 taxa) between different trees. The

subgroups must be of identical composition.

1. Obtain several trees via some sort of tree search

2. Randomly select a tree (the “target” tree)

3. Randomly select one of the remaining trees (the “source” tree)

4. Evaluate the results of moving each clade in the source tree to the target tree

5. Repeat several times (“rounds”) (e.g. 3 to 5)

• 10 RAS + TBR + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree

search. Cladistics 17: S12-S25.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Lemmon, A. R. & Milinkovitch M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the

problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.

Page 31: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 32: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Sectorial Searches (SS)

• Need a tree as starting point and reanalyze sectors separately. Sectors can be

selected randomly or based on a consensus

– Random Sectorial Searches (RSS)

– Consensus-Based Sectorial Searches (CSS)

– Mixed Sectorial Searches (MSS)

• RAS + TBR + SS

• Tree-Drifting (DFT)

• Accepts suboptimal solutions with a certain probability (simulated annealing)

• Combined strategies: RAS + TBR + SS + DFT + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for

reconstructing large phylogenetic trees In Proceedings of the 2004 IEEE Computational Systems

Bioinformatics Conference (CSB 2004): 12.

Page 33: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree buffers

• Constrained searches

• “Pre-processed searches”

• Sensitivity Analysis output + TF

– Generate a diversity of cladograms under different parameters/models

(don’t need to be full searches)

– Collect all trees in a file

– Submit the trees to tree fusing and other refining algorithms

• Other strategies: bootstrapping or jackknifing trees

Strategies

Page 34: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Multiple trees or multiple hits?

• Driven searches

– Minimum number of hits to optimal trees

– Achieving a stable consensus

– Consensus techniques

Page 35: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 36: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Software implementations

• Ratchet: WinClada, TNT, POY, PAUP scripts:PRAP or PAUPRat

• Tree Fusing: TNT, POY, MetaPIGA

• Tree Drifting: TNT

• Sectorial Searches/DCM: TNT, POY, Rec-I-DCM3

• Constrained searches: Most softwarepackages

• Driven searches:– Hits to optimal trees: TNT, POY

– Stabilize consensus: TNT

Page 37: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Conducting efficient tree searches

• Algorithms– Hill climbing “traditional” algorithms

• SPR, TBR

• Ratchet

• Optimizing branches in ML analyses

– Genetic algorithms

– Divide and conquer algorithms

– Simulated annealing algorithms

• Strategies– Tree buffers

– Constrained searches

– Previous searches “pre-processed searches” or “jumpstartingphylogenetics”

– SATF (sensitivity analysis tree fusing)

Page 38: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 39: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 40: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Parsimony ratchet (island hopper)

1. Generate a starting tree (e.g. a Wagner tree followed by somelevel of branch swapping)

2. Re-weight a randomly selected subset of characters (e.g. givea weight of 2 to 50% of the characters, and 1 to the remaining50%)

3. Search on the current tree (holding only one tree). Any kind ofswapping strategy may be used.

4. Re-weight all the characters back to the original weights, andswap on the tree found in step 3.

5. Return to step 2 to begin another iteration starting with the treefound in step 4. Continue this cycle for N iterations (e.g., 20,50, 100…)

Nixon, K. C. 1999 The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15,

407-414.

Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology

52: 368-373.

Page 41: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 42: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree-Fusing (TF)• Exchanges subgroups (e.g. 5 taxa) between different trees. The

subgroups must be of identical composition.

1. Obtain several trees via some sort of tree search

2. Randomly select a tree (the “target” tree)

3. Randomly select one of the remaining trees (the “source” tree)

4. Evaluate the results of moving each clade in the source tree to the target tree

5. Repeat several times (“rounds”) (e.g. 3 to 5)

• 10 RAS + TBR + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree

search. Cladistics 17: S12-S25.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Lemmon, A. R. & Milinkovitch M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the

problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.

Page 43: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 44: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Sectorial Searches (SS)

• Need a tree as starting point and reanalyze sectors separately. Sectors can be

selected randomly or based on a consensus

– Random Sectorial Searches (RSS)

– Consensus-Based Sectorial Searches (CSS)

– Mixed Sectorial Searches (MSS)

• RAS + TBR + SS

• Tree-Drifting (DFT)

• Accepts suboptimal solutions with a certain probability (simulated annealing)

• Combined strategies: RAS + TBR + SS + DFT + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for

reconstructing large phylogenetic trees In Proceedings of the 2004 IEEE Computational Systems

Bioinformatics Conference (CSB 2004): 12.

Page 45: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree buffers

• Constrained searches

• “Pre-processed searches”

• Sensitivity Analysis output + TF

– Generate a diversity of cladograms under different parameters/models

(don’t need to be full searches)

– Collect all trees in a file

– Submit the trees to tree fusing and other refining algorithms

• Other strategies: bootstrapping or jackknifing trees

Strategies

Page 46: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Multiple trees or multiple hits?

• Driven searches

– Minimum number of hits to optimal trees

– Achieving a stable consensus

– Consensus techniques

Page 47: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 48: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Software implementations

• Ratchet: WinClada, TNT, POY, PAUP scripts:PRAP or PAUPRat

• Tree Fusing: TNT, POY, MetaPIGA

• Tree Drifting: TNT

• Sectorial Searches/DCM: TNT, POY, Rec-I-DCM3

• Constrained searches: Most softwarepackages

• Driven searches:– Hits to optimal trees: TNT, POY

– Stabilize consensus: TNT

Page 49: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Conducting efficient tree searches

• Algorithms– Hill climbing “traditional” algorithms

• SPR, TBR

• Ratchet

• Optimizing branches in ML analyses

– Genetic algorithms

– Divide and conquer algorithms

– Simulated annealing algorithms

• Strategies– Tree buffers

– Constrained searches

– Previous searches “pre-processed searches” or “jumpstartingphylogenetics”

– SATF (sensitivity analysis tree fusing)

Page 50: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 51: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 52: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Parsimony ratchet (island hopper)

1. Generate a starting tree (e.g. a Wagner tree followed by somelevel of branch swapping)

2. Re-weight a randomly selected subset of characters (e.g. givea weight of 2 to 50% of the characters, and 1 to the remaining50%)

3. Search on the current tree (holding only one tree). Any kind ofswapping strategy may be used.

4. Re-weight all the characters back to the original weights, andswap on the tree found in step 3.

5. Return to step 2 to begin another iteration starting with the treefound in step 4. Continue this cycle for N iterations (e.g., 20,50, 100…)

Nixon, K. C. 1999 The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15,

407-414.

Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology

52: 368-373.

Page 53: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 54: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Tree-Fusing (TF)• Exchanges subgroups (e.g. 5 taxa) between different trees. The

subgroups must be of identical composition.

1. Obtain several trees via some sort of tree search

2. Randomly select a tree (the “target” tree)

3. Randomly select one of the remaining trees (the “source” tree)

4. Evaluate the results of moving each clade in the source tree to the target tree

5. Repeat several times (“rounds”) (e.g. 3 to 5)

• 10 RAS + TBR + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree

search. Cladistics 17: S12-S25.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Lemmon, A. R. & Milinkovitch M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the

problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.

Page 55: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”
Page 56: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

• Sectorial Searches (SS)

• Need a tree as starting point and reanalyze sectors separately. Sectors can be

selected randomly or based on a consensus

– Random Sectorial Searches (RSS)

– Consensus-Based Sectorial Searches (CSS)

– Mixed Sectorial Searches (MSS)

• RAS + TBR + SS

• Tree-Drifting (DFT)

• Accepts suboptimal solutions with a certain probability (simulated annealing)

• Combined strategies: RAS + TBR + SS + DFT + TF

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics

15: 415-428.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),

Techniques in Molecular Systematics and Evolution. Brikh隔ser Verlag, Basel, pp. 70-79.

Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for

reconstructing large phylogenetic trees In Proceedings of the 2004 IEEE Computational Systems

Bioinformatics Conference (CSB 2004): 12.

Page 57: Conducting efficient tree searches - Harvard Universitysites.fas.harvard.edu/~bio181/lectures/Lecture 12.pdfConducting efficient tree searches • Algorithms – Hill climbing “traditional”

Strategies

• Tree buffers

• Constrained searches

• “Pre-processed searches”

• Sensitivity Analysis output + TF

– Generate a diversity of cladograms under different parameters/models (these don’t need to be full searches)

– Collect all trees in a file

– Submit the trees to tree fusing and other refining algorithms

• Other strategies: bootstrapping or jackknifing trees
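The tree-fusing step of this recipe can be sketched as follows. Assumptions: the pool is a plain Python list standing in for the file of collected trees, trees are rooted nested tuples, and scores are unweighted Fitch lengths. In each round a random target and a random source tree are drawn, and every clade of the source whose taxon composition also occurs as a clade in the target is swapped in; a swap is kept only if it shortens the target. The names are hypothetical and do not correspond to commands in TNT or POY.

```python
import random

def fitch_length(tree, matrix):
    """Unweighted Fitch length of a rooted binary tree given as nested tuples of taxon names."""
    steps = 0
    def down(node):
        nonlocal steps
        if isinstance(node, str):
            return [{state} for state in matrix[node]]
        state_sets = []
        for a, b in zip(down(node[0]), down(node[1])):
            if a & b:
                state_sets.append(a & b)
            else:
                state_sets.append(a | b)  # no shared state: one extra step
                steps += 1
        return state_sets
    down(tree)
    return steps

def leaves(node):
    """Set of taxon names below `node`."""
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])

def clades(node):
    """Yield every internal node, root included."""
    if isinstance(node, tuple):
        yield node
        yield from clades(node[0])
        yield from clades(node[1])

def replace_clade(tree, group, new_subtree):
    """Swap the clade whose leaf set equals `group` for `new_subtree`."""
    if leaves(tree) == group:
        return new_subtree
    if isinstance(tree, str):
        return tree
    return (replace_clade(tree[0], group, new_subtree),
            replace_clade(tree[1], group, new_subtree))

def tree_fusing(pool, matrix, rounds, rng):
    pool = list(pool)
    scores = [fitch_length(t, matrix) for t in pool]
    all_taxa = leaves(pool[0])
    for _ in range(rounds):
        i = rng.randrange(len(pool))                             # target tree
        j = rng.choice([k for k in range(len(pool)) if k != i])  # source tree
        target = pool[i]
        for clade in clades(pool[j]):
            group = leaves(clade)
            if group == all_taxa:
                continue  # swapping the whole tree would just copy the source
            # Only subgroups of identical composition can be exchanged
            if group not in {leaves(c) for c in clades(target)}:
                continue
            candidate = replace_clade(target, group, clade)
            candidate_score = fitch_length(candidate, matrix)
            if candidate_score < scores[i]:
                target, scores[i] = candidate, candidate_score
        pool[i] = target
    best = min(range(len(pool)), key=lambda k: scores[k])
    return pool[best], scores[best]

matrix = {"A": "11000000", "B": "11000000", "C": "00110000", "D": "00110000",
          "E": "00001100", "F": "00001100", "G": "00000011", "H": "00000011"}
# Hypothetical pool standing in for trees collected under different parameters
t1 = ((("A", "B"), ("C", "D")), (("E", "G"), ("F", "H")))
t2 = ((("A", "C"), ("B", "D")), (("E", "F"), ("G", "H")))
best, score = tree_fusing([t1, t2], matrix, rounds=1, rng=random.Random(0))
print(fitch_length(t1, matrix), fitch_length(t2, matrix), score)  # prints: 12 12 8
```

Each hand-built tree resolves only one half of the taxa correctly; because both trees contain the groups {A, B, C, D} and {E, F, G, H}, a single fusing round combines the correct halves into an 8-step tree, whichever tree is drawn as the target.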


Multiple trees or multiple hits?

• Driven searches

– Minimum number of hits to optimal trees

– Achieving a stable consensus

– Consensus techniques
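A driven search can be sketched as a loop over independent replicates that stops once the shortest length found so far has been hit a minimum number of times and the strict consensus of those shortest trees has stopped changing. This is a simplified sketch: each replicate is a random-addition-sequence Wagner tree without branch swapping (a stand-in for the RAS + TBR replicates a real program would run), trees are compared as sets of unrooted splits so that root position does not matter, and all names and thresholds are hypothetical.

```python
import random

def fitch_length(tree, matrix):
    """Unweighted Fitch length of a rooted binary tree given as nested tuples of taxon names."""
    steps = 0
    def down(node):
        nonlocal steps
        if isinstance(node, str):
            return [{state} for state in matrix[node]]
        state_sets = []
        for a, b in zip(down(node[0]), down(node[1])):
            if a & b:
                state_sets.append(a & b)
            else:
                state_sets.append(a | b)  # no shared state: one extra step
                steps += 1
        return state_sets
    down(tree)
    return steps

def leaves(node):
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])

def clades(node):
    if isinstance(node, tuple):
        yield node
        yield from clades(node[0])
        yield from clades(node[1])

def insertions(tree, leaf):
    """Yield each tree obtained by attaching `leaf` to one edge of `tree` (or above its root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        for new_left in insertions(tree[0], leaf):
            yield (new_left, tree[1])
        for new_right in insertions(tree[1], leaf):
            yield (tree[0], new_right)

def splits(tree):
    """Unrooted splits of `tree`, each stored as the side without a fixed reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)
    out = set()
    for clade in clades(tree):
        side = leaves(clade)
        if ref in side:
            side = taxa - side
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits
            out.add(side)
    return frozenset(out)

def ras_wagner(matrix, rng):
    """One replicate: add taxa in random order, each at its shortest position (no swapping)."""
    order = sorted(matrix)
    rng.shuffle(order)
    tree = (order[0], order[1])
    for leaf in order[2:]:
        tree = min(insertions(tree, leaf), key=lambda t: fitch_length(t, matrix))
    return tree, fitch_length(tree, matrix)

def driven_search(matrix, min_hits, stable_for, max_reps, rng):
    best_score, best_splits = float("inf"), set()
    hits, consensus, unchanged = 0, None, 0
    for _ in range(max_reps):
        tree, score = ras_wagner(matrix, rng)
        if score < best_score:  # new shortest length: restart the bookkeeping
            best_score, best_splits = score, set()
            hits, consensus, unchanged = 0, None, 0
        if score == best_score:
            hits += 1
            best_splits.add(splits(tree))
            # Strict consensus: splits shared by every shortest tree found so far
            new_consensus = frozenset.intersection(*best_splits)
            unchanged = unchanged + 1 if new_consensus == consensus else 0
            consensus = new_consensus
            if hits >= min_hits and unchanged >= stable_for:
                break
    return best_score, consensus, hits

matrix = {"A": "11000000", "B": "11000000", "C": "00110000", "D": "00110000",
          "E": "00001100", "F": "00001100", "G": "00000011", "H": "00000011"}
score, consensus, hits = driven_search(matrix, min_hits=3, stable_for=2,
                                       max_reps=50, rng=random.Random(2))
print(score, hits, sorted("".join(sorted(s)) for s in consensus))
```

Setting `stable_for=0` reduces the stopping rule to a plain minimum-number-of-hits criterion; `max_reps` only guarantees that the loop ends if neither criterion is met.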


Software implementations

• Ratchet: WinClada, TNT, POY, PAUP scripts: PRAP or PAUPRat

• Tree Fusing: TNT, POY, MetaPIGA

• Tree Drifting: TNT

• Sectorial Searches/DCM: TNT, POY, Rec-I-DCM3

• Constrained searches: most software packages

• Driven searches:

– Hits to optimal trees: TNT, POY

– Stabilize consensus: TNT
