Conducting efficient tree searches
• Algorithms
– Hill climbing “traditional” algorithms
• SPR, TBR
• Ratchet
• Optimizing branches in ML analyses
– Genetic algorithms
– Divide and conquer algorithms
– Simulated annealing algorithms
• Strategies
– Tree buffers
– Constrained searches
– Previous searches “pre-processed searches” or “jumpstarting phylogenetics”
– SATF (sensitivity analysis tree fusing)
Parsimony ratchet (island hopper)
1. Generate a starting tree (e.g. a Wagner tree followed by some level of branch swapping)
2. Re-weight a randomly selected subset of characters (e.g. give a weight of 2 to 50% of the characters, and 1 to the remaining 50%)
3. Search on the current tree (holding only one tree). Any kind of swapping strategy may be used.
4. Re-weight all the characters back to the original weights, and swap on the tree found in step 3.
5. Return to step 2 to begin another iteration starting with the tree found in step 4. Continue this cycle for N iterations (e.g., 20, 50, 100…)
Nixon, K. C. 1999. The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414.
Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology 52: 368-373.
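The five ratchet steps above can be sketched as a generic loop. This is a minimal illustration, not the implementation of Nixon (1999) or of any listed program: `ratchet`, `search`, `score`, and the toy `COSTS` table are hypothetical names introduced here, and the "tree space" is replaced by integer positions on a line so that the example runs on its own. The weights 2 and 1 and the 50% fraction follow the example values given in step 2.

```python
import random

def ratchet(start, n_chars, search, score, iterations=20, fraction=0.5, seed=0):
    """Generic ratchet loop. `search(tree, weights)` returns a locally optimal
    tree and `score(tree, weights)` its weighted length (lower is better).
    Both are caller-supplied, hypothetical callables."""
    rng = random.Random(seed)
    original = [1] * n_chars
    current = search(start, original)                  # step 1: start tree + swapping
    best, best_score = current, score(current, original)
    for _ in range(iterations):                        # step 5: repeat N times
        chosen = set(rng.sample(range(n_chars), int(fraction * n_chars)))
        perturbed = [2 if i in chosen else 1 for i in range(n_chars)]  # step 2
        current = search(current, perturbed)           # step 3: search, one tree held
        current = search(current, original)            # step 4: original weights again
        s = score(current, original)
        if s < best_score:
            best, best_score = current, s
    return best, best_score

# Toy stand-in (assumption): a "tree" is a column index, each row of COSTS is
# one character, and COSTS[c][x] is the number of steps character c needs on x.
COSTS = [
    [4, 2, 3, 5, 1, 3],
    [3, 1, 4, 2, 2, 4],
    [5, 2, 1, 3, 0, 2],
    [2, 3, 5, 4, 1, 5],
]

def toy_score(x, weights):
    return sum(w * row[x] for w, row in zip(weights, COSTS))

def toy_search(x, weights):
    # Greedy hill climbing over the neighbours x-1 and x+1.
    while True:
        nbrs = [n for n in (x - 1, x + 1) if 0 <= n < len(COSTS[0])]
        nxt = min(nbrs, key=lambda n: toy_score(n, weights))
        if toy_score(nxt, weights) >= toy_score(x, weights):
            return x
        x = nxt

best, length = ratchet(0, len(COSTS), toy_search, toy_score, iterations=10, seed=1)
print(best, length)
```

Tracking the best tree under the original weights is a choice made in this sketch; the steps above only say that each iteration continues from the tree found in step 4.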
• Tree-Fusing (TF)
• Exchanges subgroups (e.g. 5 taxa) between different trees. The subgroups must be of identical composition.
1. Obtain several trees via some sort of tree search
2. Randomly select a tree (the “target” tree)
3. Randomly select one of the remaining trees (the “source” tree)
4. Evaluate the results of moving each clade in the source tree to the target tree
5. Repeat several times (“rounds”) (e.g. 3 to 5)
• Example combination: 10 RAS (random addition sequences) + TBR + TF
Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics
15: 415-428.
Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree
search. Cladistics 17: S12-S25.
Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),
Techniques in Molecular Systematics and Evolution. Birkhäuser Verlag, Basel, pp. 70-79.
Lemmon, A. R. & Milinkovitch, M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the
problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.
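The tree-fusing steps above can be sketched on small trees written as nested tuples. This is an illustrative sketch under stated assumptions, not the algorithm as implemented in TNT, POY, or MetaPIGA: the function names, the `min_size` threshold, and the six-taxon data set are all invented here, and trees are scored with unweighted Fitch parsimony on binary trees.

```python
import random

def leaves(t):
    return frozenset([t]) if isinstance(t, str) else frozenset().union(*(leaves(c) for c in t))

def subtrees(t):
    """Map the leaf set of every internal node to the subtree rooted there."""
    out = {}
    if not isinstance(t, str):
        out[leaves(t)] = t
        for child in t:
            out.update(subtrees(child))
    return out

def replace(t, leafset, new):
    """Return a copy of t with the clade spanning `leafset` swapped for `new`."""
    if isinstance(t, str):
        return t
    if leaves(t) == leafset:
        return new
    return tuple(replace(child, leafset, new) for child in t)

def fitch(t, chars):
    """Fitch parsimony on a binary tree: (state sets per character, steps)."""
    if isinstance(t, str):
        return [{s} for s in chars[t]], 0
    (ls, lc), (rs, rc) = fitch(t[0], chars), fitch(t[1], chars)
    sets, steps = [], lc + rc
    for a, b in zip(ls, rs):
        if a & b:
            sets.append(a & b)
        else:
            sets.append(a | b)
            steps += 1
    return sets, steps

def fuse(target, source, score, min_size=3):
    """Step 4: try each source clade whose taxon composition also occurs in the target."""
    for leafset, sub in subtrees(source).items():
        if len(leafset) < min_size or leafset == leaves(source):
            continue
        if leafset in subtrees(target):
            candidate = replace(target, leafset, sub)
            if score(candidate) < score(target):
                target = candidate
    return target

def tree_fusing(trees, score, rounds=3, seed=0):
    rng = random.Random(seed)
    pool = list(trees)                                   # step 1: trees from earlier searches
    for _ in range(rounds):                              # step 5: several rounds
        i = rng.randrange(len(pool))                     # step 2: target tree
        j = rng.choice([k for k in range(len(pool)) if k != i])  # step 3: source tree
        pool[i] = fuse(pool[i], pool[j], score)
    return min(pool, key=score)

# Hypothetical data: three binary characters for six taxa.
chars = {"A": "101", "B": "101", "C": "001", "D": "010", "E": "010", "F": "000"}
length = lambda t: fitch(t, chars)[1]
t1 = ((("A", "B"), "C"), (("D", "F"), "E"))   # good left half, poor right half
t2 = ((("A", "C"), "B"), (("D", "E"), "F"))   # poor left half, good right half
fused = tree_fusing([t1, t2], length)
print(fused, length(fused))
```

Each input tree is optimal in a different half, which is the situation tree fusing is meant to exploit: swapping the matching clade produces a tree shorter than either parent.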
• Sectorial Searches (SS)
• Requires a starting tree, whose sectors are then reanalyzed separately. Sectors can be
selected randomly or based on a consensus
– Random Sectorial Searches (RSS)
– Consensus-Based Sectorial Searches (CSS)
– Mixed Sectorial Searches (MSS)
• RAS + TBR + SS
• Tree-Drifting (DFT)
• Accepts suboptimal solutions with a certain probability (simulated annealing)
• Combined strategies: RAS + TBR + SS + DFT + TF
Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics
15: 415-428.
Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds),
Techniques in Molecular Systematics and Evolution. Birkhäuser Verlag, Basel, pp. 70-79.
Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for
reconstructing large phylogenetic trees. In Proceedings of the 2004 IEEE Computational Systems
Bioinformatics Conference (CSB 2004): 12.
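The tree-drifting idea above, accepting a suboptimal tree with some probability, can be illustrated with a textbook simulated-annealing (Metropolis) rule. This is an assumption made for illustration: the acceptance function used by tree drifting in TNT is defined differently, and `accept`, `drift`, the cooling schedule, and the toy `LANDSCAPE` are hypothetical.

```python
import math
import random

def accept(delta, temperature, rng):
    """Always accept equal or better trees; accept a tree that is `delta`
    steps worse with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

def drift(start, neighbour, score, steps=200, temperature=1.0, cooling=0.98, seed=0):
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        candidate = neighbour(current, rng)
        if accept(score(candidate) - score(current), temperature, rng):
            current = candidate                  # may be a suboptimal move
            if score(current) < score(best):
                best = current                   # remember the best tree seen
        temperature *= cooling                   # accept fewer bad moves over time
    return best

# Toy stand-in (assumption): positions on a line with several local optima.
LANDSCAPE = [5, 3, 4, 6, 2, 7, 1, 8]

def step(x, rng):
    return min(max(x + rng.choice((-1, 1)), 0), len(LANDSCAPE) - 1)

best = drift(0, step, lambda x: LANDSCAPE[x])
print(best, LANDSCAPE[best])
```

A strict hill climber started at position 0 would stop at the local optimum at position 1; accepting occasional worse moves is what lets the search cross the intervening peaks.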
• Tree buffers
• Constrained searches
• “Pre-processed searches”
• Sensitivity Analysis output + TF
– Generate a diversity of cladograms under different parameters/models (don’t need to be full searches)
– Collect all trees in a file
– Submit the trees to tree fusing and other refining algorithms
• Other strategies: bootstrapping or jackknifing trees
Strategies
Multiple trees or multiple hits?
• Driven searches
– Minimum number of hits to optimal trees
– Achieving a stable consensus
– Consensus techniques
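The "minimum number of hits" stopping rule for driven searches can be sketched as follows. The sketch assumes that a hit means one independent replicate reaching the best length found so far; `driven_search`, `replicate`, and the recorded lengths are hypothetical names and data, not the interface of TNT or POY.

```python
def driven_search(replicate, score, min_hits=3, max_replicates=1000):
    """Run independent replicates until the best length has been hit
    `min_hits` times or `max_replicates` is exhausted."""
    best, best_score, hits, done = None, None, 0, 0
    for n in range(1, max_replicates + 1):
        tree = replicate(n)              # e.g. one RAS + TBR replicate
        s = score(tree)
        done = n
        if best_score is None or s < best_score:
            best, best_score, hits = tree, s, 1   # new optimum resets the count
        elif s == best_score:
            hits += 1
        if hits >= min_hits:
            break
    return best, best_score, hits, done

# Hypothetical replicate results: replicate n "finds" a tree of this length.
lengths = {1: 7, 2: 5, 3: 6, 4: 5, 5: 5, 6: 9}
result = driven_search(lambda n: n, lambda tree: lengths[tree], min_hits=3, max_replicates=6)
print(result)  # (2, 5, 3, 5)
```

Here the search stops after the fifth replicate, because length 5 has then been found three times. A stable-consensus criterion would replace the hit counter with a comparison of successive consensus trees.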
Software implementations
• Ratchet: WinClada, TNT, POY, PAUP scripts: PRAP or PAUPRat
• Tree Fusing: TNT, POY, MetaPIGA
• Tree Drifting: TNT
• Sectorial Searches/DCM: TNT, POY, Rec-I-DCM3
• Constrained searches: Most software packages
• Driven searches:
– Hits to optimal trees: TNT, POY
– Stabilize consensus: TNT