Testing alternative hypotheses
Outline
• Topology tests:
– Templeton test
• Parametric bootstrapping (briefly)
• Comparing data sets
Topology tests
• The best tree for your data contradicts a prior hypothesis. This does not mean that the data refute the hypothesis
• Compare the optimality score of the best tree overall with that of the best tree(s) compatible with the hypothesis
[Figure: tree space, showing the region of tree space satisfying the hypothesis, the optimal tree, and the optimal tree satisfying the hypothesis]
Does one tree explain the data significantly better than the other?
• If the data are “significantly” more compatible with the optimal tree than with the constrained tree, the hypothesis is rejected
• Parsimony framework
– Constrained tree length = X
– Optimal tree length = Y
– Is the cost (X - Y) significant?
Templeton test
Taxon   1  2  3  4  5  6  7  8  9
A       T  G  T  G  A  A  C  A  A
B       T  G  T  G  A  C  C  A  A
C       T  G  C  G  G  C  C  T  A
D       A  G  C  G  G  C  G  T  A
E       A  A  C  T  A  A  G  T  G
F       A  A  C  T  A  A  G  C  G
L1      1  1  1  1  2  2  1  2  1  = 12
L4      3  2  2  2  2  1  3  3  2  = 20
Diff    2  1  1  1  0 -1  2  1  1  = 8
Score Rank
2 1.5
2 1.5
1 5.5
1 5.5
1 5.5
1 5.5
1 5.5
-1 -5.5
Sum of the negative ranks = 5.5
N (number of characters whose length differs between the two trees) = 8
P-value ≈ 0.045
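The bookkeeping behind the test can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in any phylogenetics package: the function name `signed_rank_test` and the variable names are made up here, and the per-character lengths are transcribed from the L1 and L4 rows of the example. It follows the usual Wilcoxon signed-rank convention, in which the smallest absolute difference receives rank 1 and ties share the average rank, so its ranks run in the opposite direction from those listed on the slide, and the exact p-value it computes need not match the approximate value quoted there.

```python
from itertools import product

def signed_rank_test(len_a, len_b):
    """Wilcoxon signed-rank comparison of per-character lengths on two trees."""
    # Characters with the same length on both trees carry no information
    diffs = [b - a for a, b in zip(len_a, len_b) if b != a]
    n = len(diffs)

    # Mid-ranks of the absolute differences (rank 1 = smallest)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        mid_rank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mid_rank
        start = end + 1

    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    statistic = min(w_minus, w_plus)

    # Exact null distribution: every sign pattern is equally likely.
    # Enumerating 2**n patterns is only practical for small n.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for s, r in zip(signs, ranks) if s) <= statistic
    )
    p_two_sided = min(1.0, 2 * hits / 2 ** n)
    return diffs, ranks, w_minus, w_plus, p_two_sided

# Per-character lengths from the example (rows L1 and L4)
tree1_steps = [1, 1, 1, 1, 2, 2, 1, 2, 1]
tree4_steps = [3, 2, 2, 2, 2, 1, 3, 3, 2]
diffs, ranks, w_minus, w_plus, p_value = signed_rank_test(tree1_steps, tree4_steps)
```

For real data sets with many varying characters, a normal approximation or a statistics library would replace the exhaustive enumeration.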
Problems of topology tests
• The tests compare trees; they do not compare the competing hypotheses
Another problem of topology tests
• Suppose we had a prior hypothesis that species A-B form a clade
• We conduct a phylogenetic analysis of 8 species and find that A-B do not form a clade
• The shortest tree that has them as a clade is 6 steps longer (decay = -6), which is significant under a Templeton test
Another problem of topology tests
• Suppose we had a prior hypothesis that species A-Z form a clade
• We conduct a phylogenetic analysis of 100 species and find that A-Z do not form a clade
• The shortest tree that has them as a clade is 6 steps longer (decay = -6), which is significant under a Templeton test
Are these results equivalent?
• The two hypotheses are differently stringent
– The former (A-B among 8 species) delimits a much larger proportion of tree space than the latter (A-Z among 100 species)
• One solution is to reverse the question: If the hypothesis were true, how likely is it that the optimal tree would reject it?
– Requires parametric bootstrapping
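To get a feel for how different the two hypotheses are, one can count trees. The sketch below assumes unrooted, fully resolved trees and treats "forming a clade" as the group being separated from all other taxa by a single branch. The function names are illustrative only. It uses the standard count of (2n-5)!! unrooted binary trees on n labelled taxa.

```python
from fractions import Fraction

def double_factorial(m):
    """m!! for odd m; values below 1 count as 1 (empty product)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def unrooted_trees(n):
    """Number of unrooted, fully resolved trees on n >= 3 labelled taxa."""
    return double_factorial(2 * n - 5)

def fraction_with_group(n, k):
    """Fraction of those trees in which k given taxa sit on one side of a branch."""
    # (2k-3)!! ways to resolve the group, (2(n-k)-3)!! ways to resolve the rest
    with_group = double_factorial(2 * k - 3) * double_factorial(2 * (n - k) - 3)
    return Fraction(with_group, unrooted_trees(n))

two_of_eight = fraction_with_group(8, 2)        # A-B among 8 species
many_of_hundred = fraction_with_group(100, 26)  # A-Z among 100 species
print(two_of_eight, float(many_of_hundred))
```

Under these assumptions the first hypothesis is satisfied by 1 in 11 trees, while the second is satisfied by a vanishingly small fraction, so the same 6-step cost carries very different weight in the two cases.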
Find the region of tree space that is plausible if the hypothesis is true:
[Figure: tree space showing the optimal tree and the optimal tree satisfying the hypothesis. Outcome: hypothesis rejected]
[Figure: tree space showing the optimal tree and the optimal tree satisfying the hypothesis. Outcome: hypothesis not rejected]
How do you do this?
• Find the optimal tree under the constraint (not just the optimal topology but also branch lengths, etc.)
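• Simulate data on that tree many times
• For each simulated data set, calculate the cost of the hypothesis (the difference in score between the best constrained tree and the best unconstrained tree)
• If the observed cost is greater than the cost in nearly all of the simulated data sets (e.g., 95% of them), the hypothesis is rejected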
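The final comparison step can be sketched as follows. The simulation and the tree searches themselves would be done in phylogenetics software, so this only covers the comparison of the observed cost against the simulated ones. The function name and all numbers are hypothetical and chosen purely for illustration.

```python
def bootstrap_p_value(observed_cost, simulated_costs):
    """Fraction of simulated data sets whose cost is at least the observed cost."""
    as_extreme = sum(1 for cost in simulated_costs if cost >= observed_cost)
    return as_extreme / len(simulated_costs)

# Hypothetical costs (extra steps needed to satisfy the hypothesis) from
# 20 data sets simulated on the constrained tree; real analyses use many more
simulated = [0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 0, 2, 1, 0, 4, 0, 1, 0, 2, 1]

p_large_cost = bootstrap_p_value(6, simulated)  # no simulated cost reaches 6
p_small_cost = bootstrap_p_value(2, simulated)  # a cost of 2 is common

reject_large = p_large_cost < 0.05
reject_small = p_small_cost < 0.05
```

With these made-up numbers an observed cost of 6 would lead to rejection, while an observed cost of 2 would not, because costs that large arise often even when the hypothesis is true.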
Strepsiptera sister to the Diptera (Whiting et al. 1997)
Could be a long-branch problem (Huelsenbeck, 1997)
What if this were the true tree?
Probability of Strepsiptera being sister to Diptera on the MP tree = 92%
Testing hypotheses
• Topology tests are good ways to test hypotheses
• Parametric bootstrapping tests are powerful but laborious
• Other tests are available using likelihood or Bayesian approaches (later)
Multiple data sets for the same sets of taxa
• Analyze each data set separately and then compare the trees (consense)
• Concatenate the data and conduct a single combined analysis (combine)
Argument for consensus
• If the same clades appear with multiple data sets we can be more confident
• The method is conservative
Is consensus conservative?
Barrett et al. 1991. Syst. Zool. 40:486
Arguments against combined analysis
• Some data sets might have strong misleading signals (e.g., due to lab errors)
• Different partitions might have tracked different histories
Conditional combined analysis
• Assess if the data look like they have tracked different histories
– If they do not: combine
– If they do: analyze separately
• Can you do this with topology tests?
[Figure: optimal tree for data set 1 and optimal tree for data set 2]
Do they conflict?
But topology tests can be used more carefully
• Two data sets don’t conflict significantly if there is one tree that neither data set rejects
• Two data sets do conflict if:
– Data set 1 rejects all trees that lack a certain clade
– Data set 2 rejects all trees that have that same clade
[Figure labels: optimal tree for data set 1; optimal tree for data set 2; optimal tree with the constraint for data set 1; optimal tree without the constraint for data set 2; two comparisons marked "significantly worse"]
The Incongruence Length Difference (ILD) test (Farris et al., 1994)
• Conflict is manifest as longer trees (or lower likelihood)
• Measure how much tree length increases (or likelihood decreases) when we combine the data
• Determine significance by comparison with random partitions of the same data
ILD test (= Partition Homogeneity Test in PAUP*)
One    TACATAAACAAGCCTAAAATGCGACACTACGTTCACTGTTACGCTCTCCACTGCCTAGACGAAGAAGCTTCA
Two    TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGCCTAGACGAAGACGCTTCA
Three  TACATAAACAAGCCCAAAATGCGACACTACGTCCACTGTTACGCTCTTCACTGCCTAGACGAGGATGCCTCG
Four   TACATAAATAAGCCAAAAATGCGACACTACGTTCATTGTTACGCACTCCATTGCCTCGACGAAGAAGCTTCA
Five   TACATAAACAAACCCAAAATGCGACACTACGTCCACTGTTATGCTCTCCACTGTCTAGACGAAGACGCTTCG
Six    TACATAAACAAGCCCAAGATGCGTCACTACGTCCACTGCTACGCCCTCCACTGTCTCGACGAGGAGGCCTCG
Seven  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA
Eight  TACATAAACAAACCAAAAATGCGACACTACGTCCATTGTTACGCCCTACACTGCCTAGACGAAGACGCTTCA
Partition 1 length = 12
Partition 2 length = 9
Combined L = 21
ILD test (= Partition Homogeneity Test in PAUP*)
[Alignment repeated]
Partition 1 length = 14
Partition 2 length = 11
Combined L = 25
Results
Sum of tree lengths    Number of replicates
-------------------------------------------
1661                    1
1662                    2
1663                    1
1665*                   9
1666                    8
1667                    9
1668                    5
1669                   11
1670                   10
1671                    9
1672                   10
1673                    7
1674                    4
1675                    4
1676                    1
1677                    4
1678                    2
1679                    1
1680                    1
1683                    1
* = sum of lengths for original partition
P value = 1 - (87/100) = 0.13
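The reported P value can be recovered from the table. In this short sketch the counts are transcribed from the results above and the function name `ild_p_value` is made up for illustration. It counts the replicates whose summed tree length exceeds that of the original partition and subtracts their share from 1.

```python
# Sum of tree lengths -> number of replicates, transcribed from the results table
replicates = {
    1661: 1, 1662: 2, 1663: 1, 1665: 9, 1666: 8, 1667: 9, 1668: 5,
    1669: 11, 1670: 10, 1671: 9, 1672: 10, 1673: 7, 1674: 4, 1675: 4,
    1676: 1, 1677: 4, 1678: 2, 1679: 1, 1680: 1, 1683: 1,
}
original_sum = 1665  # the row marked with * in the table

def ild_p_value(replicates, original_sum):
    """1 minus the share of replicates with a larger sum than the original partition."""
    total = sum(replicates.values())
    longer = sum(n for length, n in replicates.items() if length > original_sum)
    return 1 - longer / total

total_replicates = sum(replicates.values())
p_value = ild_p_value(replicates, original_sum)
```

Of the 100 replicates, 87 have a larger sum than the original partition, which gives the 0.13 shown in the output.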
What does a positive result mean?
• The data sets have tracked different histories?
• The original partition is non-random
• The test does not even look at topology
Options if you find conflict
• Conduct separate analyses only
• Delete taxa until the conflict disappears, then combine
• Combine anyway
Conditional conditional combined analysis
• You believe that conflict reflects data partitions tracking different histories
– Keep the data separate and find ways to summarize the discrepancy
• You believe that conflict reflects artifactual signals (noise) in one or both data sets
– Combine anyway in the hope that the real signal will come to dominate