Post on 01-Jan-2016
Mareike Fischer
How many characters are needed to reconstruct the true tree?
Mareike Fischer and Mike Steel
Future Directions in Phylogenetic Methods and Models, 17 – 21 Dec 07
The Problem
Given: Sequence of characters (e.g. DNA)
Wanted: Reconstruction of the ‘true’ tree
Solution: Maximum Parsimony, Maximum Likelihood, etc.
But: Is the sequence long enough for a reliable reconstruction?
Previous Approaches
1. Churchill, von Haeseler, Navidi (1992)
• 4 taxa scenario
• Observations:
The probability of reconstructing the true tree increases with the length of the interior edge.
“Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”
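[Figure: reconstruction probability (Rec. Prob.) plotted against the length of the interior edge (int. edge), with an arrow labelled "more characters".]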
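The first observation can be illustrated with a small simulation. The sketch below is not the procedure used by Churchill et al. It assumes a 2-state symmetric model, a true tree 12|34 with interior edge length x and four equal pendant edges of length y, and it counts MP as successful only when the true split has strictly more supporting characters than either alternative. All function names are hypothetical.

```python
import math
import random


def flip_prob(length):
    """Probability that the two ends of an edge differ (2-state symmetric model)."""
    return 0.5 * (1.0 - math.exp(-2.0 * length))


def simulate_character(x, y, rng):
    """Draw one binary character on the tree 12|34 (interior edge x, pendant edges y)."""
    u = rng.randrange(2)                      # interior node next to taxa 1 and 2
    v = u ^ (rng.random() < flip_prob(x))     # interior node next to taxa 3 and 4
    return tuple(parent ^ (rng.random() < flip_prob(y)) for parent in (u, u, v, v))


def mp_picks_true_tree(chars):
    """For 4 taxa, MP prefers the tree whose split is supported by the most characters."""
    support = [0, 0, 0]                       # 12|34, 13|24, 14|23
    for a, b, c, d in chars:
        if a == b and c == d and a != c:
            support[0] += 1
        elif a == c and b == d and a != b:
            support[1] += 1
        elif a == d and b == c and a != b:
            support[2] += 1
    return support[0] > max(support[1], support[2])   # ties count as failures


def reconstruction_probability(x, y, k, trials=300, seed=0):
    """Monte Carlo estimate of the probability that MP recovers 12|34 from k characters."""
    rng = random.Random(seed)
    hits = sum(
        mp_picks_true_tree([simulate_character(x, y, rng) for _ in range(k)])
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for x in (0.01, 0.05, 0.2):
        print(x, reconstruction_probability(x, y=0.1, k=100))
```

Varying x with y and k held fixed, or k with x and y held fixed, gives a rough picture of both effects shown in the plot.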
Previous Approaches
2. Yang (1998)
• 4 taxa scenario, interior edge ‘fixed’ at 5% of tree length
• 5 different tree-shapes were investigated
• Observations:
‘Farris Zone’: MP better
‘Felsenstein Zone’: ML better
The optimal length for the interior edge ranges between 0.015 and 0.025.
[Figure: reconstruction probability (Rec. Prob.) plotted against tree length.]
Our Approach
• Limitation: Most previous approaches are based on simulations.
• Our approach: Mathematical analysis of the influence of branch lengths on tree reconstruction.
• We investigate MP first and consider other methods afterwards.
Already known
Steel and Székely (2002):
[Figure: four-taxon tree with an interior edge of length x and four pendant edges of length y.]
Here, the number k of characters needed to reconstruct the true tree grows at rate [formula missing].
But what happens if we fix the ratio (y := px), and then take the value of x that minimizes k?
Our Approach
Setting: 4 taxa, pendant edges of length px (with p > 1), short interior edge of length x, 2-state symmetric model.
[Figure: four-taxon tree with an interior edge of length x and four pendant edges of length px.]
Main Result
For ‘reliable’ MP reconstruction:
• k grows at least at rate p^2.
• For the optimal value of x, k grows at rate p^2.
Idea of Proof: 1. Applying the CLT
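Set Z_k := X_1 + … + X_k with the X_i i.i.d., and let μ_k and σ_k denote the mean and standard deviation of Z_k. Then (by CLT) (Z_k − μ_k)/σ_k is approximately standard normal for large k.
Note that the true tree T1 will be favored over T2 if and only if Z_k > 0.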
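The normal approximation turns the condition Z_k > 0 into a rough estimate of k. The sketch below is an illustration under stated assumptions, not the calculation from the talk. It assumes each X_i takes values in {1, 0, −1}, so that E[X_1] = P(X_1 = 1) − P(X_1 = −1) and Var[X_1] = P(X_1 = 1) + P(X_1 = −1) − E[X_1]^2. It then asks for the smallest k with Φ(μ_k/σ_k) at least a chosen confidence level. The function name and the `confidence` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def chars_needed(p_plus, p_minus, confidence=0.95):
    """Smallest k for which the CLT approximation gives P(Z_k > 0) >= confidence.

    p_plus  = P(X_1 = 1), p_minus = P(X_1 = -1); X_1 is assumed to lie in {1, 0, -1}.
    """
    mu = p_plus - p_minus                  # E[X_1]
    var = p_plus + p_minus - mu * mu       # Var[X_1]
    if mu <= 0:
        raise ValueError("the true tree must be favored on average (p_plus > p_minus)")
    z = NormalDist().inv_cdf(confidence)   # standard normal quantile
    # mu_k / sigma_k = sqrt(k) * mu / sqrt(var) >= z  <=>  k >= (z * sqrt(var) / mu)^2
    return math.ceil((z * math.sqrt(var) / mu) ** 2)


if __name__ == "__main__":
    print(chars_needed(0.3, 0.1))
```

Because μ_k grows like k while σ_k grows like the square root of k, the estimate scales with Var[X_1]/E[X_1]^2, which is the quantity that depends on the branch lengths.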
Idea of Proof: 2. The Hadamard Representation
Since the X_i are i.i.d., μ_k and σ_k depend only on k and the probabilities P(X_1 = 1) and P(X_1 = −1).
These probabilities can be calculated using the ‘Hadamard Representation’: [formulas missing] (Here, θ = e^(−2x).)
Note that P(X_1 = 1) and P(X_1 = −1) depend only on x and p.
Thus, for fixed p, the ratio [formula missing] can be used to find a value of x that minimizes k.
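The dependence on x and p can be explored numerically. The sketch below does not use the Hadamard formulas from the slide. It obtains the same kind of probabilities by brute-force enumeration of all state assignments on the four-taxon tree. It assumes that X_1 = 1 for characters supporting the true split 12|34, X_1 = −1 for characters supporting 13|24, and X_1 = 0 otherwise, and that an edge of length ℓ changes state with probability (1 − e^(−2ℓ))/2. The function names, the 95% level and the search grid are hypothetical choices.

```python
import math
from itertools import product
from statistics import NormalDist


def flip_prob(length):
    """Probability that the two ends of an edge differ (2-state symmetric model)."""
    return 0.5 * (1.0 - math.exp(-2.0 * length))


def support_probs(x, p):
    """Return (P(X_1 = 1), P(X_1 = -1)) for interior edge x and pendant edges p*x."""
    q_int, q_pen = flip_prob(x), flip_prob(p * x)
    p_plus = p_minus = 0.0
    # u, v: states at the two interior nodes; a, b, c, d: states at taxa 1..4
    for u, v, a, b, c, d in product((0, 1), repeat=6):
        prob = 0.5 * (q_int if u != v else 1.0 - q_int)
        for parent, leaf in ((u, a), (u, b), (v, c), (v, d)):
            prob *= q_pen if parent != leaf else 1.0 - q_pen
        if a == b and c == d and a != c:      # supports the true tree 12|34
            p_plus += prob
        elif a == c and b == d and a != b:    # supports the competitor 13|24
            p_minus += prob
    return p_plus, p_minus


def k_estimate(x, p, confidence=0.95):
    """CLT-based estimate of the number of characters needed (not rounded)."""
    p_plus, p_minus = support_probs(x, p)
    mu = p_plus - p_minus
    var = p_plus + p_minus - mu * mu
    z = NormalDist().inv_cdf(confidence)
    return z * z * var / (mu * mu)


def best_x(p, grid_size=400):
    """Grid search over x in [1e-5, 1e-1] (log-spaced) for the x that minimizes k."""
    xs = [10 ** (-5 + 4 * i / (grid_size - 1)) for i in range(grid_size)]
    return min(xs, key=lambda x: k_estimate(x, p))


if __name__ == "__main__":
    for p in (5, 10, 20, 40):
        x_opt = best_x(p)
        print(p, x_opt, k_estimate(x_opt, p), k_estimate(x_opt, p) / p ** 2)
```

Printing k at the optimal x divided by p^2 for increasing p gives a quick check of how the estimate scales. The grid search is a crude stand-in for an analytic minimization.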
Summary and Extension
• For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate p^2.
• Can other methods do better (e.g. rate p)? No! [Can be shown using the ‘Hellinger distance’.]
Outlook
Questions for future work:
• What happens when you approach the ‘Felsenstein Zone’?
• What happens in general with different tree shapes or more taxa?
Thanks…
… to my supervisor Mike Steel,
… to the Newton Institute for organizing this great conference,
… to the Allan Wilson Centre for financing my research,
… to YOU for listening or at least waking up early enough to read this message.
The only true tree…
… is a Christmas tree.
(And it does not even require reconstruction!)
Merry Christmas!