Triplet and Quartet Distances Between Trees of Arbitrary Degree
![Page 1: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/1.jpg)
Triplet and Quartet Distances Between Trees of Arbitrary Degree
Gerth Stølting Brodal, Aarhus University
Rolf Fagerberg, University of Southern Denmark
Thomas Mailund, Christian N. S. Pedersen, Andreas Sand, Aarhus University, Bioinformatics Research Center
ETH Zürich, Switzerland, 22 November 2012
(paper to be presented at SODA’13)
![Page 2: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/2.jpg)
Evolutionary Tree (rooted)
[Figure: rooted evolutionary tree with leaves Bonobo, Chimpanzee, Human, Neanderthal, Gorilla, Orangutan; axis: Time.]
![Page 3: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/3.jpg)
Unrooted Evolutionary Tree
The dominant modern approach to studying evolution is DNA analysis
![Page 4: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/4.jpg)
Constructing Evolutionary Trees – Binary or Arbitrary Degrees?
Sequence data → Distance matrix (n × n) → Tree
Neighbor Joining (Saitou, Nei 1987) [O(n^3), Saitou, Nei 1987]: binary trees (despite no evidence in distance data)
Refined Buneman Trees (Moulton, Steel 1999) [O(n^3), Brodal et al. 2003]: arbitrary degree (compromise; good support for all edges)
Buneman Trees (Buneman 1971) [O(n^3), Berry, Bryant 1999]: arbitrary degrees (strong support for all edges; few branches)
![Page 5: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/5.jpg)
Data Analysis vs Expert Trees – Binary vs Arbitrary Degrees?
Linguistic expert classification (Aryon Rodrigues)
Neighbor Joining on linguistic data
Cultural Phylogenetics of the Tupi Language Family in Lowland South America. R. S. Walker, S. Wichmann, T. Mailund, C. J. Atkisson. PLoS One. 7(4), 2012.
![Page 6: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/6.jpg)
Evolutionary Tree Comparison
[Figure: two unrooted trees T1 and T2 on leaves 1-8, with the split 1357|2468 marked.]
Splits:
Common: 1357|2468, 13567|248
Only T1: 35|124678, 48|123567
Only T2: 57|123468
Robinson-Foulds distance = # non-common splits = 2 + 1 = 3
[Day 1985] O(n) time algorithm using 2 × DFS + radix sort
D. F. Robinson and L. R. Foulds. Comparison of weighted labeled trees. In Combinatorial Mathematics VI, Lecture Notes in Mathematics, pages 119–126. Springer, 1979.
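The split count above can be reproduced with a short sketch. The drawn trees are not recoverable from the transcript, so `T1` and `T2` below are hypothetical nested-tuple trees chosen only to be consistent with the split table. The set-based comparison is a simple quadratic illustration, not Day's O(n) algorithm.

```python
def leaf_set(tree):
    """Set of leaf labels below a nested-tuple node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial splits of an unrooted tree given as nested tuples."""
    everything = leaf_set(tree)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return
        side = leaf_set(node)
        other = everything - side
        # Keep only splits with at least two leaves on each side.
        if len(side) >= 2 and len(other) >= 2:
            found.add(frozenset([side, other]))
        for child in node:
            walk(child)

    walk(tree)
    return found


def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


# Hypothetical trees, consistent with the split table (assumption).
T1 = ((((4, 8), 2), 6), (3, 5), 1, 7)
T2 = (((2, 4, 8), 6), (5, 7), 1, 3)
print(robinson_foulds(T1, T2))
```

The outermost tuple is only an arbitrary choice of root; the splits do not depend on it.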
![Page 7: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/7.jpg)
Robinson-Foulds Distance (unrooted trees)
[Figure: two unrooted trees T1 and T2 on leaves 1-8 that differ only in the position of leaf 8.]
Splits:
Common: (none)
Only T1: 12567|348, 1257|3468, 157|23468, 57|123468
Only T2: 125678|34, 12578|346, 1578|2346, 578|12346, 78|123456
RF-dist(T1, T2) = 4 + 5 = 9
RF-dist(T1\{8}, T2\{8}) = 0
Robinson-Foulds is very sensitive to outliers
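The effect of the outlier can be checked directly on the split lists. This is a minimal sketch that works on the split strings from the slide and assumes single-character leaf labels; `parse` and `drop_leaf` are helper names introduced here.

```python
def parse(split):
    """Turn a string such as '57|123468' into an unordered pair of leaf sets."""
    left, right = split.split("|")
    return frozenset([frozenset(left), frozenset(right)])


def drop_leaf(split_set, leaf):
    """Restrict every split to the remaining leaves; discard trivial splits."""
    kept = set()
    for split in split_set:
        sides = [side - {leaf} for side in split]
        if all(len(side) >= 2 for side in sides):
            kept.add(frozenset(sides))
    return kept


T1 = {parse(s) for s in "12567|348 1257|3468 157|23468 57|123468".split()}
T2 = {parse(s) for s in
      "125678|34 12578|346 1578|2346 578|12346 78|123456".split()}

print(len(T1 ^ T2))                                  # with leaf 8
print(len(drop_leaf(T1, "8") ^ drop_leaf(T2, "8")))  # without leaf 8
```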
![Page 8: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/8.jpg)
Quartet Distance (unrooted trees)
Consider all quartets, i.e. topologies of subsets of 4 leaves {i,j,k,l}
resolved: ij|kl
unresolved: ijkl (only non-binary trees)
[Figure: unrooted trees T1 and T2 on leaves 1-5.]
Quartet topologies:
{1,2,3,4}: T1 = 14|23, T2 = 14|23
{1,2,3,5}: T1 = 13|25, T2 = 15|23
{1,2,4,5}: T1 = 14|25, T2 = 1245
{1,3,4,5}: T1 = 14|35, T2 = 1345
{2,3,4,5}: T1 = 25|34, T2 = 23|45
Quartet-dist(T1, T2) = $\binom{n}{4}$ - # common quartets = 5 - 1 = 4
G. Estabrook, F. McMorris, and C. Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology, 34:193-200, 1985.
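The definition can be turned into a brute-force check over all quartets. This is a sketch only: the nested-tuple trees `T1` and `T2` are reconstructed from the quartet table (an assumption, the drawings are not in the transcript), and the method takes time far above the bounds discussed later.

```python
from itertools import combinations


def clusters(tree):
    """Leaf sets below every internal node; the full leaf set comes last."""
    found = []

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def quartet_topology(cl, quartet):
    """ij|kl as a pair of pairs, or the whole quartet if unresolved."""
    q = frozenset(quartet)
    for c in cl:
        inside = c & q
        if len(inside) == 2:          # some edge separates two from two
            return frozenset([inside, q - inside])
    return frozenset([q])


def quartet_distance(t1, t2):
    c1, c2 = clusters(t1), clusters(t2)
    leaves = sorted(c1[-1])
    return sum(quartet_topology(c1, q) != quartet_topology(c2, q)
               for q in combinations(leaves, 4))


# Trees reconstructed from the quartet table (assumption).
T1 = ((1, 4), 3, (2, 5))
T2 = ((2, 3), 1, 4, 5)
print(quartet_distance(T1, T2))
```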
![Page 9: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/9.jpg)
Triplet Distance (rooted trees)
Consider all triplets, i.e. topologies of subsets of 3 leaves {i,j,k}
resolved: k|ij
unresolved: ijk (only non-binary trees)
[Figure: rooted trees T1 and T2 on leaves 1-5.]
Triplet topologies:
{1,2,3}: T1 = 2|13, T2 = 2|13
{1,2,4}: T1 = 1|24, T2 = 4|12
{1,2,5}: T1 = 1|25, T2 = 5|12
{1,3,4}: T1 = 4|13, T2 = 4|13
{1,3,5}: T1 = 5|13, T2 = 5|13
{1,4,5}: T1 = 1|45, T2 = 1|45
{2,3,4}: T1 = 3|24, T2 = 4|23
{2,3,5}: T1 = 3|25, T2 = 5|23
{2,4,5}: T1 = 5|24, T2 = 2|45
{3,4,5}: T1 = 3|45, T2 = 3|45
Triplet-dist(T1, T2) = $\binom{n}{3}$ - # common triplets = 10 - 5 = 5
D. E. Critchlow, D. K. Pearl, C. L. Qian: The triples distance for rooted bifurcating phylogenetic trees. Systematic Biology, 45(3):323-334, 1996.
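The triplet table can be recomputed by brute force. In this sketch the rooted nested-tuple trees `T1` and `T2` are reconstructed from the table (an assumption), and the cubic enumeration is for illustration only.

```python
from itertools import combinations


def clusters(tree):
    """Leaf sets below every internal node; the full leaf set comes last."""
    found = []

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def triplet_topology(cl, triplet):
    """The pair {i, j} of k|ij, or the whole triplet if unresolved."""
    t = frozenset(triplet)
    for c in cl:
        inside = c & t
        if len(inside) == 2:   # a subtree holds i and j but not k
            return inside
    return t


def triplet_distance(t1, t2):
    c1, c2 = clusters(t1), clusters(t2)
    leaves = sorted(c1[-1])
    return sum(triplet_topology(c1, t) != triplet_topology(c2, t)
               for t in combinations(leaves, 3))


# Rooted trees reconstructed from the triplet table (assumption).
T1 = ((1, 3), ((2, 4), 5))
T2 = (((1, 3), 2), (4, 5))
print(triplet_distance(T1, T2))
```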
![Page 10: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/10.jpg)
Computational Results
Rooted triplet distance, binary trees: O(n^2) [CPQ 1996]; O(n·log n) [SODA 2013]
Unrooted quartet distance, binary trees: O(n^3) [D 1985]; O(n^2) [BTKL 2000]; O(n·log^2 n) [BFP 2001]; O(n·log n) [BFP 2003]
Rooted triplet distance, degrees d: O(n^2) [BDF 2011]; O(n·log n) [SODA 2013]
Unrooted quartet distance, degrees d: O(d^9·n·log n) [SPMBF 2007]; O(n^2.688) [NKMP 2011]; O(d·n·log n) [SODA 2013]
[Figure: small example trees, binary and of arbitrary degree.]
![Page 11: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/11.jpg)
Distance Computation
Each triplet (quartet) is classified by its topology in T1 and in T2:
Resolved in T1, resolved in T2: A (agree) or B (disagree)
Resolved in T1, unresolved in T2: C
Unresolved in T1, resolved in T2: D
Unresolved in T1, unresolved in T2: E
Triplet-dist(T1, T2) = $\binom{n}{3} - A - E = B + C + D$
$A + B + C + D + E = \binom{n}{3}$
D + E and C + E: unresolved in one tree
Sufficient to compute A and E or A and B
Running times annotated on the table:
O(n) triplets & quartets
O(n·log n) triplets
O(n·log n) triplets & quartets
O(d·n·log n) quartets
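A short derivation of why either pair of counts suffices. The symbols $U_1 = D + E$ and $U_2 = C + E$ are introduced here (they are not on the slide) for the numbers of triplets unresolved in T1 and in T2.

```latex
\begin{align*}
\text{from } A, E:\quad & B + C + D = \binom{n}{3} - A - E \\
\text{from } A, B:\quad & C = \binom{n}{3} - U_1 - A - B, \qquad
                          D = \binom{n}{3} - U_2 - A - B \\
                        & B + C + D = 2\binom{n}{3} - U_1 - U_2 - 2A - B
\end{align*}
```

The same identities hold for quartets with $\binom{n}{4}$ in place of $\binom{n}{3}$.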
![Page 12: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/12.jpg)
Parameterized Triplet & Quartet Distances
B + α·(C + D), 0 ≤ α ≤ 1 (A, B, C, D, E as in the table on the previous page)
BDF 2011: O(n^2) for triplet; NKMP 2011: O(n^2.688) for quartet
[SODA 13]: O(n·log n) and O(d·n·log n), respectively
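A worked example using the quartet table on page 8. The counts are our own tally from that table and are not stated on the slide: A = 1, B = 2, C = 2, D = 0, E = 0.

```latex
\begin{align*}
B + \alpha\,(C + D) &= 2 + 2\alpha \\
\alpha = 1 &: \; 4 \quad (\text{the quartet distance } 5 - 1) \\
\alpha = \tfrac{1}{2} &: \; 3 \\
\alpha = 0 &: \; 2 \quad (\text{only disagreeing resolved quartets})
\end{align*}
```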
![Page 13: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/13.jpg)
Counting Unresolved Triplets in One Tree
[Figure: node v with child subtrees of sizes n1, n2, n3, ..., nd.]
Triplets anchored at v:
$$\sum_{v} \sum_{i<j<k} n_i \cdot n_j \cdot n_k$$
Quartets anchored at v (root tree arbitrary):
$$\sum_{v} \Bigl( \sum_{i<j<k<l} n_i \cdot n_j \cdot n_k \cdot n_l + \bigl(n - \sum_{l} n_l\bigr) \sum_{i<j<k} n_i \cdot n_j \cdot n_k \Bigr)$$
Computable in O(n) time using DFS + dynamic programming
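One possible way to evaluate both sums in a single DFS is to maintain the elementary symmetric sums of the child sizes at each node. This is a sketch under that assumption; the slide does not spell out its dynamic program. The names `e1` to `e4` are introduced here, and trees are nested tuples.

```python
def count_unresolved(tree):
    """Return (# unresolved triplets, # unresolved quartets)."""

    def count_leaves(node):
        if not isinstance(node, tuple):
            return 1
        return sum(count_leaves(child) for child in node)

    n = count_leaves(tree)
    triplets = quartets = 0

    def dfs(node):
        nonlocal triplets, quartets
        if not isinstance(node, tuple):
            return 1
        # e_k = sum of products of k distinct child sizes seen so far.
        e1 = e2 = e3 = e4 = 0
        for child in node:
            s = dfs(child)
            e4 += e3 * s      # update the highest order first
            e3 += e2 * s
            e2 += e1 * s
            e1 += s
        triplets += e3
        # Four leaves from distinct children, or three from distinct
        # children plus one leaf outside the subtree of this node.
        quartets += e4 + (n - e1) * e3
        return e1

    dfs(tree)
    return triplets, quartets


print(count_unresolved(((2, 3), 1, 4, 5)))
```

The triplet count refers to the tree as rooted; the quartet count refers to the underlying unrooted tree and does not depend on where it is rooted.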
![Page 14: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/14.jpg)
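Counting Agreeing Triplets (Basic Idea)
[Figure: node v in T1 with child subtrees colored 1, ..., i, ..., j, ..., d (all other leaves colored 0); node w in T2 with child c.]
$$\sum_{v \in T_1} \sum_{w \in T_2} \sum_{c} \sum_{1 \le i \le d} \binom{n_i^c}{2} \left( n^w - n^c - n_i^w + n_i^c \right)$$
where $n^w = \sum_{1 \le i \le d} n_i^w$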
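The formula can be evaluated literally as a sanity check. This sketch assumes the following reading of the symbols: $n_i^c$ is the number of leaves with color i below child c of w, $n^c$ and $n^w$ count all leaves with a nonzero color below c and w, and c ranges over the children of w. It is a naive evaluation, not the efficient algorithm, and the example trees are the ones reconstructed from the triplet table on page 9.

```python
from collections import Counter
from math import comb


def leaves(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def internal_nodes(tree):
    if isinstance(tree, tuple):
        yield tree
        for child in tree:
            yield from internal_nodes(child)


def agreeing_triplets(t1, t2):
    total = 0
    for v in internal_nodes(t1):
        # Color the leaves below the i-th child of v with color i;
        # leaves outside v stay uncolored (color 0).
        color = {leaf: i
                 for i, child in enumerate(v, start=1)
                 for leaf in leaves(child)}
        for w in internal_nodes(t2):
            per_child = [Counter(color[x] for x in leaves(c) if x in color)
                         for c in w]
            n_w_i = sum(per_child, Counter())
            n_w = sum(n_w_i.values())
            for n_c_i in per_child:
                n_c = sum(n_c_i.values())
                for i, k in n_c_i.items():
                    total += comb(k, 2) * (n_w - n_c - n_w_i[i] + k)
    return total


T1 = ((1, 3), ((2, 4), 5))
T2 = (((1, 3), 2), (4, 5))
print(agreeing_triplets(T1, T2))
```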
![Page 15: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/15.jpg)
Efficient Computation
Limit recolorings in T1 (and T2) to O(n·log n)
[Figure: recursion at node v of T1. Precondition: leaves below v colored 1, all other leaves colored 0. Steps shown: Recolor (children 1, 2, ..., d), Count T2 contribution, Recolor, Recurse, Recolor & recurse.]
Reduce recoloring cost in T2 to O(n·log^2 n)
[Figure: example with nodes labeled 1-9.]
Reduce recoloring cost in T2 from O(n·log^2 n) to O(n·log n)
T2: arbitrary height, arbitrary degree
H(T2): binary, height O(log n)
Contract T2 and reconstruct H(T2) during recursion
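The recoloring bound for T1 can be illustrated by simulating the recursion and counting leaf recolorings. This is a sketch under assumptions: the exact order of steps is our reading of the figure labels, the largest child plays the role of child 1, and the counting step in T2 is left as a comment.

```python
def leaves(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def count_recolorings(tree):
    """Number of leaf color assignments performed by the recursion."""
    color = {leaf: 1 for leaf in leaves(tree)}   # precondition at the root
    ops = 0

    def paint(subtree, c):
        nonlocal ops
        for leaf in leaves(subtree):
            color[leaf] = c
            ops += 1

    def visit(v):
        # Precondition: leaves below v have color 1, all others color 0.
        if not isinstance(v, tuple):
            paint(v, 0)
            return
        kids = sorted(v, key=lambda c: len(leaves(c)), reverse=True)
        for j, child in enumerate(kids[1:], start=2):
            paint(child, j)          # largest child keeps color 1
        # ... the contribution of v would be counted in T2 here ...
        for child in kids[1:]:
            paint(child, 0)
        visit(kids[0])               # recurse into the largest child
        for child in kids[1:]:
            paint(child, 1)
            visit(child)
        # Postcondition: all leaves below v have color 0.

    visit(tree)
    return ops


balanced = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
print(count_recolorings(balanced))
```

Each leaf is repainted a constant number of times per ancestor at which it lies in a non-largest child, and there are at most log2 n such ancestors, which gives the O(n·log n) total.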
![Page 16: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/16.jpg)
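Counting Agreeing Triplets (II)
[Figure: node v in T1 with child subtrees colored 1, ..., i, ..., j, ..., d (other leaves colored 0); components C1 and C2 in T2.]
node in H(T2) = component composition in T2
Contribution to agreeing triplets at node in H(T2):
$$\sum_{1 \le i \le d} \binom{n_i^{C_1}}{2} \left( n_*^{C_2} - n_i^{C_2} \right) + \sum_{1 \le i \le d} \left( n_*^{C_1} - n_i^{C_1} \right) n_{(ii)}^{C_2} + \sum_{1 \le i \le d} n_i^{C_1} \cdot n_{i \uparrow *}^{C_2}$$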
![Page 17: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/17.jpg)
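From O(n·log^2 n) to O(n·log n)
Update O(1) counters for all colors through node
[Figure: node v in T1 with n_v leaves and child subtrees colored 1, ..., i, ..., j, ..., d of sizes n_i (other leaves colored 0); node w in H(T2).]
Compressed version of T2 of size O(n_v)
Colored path lengths:
$$\sum_{2 \le i \le d} n_i \cdot \log \frac{|T_2|}{n_i} = \sum_{2 \le i \le d} n_i \cdot \log \frac{n_v}{n_i}$$
Total cost for updating counters:
$$\sum_{\text{leaf } l \in T_1} \; \sum_{\substack{\text{ancestor } a(j) \\ \text{not heavy child}}} \log \frac{n_{a(j+1)}}{n_{a(j)}} = O(n \cdot \log n)$$
[Figure: leaf l = a(0) in T1 with ancestors a(1), a(2), ..., a(5).]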
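The total cost sum telescopes along each root-to-leaf path, so it is at most log n per leaf. A small numeric sketch of this, assuming base-2 logarithms, nested-tuple trees, and taking the first largest child as the heavy child (`light_path_cost` is a name introduced here):

```python
import math


def light_path_cost(tree):
    """Return (sum over light children c of v of n_c * log2(n_v / n_c),
    n * log2(n)) for a nested-tuple tree with n leaves."""
    total = 0.0

    def dfs(node):
        nonlocal total
        if not isinstance(node, tuple):
            return 1
        sizes = [dfs(child) for child in node]
        n_v = sum(sizes)
        heavy = sizes.index(max(sizes))
        for idx, n_c in enumerate(sizes):
            if idx != heavy:
                # Every leaf below a light child pays log(n_v / n_c).
                total += n_c * math.log2(n_v / n_c)
        return n_v

    n = dfs(tree)
    return total, n * math.log2(n)


balanced = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
cost, bound = light_path_cost(balanced)
print(cost, bound)
```

Grouping the leaf-wise sum by the node at which a leaf sits in a light child gives exactly the per-node terms computed here.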
![Page 18: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/18.jpg)
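Counting Quartets...
Bottleneck in computing disagreeing resolved-resolved quartets
[Figure: node v in T1 with child subtrees colored 1, ..., i, ..., j, ..., d (other leaves colored 0); G1 and G2 in T2.]
$$\sum_{1 \le i < d} \; \sum_{i < j \le d} n_{(ij)}^{G_1} \cdot n_{(ij)}^{G_2}$$
double sum: factor d in time
Root T1 and T2 arbitrarily
Keep up to 15 + 38d different counters per node in H(T2)...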
![Page 20: Triplet and Quartet Distances Between Trees of Arbitrary Degree](https://reader035.fdocuments.us/reader035/viewer/2022070423/5681678c550346895ddcae8d/html5/thumbnails/20.jpg)
Summary
d = maximal degree of any node in T1 and T2
Rooted triplet distance, binary trees: O(n^2) [CPQ 1996]; O(n·log n) [SODA 2013]
Unrooted quartet distance, binary trees: O(n^3) [D 1985]; O(n^2) [BTKL 2000]; O(n·log^2 n) [BFP 2001]; O(n·log n) [BFP 2003]
Rooted triplet distance, degrees d: O(n^2) [BDF 2011]; O(n·log n) [SODA 2013]
Unrooted quartet distance, degrees d: O(d^9·n·log n) [SPMBF 2007]; O(n^2.688) [NKMP 2011]; O(d·n·log n) [SODA 2013]
[Figure: small example trees, binary and of arbitrary degree.]
O(n·log n) ?
o(n·log n) ?
The End