The Structure of Level-k Phylogenetic Networks
Lille – 24/06/2009 – CPM'09
Philippe Gambette
in collaboration with Vincent Berry, Christophe Paul
Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
Phylogenetic networks
[Figure: examples of phylogenetic networks with software names, labeled: median network, minimum spanning network, TCS, level-2 network, split network, SplitsTree, Network, Level-2, T-Rex, reticulogram, synthesis diagram, HorizStory.]
Abstract or explicit networks
An explicit phylogenetic network is a phylogenetic network where all reticulations can be interpreted as precise biological events.
An abstract network reflects some phylogenetic signals rather than explicitly displaying biological reticulation events.
Doolittle: Uprooting the Tree of Life, Scientific American (Feb. 2000)
Abstract phylogenetic network of mushroom species, www.splitstree.org
Hierarchy of network subclasses
http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php
Level-k phylogenetic networks
[Figure: hierarchy of network subclasses, with trees marked as level 0 and galled trees as level 1.]
Motivation to generalize “galled trees” (= level-1):
Arenas, Valiente, Posada: Characterization of Phylogenetic Reticulate Networks based on the Coalescent with Recombination, Molecular Biology and Evolution, to appear.
Level-k phylogenetic networks
A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which:
- exactly one vertex has indegree 0 and outdegree 2: the root,
- all other vertices have either:
  - indegree 1 and outdegree 2: split vertices,
  - indegree 2 and outdegree ≤ 1: reticulation vertices,
  - or indegree 1 and outdegree 0: leaves labeled by X,
- any blob has at most k reticulation vertices.
[Figure: a network N on the leaves a, b, c, d, e, f, g, h, i, j, k.]
All arcs are oriented downwards.
A blob is a maximal induced connected subgraph with no cut arc.
A cut arc is an arc whose removal disconnects the graph.
N has level 2.
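The definitions of blob and cut arc translate fairly directly into a computation of the level. The following is a minimal Python sketch, not the authors' program: it assumes the network is given as a list of arcs `(u, v)` (a representation chosen here for illustration), finds the cut arcs of the underlying undirected multigraph with a standard depth-first search, and returns the largest number of reticulation vertices found in a single blob.

```python
from collections import defaultdict

def level(arcs):
    """Level of a rooted network given as a list of arcs (u, v).

    Parallel arcs are allowed. The arc-list representation is an
    assumption of this sketch, not something fixed by the slides.
    """
    adj = defaultdict(list)      # undirected adjacency: vertex -> [(neighbor, arc id)]
    indeg = defaultdict(int)
    for i, (u, v) in enumerate(arcs):
        adj[u].append((v, i))
        adj[v].append((u, i))
        indeg[v] += 1

    # Cut arcs are the bridges of the underlying undirected multigraph.
    disc, low, cut_arcs = {}, {}, set()

    def dfs(u, parent_arc):
        disc[u] = low[u] = len(disc)
        for v, i in adj[u]:
            if i == parent_arc:          # skip only the arc we came by,
                continue                 # so a parallel arc still closes a cycle
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, i)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    cut_arcs.add(i)

    for u in list(adj):
        if u not in disc:
            dfs(u, None)

    # Blobs are the connected components left once the cut arcs are removed.
    seen, best = set(), 0
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, reticulations = [start], 0
        while stack:
            u = stack.pop()
            reticulations += indeg[u] == 2   # reticulation vertex: indegree 2
            for v, i in adj[u]:
                if i not in cut_arcs and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, reticulations)
    return best
```

On a network with two reticulation vertices that lie in two different blobs, `level` returns 1, which illustrates that the level can be smaller than the total number of reticulations.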
Decomposition of level-k networks
We formalize the decomposition into blobs:
[Figure: N, a level-k network on the leaves a to k, and N decomposed as a tree of simple graph patterns: generators.]
Generators were introduced by van Iersel et al. (RECOMB 2008) for the restricted class of simple level-k networks.
Level-k generators
A level-k generator is a level-k network with no cut arc.
[Figure: the generators G0, G1, 2a, 2b, 2c, 2d.]
The sides of the generator are:
- its arcs
- its reticulation vertices of outdegree 0
Decomposition theorem of level-k networks
N is a level-k network
iff
there exists a sequence $(l_j)_{j \in [1,r]}$ of r locations (arcs or reticulation vertices of outdegree 0) and a sequence $(G_j)_{j \in [0,r]}$ of generators of level at most k, such that:
- $N = \mathrm{Attach}_k(l_r, G_r, \mathrm{Attach}_k(\dots \mathrm{Attach}_k(l_2, G_2, \mathrm{Attach}_k(l_1, G_1, G_0)) \dots))$,
- or $N = \mathrm{Attach}_k(l_r, G_r, \mathrm{Attach}_k(\dots \mathrm{Attach}_k(l_2, G_2, \mathrm{SplitRoot}_k(G_1, G_0)) \dots))$.
[Figure: $\mathrm{SplitRoot}_k(G_1, G_0)$, built from the generators $G_0$ and $G_1$.]
[Figure: $\mathrm{Attach}_k(l_i, G_i, N)$ when $l_i$ is an arc of N.]
[Figure: $\mathrm{Attach}_k(l_i, G_i, N)$ when $l_i$ is a reticulation vertex of N.]
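The two figures for Attach are not reproduced in this transcript, so the exact operation cannot be read off here. The Python sketch below shows one natural reading, stated as an assumption: when the location is an arc, the arc is subdivided by a new vertex from which the root of the generator hangs; when the location is a reticulation vertex of outdegree 0, the root of the generator hangs directly below it. The function name `attach`, its parameters, and the arc-list representation are all hypothetical.

```python
def attach(location, gen_arcs, gen_root, arcs, new_vertex):
    """Assumed reading of Attach_k(l, G, N) on arc lists (hypothetical helper).

    location   -- an arc (u, v) of N, or a reticulation vertex of outdegree 0
    gen_arcs   -- arcs of the generator G (empty if G is a single vertex)
    gen_root   -- root of G
    arcs       -- arcs of N; vertex names of G and N are assumed disjoint,
                  and vertex names are assumed not to be tuples
    new_vertex -- fresh name used when an arc is subdivided
    """
    result = list(arcs)
    if isinstance(location, tuple):
        # Assumed arc case: subdivide (u, v) by new_vertex and hang G below it.
        u, v = location
        result.remove((u, v))   # removes one copy only, so a parallel arc survives
        result += [(u, new_vertex), (new_vertex, v), (new_vertex, gen_root)]
    else:
        # Assumed vertex case: G hangs directly below the reticulation vertex.
        result.append((location, gen_root))
    return result + list(gen_arcs)
```

For instance, starting from two parallel arcs from `"r"` to `"h"`, `attach("h", [], "a", n, None)` adds the arc `("h", "a")`, and a further `attach(("r", "h"), [], "b", step1, "w")` subdivides one of the two parallel arcs by `"w"` and hangs `"b"` below it. Treating a single vertex as the generator attached in both steps is also an assumption of this example.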
This decomposition is not unique!
Unique “graph-labeled tree” decomposition:
[Figure: a network N on the leaves a to i and its graph-labeled tree decomposition.]
Possible applications:
- exhaustive generation of level-k networks
- counting of level-k networks
Construction of the generators
Van Iersel et al. build the 4 level-2 generators by a simple case analysis, generalized by Steven Kelk into an exponential algorithm to find all 65 level-3 generators.
We give rules to build level-(k+1) generators from level-k generators.
Construction rules of level-(k+1) generators from level-k generators:
Rule R1:
[Figure: a generator N with sides e1, e2, h1, h2, and the generators $R_1(N,h_1,h_2)$, $R_1(N,h_1,e_2)$, $R_1(N,e_2,e_2)$, $R_1(N,e_1,e_2)$ obtained from it, each containing a new vertex h3.]
Rule R2:
[Figure: a generator N with sides e1, e2, h1, h2, and the generators $R_2(N,h_1,e_2)$, $R_2(N,e_1,e_1)$, $R_2(N,e_2,e_1)$ obtained from it, each containing a new vertex h3.]
Problem! Some of the level-(k+1) generators obtained from level-k generators are isomorphic!
[Figure: the generators $R_1(2b,h_1,e_2)$ and $R_1(2b,h_2,e_1)$; in general, $R_1(N,h_1,e_2)$ and $R_1(N,h_2,e_1)$.]
→ difficult to count!
Upper bound
R1 and R2 can be applied at most on all pairs of sides.
A level-k generator has at most 5k sides, so, with $g_k$ the number of level-k generators:
$g_{k+1} < 50 k^2 g_k$
Upper bound: $g_k < k!^2 \, 50^k$
Theoretical corollary: there is a polynomial algorithm to build the set of level-(k+1) generators from the set of level-k generators.
Practical corollary: $g_4 < 28350$
→ it is possible to enumerate all level-4 generators.
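The step from the side count to the closed-form bound is not spelled out on the slide. One way to fill it in, as a sketch under two assumptions (the factor $50k^2$ counts ordered pairs of sides for each of the two rules, and $g_1 = 1$, i.e. $G_1$ is the only level-1 generator), is:

```latex
\begin{align*}
\#\{\text{ordered pairs of sides of a level-}k\text{ generator}\}
  &\le (5k)^2 = 25k^2, \\
g_{k+1} &< 2 \cdot 25k^2 \cdot g_k = 50k^2\, g_k
  && \text{(two rules, } R_1 \text{ and } R_2\text{)}, \\
g_k &< \Big(\prod_{j=1}^{k-1} 50 j^2\Big)\, g_1
     = 50^{k-1}\,\big((k-1)!\big)^2\, g_1
  && (k \ge 2), \\
50^{k-1}\,\big((k-1)!\big)^2 &< 50^{k}\, k!^2
  && \text{so } g_k < k!^2\, 50^k \text{ when } g_1 = 1 .
\end{align*}
```

The closed form is looser than the recurrence itself, which is why the recurrence is the one used for the practical corollary.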
Number of level-k generators
http://www.lirmm.fr/~gambette/ProgramGenerators
It is possible to enumerate all level-4 generators.
Isomorphism of graphs of bounded valence: polynomial (Luks, FOCS 1980)
Practical algorithm? A simple exponential backtracking algorithm is sufficient for level 4: go through both graphs from their roots in parallel and identify their vertices: $O(n \, 2^{n-h})$
→ $g_4 = 1993$
→ $g_5 > 71000$
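The backtracking idea described above (walk both graphs from their roots in parallel and identify vertices) can be sketched as follows. This is an illustrative Python version, not the program behind the reported counts; the dictionary representation and the name `isomorphic` are assumptions made for the example.

```python
from itertools import permutations

def isomorphic(children1, root1, children2, root2):
    """Backtracking test for isomorphism of rooted multidigraphs.

    children1, children2 map a vertex to the list of its children
    (a child listed twice stands for two parallel arcs). Every vertex
    is assumed reachable from its root, and None is not a vertex name.
    """
    mapping, inverse = {}, {}

    def extend(pending):
        # pending: pairs (u, v) that still have to be identified.
        if not pending:
            return True
        (u, v), rest = pending[0], pending[1:]
        if u in mapping or v in inverse:
            # Already identified: the earlier choice must agree with this pair.
            return mapping.get(u) == v and extend(rest)
        cu, cv = children1.get(u, []), children2.get(v, [])
        if len(cu) != len(cv):
            return False
        mapping[u], inverse[v] = v, u
        # Try every way of pairing the children (at most 2 when outdegree <= 2).
        for order in set(permutations(cv)):
            if extend(list(zip(cu, order)) + rest):
                return True
        del mapping[u], inverse[v]   # backtrack
        return False

    return extend([(root1, root2)])
```

The branching happens only at vertices with two distinct children, which is consistent with an exponential worst case that stays manageable on small generators.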
Number of level-k generators
Lower bound
Lower bound: $g_k \geq 2^{k-1}$
There is an exponential number of generators!
Idea: code every number between 0 and $2^{k-1}-1$ by a level-k generator.
[Figure: examples of the encoding for k = 1 and k = 2.]
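As a quick sanity check, the counts quoted in these slides (4 level-2 generators, 65 level-3 generators, $g_4 = 1993$) can be compared with the lower bound $2^{k-1}$, the upper bound $k!^2 \, 50^k$ and the recurrence $g_{k+1} < 50 k^2 g_k$. The short Python sketch below does only this arithmetic; the function names are chosen for the example.

```python
from math import factorial

# Generator counts quoted in the slides: level -> number of generators.
reported = {2: 4, 3: 65, 4: 1993}

def lower_bound(k):
    """Lower bound 2^(k-1) on the number of level-k generators."""
    return 2 ** (k - 1)

def upper_bound(k):
    """Upper bound k!^2 * 50^k on the number of level-k generators."""
    return factorial(k) ** 2 * 50 ** k

def recurrence_ok(k):
    """Check g_{k+1} < 50 k^2 g_k on the reported counts."""
    return reported[k + 1] < 50 * k * k * reported[k]

for k, g in sorted(reported.items()):
    print(k, lower_bound(k), g, upper_bound(k))
```

The gap between the two bounds widens quickly with k, so for small levels the enumeration is what gives the actual values.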
Simulated level-k networks
http://www.lirmm.fr/~gambette/ProgramGenerators
Simulate 1000 phylogenetic networks using the coalescent model with recombination.
Arenas, Valiente, Posada 2008; program Recodon
How many are level-1,2,3... networks?
[Figure: two plots of the number of networks (0 to 1000) against the level (0 to 100), one for n=50 and one for n=10, each with curves for r = 1, 2, 4, 8, 16, 32.]
Link between level and number of reticulations:
[Figure: plot of the simulated networks with axes "level" and "number of reticulations", both ranging from 0 to 9.]
http://www.lirmm.fr/~gambette/ProgramGenerators
Summary on the level parameter
Advantages:
• natural structure for all explicit phylogenetic networks
• global tree-structure used algorithmically
• finite graph patterns to represent blobs: generators
Limits:
• number of generators exponential in the level
• complex structure of generators
• when recombination is not local, level doesn't help
Questions? Thank you for your attention!
TreeCloud available at http://treecloud.org - Slides available at http://www.lirmm.fr/~gambette/RePresentationsENG.php
Split network and reticulogram of the 50 most frequent words in CPM titles, hyperlex cooccurrence distance.