
    Evolutionary Bioinformatics 2008:4 237–247

    ORIGINAL RESEARCH

    Correspondence: Marc Thuillard, Belimo Automation AG, CH-8340 Hinwil, Switzerland. Email: [email protected]

    Copyright in this article, its metadata, and any supplementary data is held by its author or authors. It is published under the Creative Commons Attribution By licence. For further information go to: http://creativecommons.org/licenses/by/3.0/.

    Minimum Contradiction Matrices in Whole Genome Phylogenies

    Marc Thuillard

    Belimo Automation AG, CH-8340 Hinwil, Switzerland.

    Abstract: Minimum contradiction matrices are a useful complement to distance-based phylogenies. A minimum contradiction matrix represents phylogenetic information in the form of an ordered distance matrix $Y^n_{i,j}$. A matrix element corresponds to the distance from a reference vertex $n$ to the path $(i,j)$. For an X-tree or a split network, the minimum contradiction matrix is a Robinson matrix. It therefore fulfills all the inequalities defining perfect order: $Y^n_{i,j} \ge Y^n_{i,k}$, $Y^n_{k,j} \ge Y^n_{k,i}$, $i \le j \le k \le n$. In real phylogenetic data, some taxa may contradict the inequalities for perfect order. Contradictions to perfect order correspond to deviations from a tree or from a split network topology. Efficient algorithms that search for the best order are presented and tested on whole genome phylogenies with 184 taxa including many Bacteria, Archaea and Eukaryota. After optimization, taxa are classified in their correct domain and phyla. Several significant deviations from perfect order correspond to well-documented evolutionary events.

    Keywords: phylogenetic trees, whole genome phylogeny, minimum contradiction, split network

    1. Introduction

    The discovery of the importance of lateral transfers, losses and duplication events in the evolution of genetic sequences has motivated the development of new approaches to graphically represent phylogenies. Methods like NeighborNet (Bryant and Moulton, 2004), T-Rex (Makarenkov et al. 2006), SplitsTree (Bandelt and Dress, 1992; Dress and Huson, 2004; Huson, 1998), QNet (Grünewald et al. 2006), Pyramids (Bertrand and Diday, 1985) and Tree of Life (Kunin et al. 2005a) allow visualizing deviations from a tree topology. All these methods have in common that they summarize the information in the form of a planar network. Deviations from an X-tree are often represented by supplementary edges (Makarenkov et al. 2006; Nakhleh et al. 2004) that create cycles in the graph.

    Phylogenetic information can be represented by a distance matrix $Y^n_{i,j}$. For an X-tree, the elements of the distance matrix $Y^n_{i,j}$ correspond to the distance from a reference taxon $n$ to the path $(i,j)$. The taxa can be ordered through permutations, so that the distance matrix is a Robinson matrix (Bertrand and Diday, 1985), with values of both rows and columns decreasing away from the diagonal. The corresponding circular order is defined as a perfect order. We have shown with a probabilistic model that perfect order is quite robust against lateral transfer and crossover (Thuillard, 2007). The search for the order minimizing a measure of the deviation from perfect order can be efficiently done with a multiresolution algorithm (Thuillard, 2001, 2007). The method has been tested on SSU rRNA data for Archaea. The matrix with the best order corresponds quite well to a Robinson matrix. In this article, the minimum contradiction approach is further developed and applied to whole genome phylogenies.

    With the availability of complete genomes, many methods have been proposed to determine the evolution of whole genomes (for reviews see Galperin et al. 2006; Delsuc et al. 2005; Henz et al. 2005). The construction of trees from whole genomes has proved over recent years to be a quite difficult task. This is mainly because of the very limited number of genes shared by Archaea, Eukaryota and Bacteria. Furthermore, gene evolution can sometimes be very different from species evolution. The main difficulty consists in finding a good operator to estimate the distance between genomes. Distances have been estimated with measures based on gene order or arrangement (Wolf et al. 2002; Wang et al. 2006), gene content (Fitz-Gibbon and House, 1999; Snel et al. 1999; Korbel et al. 2002), protein domain organization (Fukami-Kobayashi et al. 2007; Yang et al. 2005), folds (Lin and Gerstein, 2007), combining the information from many genes in a supertree or a superdistance (see Dutilh et al. 2007 for a comparative study) or using a local alignment search tool such as Blast (Kunin et al. 2005b; Clarke et al. 2002).


    Among genome distances obtained with Blast, the genome conservation (Kunin et al. 2005b) has furnished some of the best trees to date, if the quality of a whole genome phylogeny is measured by its concordance with broadly accepted classifications. The genome conservation estimates the distance between two taxa using the sum of BlastP reciprocal best hits between two genomes. The method is capable of quite correctly recovering all main phyla. At the phylum level, the evolution of the different genes is sufficiently similar to form a distinct cluster. The main uncertainties in whole genome phylogenies are on the relationships between phyla. Different evolution rates of the genes, gene losses or duplications, and lateral gene transfer may result in large deviations of the distance matrix from a tree topology. In this context, minimum contradiction matrices can furnish information not contained in a single tree or a split network.

    The paper is organized as follows. After introducing minimum contradiction matrices in section 2 and their connection to Robinson matrices and Kalmanson inequalities, section 3 explains why the identification of deviations from perfect order is a useful complement to phylogenetic studies. Section 4 presents an algorithm to search for the order minimizing a measure of the deviation from perfect order over all taxa. This order can be interpreted as an average best order over all reference taxa $Y^N_{i,j}$ ($N = 1, \ldots, n$). The algorithm is applied in section 5 to distance matrices for whole genome phylogenies obtained with the genome conservation method.

    2. Circular Order and the Minimum Contradiction Approach

    2.1. Definitions

    Let us start by recalling a number of definitions that are necessary to introduce the notion of circular order. A graph G is defined by a set of vertices V(G) and a set of edges E(G). Let us write e(x,y) for the edge between the two vertices x and y. In a graph G, a path P between two vertices x and y is a sequence of non-repeating edges $e(x,z_1), e(z_1,z_2), \ldots, e(z_i,y)$ connecting x to y. The degree of a vertex x is the number of edges $e \in E(G)$ to which x belongs. A leaf x of a graph is a vertex of degree one. A vertex of degree larger than one is called an internal vertex.

    A valued X-tree T is a graph with X as its set of leaves and a unique path between any two distinct vertices x and y, with internal vertices of at most degree 3. The distance d between leaves satisfies the classical triangular inequality

    $$d(x,z) \le d(x,y) + d(y,z) \quad \text{for all } x, y, z \in X \qquad (1)$$

    with d(x,y) representing the sum of the weights on the edges of T in the path connecting x and y.

    A central problem in phylogeny is to determine if there is an X-tree T and a real-valued weighting of the edges of T that fits a dissimilarity matrix $\delta$. Typically, a dissimilarity matrix corresponds to an estimation of the pairwise distance $d(x_i,x_j)$ between all elements in X. A necessary and sufficient condition for the existence of a unique tree is that the dissimilarity matrix $\delta$ satisfies the so-called 4-point condition (Buneman, 1971). For any four elements in X, the 4-point condition requires that

    $$\delta(x_i,x_k) + \delta(x_j,x_n) \le \max\big(\delta(x_i,x_j) + \delta(x_k,x_n),\ \delta(x_i,x_n) + \delta(x_j,x_k)\big). \qquad (2)$$
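The 4-point condition can be checked numerically. The following is a minimal sketch, not part of the original method: the function name `satisfies_four_point` and the 5-leaf example matrix `TREE` (a caterpillar tree with two cherries, built by hand) are illustrative assumptions. Requiring condition (2) for every labeling of a quadruple is equivalent to requiring that the two largest of the three pairwise sums are equal, which is what the code tests.

```python
from itertools import combinations

def satisfies_four_point(delta, tol=1e-9):
    """Return True if every quadruple of taxa satisfies the 4-point condition."""
    n = len(delta)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((delta[i][j] + delta[k][l],
                       delta[i][k] + delta[j][l],
                       delta[i][l] + delta[j][k]))
        # Each sum must not exceed the maximum of the other two,
        # i.e. the two largest sums must coincide.
        if sums[2] - sums[1] > tol:
            return False
    return True

# Hypothetical example: path-length distances on a 5-leaf caterpillar tree
# with cherries (0,1) and (3,4).
TREE = [
    [0, 3, 5, 5, 6],
    [3, 0, 6, 6, 7],
    [5, 6, 0, 6, 7],
    [5, 6, 6, 0, 3],
    [6, 7, 7, 3, 0],
]

# Perturbing a single distance breaks the tree structure.
NOT_TREE = [row[:] for row in TREE]
NOT_TREE[0][3] = NOT_TREE[3][0] = 7

print(satisfies_four_point(TREE), satisfies_four_point(NOT_TREE))
```

The check enumerates all quadruples and is therefore O(n^4); it is meant for small sanity checks rather than for matrices with hundreds of taxa.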

    2.2. Circular order and Kalmanson inequalities

    Consider a planar representation of a tree T or a split network S. A circular order corresponds to an indexing of the n leaves according to a circular (clockwise or anti-clockwise) scanning of the leaves (Barthélemy and Guénoche, 1991; Makarenkov and Leclerc, 1997, 2000; Yushmanov, 1984).

    In an X-tree, a circular order has the property that for any integer k (modulo n), all the branches on the path $P(x_k,x_{k+1})$ between $x_k$ and $x_{k+1}$ correspond to the left branch (or right branch if anti-clockwise). A circular order can be obtained by considering the distance matrix $Y^n_{i,j}$. As illustrated in Figure 1, the matrix element

    $$Y^n_{i,j} = \tfrac{1}{2}\big(d(x_i,x_n) + d(x_j,x_n) - d(x_i,x_j)\big)$$

    corresponds to the distance between a reference leaf n and the path $P(x_i,x_j)$. A circular order can be computed by ordering the distance matrix $Y^n_{i,j}$ so that it fulfils the inequalities defining a perfect order

    $$Y^n_{i,j} \ge Y^n_{i,k}, \quad Y^n_{k,j} \ge Y^n_{k,i} \quad (i \le j \le k \le n). \qquad (3a)$$
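As an illustration of the definition of $Y^n_{i,j}$ and of the perfect-order inequalities, here is a minimal sketch in plain Python. The helper names `reference_matrix` and `is_perfect_order`, and the example distance matrix `TREE` (a hand-built 5-leaf tree metric), are assumptions made for this example only. Taxa are 0-indexed and the reference taxon is excluded from the ordered list, since the inequalities hold trivially when k equals n.

```python
def reference_matrix(d, order, ref):
    """Y[a][b] = (d(a, ref) + d(b, ref) - d(a, b)) / 2 for the taxa in `order`."""
    return [[0.5 * (d[a][ref] + d[b][ref] - d[a][b]) for b in order]
            for a in order]

def is_perfect_order(y, tol=1e-9):
    """Check Y[i][j] >= Y[i][k] and Y[k][j] >= Y[k][i] for all i <= j <= k."""
    m = len(y)
    for i in range(m):
        for j in range(i, m):
            for k in range(j, m):
                if y[i][j] < y[i][k] - tol or y[k][j] < y[k][i] - tol:
                    return False
    return True

# Hypothetical 5-leaf tree metric; (0, 1, 2, 3, 4) is a circular order of it.
TREE = [
    [0, 3, 5, 5, 6],
    [3, 0, 6, 6, 7],
    [5, 6, 0, 6, 7],
    [5, 6, 6, 0, 3],
    [6, 7, 7, 3, 0],
]

good = reference_matrix(TREE, [0, 1, 2, 3], ref=4)
bad = reference_matrix(TREE, [0, 2, 1, 3], ref=4)   # taxa 1 and 2 swapped
print(is_perfect_order(good), is_perfect_order(bad))
```

With the circular order, rows and columns of the matrix decrease away from the diagonal; swapping two taxa that are not interchangeable in the tree violates the inequalities.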


    The above inequalities also characterize a Robinson matrix (Christopher et al. 1996; Thuillard, 2007). Using the definition of $Y^n_{i,j}$, the inequalities become

    $$d(x_j,x_n) + d(x_i,x_k) \ge d(x_k,x_n) + d(x_i,x_j) \quad \text{and} \quad d(x_j,x_n) + d(x_k,x_i) \ge d(x_i,x_n) + d(x_k,x_j) \quad (i \le j \le k \le n). \qquad (3b)$$

    These inequalities have a similar form to the 4-point condition (2) and are known as the Kalmanson inequalities.

    2.3. Minimum contradiction matrix

    In real applications, the distance matrix $Y^n_{i,j}$ often fulfills only partially the inequalities corresponding to a perfect order. The contradiction on the order of the taxa can be defined as

    $$C = \sum_{k \ge j \ge i;\ i,j,k \ne n} \big(\max(Y^n_{i,k} - Y^n_{i,j},\ 0)\big)^{\alpha} + \sum_{k \ge j \ge i;\ i,j,k \ne n} \big(\max(Y^n_{i,k} - Y^n_{j,k},\ 0)\big)^{\alpha} \qquad (4)$$

    where $\alpha$ is an exponent (set to 2 in section 4.1).

    The best order of a distance matrix is, by definition, the order minimizing the contradiction. The ordered matrix $Y^n_{i,j}$ corresponding to the best order is defined as the minimum contradiction matrix for the reference taxon n.

    For a perfectly ordered X-tree, the contradiction C is zero. A tree with a low contradiction value C is a tree that can be trusted, while a high contradiction value C is the indication of a distance matrix deviating significantly from an X-tree.
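The contradiction of Eq. (4) can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `contradiction` is hypothetical, the input is an already ordered matrix $Y^n$ restricted to the non-reference taxa, and the exponent defaults to 2. The two example matrices were computed by hand from a 5-leaf tree metric, once in a circular order and once with two taxa swapped.

```python
def contradiction(y, alpha=2):
    """Sum of penalized violations of the perfect-order inequalities (Eq. 4)."""
    m = len(y)
    total = 0.0
    for i in range(m):
        for j in range(i, m):
            for k in range(j, m):
                # Row-wise violation: Y[i][k] should not exceed Y[i][j].
                total += max(y[i][k] - y[i][j], 0.0) ** alpha
                # Column-wise violation: Y[i][k] should not exceed Y[j][k].
                total += max(y[i][k] - y[j][k], 0.0) ** alpha
    return total

# Hypothetical Y matrices for a tree metric, reference taxon excluded.
Y_PERFECT = [
    [6.0, 5.0, 4.0, 2.0],
    [5.0, 7.0, 4.0, 2.0],
    [4.0, 4.0, 7.0, 2.0],
    [2.0, 2.0, 2.0, 3.0],
]
# Same data with the second and third taxa swapped in the order.
Y_SWAPPED = [
    [6.0, 4.0, 5.0, 2.0],
    [4.0, 7.0, 4.0, 2.0],
    [5.0, 4.0, 7.0, 2.0],
    [2.0, 2.0, 2.0, 3.0],
]

print(contradiction(Y_PERFECT), contradiction(Y_SWAPPED))
```

Terms with j equal to i or j equal to k vanish identically, so looping over all i ≤ j ≤ k for both sums does not change the value.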

    3. Why Is Perfect Order an Important Property?

    Kalmanson inequalities are at the center of a number of important results relating convexity (Kalmanson, 1975), the Traveling Salesman Problem (TSP) (Deineko et al. 1995; Korostensky and Gonnet, 2000), phylogenetic trees and networks (Christopher et al. 1996; Dress and Huson, 2004). Let us explain why perfect order is an important property.

    If the error on the distance in an X-tree is not greater than $x_{min}/2$, with $x_{min}$ the shortest edge on the tree, then the Neighbor-Joining algorithm will recover the correct tree topology and Kalmanson inequalities hold (Atteson, 1999; Korostensky and Gonnet, 2000).

    If a distance matrix d fulfills Kalmanson inequalities, then the distance matrix can be exactly represented by a split network (Bandelt and Dress, 1992).

    If Kalmanson inequalities are fulfilled, then the tour $(1, 2, \ldots, n)$ corresponds to a solution of the Traveling Salesman Problem (Christopher et al. 1996).

    The last result can be demonstrated starting from the sum $\sum_{i=1}^{n-2} Y^n_{i,i+1}$. When Kalmanson inequalities are fulfilled, this sum is maximized, as $Y^n_{i,i+1} \ge Y^n_{i,i+m}$ ($i+m \le n$, $m \ge 1$). Developing the sum, one gets

    $$\sum_{i=1}^{n-2} Y^n_{i,i+1} = \sum_{i=1}^{n-1} d_{i,n} - \tfrac{1}{2}\Big(d_{n,1} + \sum_{i=1}^{n-1} d_{i,i+1}\Big).$$

    The first sum $\sum_{i=1}^{n-1} d_{i,n}$ is independent of the order and one concludes that a perfect order minimizes $d_{n,1} + \sum_{i=1}^{n-1} d_{i,i+1}$. The tour $(1, 2, \ldots, n)$ is therefore a solution of the TSP.

    The solution to the TSP has the Master Tour property (Deineko et al. 1995). A Master Tour is a solution of the TSP with the property that the optimal tour restricted to a subset of points is also a solution of the reduced TSP. This result follows directly from the inequalities for perfect order $Y^n_{i,j} \ge Y^n_{i,k}$, $Y^n_{k,j} \ge Y^n_{k,i}$ ($i \le j \le k \le n$). Any restriction of a perfectly ordered distance matrix $Y^n_{i,j}$ to a subset of taxa is perfectly ordered and consequently is a solution to the reduced TSP. In contrast to this result, one finds with numerical experiments that, if the minimum contradiction matrix does not fulfill the inequalities for perfect order, the best order is not always preserved when a number of taxa are removed. The order minimizing the contradiction over n taxa does not always minimize the contradiction when restricted to a subset of taxa. It follows that one cannot exclude that the topology of a tree or a split network may change when taxa contradicting perfect order are removed. Deviations from perfect order correspond to problematic regions that have to be interpreted very carefully. For that reason we suggest that minimum contradiction matrices are a useful complement to any distance-based phylogeny.

    Figure 1. The distance matrix $Y^n_{i,j} = \tfrac{1}{2}(d(x_i,x_n) + d(x_j,x_n) - d(x_i,x_j))$ corresponds to the distance between the leaf n and the path P(i,j).
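The link between the sum of adjacent matrix elements and the tour length can be verified numerically on a small example. The sketch below is illustrative only: the function names and the 5-leaf tree metric `TREE` are assumptions, and the brute-force search over all permutations is practical only for a handful of taxa. It checks that the sum of $Y^n_{i,i+1}$ equals the order-independent sum of distances to the reference minus half the tour length, and that the circular order of the tree yields a shortest tour.

```python
from itertools import permutations

def tour_length(d, order):
    """Length of the closed tour visiting the taxa in the given order."""
    n = len(order)
    return sum(d[order[i]][order[(i + 1) % n]] for i in range(n))

def adjacent_y_sum(d, order):
    """Sum of Y^ref between neighbours, the last taxon of `order` being the reference."""
    *others, ref = order
    return sum(0.5 * (d[a][ref] + d[b][ref] - d[a][b])
               for a, b in zip(others, others[1:]))

def identity_holds(d, order, tol=1e-9):
    """Check: sum of adjacent Y = sum of distances to ref - tour length / 2."""
    *others, ref = order
    rhs = sum(d[a][ref] for a in others) - 0.5 * tour_length(d, order)
    return abs(adjacent_y_sum(d, order) - rhs) < tol

# Hypothetical 5-leaf tree metric with circular order (0, 1, 2, 3, 4).
TREE = [
    [0, 3, 5, 5, 6],
    [3, 0, 6, 6, 7],
    [5, 6, 0, 6, 7],
    [5, 6, 6, 0, 3],
    [6, 7, 7, 3, 0],
]

circular = [0, 1, 2, 3, 4]
shortest = min(tour_length(TREE, p) for p in permutations(range(5)))
print(tour_length(TREE, circular), shortest, adjacent_y_sum(TREE, circular))
```

Because the identity holds for any order, maximizing the sum of adjacent elements of $Y^n$ and minimizing the tour length are the same optimization problem.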

    4. Searching for the Best Order in Whole Genome Phylogenies

    4.1. Fast algorithm to search for the best order

    The choice of the reference taxon n in $Y^n_{i,j}$ can significantly influence the best order when the distance matrix cannot be perfectly ordered. For that reason, an average best order is determined by minimizing the contradiction over all reference taxa.

    The contradiction over all n reference taxa is given by

    $$C = \sum_{m=1,\ldots,n} \Big[ \sum_{k \ge j \ge i;\ i,j,k \ne n} \big(\max(Y^{n(m)}_{i(m),k(m)} - Y^{n(m)}_{i(m),j(m)},\ 0)\big)^{\alpha} + \sum_{k \ge j \ge i;\ i,j,k \ne n} \big(\max(Y^{n(m)}_{i(m),k(m)} - Y^{n(m)}_{j(m),k(m)},\ 0)\big)^{\alpha} \Big] \qquad (5)$$

    with $i(m) = \mathrm{mod}(m + i_0 - 2, n) + 1$; $j(m) = \mathrm{mod}(m + j_0 - 2, n) + 1$, $n(m) = n_0 - m + 1$ and $\alpha = 2$.

    The best order is the order $(1, \ldots, i_0, \ldots, j_0, \ldots, n_0)$ minimizing the contradiction. The computation of the contradiction requires $O(n^4)$ operations. For a large ensemble of taxa, the computational cost may become quite high. We will therefore introduce below an algorithm requiring only $O(n^3)$ operations to compute a (slightly different) measure of the contradiction.
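The idea of averaging the contradiction over all reference taxa can be sketched as below. This is a simplified illustration, not the exact index mapping of Eq. (5): it assumes that each taxon is used in turn as reference and that the remaining taxa are read in circular order starting just after the reference. The function names and the example tree metric are hypothetical. The triple loop inside a loop over references makes the O(n^4) cost explicit.

```python
def reference_matrix(d, others, ref):
    """Y^ref restricted to the taxa in `others`, in that order."""
    return [[0.5 * (d[a][ref] + d[b][ref] - d[a][b]) for b in others]
            for a in others]

def contradiction(y, alpha=2):
    """Penalized violations of the perfect-order inequalities for one reference."""
    m = len(y)
    total = 0.0
    for i in range(m):
        for j in range(i, m):
            for k in range(j, m):
                total += max(y[i][k] - y[i][j], 0.0) ** alpha
                total += max(y[i][k] - y[j][k], 0.0) ** alpha
    return total

def total_contradiction(d, order, alpha=2):
    """Sum of the contradiction over all taxa used in turn as reference."""
    total = 0.0
    for p, ref in enumerate(order):
        # Assumed convention: rotate the circular order so that it starts
        # just after the reference taxon.
        others = order[p + 1:] + order[:p]
        total += contradiction(reference_matrix(d, others, ref), alpha)
    return total

# Hypothetical 5-leaf tree metric with circular order (0, 1, 2, 3, 4).
TREE = [
    [0, 3, 5, 5, 6],
    [3, 0, 6, 6, 7],
    [5, 6, 0, 6, 7],
    [5, 6, 6, 0, 3],
    [6, 7, 7, 3, 0],
]

print(total_contradiction(TREE, [0, 1, 2, 3, 4]),
      total_contradiction(TREE, [0, 2, 1, 3, 4]))
```

For a tree metric in a circular order, every rotation is perfectly ordered, so the total is zero; any other order accumulates a positive penalty.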

    Let us start by considering an X-tree and the 3 vertices i, j, k as in Figure 2. The distance matrix fulfills the inequalities for perfect order. The order between the vertices i, j, k is preserved for any reference vertex not in the interval (i,k), and the inequalities $Y^n_{i,j} \ge Y^n_{i,k}$ and $Y^n_{k,j} \ge Y^n_{k,i}$ hold for $n = 1, \ldots, i, k, \ldots, N$. The inequalities can be summed up over all n and one obtains two new inequalities:

    $$Sa(i,j) \ge Sa(i,k) \quad (i \le j \le k) \qquad (6a)$$

    $$Sb(i,j) \ge Sb(i,k) \quad (i \le j \le k) \qquad (6b)$$

    with

    $$Sa(i,j) = \sum_{n=1,\ldots,i} Y^n_{i,j}; \quad Sb(i,j) = \sum_{n=k,\ldots,N} Y^n_{i,j}. \qquad (6c)$$

    Figure 2. The inequalities $Y^n_{i,j} \ge Y^n_{i,k}$ are fulfilled for any reference vertex n with $n \ge k$ or $n \le i$.

    If the contradiction $c_{i,j}$ between the vertices i, j is defined as the sum of two terms

    $$c_{i,j} = ca_{i,j} + cb_{i,j} \qquad (7a)$$

    with

    $$ca_{i,j} = \sum_{k \ge j \ge i} \max\big(0,\ Sa(i,k) - Sa(i,j)\big)^2 \qquad (7b)$$

    $$cb_{i,j} = \sum_{k \ge j \ge i} \max\big(0,\ Sb(i,k) - Sb(i,j)\big)^2 \qquad (7c)$$

    then the best order is the order minimizing $c = \sum_{i=1,\ldots,N;\ j \ge i} (ca_{i,j} + cb_{i,j})$. Computing the contradiction requires $O(n^3)$ operations. As the computation of the contradiction is the most computer-intensive step, the algorithm requires approximately n times less computing time than the $O(n^4)$ algorithm.

    The quantities Sa and Sb in Eq. (6) can be related to the NJ algorithm. For 3 consecutive vertices (i, j = i+1, k = i+2), Eq. (6a) can be written, assuming perfect order, as

    $$\sum_{n \ne i,\, i+1,\, i+2} Y^n_{i,i+1} \ge \sum_{n \ne i,\, i+1,\, i+2} Y^n_{i,i+2}. \qquad (8)$$

    Writing $r_i = \sum_{n=1,\ldots,N} d(x_i,x_n)$ and $S_{i,j} = r_i + r_j - (N-2)\, d(x_i,x_j)$ one obtains

    $$S_{i,i+1} - S_{i,i+2} \ge 0. \qquad (9)$$
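The step from Eq. (8) to Eq. (9) rests on an identity that holds for any distance matrix: the difference of the two sums in Eq. (8) equals half of $S_{i,j} - S_{i,k}$. The sketch below checks this numerically; the function names and the example tree metric are assumptions for illustration, and taxa are 0-indexed.

```python
from itertools import combinations

def nj_criterion(d, i, j):
    """S(i,j) = r_i + r_j - (N - 2) d(i,j), with r_i the row sum of d."""
    n = len(d)
    return sum(d[i]) + sum(d[j]) - (n - 2) * d[i][j]

def summed_y(d, i, j, excluded):
    """Sum of Y^n(i,j) over all reference taxa n not in `excluded`."""
    return sum(0.5 * (d[i][n] + d[j][n] - d[i][j])
               for n in range(len(d)) if n not in excluded)

def best_pair(d):
    """Pair of taxa maximizing S, i.e. the pair that NJ would join first."""
    return max(combinations(range(len(d)), 2),
               key=lambda pair: nj_criterion(d, *pair))

# Hypothetical 5-leaf tree metric with cherries (0,1) and (3,4).
TREE = [
    [0, 3, 5, 5, 6],
    [3, 0, 6, 6, 7],
    [5, 6, 0, 6, 7],
    [5, 6, 6, 0, 3],
    [6, 7, 7, 3, 0],
]

i, j, k = 0, 1, 2
lhs = summed_y(TREE, i, j, {i, j, k}) - summed_y(TREE, i, k, {i, j, k})
rhs = 0.5 * (nj_criterion(TREE, i, j) - nj_criterion(TREE, i, k))
print(lhs, rhs, best_pair(TREE))
```

In this example the pair maximizing S is a cherry of the tree, which is consistent with initializing the search for the best order on the NJ tree.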

    The value $S_{i,j}$ is central to the NJ algorithm (Saitou and Nei, 1987; Gascuel and Steel, 2006). Two vertices i, j are joined by the NJ algorithm if they maximize S (i.e. $\max(S) = S_{i,j}$). From the above discussion, it seems natural to initialize the search for the best order on the NJ tree. The search for the best order of $Y^n_{i,j}$ is initialized with the NJ algorithm and a small supplementary procedure that we describe below. Consider two vertices a and b that are joined by the NJ algorithm and the leaves $a_1, a_2, \ldots, a_i$ (resp. $b_1, b_2, \ldots, b_j$) that have the vertex a (resp. b) as first ancestor. The best order of the leaves is chosen so as to minimize the contradiction among 4 possibilities: $(ab,\ \bar{a}b,\ a\bar{b},\ \bar{a}\bar{b})$ with $ab$ the order $a_1, a_2, \ldots, a_i, b_1, b_2, \ldots, b_j$ and $\bar{a}$ the inverted order $a_i, a_{i-1}, \ldots, a_1$. Once the order is optimized over the NJ tree, the best order is refined with a multiresolution search algorithm (Thuillard, 2001, 2007).
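The choice among the four orientations when two clusters are joined can be sketched as follows. This is a minimal illustration under assumptions: `best_join` and `make_cost` are hypothetical names, the cost is the single-reference contradiction of Eq. (4) rather than the averaged measure, and the NJ step that decides which clusters to join is not shown.

```python
def contradiction(y, alpha=2):
    """Penalized violations of the perfect-order inequalities (Eq. 4)."""
    m = len(y)
    total = 0.0
    for i in range(m):
        for j in range(i, m):
            for k in range(j, m):
                total += max(y[i][k] - y[i][j], 0.0) ** alpha
                total += max(y[i][k] - y[j][k], 0.0) ** alpha
    return total

def make_cost(d, ref):
    """Build a cost function mapping an order of taxa to its contradiction."""
    def cost(order):
        y = [[0.5 * (d[a][ref] + d[b][ref] - d[a][b]) for b in order]
             for a in order]
        return contradiction(y)
    return cost

def best_join(left, right, cost):
    """Try the four orientations (ab, reversed a + b, a + reversed b, both reversed)."""
    candidates = [
        left + right,
        left[::-1] + right,
        left + right[::-1],
        left[::-1] + right[::-1],
    ]
    return min(candidates, key=cost)

# Hypothetical 5-leaf tree metric; taxon 4 is used as reference.
TREE = [
    [0, 3, 5, 5, 6],
    [3, 0, 6, 6, 7],
    [5, 6, 0, 6, 7],
    [5, 6, 6, 0, 3],
    [6, 7, 7, 3, 0],
]

cost = make_cost(TREE, ref=4)
joined = best_join([0, 1], [3, 2], cost)
print(joined, cost(joined))
```

Here the second cluster has to be reversed to obtain a perfectly ordered matrix; when several orientations tie, `min` keeps the first one in the candidate list.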

    4.2. Similarity matrix for whole genome phylogenies

    For whole genome phylogenies, the search for appropriate measures to estimate the evolutionary distance between taxa is still the subject of significant research efforts (Korbel et al. 2002; Kunin et al. 2005b; Yang et al. 2005; Fukami-Kobayashi et al. 2007). Distance matrices obtained from BlastP scores have been quite successful in generating good trees. The similarity score obtained with BlastP programs can be given a probabilistic interpretation. The statistics of high scoring segments in the absence of gaps tends to an extreme value distribution (Karlin and Altschul, 1990). The probability P of finding at least one high scoring segment is well approximated, for small values of P, by the formula $P = m_1 m_2 2^{-Score}$ with $m_1$, $m_2$ the lengths of the 2 sequences. It follows that $Score = -\log_2 P + \log_2(m_1 m_2)$. Defining the distance d between two sequences as $d = -Score$ and assuming equal lengths m, one has $d = \log_2(P/m^2)$. Using that definition, the distance matrix $Y^n_{i,j}$ becomes for 3 sequences

    $$Y^n_{i,j} = \tfrac{1}{2} \log_2 \frac{P(i,n)\, P(j,n)}{P(i,j)\, m^2}. \qquad (10)$$
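The relation between scores, probabilities and Eq. (10) can be checked with a few lines of Python. The sketch below is illustrative: the function names and the numerical scores are made up, and the formulas are the approximations quoted in the text (equal sequence lengths m, distance defined as minus the score).

```python
import math

def segment_probability(score, m1, m2):
    """P = m1 * m2 * 2^(-Score), the approximation quoted in the text."""
    return m1 * m2 * 2.0 ** (-score)

def y_from_scores(s_in, s_jn, s_ij):
    """Y = (d_in + d_jn - d_ij) / 2 with the distance defined as d = -Score."""
    return 0.5 * (-s_in - s_jn + s_ij)

def y_from_probabilities(p_in, p_jn, p_ij, m):
    """Eq. (10): Y = 1/2 * log2(P(i,n) P(j,n) / (P(i,j) m^2))."""
    return 0.5 * math.log2(p_in * p_jn / (p_ij * m ** 2))

# Made-up scores for three sequences of equal length m.
m = 300
s_in, s_jn, s_ij = 40.0, 50.0, 60.0
p_in = segment_probability(s_in, m, m)
p_jn = segment_probability(s_jn, m, m)
p_ij = segment_probability(s_ij, m, m)

print(y_from_scores(s_in, s_jn, s_ij), y_from_probabilities(p_in, p_jn, p_ij, m))
```

Both routes give the same value, which illustrates that the factor $m^2$ in Eq. (10) compensates the length terms carried by the three probabilities.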

    The log term has the form of a mutual information and furnishes a measure of the similarity of the genomes i and j in reference to the genome n.

    Different approaches have been proposed to normalize the distance matrix using the marginal entropy (Kraskov et al. 2005), the self-score (Kunin et al. 2005b), Korbel normalization (Korbel et al. 2002) or the average score. The normalization by the self-score in the genome conservation gives some of the best results. It is based on a nonlinear weighted sum of the BlastP scores. The genome conservation method computes the distance between two taxa by normalizing the sum of reciprocal best hits between genomes i and j by the self-score. The effect of duplication is limited by using only reciprocal best hits. The normalization by the self-score is important to correct, at least partially, the effect of different genome sizes. The genome conservation similarity matrix is given by

    $$S_{i,j} = \frac{\min\big(\sigma(i,j),\ \sigma(j,i)\big)}{\min\big(\sigma(i,i),\ \sigma(j,j)\big)} \qquad (11)$$

    with $\sigma(i,j)$ the sum of reciprocal best hits between the genomes of the two taxa.
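A minimal sketch of the normalization of Eq. (11) is given below. The function names and the small table `HITS` of summed reciprocal-best-hit scores are invented for illustration; they are not data from the genome phylogeny server. The conversion to a distance uses 1 minus the similarity, the score form mentioned in section 5.1.

```python
def genome_conservation_similarity(hits):
    """S[i][j] = min(hits[i][j], hits[j][i]) / min(hits[i][i], hits[j][j])."""
    n = len(hits)
    return [[min(hits[i][j], hits[j][i]) / min(hits[i][i], hits[j][j])
             for j in range(n)]
            for i in range(n)]

def similarity_to_distance(similarity):
    """Distance taken as 1 - S, so that identical genomes are at distance 0."""
    return [[1.0 - value for value in row] for row in similarity]

# Invented example: hits[i][j] is the summed score of reciprocal best hits
# of genome i against genome j; the diagonal holds the self-scores.
HITS = [
    [100.0, 40.0, 10.0],
    [50.0, 80.0, 20.0],
    [10.0, 25.0, 50.0],
]

S = genome_conservation_similarity(HITS)
D = similarity_to_distance(S)
print(S[0][1], D[0][1], D[0][0])
```

Dividing by the smaller self-score keeps the similarity of a small genome fully contained in a large one close to 1, which is the partial correction for genome size described in the text.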

    5. Whole Genome Phylogenies

    5.1. Search for the best average order

    The algorithms described in section 4 have been used to search for the best order. The distance matrix was computed using the data furnished by the genome phylogeny server (Kunin et al. 2005b) obtained with an e-value cut-off set to $10^{-10}$. The contradiction is significantly lower with the score $(1 - S_{i,j})$ than with the logarithm of the score. Figure 3 shows the best order after optimization with the algorithms described in section 4 followed by 5000 steps of the multiresolution search algorithm using Eq. (7) to compute the contradiction. Table 1 gives the order of the different taxa corresponding to the best order. Archaea and Eukaryota are grouped into two adjacent clusters of taxa. One observes, for Bacteria, that all the members of a class or a phylum are neighbors. All proteobacteria (together with Aquifex?) are grouped together. The best order obtained with the minimum contradiction approach differs from the NJ tree in the following respect: all spirochetes and -proteobacteria form a cluster. This is not the case in the NJ tree.

    5.2. Interpreting minimum contradiction matrices

    This article focuses on the mathematical aspects of Minimum Contradiction Matrices. We will limit the discussion to 3 examples showing how to interpret Minimum Contradiction Matrices. The matrix $Y^n_{i,j}$ can be imaged for different reference taxa using the best order of Figure 3 given in the annex. Figure 4 shows the matrix $Y^n_{i,j}$ using Pirellula (taxon 177) as reference taxon. The scale on the right of the figure gives the color code used to represent $Y^n_{i,j}$ after rescaling. The minimum value of $Y^n_{i,j}$ corresponds to dark blue, while the largest values are coded red. Low values of $Y^n_{i,j}$ are associated with two vertices (i, j) having a first common ancestor vertex close to the reference taxon. A cluster of adjacent taxa with large values (red cluster) can be interpreted as a group of close taxa. One observes that Archaea and Eukaryota are not only adjacent but also form a cluster.

    Figure 3. Minimum contradiction matrices corresponding to the best order found after optimization with Eqs. (6) and (7). The contradiction is minimized over the lines of the matrix $Sa(i,j) = \sum_{n=1,\ldots,i} Y^n_{i,j}$ (left) and the columns $Sb(i,j) = \sum_{n=k,\ldots,N} Y^n_{i,j}$ (right).


    The best order in Figure 3 is obtained by minimizing the contradiction using all taxa as reference vertex at least once. The best order is therefore a kind of average best order. The matrix $Y^n_{i,j}$ (resp. $\sum_{n=n_1,\ldots,n_k} Y^n_{i,j}$) with n corresponding to a unique taxon (resp. a group of taxa belonging to some phylum) allows the identification of large contradictions from the best order. These contradictions can often be specifically related to the reference taxon. A loss of a gene, a lateral gene transfer or a crossover in the reference taxon modifies all elements of the distance matrix $Y^n_{i,j}$. A similar perturbation on a taxon that is not a reference taxon affects at most the row and the column corresponding to that taxon.

    Many contradictions in Figure 5 can be associated with well-accepted endosymbiotic events (chloroplasts in plants or mitochondria in Eukaryota). Figure 5a shows $Y^n_{i,j}$ for Archaea, Eukaryota and some Bacteria (taxa 72–116) using Rickettsiales (Taxa 14 in annex) as reference taxa. The average best order is used to order the taxa. Contradictions on the order of the taxa are identified by looking for regions with $Y^n_{i,j}$ increasing away from the diagonal (i.e. $Y^n_{i,j} < Y^n_{i,k}$, $i < j < k < n$). Contradictions are observed for i = Bacteria (without Mycoplasma), j = Eukaryota. One observes that $Y^n_{i,j}$ decreases away from the diagonal except between Eukaryota and Archaea (dark blue compared to light blue for Archaea). This result is, at first glance, somewhat surprising. Similar values of $Y^n_{i,j}$ for Archaea and Eukaryota are expected when i, n correspond to Bacteria. The low values for Eukaryota can be explained by a lateral transfer between the Rickettsiales and Eukaryota. We have shown with a probabilistic model (Thuillard, 2007) that a lateral transfer between the reference taxa and some taxa reduces the expected values of $Y^n_{i,j}$ for those taxa. In this model, the expected value $\hat{Y}^n_{i,j}$ after a $\lambda$-lateral transfer is given by $\hat{Y}^R_{E_1,E_2} = (1-\lambda)\, Y^R_{E_1,E_2} + \lambda\, Y^R_{R_1,R_2}$ with $\lambda$ the proportion of the genome laterally transferred ($\lambda \ll 1$) from the reference taxon R, and $R_1$, $R_2$ the laterally transferred sequence after further evolution into the Eukaryota genomes $E_1$, $E_2$. The observed contradiction and the small values of $Y^n_{i,j}$ for Eukaryota are consistent with a lateral transfer between the reference taxa (Rickettsiales) and Eukaryota. Let us recall here that mitochondria are believed to be the result of an endosymbiotic event involving Rickettsia (Timmis et al. 2004), an event that also resulted in the transfer of some Rickettsia genes into the nucleus of the host.

    Figure 4. Distance matrix $Y^n_{i,j}$ using the best order in Figure 3 and Pirellula (taxon 177) as reference taxon.

    Table 1. Best order (Figs. 3, 4); see annex for detailed list of taxa.

    No.  Group                  Taxa
    1.   -Proteobacteria        1–14
    2.   -Proteobacteria        15–18
    3.   -Proteobacteria        19–29
    4.   -Proteobacteria        30–54
    5.   -Proteobacteria        55–59
    6.   Aquificae              60
    7.   -Proteobacteria        61–63
    8.   Chlorobi               64
    9.   Bacteroidetes          65–66
    10.  Spirochetes            67–71
    11.  Thermotogae            72
    12.  Fusobacteria           73
    13.  Firmicutes             74–116
    14.  Eukaryota              117–135
    15.  Archaea                136–152
    16.  Actinobacteria         153–166
    17.  Deinococcus-Thermus    167–168
    18.  Cyanobacteria          169–176
    19.  Planctomycetes         177
    20.  Chlamydiae             178–184

    Figure 5b shows the distance matrix using all Cyanobacteria as reference taxa. The elements associated with Arabidopsis and Cyanidioschyzon have lower values than both adjacent lines (resp. columns). The observed contradictions for Arabidopsis and Cyanidioschyzon merolae (a plant and a red alga) may be explained by the many genes that are found in both Cyanobacteria and plants/red algae but absent in other Eukaryota, a hypothesis that is supported by the small value of the distance between Cyanobacteria and (Arabidopsis, Cyanidioschyzon). Chloroplasts in plants and red algae are generally considered to have originated as endosymbiotic Cyanobacteria. The low values of $Y^n_{i,j}$ for i = Arabidopsis, Cyanidioschyzon are compatible with the hypothesis that some Cyanobacteria genes have been transferred into the host.
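The effect of a lateral transfer on the expected matrix element can be illustrated with a small numerical sketch. It assumes that the expected value is a weighted mixture of the value for the host genomes and the value for the transferred sequences, with the transferred fraction as weight; the function name and all numbers are invented for illustration.

```python
def expected_y_after_transfer(y_hosts, y_transferred, fraction):
    """Mixture of the host value and the transferred-sequence value.

    y_hosts       -- Y for the two host genomes without transfer
    y_transferred -- Y for the transferred sequences (close to the reference)
    fraction      -- proportion of the genome laterally transferred
    """
    return (1.0 - fraction) * y_hosts + fraction * y_transferred

# Invented values: sequences transferred from the reference taxon stay close
# to it, so their matrix element is much smaller than that of the hosts.
y_hosts = 10.0
y_transferred = 2.0

for fraction in (0.0, 0.05, 0.1):
    print(fraction, expected_y_after_transfer(y_hosts, y_transferred, fraction))
```

Even a small transferred fraction lowers the expected value for the affected taxa, which is the signature looked for in the figure with Rickettsiales as reference.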

    Conclusions

    For an X-tree or a split network the minimum contradiction matrix $Y^n_{i,j} = \tfrac{1}{2}(d(x_i,x_n) + d(x_j,x_n) - d(x_i,x_j))$ fulfills all the inequalities defining perfect order (i.e. $Y^n_{i,j} \ge Y^n_{i,k}$, $Y^n_{k,j} \ge Y^n_{k,i}$, $i \le j \le k \le n$). In real applications a number of taxa may typically be in contradiction to the inequalities for perfect order. In that case, the Master Tour property does not hold. It follows that the removal or the addition of taxa in contradiction to the inequalities may change the topology of the associated NJ tree or split network.

    An average best order can be obtained by searching for the best circular order over $Y^N_{i,j}$ ($N = 1, \ldots, n$). The matrix $Y^n_{i,j}$ can be used to localize a problematic taxon, as large deviations from the average best order are often related to the reference taxon n. This approach was applied to whole genome phylogenies using distances computed with the genome conservation method.

    Figure 5. Distance matrix $Y^n_{i,j}$ for a) Rickettsiales (Taxa 14) as reference taxa and taxa 72–152 in Figure 3; b) Eukaryota using Cyanobacteria as reference taxa. The arrow points to Arabidopsis and Cyanidioschyzon. Axis labels: Bacteria, Eukaryota, Archaea.

    Several large deviations from the average best order were found to correspond to well-documented evolutionary events.

    Disclosure

    The authors report no conflicts of interest.



    Annex: List of Taxa Corresponding to the Best Order of Figure 3

    1. RTYP-144-01-Rickettsia typhi ATCC VR-144
    2. RPRO-MAD-01-Rickettsia prowazekii Madrid E
    3. RCON-MAL-01-Rickettsia conorii str. Malish 7
    4. WPIP-WME-01-Wolbachia pipientis wMel
    5. BJAP-USD-01-Bradyrhizobium japonicum USDA110
    6. RPAL-009-01-Rhodopseudomonas palustris CGA009
    7. BQUI-TOU-01-Bartonella quintana Toulouse
    8. BHEN-HOU-01-Bartonella henselae Houston-1
    9. BMEL-M16-01-Brucella melitensis M16
    10. BSUI-133-01-Brucella suis str. 1330
    11. ATUM-C58-01-Agrobacterium tumefaciens C58
    12. SMEL-102-01-Sinorhizobium meliloti strain 1021
    13. MLOT-MAF-01-Mesorhizobium loti MAFF303099
    14. CCRE-XXX-01-Caulobacter crescentus CB15
    15. XAXO-306-02-Xanthomonas axonopodis pv. citri str. 306
    16. XCAM-AT3-01-Xanthomonas campestris pv. campestris ATCC33913
    17. XFAS-XPD-01-Xylella fastidiosa PD
    18. XFAS-9A5-01-Xylella fastidiosa 9a5c
    19. NEUR-718-01-Nitrosomonas europaea ATCC19718
    20. CVIO-472-01-Chromobacterium violaceum ATCC 12472
    21. NMEN-Z24-01-Neisseria meningitidis Z2491
    22. NMEN-MC5-01-Neisseria meningitidis MC58
    23. BPER-251-01-Bordetella pertussis NCTC-13251
    24. BBRO-252-01-Bordetella bronchiseptica NCTC-13252
    25. BPAR-253-01-Bordetella parapertussis NCTC-13253
    26. RSOL-XXX-01-Ralstonia solanacearum
    27. PSYR-DC3-01-Pseudomonas syringae pv. tomato DC3000
    28. PPUT-KT2-01-Pseudomonas putida KT2440
    29. PAER-PAO-01-Pseudomonas aeruginosa PAO1
    30. CBUR-RSA-01-Coxiella burnetii RSA 493
    31. WGLO-BRE-01-Wigglesworthia glossinidia brevipalpis
    32. BAPH-XSG-01-Buchnera aphidicola SG
    33. BUCH-APS-01-Buchnera sp. APS
    34. BAPH-XBP-01-Buchnera aphidicola Bp
    35. BFLO-XXX-01-Blochmannia floridanus
    36. ECAR-043-01-Erwinia carotovora subsp. atroseptica SCRI1043
    37. YPES-CO9-01-Yersinia pestis CO92
    38. YPES-KIM-01-Yersinia pestis KIM
    39. SFLE-457-01-Shigella flexneri 2457T
    40. SFLE-301-01-Shigella flexneri str. 301
    41. ECOL-RIM-01-Escherichia coli O157H7 RIMD0509952
    42. ECOL-EDL-01-Escherichia coli O157H7 EDL933
    43. ECOL-MG1-01-Escherichia coli MG1655
    44. ECOL-CFT-01-Escherichia coli CFT073
    45. SENT-TY2-01-Salmonella enterica Ty2
    46. SENT-CT1-02-Salmonella enterica serovar Typhi CT18
    47. SENT-LT2-01-Salmonella enterica serovar Typhimurium LT2
    48. PLUM-TO1-01-Photorhabdus luminescens TTO1
    49. HINF-KW2-01-Haemophilus influenzae KW20
    50. PMUL-PM7-01-Pasteurella multocida Pm70
    51. VCHO-N16-01-Vibrio cholerae El Tor N16961
    52. VVUL-YJ0-01-Vibrio vulnificus YJ016
    53. VPAR-RIM-01-Vibrio parahaemolyticus RIMD2210633
    54. SONE-MR1-01-Shewanella oneidensis MR1
    55. CJEJ-NCT-01-Campylobacter jejuni NCTC 11168
    56. HPYL-266-01-Helicobacter pylori 26695
    57. HPYL-J99-01-Helicobacter pylori J99
    58. HHEP-449-01-Helicobacter hepaticus ATCC51449
    59. WSUC-740-01-Wolinella succinogenes strain DSM 1740
    60. AAEO-VF5-01-Aquifex aeolicus VF5
    61. GSUL-PCA-01-Geobacter sulfurreducens PCA
    62. DVUL-HIL-01-Desulfovibrio vulgaris str. Hildenborough
    63. BBAC-100-01-Bdellovibrio bacteriovorus HD100
    64. CTEP-TLS-01-Chlorobium tepidum TLS
    65. BTHE-VPI-01-Bacteroides thetaiotaomicron VPI-5482
    66. PGIN-W83-01-Porphyromonas gingivalis W83
    67. BBUR-B31-01-Borrelia burgdorferi B31
    68. TDEN-405-01-Treponema denticola ATCC 35405
    69. TPAL-NIC-01-Treponema pallidum Nichols
    70. LINT-566-01-Leptospira interrogans str. 56601
    71. LINT-130-01-Leptospira interrogans L1-130
    72. TMAR-MSB-01-Thermotoga maritima MSB8
    73. FNUC-ATC-01-Fusobacterium nucleatum ATCC 25586
    74. TTEN-MB4-01-Thermoanaerobacter tengcongensis MB4
    75. CTET-E88-01-Clostridium tetani E88
    76. CPER-X13-01-Clostridium perfringens str. 13
    77. CACE-ATC-01-Clostridium acetobutylicum ATCC 824
    78. LLAC-IL1-01-Lactococcus lactis IL1403
    79. SMUT-UA1-01-Streptococcus mutans UA159
    80. SAGA-260-01-Streptococcus agalactiae 2603 V/R
    81. SAGA-NEM-01-Streptococcus agalactiae NEM316
    82. SPYO-SF3-01-Streptococcus pyogenes M1 SF370
    83. SPYO-MGA-01-Streptococcus pyogenes M18 MGAS8232
    84. SPYO-XM3-01-Streptococcus pyogenes M3 MGAS315
    85. SPYO-SSI-01-Streptococcus pyogenes M3 SSI-1
    86. SPYO-394-01-Streptococcus pyogenes MGAS10394
    87. SPNE-TIG-01-Streptococcus pneumoniae TIGR4
    88. SPNE-XR6-01-Streptococcus pneumoniae R6
    89. EFAE-V58-01-Enterococcus faecalis V583
    90. LPLA-WCF-01-Lactobacillus plantarum WCFS1
    91. LJOH-533-01-Lactobacillus johnsonii NCC 533
    92. LINN-CLI-01-Listeria innocua CLIP 11262
    93. LMON-365-01-Listeria monocytogenes F2365
    94. LMON-858-01-Listeria monocytogenes H7858
    95. LMON-854-01-Listeria monocytogenes F6854
    96. LMON-EGD-01-Listeria monocytogenes EGD-e
    97. SAUR-476-01-Staphylococcus aureus MSSA476
    98. SAUR-MW2-01-Staphylococcus aureus MRSA MW2


    99. SAUR-N13-01-Staphylococcus aureus MRSA N315
    100. SAUR-MU5-01-Staphylococcus aureus VRSA Mu50
    101. SAUR-252-01-Staphylococcus aureus MRSA252
    102. BSUB-168-01-Bacillus subtilis 168
    103. BANT-AME-01-Bacillus anthracis Ames
    104. BCER-987-01-Bacillus cereus ATCC 10987
    105. BCER-579-01-Bacillus cereus ATCC 14579
    106. OIHE-HET-01-Oceanobacillus iheyensis HET831
    107. BHAL-C12-01-Bacillus halodurans C-125
    108. MMYC-G1T-01-Mycoplasma mycoides subsp. mycoides SC strain PG1
    109. MMOB-63K-01-Mycoplasma mobile 163K
    110. MPUL-UAB-01-Mycoplasma pulmonis UAB CTIP
    111. UURE-SV3-01-Ureaplasma urealyticum serovar 3
    112. MGEN-G37-01-Mycoplasma genitalium G-37
    113. MPNE-M12-01-Mycoplasma pneumoniae M129
    114. MGAL-RLO-01-Mycoplasma gallisepticum Rlow
    115. MPEN-HF2-01-Mycoplasma penetrans HF2
    116. PAST-XOY-01-Phytoplasma asteris OY
    117. CPAR-TII-01-Cryptosporidium parvum typeII
    118. PFAL-3D7-01-Plasmodium falciparum 3D7
    119. ECUN-XXX-01-Encephalitozoon cuniculi
    120. NCRA-XX3-01-Neurospora crassa
    121. YLIP-B99-01-Yarrowia lipolytica CLIB99
    122. AGOS-XXX-01-Ashbya gossypii
    123. KLAC-210-01-Kluyveromyces lactis CLIB210
    124. SCER-S28-01-Saccharomyces cerevisiae S288C
    125. CGLA-138-01-Candida glabrata CBS138
    126. DHAN-767-01-Debaryomyces hansenii CBS767
    127. SPOM-XXX-01-Schizosaccharomyces pombe
    128. ATHA-XXX-01-Arabidopsis thaliana
    129. CMER-10D-01-Cyanidioschyzon merolae 10D
    130. CBRI-XXX-01-Caenorhabditis briggsae
    131. CELE-XXX-01-Caenorhabditis elegans
    132. DMEL-XXX-02-Drosophila melanogaster
    133. AGAM-PES-01-Anopheles gambiae PEST
    134. HSAP-XXX-03-Homo sapiens v15.33.1
    135. MMUS-XXX-02-Mus musculus
    136. NEQU-N4M-01-Nanoarchaeum equitans Kin4-M
    137. APER-XK1-01-Aeropyrum pernix K1
    138. PAER-IM2-01-Pyrobaculum aerophilum IM2
    139. STOK-XX7-01-Sulfolobus tokodaii str. 7
    140. SSOL-XP2-01-Sulfolobus solfataricus P2
    141. TACI-DSM-01-Thermoplasma acidophilum DSM1728
    142. TVOL-GSS-01-Thermoplasma volcanium GSS1
    143. PTOR-790-01-Picrophilus torridus DSM9790
    144. PABY-GE5-01-Pyrococcus abyssi GE5
    145. PHOR-OT3-01-Pyrococcus horikoshii OT3
    146. MTHE-DEL-01-Methanobacterium thermoautotrophicum deltaH
    147. MJAN-DSM-01-Methanococcus jannaschii DSM 2661
    148. MKAN-AV1-01-Methanopyrus kandleri AV19
    149. AFUL-DSM-01-Archaeoglobus fulgidus DSM4304
    150. MMAZ-GO1-01-Methanosarcina mazei Go1
    151. MACE-C2A-01-Methanosarcina acetivorans C2A
    152. HALO-NRC-01-Halobacterium sp. NRC-1
    153. BLON-NCC-01-Bifidobacterium longum NCC2705
    154. MTUB-CDC-01-Mycobacterium tuberculosis CDC1551
    155. MTUB-H37-01-Mycobacterium tuberculosis H37Rv
    156. MBOV-AF2-01-Mycobacterium bovis AF2122/97
    157. MLEP-XTN-01-Mycobacterium leprae TN
    158. CEFF-YS3-01-Corynebacterium efficiens YS314T
    159. CGLU-XXX-01-Corynebacterium glutamicum
    160. CDIP-129-01-Corynebacterium diphtheriae NCTC13129
    161. PACN-202-01-Propionibacterium acnes KPA171202
    162. LXYL-B07-01-Leifsonia xyli subsp. xyli CTCB07
    163. TWHI-TWI-01-Tropheryma whipplei Twist
    164. TWHI-TW0-01-Tropheryma whipplei TW08/27
    165. SCOE-A32-01-Streptomyces coelicolor A3
    166. SAVE-XXX-01-Streptomyces avermitilis
    167. TTHE-B27-01-Thermus thermophilus HB27
    168. DRAD-XR1-01-Deinococcus radiodurans R1
    169. GVIO-421-01-Gloeobacter violaceus PCC 7421
    170. SYNE-PCC-01-Synechocystis sp. PCC6803
    171. TELO-BP1-01-Thermosynechococcus elongatus BP-1
    172. NOST-PCC-01-Anabaena sp. strain PCC 7120
    173. PMAR-SS1-01-Prochlorococcus marinus SS120
    174. PMAR-MED-01-Prochlorococcus marinus MED4
    175. SYCC-WH8-01-Synechococcus sp. WH8102
    176. PMAR-MIT-01-Prochlorococcus marinus MIT9313
    177. PIRE-ST1-01-Pirellula sp. strain 1
    178. PCHL-E25-01-Parachlamydia sp. UWE25
    179. CCAV-GPI-01-Chlamydophila caviae GPIC
    180. CPNE-J13-01-Chlamydia pneumoniae J138
    181. CPNE-CWL-01-Chlamydia pneumoniae CWL029
    182. CPNE-AR3-01-Chlamydia pneumoniae AR39
    183. CTRA-SVD-01-Chlamydia trachomatis serovar D
    184. CTRA-MOP-01-Chlamydia trachomatis MoPn