Early Evolution and Phylogeny

HAL Id: tel-00345743
https://tel.archives-ouvertes.fr/tel-00345743
Submitted on 9 Dec 2008

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Early Evolution and Phylogeny
Bastien Boussau

To cite this version: Bastien Boussau. Early Evolution and Phylogeny. Symbiosis. Université Claude Bernard - Lyon I, 2008. English. tel-00345743

Transcript of Early Evolution and Phylogeny

Page 1: Early Evolution and Phylogeny


Page 2: Early Evolution and Phylogeny

No. 205-2008 Year 2008 - 2009

THESIS

Presented

before UNIVERSITÉ CLAUDE BERNARD - LYON 1

for the award of

the DOCTORAL DEGREE

(decree of 7 August 2006)

defense scheduled for 3 November 2008

by

Bastien BOUSSAU

Early Evolution and Phylogeny

Thesis advisor: Manolo GOUY

Jury:
Laurent DURET, Examiner
Patrick FORTERRE, Examiner
Manolo GOUY, Thesis advisor
Didier PIAU, Reviewer
Ziheng YANG, Reviewer

Page 3: Early Evolution and Phylogeny
Page 4: Early Evolution and Phylogeny


Page 5: Early Evolution and Phylogeny
Page 6: Early Evolution and Phylogeny

Acknowledgments

t♥s t♦t ♦r à r♠rr ♥♦♦ ♦② ♣♦r s♦♥ ♠♠♥s é♥ér♦sté t♦t ♦♥ s qtr ♥♥és ♦ù ♥♦s ♦♥s tré ♥s♠ ♥♦♦ ♠ ♦♣♦♥♥é ♦r ♥ ①♥t st rr ♥st s és r♥ts ts ♦♥ss s♥ts ♥ r♥ rté ♥♦♠r① ♥♦r♠♥ts ♦♠♣t♠♥r ♠ ♣r♦ss♦♥♥ t ♠ ♣rs♦♥♥ ♥ ♣r♥♥t ♦♠♠ ♠♦èt s♣èr ♦♠♠ réssr ♥ r♥ t rrèr s♥tq s♥s srr♦♥♥êtté rr ♠♦r t ♠

♠ t é♠♥t r♠rr ❱♥♥t ♥ st ♥ t à s♦♥ ♦♥ttt ♣r ♦♥trst q ♣ ♣♣rér ♣♥♠♥t s qtés ♥♦♦ ♦rsq ♣té s♦♥ ♥tt♦♥ à ♣rtr s♦♥ r ♠♥s ♥ t♠♦s♣èr tr s♥ t st♠♥t s és sé♥♥t r♠♥t s ♦♦rt♦♥s♥trs t r♠♦♥ss ❱♥♥t ♦♥trr t♦♦rs s r ♥ ♣rst♠é♣rs♥t s és q♥ s trs ♠♠rs r ♦st é♠ttr ♣♦r♠① s ♣rés♥tr ♠rqés s♦♥ s ♦rs ♥ ré♥♦♥ ♦ ♦♥t st♥♦r ♦r ♦♥ râtr t strtr r♥s ♥ ♣s réssrà ♠♣êr ♣♣♦sr s♦♥ ♥♦♠ sr ♣srs ♠s tr① à ♥r ♠ê♠ àr② r ♠ à ♠ ér s♦♥ ♥♥ ❱♥♥t étt ♦♥tré q ss s r♠r♠♥ts ♥ts r s sr♥t tr♦♣ ssqss♣é♠♥t ♣♦r ♦♥ ♦♣é s s♥s ♠s ♥ ss ♣s sûr ② ♦r♦♣ ♥é ♥ ♦r♥té

t♥s ss à r♠rr ♥♠r ♣♦r s♦♥ t s♦♥ ♥t♦ss♠ t♦t ♦♥ ♥♦s tèss s sss♦♥s ♥♦ s♦♥t t♦♦rs s♣s r♦r♥ts r ♦s ttr t♦ts s és ♦♥t st trt♦s s ♣r♦rès q ♥♥és st ♣♦rt♥t é♥t ♣♦r t♦s ① q♦♥t ôt♦②é q ♥ s♦♥ ♣rs♦♥♥ ♣♦r ♠♥r s rrs ♥ trèsr♥ qté st ♥ ♣sr ♦♦rr ♥♦ t ♥ ♦t ♣sq tt♥r ♥s s ♠♦s à ♥r s trs ♠♣t q ♠ért às♦r s ♣s ts

st ♥ ♣sr r♠rr r♥t rt ♦♥t é♥ér♦sté ♥ttt♥t q ♣é♥èr ♥♦rrt t♦t ♦rt♦r t ♠♦ t♦t ♣rtèr♠♥ts r♥rs ♠♦s ♦♣ ♣♣rs sr s♥ ♥ st♥t ♣r♠èr ♠♦té rrèr ♠ ♣rît ①♠♣r t ât ♦r st st ♥ ♦♥♥r q t ♣té ♣résr ♠♦♥ r② tès

♦rs é♠♥t sr s trs ♠♠rs ♠♦♥ r② tès Ptr♦rtrr r P t ❩♥ ❨♥ ♣♦r ♦r ♣té r ♠♦♥ trt ♦r t ♦♣ ♥t♥ t ♥♥ r ♣♣♦rt r♥♠♥t ♠é♦ré ♠♥srt s ♥ s♦♥t r♠rés

r♠r é♠♥t rst♥ tr t ♦♠♥q ♦r♦ ♠♦r ♥s r ♦rt♦r t ② ssrr ♥ ♠♥ tr ♦r t ♦rs ♥♦r r♠rr s♦ Pr t rstt t s

Page 7: Early Evolution and Phylogeny

s ♣♦r ♠♦r r♥ ♠♥strt ♦ t ré r♠r ss té♣♥ ♠♦tt ♦♥ ♠♦t ♠♦♥ P♥ t r♥♦

♣tr♦ ♣♦r r r♥ t♦ér♥ ♦♥r♥♥t ♠ ♦♥s♦♠♠t♦♥ érs♦♥♥é rss♦rs ♥♦r♠tqs r s♣♦♥té t♦s s ♥st♥ts t r ♠îtrss♥s s r♥s ♥r r t♦t ♣rtèr♠♥t à ♠♦♥ ♣♦r sr♣rés♥tt♦♥ ②♣rrést ❯ tt ♠ ♥t ♠s rês

r à t♦s s ♥s q ♥ ♦♦rr r♥té♥ s s♠♥tt♦♥ t ss ♦rs P②t♦♥ é♥ r♦rr♠♥t ♠♦♥tt r♦ t Ptr ♦rtrr q ♠♦♥t très é♥érs♠♥t ♥s♥ rs ♣r♦ts t ♠r à é♥ ♣♦r s♦♥ ♣rés sr trs ♣r♦ts ♥ t s r♥ ♣t♥ t ss ♦r♠s t♥ts ♥♦r♠t♥♦♦st♣é♦ r♥t t ♥é♥♠♦♥s ♠♦st ♦s rt♦t ♠♥qrt t ss rs réér r♥t t ❱♥♥t t t rs♣♦ss♦♥s ♣qés ♥rss♦♥ t r ♦♥s♦♥ q ♠♦♥t ♠s ♣ à étrr

r♠r ♥♦r r ♥♥r ♥ st♦r♥ ① st♣é♥ts ♦♥♥ss♥s♥ ♥♦r♠tq t ♥ ♦♦ ♦♣ ② t s♦♥ ♣ts ♠♣ ♥r♥ç♦s♦t ♥ r q ♦♣ ♥ ♣ tr♦♣ ♠♥ ♦ût ♦♥♦r P♠rq ♥ ♦s ♠ sr♣rt ♥ t♥t r②♠♠♥t ♠s ♥♠♥t r t①♥r P♦♣ ♦♥ ♦ ❨s é♠♥t t ♦♠s ♦t s s àôté q ♦♥t t♦éré ♠s réq♥ts ♥trs♦♥s ♥s r r à ♣♦rt r♠ér rs t q rs t r é♥érs ♥tt♦♥ ♥s r ♣rs♥st ②♥ ♦sst t s♦♥ ♦r♥tr q s♣èr r♦r ♥ ♦r ♥♥♦♣ rtr ♣♦r s ♦♥♥ ♠r t ss ♣r♦ts ♦① ①qs ♥♦s ♣st♦r ❱♥♥t ♦♠r t s♦♥ t♦♥ é♠♦♥strt ♥♥r r♦♥t ①♥r t ♣♦r s ♦♥s ♠♦♠♥ts q♦♥ ♣ssés ♥s♠ t q♦♥♣ss ♥s♠ ♦rsq♦♥ s rtr♦ ♥ ♦♥rès ♥ ♦r② ♣♦r ss ♦♥sssr s tr♠♦♠ètrs t ♦①②é♥♦♠ètrs ♥ ♦♦s ♣♦r ss ♦♥ss ♥♥②ss sttstqs ② Prrèr ♣♦r ♠♦r ♥té ① sts ③ sà P♦♥① ♥ ♥ t s♦♥ ♥st r♦sté s♥tq r é♠♦♥ tss r♥♦s P♣♣ ❱r t ss r♠r♠♥ts ①♣éts

t♥s à r♠rr ♠ ♠ q sst é♣é ♥ r♥ ♥♦♠r t ♠s♦t♥ ♦r ♦ts ♠qs ss ♣rtèr♠♥t r♦♥♥ss♥t à♠s ♣r♥ts ♦r s ♣tr s♥ r ♠ ♥ s♥tq t♦té♥ ♥ ♠♦♥str q ♥t r♥ ♣r♦♠ttr r à ♠ ♠♦r♣♣rs ♥s ♥t r s ♥♦s sst rééé t ♥♠♥tt ♠r à ♣♦r ♦r ♠♦ s ♥st❱ ♥♦r st ss♥ ♣ à q ♦s ♦r t s♥s ♣tôt q ttrs

♥♥ r♠r t ♣♦r s♦♥ t♥r s♦t♥ s ♦q ♥ ts résst♥ r♦ t q s♣♣♦rt ♠s rss strss ♣s t♥t ♥♥és ♥ é♦♠♥t q ♦r ♠rt♦♥ ♠rs q ♦♥t♥

Page 8: Early Evolution and Phylogeny
Page 9: Early Evolution and Phylogeny

Contents

és♠é ♥ r♥çs

êt ♣♦♥t st♦r ♥ s♥s é♥s♠ t ♣②♦é♥ ❯♥ rè st♦r

♥②s s é♥♦♠s ♣r♠t r♦♥strr st♦r

rr ♦♠♠ r♦♥té ♣r s é♥♦♠s ♦♥ tr tès ♦♥s♦♥

♥tr♦t♦♥

♠ ♦♥ts Pttr♥ ♥ ♣r♦ss

Pttr♥ Pr♦ss

❯ ♥ t tr ♥♦♠s ♥t② ♦ tr ♥♦♠s

s♦rt st♦r② ♦ ♦♥ rt s t♦ ② r♦s r♦♦sss ♦r ♦♠rrs s♦t♦♣ rt♦s s♠♣ ♦ s♦♠ ♥sts r♦♠ ♦♦ sts

st♦r ♦♥t♥t ♦ ①t♥t ♦r♥s♠s ♦r♣♦♦ t ♥ ♦♠♦♦② q♥ t

ttsts ♦r ♥r♥ ①♠♣ ♥r♥t sttsts ♦s ♦ ♦t♦♥ st♠t♦rs

s♦rt st♦r② ♦ ♦♥ rt s t♦ ② ♥♦♠s tr ♥♦♠s r♦♦t ♦ t tr ♦ Pr♠r② ♥♦s②♠♦ss ♦ t tr ♦

r♥st♦♥ ♦ t ♠♥sr♣t

P②♦♥② s ♥♦t s②

Page 10: Early Evolution and Phylogeny

♠♣r♦♥ t♦s ♦ P②♦♥t ♦♥strt♦♥

♥ ❯♥①♣t r

Pttr♥ Pr♦ss ♥ t r② ♦t♦♥ ♦ ♠♣rtr ♦♥

rt

♦rs ttr ♦♥♦♠♦♥♦s ♦s ♦ q♥ ♦

t♦♥

♦♣♥ t tr♦♥♦s ♦t♦♥r② ♦s ♥ ♥ ♥

♠t♥♦s ♥r♥ ♦ ♣s r ♥ ♦ ♥ rs

Pr♦♠s ♥ Prs♣ts ♦r t ♦t♦♥r② t② ♦ ♥♦♠s

♦♥s♦♥

♣♣♥s

♥♦♠ ♣t♦♥s ♥ srs ♥♦♠ ♦♥t♥t ♦t♦♥ ♥ t ♠② ♦ ♠t♦♦♥r

Page 11: Early Evolution and Phylogeny

1 Summary in French

êt ♣♦♥t

r s ♠♦ss ♥♦♥ ♣♥tr P♦ Pss♦ s♠ ♦ ♦r♥ rt ❨♦r é♣éré ❲♣tt♣♥♣♦r♠sr♦♠♥♦♥♣

r qq♦s ét♦♥♥♥t ♥ t ♥s tt ♣♥tr P♦ Pss♦ s ① ♠♦ss sr r♦t ♦♥t s ss ♦rt ér♥ts rs ♦♥s♦rs ① ♣♣rss♥t ♦♠♠ é♦♠♣♦sés ♣ts st♣r♦♠♥t s♦r rr à ♦♠♣r♥r ♥ éts ♣♦rq♦ tr éé ♣♥r ♥s s ♠♦ss ♥ ♣rtr ♦♥ ♣t t♦t♦s ss②r s♦r ♦ù st ♥ tt étr♥ ♥s♣rt♦♥

Page 12: Early Evolution and Phylogeny

tt ♥♦r♠t♦♥ ♣t êtr tr♦é ♥s s ♦r♣ ét ♥tè♠sè ♣srs rtsts ♦♥t ♥ tss t Pss♦ ♦♥t été ♣r♦♦♥é♠♥t♥s♣rés ♣r é♦rt rts ♥♥s érq t r♥ ♥♦t♠♠♥t ♥s♥ ♥ ①♣♦st♦♥ ♦♥sré à rt érq t ♠♦♥té ♦r t àt♦♠♥ tss ♠♦♥tr ♥ stttt r♥ à Pss♦ q tt♦♥ ♦rt ♠♣rss♦♥ ♥s s trt♦♥s érqs t r♥s st s ♣rr rt r♥ ♦♠♠ ♥ t♦t ♦♠♦è♥ ♥é♥♠♦♥s r ♦♥ ② tr♦♥ r♥ rsté s ♦r♠s s♦♥t très st②sés t ♠♣♦rt♥ ♥ éé♠♥t st s♦♥t r♣rés♥té ♣r s t s ① ss s ♠♦ss r♦t rs r♥s ②① s♠♣s t é♠srés rss♠♥t ♥ à sstts r♥s ♦ érqs és ♦rs ♦♥ ♦♠♣r♥ q s ss ♣ts té♦♠♣♦sés q ♣rér♥t ♠♦♠♥t st s♦♥t ♥♥és ♣r s ♦r♠srt ① ♦♠♣r♥r rt Pss♦ ♥ésst s ♣♦♥r ♥s s♦♥ st♦r

History in science

♥② ♣s q♥ rts ♣stqs q st♦r ♣r♠t ♠① ♦♠♣r♥r ♥♦srt♦♥ ♦tr ♠♦♥ ♣②sq st s♦♠s à ♠♣r♥t t♠♣s t t♦ss ♣é♥♦♠è♥s ♦r ♦srés s♦♥t rt ♥ st♦r ♠ê♥t sr t♥éssté tr♠♥t ♥t ♥ t ♣s ①♣t♦♥ t st ♥s s♦♥ st♦rq ♦♥ ♣t ♦♠♣r♥r ♦♠♠♥t s♦♥t ♣♣rs s ♦r♠s rtr♦és♥s s ♦sss ♦ ♦srés ♥♦s ♦rs

♥ ♦♦ té♦r é♦t♦♥ ♣r♠t ①♣qr ré♣rtt♦♥ t♦r♥st♦♥ s ♦r♥s♠s ♥ts ♦ ♣s ♣résé♠♥t ♦♠♠♥t s êtrs♥ts ♦♥t qs r ré♣rtt♦♥ é♦r♣q t é♦♦q t t ♦♠♠♥t s ♦♥t qs rs rtérstqs ts ♦r♠ ♦♥t♦♥ ttté♦r st sé sr ♥♦♠rss ♦♥♥és q ♥ étr ♣s ♠s♣t êtr ♠♥t rés♠é ♥ qqs ♣♦♥ts

• ♦s s êtrs ♥ts ♣♥ts térs ♦♠♠ rés ♣r♦trs ♠ét♥ ♥s st♦♠ s s s♦♥t ♣♣r♥tés ♥ t q♥♦♥ rr ♦♠♠♥t ♦♥t♦♥♥♥t t♦s s ♦r♥s♠s ♥ éts ♥♠♦ér ♦♥ s r♥ ♦♠♣t q t♦s s êtrs ♥ts s♦♥t très rss♠♥ts q trt r s♥♥ ♦♠♠♥ ♠ê♠ t♦s s ♦r♥s♠s ♥ts s♦♥t ♦♥strts t♦r ♥ é♥♦♠ t q ♦♥t♥tt♦s s è♥s ♥ ♦r♥s♠ t q r♥r♠ t♦ts s rtts s♥♥éssrs à ♦♥strt♦♥ t ♦♥t♦♥♥♠♥t ♥ êtr ♥t rq♥ ♦♥ ♦♠♣r s é♥♦♠s s ér♥ts êtrs ♥ts ♥tr ① tst ♥tr trs ♦t tt tès ♦♥ s r♥ ♦♠♣t à♥♦r q

Page 13: Early Evolution and Phylogeny


② r♥s s♠rtés ♥tr t♦s s êtrs ♥ts ♦r♦r t♦ss êtrs ♥ts s♥♥t ♥ ♦♥t♥ ♥êtr ♦♠♠♥ q ♦♥ ♣♣s♦♥t ❯ q ♦rrs♣♦♥ à st ❯♥rs ♦♠♠♦♥ ♥st♦r s♦tr♥r ♥êtr ♦♠♠♥ ♥rs

• s êtrs ♥ts ♦♥t ① t②♣s rtérstqs s rtérstqs♥♥és q é♠♥♥t rt♠♥t r é♥♦♠ t s rtérstqsqss q s♦♥t rt r st♦r ♣rs♦♥♥ s s rtérstqs ♥♥és ♣♥t êtr tr♥s♠ss à r s♥♥ trrs s♠é♥s♠s érété

• ♦rs tr♥s♠ss♦♥ s rtérstqs ♥♥és s ♠tt♦♥s ts rérr♥♠♥ts ♣♥t sr♥r q ♣♦r ♦♥séq♥ q♥ s♥♥t ♠ê♠ s st très s♠ st très rr♠♥t r♦rs♠♥t♥tq à s♦♥ ♦ ss é♥trs ♣r q rs é♥♦♠s èr♥t t♠♣s s é♥ért♦♥s sé♥t ① é♥ért♦♥s s ♠tt♦♥ss♠♥t s é♥♦♠s rss♠♥t ♠♦♥s ♥ ♠♦♥s é♥♦♠ ♥êtr ♦♠♠♥ t ♣r ♦♥séq♥t s s♥♥ts rss♠♥t ♠♦♥s♥ ♠♦♥s à r ♦♥t♥ ï

• s ♥♠♥ts ♦rs tr♥s♠ss♦♥ s rtèrs ♥♥és ♦♥tq♥ êtr ♥t ♥st ♦♣ ♣rt ♥ tr ♥ t ♦rs q② ♥ r♥ rsté ♥tr êtrs ♥ts tt rsté t q ét♥t♦♥♥é ♥ ♥r♦♥♥♠♥t rt♥s êtrs ♥ts ♦♥t ♣s tés à ♦rs s♥♥ts t ♦♥ ♣♥t ♥ ♦r ♣s q trs s téss♦♥t é♥ér♠♥t rr♦♣és s♦s tr♠ ♥s t♥ss q ♦rrs♣♦♥ à ♣té ♣r♦r ♥ ♦r♥s♠ à s r♣r♦r s♦♥ ♣tt♦♥à s♦♥ ♥r♦♥♥♠♥t ♥ ♥st♥t ♦♥♥é s ♦r♥s♠s ②♥t ♥ ♣sr♥ t♥ss ♦♥t ♥ ♠♦②♥♥ ♣s s♥♥ts ♠é♥s♠ q ♦♥♥♦♠♠ ♥ é♥ér sét♦♥ r♥♥♥

• s ♦r♥s♠s s ♣s ♣tés ♥♦♥t ♣♦rt♥t ♣s t♦♦rs ♣s s♥♥ts q s trs ♦♥ stt♥ à qs ♥ ♥t ♣s ♠s s ♣r♠r ♦r st sr ① ♥t qs ♥tt♥♥t r â tr t♥ss ss r♥ s♦t ♥r ♣s ♦♣ t ② ♦♥♥ t ♠♣♦rt♥t sr sr q s r♣r♦t t q ♥ s r♣r♦t ♣s♥s ♥ ♣♦♣t♦♥ êtrs ♥ts t t sr st t♥t ♣s♦rt q ♥♦♠r ♥s ♥s ♣♦♣t♦♥ st s ♦♥ 10%♥s très ♣tés ♥s ♥ ♣♦♣t♦♥ t♦t s♠♥t 10 ♥s st ♥ ér ♣♦r q ♠r ♥tr ① ♥t ♣s s♥♥t ♥ ♣♣ t t ét♦r ér é♥étq ♣sq térr t♥ss ♠♦②♥♥ ♥ ♣♦♣t♦♥ ♦♥ q rt été s♥s

Page 14: Early Evolution and Phylogeny


é♦t♦♥ s êtrs ♥ts st ♦♥ ♥ ♠é♥ ♣srs ♠é♥s♠s① s♦♥t ét♦rs s ♠tt♦♥s ♥ ♣rt t ér sr q t♦♠ ♦r tr tr♦sè♠ st étr♠♥st t t q rt♥s ♥s ♦♥t à ♥ss♥ rr rs rtérstqs é♥étqs t ♥r♦♥♥♠♥t ♣rés♥t ♣s ♥s ♦r s s♥♥ts q trss tr♦s ♠é♥s♠s sss♦♥t t ♣r♦s♥t rsté ♥tr ♥s t rsté ♥tr s♣ès q ♦♥ ♣t ♦srr ♦r

Mechanism and phylogeny

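Two aspects of evolution can be studied.

On the one hand, one can focus on the changes that occurred over the course of history and on their cause, either mutation and selection or mutation and drift. I will call this aspect the mechanism of evolution.

On the other hand, one can try to describe the kinship relationships between living beings. This representation of the kinship relationships between organisms is called phylogeny.

During my thesis, I took an interest in both of these aspects of evolution: I sought to clarify certain kinship relationships, and I also set out to uncover certain changes that may have occurred in the past. It is in fact natural to study the two together, because they are highly interdependent. One only investigates the mechanisms of evolution within the framework of a particular phylogeny: if bats were placed among the birds rather than among the mammals, a question that belongs to phylogeny, one would not ask by what mechanism they acquired their wings, but rather by what mechanism they acquired their mammary glands.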

❯♥ rè st♦r

s é♦♦s st♠♥t q trr ♣s ♠rs ♥♥és t q ② ①st ♣s ♠♦♥s ♠rs ♥♥és ♦♣ ♣têtr♣s rrèr rrèr rrèr r♥ ♣èr t♦s s êtrs ♦r ♥ts❯ t ♦♥ ♣r♦♠♥t tt é♣♦q s s♥♥ts ❯ ♦♥t♥st ♦♥♥é ♥ss♥ à trs ♦r♥s♠s t t♠♣s ♥t t s ♠tt♦♥ss♠♥t à ♥♦s s♣ès ♣♦ssé♥t s rtérstqs ♥éts ♠rs ♥♥és ♣s tr t♦s s êtrs ♥ts s♦♥t s s♥♥ts s♣r♠rs ♦r♥s♠s

Page 15: Early Evolution and Phylogeny


Genome analysis makes it possible to reconstruct history

♥ r♣rés♥tr tt é♥é♦ ♥rs rr q r♣rés♥ts rt♦♥s ♣r♥té ♥tr t♦s s êtrs ♥ts ♦♥ ♣t ♥②sr s rss♠♥s t ér♥s ♥tr s ♦r♠s s êtrs ♥ts ♠ê♠ ç♦♥ q ♦♥♣♦rrt ss②r r♦♥sttr ♥ rr é♥é♦q ♥ ♥②s♥t s ér♥s♣②sqs ♥tr rèrs s♦rs ♦♥s t♥ts t r♥s ♣r♥ts é♥♠♦♥s tt♣♣r♦ ♥st ♣s très sé srt♦t ♦rsq♦♥ r à ♦♠♣rr s ♣♥ts s ♥♠① s ♠♣♥♦♥s s térs ♣s ♥ s ♥♥és ♠♠♥ss ♣r♦rès ♦♥t été ts ♥s séq♥ç t ♦♥ ♣tés♦r♠s séq♥r s é♥♦♠s ♥trs ♥ ♣t ♥s ♥②sr s é♥♦♠s sêtrs ♥ts t s ♦♠♣rr ♥ r♦♥strr s rt♦♥s ♣r♥té ttr♥èr ♣♣r♦ sèr ♥ ♣s ♣rtq

é♥♦♠ ♥ ♦r♥s♠ ♦♥t♥t t♦ts s rtts s♥ ts ♣♦r♣r♦r t r ♦♥t♦♥♥r t ♦r♥s♠ ♥ ♦♠♣r♥t s é♥♦♠s ♦♥ ♦♥ rt♠♥t ès à ss♥ s rtèrs ♥♥és ♥ ♦r♥s♠ ♦♠♠ss s rtèrs ♥♥és s♦♥t tr♥s♠s ♣r érété ♥ ♥②s♥t s é♥♦♠s♦♥ ès à t♦t ♥♦r♠t♦♥ q été tr♥s♠s ♣s ❯ sq①♦r♥s♠s ts s é♥♦♠s ♣♦rt♥t s trs éé♥♠♥ts ♠tt♦♥sét♦♥ t ér q ♦♥t ç♦♥♥é s ♦r♥s♠s ♥ts ♦rs r st♦rt ♦♥stt♥t s ♦♠♥ts st♦r é♦t ♥ qté ♥q s♠♣♠♥t ♥ s♥t s é♥♦♠s ♦♥ ♣t trr s ♦♥s♦♥s sr s rtérstqst st♦r s ♦r♥s♠s q s ♦♥t♥♥♥t ♥♦r t s♦r s r

♥ r s ♦♠♥ts t r ♥ ♣ ♠té♠tqs ♥ ♦r♥s sttstqs ♠♦ès sttstqs st ♣♦ss st♠r ♣r ①♠♣ ♣r♦té q s ♠♣♥♦♥s s♦♥t ♣s ♣r♦s ♣r♥ts s ♥♠①q s ♣♥ts ♣r♦té q♥ ♠tt♦♥ ♣rtèr s s♦t ♣r♦t à ♥♠♦♠♥t ♣rtr ♥s rr ♣r♦té q tt ♠tt♦♥ t ététr♥s♠s à s s♥♥ts ♣r sét♦♥ ♦ ♥ ♣r ér ♥ ♣t ♦♥ ♣♦srs qst♦♥s q rè♥t ♣②♦é♥ ♦ ♥ ♠é♥s♠ é♦t♦♥

rr ♦♠♠ r♦♥té ♣r s é♥♦♠s

Statistical analysis of genomes has revealed that living beings fall into three broad categories (Woese and Fox): the archaea, the bacteria and the eukaryotes.

• s rés ♦♠♣r♥♥♥t s ♦r♥s♠s ♦♠♣♦sés ♥ s q ♦♥ tr♦ ♥ ♣ ♣rt♦t ♠s ss ♥s s ♠① très ♥s♦ts

Page 16: Early Evolution and Phylogeny


♣s ♣♥s s s ♦ù s ♥t à st♦♥ ♥ é♥t ♠ét♥ sq① s♦rs tr♠s ♦ù s rés ♣♥t r à ♣s 100C ♥ ♣ss♥t ♣r s ♠① ①trê♠♠♥t s ♦ ♥ strés♥ s ♦rtrr

• s térs s ss s♦♥t s♦♥t ♦♠♣♦sés ♥ s ♥t♥s t♦ts s♦rts ♠① ♠s s s♦♥t ♥ é♥ér ♠♦♥s ①ér♥ts♥s rs ♣réér♥s é♦♦qs ♥ tr♦ ♥♦t♠♠♥t ♣r♠ s térs s ②♥♦térs q ♣♥t tsr é♥r ♠♥s ♣♦r rt q ♣r♦s♥t ♦①②è♥ ♥ tr♦ é♠♥t ♣r♠ s térss ♣rsts s ♣♥ts ♦ ♥ s ♥♠① rt♥s ♥tr s s♥t ♣srs ♠s ♥ ♦♥♥s ♦♠♠ t②♣s ♦r è♣r♦ts s térs ♥ s♦♥t t♦t♦s ♣s ♣rstqs t ♦♥ ♥♦♠r♥tr s s♦♥t ♥♦s s②♠♦ts ♥♦s ♥t ♥♦t♠♠♥t à érr

• ♥♥ s r②♦ts ♦♥t♥♥♥t s êtrs ♥ts s ♣s ♦♥♥s t ♥♦t♠♠♥t s ♣s r♥s q ♣♥t ♣♦ssér s ♠rs ss ♦♥t♥♥♥t ♠♣♥♦♥s ♥♠① ♣♥ts ♠s ♥s♦rs s s♦♥t♥ é♥ér ss③ ♣ sr♣r♥♥ts ♥s rs ♦ûts é♦♦qs ♥♣♣ré♥tèr s t♠♣értrs P ♦ s♥tés ①trê♠s ♥ ♦tr s ♥ ♠♦♥tr♥t♣s ♦♣ t②♣s ♠ét♦s♠s ér♥ts ♣sq ♥② ♥ r♦sq ① t②♣s r②♦ts ♣♦♥t ① q s r♣♦s♥t sr ♣♦t♦s②♥tès t ① q ♦♥s♦♠♠♥t ♠tèr ♦r♥q ♣r♦t ♣rtrs êtrs ♥ts s r②♦ts ♦♥t ♥ é♥ér s ♦r♥ts ♥srs s ♣tts strtrs q r♥r♠♥t ♥ é♥♦♠ ♣rtr ①st ♥♦t♠♠♥t ① t②♣s ♦r♥ts s ♠t♦♦♥rs ♦ù s srss♦♥t érés ♣♦r r é♥r t s ♦r♦♣sts ③ s ♣♥ts ♦ùs r②♦♥s ♠♥① s♦♥t tr♥s♦r♠és ♥ é♥r ♣s ♥ srs ♣rés♥ ♣srs é♥♦♠s ♥s ♥ s ♥ ①♣t♦♥ st♦rq q①♣qr ♥ ♣ ♣s tr

♥ ♣♥s t♠♥t q s rés t s r②♦ts s♦♥t ♣s ♣r♦s♣r♥ts ♥ tr qs ♥ s♦♥t s térs ♦rt♥ t t ♥♦♠r① tr① sèr♥t q rr ♣♦rrtrss♠r ♠♦♥s ♥s ss r♥s ♥s à q st r♣rés♥té

s ① r♥s ês ♦♦rés q trrs♥t rr ♥ ♣r♠tt♥t ①♣qr ♣rés♥ s ♦r♥ts ③ s r②♦ts ♥ ♥②s♥t sé♥♦♠s s ♠t♦♦♥rs t ♦r♦♣sts s ♣②♦é♥ét♥s ♦♥t ♣ ♠♦♥trrq s ♦r♥ts ét♥t ♥ t ♥♥♥s térs ♥♣♣és ♣r s r②♦ts ❩♥ t ♦♥♥ t ♦♦tt ♦♥♥ t ssrt s t t réts à s ♥ s r②♦ts♥♦♥t ♣s été ♠♥ts ♥ q ♦♥r♥ rs ♠ét♦s♠s t ♥ ♣s s ♦♥t♦é ♣ qs s♥t r s térs ♥ ♣t ♦♥♦r ♥ ②♣♦tès st♦rq ♣♦r ①♣qr ♠♥q ♦r♥té s s térs ét♥t ♣rés♥ts

Page 17: Early Evolution and Phylogeny


[Figure: tree of life with LUCA marked and the tips grouped into three domains.
Archaea: Thaumarchaeota, Thermoproteales, Sulfolobales, Desulfurococcales, Nanoarchaea, Thermococcales, Methanopyrales, Methanobacteriales, Methanococcales, Thermoplasmatales, Archaeoglobales, Halobacteriales, Methanomicrobiales, Methanosarcinales.
Eukaryotes: Amoebozoa, Metazoa, Fungi, Malawimonadozoa, Rhodophyta, Glaucophyta, Viridiplantae, Cercozoa, Stramenopila, Alveolata, Jakobozoa, Euglenozoa, Heterolobosea.
Bacteria: alpha-Proteobacteria, beta-Proteobacteria, gamma-Proteobacteria, delta-Proteobacteria, epsilon-Proteobacteria, Spirochaetes, Bacteroidetes-Chlorobi, Planctomycetes, Chlamydiales, Cyanobacteria, Chloroflexi, Firmicutes, Actinobacteria, Thermus-Deinococcus, Aquificales, Thermotogales.]

r rr ①trê♠ rr ♥ â ♣s ♠rs ♥♥és ♣rt r♦t ♦♥r♥ s ♦r♥s♠s ts ❯ st r♣rés♥té ♥ ♣♦♥t r♦ s ♦r♥s♠s ♦♥t ♥♦♠ st s♦♥é ♥t à ♣s 80Cs ♦r♥s♠s rés sr ♦♥ rt s♦♥t s r♥♦ts t sr ♦♥ s r②♦tss ① r♥s sss rés ♦s s ♥♠① s♦♥t ssés s♥ s t③♦t♦s s ♠♣♥♦♥s s tr♦♥t s♥ s ♥ t t♦ts s ♣♥ts s♦♥t ♣és♥s s ❱r♣♥t ♦s s ♦r♥s♠s ♥ts q ♦♥ ♦t à ♦ ♥ r♣rés♥t♥t♦♥ ♥ ♥♠ ♣♦rt♦♥ ♦rsté

sr trr ♥t s r②♦ts ♦rs ♣♣rt s ♥s é♦♦qs ♥téà êtr ♦♣és s r②♦ts s s♦♥t ♦♥ s♣ésés ♥s ♥ tr s♦rt

Page 18: Early Evolution and Phylogeny

tté ♦ t ♣rét♦♥

My thesis work

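During my thesis, I worked on a few specific problems related to the study of genomes as a means of reconstructing their history. I sought to improve the methods used to reconstruct genome evolution, and I used these methods to answer precise biological questions. Almost all of these questions concern the way one particular character, the preferred temperature of organisms, has evolved.

I mentioned above that some archaea can live at more than 100°C. In fact, most species are only able to live within a narrow window of temperatures; just a few degrees outside it, an organism may fail to develop normally or may even waste away. Some organisms can therefore live only at around 37°C, others only at around 10°C, 100°C, and so on. Organisms are generally characterised by their optimal growth temperature, the temperature at which they grow fastest. This temperature is an important parameter, because a given temperature corresponds to a particular environment. If an organism has a very high optimal growth temperature, we know that it lives close to a hot spring, as in Yellowstone park, or along mid-ocean ridges. If it lives at a very low temperature, we also have a fairly precise idea of the places where it could live.

In the figure, the organisms whose names are underlined have optimal growth temperatures above 80°C, which is why they are called hyperthermophiles. Their distribution in the tree is intriguing: many archaea are hyperthermophiles, which would indicate that the ancestor of all archaea was itself a hyperthermophile. Likewise, the two bacteria located at the base of the bacterial domain are hyperthermophiles, which suggests that the ancestor of bacteria may also have lived at high temperature. If the ancestors of bacteria and of archaea were both hyperthermophiles, then our common forefather, LUCA, could also have lived at very high temperature, and would therefore be intimately linked to extreme environments.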
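The reasoning in the paragraph above (hyperthermophiles sit at the base of both the archaeal and the bacterial domains, so their ancestors were probably hyperthermophiles too) is essentially a parsimony argument. The sketch below illustrates it with the bottom-up pass of Fitch parsimony on a toy tree. It is only an illustration under stated assumptions: the tree shape, the tip names and the two-state coding are invented here, and this is not presented as the method used in the thesis, which goes on to question this very conclusion.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_root_states(tree: Tree, tip_states: Dict[str, str]) -> Set[str]:
    """Bottom-up Fitch pass: return the most parsimonious state set at the root."""
    if isinstance(tree, str):
        return {tip_states[tree]}
    left = fitch_root_states(tree[0], tip_states)
    right = fitch_root_states(tree[1], tip_states)
    shared = left & right
    # Keep the shared states if any, otherwise fall back to the union.
    return shared if shared else left | right


# Invented toy tree: hyperthermophiles (H) branch off first in both domains.
archaea = ("archaeon_H1", ("archaeon_H2", ("archaeon_H3", "archaeon_M")))
bacteria = ("bacterium_H1", ("bacterium_H2", ("bacterium_M1", "bacterium_M2")))
toy_tree = (archaea, bacteria)

toy_states = {
    "archaeon_H1": "hyperthermophile",
    "archaeon_H2": "hyperthermophile",
    "archaeon_H3": "hyperthermophile",
    "archaeon_M": "mesophile",
    "bacterium_H1": "hyperthermophile",
    "bacterium_H2": "hyperthermophile",
    "bacterium_M1": "mesophile",
    "bacterium_M2": "mesophile",
}

print(fitch_root_states(toy_tree, toy_states))
```

With this toy layout the root is reconstructed as a hyperthermophile, which mirrors the intuition described in the text. Moving the hyperthermophilic tips away from the base of either domain changes the answer, which is why the placement of such organisms in the tree matters so much.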

♠ ss ♦♥ tté à étr ♣②♦é♥ s ① r♥s ♦♠♥ss rés t s térs ♣♦r étr s ♣♦st♦♥s ♦r♥s♠s ♦♥t t♠♣értr ♦♣t♠ r♦ss♥ st é ♣♦r r♦♥strt♦♥ é♦t♦♥ rtèr Ps ♣rtèr♠♥t été ♣♦st♦♥ tér q①

♦s r♦♣ qs sr r r ♥étt ♣s é♥t q s ♣♦

Page 19: Early Evolution and Phylogeny


st♦♥ à ♣r♦①♠té s r♠♦t♦s ♥ s♦t ♣s rr♦♥é é♦t♦♥ é♥♦♠ s térs ②♣rtr♠♦♣s st ♥ t très ♦♠♣qé ♥ q ♥♣s réss à ♣♥♠♥t sr♠♦♥tr t♦ts s tés ss♦és à ♥②s é♥♦♠ q① ♦s ♠♦♥ tr ♦♥r♠ q s qs ♣♦rr♥têtr ♣r♦s ♣r♥ts s r♠♦t♦s ③ s rés sr ♥tt♦♥ é♥ r♦r ♠♦♥tt r♦ t Ptr ♦rtrr ♠ ss ♥térssé à ♣♦st♦♥ ♥r♠ s②♠♦s♠ r♦♣ ♠r♦t sr r ♦♥t t♠♣értr ♦♣t♠ r♦ss♥ st 10C ♥②s sèrq tt ré st ♥ ér♥t s trs q ♥ ♣t ♣s êtr té s♠♣♠♥t à ♥ s ① r♥s r♦♣s ♦♥♥s rés s r②♦ts ts ré♥♦ts ♠s q ♦♥sttrt ♣têtr ♥ r♥♠♥t très s rr s rés t s t♠♣értr ♦♣t♠ r♦ss♥ r♥♠♥t s♠ r♠ttr ♥ s é s♦♥ q ♥êtr s résétt ♣r♦♠♥t ②♣rtr♠♦♣

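To study the evolution of optimal growth temperatures, I also followed a more direct approach. This temperature is a characteristic shared by an entire species, and therefore stems from the organisms' genome. There are ways to predict, with a little statistics and a computer, an organism's optimal growth temperature simply from the sequence of its genome. This means that if one is able to reconstruct the genome sequences, or even just pieces of the genomes, of ancient organisms, one can predict the temperature at which those organisms lived. In collaboration with researchers from Montpellier, I was thus able to estimate the evolution of optimal growth temperatures over the last billions of years by reconstructing the sequences of pieces of ancestral genomes. The results we obtained are shown in the figure.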
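The idea of predicting an optimal growth temperature from sequence alone can be sketched as a simple regression. The example below is a minimal sketch under loud assumptions: it assumes that the G+C content of a sequence is the predictor (a correlation of this kind is known for ribosomal RNA, but the text does not say which predictor was used), the calibration points are invented for illustration, and `ancestral_fragment` is a made-up sequence, not a reconstructed one.

```python
from typing import Sequence, Tuple


def gc_content(sequence: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_ogt(sequence: str, slope: float, intercept: float) -> float:
    """Predicted optimal growth temperature (degrees C) for one sequence."""
    return intercept + slope * gc_content(sequence)


# Invented calibration data: (G+C fraction, optimal growth temperature in C).
calibration = [(0.50, 30.0), (0.60, 60.0), (0.70, 90.0)]
slope, intercept = fit_line([gc for gc, _ in calibration],
                            [ogt for _, ogt in calibration])

# Hypothetical "ancestral" fragment, standing in for a reconstructed sequence.
ancestral_fragment = "GCGCGCATGCGCATGCGCGC"
print(round(predict_ogt(ancestral_fragment, slope, intercept), 1))
```

In practice the calibration would come from extant organisms with measured growth temperatures, and the uncertainty of both the regression and the ancestral reconstruction would need to be propagated to the prediction.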

♦s réstts sèr♥t q ❯ ♥ t ♣s à très t t♠♣értr♠s q ss ① s♥♥ts s ♥êtrs s rés t s térs ♥tà ♣s t t♠♣értr q ♥st ③ s térs ♠♦♥s s t♠♣értrs r♦ss♥ s♠♥t ♦r ér à ♥♦ tt ér♦ss♥ éàété ért ③ s térs ♥ ét ♥♥é ♣r r t q ♦♥t♥tr♣rété ♦♠♠ ét♥t ♦rréé à t♠♣értr s ♦é♥s ♦rs s r♥rs ♠rs ♥♥és t♠♣értr ♦♣t♠ r♦ss♥ s térsrt ♦♥ s t♠♣értr ♠♦②♥♥ à sr trr s ♣tt♦♥s♣rès à ts t♠♣értrs ♣s ❯ s♦♥t ♣r ♦♥tr ♥♦s ♦s♦♥s ♦♥ ré s ②♣♦tèss ♣♦♥t ①♣qr ♣é♥♦♠è♥ rt♥ss♦♥t r♣rés♥tés

st ♥s ♣♦ss q ❯ t é ♥s ♥ ♥r♦♥♥♠♥t t♠♣értr♠♦②♥♥ t t ♦♥♥é ♥ss♥ à ♥♦♠r① ♦r♥s♠s Pr♠ ① s♠tt♦♥s ♥t rt♥s r♥t été ♣s résst♥ts ① ♦rts t♠♣értrs

Page 20: Early Evolution and Phylogeny

r ♦♥strt♦♥ s t♠♣értrs ♦♣t♠s r♦ss♥ ♦♥ rr ①è♠ ♣rt ♦r s rés rt st ♥ ♣♦♥té s ♥♦s♥♦♥s ♣s ss③ ♦♥♥és ♣♦r ♦♥♥îtr ss♠♠♥t rtt

trs ♠♦♥s ♥ st q ② ♠rs ♥♥és réq♥ ts♠été♦rtqs ♦♥♥ ♥ r♥ ♠♥tt♦♥ ♦♠s t é♣s♦t t ② ♦♠r♠♥t s ts ♠été♦rts ♦♥t ♣r♦♠♥tsé ♠♣♦rt♥ts éts sr trr t ♦♥t ♦♥sér♠♥t ♠♥tr t♠♣értr q ré♥t à s sr ❯ t é ♥t ♠rs♥♥és ♦rs ss s ♣s résst♥ts à r ss s♥♥ts r♥t♣ srr ♥st s s♥♥ts r♥t ♦♥♥é ♥ss♥ ① rés tr②♦ts ♥ ♣rt t ① térs tr ♣rt ♦♥ tt ②♣♦tès ♥♣rss♦♥ sét♦♥ é à s ts ♠été♦rtqs srt à ♦r♥ é♦t♦♥♣r♦♦♥ s t♠♣értrs ♦♣t♠s r♦ss♥

❯♥ tr ②♣♦tès été ♣r♦♣♦sé ♣r ♦rtrr t sèr q♥♠tt♦♥ rt ♣ tr s ♣tt♦♥s à ♣s r♥s t♠♣értrs ③s s♥♥ts ❯ ♦♥ tt ②♣♦tès ❯ t ♥ é♥♦♠ ♦♥t ♠♦é ♣r♥♣ étt t étt ♦♥ s♥s à r ♦rs qss ① s♥♥ts ♦♥t ♥ ♥é♣♥♠♠♥t qs ♣♦ssté tsr ♦♠♠ s♣♣♦rt r é♥♦♠ ♦♠♠ ♥ é♥♦♠ à srt ♣srésst♥t à t♠♣értr q♥ é♥♦♠ à tt ♠tt♦♥ rt ♣r♠s① s♥♥ts ❯ r à ♣s ts t♠♣értrs

Page 21: Early Evolution and Phylogeny


r é♥r♦ ♣♦r é♦t♦♥ s t♠♣értrs ♦♣t♠s r♦ss♥ ♣s❯ sqà ss s♥♥ts

Conclusion

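My thesis work is an example of applying statistical methods to genome analysis in order to study the evolution of living organisms. It is a powerful approach that makes it possible to address questions that are out of reach for most other biological sciences; paleontology in particular is limited by the rarity, small size and degradation of fossils.

There are, however, many difficulties associated with these studies, and my thesis has convinced me that better statistical models of genome evolution need to be developed. In return, there will be much to learn about the evolution of genomes, of living beings, and of the Earth.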

Page 22: Early Evolution and Phylogeny
Page 23: Early Evolution and Phylogeny

2 Introduction

Time counts

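There is a diversity of spoken languages in South America, some of which are endemic to this continent, and others also found elsewhere. If one considers only European idioms, Portuguese is spoken in Brazil, English in Guyana, French in French Guiana, Dutch in Suriname, and Spanish is the official language in most other countries. The presence and geographical distribution of European languages in a place far remote from Europe of course makes sense in the light of history: European languages were brought to South America by European colonizers.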

♥ t ♣♥♦♠♥ r♦♠ t ♣②s ♦r ♦♦s② r t ♣r♦t ♦ t♥s r t ② t♠♣r♦sss tt ♥♦ tr♦ t♠ t♠ s ♠♦r r ♥ ♦r ♣②s

♦r ♦ts r s♠tt t♦ ts ♦♦t♣r♥t ♦♥sq♥t② t♦ ①♣♥ t①st♥ ♥ ♦r♥st♦♥ ♦ ♥tr ♦ts t♠ ♠st ♦♥t ♦r

♦t② s ♦r ♥s t rst② ♥ strt♦♥ ♦ s♣s r♦♥ t♦r ♥ ♦♥② ①♣♥ ② ♦♦♥ t tr st♦r② ♥ ♠♦s ♣ss ♦ ♦② ♦ t r♥ rs r♥ srs ♥s ♦t ♦s♣③ ♥s s♠♣ ♦♥ t ♣♦s s♥s ♦t ♠ ❲str♦♠ ♦t ♠r

♠♦st r♦s t s t ♣rt rt♦♥ ♥ t s③ ♦ ts ♥ t r♥t s♣s ♦ ♦s♣③ r♦♠ ♦♥ s r s tt♦ ♥ t♦ tt ♦ ♥ ♥ r ♦ s rt ♥♥♥ s sr♦♣ rt ♥ t ♠♥ r♦♣ ♥ t♦ tt♦ rr

tr tr♥s t♦rs st♦r ②♣♦tss t♦ ①♣♥ ts rst②

♥ ts rt♦♥ ♥ rst② ♦ strtr ♥ ♦♥ s♠ ♥t♠t② rt r♦♣ ♦ rs ♦♥ ♠t r② ♥② tt r♦♠ ♥♦r♥ ♣t② ♦ rs ♥ ts r♣♦ ♦♥ s♣s ♥t♥ ♥ ♠♦ ♦r r♥t ♥s

Page 24: Early Evolution and Phylogeny

♥sts s s ts ♦♥ ♣ t ② ♦r t r♥ ♦ ♣s s♣s ②rs tr r♥ ♦♥sr tt t r♥ ♦ s♣s ♦sr♦♥ t ♣♦s s♥s t tr s♠rts ♥ ♦r♣ strt♦♥♦ ♥♦t ①♣♥ ② ♥st♥t♥♦s ♥♣♥♥t rt♦♥s ♦♦ ② stsst s t tr ♦ st♦r ♣r♦ss s ♥♦ ♥♦♥ s t t♦r② ♦♦t♦♥ r♥

♠♦st str♥ ♥ ♠♣♦rt♥t t ♦r s ♥ rr t♦ t ♥t♥ts ♦ s♥s s tr ♥t② t♦ t♦s ♦ t ♥rst ♠♥♥t♦t ♥ t② t s♠ s♣s ♠r♦s ♥st♥s ♦ ♥ ♦ ts t ♦♥② ♦♥ tt ♦ t ♣♦sr♣♦ stt ♥r t qt♦r t♥ ♥ ♠sr♦♠ t s♦rs ♦ ♦t ♠r r ♠♦st r② ♣r♦t ♦ t♥ ♥ tr rs t ♥♠st st♠♣ ♦ t ♠r♥ ♦♥t♥♥t r r t♥t②s① ♥ rs ♥ t♥t② ♦ ts rr♥ ② r ♦ s st♥t s♣s s♣♣♦s t♦ ♥ rt r ②t t ♦s ♥t② ♦ ♠♦st ♦ ts rs t♦ ♠r♥s♣s ♥ r② rtr ♥ tr ts strs ♥ t♦♥s ♦ ♦s ♠♥st ♦ t s t t ♦tr ♥♠s ♥ t ♥r② t♣♥ts s s♦♥ ② r ♦♦r ♥ s ♠r ♠♠♦r ♦♥ t ♦r♦ ts r♣♦ ♥trst ♦♦♥ t t ♥t♥ts ♦ ts♦♥ s♥s ♥ t P st♥t sr ♥r ♠s r♦♠ t♦♥t♥♥t ②t s tt s st♥♥ ♦♥ ♠r♥ ♥ ❲② s♦ts s♦ ② s♦ t s♣s r s♣♣♦s t♦ ♥rt ♥ t ♣♦s r♣♦ ♥ ♥♦r s r s♦ ♣♥ st♠♣ ♦ ♥t② t♦ t♦s rt ♥ ♠r r s ♥♦t♥ ♥t ♦♥t♦♥s ♦ ♥ t ♦♦ ♥tr ♦ t s♥s ♥ trt ♦r ♠t ♦r ♥ t ♣r♦♣♦rt♦♥s ♥ t sr sssr ss♦t t♦tr rs♠s ♦s② t ♦♥t♦♥s ♦ t♦t ♠r♥ ♦st ♥ t tr s ♦♥sr ss♠rt② ♥ ts rs♣ts ♥ t ♦tr ♥ tr s ♦♥sr r♦ rs♠♥ ♥ t ♦♥ ♥tr ♦ t s♦ ♥ ♠t t♥ s③ ♦ t s♥s t♥ t ♣♦s ♥ ♣ ❱rr♣♦s t t ♥ ♥tr ♥ s♦t r♥ ♥ tr ♥t♥ts ♥t♥ts ♦ t ♣ ❱r s♥s r rtt♦ t♦s ♦ r t♦s ♦ t ♣♦s t♦ ♠r ts r♥ t ♥ r ♥♦ s♦rt ♦ ①♣♥t♦♥ ♦♥ t ♦r♥r② ♦ ♥♣♥♥t rt♦♥ rs ♦♥ t r ♠♥t♥t s ♦♦s tt t ♣♦s s♥s ♦ ② t♦ r♦♦♥sts tr ② ♦s♦♥ ♠♥s ♦ tr♥s♣♦rt ♦r ② ♦r♠r②♦♥t♥♦s ♥ r♦♠ ♠r ♥ t ♣ ❱r s♥s r♦♠r ♥ tt s ♦♦♥sts ♦ t♦ ♠♦t♦♥t♣r♥♣ ♦ ♥rt♥ st tr②♥ tr ♦r♥ rt♣

Page 25: Early Evolution and Phylogeny


t s ♥♦ ♣t tt t ♦r♥s♠s ♦♥ ♦srs r t ♣r♦t ♦ st♦r ♣r♦ss ♦♥sr♥ tt ♥ ♠ttr s ♥ s♣ tr♦ t♠ s ♥ s♣

② st♦r②♥ ①♣♥ ♠♥② ♣③③♥ ♦srt♦♥s ♦r ♥st♥ s♦♠ ♠r♥ rtrts ♦r③♦♥t ♥ ♥st ♦ rt ♥ ♠♦st ♦tr ♦♥ss ♦♥ ♦ tr ♥st♦r s trrstr ttr♣♦ t strtr ♦ t♦♥s ♦ ♦r ♥♥r r s ♣rt② ①♣♥ ② tr ♦r♥ s ♦♥s ♥ ♦r s♥ r♣t ♥st♦rs st♦stt② ♥ ♦♥r♥t ♦t♦♥ ①♣♥s ②t ♠♠♠♥ ♥ ♦ str s r② ♦♠♥t ② ♠rs♣s rs♦tr ♦♥t♥♥ts ♦♥t♥ ♠♥② ♣♥t ♠♠♠s s é q♦tt♦♥ r♦♠♦③♥s② s♠s t ♣

♦t♥ ♥ ♦♦② s ♥s ①♣t ♥ t t ♦ ♦t♦♥

Pattern and process

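Once it is accepted that evolution has shaped biological diversity, for instance the differences between species, the question remains as to how evolution shaped this diversity. There are two aspects to that question, today named pattern and process. Pattern corresponds to the trajectories evolution has adopted to reach the extant organisms, and is known as the phylogeny. A phylogeny depicts relationships between organisms through time, from a common ancestor to extant organisms. It shows which organisms are more closely related than others. Once the pattern is in place, process corresponds to the mechanisms of evolution along these trajectories: what events occurred, and when.

[Margin note: Pattern corresponds to the history of speciations.]

[Margin note: Process corresponds to the mechanisms of evolution along the tree.]

Pattern

A phylogeny represents family relationships between species: it can be seen as a family tree of species.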

Page 26: Early Evolution and Phylogeny


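Figure: Phylogeny of some apes. A phylogeny is very much like a family tree where family members are species. Here the family tree is centred on the bonobo chimpanzee. Ancestral nodes in a phylogenetic tree, just as in a family tree, correspond to ancestors. In a phylogenetic tree, they also correspond to speciations: here, the first speciation separates the orangutan from gorillas, humans and the two species of chimpanzees.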
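The family-tree reading of a phylogeny can be made concrete with a small data structure. The sketch below encodes the ape phylogeny as nested pairs and finds the smallest clade containing two species, which corresponds to their most recent common ancestor. Only the first split (orangutan versus the rest) is stated in the caption; the remaining branching order is the commonly accepted one and is an assumption here, as are the function names.

```python
# Nested pairs: each tuple is an ancestral node (a speciation), each string a species.
# Branching order below the first split is assumed, not taken from the caption.
APES = ("orangutan", ("gorilla", ("human", ("chimpanzee", "bonobo"))))


def leaves(node):
    """List the species found below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def mrca_clade(node, a, b):
    """Return the smallest subtree containing both species, or None if absent."""
    names = leaves(node)
    if a not in names or b not in names:
        return None
    if not isinstance(node, str):
        for child in node:
            sub = mrca_clade(child, a, b)
            if sub is not None:
                return sub
    return node


print(leaves(mrca_clade(APES, "human", "bonobo")))
print(leaves(mrca_clade(APES, "orangutan", "bonobo")))
```

The first call returns the clade of humans and the two chimpanzee species; the second returns the whole tree, because the orangutan lineage separated at the first speciation.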

Process

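Once the pattern is in place, one can use it as a framework for asking questions about the process of evolution: the events that occurred along the routes evolution has taken and that explain the similarities and differences among species. Questions related to the process of evolution include: how fast did evolution proceed towards an increase in size? Was evolution totally stochastic, so that no trend seems to be evident? In fact, both pattern and process are intimately related, and knowing one usually helps improve the study of the other. Thus, when studying evolution, it is better to estimate pattern and process at the same time.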

r♥ ♠② tss ♦♣ ♠t♦s t♦ r♦♥strt ♣②♦♥s ♥♥r ♥ts ♦♥ t r♥s ♦ ♣②♦♥② ♥ ♠ ♦rts t♦rs ♠♣r♦♥ t tr ♦ ② st②♥ ♣rtr s♣s ♦s ♣②♦♥t rt♦♥s♣sr s♣t ♦s ♦♥ ♠♦r ♥♥t ♥ts t♥ t s♣t♦♥s ♦ rt♣s ♥ ♥♦t② ♦♥ ♦ t ♠♦r sss ♦ ①t♥t ♦r♥s♠s ♠ t♦ t② t② r ♥ t ♥①t st♦♥ ①♣♥ ② ♦♦sts t♥ tt ♥♥s sr ♦♠♠♦♥ ♦t♦♥r② st♦r② ♥ t r t ♠♦r sss ♦r

Page 27: Early Evolution and Phylogeny


LUCA and the three kingdoms

Unity of life

♥ ♣②♦♥t rt♦♥s♣s t♥ ♦♥② s♣s ♦ ♣s r r♣rs♥t ♠ rr ♣②♦♥s ♥ t tt ♥♦♠♣ss rs ss♥sts ♠♦ss ♠sr♦♦♠s ♣♥ts ♥ s♦rts ♦ ♠r♦♦r♥s♠s s r ♣②♦♥② ♦s ♥♦t ♦rt ♥② s♦rt ♦ ♥ ♥ t ♦rrs♣♦♥s t♦ tr ♦

♥ rt ♦r♥s♠s ♥ ♥ ♥ s tr ♦ s t②sr rtrsts tt ♥t t tr ♦♠♠♦♥ ♦r♥ t② s♦ ♠♥②♦♠♠♦♥ ♣♦♥ts tt t s ♠♦r rs♦♥ t♦ ss♠ tt t② ♥rt ts♣r♦♣rts r♦♠ ♦♠♠♦♥ ♥st♦r rtr t♥ ♦ ♥♣♥♥t② t s♠rtrsts s ♠♥s tt ♥ ♥s tt ♥♦ ♦sr rss ♠♥ tr t♦♥ ♠r♦♦r♥s♠ t s♠ r♥ r♥ r♥r♥♠♦tr s r♥♠♦tr ♦ s ♥ ♥♠ ❯ ♦r st ❯♥r ♦♠s r♦♠

❯s ♦♠♠♦♥ ♥st♦r ♥ s t ♦t ♦ ♠ ♥trst ♥ ♠♥② ♦♥tr♦rss

rtrs tt s♦ tt ♥ ♥s s♥ ♦r♥ ♥ tr♦♠♠♦♥ s ♦r♥st♦♥ ♥t t ♦st tr ♥ ♦tr ♥ r ♥ ♠♦

r strtrs rs♠r ♥ ♦r♥s♠s

r ♦r♥s♠s r ♠ ♦ ♦♥② ♦♥ ♠tr ♦r♥s♠s s s ②♦rs♠② ♦♥t♥ ♦♥s ♦ t♠ tt ①♥ ♥♦r♠t♦♥ ♥ ♥trt t♦ ♠ t♦ ♦r♥s♠ ♥t♦♥ ♥ ♦t ss ♦r s ♠t ② ♣♠♠r♥ tt rs ♦♥r② t♥ t ①trr ♥ t ♥trr ♥r♦♥♠♥ts t ♥ ♥trt ♠♦r ♠♥r s ♠♥r♥s s♦rts ♦ ♠♦s s♦♠ s s ♥r② rr♥② ♦r st♦r ♦trs s♥rstrtrs ♦trs s ♥♥♦s♦♣ ♠♥s ♦trs s ♥♦r♠t♦♥ st♦r♥ t ② tt ts ♠♦s r s s s♦ ♥rs ♥r ♣r♦sss♥s♣♥s ♥ ♥tr t♦ t ♥ ♦ t r ♥r② ♥t ♥ ♦r♥s♠s ♥t② ts s♠rts ♦♠ r♦♠ r s♠rts ♥ t ♥♦♠s♦ ♥ ♥s s ♥♦♠s ♦♥sttt ♥ ♦r♥s♠s ♦♦♦♦ rs♠rts r st ①♣♥ ② t ②♣♦tss ♦ ♦♠♠♦♥ ♥q ♦r♥ ♦r ♦♥ rt r♦r ♦r♥s♠s r rt ♥ tr rt♦♥s♣s rr♣rs♥t ② t tr ♦

♥ t s ♥rst♦♦ tt s s♥ ♦r♥ ♥t ♠② tr ♥ t tt ♥♦♠♣sss ♥s ♦ ♥ ♥s t♦ t ♥♦r t♦ s tr s st r② ♠ ♦r ♥ ♣r♦rss ♦ s♦♠ ②♣♦tss ♦ t s ♦r♥st♦♥ ♦ t tr ♦ ♥♦ s♠ t♦ ♥r② rt♥

Page 28: Early Evolution and Phylogeny

Three kingdoms

♥ ♠♣♦rt♥t ♥②ss ❲♦s t ♦① t♦ ♣r♦♣♦s tt t ♠♦r s♦♥s ♦ t ♣r♠r② ♥♦♠s r tr rtr ♥♠ trrtr r ♥ ❯rr②♦ts r② s tr♦t♦♠② s s♥ ♣r♠r②

s♦♥s ♦ r r tr ♥r②

t♥ ♥ ♠♦st② ♦♥r♠ ② ♠♦r s♦♣stt ♥②ss s st♦♥ ♦r♠♦r ts

• ♠♦♥ t tr ♥♦♠s r② r t ♠♦st ♦♥s♣♦s s t② ♦♥r② ♥s ♥ ♦r♥s t♥ ♠♦st ♠tr ♦r♥s♠s ♦r ss ♣♥ts ♥♠s ♥

♥ ♠♦ r② s♦ ♦♥t♥ ♠♥② r♦♣s ♦ ♥r s♣s r② tr ♣ ♥ ♥s ♥ tr s ♦rtt ♠♥② ♦r♥s s ♦r♥s r ♣♠♠r♥ ♥s♦♠♣rt♠♥ts tt r ♣rtr ♥t♦♥s ♥ t ♦r ♥st♥ ♠t♦♦♥r ♦s t ♣r♦ss ♦ rs♣rt♦♥ t ♦①t♦♥ ♦ ♣rtr♠♦s t♦ ② ♥♦s♥ rP♦s♣t P s♠ ♠♦ tt♥ t♥ s s ♣r♦r ♦ ♥r② ♦r s♦rts ♦ rt♦♥s ♥ t ♥♦tr ♥♦♥ ①♠♣ s t ♦r♦♣st ♦♥ ♥ ♣♥ts ♥r ♣♦t♦s②♥tss ts ♣ ♣♦t♦s②♥tss tr♥s♦r♠s t ♥r②♥t♦ ♠ ♥r② ♥♦t② ♦♥ ♥ ♥ t ♦r♠ ♦ P s ♠♥r② ♥ t♥ s ♥ t ♦r♦♣st t♦ ♣r♦ s♠ srs tt♥ s ♦r st♦r ② t r②♦t s r rtr q♣♣ t s♦♣stt ②t♦st♦♥ ss♦t t t t tt ♠♥② r②♦ts ♦ ♥♦t s t♠ t t② t♦ ♥ tr s♣♥ s ♥♠♥t ♠♥t ♦ tr t② t♦ ♠♦ ♦r t♦ ♥ ♣rtst s s♦ s r♥ s♦♥ ♦r ♥trr tr♥

• r ♦♥t♥ ♠♥② ♥r s♣s ♦ s♠r s③ t♥ r②♦t♥② r ♥ ①tr♠ ♥r♦♥♠♥ts s ♥ ♥tr ♥s ♥♦r ♦r♥s ♦st s♣s r ♣r♦tt

② ♥ t ♦♠♣♦st♦♥ ♦ tr ♠♠r♥ s r♥t r♦♠tt ♦♥ ♥ r② ♥ tr ♦♥trr② t♦ r② r s♣② r♥ ♦ ♠t♦s♠s ♦t② ♠t♥♦♥ss tt rs CO2

t H2 t♦ ♣r♦ ♠t♥ s ♦♥② ♥♦♥tr ♠♦♥ r r♦r ♦s ♣r♦ s ♠♦♥ts ♦ ♠t♥ t s t♥s t♦ t♠t♥♦♥ r ♥ tr ts tr r r ♠♦s ♦r trt② t♦ ♦♣ t ♥♦s♣t ♥r♦♥♠♥ts s s t♦s ♥♦♥tr♥ r② ♦t ♦♥ s♦rs r② st② ♣♦♥s ♦r r② ♠♥ sts ♥trst♥② t♦ s♦♠ s②♠♦t r ♥ sr Prst♦♥t ♥♦ ♣rst r s ♥ ♦♥ s♦ r ♥t② t s♦♠ r tt r ♦ ♣② ♠♣♦rt♥t r♦s ♥ t ♦♥♦♠② ♦ trt ② ♥ ♠♦r ♦♥trt♦rs t♦ sr ♦♠ ②s ♠

Page 29: Early Evolution and Phylogeny


♥♥r t ♣♣ t

• tr ♥♥♦t s② r♥tt r♦♠ r s♠♣② s ♦♥ tr ♥♥t♣♦t♦s②♥tsstr ♠♦r♣♦♦② ♦r♥② t tr ♥♦♠s r ♥ s ♦♥

t ♥②ss ♦ ♥ sq♥ ♥♦t strtr ♥ t② r s♠r②s③ s r ♥ s♠r② s♣ ♥ ♦ ♥♦t ♦♥t♥ ♦r♥s tr tr s r s♦ ♣r♦tt ② ♥ s♦ r♦r rt rst② ♥ tr ♠t♦s♠s ♥ ♥ tr st②s ♦r ♥st♥t② ♥♥t ♦①②♥ ♦r♦♣②♦r tr♦♦r♦♣②s ♣♦t♦s②♥tss tr t②♣s ♦ ♠t♦s♠s ♥ s♦ ♦♥ t tr♦tr♦♣♦r t♦tr♦♣ s♣s ♦t s②♠♦t ♥ ♣rst tr ♥s♦r t s♦♠ ♣t♦♥ tr r② ♥♦♥ ♦r t s♥str sss t② s s s ②♦tr♠ ♣r ♣r♦ss ❱r♦♦r ♦r r♣♦♥♠ ♣♠ ②♣s s ♥trs ♥tr① ❨rs♥ ♣sts ♦♥ ♣

ts t②♣s ♦ ♦r♥s♠s r t ♣r♦t ♦ st♦r② tt ♥ ♥rrtr♦ ♦♦② ♥ ♣②♦♥t ♥②ss ♦♦② sts r♦s tt rr② tst♠s ♦ ♥♥t ♥ts ♥ ts ♥ ♣r♦ s ♦♥r♥♥ t ♥r♦♥♠♥t♥ t ♥ ♥s tt r♥ ♦♥s ♦ ②rs ♦ s ♥♦r♠t♦♥ s s②r② ♣rt ♥ r♦ t ♥♥♦t ♦t♥ ② ♦tr ♠♥s ♥ t ♥①tst♦♥s ♣rs♥t rsts ♦t♥ rst tr♦ ♦♦② ♥ s♦♥ tr♦♣②♦♥ts s♥ tt ♥② ♦♠♣♠♥ts ♦♦②

A short history of life on Earth as told by rocks

s♦r s②st♠ ♠② ♦r t♥ ♦♥ ②rs t ♥♠r ♦ ♠ss t♦ ♦♥♥② t ♥♥t r♦s sst tt s ①st ♦♥rt ♦r ♠♦r t♥ ♦♥ ②rs ♦♣ s st♠t s s ♦♥ t s ♠♦r t♥

♦♥ ②rs ♦♥②ss ♥ t tt♦♥ ♦ str♦♠t♦ts rt♦♥r② s♠♥tr② strtrs♦♠♠♦♥② t♥② ②r ♠s♦♣ ♥ r♦s ♥tr♣rt t♦ ♥♣r♦ ② t tts ♦ ♠t♥ ♦♠♠♥ts ♦ ♠srt♥♠r♦♦r♥s♠s ♦♣ ♦ ♠r♦♦sss ♦sss ♦ strtrs rs♠♥s ♦ ♠♦r ♦♠rrs ♠♦s ♥tr♣rt s ♥ ♥♦st ♦ ♣rtr r♦♣ ♦ ♦r♥s♠s ♥ ♦ s♦t♦♣ t ♠sr ♦ t rq♥s♦ r♦s s♦t♦♣s ♦ ♥ t♦♠ ts rq♥s ♥ t ② ♦♦ ♣r♦sss r② sr ts tr t②♣s ♦ ♠t♦s ♥ ♣rs♥t s♦♠♦ t ♥sts ♥t♦ t ♣ st♦r② ♦ t② ♣r♦ s ♥sts ♥ st rtrr② ♥ ♦s ♠♥② ♦♥ ♥♥t st♦r② ♦r t♥ ♦♥

Page 30: Early Evolution and Phylogeny


②rs ♦ ♥trst rrs r ♥t t♦ r t ♦♦s ② ♥♦ ♥♥ ♦r ♠♦r ts

Microfossils

sts♥ t ♦♦ ♦r♥s ♦ str♦♠t♦ts ♦r ♠r♦♦sss s s② t ♥ ♥ ♦♥② tr♦ ♦♠♣rs♦♥s t ♠♦r r♥t ♥♦♥tr♦rs ①♠♣s ♦r ♥st♥ str♦♠t♦ts ♥ ♦♥♥♥② sr s♦♠♥ r♦♠ ♦♦ ♣r♦sss t② s♣② ♥ ♠♣♦rt♥t rst② ♥ trs♣ ♦r♥② ♦♦ t sr s♥ r♥t ♠♦r♣♦t②♣s♦♠ str♦♠t♦ts

r ♦♥ ②rs♦ ♠♦♥ ♦♥ ②r ♦ str♦♠t♦ts r♦♠ str r♥ tt t s♠s

♥② tt ♥♦♥♦♦ ♣r♦sss ♦ s♦ s rst② ♦ststr♦♠t♦ts s♦r s♦ r ♠② ts ♦♥ ②rs ♦ ♠r② ♠r♦♦sss s♦♥ t♦ ♠r♦♠trs③ s♣rs ♥①t t♦ ♦tr r s②♥tr♣rt s ♥ ①♠♣ ♦ s♦♥ ♦r ♠♦st ♦♥♥♥ ♦sss rt♦s tt ♦♠♥ ♠r♦s♦♣ ♥ ♠r♦s♦♣ s ♦ ♦♦ ♦r♥s ss ♠r♦♦sss ♥s r♦s rs♠♥ str♦♠t♦ts ♦♠♥ ♥♥ s♦ str♥t♥ ② ♠♦r ♦♠rrs

Molecular biomarkers

♥ ♦r♥s♠s ♣r♦ ♠♦s tt ♥♥♦t ♦t♥ ② ♥♦♥♦♦♣r♦sss tt r rtrst ♦ tr ♠t♦s♠ ♥ r ts ♥♠ ♦♠rrs ♦r ♥st♥ ②♥♦tr s ♠t②tr♦♦♣♥♣♦②♦s ♥ tr♠♠r♥ s rts ♦ ts ♠♦s ♥♠ ♠t②♦♣♥♦s ♥ ♦♥ ♥ s♠♥ts ♦♥ ②rs ♦ ts ssts tt ②♥♦tr r② ①st t t t♠ ♠♠♦♥s t r♦s t ♠♠♦♥s t ♠r② s s♦♠ str♦s ♥♦st ♦ r②P♦t♦s②♥tss s

♦♥ ②rs ♦ ♦t ♠♠r♥s r tt ♥ t s♠ r♦s r♦s t ♣r♦♣♦stt r② ♠② t s ♦♥ ♦ s ♦♥ ②rs t tt ♣tttrs ♦ r② ♥ ②♥♦tr r ♦♥ ♥ t s♠ r♦s s ♥trst♥s r②♦t ♠♠r♥s ♦♥t♥ ♦str♦ rqrs r② ♠♦♥ts♦ ♦①②♥ ♦r ts s②♥tss ♥ ♦ t♥ ♠♥ tt t ♦ ♣r♦t♦♥♦ ♦①②♥ ② ②♥♦tr s s ② r② t♦ ♣r♦ tr ♠♠r♥♦str♦ ♦r ts ♣♣♥ ②♣♦tss s r♥ ② r♥t rt ♥tr s♠ss♥ t t s s♦ s② t s♦stt t ♦♠rrs ♦♥ ♥ ts ♥♥t r♦s ♣r♦② ♥tr t r♦s trtr ♦r♠t♦♥ ♦st ♥ ♦r r②♦ts s ts ♦♥ t ♦♥ ②rs ♦ ♥ ♦r ②♥♦tr t ♦♥ ②rs ♦

Page 31: Early Evolution and Phylogeny


♦♠rrs r s♦ s t♦ rtrs ♥ ♦s②st♠ tt ♣r♦ r♦sr♦♠ ♦♥ ②r ♦ s♥ ♥ str r♦s t ♦♥♠♦s ♥♦st ♦ r♦♠t r♦♣ ♦ ♠♠Pr♦t♦tr ♥♦ ♦r♦ r♦♠ t tr♦ts♦r♦ r♦♣ ♥ts ttts ♣rtr ♦s②st♠ s ♠♥② ♥♦① s ♥♥ s ♦♥sst♥t t t tt ♦①②♥ r♠♥ r② ♦ ♥t tr t♥ ♦♥ ②rs ♦ s♦♠♥r♦♥♠♥ts ts r♠♥ qt ♣r♦tt r♦♠ ♦①②♥

Isotope ratios

♥♦tr ♥ ♦ ♠rr s ♦♥ ♥ s♦t♦♣ rt♦s t♦♠s ♦♠ ♥ r♥t s♦t♦♣s tt ♣♥ ♣♦♥ t ♥♠r ♦ ♥♦♥r ♣rts t ♥tr♦♥s ttt② ♦♥t♥ ♦r ♥st♥ r♦♥ t♦♠s r ♦♥ ♥ tr r♥t ♦rs12C tt ♦♥t♥s ♥♦♥s ♥tr♦♥s ♥ ♣r♦t♦♥s ♥ ♠s ♦r ♦t 99♣r♥t ♦ r♦♥ 13C ♥ 14C tt ②s ♥ t♦s♥ ②rs ❱♦♥r♦s ♥ t ② ♥②s♥ t rt rq♥s ♦ r♥t s♦t♦♣s ♥♦t② ♥ t ♦♣ r♥♠ tr ♦♦ strt ♥ t rt②t♦ ts s♦t② t ♦♥ strt ts s ♦ ♦♦sts ♥ t t ♦ r♦

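Biological reactions tend to prefer lighter atoms. Living matter is therefore enriched in 12C compared to 13C and 14C. When this living matter fossilises, it produces rocks enriched in 12C. This makes it possible to establish the biogenicity of ancient rocks by measuring their 13C/12C ratio. Accordingly, Mojzsis et al. measured the 13C/12C ratio in Greenland rocks they estimated to be […] billion years old, and found the depletion in 13C characteristic of biological origins. They thus concluded that they had discovered the first traces of life. This result is debated, however, as the rocks that were used for this measurement have been altered to the point that their dating is uncertain. Another measurement by Rosing nonetheless also finds a 13C/12C ratio compatible with a biogenic origin in […] billion year old rocks from Greenland.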
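A 13C/12C measurement of this kind is conventionally reported in delta notation, as the relative deviation of the sample ratio from a reference standard, in parts per thousand. The sketch below shows that arithmetic. The reference value is an assumption (an approximate ratio for the usual carbonate standard), the sample ratio is invented, and nothing here reproduces measurements from the studies cited in the text.

```python
# Assumed, approximate 13C/12C ratio of the reference standard.
REFERENCE_RATIO = 0.0112


def delta13c(sample_ratio: float, reference_ratio: float = REFERENCE_RATIO) -> float:
    """Relative deviation of a 13C/12C ratio from the reference, in per mil."""
    return (sample_ratio / reference_ratio - 1.0) * 1000.0


# Invented sample: 2.5 percent poorer in 13C than the reference.
sample_ratio = 0.975 * REFERENCE_RATIO
print(round(delta13c(sample_ratio), 1))
```

A negative value means the sample is depleted in 13C relative to the reference, which is the direction of the signal described above for carbon that has passed through living matter.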

r♦♥ s♦t♦♣s s♦ ♥ s t♦ t t ♣♣r♥ ♦ ♣rtr♠t♦s♠ ♠t♥♦♥ss ❯♥♦ t ♠sr r♦♥ s♦t♦♣ rt♦s♥ ♥s♦♥s ♥ r♦s r♦♠ t Pr rt♦♥ ♥ str t♦t t♦ ♥ ♣♦st ♠♦r t♥ ♦♥ ②rs ♦ s t 13C/12C rt♦ ♥t ♠ ♠t♥ s ♦♥sst♥t t ♦♥ ♦r♥ ♥ s ♦t t♥♦♥ss s

♠♦r t♥ ♦♥ ②rs ♦♣r♦sss ♦ s♦ ♣r♦ ♦tr ss tt ♥♦t ♥ ♦♥ ♥ ts

♥s♦♥s t t♦rs ♦♥ tt ♠t♥♦♥ r ♠st t st ♦♥ ②rs ♦

tr s♦t♦♣ rt♦s ♦r ♥st ♥t♦ t r② rt ♦r ♥st♥ 18O♥ 30Si ♥ s s ♣♦tr♠♦♠trs ♦ t ♦♥ ♦t② ♦rt

Page 32: Early Evolution and Phylogeny


t ss♦♥ st♠t tt r♥ t st ♦♥ ②rs r♦♥ t♠♣rtrs rs r♦♠ ♦t 70C t♦ ♦t 20C t♦② rrs♦♥♥ s s ♦♥ t t tt t t♠♣rtrs ts t♦♠s r ♠♦rs② s♦ ♥ s tr ♦♥sq♥t② r♦s tt ♦r♠ ♥ s tr t t♠♣rtr t♥ t♦ ♣t ♥ 18O ♥ 30Si ♦♠♣r t♦ r♦s tt♦r♠ ♥ ♦r s tr ❲♥ t② ♥②s s♠♣ ♦ r♦s s♣♥♥♥ tst ♦♥ ②rs ♥ tr sr♥ s♦♠ r♦s tt ♠② ♥ tr② ②♦tr♠ s t② ♦♥ tt tr t♦ ♠rrs r ♥ ♦♦ r♠♥t♥ r ♥ ♦r ♦ rs ♥ ♦♥ t♠♣rtrs s t♠♣rtr s ♠♦r ♣r♠tr t♥ ♥ ♦r♥s♠s ts rsts ♠♣♦rt♥t ♠♣t♦♥s ♦r t ♦t♦♥ ♦

t ♥♦tr t♦♠ ♦s s♦t♦♣s r s ♦r t st② ♦ t r② rts ♦♥ ♥ s♣r ♣tr♥ tr ♦♠♥ ②r♦♥ t s♣tt♦ ♣r♦ ②r♦♥ s♣ ♥ ♦♥ s♦ t② t♥ t♦ s♦ s♦♠ ♣rr♥ ♦r ♣rtr s♣r s♦t♦♣ t tr 32S ♦♠♣r t♦ 34S ♥ t st ♦♥ ②r ♦ r②t r♦s ♥ r♥♥ tt r r ♥ s♣tr rs♦♥♥ s tt s♣t r♥ tr r ♣rs♥t t t t♠t② ♣r♦② r s♦♠ ♦ t s♣t tt rs t♦ t r②t r♦s ♥ t② ♦♥ s♦♠ ♠r♦s♦♣ s♣ ♥s♦♥ ♥ ♠sr tr 34S/32Ss♦t♦♣ rt♦s t♦ st♠t tt ts r ♦♥sst♥t t tr ♣r♦t♦♥ ②s♣tr♥ tr s ♠♥s tt ts ♣rtr ♠t♦s♠ ♠② t st ♦♥ ②rs ♦ ♦r♦r s♣t s ♠♦r ♥♥t ♥ r♦♠① t♥ ♥ ♥r♦ ♦♥s s♦ tt s♣t rt♦♥ ♠st ♥ ♠♦r♠♣♦rt♥t ♥ ♦①②♥ strt rs♥ ♦r♥② ♥ t ♦♥♥ ♥rs ♥ s♣r rt♦♥t♦♥ t♥ ♥ ♦♥ ②rs ♦♦♥sst♥t t t r♦r ♦ ♦①②♥ ♦♥♥trt♦♥ ♦r r♥t② r t

♠sr s♣r s♦t♦♣ rt♦s ♥ r♦s r♦♠ ♦t r ♥ ♦♥tt ② ♦♥ ②rs ♦ ♦①②♥ r t st 10−5 t♠s ts ♣rs♥t rs ♥rs ♦ ♠♦♥ ②rs rr ♦①②♥ s ♥r② s♥tr♦♠ t t♠♦s♣r ♥ ♥r ♠♦♥ ②rs tr ♥ ♦①②♥t♦♥♣s♦ t♦♦ ♣ t♥ ♥ ♠♦♥ ②rs ♦ ♦①②♥ r♦s ♥s♠t♥♦s② t t rst ♣♣r♥ ♦ ♠tr ♥♠s r♥ ts ♥♦♥ s t r ♣r♦ t s ♦♥♥ ♠② ♠♥♥ s ♠tr ♥♠s ♥ ♦①②♥ t♦ ♥ ♦♣ r♥ ♦ ①qst s②st♠s ♦r ♣r♦♥ ♦①②♥ t♦ tr s r♦♠ sts♥ rtr♦♥ rt♦♥ t♦ ♥sts tr t s ts ♥♦t ♥rs♦♥ t♦①②♥ s

♠♣t ♦♥ ♥ ♦♥ rt ss♠ tt ♦♥② ♥ ♦①②♥ s r rt♥ trs♦ ♦ ♥♠

♠trt② ♣♣r

s ♠ttr ♦ t t s tt ♦①②♥ ♦tr s♣tr ♥♥s ♦♥ ♥♠ ♦t② ❲r t ♣r♦♣♦s tt ♦♦♥st♦♥ ♦r ♥ ♥♦t qt ♦s②st♠s ② rtr♦♣♦s ♥ rtrts ♦rr ♥

t♦ ♣ss trr ② ♥rss ♥ O2 ♦♥♥trt♦♥s ♠♦♥ ②rstr ♦t ♦♥ ②rs ♦ ♥ t r♦♥r♦s ♦①②♥ r ♥r②t ts ♣rs♥t② s t♦ t rs♦♥ ② ♥t ♥♠s ss♦r♣♦♥ ♠trs ♦♥ r② t r♦♥② ♠tr ② ♦r ♥ ♠♣♥ ♠trs ♦♥ ② r ♦♥ ♥t ♦ss r♦r ♦ ts ♣r♦ r♥r t t r ♣rt ♣rssrs♥ ♦①②♥ rr ♥♠s ♦ ♥ t♦ s♣♣♠♥t tr s ts♥t ♠♦♥ts ♦ ♦①②♥

s♠♣ ♦ s♦♠ ♥sts r♦♠ ♦♦ sts

r t②♣s ♦ ♥s sst tt ♠② ♠♦r t♥ ♦♥ ②rs ♦ r ♠♣♦rt♥t ♠t♦s♠s ♥ ♥ t t♦ ♠♦r t♥ ♦♥ ②rs ♦ ♠t♥♦♥ss s t t ♥ s♣trt♦♥s t t ♦♥ ②rs ♦ ♦t sr♣rs♥② ts t♦ ♠t♦s♠s♦r ♥ ♥r♦ ♥r♦♥♠♥ts ①②♥ ♣♦t♦s②♥tss t ②♥♦tr♠t♦s♠ tt ♥ t ♦ t rt ② ♥t♥ ♥ ♠ss ♦①②♥ ♥t t♠♦s♣r ♠st t st ♦♥ ②rs ♦t♦ ②♥♦tr r② ♥ t rt ♥r♦♥♠♥t t t♦♦ qts♦♠ t♠ ♦r ♦①②♥ t♦ r ts ♣rs♥t ♣♣r♥t② ② ♦♥ ②rs♦ ♦①②♥ s ♠♦r t♥ 10−5 t♠s ts ♣rs♥t t t r ts ♣rs♥t ♦♥② r♦♥ ♦♥ ②rs ♦ ♦tt t s s ♥ r♠♥tt t ♥r♥ ♦ t ♣rs♥ ♦ t♦ ♥r♦ r♦♣s r♦♠t ♥♦r♦ ♦♥ ②rs ♦ t ♥ ♦①②♥ ♥② rs ts♣rs♥t s ♦♥s t t rst ♣♣r♥ ♦ ♠tr ♥♠s ♥t ♣s t♦ ♦t t ♣rs♥t s ♦♥s t t ♣♣r♥ ♦ ♥trtr♦♣♦s ♥ ♥t ♠♣♥s ①②♥ s tr♠♥♦s ♠♣t ♦♥ t♦t♦♥ ♦ ♥ ♠♥② ♥trst♥ sts r♠♥ t♦ ♦♥ t♦ st② t♥ t♥ ts ♠♦ ♥ st♦r②

♦rt t ss♦♥ ♣r♦♣♦s tt ♦♥ t♠♣rtrs r 70C ♦♥ ②rs ♦ s ♦ sst tt r② ♦r♥s♠s t t♠♣rtrs ♥ ♣r♦rss② ♣t t♦ t♠♣rtrs tt r ♥♦ ♠t ♦♥t rt rt s t tt t t ♥♦♠s t♦ s② ♦♥ ts①♣tt♦♥

Figure: Chronology of some events in the history of life. Horizontal axis: billion years from now, from -4 to 0. Events marked: traces of life; traces of sulphate-reduction; traces of methanogeny; traces of Cyanobacteria; traces of Eukaryotes; traces of Chromatiaceae and Chlorobiaceae; fossils of animals.

st♦r ♦♥t♥t ♦ ①t♥t ♦r♥s♠s

♥ ♦r ♥♦r♠t♦♥ ♦t t st♦r② ♦ ♥ ♥s ♥ r♦s s♠s r②♥tr qst s r♦s r ♦ ♦sss s s♥ ♥ t ♣r♦s st♦♥♦sss ♦ s♦rts s♦ tt s ①st ♦r ♦♥ t♠ t t② r ♥♦t r②♥r♦s ♦t t r② ♣②♦♥② ♦ ♥ ♥s ♥ ♦ t② ♠ t♦ t ② t② r ♦s ♠♦r t♥ ♦♥ ②r ♦ r r② rr ♥ tt♥trr ♠② ♥♦t tt sts tt r♣♦rt ♥ t ♦r♠r st♦♥ ♠ r♦♠ str ♦t r ♦r r♥♥ ♦tr ♣s r♦r r②♥♥t r♦s ♦♣② ♠t ♥ ♠② r♥t s ss t♦ ♥ rs ♦ ♥trst♥ r♦s ♦r♦r t② ♦t♥ ♥ r② tr s ♦♥s ♣ss ♥ t② ♦♥② ♣rt ♥ s② t ② ♥♦r♠t♦♥tr ♦♠♥ts ♦ ♦t♦♥r② st♦r② ♥ ♦♥ ♥ ♥ ♦r♥s♠s t♠ss tr ♠♦r♣♦♦② tr rtrsts t s♦ tr ♥♦♠ r ♥♦r♠t♦♥ rr♥ ♦t♦♥ r♦s ①t♥t s♣s rr② srs ♦ ♥♥t♥ts ♥ tr ♦♥s ♥ ♥s

♦r♣♦♦ t ♥ ♦♠♦♦②

rr♥♠♥t ♦ s♠rts ♥ r♥s t♥ ♦r♥s♠s tr②s ♦♦♠♣r♥ ①t♥ts♣s ♣r♠ts t♦ ♥r tr ♣②♦♥② t② ♦ r♦♠ ♦♠♠♦♥ ♥st♦rs ♦r ♥st♥ ♦♥♦♦s ♥ ♠♣s r t♦

♦r ②s ♥r② ♥t rtrs t♦ ♦♥ s ♠♦r t♦♥t t♥ t♦tr s s♠rt② rts tr r② r♥t r♥ r♦♠ ♦♠♠♦♥♥st♦r ♦t s♣s tt t♠ t♦ ♠t ♠♦r♣♦♦ r♥s♦♥ r♦♠ t ♦tr st② ♦♥r t♠ ♥ t ♣st ♠♣♥③s ♥ ♦♥♦♦ss♦ sr ♦♠♠♦♥ ♥st♦r t ♠♥s s ♥ s♦♥ ② t rs♠t t♥ ts r♥ ♣s ♥ tr ♥ ♦s♥ ♥ ♥ ♦ ♦♥

tt ♥ ss♠ tr ♦ ② tr♥ ♠♦r ♥ ♠♦r st♥t ♦s♥s ♥t♥♦ ♠♦r ♥ s♣s s t tt s ♥♦t ♥ ♥t t♦ t ♠② ♥♥r ♣r♠ t♦ t ♥ ♦ ♦t♦♥ s ♥ sr ② r ♥s t♦ ♦r s♦♠ rs♦♥ ♦s ♠♥s s ♦ s♣s ♥st ♦♦♥♦♦s

tr t♥ s♥ ② st♠t♦♥s ♦ ♦r s♠rt② s st♠ts ♦ rt♥ss ♦t♦♥r② ♦♦sts r② ♦♥ ♥ ♠t♦s t♦ s♦rt ♦t s♣strs ♥ ♠♣♦rt♥t st♣ ♥♦t② s t♦ ♥ rtrs ♥ stts ② ♥s♥ ♣rts ♦ t ♦② ♦r ♥st♥ t ♦r♠ ♥ ♠♠♠s ♠② s rtr s t ♥ s② ♥s r♦♠ t rst ♦ t ♦② ♥♦r ts rtr stts ♦ r♠ ♥ ♦r ♣♣r t♦ t ①♠♣ ♦s s rtr strt♦rr s② ♠♦r ♣r♦♠ s t♦ ♥t② s rtrs ❲② ♦ t♥ t rs♦♥ t♦ ♦♥sr tt tr r tr♥st♦♥s ♦ rtrs r

♦♠♦♦♦s t②♦ r♦♠ s♥♥str rtr

t♥ r♠ ♥ ♣♣r ♥ ♥ r♥t② ② ♦ t♥ tt tr ♥♦t ♣♣r ♥♣♥♥t② ♦ ♦tr t ♦ r♦♠ s♥ ♥str stt ♥ t ♥st♦r ♦ ♥♠s ♥r st② s♦rt② ② ♦ ♦♥sr tt t② r ♦♠♦♦♦s

♥t♦♥ ♦ ♦♠♦♦② ♥ ♦t♦♥r② sts s ♦ r ♠♣♦rt♥ ♦♥ ♥ts t♦ ♥r t ♦t♦♥ ♦ rtr ♦♥ ♥s t♦ r♦♥s ♣r♦♣r②ts rtr ♥r t ♦st♠s t s ss ts ♥ ♣r♠ts tsss ♦r ♥② s♣s ♦♥ s ♥trst ♥ ♦r ♠♦r♣♦♦ rtrs ♦♠♦♦②♥ ♥t ② ♦♦♥ t ♣♦st♦♥ s♣ t♠ ♦ ♦rr♥ ♥ st♦r② ♦r ♠♦r r♥t② ♣ttr♥ ♦ ♥ ①♣rss♦♥ tr r ♦ts ♦ s♠rts s♠rt② s

sts ♦♠♦♦②t♥ t♦ rtrs ♥ r♥t s♣s t s♠s ♠♦r rs♦♥ t♦ ss♠tt ts ♦♠♠♦♥ ♣♦♥ts rst r♦♠ ♦♠♠♦♥ s♥t r ♦♠♦♦♦st♥ r♦♠ ♣r ♣♣r♥s

♠♦r♣♦♦② ♠② ♣r♦ ♥♦r♠t♦♥ t♦ ♣②♦♥② ♦ rt♣s ♠♠♠s ♦r ♥ rtrts t ts qt ♥♦♠♦rt ♥ ♦♥ ♥tst♦ ♦♠♣r ♣♥t t ♠♠♥r tr ♣rst t ts ♥st ♦st♦r t ♥ r ♥ ♥ t st♦♠ ♦ ♦ ♦♠♦♦♦s rtrs ♥② ♥♦♠s ♣r

♠t t♦ r♦♥strt tr ♦ r ♥s② t♦ ♥ ♥ tr ♦t♦♥ s t♦ s② t st ♦♠♣① ♦♥tss

♣r♦♣r tr ♦ s♦ ♥ ♦ ♣♦ss s t ♣t♦♥♦ ♦ t② ♦ r♦♠ ♦♠♠♦♥ ♥st♦r ♦♥♠♦r♣♦♦ rtrs rts ♠ ♥ ♥ ♥ ♦♥ ♥ ♥♦♠s

q♥ t

♥♦♠s r ♥r ♠♦s ♠ ♦ ♦r r♥t t②♣s ♦ srs ♥♦ts♥♠ ♥♥ ②t♦s♥ ♥♥ ♥ t②♠♥ ♦r ♠♦r s♦rt② ♥

♥ t sq♥ ♦ ♥♦♠ tr♦r ♠♦♥ts t♦ r♣t♥ ♥ s r ♥♠r ♦ t♠s ♥ s♣ ♦rr ♥ ♣rt ♣②♦♥tsts ♦ ♥♦t♦♠♣r ♥♦♠s ♥ tr t♦tt② t♦ t② ♦ t ♦r ♣rt rs♦♥sst s♠♥ts ❩r♥ t P♥ ♠♦st ♦t♥ rs♠♣② ♥s ♣♣r♦①♠t② ♥t♦♥ s♠♥ts ♦ t ♥♦♠ sq♥tt ♥ tr♥sr ♥ ♣♦ss② tr♥st ♥ t t♦ ♣②♦♥② ♦♥ s t♦ ♥ s ♣♦rt♦♥s ♦ ♥♦♠ sq♥s tt ♥ ♦♠♣r t♥s♣s ♣♦rt♦♥s ♦ ♥♦♠s tt r ♦♠♦♦♦s

❲♥ ♦♠♣r♥ s♠♥t ♦r ♥♦♠ sq♥s t♦ ♣②♦♥② rtrs r s♥ ♥♦ts ♦♠♦♦② t♥ ts sq♥ rtrs ss ♦♥ ♦♥♣ts ♥rt r♦♠ ♠♦r♣♦♦ ♥②ss t s tr♠♥ s♦♥ ♦r sq♥ s♠rt② t♦ sq♥s s♦ ♦t ♦ s♠rt② t s♥② tt t② r♦s ♥♣♥♥t② ♦ ♦tr ♥ tr♦r t② ♠st ♦♠♦♦♦s tr♠♥♥ ♦r sq♥ s♠rt② s tr♦ sq♥ ♥♠♥t ♣r♦r ♦s ♠ s t♦ ♥ sq♥s t rs♣t t♦ ♦tr s♦ tt tr ♦r sq♥ s♠rt② s ♠①♠s ♥ t ♥ t ♠①♠ s♦r s s tt ♦♠♦♦② s ♠♦st ♣r♦ sq♥s♥ s ♦r ♣②♦♥② r♦♥strt♦♥ r rsts ①st tt ♥ ♥sq♥s t♦tr ♦♠♣s♦♥ t ♦tr♠ t r ö②t②♥♦ t ♦♠♥ ♦r ♦s ♠ s t♦ q② sr ts ♦rsq♥s s♠r ♥ t♥ ♣♦ss② ♦♠♦♦♦s t♦ qr② sq♥ tst

tr sq♥s rtrs ♥ s t ♣rs♥s♥ ♦ ♥ ♥ t ♥♦♠ ♦ ♥ ♦r♥s♠ ♥ s s rtr s ♦♥② t♦stts ♣rs♥t ♦r s♥t ♦r ts t②♣ ♦ ♥②ss s s② ss ♣rst♥ ♥②ss tt s sq♥s ♥ tr ♥trt② t② sq♥ t ♦rsr ♥ts ♦♠♣r t♦ ♦tr t ♦r ♦t♦♥r② r♦♥strt♦♥s

rst t s r② s② t♦ ♦t♥ sq♥ t s s♥ ♥ ts♦♥ srs s♥ r ♥ s t♦ ①trt ♥ sq♥ ♥ r♦♠ t ♥ ts sr t♦ sts ♦♠♦♦② ♦r sq♥ t t♥ ♦r ♠♦r♣♦♦ t♣ t♦ t ♣♦♥t tt ♦♠♣tr ♥ ♦ t tr s ♥ ♦ ♥trst ♥ t♦♥♦♦ ♥♦♠ t s ♥♦t tt r t♦ ♥ ♦♠♦♦♦s ♥ ♥ ♠♥ ♦r♥st♥ ♠② ♦♥♦♦ ♥ s rtrs ♦♥ ♥ tt tr♠♥ ♥ s sq♥ ♠♦r t♥ 70% ♥t 350 rtrs ♥ ♦♠♠♦♥s ♦♠♦♦♦s ♥ q ♦♠♣tt♦♥ s♦s ts 70% ♥tt② rtr♦♥ s♦♥srt ♥ tr r ♣♦ss stts t st 70% ♥tt② ♠♥stt ♠♦♥ t rtrs ♠♦r t♥ 350 r ♥t t♥ ♠♥ ♥♦♥♦♦ ♣r♦t② tt t st 350 sts r ♥t ♥ 500 sq♥s

♥ ♦♠♣t s ♦♦s

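\[
P(\text{more than 350 identical sites in 500 sites})
= \sum_{i=350}^{500} C_{500}^{i} \times \left(\frac{1}{4}\right)^{i} \times \left(\frac{3}{4}\right)^{500-i}
= \sum_{i=350}^{500} C_{500}^{i} \times \frac{3^{500-i}}{4^{500}}
\approx 7 \times 10^{-99}
\]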

♣r♦t② tt t ♦♥♦♦ ♥ ♠♥ sq♥s r ♥t t 70%② ♥ ♦♥ ♥ ♦tr tr♠s ♦♥r♥ s s♦ s♠ 1

7×1099 ) tt t s ♥r②rt♥ tt t♦ sq♥s ts s♠r s♥ r♦♠ s♥ ♥st♦r ♥ ♥ r♥ ♣r♦r♠ tt s♥s t ♦ ♠♥ ♥♦♠ ♥ r♣♦rts sq♥stt r ♠♦r t♥ 70% ♥t t♦ ♠② ♥ ♦ ♥trst ts sq♥s♦r♥ t♦ ♠② ♦♥srt rtr♦♥ r ♦♠♦♦♦s ♥s
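The order of magnitude of this probability can be checked numerically. The sketch below is only an illustration: it assumes independent sites and four equally likely states per site, as the factors 1/4 and 3/4 in the formula suggest, and it starts the sum at i = 350 as the formula does. The function name and its parameters are hypothetical and do not come from the thesis.

```python
from fractions import Fraction
from math import comb


def prob_at_least_identical(n_sites: int, n_identical: int, n_states: int = 4) -> float:
    """Probability that at least `n_identical` of `n_sites` sites match by chance.

    Assumes independent sites and `n_states` equiprobable states, so that two
    random sequences agree at a given site with probability 1 / n_states.
    """
    # Count the favourable outcomes exactly with integers: C(n, i) * (n_states - 1)^(n - i).
    favourable = sum(
        comb(n_sites, i) * (n_states - 1) ** (n_sites - i)
        for i in range(n_identical, n_sites + 1)
    )
    # Divide by the total number of outcomes; Fraction avoids floating-point underflow
    # in the intermediate terms.
    return float(Fraction(favourable, n_states ** n_sites))


if __name__ == "__main__":
    # The case discussed in the text: 350 identical sites out of 500.
    print(f"{prob_at_least_identical(500, 350):.1e}")
```

Exact integer arithmetic is used here because the individual terms are far too small and the binomial coefficients far too large for a naive floating-point sum to be trustworthy.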

t s ♠ ♠♦r t t♦ ♥ ♦t rtr t♦ sts ♦♠♦♦② ♦r♠♦r♣♦♦ t s s ♣r♦s② t♦ sts t ♦♠♦♦② ♦ ♥ ♦r♥ ♦♥♥ ♦♥sr ts tr♠♥s♦♥ strtr ts ♠r②♦♦ ♦r♥ t ♥stt r ①♣rss r♥ ts ♦r♠t♦♥ t ♦♣♠♥t ♣s r♥ ts ♦r♠ ♠♣② tr♥ ts ♥♦r♠t♦♥ rqrs ♦t ♦ t♦s ♦r♥ t ♥ ② rsrr ♠② ss♠ r ♦② ♦ t s♦♠♦ ♠② ♦t♥ ♦♠♦♦② ♦trs ♥♦t r♦♠ ts t rsrrs t♦ tr ♥ ♦♥♥t② ssrt ♦♠♦♦② ♦r s ♥sr ♥♥♦ s♠♣ t♦♠t ♠t♦ ♥ ♣ ♠

♦♥sq♥t② s♥ sq♥ t s ♠ ♠♦r ♣rt t♥ s♥ ♠♦r♣♦♦ t ♥♦tr ♥t ♦ sq♥ ♦r ♠♦r♣♦♦ t ♣rs♥t ♥ st♦♥

♥ rtrs ♥ tr tr ♠♦r♣♦♦ ♦r sq♥t r ② t♦ r♦♥strt t ♣②♦♥② ♦ s♣s ♦ t♦ s rtr ♥t♦ rs ♦ tr ♦ t♦tr ♥♠s sr♥ t st ♥♠r ♦ ♦♠♣r♥ ♥♦♠s

rqrs sttstsstts r ①♣t t♦ t ♠♦st ♦s② rt ♦r ♦rt t♦♦s ♥ ♦♣ ♥ ♥ r♦♣ ♥r t ♦ ♥r♥t sttsts

ttsts ♦r ♥r♥

P♦♣ ♦ r ♦♥ ♦ ♥♦② ♥ sr ♣rs r♦rs ♦r ♥♦r♠t♦♥ ♦ttr ♥st♦rs s ② t② ♥ s♦r t ♥♠s ♥ ♦s ♦ tr r♥r♥r♥r♥tr ♦r ♥st♥ ♥♦tr ② t♦ ♥♦ ♠♦r ♦t tr♦rrs ♦ t♦ ♥ tr r♦trs sstrs ♥ ♦s♥s s t♠ t♦ s♣t♥ t ♥ t♥ sq♥ tr ♥♦♠s ♥ ② r② ♥②s♥ t

♥♦♠s ♦ s ♥ ♥ s♥ s ♥♦ ♦ ♦ ♥s r ♣ss r♦♠ ♦♥♥rt♦♥ t♦ t ♥①t ♥♦st ♠t t♦ r♦♥strt rtrsts♦ s ♦r♦s ♥t♥ts ♥ ♥♦ t② ♦ r tr t♦♥ ♥t♦ t♥ t② t r① ♦r ♥st♥

♣②♦♥tst s t r♦② t s♠ ♦s tr ♥sr t ♦ss r♦r t♦ ♥ ♥♦r♠t♦♥ ♦t s ♥st♦rs ♥ ♦ ♣♦♥t♦♦② ♦r ♥ ♦♦ t s ♥♦♠ ♥ t ♥♦♠s ♦ s ♦s♥s ♦ tr♠♦r♣♦♦② ♥ ♦ ♦♠♣rt ♥♦♠s ♦r ♦♠♣rt ♥t♦♠② ♦♥②r♥ r s t t♠s ❲♥ t ♥♦st s ♥trst ♥ s ♦rt st ♥trs t ♣②♦♥tst s ♥trst ♥ st♦r② tt s♣♥s t♦s♥s t♦ ♦♥s ♦ ②rs ♣②♦♥tst ♦s♥s ♦♥sq♥t② ♠② ♠♦rr② ♠ r ♠ s♠r ♥ r② rs♣t r② r♥t r♦♠ ♠ s♣t ♣♦ss② r♥s ♥ st② tr ♠♦r♣♦♦② ♦r ♠♦r♣♦♦sr r♥t t♦ t ♣♦♥t tt ♥♦ ♦♠♦♦♦s rtrs ♥♥♦t r♦♥s♥②♠♦r tr ♥♦♠s ♦♣② ♠② ♦♠ ♣ t sts②♥ ♣t♦♥♦ t st♦r② ♦ tr ♠② ♥ ♥r rtrsts ♦ tr ♥st♦rs ♦ ♦s♦ ♥s ♠♦s ♦ ♠♦r♣♦♦② ♥ sq♥ ♦t♦♥ ♥ t ♦ sttsts

♥ ts st♦♥ ♠ ♦♥ t♦ ①♣♥ t ♠♥ ② ♠♦ ♥ sttsts♥tr♦♥ ts ♦♥♣ts t s♠♣st ①♠♣ ♥ r♣② ♣rs♥t♠♦s ♦ ♦t♦♥ ♥ ♥ ssq♥t st♦♥s ♣rs♥t t s ♠♦s ♦ ♦t♦♥ ♥ t s ♦t t st♦r② ♦ ♦♥ rt ♥ ♦♠♥ ts♥♦ t t r♥t r♦♠ ♦♦②

①♠♣

♥ ②♣♦tss t♦ ♥rst♥ ♥ ♥①♣t ♣ttr♥

ts ss♠ s♠ ♣r♦♠ ♦ tr ♦r♦♥s♠♣t♦♥ ❲♥ ♦♦ tt tr tt s s ♦♥ ② ss ♥ ♠② ♣rt♠♥t t ♠♣rss♦♥tt s♦♠ ②s t♦♦ ♠ tr s st ss♣t tt s♦♠♦♥ ts r ♠② s♦♠t♠s ♦rt t♦ ♣r♦♣r② ♦s s♦♠ t♣ ♦r ♦♥ t♦ ♦r ♥♥♦t r r ♥ss ♠ r② sr ♥ s♦ ♥t t♦ ♥♦ ♦ rq♥t②s ♦rts t♦ tr♥ ♦ t♣ t♦ ♥♦ ♦ ♣♥s♠♥t s srs ♥ ♦ s ♦♦ t t ② tr ♦♥s♠♣t♦♥ ♦r st ♦ ts ♥ t♥ tr② t♦♣r♦ s ♦♥ ts t ♦♥② tt ♦♠♠t tr r♠ ♦ ♦ ts ♥ ♦♠♣r t♦ ss tr t♣s r ②s ♣r♦♣r② ♦s ♦r ♦♥♦ ②♣♦tss

tr s♦♠ t♣s rs♦♠t♠s t ♦♣♥♦r ♥♦t

t♦ ♥ t ②s r s♣♥ ♦t ♦ tr r t rt ♦ ♥ ①♣trt♦♥ ♥ ♥♦r♠ tr ♦♥s♠♣t♦♥ ♦r tr r ♥ s♦♠ ②s r t♣ s ♥♦t ♦rrt② tr♥ ♦ rsts ♥ st ♦ tr s t♦ ss♦rrs♣♦♥ t♦ t♦ t②♣s ♦ ♣r♦sss ♦♥ r t tr ♦♥s♠♣t♦♥ rs

r♦♠ ♦♥② r♥♦♠ rt♦♥ r♦♥ s♦♠ r ♥ ♦♥ r trs r♦♠ t r♥♦♠ rt♦♥ r♦♥ s♦♠ r ♣s s♦♠t♠s ♥t♦♥ q♥tt② ♦ st tr ♦ t ♥ t♦ ♣r♦ tt ♣♣♥ ♥rt t t ♥♦t ♦ ♦ s♦ ♥ ♦♠♣r t ②tr ♦♥s♠♣t♦♥s ♣r♦ss ♦ ♣r♦ ♥ ♦♠♣r t t♦ t r t♥ ♦ t s♠ ♦r ♣r♦ss ♣r♦ss ♣r♦s t tt ♠♦r ♦s②rs♠s t r ♦♥s t♥ ♣r♦ss t♥ ♥ ♦②② r

♦ ♦♠s t ♣r♦♠ ♦ ♥♦♥ t t ♣r♦sss ♥ ♦♥♥r rst ♦ ♠ ♣②s s♠t♦♥s ♦r ♥st♥ ♦ ♦♣② ♦ ♠② ♣rt♠♥t s tr ♥ t ♦♣② ♣rt♠♥t s s♠r② s ♣♦sss ♦ ♥ t r ♣rt♠♥t ♥ t♥ ♥ t♦♥ s♦♠t♠s t t♣ ♦♣♥♦r ♥t ♦ tr② t♦ t t t♣ ♦♣♥ 0 ♠♦ 1 2 3 n ♠♦ t♠s ♥ s ♥r ♥♠r ♦ t♠s tr s s♣♥t ♥ s♠r ② s♦sr ♥ t r t ♦r ts s♠t♦♥ t♥q ♦ t♦s♥ t t♦ ♠♣♠♥t

♥st ♥ s ♦♠♣tr s♠t♦♥s s ♦♥ ♣r♦st ♠♦ ♠♦ s s♠♣ ♥tr♣rtt♦♥ ♦t rt②s ♣r♦st ♠♦ ♥s t♦ ♥♦r♣♦rt t ♥♦r♠ rt♦♥ ♥ ② ♦♥

s♠♣t♦♥ t s♦ t ♦s♦♥ ♦r♦tt♥ ♦♣♥ t♣ s♠♣ ♠♦ ♦r ②rt♦♥ r♦♥ ♥ r q♥tt② s ♣r♦ ② t ♥♦r♠ ♦r ss♥strt♦♥ s strt♦♥ s ♥ ② t♦ ♣r♠trs t ♠♥ µ ♥t st♥r t♦♥ σ ♦r t ♦s♦♥ ♦r♦tt♥ ♦♣♥ t♣ ss♠ ttt ♣r♦t② p t♣ s ♥♦t ♥ tr♥ ♦ ♦r t ♥t s♦ tt s♦♠q♥tt② Q ♦ tr s st ♦s② ts Q ♣r♠tr s ♣♦♦r ♣♣r♦①♠t♦♥ ♦ t rt② ♦♥ ♦s ♥♦t ①♣t tt ♥ t♣ s t ♦♣♥ ①t②Q trs r st tr t rt♦♥ ♦ t ♥t ♥ t ♦ ♦r♥ ♥ ♠♦s ♦ r ♣r♦ss s♠♣t♦♥s ♥ t♦ ♠ ♥ ♦♣ ♥ ts r♠st♥s ts s♠♣t♦♥ ♥♦t ♦ ♠ r♠ t♦ t♣♣t② ♦ ♠② ♠♦ s ♠♦ tr♦r t♦t③s ♣r♠trs ❲ts ♣r♦st ♠♦ ♥ r♥ s♠t♦♥s ♥ s♦ s ♠ sr♥ str t♥ ♥ t r ♦r

♦♥r♦♥t♥ t t

♦ tt ♠② ♣r♦st ♠♦ ♥t t♦ ♥♦ ♦ ♠♥② t♠s ♦r♦t t♦ tr♥ ♦ t♣ t t ♣♦sst② tt ts ♥♠r ♦ t♠s s ♣r♦ss ♦ ts ♥ s♠t t ♥r ♠② ♠♦ t r♥t s♦r t ♣r♠trs µ σ p ♥ Q ♥ ♦♠♣t st♥s t♥ r t ♠♦ ♥ ♥

rt rst tt s ♦♦ ♠♦♥ s♠t strt♦♥s ♦r r t ♠♦♥t♦r tr ♦♥s♠♣t♦♥

♦r ②s ♦ ♦♠♣r r ♥ s♠t strt♦♥s s t ♦♦♥♣r♦t♦♦

rst ♦rr t r s ri ♦♥ ♦♥ ♥ ♥ t s♠ts si ♦♥ t ♦tr ♥

and compute

\[
\mathrm{Distance} = \sum_{i \in [1..365]} |r_i - s_i|
\]

st♥ tt ♦♠♣t s t s♠♣st ♦ t♥ ♦ t ♠② ♥♦t t st st♥ ♣♦ss ♦r ♥st♥ ts st♥ ♠t s ♥ s♦♠② ♦r tt sr♠♥t♥ rtr sts s♦ ♠ t♦ ♥sr tt♠② sttst st♠t♦r s ♥♦t t♦♦
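The simulation and the distance can be sketched in a few lines. This is a minimal illustration under assumptions, not the program used for the thesis: daily consumption is drawn from a normal distribution with mean mu and standard deviation sigma, and with probability p an extra quantity q is added for a tap left open. The function names, the `days` default and the `rng` argument are hypothetical.

```python
import random


def simulate_year(mu, sigma, p, q, days=365, rng=None):
    """Simulate daily water consumption under the assumed probabilistic model."""
    rng = rng or random.Random()
    values = []
    for _ in range(days):
        litres = rng.gauss(mu, sigma)   # normal day-to-day variation
        if rng.random() < p:            # occasional forgotten open tap
            litres += q
        values.append(litres)
    return values


def distance(real, simulated):
    """Sum of absolute differences between the ordered real and simulated values."""
    if len(real) != len(simulated):
        raise ValueError("both samples must contain the same number of days")
    return sum(abs(r - s) for r, s in zip(sorted(real), sorted(simulated)))
```

Sorting both samples before taking the differences compares the two distributions rather than individual days, which is why the order of the simulated days does not matter.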

s♠t strt♦♥s t t ♦♦♥ ♣r♠trs

• µ ♥trs ♥ ❬❪ ② ♥rss ♦

• σ ♥trs ♥ ❬❪ ② ♥rss ♦

• p r ♥♠rs ♥ ❬❪ ② ♥r♠♥ts ♦

• Q ♥trs ♥ ❬❪ ② ♥rss ♦

♦r st ♦ s r♥ t♥ s♠t♦♥s♥ t ♥ ♦♦s s ♠② st st♠ts ♦r µ σ p ♥ Q t s tt

♣r♦ t strt♦♥ ♦sst t♦ t tr ♦♥❲♥ ♦♦ t ♦ ♣r♦t♦♦ ♥ tt t st st♠ts ♦ ♠② ♣

r♠trs r s ♦♦

• µ

• σ

• p

• Q

s ♠♥s tt ♦r st ♠♦ ♦r♥ t♦ ♦r st♠t♦r ♣rts tt ♦♥r ♦♥s♠ 60 trs ♣r ② t st♥r t♦♥ ♦ 4 trs ♥tt s ♦r♦tt♥ ♦t 36 p × 365 t♠s t♦ st ♦ t t♣ ♠② ♦st t ♦ss ♦ s ♠ s 730 Q × p × 365 trs ♥ t♦t t ♦♦r ♠♦ s ♥♦t t♦♦ s s♦♥ ♥ s♣r♣♦s t ♥sts ♦ t t♦strt♦♥s ❲ ♥ ♥♦t② ♦sr tt t r strt♦♥ s♦ss♦♠ ♠♣s ♦♥ t rt s s t s♠t strt♦♥ s ♠♣s ♠♦st② ♦♠ r♦♠ s

Figure: Simulated distribution with the best fit to the real distribution, and distribution obtained under the model without occasional open tap. Density plot; horizontal axis: number of litres per day (50 to 90); vertical axis: density (0.00 to 0.08); curves: real data, simulated data, without forgotten open tap.

t s ♣r♦ tt tr ♠♦r s ♦r t ♣r♠trs ♦ ♠② ♠♦ ♦ ♦t♥ ttr t ♦r♦r ♦ ♥♦t ♥♦ tr tr s r♥ t♥ ♦♦s♥ ts s♣ s ♦r t ♣r♠trs ♥♦♦s♥ ♦tr ♦s s ♦r ♥ ♦♦s♥ ♠♦ r p = 0 ♣r♦ss ♣r♣s ♦tr s ♣r♦ t ♥r② s ♦♦ s ts s ♥ st t ♦ ♥ ♦r ts ♦ r② ♦ t qtt s t♦ tr ♣r♦ss ♠② ♣r♦ t r t r♣rs♥tt strt♦♥ ♦sst t♦ t tr t ♥ p = 0 ♥ t♦ ts ♥♦t tt r r♦♠ t r strt♦♥ t ♦s s♠ t♦ ♥♦t s ♦♦ s tst strt♦♥ ♥ p 6= 0 ♥♦t② t ♦s ♥♦t ♣r♦ ♠♣s s ♥ t rstrt♦♥ t ♦ ♣r♦② s t♦ ♠ sr tt t ♠♦ t p = 0r② ♥♥♦t strt♦♥ s ♦♦ s ♥ p 6= 0 ♣♦ss② t ♠♦r s♠t♦♥s ♦r ② ♥rs♥ t s♠♣ s③ t ♠ ♦♥ t♦ trst ts ♥r♥tr ♥♦t ♠ s t st

❲♥ ♦♥r♦♥t t ts rsts ♦♥ss tt s ♦r♦tt♥ t♦♦s t♣ t♠s r♥ t st ②s s s♦s t ♥t ♦ sttsts ♦r r♥♥♥ ♦s♦

♥r♥t sttsts

①♠♣ s♦ tt ♥ ♦♥r♦♥t t s♦♠ ♥①♣t t♦♥ ♥ ♠ ②♣♦tss ♦t t ♣r♦ss tt ♥rt t♠ ♠♦s ♦♥ ts ②♣♦tss ♥ tst ♦ ts ♠♦ t ②♣♦tss ts tt ♦r♥ t♦ s♦♠ st♠t♦r s ♦ ♣r♦r ♥ rrr t♦ s♥r♥t sttsts

s ♥ ♦r ①♠♣ ♥ s♣s ♥ tr rtrsts r t rst ♦ ♣r♦ss tt ♦rr tr♦ t♠ ♦t♦♥ s ♥ ♦r ①♠♣ ♥♦② ♦ ♥♦♥ t ♣r♦ss t rtt s ♥♦ t♠ ♠♥s ♥ ♦r ①♠♣ ♥ rs♦rt t♦ ♥r♥t sttsts t♦ ♠ ②♣♦tss ♥♦♥r♦♥t t♠ t♦ t

♥r♥t sttsts ♥ ♠♥ts

t ♥ t ①♠♣ t r ♦♠♣♦s ♦ q♥tts ♦ trs♣♥t ♣r ② ♥ ♦♦② ts t ♦ t ♣rs♥s♥ ♦ s♦♠rtrs ♥s ♦r ♥st♥ ♥ ♦♦ t♦r t ♦♣t♠r♦t t♠♣rtr ♦r ♥st♥ sq♥s t

②♣♦tss ♦♥ t ♣r♦ss tt ♥rt t t ❲♥ t t sttst♥ ♦rts s♦♠ ②♣♦tss ♦ ♦ t② ♠ t♦ t ②t② r ②♣♦tss ♦t t ♣r♦ss tt ♥rt t t ♦r t ①♠♣ ♦♥ ♦ t♥ ♦ t♦ ♣♦ss ♣r♦sss ♥

♠♦ ♥ t ②♣♦tss ♥ ♥♥t t② ♥ t♦ tr♥st ♥t♦ ♠t♠t ♠♦ ♠♦ ♥s t♦ ♥♦r♣♦rtt ♠♦st ♠♣♦rt♥t s♣ts ♦ t r ♣r♦ss ♥ ♦♣t♠ ♥ t♥rs♠ ♥ trtt② ♥s t♦ ♦♥

♥ st♠t♦r ♥ t sttst♥ s ♠♦ ♦ t ♣r♦sss t♥s♠② ♥rt t t ♥s t♦ ♥ ② t♦ ♥②s ♦ ♦s②s ♠♦ ts t t ♥ t ①♠♣ rs♦rt t♦ s♠t♦♥s ♥♦♠♣t st♥s t♥ ts s♠t♦♥s ♥ t tr t trst♠t♦rs ♥ ♦♥ ♥ st ♥ r♥ ♦ ♣r♦♠s s♦tt ♥ ♣rtr st♠t♦r s s ♦♥ ♥♦s ts rtrsts

❲♥ ♣♣ t♦ ♦♦ t t ♠ ♦ ♥r♥t sttsts s t♦ r♦♥strt ♣st st♦r② tr ♣st s♣t♦♥s t♦ st♠t t ♣②♦♥② ♦r ♣st♦♥ ♦t♦♥

t♦ r♦♥strt st♦r② ♥ts tt ♦rr t♥ s♣t♦♥s t♦ st♠t t ♣r♦ss ♦ ♦t♦♥

❲ r② ♣rs♥t t t tt ♣②♦♥tsts ♦ s t♦ ♥r tst♦r② ♦ r♦♠ ①t♥t ♦r♥s♠s ❲t ts t ♥ ♥ ♣②♦♥tsts♥ s t ♥ss ♦ ♠♦s ♦ ♦t♦♥ t♦ ♦♦ ♥t♦ t ♣st

♦s ♦ ♦t♦♥

♦ r♥t t②♣s ♦ t ♠♣♦s t♦ r♥t t②♣s ♦ ♠♦s ♦♠ rtrss♦ ♠t ♥♠r ♦ stts ts r srt rtrs ♦r n stts trr n2 ♣♦ss tr♥st♦♥s ♥♥ tr♥st♦♥s r♦♠ stt i t♦ ts trrtrs ♦ ♥♦t ♠t ♥♠r ♦ stts ♦r ①♠♣ t s③ ♦ ♥♦r♥ ♦r t ♥♠r ♦ r ♦♥ t ♦ r♦s♦♣ t tr ♦t♦♥ ♥st ♠♦ ♣r♦st② ♥♦t t ts s♦♥ ♥ ♦ ♠♦ss ♥♦t s t♠ rtr ts ♦t ts ♥ ♦♥ ♥♦t② ♥s♥st♥ P P t ♥ t ♥①t ♥s r② ♣rs♥t ♠♦s tt ♥ s ♦r srt rtrs t ①♥♠r ♦ stts

♠r ♠♦s ♦ ♦t♦♥ ♥ s ♦r ♠♦r♣♦♦ ♥ sq♥ t♦r sq♥ t t ♥♠r ♦ stts s r② ♥ ♦r tr r stts ♦r ♠♦r♣♦♦ t ts ♥♠r r② ♦r♠ s t rtr♦ ♥trst ♥ ts ♦sr stts r r♠ ♥ ♥ ♣♣r ts ♥♠r s ♥♦t ss ♠♦s ♦ ♦t♦♥ ♥ s r② ♦t♦♥r② ♦♦sts ♥ ♦r rt ♣rt st ♥♦ ♦ ♠♦r♣♦♦ ♥s ♦rr♥ ♥♦rt♥t② ♥♦t ss t♦ sq♥ t ♦♥sq♥t② r② ②♣♦tss ♦ ♠ ♦t t ♣r♦sss tt ♥rt ♠♦r♣♦♦ rst② ♦rrs♣♦♥♥ ♠♦s r ts ♥ssr② r② s♠♣ ♥ r②st ♥ ♣r♦r ♠♦r ♣r♦t② t♦ ♣rtr tr♥st♦♥ t♥ t♦♥♦tr ♦r♦r r② ♠♦s r st♠t ② ♥ s ♦♠♣trs r♥♦t ②t s t♦ ts s r② ♥♦t t ♦♥② rs♦♥ ♠♣♦s rtr ♦♥str♥t ♦♥ ♠♦ rs♠ ♥ rtr r♥t stts ♥ t♦s♣s ② t ♦♥ ♦ ♦♥t ♦♥② ♦♥ tr♥st♦♥ ♥t ♥ tr ♠② ♥ sr tr♥st♦♥s ♥ ♥ s♣② t t♦ ♦♠♣r s♣s r ♦♥ t♠ ♦ ♦r t rtr ♥r st② s tt ♦♥str♥ ♥♥r♦s ♠♥② tr♥st♦♥s ♥ ♦♥sq♥ s ♠♦ ♥♠ ♣rs♠♦♥②s ♥♦♥ t♦ st t♦ s ♥ ♦ts ♦ ♥ts ♦rr s t♦s ②s ♥rst♠t t tr ♥♠r ♦ tr♥st♦♥s s s ♥ r♣t② s♦♥ ♦♥ sq♥ t s♥st♥ s t r ♥r t s♥st♥ t♥♦ t s♥ ♥♦♥t s ♥st s♣② ♦♥ sq♥ t ♣♦♣ ♥♦ s ♠♦r① ♠♦s tt ♥ st♠t tt rtr s ♥r♦♥ sr tr♥st♦♥s ♥ ts tr♥st♦♥s t ♥♦ ♦sr tr ♥ tt r② ♦♥ ♥①♣t ♣r♦st ♠♦ r tr♥st♦♥s r ss♦t t ♣r♦tss ②♦ t ♠r s♥st♥ ♥t s t ♠r ♦♥s t ❲♥ t♦♠♥ t s s tr♥st♦♥s r ♣r♠trs ♦ t♠♦ ♥ sttst② ♥rr ♦s ♦ sq♥ ♦t♦♥ ♥ t♦r♦② t ♥ s♥st♥ ❨♥ ♦r ♥st♥ ♦r ♥t tss ♦ tr ♥♦♥ ♦r ts s♦ ♣r♦

♥ st♦♥

♠♦ ♦ rtr ♦t♦♥ s♦ ♥♦t ♦♥② ♦♥t ♦r t ♣r♦ss ♠♦ ♦ ♦t♦♥ ♦♥t♥s tr♥st♦♥ ♣r♦ts ♥ ♣②♦♥②

t t tr♥st♦♥ ♣r♦ts t s♦ s♦ ♦♥t ♦r t ♣ttr♥ t ♣ttt s ♥ t♥ ② ♦t♦♥ t♦ ② t ♦sr stt strt♦♥ ♠♦♥s♣s s ♣ttr♥ s s② ♠♦ ② rt♥ ② r♦♦t r♣♥ r♣rs♥ts s♣s ♣②♦♥② s r♣ s ♦♠♣♦s ♦ ♥♦s♥ r♥s ♦r♥ t♦ t ♣②♦♥ts ♦r② r t r♦♦t ♥♦r♣rs♥ts t ♥st♦r ♦ ♦r♥s♠s ♦♥ ♥ t tr ♥tr♥ r♥s ♦♥t♦ ♥st♦rs t♦tr ♥ ①tr♥ r♥s ♦♥ s♠♣ s♣s t ts♠♦st rt ♥st♦r ♥ ♣rt ♦♥ ♥ ♠♣♦s ♥♦♥ s♣s ♣②♦♥② ♦rst♠t t s♣s ♣②♦♥② ♦♥t② t t ♦tr ♣r♠trs ♦ t ♠♦♦ ♦t♦♥ ♦r ♥ s ss ♦♥ rtr s ♥♦t ♥♦ t♦ st♠t t ❲♥ ♦t s♣s tr ♥ rtr ♦t♦♥ r t♦ st♠t r♥♠r ♦ rtrs r ♥

r ①♠♣ ♦ s♣s tr s♣s ♦ ♥trst r ♦♥♦♦ ♠♣♥③♠♥ ♦r ♥ ♦r♥t♥ ♥tr♥ ♥♦s r r ♥ r r♦♦t ♦ t tr st ♣♣st ♥♦ ♥ t tr

r t s♦♥ ♥t ♦ sq♥ t ♦r ♠♦r♣♦♦② s r② ♦♦sq♥ t rsr t♦ ♠♦ ♥♦♥t♥ ♠♦r ♥♦r♠t♦♥ t♥ ♠♦r♣♦♦②

s t ♥♠r ♦ sq♥ rtrs ♥ r② r ♥ s rtrs r② s♠r t♦ t ♥①t ♦♥ ♠t ♥♠r ♦ ♣r♠trs ♠② t r♥♠r ♦ sq♥ rtrs ♥ t ♦♥trr② ♠♦r♣♦♦ t r tt♦ qr ♥ r q♥tt② ♥ ♠② ss s② t♦ ♠♦ t ♠t st ♦♣r♠trs s r♦♠ ♦♥ rtr ♦r ①♠♣ t ♦r♠ t♦ t ♥①t ♦r①♠♣ t ♣rs♥ ♦ ♠♠♠r② ♥s stts ♠② r ♥ ♦r ①♠♣s

♦ rtrs ♦♥ ♠② ♦③♥ ♦ stts ♥ t ♦tr ♦♥② t♦ s t s♠♦r t t♦ ♣r♦♣r② st♠t ♣r♠trs ♦ ♠♦ ♥ t r srs♥ sq♥ t t♦ r♦♥strt ♦t♦♥r② st♦r② s ♠ ♠♦r rs♦♥ ♥♦r

s t s sr t♦ st② sq♥ t♥ ♠♦r♣♦♦② ♦t♦♥ ♠♦s ♦sq♥ ♦t♦♥ r♥t② r r ♦ s♦♣stt♦♥ tt ♠♦s♦ ♠♦r♣♦♦ ♦t♦♥ ♥r r t s ♥♦ ♥♦♥ ♥ ♦ ssttt♦♥ ♠♦♥ A, C, G, T s s② ♠♦r rq♥t t♥ t ♦tr ♦♥ tt t♠♦ ♦ ssttt♦♥ ♥ ♥ ♣♥♥ ♦♥ ts ♣♦st♦♥ ♥ t ♥♠♥t ♦r♣♥♥ ♦♥ t t♠ ♥ ts st♦r② ♦r ♦r♠ ♣rs♥tt♦♥s ♦ ♠♦s ♦♦t♦♥ ♥ ♦♥ ♥ rts ♥ tr♦♥t② ♥ ♠♦s ♦ ♦t♦♥ tr♦ t♠ s ♥ t ♥ st♦♥ ♥ rts ♥ rr♥t ♠♦s ♦ ♦t♦♥ ♥ s ♦♥ r♥t r♥s ♦ ♣②♦♥②tr♦♥t② ♥ ♠♦s ♦ ♦t♦♥ t♥ sts s ♥ t ♥ rr♥t ♣②♦♥s r ss♦t t♦ r♥t ♣♦rt♦♥s ♦ s♥ s♠♥t

st♠t♦rs

♥ ♠♦ ♦ rtr ♦t♦♥ s ♥ ♥ t♦ ♥r ♦t s♣s tr♥ rtr ♦t♦♥ tt r♠♥s t♦ ♦♥ s ♥ st♠t♦r ♥ ♦s s♠t♦♥s ♥ ♦♠♣t st♥s t♦ t r t s s ♦♥ ♥ t ①♠♣ ♥st s ♥ ♣②♦♥ts s♠t♦♥s ♦ ♥♦ ♥t♦r t♠ ♠♦r ss st♠t♦rs r s rst st♠t♦r tt s ss ♥ t ♦♥t①t ♦ ♣rs♠♦♥② ♠♦s ♥ s ♥♠ ♠①♠♠ ♣rs♠♦♥② tsst♠t♦r ♣♦sts tt t st ♠♦ r ♥rst♥ ♠♦ s t ♣②♦♥ttr ♦♥② s t ♠♦ tt s♣♣♦ss t s♠st t♦t ♥♠r ♦ tr♥st♦♥s t♥ stts t s rst ♣♣ t♦ ♠♦r t ② rs t ♦r③ ♥ ♥ ♠♣r♦ ♦rt♠ s s ② t ②rs trt ♠② ♦r ♥② ♥ t rtrs ♥r st② ♥r♦♥ tr♥st♦♥s ♦r s♦♠ rtrs ♦ r② st s♦ tt tr tr st♦r② r r♦♠ t ♠♦st ♣rs♠♦♥♦s ♦♥ r♦r ♠①♠♠ ♣rs♠♦♥② s♦ s t t♦♥ ♥ ♠♦r ① st♠t♦rs ♣rrr ♥② tr sst♠t♦rs tt r② ♦♥ ①♣t② ♣r♦st ♠♦s ♦ tr♥st♦♥ t♥ sts ♥ s ♥ ♣rt t♦ ♠♦st② ♦♥ sq♥ t s st♠t♦rsr ♠♥♠♠ ♦t♦♥ ♠①♠♠ ♦♦ ♥ ②s♥ ♥trt♦♥

♥♠♠ ♦t♦♥ rst ①♣♦s ② ♦r③ t rs s rt ♥ ♣♦s♦♣② t♦ ♠①♠♠ ♣rs♠♦♥② s t ♥s t tr ♦t♦♥r②st♦r② t t ♦♥ tt s♣♣♦ss t s♠st ♥♠r ♦ tr♥st♦♥s ♥ ♣rt r② st rsts ♥ ♦♥ tt ♣r♦ r② ♦♦ rsts t♦t tr t ♣♣r s s ♦r ♥st♥ ♥♦♥

t s ♦r tr ♥② ♦r ts t②♣ ♦ ♠t♦ ♦s ♥♦t♣r♠t t♦ rt② st♠t ♣r♠tr s ♦tr t♥ t tr t♦♣♦♦② ♦r♥st♥ tr♥st♦♥ ♣r♦ts ♥♥♦t ♥rr t ts st♠t♦r s s r s ♥♦ ♦ t ♣r♦ss ♦r ♥ st♠t ♦ t ♣r♦ss ♥ ♠♣r♦♣②♦♥t r♦♥strt♦♥

♥st ♦ q♥tt② ♦ ♦t♦♥ t ♦s ♦ ♠①♠♠ ♦♦ ♥①♠♠ ♦♦♥ ②s♥ ♥trt♦♥ r t stst♠t♦rs ♥ ♣②♦♥②

②s♥ ♥trt♦♥ s ♣r♦t② ♥ ♣rt t② r ♠ s♦r t♥ ♠t♦s s ♦♥ ♠♥♠♠ ♦t♦♥ t r ♠♦r ♣rs ♥ ♣r♠t t♦ s ttr♠♦s ♦ sq♥ ♦t♦♥ s ♦tr ♣r♠trs t♥ t tr t♦♣♦♦② ♥ st♠t ①♠♠ ♦♦ ts s ♥ st♠t♦r ♦ t tr ♦t♦♥r②st♦r② t ♦♥ tt ♣r♠ts t♦ ♠①♠s t ♣r♦t② ♦ t t t ♦♦ ♦ ♠♦ s ♣r♦♣♦rt♦♥ t♦ t ♣r♦t② ♦ t t ♥ ♠♦s s ♥ ss♥ r② s♠r t♦ t ♣♦♦r t♥q tt s ♥ t ①♠♣ ♥ ♥♥t ♥♠r ♦ s♠t♦♥s r ♦♥ t ♠①♠♠ ♦♦♠♦ ♦ t ♦♥ tt ♣r♦s t r t ♠♦st ♦t♥ ♠♦♥ ♠♦s ♦♥sr ♦rt♠ s t♦ ♦♠♣t t ♦♦ ♦ ♣②♦♥ttr ♦r srt rtrs ♣r♦♣♦s ♥ ② ♦s♣ s♥st♥ s♥st♥ ♦s ♥♦t rqr s♠t♦♥s ♦r t ss ♥ ♥②t ♦r♠ ♥s ♣♣ t♦ ♥② t②♣ ♦ t tt ♥ sr t ♠t ♥♠r ♦stts ♦r♠ ♦r ♦♠♣t♥ t ♦♦ ♦ ♣②♦♥t tr r ♥ ♥rts ♥

②s♥ ♥trt♦♥ s r♥t r♦♠ st♠t♦rs sss s♦ r s t♦s ♥♦t ♣r♦ ♣♦♥t st♠t ♦ t st ♠♦ t♦ t s ♣♦ss t♦①trt ♣♦♥t st♠ts r♦♠ t rst ♦ ♥ ♥②ss ② ②s♥ ♥trt♦♥t ts ♥st ♠♦r t♦s ♣♣r♦ ② ♥♦♥ tt ♥② st♠t♦♥ s ss♦t t rt♥ ♠♦♥t ♦ ♥rt♥t② t ♣r♦s ♣r♦t②strt♦♥ ♦r t ♠♦s ♦ ♥trst ♥ t♥ s♠♠ ♣ ♣r♦t② s ② ②s♥ ♥trt♦♥ s ♣♦str♦r ♣r♦t② ♥♦t ♦♦ ♦♦ s t ♣r♦t② ♦ t t ♥ t ♠♦ ♦r s ♣r♦♣♦rt♦♥t♦ t t ♣♦str♦r ♣r♦t② s t ♣r♦t② ♦ t ♠♦ ♥ t tr♥t② ♣t t ♠♦ t t st ♣♦str♦r ♣r♦t② s t ♠♦tt ♠♦st ♣r♦② ♥rt t t s ♦s ♦♥ t♦ ♥tr② ♦♠♣r♠♦s t ♦tr ♠♦ A s ♣♦str♦r ♣r♦t② ♦ 0.09 ♥ ♠♦B ♣♦str♦r ♣r♦t② ♦ 0.03 ts ♠♥s tt ♠♦ A s tr t♠s ♠♦r♣r♦ t♥ ♠♦ B ♥ t ♦♥trr② ♠♦ A s ♦♦ ♦ 0.09 ♥♠♦ B ♦♦ ♦ 0.03 ♦♥ ♥♥♦t s② tt ♠♦ A s tr t♠s ♠♦r♣r♦ t♥ ♠♦ B t ♦ s② tt t r tr t♠s ♠♦r ♣r♦♥r ♠♦ A t♥ ♥r ♠♦ B ♥ ts s ♦♥ ♥ s② tt ♠♦ As tr t♠s ♠♦r ② t♥ ♠♦ B st♥t♦♥ ♥ tr♠s t♦ sr ts ♦♦ t s♦♠ ♦r♠ t♦ ttr ♥rst♥ t r♥ t♥♦♦ ♥ ♣♦str♦r ♣r♦t②

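The likelihood L(M|D) of a model M is proportional to the probability p of the data D given the model M:

\[
L(M|D) = k \times p(D|M)
\]

Here I consider that the proportionality constant k is 1:

\[
L(M|D) = p(D|M)
\]

The posterior probability PP(M) of a model is the probability of the model given the data:

\[
PP(M) = p(M|D)
\]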

♥ ♠♦r ♣②♦♥ts M ♦rrs♣♦♥s ♥♦t② t♦ t st ♦ tr♥st♦♥♣r♦ts t♥ stts ♥ t ♣②♦♥t tr D ♦rrs♣♦♥s t♦ tsq♥s ♥r st②

❲♥ ♦♥ ♦♥srs t♦ ♠♦s A ♥ B ♦♥ ♥ ♦♠♣t tr ♦♦sL(A) = p(D|A) ♥ L(B) = p(D|B) s t♦ ♦♦s r ♣r♦tst t② r ♥♦t r♦♠ t s♠ ♣r♦t② s♣ ♥ t ♦♥trr② t ♣♦str♦r♣r♦ts ♦ ♠♦s A ♥ B r PP (A) = p(A|D) ♥ PP (B) = p(B|D)r♦♠ t s♠ ♣r♦t② s♣ s ♦♥sq♥ ♦r ♥ tst t s♠♦ ♣♦str♦r ♣r♦ts ♦r ♠♦s s 1

\[
\sum_{M} PP(M) = \sum_{M} p(M|D) = 1
\]

but the sum of likelihoods over models is undefined.

♦r ♣rs② ♣♦str♦r ♣r♦ts ♣r♠t ♥ ♣r♦t② str

t♦♥ ♦r ♠♦s rs ♦♦s ♦ ♥♦t ❯s♥ ♦♦ ♦♥ ♦s♦ t ♣r♦t② strt♦♥ t ♣r♦t② strt♦♥ ♦r ♣♦sst ♦r ♦♥ ♠♦ ♥♦t ♥trst♥ ♦r sttst♥ ♦ s② ♦♥② s ♦t♦♥ tst ♥ ♥ts t♦ ♥ t st ♠♦s

♣r♦t② strt♦♥ s ♥ ♠t♠t ♦t s♦ ♣♦rt♥qs ①st t♦ ♦r t t♠ t s ♥♦t② ♣♦ss t♦ s♠♣ r♦♠ t♠s♠rt② r t♦ ♦ st♥ t♠ s♠♣♥ ♠♦s tt r② ♣♦str♦r ♣r♦t② ♥ ♥♦t ♠ss ♠♦s tt r② ♣♦str♦r ♣r♦t② ♥ ♣♦str♦r ♣r♦t② strt♦♥s ♦t♥ ♥♥♦t ②①♣♦r s tr r t♦♦ ♠♥② ♣♦ss s s s ♥♦t② tr ♥♣②♦♥ts r t ♥♠r ♦ ♣♦ss trs s ♥st ♦♥ s♠♣s♠♦s s♥ t♥qs s s r♦ ♥ ♦♥t r♦ tr♦♣♦s t r♥ ♥♥t② ts t♥qs r♥t tt t st ♦ ♠♦ss♠♣ ♥ ♥s s♠♣ r♦♠ t strt♦♥ r♥ ♦r s♥t② ♦♥ t♠ ♦♥ ♥ ①♣t tt t ♦t♥ s♠♣ r② ♦♦ t♥qs ♦r ♦♥② ♦r ♣r♦t② strt♦♥s tr♦r t♦ ♦t♥♠♦ ♣r♦t② strt♦♥s t♥qs ♥ ♦♥② s t ♣♦str♦r ♣r♦t② strt♦♥s ♦♠ t♦rs s♦♥ ♦ t♦ s♠♣ r♦♠ t

♦♦ ♥t♦♥ tr♦ t s r s ♥rst♦♦ t② s ♦♥ ♣♦str♦r ♣r♦t② strt♦♥s ♥ t♥ s ♠t♠t tr ♥♦♥s ♠♣♦rt♥ s♠♣♥ ♦r ♠♣♦rt♥ rt♥ t♦ tr♥s♦r♠ t ♣♦str♦r♣r♦t② s♠♣ ♥t♦ s♠♣ ♦ t ♦♦ ♥t♦♥ ②r ♥rt

♣♣t♦♥ ♦ ②s♥ ♥ ♠t♦s t♦ ♣②♦♥ts ts r♦♠t ♠♥♥t♥ ♥♥ts t t ♣♦♥r♥ rts ♦ ♥♥ t ❨♥ ❨♥ t ♥♥ t t♦♥ t ② ♦♥sr② ♥ ♥ ♣♦♣rt② s♥ t♥ ♥ ♣r♦② t ♠♦r s s♠♦s ♦ ♦t♦♥ ♦♠ ♠♦r ♦rt ♥ ts tss ♠② ♦r s s♠①♠♠ ♦♦ t♥qs t♦ ♠♦s ♥ ♦rt♠s tt s♥ ♦♣ ♥ s♦ ♠♣♠♥t ♥ ②s♥ r♠♦r ♥ t ♦♦ ♥ ♣♦str♦r ♣r♦t② r ♥t♠t② rt ② ②s ♦r♠

\[
PP(M) = p(M|D) = \frac{p(D|M) \times p(M)}{p(D)} = \frac{L(M) \times p(M)}{p(D)}
\]

♥ ts st ♦r♠ ♦♥ ♥ s tt t ♣♦str♦r ♣r♦t② ♦ ♠♦ s♣r♦♣♦rt♦♥ t♦ t ♣r♦t ♦ t ♠♦ ♦♦ ♥ ♦ ♣r♦r ♣r♦t②p(M) ss♦t t t ♠♦ rtrr② ♥ ② t sr ♦ ②s♥♣r♦r♠ ♥♦tr tr♠ s ♦♥ ♥ p(D) t ♣r♦t② ♦ t t sst tr♠ s t t♦ ♦♠♣t ♥ s s② ♥♦t ♦♠♣t s♦ t ♣♦str♦r♣r♦t② ♦ ♠♦ ♥ ♦♥② ♥♦♥ ♣ t♦ ♠t♣t tr♠ 1

p(D) ♥

♣rt ♥ t♥qs r s ts ♠t♣t tr♠ s ♥♦t ♦ t qt♦♥s

♥ s♦♠ ss t ♠♦st ② ♠♦s s♦ ♠♦s ♦ st ♣♦str♦r♣r♦ts s s ♥♦t② tr ♥ t ♣r♦r ♣r♦ts p(M) ♦ ♥♦t r t♥ ♠♦s ∀M, p(M) = c, with c ∈ [0; 1] constant ♥ s ssPP (M) = p(D|M) × c = L(M) × c ♦r t sr ♦ ②s♥ ♣r♦r♠s s♦♠ ♥♦ ♦♥ ♠♦s r ♠♦r ♣r♦ t♥ ♦trs ♥ tr♥t ♣r♦r ♣r♦ts t♦ t ♠♦s s ♠② rst ♥ r♥s t♥t ♠♦st ② ♠♦s ♥ t ♠♦s ♦ st ♣♦str♦r ♣r♦ts
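For a small, fully enumerated set of models, the relation between likelihood, prior and posterior can be illustrated directly. The sketch below rests on one strong assumption that does not hold for real phylogenetic problems: all candidate models can be listed, so that p(D) is obtained by a plain sum. The function name and the dictionary-based interface are hypothetical.

```python
def posterior_probabilities(likelihoods, priors=None):
    """Posterior probability of each model from its likelihood and prior.

    `likelihoods` maps a model name to p(D|M). If `priors` is omitted, the
    same prior probability is given to every model.
    """
    models = list(likelihoods)
    if priors is None:
        priors = {m: 1.0 / len(models) for m in models}
    # Numerator of Bayes' formula: L(M) * p(M).
    unnormalised = {m: likelihoods[m] * priors[m] for m in models}
    # p(D), computable here only because the set of models is assumed exhaustive.
    evidence = sum(unnormalised.values())
    return {m: value / evidence for m, value in unnormalised.items()}


if __name__ == "__main__":
    # Two models with likelihoods 0.09 and 0.03, as in the comparison above.
    print(posterior_probabilities({"A": 0.09, "B": 0.03}))
```

With equal priors the ranking of the models by posterior probability is the ranking by likelihood; unequal priors can change that ranking.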

s r♥s r r ♣r♦♣♦♥♥ts ♦ ♠①♠♠ ♦♦ ♠t♦s rq♥tsts ♥ ♣r♦♣♦♥♥ts ♦ ②s♥ ♥trt♦♥ ②s♥s sr s♣r♦r ♣r♦ts ♥♥ t rst ♦ ♥ ♥②ss rq♥tsts t♥ tt ts t♦ ♥♥ t t t t♦ s② ♥ ②s♥s ♥sr tt tr♠t♦ s ♠♦r ♣♦r s t ♥♦r♣♦rts ①tr ♥♦ t♥ t s♦♥② ♦♥ ♥ t t r r ♦r ss r s♥tsts ♥♦ t ♣r♦r strt♦♥ s♦ s ♥ s ss ②s♥s ♦ ssts♥ ♥♦r♠ strt♦♥ ♦r rq♥tsts ♥sr tt s♥ ♥♦r♠

strt♦♥ s r r♦♠ ♥ ♥♦st ♣♣r♦ s t ♠♦♥ts t♦ s♣♣♦s♥ tt ②♣♦tss r q② ♣r♦ rs ♦ r tt ♠r ♠st ♦t t♦ ssss♥ t ♠♣t ♦ ♣r♦r ♣r♦t② strt♦♥s♦♥ t rst

♦ ♠ ②s♥ ♥trt♦♥ ♠t♦s r ttrt ♥♦t② s t②♦ t sttst♥ t♦ ♥ ♠♦s t ♠♦r ♣r♠trs t♥ ♠①♠♠♦♦ ♥ t q♥tt② ♦ t ♦♥ ♥ st② s ♠t tr♦r ♥♥rt♥t② s ♥ssr② ss♦t t♦ t st♠t♦♥ ♦ ♣r♠tr ♦ ♠♦ ❲♥ tr r ♦ts ♦ ♣r♠trs tr s s♦♠ ♥ tt ♣r♠trs ♥♦♥♥ ♠♦♥t ♦ ♥rt♥t② ②♥ ♦♥ ♣♦♥tst♠ts ♦r ts ♣r♦♣ts ts ♥rt♥t② t♦ ♦tr ♣r♠trs ♥st②s♥ ♥trt♦♥ ♠t♦s ♣r♦ ♣r♦t② strt♦♥s ♦r ♣r♠trs♦ t ♠♦ t strt♦♥ t tr s ♦t ♦ ♥rt♥t② ♦r r②♣♦♥t② tr s ♦t ♦ s♥ ♥ t t ♥ ♦r ♦ ♣rtr ♦ t♣r♠tr ♣r♠tr s ♥rt♥ ♠①♠♠ ♦♦ ♥②ss ② ♥rt♥ ♥st ②s♥ ♥trt♦♥ ♥②ss ♥trt ♦tts ♥rt♥t② t♦ st♠t ♦tr ♣r♠trs ♥ tr♦r ♠♦r r♦st

rt rr ♥♦t tt s♣ts ♦ ②s♥ ♥trt♦♥ tt ♣rs r rt t♦ ♥trt♦♥ ♦ ♥rt♥t② rtr t♥ t♦ t s ♦ ♣r♦r♣r♦ts t♦ ts ♣r♦r ♣r♦ts ♥ r② s ♥ s♦♠ ♠♦s ♥♥ t ❨♥ ♥é t s ♥♦t rr tr♦♠♣♦rt♥ s♠♣♥ t s ♣♦ss t♦ s♠♣ r♦♠ t ♦♦ ♥t♦♥ ♥st ♦ t ♣♦str♦r ♣r♦t② strt♦♥ ♣♣r♦s ♥ r♦st♥sst♦ ♦♦s ♣♣r♦s

q♥ t ♠♦s ♦ ♦t♦♥ ♥ st♠t♦rs ♥ s ♥ ♦♥♥t♦♥ t♦ ♥sr qst♦♥s tt ♦♦② ♥ ♦♠♣rt ♥t♦♠② ♦ ♥♦trss

s♦rt st♦r② ♦ ♦♥ rt s t♦ ②

♥♦♠s

♥ ts st♦♥ ♣rs♥t s♦♠ ♥sts ♥t♦ t r② ♦t♦♥ ♦ tt r ♦r ♣②♦♥②♥ ♦♦② r ♦♠♣♠♥tr②♥ tr♦ t ♥②ss ♦ ♥ ♥ ♥♦♠ sq♥s tr♦ ♠♦s ♦ ♦

t♦♥ ♥②ss r ♦♠♣♠♥tr② t♦ ♦♦ sts ♦♦② ♣r♦s♣♥t ♥♦r♠t♦♥ ♦t ♣rtr ♥r♦♥♠♥t t ♣rtr t♠ st② ♦ ♥♦♠s ♥ st♦r r♠♦r s ♥♦tr ♥♦ ♥t♦ t ♣st♦r ♥♦♠ ♥②s ♥ ♥♦r♠t♦♥ ♦t ♥ ♥st♦r s ♥ s t♥♠r ♦ ♥♦♠s st ♥rss t ♥♠r ♦ ♥st♦rs ♦r ♥♦r

♠t♦♥ s ♥rss ♥ ♥ ♣ t ♣s ♦ t ♦♦ r♦r♦r♦r t ♦♦ r♦r ♥ t st ♣r♦ ♠♦r♣♦♦ t ♥ ♥♦tr② t t t♠ss tt ♦♥sr ♥ ♠② ♦r s s ♥st♦♥ r ss s② t♦ ♥②s t♥ sq♥ t r sq♥ t♣r♦ ♥ ♥♠t ② t♦ st② ♥♥t ♦t♦♥

tr ♥♦♠s

r ♥♦♠ s rst sq♥ ♥ s♠♥♥ t ♦rtt t ♠♦r ♣②♦♥② s ♦t♥ s ♦♥ s♥ ♥ sq♥s ♥♣rtr② ♠♣♦rt♥t ♣②♦♥② s ♦♥ sq♥s s ♥ ♦t♥ t♦t ♥♦♥ t ♣rs sq♥ ♦ t st ♥ s♠ s♥t r♦s♦♠ r ❲♦s t ♦① ♥st t ♥ ♣r♦t ①trt ♦r t♦r♥s♠ ♥r st② s s♠tt t♦ ♥ ♥③②♠t trt♠♥t st♦♥tt ts t ♠♦ ♥ tt ♣s ② t ♠♦ s t ♣♥s ♦♥ts sq♥ tr♦r ♦♥ ♥ ♦♠♣t st♥s t♥ t st♦♥ ♣ttr♥s♦t♥ ♦r t rs ♦ r♥t s♣s ♥ ①tr♣♦t tt ts st♥t♥ rs s ♦♦ st♠t ♦ st♥s t♥ s♣s ❲♥ ❲♦st ♦① ♥rt♦♦ tr ♥②ss t s ② ♣t tt t ♣r♠r②s♦♥ s t♥ tr ♦♥ ♦♥ ♥ ♥ r② ♦♥ t ♦tr ♥♦r st♥s t♥ r sq♥s sst tt ♠t♥♦♥ tr r r② r♥t r♦♠ t ♦tr ♦♥s ♦t s r♥t r♦♠ t♠ sr② ❲♦s t ♦① ♦♥ tt ♠t♥♦♥ tr r ♥♦ttr ♥ rt r♦♣ ♦r t♠ t r r r ts trr♥t r♦♣s ♦ s♣s tr r ♥ r② str ♥♦♠s t♥ ♥ ♦♥r♠ ② t ♥②ss ♦ ♥ sq♥ss ♥t t♥qs r♥ sq♥s ♥ tr t♦tt② ♥ ♦♣♥r t ♦r r♣rs♥tts ♦ t r s♦ ♥ s♦r rtr ♦♥♥ s♣ts tt tr r tr ♥♦♠s ♦rtrr

r ♠ ♦ r② s♠♣ tr ♦ t t tr ♥♦♠s rtr ♥ r②

r♦♦t ♦ t tr ♦

tr♦t♦♠② t♥ r tr ♥ r② ♦s ♥♦t ♣r♦ ② t♦♣♥♣♦♥t t r♦♦t ♦ t tr ♦ t ♦r♥s♠ r♦♠ ①t♥t ♦r♥s♠sr s♥ ❯ ♦rt♥ t ♥ t ♦♥ ♥♥t ② t♦ r♦♦t t tr ♦ ♦ ♥rst♥ t ♦♥ rst ♥s t♦ r♠♠rtt ❯ s ♥♦t t rst ♥ ♦r♥s♠ ♦♥ rt ❯ s t st ❯♥rs♦♠♠♦♥ ♥st♦r ♥② ♦r♥s♠s ♠② ♦r t s♦♠ ♦ ♠② t ♥♦ s♥♥t ♠♦♥ ①t♥t ♦r♥s♠s ♦trs tt ♦ ♥ ♥st♦rs ♦ ❯ ♥ t ♥st♦rs ♦ ❯ ♠tt♦♥s ♦r ♥♦t②s♦♠ ♥s ♣t ❲t ts ♥ ♠♥ t② sr ♦r ♥ ♠s ttr ♦r t♥ ❯ ♥ tt ♣t ♦r ts ♣♣r♥ ♣t ♦ t♥ s s ♥ ♦tr♦♣ ♦r t ♦tr ♦♥ ♥ t tr ♦ r♦♦t

r s ♦ ♥♥t② ♣t ♥s ♦r r♦♦t♥ t tr ♦ ♣t♦♥ ♥t ♣rt♥ ❯ r r s s♦♥ t r ♦t ♥♥t♣ts sst tt t rst s♣t♦♥ s t♥ tr ♥ r♦♣ ♦♥sst♥♦ r ♥ r②

♥②ss ♦ ♥♥t② ♣t ♥s ♠♦st ♦t♥ ♣ t r♦♦t t♥t s ♥sr rt r♦♦t ♦ t tr ♦ s tr ♥ rr② ❩①②② t ♦r ♣②♦♥s

♦t♥ s♥ ts ♥♥t ♣ts ♥ qst♦♥ s t♦♦ ♠♥② ssttt♦♥s ♠② t tr sq♥s ♦rtrr t P♣♣ t♦rtrr ♥ ts tss ♥ ♥rr♥ ♥ts ♦s t♦ t r♦♦t tr♦r♦♥sr tr ♣♦ss r♦♦ts t♥ ♦ t tr ♥♦♠s rts

♦tr ②s ♥ ♣r♦♣♦s t♦ r♦♦t t tr ♦ ♥ ♣♦sst②s t♦ t s♦♠ ♥♦s ♦ t tr ♥ ♠ t ②♣♦tss tt t ssttt♦♥♣r♦ss s ♦s ❩r♥ t P♥ s♥ t ♠r ♥r ts ②♣♦tss t r♦♦t ♦ t tr s♦ t ♣♦♥tqst♥t r♦♠ ①t♥t ♦r♥s♠s ♦r t ssttt♦♥ ♣r♦ss s rr②♣rt② ♦s s♣② ♥ r ♦t♦♥r② st♥s r ♦♥sr♥ ttr ♠♦s ♦ ♦t♦♥ tt r① t ♦ ②♣♦tss ♥ s t♦r♦♦t tr s♣ r♠♠♦♥ t ♥♥ t ❨♥ t♦ ts ♠t♦ s ♥trst♥ ♥ srs t♦ ♦♣ s ♥♦t♦♠♠♥t rtr ♦♥ t

♥ ♣r♥♣ t ♣ttr♥ ♦ ssttt♦♥ ♠t s♦ s t♦ r♦♦t tr♥r s♦♠ ♠♦s ♦ sq♥ ♦t♦♥ t rt♦♥ ♦ ♥ s s♦♠ ♠♣♦rt♥ ❨♥ t ♦rts tr t ♦② s♥ t ❨♣ t ♣ ♦ss t ♦② ♦r t s ♦ s ♠♦ss ♥ r② rr s t s♠ ss ♣rt t♦ sr ♦r ♣②♦♥② s♥t♠ ♦r ♠♦r ♥♦r♠t♦♥ s rt

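Models of evolution assume that the substitution process follows a continuous-time Markov chain, in that the next transition only depends on the present state and not on the former states. Such a chain can be represented with the following matrix Q, showing instantaneous rates of transition (here also "substitution") q_ij between states i and j, in the order A, C, G, T:

\[
Q =
\begin{pmatrix}
- & q_{CA} & q_{GA} & q_{TA} \\
q_{AC} & - & q_{GC} & q_{TC} \\
q_{AG} & q_{CG} & - & q_{TG} \\
q_{AT} & q_{CT} & q_{GT} & -
\end{pmatrix}
\]

This matrix reads as follows: the instantaneous substitution rate from A to C is q_AC, from A to G q_AG, etc. The entries marked − are specified by the requirement that the columns sum to 0. From these instantaneous rates of substitution, one can derive the probabilities of occurring for all possible substitutions during time t by taking the exponential of the matrix Q:

\[
P(t) = p_{ij}(t) = e^{Qt}
\]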
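The matrix exponential can be illustrated on the simplest possible rate matrix. The sketch below is an assumption-laden example, not the implementation used in this work: it takes a Jukes-Cantor matrix (all substitution rates equal to alpha), which is chosen here only because its transition probabilities have a known closed form, and it computes e^{Qt} with a truncated Taylor series combined with scaling and squaring. All function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=20):
    """Approximate exp(Q t) by scaling and squaring with a Taylor series."""
    n = len(q)
    norm = max(sum(abs(x * t) for x in row) for row in q)
    # Halve Q t until its norm is at most 0.5 so that the series converges fast.
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    a = [[x * t / 2 ** squarings for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds a^k / k! after this update.
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + s for r, s in zip(res_row, term_row)]
                  for res_row, term_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def jukes_cantor(alpha):
    """Rate matrix with all off-diagonal rates equal to alpha (columns sum to 0)."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
```

For this symmetric matrix the result can be compared with the closed form: the probability of observing the same state after time t is 1/4 + 3/4 e^{-4 alpha t}, and the probability of each other state is 1/4 - 1/4 e^{-4 alpha t}.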

t ♠tr① s rr stts ♥ ♦t♥ tr ♥ t♠tr t strt♥ stt t♥ t ♥ s stt♦♥r② strt♦♥ ♠♥s tt t ♥ s r♥ ♦r ♥ ♥♥t② ♦♥ t♠ ♦♥ ♦♥ sq♥ strt♥ r♦♠ ♥② ♥t ♦♠♣♦st♦♥ F0 t ♥ sq♥ ♦♠♣♦st♦♥ ♦rrs♣♦♥t♦ t stt♦♥r② strt♦♥ Π

\[
\lim_{t \to \infty} P(t) \times F_0 = \Pi =
\begin{pmatrix}
\pi_A \\
\pi_C \\
\pi_G \\
\pi_T
\end{pmatrix}
\]

In addition, a model of substitution is said to be reversible if it satisfies the following equation:

\[
q_{ij} \times \pi_i = q_{ji} \times \pi_j
\]
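Whether a given set of rates satisfies this condition is easy to check numerically. The sketch below is an illustration under assumptions: rates are stored as q[i][j], the rate from state i to state j, and a reversible example is built with the usual construction q_ij = s_ij × π_j, where s_ij is a symmetric exchangeability. That construction and the function names are not taken from the text.

```python
STATES = "ACGT"


def reversible_rates(exchange, freqs):
    """Build rates q[i][j] = s_ij * pi_j from symmetric exchangeabilities s_ij."""
    q = {i: {} for i in STATES}
    for i in STATES:
        for j in STATES:
            if i != j:
                q[i][j] = exchange[frozenset((i, j))] * freqs[j]
    return q


def is_reversible(q, freqs, tol=1e-12):
    """Check q_ij * pi_i == q_ji * pi_j for every pair of distinct states."""
    return all(
        abs(q[i][j] * freqs[i] - q[j][i] * freqs[j]) <= tol
        for i in STATES for j in STATES if i != j
    )


if __name__ == "__main__":
    freqs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    exchange = {frozenset((i, j)): 1.0 for i in STATES for j in STATES if i != j}
    rates = reversible_rates(exchange, freqs)
    print(is_reversible(rates, freqs))
```

Changing a single rate without changing its reverse counterpart breaks the equality for that pair, and the check then fails.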

s ♠♥s tt sq♥s r ♦ t qr♠ s♥ rrs♠♦ ♥♦ ♠ttr ♦ t r ♥②s tr s ♥♦ ② t♦ t ♥ t rt♦♥♦t♦♥ s ♦r tr s ♥♦ ② t♦ ♥ t r♦♦t ♦ tr t ssttt♦♥♠♦ sq♥s ♥ ♦ ♥r rrs ♠♦ ♦♥rs② sq♥s ♥ ♦ t ♥♦♥rrs ♠♦ ♦ ♦t♦♥ t rst♥t r ♥②s t rrs ♠♦ ♦ ♦t♦♥ ♠♥ t ②♣♦tsstt t ♣r♦ss ♦ ♦t♦♥ s t qr♠ t s♥ ♦r rrrst② s

♥♦r ♥ t r♦♦t ♥♥♦t r♦r

♥ ♣rt ♠♦st ♠♦s ♦ ♦t♦♥ tt r s t♦ r♦♥strt t st♦r②♦ sq♥s r rrs ♥♦t s t s tt t r ♦♦ ♣r♦ss ♦ ♦t♦♥ s ♥ rrs t s t ♠s ♦♠♣tt♦♥s sr♦♥sq♥t② t s s② ②♣♦ts③ tt t ssttt♦♥ ♣r♦ss s t qr♠ tt sq♥ ♦♠♣♦st♦♥ s t s♠ ♦r t tr ♥ ♥②s♥ ♦r ♥♦♥rrst② s ♥♦r

r r tr♦r t♦ rs t♦ s♥ rrs ♠♦s ♦ ♦t♦♥rst t r♦♦t ♥♥♦t ♥t s ♦♥ sq♥s ♦♥② ♥ tr s s♥ ♥ t t ♦r t ♥ s♦♥ t s ②♣♦ts③ tt sq♥ ♦♠♣♦st♦♥s ♥ ♦♥st♥t tr♦♦t ♦t♦♥ s ♥♦♥ t♦ r♦♥ ♦r ♠♦rsss♦♥ ♦♥ t s rts

♥st ♦♥ ♥ s ♥♦♥rrs ♠♦s ♦ ♦t♦♥ tt ♦ ♥♦t r②qt♦♥ ♥ ♣r♥♣ t st ts ♠♦s ♣r♠t ♦♥ t♦ ♥ t r♦♦t ♦ tr ♥ ♠② ♦r s ♠♦s tt ♣r♠t ♥rr♥ t r♦♦t ♦ tr t ssttt♦♥ ♠trs tt s r rrs

♥ ♣r♦ss ♦ ♦t♦♥ ♥ ♦♠ ♥♦♥rrs t s ♠ ♦ sr rrs ♣r♦sss ♦♠♥ ♦t② s ♣rtr ♠♦s ♦♦t♦♥ ♥♦♥♦♠♦♥♦s ♦r r♥tr♦♥♦s ♠♦s ♥ ts tss r r♥t rrs ssttt♦♥ ♠trs r ss♦t t♦ r♥tr♥s ♦ t tr t♦ ts ♠♦ s ♦♠♣♦s ♦ rrs ♠trs ts s ♦ ♥♦♥rrs ♠② sr t♦ ♣♥♣♦♥t t r♦♦t ♦ ♣②♦♥t tr s ts ♣r♦♣rt② t♦ tst sr ♣♦t♥t r♦♦ts rt t♦♥ tt ♦♥ t t ♥②s t ♠t♦ s ♥♦t ♣♦r ♥♦ t♦ ♥t② r♦♦t t♦t ♦t s ♦ ♣♦r ♦s r♥t ♦rs s♥t ❨♣ t ♣ tt s ♥♦♥rrs ♠trs ♥ r s♠r ♦♥s♦♥s t ♣♣rs t♦ s♣♣♦♥t♥ t rs♣t t♦ r②①♣tt♦♥s ❨♥ t ♦rts

s t②♣s ♦ ♠♦s r r♥t ♠trs r ss♦t t♦ r♥t♣rts ♦ t tr s♦ t ♥t tt t② ♦ ♥♦t ②♣♦ts③ tt sq♥ ♦♠♣♦st♦♥ s ♥ ♦♥st♥t tr♦♦t ♦t♦♥ ♥♥t② tsrtrst s t ♠♥ ♠♦tt♦♥ ♥ tr ♦♣♠♥t tr♦rs s ♠♦s t♦ r♦♥strt sq♥ ♦t♦♥ r♦♠ ❯ t♦ ①t♥t ♦r♥s♠s rts

t♦ ts ♦rts s ♦♥ ♠♦s ♦ sq♥ ♦t♦♥ ♥♦t ♥ t♦ ♣r♦ str♦♥ ♥ ♥ ♦r ♦r ♥st ♣rtr r♦♦ts ♦ t tr ♦ ♠ ♥♦t ♣ss♠st ♦♥ t ♣♦sst② t♦ ♥ ♥ ②s t♦ ♣ ❯ ♦r

♥st♥ ♦♥ ♦ ♦♠♥ ♥♦♥♦♠♦♥♦s ♠♦s ♦ ♦t♦♥ t r①♦ ♠♦s t♦ ♥t r♦♠ ♦t s♥s t ♦ s♦ ♥♦r♣♦rt ♦tr t②♣s♦ ♥♦r♠t♦♥ tt ♣r♠t t♦ t ♥ tr♥srs s rt ♥ ♦ ♥②s ♠ ♠♦r t t♥ s ♥ ♦♥ ♣ t♦ ♥♦

Pr♠r② ♥♦s②♠♦ss

♦st r② r♦r ♦r♥s st♦♥ ♥♦t② ♠t♦♦♥r ♥ ♦r♦♣sts s ♦r♥s sr ♠♦r♣♦♦ ♥ ♦♠ s♠rtst ♣rtr tr rs t♦ ♣r♦♣♦s tt ts ♦r♥s r ♦ tr ♦r♥ ♥ ♦ ts r♠♥ts s tt ♦r♥s tr ♦♥ ♥♦♠ ts ♥♦♠ ♠t r ♦ ♥str tr♥♦♠s st♦♥ t♥q ♦ r ❲♦s ♦♥r♠ tt ts ♦r♥s r r♦♠ tr ♥st♦rs

rst ❩♥ t ♦♥♥ t ♦♦tt s♦ tt ♦r♦♣str r ♠♦r s♠r t♦ ②♥♦tr ♦♥s t♥ t♦ t♦s r♦♠ t r②♦t♥s s s tr ♦♥r♠ ② ♠♥② ♦tr sts ♥ r♥ t♦ ♣r♦♣♦stt ♦r♦♣sts ♠ r♦♠ tr♦②st♦r♠♥ ②♥♦tr s t ②♥♦tr tt ♣r♦ ♣rtr t ② ♥ ♥tr♦♥ s ① ♥t♦ ♠♥♦s s ssts tt r② ♣♦ssss♥ ♦r♦♣sts r ②♦♥r t♥ tr♦②st♦r♠♥ ②♥♦tr s ts②♥♦tr ♠② t♥ ♥ ♦♥ ②rs ♦ ♦♠t♥ t ♦r♦♣str♥ r② ♥♦t ♣♣r ♦r ts ts ♦rí③③♣t t s♦ tt t s ♣r♦② ♥ ♥ ♥st♦r ♦ ❱r♣♥t♣♥ts ♥ ♥r ♦♦♣②t r ♥ ♦♣②t tt t ♥♦s②♠♦ss ♦ ♦r♦♣st t♦♦ ♣

♦♥ ♦♥♥ t s♦ tt t ♣♦r♥rt♥ ♠t♦♦♥rs r r ♠♦r s♠r t♦ tr ♦♥s t♥ t♦ t♦s r♦♠ t r②♦t ♥s ♥②ss ♦ ♦tr ♥s s ♦♥ ♥ ♦♥r♠ tsrsts ♠♥② t♠s ♥ ♣r♠tt t♦ s♦ tt ♠t♦♦♥r ♠r r♦♠♣Pr♦t♦tr ssr t s ♦r♥ s♠s t♦ ♠ s♥ss ♣Pr♦t♦tr ♦♥t♥ ♠♥② ♦r♥s♠s ♥t♠t② ss♦t t♦ r② s ♣rsts ♦ ♥♠s ♦r ♣♥ts ♦r ♥st♥ ♥ ②♣♦tss ♥♦t②♣r♦♣♦ss tt ♣Pr♦t♦tr ♥ ♠t♥♦♥ r rst ss♦tt ♣Pr♦t♦tr ♣r♦tt♥ t ♠t♥♦♥ r♦♠ ①ss ♦①②♥ t♥st♦ ts r♥ st ♣r♦t H2 rt♥ t ür r♦ t♠ t♦♥ r♥ tr s ♥ ② t ♠t♥♦♥ ♥ s ♦♥s ♥t ②t ♦ ♠r tr♥ ♥t♦ t ♥♦ ♥♦ s r② s ②♣♦tss s s ♦♥ t ♦srt♦♥ tt ♠♥② ss♦t♦♥s r ♥♦ ♦♥ t♥♠t♥♦♥s ♥ ♣Pr♦t♦tr ♦r ♥ ♣②♦♥s ♦ t tr ♦

♥♦ ♠t♥♦♥ r s ♦♥ t t r♦♦t ♦ r② s ♠t♦♦♥r♦r ♠t♦♦♥r r♠♥s ♥ tt ♥ r② r② t r② r ♠♦r r♥t t♥ ♣Pr♦t♦tr ♦♦ sts st♦♥ sst tt r② ♠② t st ♦♥ ②rs ♦ ts s♦ s♦ ♣Pr♦t♦tr

tr ♥♦s②♠♦ss ♣♣♥ ♥ t r②♦t ♥♦♠ ♥ ♥♠r② r s♠ss ts s♦♥r② ♥♦s②♠♦ss ♣rtr ♣r♦♣♥st② ♦ r② t♦ ♥ ♦tr

s ♠② rt t♦ tr r ♠♠r♥ tt ♦♥t♥s ♦str♦♥ tr ②t♦st♦♥ t s ♥trst♥ t♦ ♥♦t tt rt ♣rt ♦ t ♣♦♦r♠t♦ rst② s♦♥ ② r② s ♦rr♦ r♦♠ tr r♦♠ ♠t♦ ♣♦♥t ♦ r② r ♥♥♦②♥ ♦♦rs

♦ t tr ♦

r sts ♦s ♦♥ ♣rtr ♣rts ♦ t tr ♦ ♥ ♦rr t♦♠♣r♦ tr ♣②♦♥② ♥t ♦rs s② r② ♦♥ t ♦♥srt♦♥ ♦ r ♥♠r ♦ ♥s t t s♠ t♠ ♥ t ♦♣ tt t r ♣②♦♥ts♥ ♠② ♦♦ st♠t♦r ♦r s♣s ♣②♦♥② ♠r ♣♣r♦s ♥ s ♥ t tr ♥♦♠s ♦

r ♣②♦♥②

é♥ r♦r ♠♦♥tt r♦ ♥ Ptr ♦rtrr ♠ rt ♦rts t♦ ♠♣r♦ t ♣②♦♥② ♦ r tt③ t ♦rtrrt r♦r t r♦ t r♦rr♠♥t r♦rr♠♥t t ♥s t ♦ ts ♥ t② ♥tr♦♠ r♥t② ♣s ♦ ♥♦♠ sq♥s t♦ ♥②s r ♥♠r ♦♥s tt r ♦♥sr ♠♦♥ r ♥ tr♦r t♦t t♦ ♦♦♠rrs ♦ t s♣s ♣②♦♥② ② r tt s♣♣♦rt ♣②♦♥②s ♥♦ ♠r♥ ♦r t s st ♥r r s♣s s s ♥r♠s②♠♦s♠ ♥ ♥ts ♦rr♠ r②♣t♦♠ s♦ ♣ rt ♥ ♥s t ♥ t rst② ♦ r st ♥s t♦ ttrs♠♣

ss② r r ♥ t♦ ♣② r②r♦t ♥ r♥r ♥st♦r ♦ r ♠② ②♣rtr♠♦♣ ♦t r♥r♦t r t st r♣rs♥t r♦♣ ♥

♦♥t♥ r♠♦♣r♦ts ♦♦s ♥ r♠♦♦s r②r♦t ♦♥t♥ ♦tr s♣s t♦ t ①♣t♦♥ ♦ t r♥t② ♣r♦♣♦s ♠r♦trt ♦s ♣♦st♦♥ s ♥r ♥ s r r t t s ♦ r t ♠② s♦ t t s ♦ r♥r♦t s ♦ts ♦ ②♣rtr♠♦♣t ♦♣t♠ r♦t t♠♣rtr ♦ 80C s♣s r ♦♥ ♥ r ♥s t② r sttr ♥ ♦t r②r♦t ♥ r♥r♦t t s ♥ s

s♠ tt t ♥st♦r ♦ r s ②♣rtr♠♦♣ ♦r♥s♠ r♦t r♦rr♠♥t

t♥♦♥ss s ♠t♦s♠ ♥♠ t♦ r s ♣rts ②t♥♦♣②rs t♥♦trs t♥♦♦s t♥♦♠r♦s ♥t♥♦sr♥s ♥ ♥ ♣rs♠♦♥♦s② ♥r tt ♠t♥♦♥ss ♣♣r♥ t st ♦♠♠♦♥ ♥st♦r ♦ ts r♦♣s ts ts ♥♦ t ♠♦rt♥ ♦♥ ②rs ♦ s ♦♥str♥t s ♣♣ t♦ t tr♦ r ♥ ♦♦s ♦r ❯ ♥ ♦r t♥ ♦♥ ②rs t♣r♠r② rt♦♥s ♥ r s♠ r② ♦s t♦ ♦tr

tr ♣②♦♥②

r♦♣s ♦ s♥tsts ♥♦r t♦ ♥②s sr ♥s ♦♠♥t♦ ♣r♦♣♦s ♣②♦♥② ♦ tr ttst③③ t ♦ t r♥ t ♦r r t ♦ t ♠ ♣tst t s♦ tt♠♣t t♦ ♣r♦ tr ♣②♦♥② ♥ rt r ts ♣②♦♥s r♦r s♠r r♦♣♥s ♦r t s ♥♦t r ttts ♣r♦①♠ts ♥ t tr r tr ♦t♦♥r② rt♦♥s♣s sr ♦♥♦♥♥ t♦rs ♠② ♣r♦ rtt r♦♣♥s

♥ ♦♥♦♥♥ t♦r s ♥♦t② ♦♥ ♥ tr ♥ tr♥srs tr♦ r s tr ♦tr ♣s ♦ ♥♦♠s ♥ ①♥ t♥ s♣s ♦♠ tt

♥ tr♥srs r s♦ ♣r♥t tt rt st♦r② ♦ t ♥♦♠s tt ♦r♦r s♣t♦♥s ♥ ♥♦t tr♥srs ♥♥♦t r♦r ♦♦tt r sts ♦r sst tt tr ♦ t rt st♦r② ♦ tr ♥ r♦♥strt ♥ t ♦ t ♦ t ♠ tr ♦rrrs♦ t strs♥ ♥ ♥ ♠♦tt♦♥ ♦r rt

ttr ♠t♦s ♦r ♣②♦♥t r♦♥strt♦♥ ♠② r② t st♦r② ♦ ❲t s t♦rt t♠♣rtr ♦ t tr♥st♦r

♥ tr tt r② st♥s t♥ s♣s tr ♥ ♥ trs ♦r♥♦ t s♠s tt qs ♥ r♠♦t♦s r ♣rtr② s tr r♦r t P♣♣ sr t t② r ♦♥ s♥ ♥ sts t♦ r♦♣s ♦♥t♥ ②♣rtr♠♦♣ tr t s ♥sr t s t♦rt t♠♣rtr ♦ t ♥st♦r ♦ tr rt trts s qst♦♥s ♥ ts

♦♦ t ♣r♠t t♦ ♦♥str♥ ts ♦♥ ♥♦s ♦ t tr ♦ tr♦r ♣Pr♦t♦tr r ♥ssr② ♦r t♥ r② s r②♣♦ssss ♠t♦♦♥r ♦r ♠t♦♦♥r r♠♥s s s ♦♥str♥t ♠♣ tt ② ♦♥ ②rs ♦ ♠♦st tr ♣② r② rss ♠② ♦♥sst♥t t ♦ss t ♦ ♥rr tr♦ ♣rs

♠♦♥② tt t ♥st♦r ♦ ♣Pr♦t♦tr ♠② ♥ ♥ ♦①②♥♥r♦♥♠♥t ♦r ♠ ♦r r♠♥s t♦ ♦♥ t♦ r② t trtr ♥ ♥ t t t ♦♦ r♦r

r②♦t ♣②♦♥②

r②♦t ♣②♦♥② s ♥ t st ♦ ♥t♥s rsr s sr②♦♥trs♠

♥t♥s t t ♥r② t①♦♥♦♠ ♣ts r♦♠ t ♣②♦♥② ♦ ♠♠♠ t rr t s♥ t r♣② t ♥③ t ❲♠♥ t t♦ t ♣②♦♥② ♦ t ♦♥♦♠ ♦② t ♦rr t P♣♣ t ♦③r②t ♦rí③③♣t t r t ♣ss♥ ② t♣②♦♥② ♦ ♥♠s ♦tt t s t ♦rt t rét③ t ♥♥ t ♦r t ♣②♦♥② ♦ ♣♥ts t ♦♥♥ t ♦♥♥ t s s t ♥s♥ t r♦ t s ♥② rt♦♥s♣s r st♥rs♦ ♦r ♥ t s ♦r ♥st♥ r r♦♠ r r t r♦♦t ♦ r② s♦ ♣ s r② r t ♥♦♠ t t rtst ♦ssr♦r ♣♦♣ ♥♦r t♦ ♥②s sq♥s ♦♥t♥ ♦r ts ♦sss♦sss ♣r♠t t♦ ♥♦r ♥ t♠ s♦♠ ♥♦s ♦ ♣②♦♥② ♥ ts ♥ st♦ t ♥♦r♠t♦♥ ♦t rts ♦ ♦t♦♥ t s♦ ♦t t♠s ❯s♥ s♦ ♦ r

r② t ♦③r② t st♠t tt r② r r♦♥ ♦♥ ②rs ♦ ♠ ♠♦r r♥t② t♥ t st♠t r r♦♠ ♦♦② ♦♥ ②rs ♦ s ss t♥ rr♦rs ts sr♣♥② ♠②♦♠ r♦♠ ♥ ♠♣r♦♣r qt♦♥ ♦ ♦str♦ t r② ♦r ♠② s♦ ssttt r② s ♥♦ s t♠ r ♦♥② t t♦♣ ♦ ♥ r t ♦tt♦♠ ♦ ♦ ♥♦ ①t♥t

t♦ ♠♥② rt♦♥s♣s r st ♥r t tr ♦ ♥ ♥♦ ♣t ♥ r♦ str♦s ♦♥srt♦♥ ♦ ♦♦ t ♣r♠ts t♦ trt♥ ♣♦♥ts ♥ t tr

Figure (dated tree of life, rooted at LUCA, with the three domains Bacteria, Eukarya and Archaea). Time axis: billion years from now, from -4 to 0. Events marked along the axis: traces of life, traces of sulphate-reduction, traces of methanogeny, traces of Cyanobacteria, traces of Eukaryotes, traces of Chromatiaceae and Chlorobiaceae, fossils of animals.

Archaea: Thaumarchaeota, Thermoproteales, Sulfolobales, Desulfurococcales, Nanoarchaea, Thermococcales, Methanopyrales, Methanobacteriales, Methanococcales, Thermoplasmatales, Archaeoglobales, Halobacteriales, Methanomicrobiales, Methanosarcinales.

Eukarya: Amoebozoa, Metazoa, Fungi, Malawimonadozoa, Rhodophyta, Glaucophyta, Viridiplantae, Cercozoa, Stramenopila, Alveolata, Jakobozoa, Euglenozoa, Heterolobosea.

Bacteria: alpha-, beta-, gamma-, delta- and epsilon-Proteobacteria, Spirochaetes, Bacteroidetes-Chlorobi, Planctomycetes, Chlamydiales, Cyanobacteria, Chloroflexi, Firmicutes, Actinobacteria, Thermus-Deinococcus, Aquificales, Thermotogales.

r st ♦ t tr ♦ s r ② ♥②ss ♦ ♥♦♠s ♣②♦♥② ♦ tr s s ♥ rt t ♣②♦♥② ♦ r s ♥ ♦♠♣r♦♠ r♦ t r♦rr♠♥t r♦rr♠♥t t ♥s t

♥ t ♣②♦♥② ♦ r② r♦♠ ♦rí③③♣t t ♦♠♥♦s ♦ t tr ♥ ♦♥str♥ t♦ r t tt♦♥s s ♦t♥ ♥ st♦♥ ts ss♦t t ♥♦♥♦♥str♥ ♥♦s s♦ ♥♦r Pr♠r② ♥♦s②♠♦ss s♦ ♥ r♣rs♥t t rr♦s ♥t♥ t rt♦♥ ♦ tr♥sr♣r♣ ♦r t ♦r♥ ♦ ♠t♦♦♥r ♥ r♥ ♦r t ♦r♥ ♦ ♦r♦♣sts P②r♦r♥ ②♣rtr♠♦♣ ♦r♥s♠s ♥ ♥r♥ r♥r♦t r ♦♥ r♥ r♦♥ ♥ r②r♦t ♦♥ r♦♥

t♦ s♦♠ ♥♦s ♥ t st ♦ ♥♦t ♣r♦♣r t♠♣♦rr♠♦r t♦ ♥rst♥ t ♦t♦♥ ♦ ♦t ♦ ♥trst♥ ♦r r♠♥st♦ ♦♥ t♦ ♠♣r♦ t tr ② ♥t②♥ ♥♥t rt♦♥s♣s st♠t♥t ♣r♦♣rts ♦ ①t♥t ♦r♥s♠s ♥rr♥ t ♥ts tt rs t♦ ♦rst② ♥ t♥ ♥♥t ♥♦s ♥ ts ♣r♣♦ss t r♠t ♥rss ♥t ♠♦♥ts ♦ t ♦r ♥ ①♥t strt♥ ♣♦♥t ❲t ts t♥ ♥ ttr ♠♦s ♦ sq♥ ♦t♦♥ s s ttr ♠♦s ♦ s♣str r♦♥strt♦♥ ♥ t♦ ♥ ♥ ♦♣ ♥② ♣♦♣ r ♦r♥ ♦♥s ♣r♦ts rt ♥♦ ♥ t s ♦♣♥ t ♥ ①t♥ ♣ ② ♦♥tss s r♦ r♦♥ s ♠♦s ♦ ♥♦♠ ♥ sq♥ ♦t♦♥

r♥st♦♥ ♦ t ♠♥sr♣t

② tss ♦r s ♥ ♥ ♥ t ♦rts t♦ ♠♣r♦ ♦r ♥♦ ♦ t r②Pttr♥ ♣r♦ss♥ r② ♦t♦♥ ♦t♦♥ ♦ tr t♦ ♠♣r♦ t♥qs ♦ ♣②♦♥t r♦♥str

t♦♥s ♥ s s♦♠ ♦ ts t♥qs t♦ ♣r♦♣♦s ♥srs t♦ ♣rtr♦t♦♥r② ♣r♦♠ss ♠♥sr♣t ♦♥t♥s t rts ♦♥trt t♦ ♣rs♥t t♠ ♥ ♥♦♥r♦♥♦♦ ♦rr ♥ strt ② ♥ rt tt ♣rs♥ts ①♠♣s ♦ ts tt r ♠t ♥ ♦♥ tt♠♣ts t♦ ♣②♦♥t② ♣ s♣s ♥ ♣rs♥t rts r ts ts ♥ rss ② t ♦♣♠♥t♦ ♥ ♠t♦s ♥ ①♠♣s ♦ ♣♣t♦♥ ♦ ts ♥ ♠t♦s

• rst rt tt♠♣ts t♦ r② t ♣②♦♥t ♣♦st♦♥ ♦ qs r♦♣ ♦ tr ♥ ♥ ♦t ♥r♦♥♠♥ts r ♣②♦♥t♣♠♥t s ♠♣♦rt♥t t♦ ♦r ♥rst♥♥ ♦ t ♦t♦♥ ♦ t♦r♥t♦ t♠♣rtrs t s ♥ t t♦ st♠t s tr ♥♦♠s♠s t♦ ♦♥t♥ ♦ts ♦ ♥s ♦♠♥ r♦♠ ♦tr ♦r♥s♠s ♥ s s♦♦ ♥ ♣r ♠♥♥r ♥r t ♥♥ ♦ ①tr♠ ♦♥t♦♥st s ♦♣ ♦♠♣♦st♦♥ s t♦ ♦ ♥♦t t r ♦ ♣♦t♥t rtts sr s sst tt t② ♠② rt t♦ r♠♦t♦s ♥♦tr ♥ ♦ t♦♥ ♦r♥s♠s s st② s♦ ♠s♦♠ ♦ t ♠♦r ts tt ♦♥ s t♦ ♥ tr②♥ t♦ ♦ ♣♣②♦♥ts ♥ ♦♥♥ ♠ tt ttr ♠♦s ♦ sq♥ ♦t♦♥s♦ ♦♣ ♦t② t② ♠♦tt ♠ t♦ ♦r ♦♥ ♠♦s r♦st t♦ ♦♠♣♦st♦♥ s rts ♥ r♦♠♥t♦♥ rt ♥tr ♥ tr♥srs rt

• s♦♥ rt t♦ t ♣r♦t ♦ ♥ rr ♦r ♥ s♥ s r② ♣rt ♥sr t♦ t ♣r♦♠s tt t rst rt rss t ts t ss ♦ ♦♠♣♦st♦♥ s ♥ ts rt s♦♥tt ♠♦s ♦ sq♥ ♦t♦♥ ♠♦r r♦st t♦ ♦♠♣♦st♦♥ ss t♥

♦♠♠♦♥② s ♦♥s ♦ s s s② ② ♦♦♥ r② t ♦t ♦♦ ♦ tr s ♦♠♣t s s ①♠♣ tr♦ t♦♣♠♥t ♦ ♣ ♦ s♦tr ♥P② tt s♦ s ♠♦rr♦st t♦ ♦♠♣♦st♦♥ s t♥ ♦tr ♦♠♠♦♥ ♠t♦s

• s s♦tr s t♥ s t♦ tr② ♥ ♣ ♣rtr ♦r♥s♠ tr ♥r♠ s②♠♦s♠ ♥t♦ t tr ♦ ❲ ♣r♦♣♦s tts②♠♦s♠ ♠② r♣rs♥t tr r ♣②♠ ♥ t♦♥ t♦ r②r♦t ♥ r♥r♦t s t r♥s r r♦♠ t ♦tr r♥ s ts ♥ ♦♥t♥t s st♥t r♦♠ ♦t ♦tr ♣② rt ②tt ♦♥trt♦♥ ♥♦t r♥ r② ♠ s ♥P② ♥♦t r r♠ ♥sr s t♦ ♥r♠s rt♦♥s♣s s♦♠ ♠♣r♦♠♥ts ♦ ♣♣ t♦ t ♣ts ♦ ♥P② t♦ ♦♣ t ♥♠rs ♦ sq♥s ♦rrt② ①♣♦r♥ t s♣ ♦ tr t♦♣♦♦s

• ♥P② ♠② ♥♦t rt t ①♣♦r♥ t s♣ ♦ tr t♦♣♦♦s t ♥♦r rt② st♠t t ♦♥t♥t ♥ ss ♥ ♥ ♥str sq♥s s s♦♥ ♥ t s♦♥ rt ♥ r ts ♦♥t♥ts ♦rrt t♦ t ♦st ♦r♥s♠s ♦♣t♠ r♦t t♠♣rtr s♦ ttst♠t♥ ♦♥ ♣r♠ts t♦ ♥r t ♦tr ❯s♥ ts s♦tr s s ②s♥ s♦tr ♦♣ ② ♦t♦rs ♦ ts rt ♣r♦♣♦stt ❯ s ♠ ss t♦♥ ♦r♥s♠ t♥ ts t♦ s♥♥tss sr♣rs♥ ♣ttr♥ s ♥ r♠♥t t ♣r♦s② ♣s ②♣♦tss ♥ ssts ②s tr♦ ♦♦② ♥ ♦t♦♥r② ♦♦② ♠②♠♥t ♦tr

• ♣r♦s rt ♥t r♦♠ ②s♥ s♦tr r♦♠ ♦r ♦t♦rst t s♦ s♦ ♠ tt ♠ ♣r♦rss r♠♥s t♦ ♦♥ t♦ r♦t♥②s ♠♦s ♦ sq♥ ♦t♦♥ tt r r♦st t♦ ♦♠♣♦st♦♥ s str ♣r♦r♠ t♦♦ s t♦ r♥ ♦♥ ① t♦♣♦♦② ♥ ♦r 30 sq♥s tt t ♦r ♣rs♥t ♥ ts rt ♠② st♣ ♥ trt rt♦♥ s t s♦ ♣ ♣②♦♥tsts s② tst ♥ s

• rt s t ♥♦tr ♣r♦♠ ♥t ♥ t rst rt tt ♦r♦♠♥t♦♥ ② ♥ s t ♣r♦t ♦ t♦ ♦r ♠♦r r♥t ♦t♦♥r② st♦rs ♣r♦♣♦s t♦ ♠♦s t♦ r♦♥strt ts ♦t♦♥r②st♦rs ♥ tst t♠

• st t② ♥t ♥ t rst rt s t t tt ♥ trs ♥r r♦♠ t s♣s tr ♥ ts s♥t rt ♣rs♥t ♠♦tt s♣rt② ♥rs s♣s tr r♦♠ ♥ trs

• s st rt s♦ ♦rs s ♦♥s♦♥ t♦ ts ♠♥sr♣t ♥ t♦rs ♦ ♠② tss ♦♠ t♦ ♥rst♥ tt ♣r♦st ♠♦s♦ ♣r♦ rt ♥st ♥t♦ t st♦r② ♦ ② s♥ t ♥♦r♠t♦♥

♦♥t♥ ♥ ♥♦♠s ♥ ts r ♣rs♥t s♦♠ r♥t ♣r♦rss tts ♥ ♠ ♥ ♠♦s ♦ ♦t♦♥ ♥ ♣r♦♣♦s ♣rs♣ts tt r r② ♣r♦♠s♥

• s ♣♣♥s t♦ rts ♦♥trt t♦ ♦r ♠② tss ♥ ♦ss ♦♥ ♥tr♣rt♥ ♣②♦♥s ♦ ♥s ♣rs♥t ♥ ♦rts♥ t ♦tr tt♠♣ts t♦ r♦♥strt ♥ ♦♥t♥t ♦t♦♥ ♥ ♣Pr♦t♦tr

3P②♦♥② s ♥♦t s②

s rst rt tt♠♣ts t♦ r② t ♣②♦♥t ♣♦st♦♥ ♦ ♣rtrr♦♣ ♦ tr qs ♥ sr ♣r♦♠s rt t♦ ♠♦r♣②♦♥② r trt

rst ♥♦♠s r♦♠ ♦r♥s♠s ♥ ♥ s♠r ♥r♦♥♠♥ts ♥ ♦♥r t♦ s♠r rtrsts ♥ t ♣rs♥t s ♥♦t② qs ♠② rtt② r♦♣ t r♠♦t♦s s t② sr s♠r sq♥♦♠♣♦st♦♥s s ♦♠♣♦st♦♥ s ♥ ♠s ♣②♦♥t r♦♥strt♦♥t♦ tr ♥ ts rt t♦ ♠♥s ts ♠♣t ttr ♥sr♠② ♦♠ r♦♠ ttr ♠♦s ♦ ♦t♦♥

♦♥ tr ♦r ♦r③♦♥t ♥ tr♥sr ♥ ♦♥sr② tr ♣②♦♥② r♦♥strt♦♥ s ♥ ts ♣rs♥ ♥ trs ♥ r r♦♠ s♣s trs♥ ♥ ♥ ts rt tr t♦ ♦ ♦r st t♦ ♥r s♣s trs♣t t ttr ②s t♦ ♦ s♦ ♦ ♦♥ ♥ ♥ ♠♦s ♦ ♦t♦♥

♥ t ♥ ♦r rsts sst tt qs ♠② ♠♦r rt t♦ r♠♦t♦s t♥ t♦ ♦tr tr ② s♦ ♦r ttr ♠♦s ♦ ♦t♦♥tt ♦ ♦♣ ♦t t ♦♠♣♦st♦♥ ss ♥ ♥ tr♥sr

s rt s ♥ ♣t ♦r ♣t♦♥ ♥ ♦t♦♥r② ♦♦②♦♠♣♥②♥ ♣♣♠♥tr② trs ♥ ♦♥ t t ♦♦♥

rsss

tt♣♦♠sr♥②♦♥r⑦♦ssrtt♦♥①s

tt♣♦♠sr♥②♦♥r⑦♦ssrtt♦♥♣

BioMed Central

Page 1 of 18

(page number not for citation purposes)

BMC Evolutionary Biology

Open Access

Research article

Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria

Bastien Boussau*, Laurent Guéguen and Manolo Gouy

Address: Université de Lyon; Université Lyon 1; CNRS; INRIA; Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France

Email: Bastien Boussau* - [email protected]; Laurent Guéguen - [email protected]; Manolo Gouy - [email protected]

* Corresponding author

Abstract

Background: Despite a large agreement between ribosomal RNA and concatenated protein phylogenies, the phylogenetic tree of the bacterial domain remains uncertain in its deepest nodes. For instance, the position of the hyperthermophilic Aquificales is debated, as their commonly observed position close to Thermotogales may proceed from horizontal gene transfers, long branch attraction or compositional biases, and may not represent vertical descent. Indeed, another view, based on the analysis of rare genomic changes, places Aquificales close to epsilon-Proteobacteria.

Results: To get a whole genome view of Aquifex relationships, all trees containing sequences from Aquifex in the HOGENOM database were surveyed. This study revealed that Aquifex is most often found as a neighbour to Thermotogales. Moreover, informational genes, which appeared to be less often transferred to the Aquifex lineage than non-informational genes, most often placed Aquificales close to Thermotogales. To ensure these results did not come from long branch attraction or compositional artefacts, a subset of carefully chosen proteins from a wide range of bacterial species was selected for further scrutiny. Among these genes, two phylogenetic hypotheses were found to be significantly more likely than the others: the most likely hypothesis placed Aquificales as a neighbour to Thermotogales, and the second one with epsilon-Proteobacteria. We characterized the genes that supported each of these two hypotheses, and found that differences in rates of evolution or in amino-acid compositions could not explain the presence of two incongruent phylogenetic signals in the alignment. Instead, evidence for a large Horizontal Gene Transfer between Aquificales and epsilon-Proteobacteria was found.

Conclusion: Methods based on concatenated informational proteins and methods based on character cladistics led to different conclusions regarding the position of Aquificales because this lineage has undergone many horizontal gene transfers. However, if a tree of vertical descent can be reconstructed for Bacteria, our results suggest Aquificales should be placed close to Thermotogales.

Published: 3 October 2008

BMC Evolutionary Biology 2008, 8:272 doi:10.1186/1471-2148-8-272

Received: 14 May 2008

Accepted: 3 October 2008

This article is available from: http://www.biomedcentral.com/1471-2148/8/272

© 2008 Boussau et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

In the study of evolution, as in any scientific endeavour, progress relies on the comparison of hypotheses with respect to how well these succeed in accounting for a range of observed data. In phylogenetics, a given tree, a hypothesis, is confronted with trees inferred using other data; resulting incongruences are then explained by a methodological artefact, or the inability of a single tree to properly depict the evolution of the biological entities under consideration. The large agreement between the ribosomal RNA (rRNA) bacterial phylogeny and phylogenies built from a concatenated set of protein sequences was therefore a strong piece of evidence that the tree of life could be solved [1]. For instance, protein phylogenies confirmed the monophyly of most rRNA-defined bacterial phyla. Similarly, Aquificales are found close to Thermotogales both in trees built from rRNA and from concatenated proteins. However, the position of the Aquificales clade within the phylogeny of Bacteria has often been questioned on the ground of single gene phylogenies, phylogenies built from gene or domain content [2], and supposedly rare genomic changes such as insertions-deletions [3-8]. Strikingly, many of these analyses are congruent with each other and suggest that Aquificales might be more closely related to Proteobacteria than to Thermotogales. This new view has been adopted in recent scenarios that explain the whole evolution of life on earth [9], so it is important to our understanding of bacterial evolution that the puzzling phylogenetic problem of the position of Aquificales within the bacterial phylogeny gets solved.

Species phylogenies built from the comparison of gene sequences suffer from two major limitations: on one side the true gene trees may differ from the species trees, and on the other side, the signal contained in the gene sequences might be too weak or too complex to be correctly interpreted by bioinformatics methods. Gene trees will differ from species trees in cases of hidden paralogy, closely spaced cladogenesis events or horizontal gene transfers (HGT). This last phenomenon is particularly relevant to the present study, as gene transfers are frequent among prokaryotes. Phylogeneticists therefore often only consider informational genes, involved in the processes of transcription, translation and replication, which appear to be less prone to HGTs over broad distances than other genes, named operational [10]. The second limitation, that of a phylogenetic signal so blurred or buried that tree reconstruction methods fail to recover the true tree, may come from a saturated history of mutations (long branch attraction, [11,12]) or compositional biases [13,14]. Both pitfalls are likely to affect genes used to reconstruct the bacterial phylogeny, because Bacteria possibly date as far back as 3.5 billion years ago [15], and because they display a great diversity in their genomic characteristics as well as in their ecological niches. More specifically, Aquificales may be placed close to Thermotogales not because they last diverged from them, but because they share a common ecological niche, i.e. they are both hyperthermophilic, which led both their rRNA [16] and their protein sequences [17] to adapt to high temperatures. Sequence similarities between these two clades would therefore be the result of convergences due to identical selective pressures, not the result of common descent. Consequently, recovering the bacterial species tree and clarifying the relations between hyperthermophilic organisms from comparison of gene sequences is a difficult task, and has led several authors to search for more reliable informative characters.

Such characters are cell-structural features, or of a genomic nature: "rare genomic changes" [18], such as gene fusion/fission or insertion-deletions (indels), and gene or domain presence/absence. The main assumption concerning all these characters is that they are nearly immune to convergence: to be informative, a given character, morphological or genetic, should only arise once. To our knowledge, this assumption has never been thoroughly tested. The genomic characters further depend on the identification of orthologous genes in different genomes, and consequently are subject to the pitfall of horizontal gene transfers. Here again, this weakness is of particular interest to our study, since both Aquificales and Thermotogales seem to be particularly prone to exchanging genes with other bacterial species [19,20].

Therefore it appears that both approaches (sequence phylogenies and character cladistics) are potentially hindered by flaws whose magnitude is sufficient to question their conclusions. Since their conclusions diverge in the case of the phylogenetic position of Aquificales, a detailed study might clarify which approach has suffered most from its drawbacks.

In this report, we used the HOGENOM [21] database to survey the phylogenetic neighbourhood of Aquifex. This database contains families of homologous genes from complete genome sequences with associated sequence alignments and maximum likelihood phylogenetic trees. The automatic survey of all trees containing sequences from Aquifex in the HOGENOM database reveals that Aquifex is most often found as a neighbour to Thermotogales. When genes are separated into informational and non-informational genes, we find that genes from the former category seem to be less transferred than non-informational ones. To assess this, neighbour clades for each gene from Aquifex were counted, separately for informational genes and for operational genes, yielding two distributions. Then for each of the two distributions, Shannon's index of diversity was computed [22]. This index measures whether the genes are evenly distributed among all possible neighbourhoods or whether a specific vicinity dominates. We find that the index value is significantly different between the two distributions: among informational genes, one neighbourhood, between Aquificales and Thermotogales, tends to dominate the distribution much more than in operational genes. This shows that there is one dominating phylogenetic signal among informational genes, and much less among operational genes, which is consistent with the idea that operational genes experience more frequent HGT events than informational genes.
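As an illustration of the diversity measure described above, the following minimal sketch computes Shannon's index from a list of neighbour counts. It assumes the usual definition H = -Σ p_i ln p_i with natural logarithms (the base is not stated in this passage), and the two count vectors are hypothetical, chosen only to contrast a dominated distribution with an even one.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-empty categories."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    h = 0.0
    for c in counts:
        if c > 0:  # empty categories contribute nothing to the sum
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical neighbour counts (not the paper's data).
dominated = [40, 5, 3, 2]   # one neighbourhood dominates
even = [13, 12, 13, 12]     # genes spread almost evenly among neighbourhoods

print("dominated:", round(shannon_index(dominated), 3))
print("even:     ", round(shannon_index(even), 3))
```

A lower value for the first vector than for the second corresponds to the situation reported for informational versus operational genes.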

To study the impact of saturation and compositional heterogeneity on the position of Aquificales, we concatenated a large dataset of putatively orthologous proteins from a wide range of bacterial species (Additional file 1). A phylogenetic tree was built, and then taken as a reference to test for the position of Aquificales: Aquificales were first removed from the tree, and then re-introduced in the topology in all possible positions. Site likelihoods were computed for all these positions, which allowed for the identification of sites favouring a given topology. Two phylogenetic hypotheses were found to be significantly more likely than the others: the most likely hypothesis placed Aquificales as a neighbour to Thermotogales, and the second one placed Aquificales with epsilon-Proteobacteria. We characterized the genes that supported each hypothesis, and found that differences in rates of evolution or in amino-acid compositions could not explain the presence of two dominating phylogenetic signals in the alignment. However, evidence for a large Horizontal Gene Transfer between Aquificales and epsilon-Proteobacteria was found. These findings suffice to explain why methods based on concatenated informational proteins and methods based on character cladistics led to different conclusions, and suggest that the vertical signal in the genomes of Aquificales, i.e. the portion of the genome most likely to have been inherited through descent and not through HGT, relates them to Thermotogales.
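The re-insertion procedure described above can be pictured with a small sketch. It is not the authors' pipeline: the function names are hypothetical, the per-site log-likelihoods are assumed to have been computed elsewhere by a phylogenetic program, and the numbers are made up. The first function counts the branches on which a removed clade could be re-attached in an unrooted binary tree; the second splits sites according to which of two candidate topologies they favour.

```python
def n_insertion_branches(n_remaining_leaves):
    """Number of branches in an unrooted binary tree with n leaves (2n - 3)."""
    if n_remaining_leaves < 2:
        raise ValueError("need at least two leaves")
    return 2 * n_remaining_leaves - 3


def sites_favouring(loglik_a, loglik_b, min_diff=0.0):
    """Return (sites favouring A, sites favouring B) as lists of site indices."""
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both topologies must be scored on the same sites")
    favour_a, favour_b = [], []
    for i, (a, b) in enumerate(zip(loglik_a, loglik_b)):
        if a - b > min_diff:
            favour_a.append(i)
        elif b - a > min_diff:
            favour_b.append(i)
    return favour_a, favour_b


# Made-up site log-likelihoods under two hypothetical placements.
placement_a = [-1.0, -2.0, -3.0, -0.5]
placement_b = [-1.5, -1.0, -3.0, -0.7]
print(sites_favouring(placement_a, placement_b))
print(n_insertion_branches(10))
```

Summing the site log-likelihoods per gene, rather than per site, would give the gene-level support for each placement.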

Results and discussion

A whole genome view of Aquifex relationships

For each gene tree containing sequences from Aquifex aeolicus in the HOGENOM database, the identity of the group of sequences neighbouring Aquifex was recorded. This gave counts of Aquifex genes found close to Thermotogales, Firmicutes, epsilon-Proteobacteria, among others. Cases where Aquifex genes were found close to a non-monophyletic group of species were discarded, which left 578 gene trees. Among these, Thermotoga is found as Aquifex's closest neighbour 98 times, epsilon-Proteobacteria are found 44 times, delta-Proteobacteria 84 times, Firmicutes 71, Thermus-Deinococcus 39, Euryarchaeota 74 (see Fig. 1). In view of such a distribution, it is difficult to argue in favour of any particular relationship: Horizontal Gene Transfers appear so pervasive that no signal emerges as clearly dominant. However, HGTs may not affect all types of genes with similar frequencies. It has been proposed that genes that are related to the universal processes of transcription, translation and replication and known as "informational genes" may be less transferred than "operational genes", involved in metabolism for instance [10].

We therefore separated HOGENOM protein families into informational and non-informational gene families. Fig. 1a shows that among informational genes, the genes placing Aquifex close to Thermotoga (32 genes) are twice as numerous as the genes favouring the second best alternative hypothesis, i.e. the vicinity of Firmicutes (15 genes). On the contrary, among operational genes (Fig. 1b), differences between various hypotheses are much narrower: Thermotoga is Aquifex's neighbour in only two more cases than delta-Proteobacteria, 11 more cases than Firmicutes, and 13 more cases than Euryarchaeota. To quantify this comparison, Shannon's index of diversity was measured for both sets of genes. This index measures how evenly distributed observations are among categories [22]: the higher the index, the more even the distribution; conversely, the lower the index, the more a few categories dominate. Shannon index values were 2.07 for informational genes, and 2.49 for operational genes (significantly different according to a t-test, p-value < 0.001; a Pearson χ² test between the two distributions is also significant, p-value < 10^-20), which means that operational genes are significantly more evenly distributed among the various neighbour groups than informational genes. The distributions depicted in Figs 1a and 1b result from a mixture of lack of phylogenetic resolution at the single-gene level and of HGT events. But the difference between them strongly suggests that operational genes have been horizontally transferred more often than informational genes, which is consistent with the fact that Euryarchaeota are almost never found as neighbour to Aquifex in informational genes (2%), but often found in operational genes (11%). Interestingly, for both sets of genes, epsilon-Proteobacteria are not one of the most frequent Aquifex neighbours, as they are less frequent than Thermotoga, Firmicutes, and delta-Proteobacteria. For operational genes, they are even less frequent than Euryarchaeota. These results thus do not support the hypothesis that Aquificales are epsilon-Proteobacteria [4]. However, if all Proteobacteria are to be counted as a single clade, the vicinity of Aquifex with Proteobacteria becomes a high-scoring hypothesis: Aquifex is most closely related to a Proteobacterium with 18 informational genes and 76 non-informational genes. According to operational genes, if anything, Aquifex would be a Proteobacterium, as almost twice as many genes place it with Proteobacteria as with Thermotoga (76 for Proteobacteria against 41 for Thermotoga); according to informational genes, Aquifex is close to Thermotoga, as almost twice as many genes place it with Thermotoga as with Proteobacteria (18 for Proteobacteria against 32 for Thermotoga). However, considering all Proteobacteria as a single clade artificially groups a variety of different histories under the same hypothesis. It is thus more likely that the high frequency of close relationships between Aquifex and Thermotoga among informational genes reflects vertical descent, and that the scattered distribution of Aquifex closest homologs among operational genes results from frequent horizontal transfers to or from the Aquifex lineage.

Figure 1. Phylogenetic relationships of Aquifex genes according to the HOGENOM database. a: Informational genes. b: Non-informational genes. Axes in both panels: neighbouring group (Thermotoga, Thermus-Deinococcus, Firmicutes, Actinobacteria, Cyanobacteria, Spirochaetes, Planctomycetes, Chlamydiales, Bacteroidetes-Chlorobi, alpha-, beta-, gamma-, delta- and epsilon-Proteobacteria, Crenarchaeota, Euryarchaeota) against number of gene families (0 to 50).
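To make the comparison between the two gene categories concrete, the sketch below computes a Pearson χ² statistic on a 2 × k contingency table. It uses only the three neighbour groups whose counts are quoted in the text (Thermotoga, Firmicutes, and Proteobacteria taken as a whole; the operational Firmicutes count is derived as 41 - 11 = 30), so the resulting statistic is not comparable to the test reported in the paper, which was run on the full distributions. The function name is hypothetical.

```python
def pearson_chi2(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows must cover the same categories")
    total_a, total_b = sum(row_a), sum(row_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each row needs at least one observation")
    grand = total_a + total_b
    stat, used = 0.0, 0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # a category absent from both rows carries no information
        used += 1
        exp_a = total_a * col / grand
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, used - 1


# Counts quoted in the text: Thermotoga, Firmicutes, all Proteobacteria.
informational = [32, 15, 18]
operational = [41, 30, 76]

stat, df = pearson_chi2(informational, operational)
print(f"chi2 = {stat:.2f} with {df} degrees of freedom (three-group subset only)")
print("Proteobacteria/Thermotoga, operational:", round(76 / 41, 2))
print("Thermotoga/Proteobacteria, informational:", round(32 / 18, 2))
```

The two printed ratios correspond to the "almost twice as many" statements made for operational and informational genes respectively.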

Furthermore, this whole genome analysis may suffer from compositional biases or long branch attraction. Consequently, a subset of carefully chosen genes was concatenated and used to assess the importance of potential artefacts: first a tree of the Bacteria was built, and then, using this tree as a scaffold, the influence of saturation and compositional biases on the position of Aquificales was estimated.

Bacterial phylogeny obtained from a concatenated set of putatively orthologous genes

Fifty-six genes that were nearly universal in Bacteria and present as single copy in most genomes were concatenated (see Methods). Genes that showed a transfer between Bacteria and Archaea had previously been discarded because a gene showing evidence of a transfer between very distantly related organisms might be especially prone to be transferred among species of the same domain. Some of the 56 remaining genes may still have undergone a transfer, and concatenating them may lead to spurious results. Usually, transferred genes are discarded before gene concatenation [23,24]. Here, we first checked for possible tree building biases resulting from composition or evolutionary rate effects before proceeding to an analysis designed to specifically identify genes that may have been transferred between Aquificales and other species. PhyML was used to build a starting phylogeny based on the concatenated protein alignments, using the JTT model and a gamma law discretized in four classes to account for variation in the evolutionary rates. The discretized gamma law [25] is widely used because of its mathematical convenience, not as a precise model of the evolutionary rates of protein sequences. Therefore it is expected that some sites are not properly modelled when this approximation is made. To estimate how sites were modelled by the discretized gamma law, we plotted the distribution of expected relative evolutionary rates across sites (Fig. 2) as found by BppML. This distribution shows four peaks, each corresponding to the rate of a particular class. The two largest peaks are at the limits of the distribution: they comprise both sites whose rate is properly approximated by one of the two extreme evolutionary rates, but also sites whose rate would be smaller or larger, if the discretized gamma law was able to provide a convenient rate. For instance, the leftmost peak contains sites properly modelled by a relative rate of ~0.2, but also sites evolving more slowly, such as constant sites. Per se, improperly modelling constant sites probably does not lead to biased phylogenetic estimations; however underestimating the evolutionary rate of some fast-evolving sites (and this may be a by-product of improper modelling of constant sites) will lead to an underestimation of the convergence probability. Such misspecified modelling is therefore a potential cause for long branch attraction, as underlined in another context [26]. We consequently decided to conservatively discard sites whose evolutionary rate was above the arbitrary threshold of 2.2 (red line, Fig. 2), in the hope of reducing risks of reconstruction artefacts. The resulting alignment contains 10,000 sites, and has been submitted to an additional reconstruction through PhyML, with a bootstrap analysis based upon 200 replicates.
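The site-filtering step described above amounts to dropping alignment columns whose estimated relative rate exceeds the threshold. The sketch below is a minimal illustration, not the authors' code: the function name, the dictionary representation of the alignment and the toy rates are all assumptions, and the per-site rates are taken as given (in the paper they come from BppML).

```python
def drop_fast_sites(alignment, site_rates, threshold=2.2):
    """Keep only columns whose relative evolutionary rate is <= threshold.

    alignment: dict mapping sequence name to aligned sequence (equal lengths)
    site_rates: one estimated relative rate per alignment column
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(site_rates)}:
        raise ValueError("every sequence must have one residue per site rate")
    keep = [i for i, rate in enumerate(site_rates) if rate <= threshold]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment and made-up rates, for illustration only.
toy_alignment = {"species_1": "ACDE", "species_2": "ACDF"}
toy_rates = [0.2, 2.5, 1.0, 2.2]

print(drop_fast_sites(toy_alignment, toy_rates))
```

Sites with a rate exactly equal to the threshold are kept here, following the wording "above the arbitrary threshold of 2.2".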

Our tree comprises 94 bacterial species, spanning as exhaustively as currently possible the diversity of Bacteria (Fig. 3). The resulting topology is in good agreement with rRNA trees [27], recently published concatenated-protein phylogenies [28,29], as well as supertree phylogenies [30]. In particular, we do recover the clade named "Terrabacteria" by Battistuzzi and co-workers, as well as the clade named Gracilicutes by Cavalier-Smith [7], separated with a high bootstrap support (BS 94%). It is interesting to note that these three recent bacterial phylogenies all recover these two clades, which suggests that the global picture of bacterial evolution might be slowly unveiling. The "PVC supergroup" (Planctomyces-Verrucomicrobia-Chlamydiales, [31]) seems to find a confirmation in our phylogeny where Planctomycetes and Chlamydiales are grouped with 100% BS. Many similarities are also found with the phylogeny proposed by Ciccarelli and co-workers [32], or the supertree obtained by Beiko, Harlow and Ragan [33], such as the monophyly of Proteobacteria, and the grouping of Aquificales with Thermotogales.

Figure 2. Distribution of the site relative evolutionary rates. Rates were estimated using a 4 class discretized gamma distribution. The 4 peaks correspond to the rates associated to each class. The vertical red line corresponds to the threshold above which sites have been discarded due to their high evolutionary rate. Axes: site relative evolutionary rate (0.0 to 2.5) against frequency (0 to 2000).

Figure 3. Unrooted phylogenetic tree of Bacteria. This tree was obtained after discarding all sites with evolutionary rate predicted to be above 2.2. Stars indicate branches with 100% bootstrap support (200 replicates). Bootstrap supports between 80% and 100% are shown, bootstraps below 80% have been removed for clarity. Aquificales are represented in bright red. Names of major groups are according to the NCBI taxonomy. Gracilicutes and Terrabacteria, two recently proposed superclades, are shown as dashed frames, and their names are between quotation marks to mark their unconsensual status. Labelled groups: Chlamydiales, Spirochaetes, Proteobacteria, Mycoplasma, Firmicutes, Cyanobacteria, Thermus/Deinococcus, Chloroflexi, Actinobacteria, Aquificales, Thermotogales, Bacteroidetes/Chlorobi, Planctomycetes; superclades "Gracilicutes" and "Terrabacteria". Scale bar: 0.1.

However, many deep nodes do not obtain high bootstrap supports. Two avenues might help fully resolve the bacterial phylogeny: further increase the number of phylogenetic markers, and improve the interpretation of the phylogenetic signal through the development of new models of evolution. Such models would ideally be able to deal with compositional heterogeneity, and would safely handle saturation. As there is no efficient program with these properties, we have chosen to filter out saturated sites to try and diminish compositional heterogeneity.

We have already attempted to remove the most saturated sites. To assess the impact of compositional heterogeneity, we performed Bowker's tests for symmetry in the evolutionary process on the whole alignment [34,35]. Bowker's test relies on the comparison of two sequences against each other, therefore 94*93/2 = 4371 tests can be done on our alignment. Among these 4371 tests, 3826 reject symmetry at the 5% level: though we have made no effort to alleviate the multiple tests problem, compositional heterogeneity might be an important issue for the reconstruction of bacterial phylogeny. Species that show the most biased amino-acid usage, i.e. that fail the highest numbers of Bowker's tests, include first AT-rich species (Buchnera aphidicola, Borrelia burgdorferi), then GC-rich species (Thermus thermophilus) and finally hyperthermophilic species (data not shown). This is in agreement with results based on a multivariate analysis of proteome composition [36], where the GC content of the genome was found to be the major factor influencing amino-acid composition, before thermophily.
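For readers unfamiliar with Bowker's test, the sketch below shows how the statistic can be obtained for one pair of aligned sequences, and checks the number of pairwise tests quoted above. It is an illustrative implementation under stated assumptions, not the code used in the paper: the function name and the set of ignored characters are hypothetical, and the degrees of freedom are taken as the number of state pairs with at least one observed difference, which is one common convention.

```python
import math
from collections import Counter


def bowker_statistic(seq_a, seq_b, ignored="-?X"):
    """Bowker's symmetry statistic and degrees of freedom for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    # Divergence matrix: pairs[(x, y)] counts sites with x in seq_a and y in seq_b.
    pairs = Counter(
        (a, b) for a, b in zip(seq_a, seq_b) if a not in ignored and b not in ignored
    )
    states = sorted({s for pair in pairs for s in pair})
    stat, df = 0.0, 0
    for i, x in enumerate(states):
        for y in states[i + 1:]:
            n_xy, n_yx = pairs[(x, y)], pairs[(y, x)]
            if n_xy + n_yx > 0:
                stat += (n_xy - n_yx) ** 2 / (n_xy + n_yx)
                df += 1
    return stat, df


# Number of pairwise comparisons among 94 sequences.
n_tests = math.comb(94, 2)
print(n_tests)

# Toy pair: every difference goes from A to B, none from B to A.
print(bowker_statistic("AAAB", "ABBB"))
```

Under the null hypothesis of symmetry the statistic is compared with a χ² distribution with `df` degrees of freedom; with SciPy installed, `scipy.stats.chi2.sf(stat, df)` would give the corresponding p-value.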

To try and limit the influence of compositional bias, we recoded the concatenated protein alignment in 4 states based on the physico-chemical properties of the amino-acids [37]. Such a recoding is expected to reduce the risk of long branch attraction artefact as well as compositional bias by decreasing the number of homoplasies. Accordingly, after the recoding, 2818 tests reject symmetry: the recoding seems to have diminished compositional bias in at least 1008 cases, but clearly has not fully erased heterogeneity. The tree we obtain on the recoded alignment (Fig. 4) is very similar to the previous tree (Fig. 3), with Gracilicutes separated from Terrabacteria (BS 76%). Interestingly, Aquificales are still found as a sister group of Thermotogales with a high bootstrap support (96%), and Thermus-Deinococcus also clusters with these hyperthermophilic organisms, although the bootstrap support is negligible (36%). The grouping of the photosynthetic lineages Chloroflexi and Cyanobacteria gains support through the recoding, with a BS of 85% on the recoded alignment against 77% on the original alignment. So does the clustering of these two photosynthetic lineages with another lineage that contains photosynthetic organisms, the Firmicutes: from 63% on the original alignment, the BS increases to 73% with the recoded alignment. The grouping of these three photosynthetic lineages appears as an appealing hypothesis, but certainly requires further inquiry, especially since horizontal gene transfers are thought to have been part of the evolution of photosynthesis [38]. Strikingly, Spirochaetes were found to group with Chlamydiales, Planctomycetes and Bacteroidetes/Chlorobi with a high bootstrap support (83%) on the original alignment, but grouped with epsilon-Proteobacteria on the recoded alignment (bootstrap support: 18%), which shows that recoding can impact tree reconstruction. Overall, the average bootstrap support is 87.1%, not significantly lower than the average support for the original alignment (90.3%, p-value = 0.065 with a Student paired t-test, p-value = 0.154 with a Wilcoxon signed rank test). This supports the conclusion of Susko and Roger [39] that recoding does not lead to a substantial loss of information.
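To make the recoding step concrete, the following Python sketch maps each amino acid to one of four states. The grouping used here is purely hypothetical and chosen for illustration; it is not the grouping of reference [37], which is not given in the text. The names `GROUPS`, `RECODE` and `recode` are likewise assumptions.

```python
# Hypothetical 4-state grouping, for illustration only.
# It is NOT the grouping of reference [37].
GROUPS = {
    "0": "AVLIMC",  # assumed "hydrophobic" class
    "1": "FWYH",    # assumed "aromatic" class
    "2": "STNQGP",  # assumed "polar / small" class
    "3": "DEKR",    # assumed "charged" class
}

# Reverse lookup: amino-acid letter -> state symbol.
RECODE = {aa: state for state, members in GROUPS.items() for aa in members}


def recode(sequence, unknown="-"):
    """Recode a protein sequence into the 4-state alphabet.

    Characters outside the 20 standard amino acids (gaps, X, ...)
    are replaced by `unknown`.
    """
    return "".join(RECODE.get(aa, unknown) for aa in sequence.upper())
```

Whatever the grouping, the effect is the same: substitutions within a class become invisible, which reduces the number of observable states and therefore the opportunities for homoplasy, at the cost of discarding within-class signal.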

As the trees obtained on the recoded and original alignments are in strong agreement, we conclude that we obtain a fairly robust bacterial tree, and that the clustering of Aquificales and Thermotogales does not seem due to saturation or compositional artefacts. However, since more than 50% of Bowker's tests reject symmetry on the recoded alignment, considerable compositional heterogeneity has escaped the 4-state recoding, and this analysis cannot entirely rule out the hypothesis that Aquificales and Thermotogales are attracted by compositional biases. Nonetheless, the addition to the concatenated alignment of sequences from two free-living epsilon-Proteobacteria, Sulfurovum NBC37-1 and thermophilic Nitratiruptor SB155-2 [40], does not affect this grouping either (see additional file 2). Thus the Aquificales-Thermotogales grouping does not seem to result from compositional biases.

Does the Thermotogales-Aquificales cluster come from a reconstruction artefact?

The topology that is found without Aquificales using PhyML with the same parameters is perfectly congruent with the tree obtained with Aquificales. Taking therefore as reference the tree without Aquificales, we tested all possible positions for this group in the bacterial tree. The most likely position was as found by the tree search heuristics, with Thermotogales. The second most likely position was very close, at the base of a clade comprising both Thermotogales and Fusobacterium, and the third most likely position was with epsilon-Proteobacteria, the only


Figure 4. Unrooted phylogenetic tree obtained from 56 genes of Bacteria based on the recoded alignment. Labels as in Fig. 3.
[Tree image not reproduced. Clade labels: Planctomycetes, Chlamydiales, Bacteroidetes/Chlorobi, Spirochaetes, Proteobacteria, Mycoplasma, Firmicutes, Chloroflexi, Cyanobacteria, Actinobacteria, Aquificales, Thermotogales, Thermus/Deinococcus; superclades "Gracilicutes" and "Terrabacteria"; scale bar: 0.05.]


placement not rejected at the 5% level according to an AU test [41] as implemented in Consel [42] (p-value = 0.062). Because the AU test is based on a multiscale RELL bootstrap procedure, the fact that the second most likely hypothesis is rejected by the AU test at 5% while the third is not suggests that sites of high likelihood scores are the same in the first two hypotheses, but are different from the sites of high likelihood scores in the third hypothesis. Consequently two contrasting signals can be found in the data, coming from different sites in the alignment, that support the two currently prevailing phylogenetic hypotheses for Aquificales, one based on rRNA trees, and the other heralded by Cavalier-Smith [4]. We decided to further analyse the nature of the signal that favoured each of these two placements, through a gene-wise analysis.
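The reasoning about sites of high likelihood can be illustrated with a plain RELL bootstrap, sketched below in Python. This is not the AU test itself, which additionally varies the size of the resampled alignments (the multiscale step) before computing its p-value, and it is not the Consel implementation. The function name, the input layout and the fixed random seed are assumptions of this sketch.

```python
import random


def rell_support(site_lnl, n_replicates=1000, seed=0):
    """Plain RELL bootstrap proportions for competing tree hypotheses.

    site_lnl maps a hypothesis name to its list of per-site
    log-likelihoods (all lists must have the same length).
    Sites are resampled with replacement; no likelihood is re-estimated.
    Returns the fraction of replicates in which each hypothesis
    has the highest summed log-likelihood.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = {name: 0 for name in names}
    for _ in range(n_replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {
            name: sum(site_lnl[name][s] for s in sample) for name in names
        }
        best = max(names, key=lambda name: totals[name])
        wins[best] += 1
    return {name: wins[name] / n_replicates for name in names}
```

Two hypotheses whose high-likelihood sites coincide rise and fall together across replicates, so the slightly worse one rarely wins. A hypothesis supported by a different set of sites can win whenever those sites are over-sampled, which is consistent with the third placement escaping rejection.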

We built phylogenetic trees for each of our 56 genes with PhyML. Among these 56 trees, 11 place Aquificales close to Thermotogales (T genes), and only two place Aquificales close to epsilon-Proteobacteria (E genes). We compared these two sets of genes, with respect to rates of evolution and amino-acid composition, to see whether one signal is the result of a long branch attraction or of a compositional bias.

First, we computed the sum of the branch lengths for each tree in our two datasets, and computed an average branch length for each dataset. The average branch length was 0.163 for T genes, and 0.131 for E genes, which is not significantly different according to an unpaired t-test (p-value: 0.145). The discrepancy between the two datasets does not seem to be explainable by a long branch attraction artefact.

Second, the position close to Thermotogales might be favoured because of convergences instead of common descent: as written above, both Thermotogales and Aquificales are hyperthermophilic organisms, so their sequences are subject to partly similar selective pressures. Through the analysis of many completely sequenced genomes, Zeldovich and co-workers [17] have found a positive correlation between the proteome content in amino acids IVYWREL and the organism optimal growth temperature. As hyperthermophilic bacteria and archaea are not monophyletic, this suggests that there exists a selective pressure to increase the IVYWREL content in organisms that thrive best at high temperatures. If we find a higher proportion of the amino-acids IVYWREL in the Aquificales sequences for T genes than for E genes, this would imply that composition biases could be at the origin of the signal favouring the Thermotogales placement. We find that T genes in Aquifex aeolicus and Sulfurihydrogenibium azorense contain 45.4% of IVYWREL amino-acids, against 44.4% for E genes. As the difference is not significant (χ2 test, p-value = 0.61), there is no evidence that the T signal is coming from compositional artefacts.
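The IVYWREL proportion compared above is a simple residue count. A minimal Python sketch follows; the function name and the characters treated as gaps are assumptions, and the sketch does not reproduce the χ2 comparison between gene sets.

```python
def ivywrel_fraction(sequence, gap_chars="-?X"):
    """Fraction of residues belonging to the IVYWREL set.

    Gap and unknown characters (an assumption of this sketch) are
    excluded from both the numerator and the denominator.
    """
    residues = [aa for aa in sequence.upper() if aa not in gap_chars]
    if not residues:
        return 0.0
    return sum(aa in "IVYWREL" for aa in residues) / len(residues)
```

To obtain one value per gene set, the function would be applied to the concatenation of the Aquificales sequences of the T genes on one hand and of the E genes on the other.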

Consequently it appears that neither the signal favouring a close relationship between Aquificales and epsilon-Proteobacteria nor the signal favouring a close relationship between Aquificales and Thermotogales seems to be induced by a reconstruction artefact, namely long branch attraction or compositional convergence. Similarly, this suggests that the trees placing Aquificales close to Thermotogales in the whole genome study may not come from long branch attraction or compositional artefacts. Therefore, incongruences found between the T and E groups of genes probably unveil different gene histories: at least one of these two prevailing signals comes from HGTs.

Detection of Horizontal Gene Transfers in the concatenate

We used the 181 possible Aquificales positions whose likelihoods had been computed earlier to search for evidence of HGTs affecting Aquificales genes. Because the taxonomic sampling was as exhaustive as currently possible, and because all possible positions for Aquificales among Bacteria have been tried, it is expected that few HGTs affecting Aquificales might escape this screening.
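The figure of 181 positions is consistent with a simple count, sketched below in Python under two assumptions that the text does not state explicitly: that the reference tree without Aquificales is fully resolved, and that it contains the 94 aligned sequences minus the two Aquificales (Aquifex aeolicus and Sulfurihydrogenibium azorense), grafted back as a single clade. An unrooted binary tree with n leaves has 2n - 3 branches, each of which is a possible attachment point.

```python
def attachment_points(n_taxa):
    """Number of branches of a fully resolved unrooted tree with n_taxa leaves.

    Each branch is one candidate position for grafting an extra clade.
    """
    if n_taxa < 2:
        raise ValueError("an unrooted tree needs at least two leaves")
    return 2 * n_taxa - 3


# Assumed reading of the dataset: 94 sequences, 2 of them Aquificales.
n_positions = attachment_points(94 - 2)
```

Under these assumptions `n_positions` equals 181, matching the number of positions evaluated.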

Naturally, some genes from other Bacteria present in the dataset also underwent transfers that will not be detected using our approach. But neglecting such transfers should not affect our results, since the focus of this study is the position of Aquificales.

The top curve of Fig. 5 shows the cumulative sum of the log-likelihood differences between the tree in which Aquificales are close to epsilon-Proteobacteria and the tree in which Aquificales are close to Thermotogales. If asked to divide this curve, one would probably cut it in two parts, the first one decreasing, and the second one increasing. This would plead for two signals, first one in favour of the Thermotogales position, and then one in favour of the epsilon-Proteobacterial position. However, this division would be based on the comparison of only two trees, whereas 181 different positions should be compared.
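The curve described above can be computed directly from two vectors of per-site log-likelihoods. The Python sketch below is illustrative only; the function names are assumptions, and taking the minimum of the curve as the turning point is a simplification of the visual division described in the text.

```python
from itertools import accumulate


def cumulative_difference(lnl_e, lnl_t):
    """Cumulated per-site difference lnL(E tree) - lnL(T tree).

    A descending stretch means consecutive sites favour the
    Thermotogales position; an ascending stretch means they favour
    the epsilon-proteobacterial position.
    """
    return list(accumulate(e - t for e, t in zip(lnl_e, lnl_t)))


def turning_point(curve):
    """Index of the site at which the cumulated curve is lowest."""
    return min(range(len(curve)), key=curve.__getitem__)
```

For example, with `lnl_e = [-2.0, -2.0, -1.0, -1.0]` and `lnl_t = [-1.0, -1.0, -2.0, -2.0]`, the curve first descends and then ascends, and the turning point is the second site (index 1).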

We used the Maximum Predictive Partitioning (MPP) algorithm to find the two prevailing signals in the alignment among all 181 compared positions [43]. This algorithm identifies the best way of dividing the data in two parts and assigning each to a specific tree position. The results are displayed in the bottom panel of Fig. 5. The MPP algorithm divides the alignment very close to the site in which the curve changes from descending to ascending trends. The most likely positions assigned to each of the two parts, among all 181 possible positions, are first the tree in which Aquificales are close to Thermotogales, and second the tree in which Aquificales are close to epsilon-Proteobacteria. Therefore, the two dominant signals in the alignment are T and E signals. Furthermore, the sequence concatenate was built following the gene order in the Aquifex aeolicus genome. Consequently, the fact that series of consecutive sites support the same phylogenetic position for Aquifex means that whole genes plead for each hypothesis.
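The idea of dividing the sites in two consecutive segments and assigning each to its best-fitting position can be sketched as an exhaustive search. The Python code below is a simplified stand-in written for illustration; it is not the MPP algorithm of reference [43], whose model-selection details are not described here. The function name and input layout are assumptions.

```python
from itertools import accumulate


def best_two_segment_partition(site_lnl):
    """Exhaustive search for the best split of the sites in two segments.

    site_lnl maps a candidate position name to its per-site
    log-likelihoods (equal lengths, at least two sites).
    Returns (breakpoint, first_model, second_model, total_lnl), where
    sites [0, breakpoint) are assigned to first_model and the
    remaining sites to second_model.
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    # prefix[name][k] is the summed log-likelihood of the first k sites.
    prefix = {
        name: [0.0] + list(accumulate(site_lnl[name])) for name in names
    }
    best = None
    for b in range(1, n_sites):
        first = max(names, key=lambda m: prefix[m][b])
        second = max(names, key=lambda m: prefix[m][n_sites] - prefix[m][b])
        total = prefix[first][b] + prefix[second][n_sites] - prefix[second][b]
        if best is None or total > best[3]:
            best = (b, first, second, total)
    return best
```

Prefix sums keep the search linear in the number of sites for each candidate position, so comparing all 181 positions over an alignment of some 12000 sites remains inexpensive.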

The issue now is to decide which of these two dominant signals is most likely HGT, and which has the highest chance of coming from vertical inheritance. One can rely on the Aquifex aeolicus genomic map to find the solution: if a hypothesis is favoured by an isolated island that concentrates a few genes, it is likely to be the signature of a large horizontal transfer affecting a unique region of the genome. Contrary to the T signal, the signal that favours a close relationship between Aquificales and epsilon-Proteobacteria is limited to a few clustered genes, mainly consisting of the rplL-rpoB-rpoC operon (characterized in E. coli, [44,45]), which seems conserved in most bacterial genomes. This clustering strongly suggests that the epsilon-proteobacterial signal comes from horizontally transferred genes, through a single transfer of the whole rplL-rpoB-rpoC operon, from epsilon-Proteobacteria to Aquificales. Indeed, if only these three genes are concatenated and submitted to phylogenetic analysis, Aquificales are found clustered with epsilon-Proteobacteria with a fairly high bootstrap support (79%, Fig. 6). As these transferred genes are large, they contribute a substantial amount of signal in the complete concatenate. This large transfer appears unexpected, since it concerns informational genes, involved in translation (rplL) and transcription (rpoB-rpoC), but it has already been suggested by Iyer, Koonin and Aravind [46]; the alternative hypothesis of the E signal being the real phylogenetic signal would require repeated HGTs of 11 genes between Thermotogales and Aquificales along all the Aquifex genome (Table 1), or a very large HGT of 11 genes, subsequently scattered along the Aquifex genome. Both explanations seem less likely. Consequently, we favour the hypothesis of a single HGT of the whole rplL-rpoB-rpoC operon from an ancestor of epsilon-Proteobacteria to Aquificales.

Figure 5. Comparison between site likelihoods when Aquificales are placed close to epsilon-Proteobacteria and when they are placed with Thermotogales. Upper panel: summed differences between site log-likelihoods obtained when Aquificales are placed with epsilon-Proteobacteria and when they are placed with Thermotogales. A descending trend means that a consecutive series of sites favours the Thermotogales position (T signal), whereas an ascending trend means that a series of sites favours the epsilon-proteobacterial position (E signal). Genes have been ordered according to their position along the Aquifex genome. Dashed blue lines represent gene boundaries. The red interval represents the genes which appear to contain most of the E signal. The green interval represents gene infB, in which the curve first decreases and then increases. Lower panel: result obtained by the Maximum Predictive Partitioning algorithm when asked to find the most likely partition of the sites in two segments. The a posteriori most likely model for the first segment is the tree in which Aquificales are sister group to Thermotogales, and the second segment is best fitted by the tree in which Aquificales are sister group to epsilon-Proteobacteria.
[Plot not reproduced. X axis: site index in the sequence (0 to 12000). Y axis, upper panel: cumulated site log-likelihoods (-250 to 50). Lower panel: segmentation into T signal and E signal. Marked genes: rplL, rpoB, rpoC, infB.]

Such a hypothesis is relevant to the relative dating of Aquificales and epsilon-Proteobacteria: a transfer from an ancestor of epsilon-Proteobacteria to an ancestor of Aquifex aeolicus and Sulfurihydrogenibium azorense implies that these ancestors are contemporary. Although in trees of life obtained from rRNAs or concatenated proteins and rooted between Bacteria and Archaea-Eukaryota Aquificales are found very close to the root of Bacteria, the divergence between Aquifex and Sulfurihydrogenibium should not be more ancient than the divergence of epsilon-Proteobacteria from other Proteobacteria.

A gene-by-gene analysis adds support to the hypothesis that the dominating signal places Aquificales with Thermotogales. Table 1 shows that, among the 39 gene phylogenies that can be unambiguously interpreted, 11 place Aquificales with Thermotogales while only 2 (RpoB and RpoC) place Aquificales with epsilon-Proteobacteria. The phylogeny of rplL is difficult to interpret, with Aquificales placed close to Delta-proteobacteria and epsilon-Proteobacteria, which might be due to the short length of this

Table 1: Position of Aquificales in phylogenies built from single genes present in the concatenated alignment

Locus index  Gene name             Phylogeny: group neighbouring Aquificales
8            rpsJ                  Thermotogales
11           rplD                  Deinococcus/Thermus
13           rplB                  Fusobacterium nucleatum
16           rplV                  Thermoanaerobacter tengcongensis
17           rpsC                  Thermotogales
18           rplP                  Planctomycetes
20           rpsQ                  Chloroflexi
73           rpsK                  Planctomycetes
74           rpsM                  a clade comprising Spirochaetes and Bacteroidetes/Chlorobi
123          rpsP                  Bdellovibrio
226          rpsO                  Planctomycetes
287          smb                   Thermotogales
461          gatB                  Thermotogales
609          hypothetical protein  Clostridiales
712          frr                   Chloroflexi
735          rpsL2                 Thermotogales
792          cycB1                 a clade comprising Thermoanaerobacter tengcongensis and Bdellovibrio
946          rnc                   Thermotogales
1478         recR                  Leptospira interrogans
1489         trmD                  Thermotogales
1493         dnaG                  a clade comprising Spirochaetes and Thermotogales
1645         rpsE                  Deinococcus/Thermus
1648         rplR                  Clostridiales
1649         rplF                  Thermotogales
1651         rpsH                  a clade comprising Thermotogales and Deinococcus/Thermus
1652         rplE                  Actinobacteria
1654         rplN                  Mycoplasma
1767         rpsT                  Proteobacteria
1773         rpmA                  Borrelia
1777         infC                  Leptospira interrogans
1832         rpsG1                 Thermotogales
1878         rpsI                  Desulfotalea psychrophila
1919         era2                  Thermotogales
1933         rplK                  Thermotogales
1935         rplA                  Chloroflexi
1939         rpoB                  Campylobacter jejuni
1945         rpoC                  Campylobacter jejuni
2007         rpsB                  a clade comprising Thermotogales and Cyanobacteria
2032         infB                  a clade comprising Proteobacteria, Bacteroidetes/Chlorobi, Spirochaetes, Chlamydiales
2042         rplI                  a clade comprising delta-Proteobacteria, Chloroflexi, and Planctomycetes

The locus index gives the position of the gene in the Aquifex aeolicus genome. Results not unambiguously interpretable are not shown.


Figure 6. Unrooted tree obtained from the concatenation of rplL-rpoB-rpoC. Colors and symbols as in Fig. 3.
[Tree image not reproduced. Clade labels: Bacteroidetes/Chlorobi, Chlamydiales, Aquificales, epsilon-Proteobacteria, Proteobacteria, Spirochaetes, Chloroflexi, Actinobacteria, Firmicutes, Thermus/Deinococcus, Thermotogales, Mycoplasma; scale bar: 0.2.]


gene (139 sites). Strikingly, 13 genes place Aquificales with Gracilicutes, either close to Planctomycetes, to Spirochaetes, to Bacteroidetes-Chlorobi or to Proteobacteria. A single dominant pattern does not emerge from these gene trees: therefore they do not argue in favour of a specific relationship between Aquificales and a particular group of Gracilicutes. These results rather suggest either uncertainties in phylogenetic reconstruction or repeated horizontal gene transfers between Aquificales and various Gracilicute donors.

In conclusion, the epsilon-proteobacterial signal in the concatenate of carefully chosen proteins probably derives from horizontally transferred informational genes, and the thermotogal signal might be the signal of vertical descent. This conclusion is perfectly congruent with the results from the whole genome analysis. However, the hypothesis of a close relationship with epsilon-Proteobacteria was originally based upon rare genomic changes. How can this hypothesis be reconciled with our conclusions?

The impact of horizontal gene transfers on rare genomic changes

The prevailing cladistic study arguing that Aquificales should be placed as a neighbour to Proteobacteria was performed by Griffiths and Gupta [6], where inserts in 4 genes were found to support this hypothesis. These 4 genes are rpoB, rpoC, alanyl-tRNA synthetase and inorganic pyrophosphatase.

Interestingly, two of these four genes, rpoB and rpoC, are included in our concatenated alignment. Because they are clustered in the Aquifex aeolicus genome and display the same non-mainstream phylogenetic signal, we have diagnosed them as resulting from HGT from epsilon-Proteobacteria. Therefore, the two large inserts that Griffiths and Gupta found are no proof of a particular relatedness but rather of an HGT.

The alanyl-tRNA synthetase has not been included in our concatenate because tRNA synthetase genes are known to be extremely prone to HGT [47]. The analysis of the alanyl-tRNA synthetase gene family of the HOGENOM database (family HBG008973) confirms that this gene might not be a good phylogenetic marker. In the tree built from this family with PhyML, Aquifex aeolicus is found close to the spirochaete Leptospira, together close to Clostridiales, and the Planctomycete Rhodopirellula baltica is found as a neighbour to Deinococcales (data not shown), among other oddities. All these relations are inconsistent with the tree built from the concatenate and inconsistent with current ideas about bacterial taxonomy. Therefore, using the alanyl-tRNA synthetase gene family to resolve bacterial phylogeny appears inadequate.

Finally, the inorganic pyrophosphatase tree as retrieved from HOGENOM (family HBG000457) shows Aquifex aeolicus inside Proteobacteria, close to Alpha-proteobacteria, which are not monophyletic in this tree. It appears that this gene family has undergone a duplication (Cyanobacteria are represented twice in the tree in widely separated positions) as well as horizontal gene transfers (Archaea are clustered in two groups widely separated in the tree, as are Chlamydiales). Overall, the history of inorganic pyrophosphatase is probably too complex to be used as a marker of species relationships.

Consequently, the rare genomic changes that were used to argue for a specific relatedness between Aquificales and Proteobacteria most likely come from HGT between these two clades, as already observed in the above analyses (Fig. 1 for instance).

The fact that the outer membrane of Aquifex closely resembles the outer membrane of Proteobacteria was also used [4] to argue that Aquificales are more closely related to Proteobacteria than to Thermotogales. It is unclear why this character would be particularly immune to HGT; the outer membrane most likely possesses a strong adaptive value, so that the transfer of the operational genes coding for such a structure could be positively selected and rise to fixation in a species. Given the very high rate of HGT seen in the Aquifex genome, it is not unreasonable to assume that the proteobacterial type of outer membrane might have been transferred to Aquificales. Similarly, the close relationship found between epsilon-Proteobacteria and Aquificales in trees based on cytochromes b and c might also come from an HGT of a whole operon, as concluded by Schutz et al. [48]. In contrast, our counting analysis confirms that informational genes are less prone to HGT than operational genes, and their signal clusters Aquificales and Thermotogales.

Further difficulties to resolve the tree of Bacteria

A possible approach to uncover a putative species tree of Bacteria, or at least a tree for a core set of bacterial genes, would be to remove transferred genes from a dataset, concatenate all genes that have not been detected as having been transferred, and use them to build a phylogenetic tree. Such an approach would be expected to yield better trees, with higher bootstrap supports. However, the phylogeny obtained on the concatenate in the same conditions as before (without recoding) but after removal of the rplL-rpoB-rpoC genes does not show a significantly better support for most of its nodes than the phylogeny shown in Fig. 3 (average bootstrap support for the tree without the three genes, 90.9%, and for the tree with all genes, 90.3%; p-value = 0.17 with a Student paired t-test, p-value = 0.288 with a Wilcoxon signed rank test). This is probably due to the fact that bootstrap supports increase with the number of characters; the reduction in alignment length therefore counters the expected positive effect associated with the removal of discordant signal. Topologically, both trees are highly congruent, with the main noticeable difference being the placement of Fusobacterium nucleatum, which leaves its position as sister-group to Thermotogales and Aquificales in Fig. 3 to nest inside the Firmicutes as a sister group to Mycoplasma. This placement might stem from a long branch attraction, as both Mycoplasma and Fusobacterium have long terminal branches, or alternatively might reveal the true history of Fusobacterium nucleatum, as suggested by Mira and co-workers [49]. Certainly this organism deserves further study, possibly with techniques such as those that were used in this article.

It is interesting to note that the removal of genes thought to have been transferred has not improved the phylogeny. A most promising avenue for further research in deep phylogenies would probably involve the development of models explicitly taking into account HGT, as proposed by Suchard [50] or, in other contexts, by Edwards, Liu and Pearl [51,52] and Ané et al. [53]. HGTs should be modelled as a genuine biological phenomenon on equal footing with vertical descent to represent the evolution of bacterial genomes. The resulting species tree would correspond to the history of those genome parts that have been vertically inherited at any time during evolution. The vertically inherited portions of a genome at a given time need not be vertically inherited at all times, so that a species tree could be inferred as long as, at any time, some vertical signal could be recovered.

An additional difficulty might be that the gene is not necessarily the atomic unit of transfer: transfers may affect only parts of a gene, through recombination. In this respect, the analysis of Figure 5 reveals a striking pattern in the Initiation Factor 2 gene (infB, green interval). In this large gene (the Aquifex aeolicus protein is 805 amino-acids long), the curve of the difference in log-likelihoods between the epsilon-proteobacterial and the thermotogal positions of Aquificales first decreases for about half its length, and then increases. This pattern is suggestive of a recombination event inside the gene.

To test for recombination, we divided the infB alignment in two at the point where this curve changes trend and built phylogenetic trees for both partial alignments (Fig. 7). In the first resulting tree, Aquificales plus Fusobacterium nucleatum make together a sister group to Thermotogales plus Deinococcus/Thermus. In the second tree, Aquificales are a sister group to a subclade of Firmicutes. These two branchings are consistent with the slope of the curve of Fig. 5, first descending, as Aquificales are close to Thermotogales, and then ascending, as Aquificales are far from Thermotogales. To assess whether the differences in the topologies were significant, Consel [42] was used on these last two trees. The first part of the alignment strongly rejected the tree obtained for the second part (AU test p-value: 4 × 10^-36; SH and KH p-value: 0), and vice versa (AU test p-value: 1 × 10^-6; SH and KH p-value: 0). Therefore a strong signal for recombination within the gene infB is found, possibly between Firmicutes and Aquificales.

This indicates that the unit of transfer between Bacteria is not necessarily the gene, but can also be parts of a gene. Models aiming at resolving the bacterial tree may need to take this additional complexity into account.

Conclusion

Overall, the signal in favour of a close relationship between Aquificales and epsilon-Proteobacteria has been shown to come from a lateral transfer rather than from vertical inheritance, both in protein phylogenies and in cladistic analyses. A large HGT involving three consecutive genes encoding two RNA polymerase subunits and a ribosomal protein has been detected. This large gene transfer between epsilon-Proteobacteria and Aquificales can be understood in terms of a shared ecological niche: some epsilon-Proteobacteria are indeed found in hyperthermophilic environments [54].

The present single-gene analyses suggest that gene transfers may have frequently occurred between Aquificales and various Gracilicutes, Proteobacteria in particular, which explains why cladistic analyses of rare genomic changes or of domain contents often place Aquifex inside Gracilicutes.

Bacterial phylogeny is crucial to understand the evolution of the biosphere, as it provides a backbone that allows the evolution of life as revealed from molecular phylogenies to be integrated with the history of the earth, as dug up by geology. There is no doubt that HGT has played a major role in the evolution of Prokaryotes, to the point that there might be no gene that has never undergone HGT; however a few gene families may have seldom been transferred, and they might bear sufficient signal to unveil the vertical history of the genome, provided powerful computational methods modelling both gene transfers and intragenic recombination are developed.

Nonetheless, because Aquificales are often found grouped with Thermotogales, and because this phylogenetic signal does not seem to result from known artefacts such as long branch attraction or compositional bias, if there is a species tree in Bacteria, Aquificales are to be considered as a sister group to Thermotogales. This clarification does not dramatically affect the scenario for the evolution of life proposed by Cavalier-Smith [9], except that Aquificales diverged earlier than proposed. However the present results question the methodology used to build this scenario because the rare genomic changes method requires that HGT does not affect the marker genes used. In the case of the Aquificales, we have shown that this requirement is not fulfilled.

Methods

Whole phylome analysis

In order to get a whole genome view of Aquificales phylogenetic relationships, we queried the HOGENOM database (release 03, October 2005) using the TreePattern program in FamFetch [55]. HOGENOM is a database that clusters sequences from whole genomes into homologous gene families, and builds trees based on these families with PhyML using a gamma law with 4 classes of substitution rates, with estimated alpha parameter and proportion of invariable sites. Trees corresponding to all 892 families in which there was a sequence from Aquifex aeolicus were automatically analysed, and each sequence from Aquifex was classified according to what group of species appeared as its closest neighbour, not taking into account branch support or branch length. This gave counts of Aquifex genes found close to Thermotogales, Firmicutes, epsilon-Proteobacteria, etc. Cases where Aquifex genes were found close to a non-monophyletic group of species were discarded, which left 578 gene trees. These counts were further classified into two functional categories, "informational genes" and "non-informational genes", through TIGRFAM annotations [56]. A functional category could be determined for 351 families. "Informational genes" were genes classified in TIGRFAMs whose function was part of "Transcription", "DNA metabolism" or "Protein synthesis"; "non-informational genes" were those whose role was part of other major functional classes.
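The counting step can be pictured with the short Python sketch below. It is only an illustration of the bookkeeping, not the TreePattern query itself: the input layout (one closest-neighbour label per family, `None` when the neighbouring group is not monophyletic) and the family identifiers in the usage example are hypothetical.

```python
from collections import Counter


def count_neighbours(neighbour_by_family):
    """Tally the closest-neighbour group of the Aquifex sequence per family.

    neighbour_by_family maps a gene-family identifier to the name of
    the neighbouring group, or to None when that group is not
    monophyletic (such families are discarded, as in the text).
    Returns (counts per group, number of families kept).
    """
    kept = {
        family: group
        for family, group in neighbour_by_family.items()
        if group is not None
    }
    return Counter(kept.values()), len(kept)


# Hypothetical family identifiers, for illustration only.
example = {
    "FAM_A": "Thermotogales",
    "FAM_B": "epsilon-Proteobacteria",
    "FAM_C": None,
    "FAM_D": "Thermotogales",
}
counts, n_kept = count_neighbours(example)
```

In this toy example three families are kept, two of which place Aquifex next to Thermotogales.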

Concatenate assembly

Nearly universal gene families which had only one copy per genome were used to minimize problems of ill-defined orthology. Consequently, gene families from the HOGENOM database of families of homologous genes (release 03, October 2005) that displayed a wide species coverage with no or very low redundancy in all species were selected. This provided 70 gene families. Sequences from representative genomes from Archaea were retrieved from these families, and sequences from genomes not present in release 03 of HOGENOM but whose phylogenetic position was interesting were included in the families. These genomes are listed in Additional files 1, 2 and 3; they were downloaded from the Joint Genome Institute [57], The Institute for Genomic Research [58] or the National Center for Biotechnology Information [59], and were searched for homologous genes using BLAST [60]; only the best hit was retrieved. The gene families were subsequently aligned using MUSCLE v3.52 [61] and submitted to a phylogenetic analysis using the NJ algorithm [62] with Poisson distances as implemented in Phylo_Win [63]. During this step, families in which there seemed to be a gene transfer between a bacterial species and Archaea were discarded, as well as aminoacyl-tRNA synthetases, which are known to be prone to HGT [47]. In the rare families where there were two sequences from the same species, the sequence showing the largest terminal branch length or whose position was most at odds with the NCBI classification was discarded. This whole process provided 56 gene families and 94 bacterial species. Only bacterial sequences were used in the rest of the study, because our focus is on the bacterial phylogeny itself. The 56 families were submitted to Gblocks [64] to discard parts of the alignments that were unreliable, but using a non-stringent site selection, because the subsequent analyses should make it possible to sort biased from genuine signal. Consequently, the following Gblocks parameters were used: the minimum numbers of sequences used to define a conserved or a flanking position were set at 50% of the total number of sequences, the minimum length of a block was set at 2 sites, and all positions could be kept by the algorithm, even if they contained gaps. The resulting alignments were then concatenated using SCaFoS [65], following the order of genes along the Aquifex aeolicus genome. The amount of missing data was low, reaching 21% at its maximum in Thermotoga petrophila.

Figure 7. Unrooted trees corresponding to the infB gene. Left: tree corresponding to the first 301 sites. Right: tree corresponding to the remaining 246 sites. Colors as in Fig. 3. [Tree images not reproduced. Leaves are labelled with the abbreviated species names and grouped into Aquificales, Thermotogales, Thermus/Deinococcus, Firmicutes, Mycoplasma, Actinobacteria, Cyanobacteria, Chloroflexi, Bacteroidetes/Chlorobi, Planctomycetes, Chlamydiales, Spirochaetes and Proteobacteria; scale bars: 0.1 and 0.2.]

Phylogenetic analyses

A phylogenetic tree was built from the concatenate under the Maximum Likelihood criterion using PhyML v.2.4.4 [66] with the JTT model [67] and a discretized gamma law with 4 categories to model evolutionary rate variation. This first tree was used to compute site-specific evolutionary rates using BppML from the Bio++ package [68], which allowed for the removal of saturated sites. A new tree was built using this refined alignment, with the same parameters plus an estimated proportion of invariant sites and with a non-parametric bootstrap analysis (200 replicates), and was used as a reference for the rest of the work. An estimated proportion of invariant sites was not used in the previous analysis because it had not been implemented in the version of Bio++ that was used. Notably, the topology was found to be unchanged when Aquificales were removed from the alignment and the tree recomputed. Similarly, the topology was nearly identical when two free-living epsilon-Proteobacteria (Sulfurovum NBC37-1 and the thermophilic Nitratiruptor SB155-2 [40]) were added and the tree recomputed with PhyML v3.0; for this tree, the minimum of SH-like and chi2-based support was computed instead of bootstrap support [69]. An additional test was performed to assess the impact of compositional heterogeneity as well as saturation: the alignment without saturated sites was recoded in 4 categories [70,37]. In this recoding, aromatic (FWY) and hydrophobic (MILV) amino acids were grouped in a single state, basic amino acids (HKR) in another, acidic (DENQ) amino acids in a third state, and the fourth state contained all other amino acids (AGPST) with the exception of cysteine, which was coded as missing data. The recoded alignment was subjected to a phylogenetic analysis with the GTR model [71], an estimated proportion of invariant sites, a gamma law discretized in 8 categories with its alpha parameter estimated, and 200 bootstrap replicates.

The tree without the Aquificales was used as a scaffold upon which all possible Aquificales positions were tried in turn. The likelihood of each of these positions was computed using BppML from the Bio++ package. Evolutionary rates per site as well as likelihoods per site were simultaneously inferred. Site evolutionary rates were obtained by computing the average of the gamma law rate categories weighted by their posterior probabilities.
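The posterior-mean site rate mentioned above can be sketched as follows. This is a minimal illustration, not BppML's code: the category rates and per-category site likelihoods are made-up numbers, and equal prior weights for the gamma categories are assumed by default.

```python
def posterior_mean_rate(cat_rates, cat_site_liks, cat_priors=None):
    """Average of the rate categories weighted by their posterior probabilities.

    cat_rates     -- rate of each discrete gamma category
    cat_site_liks -- likelihood of the site under each category
    cat_priors    -- prior weight of each category (equal weights if omitted)
    """
    n_cat = len(cat_rates)
    if cat_priors is None:
        cat_priors = [1.0 / n_cat] * n_cat
    joint = [p * lik for p, lik in zip(cat_priors, cat_site_liks)]
    total = sum(joint)  # marginal likelihood of the site
    return sum(rate * j / total for rate, j in zip(cat_rates, joint))

# Hypothetical category rates (mean 1) and site likelihoods for one site.
rates = [0.1, 0.5, 1.0, 2.4]
fast_site = posterior_mean_rate(rates, [0.0, 0.0, 1e-5, 1e-5])
flat_site = posterior_mean_rate(rates, [2e-4] * 4)
```

A site whose likelihood is concentrated on the fast categories gets a rate above 1, whereas a site that is uninformative about its rate gets the prior mean.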

The tree containing only the rplL-rpoB-rpoC genes was obtained with PhyML as described above, with a non-parametric bootstrap analysis based upon 500 replicates.

Individual gene trees were built using PhyML with the same parameters as above, except that the gamma law was discretized in 8 categories.

Concatenate segmentation and HGT identification

We wanted to know which segmentation of the alignment into two segments was the most likely according to the site likelihoods for all topologies. It was computed using Sarment [72] with the Maximum Predictive Partitioning algorithm [43]. The algorithm took as input a matrix containing the site log-likelihoods for all 181 topologies tested (obtained by placing the Aquificales in all possible positions in the backbone bacterial phylogeny) and for the whole alignment. The best log-likelihood of a given segmentation is the sum of the best log-likelihoods of its segments, which are computed as follows: on a segment, for each of the 181 topologies tested, the log-likelihood of a topology is the sum of its site log-likelihoods over the sites of the segment. This procedure produces 181 log-likelihoods, the maximum of which is the best log-likelihood of this segment. Once this maximum is found, it associates a most likely topology with each segment of the alignment. All statistical analyses were done with the seqinR package [73] in R [74].
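For two segments, the scoring rule described above can be evaluated by exhaustive search over breakpoints. The sketch below is a brute-force stand-in, not Sarment's Maximum Predictive Partitioning implementation; the function name and the toy matrix are hypothetical.

```python
from itertools import accumulate

def best_two_segments(site_loglik):
    """Best split of an alignment into two segments.

    site_loglik[t][i] is the log-likelihood of site i under topology t.
    Returns (score, breakpoint, left_topology, right_topology), where the left
    segment holds sites [0, breakpoint) and the right one the remaining sites.
    """
    n_sites = len(site_loglik[0])
    # Prefix sums give the log-likelihood of any segment in constant time.
    prefix = [[0.0] + list(accumulate(row)) for row in site_loglik]
    best = None
    for b in range(1, n_sites):
        left = max((p[b], t) for t, p in enumerate(prefix))
        right = max((p[n_sites] - p[b], t) for t, p in enumerate(prefix))
        score = left[0] + right[0]
        if best is None or score > best[0]:
            best = (score, b, left[1], right[1])
    return best

# Toy matrix: topology 0 fits the first two sites, topology 1 the last two.
toy = [
    [-1, -1, -5, -5],
    [-5, -5, -1, -1],
]
result = best_two_segments(toy)
```

The same prefix-sum idea extends to more than two segments through dynamic programming, which is closer in spirit to what a partitioning algorithm has to do.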

Abbreviations

HGT: Horizontal Gene Transfer; rRNA: ribosomal RiboNucleic Acid; indel: insertion-deletion; MPP: Maximum Predictive Partitioning.

Authors' contributions

MG and BB designed the study. LG performed the segmentation analysis, and BB performed the other experiments. BB wrote most of the manuscript, which was improved by LG and MG.

Additional material

Acknowledgements

We wish to thank Vincent Daubin, Anamaria Necsulea, Leonor Palmeira and Sophie Abby for valuable discussions and help with R and the data. Preliminary sequence data was obtained from The Institute for Genomic Research through the website at http://www.tigr.org. Sequencing of Sulfurihydrogenibium azorense Az-Fu was accomplished with support from NSF. Sequencing of Thermotoga neapolitana DSM 4359 was accomplished with support from DOE. This work was supported by Action Concertée Incitative IMPBIO. We thank the Centre de Calcul de l'IN2P3 for providing computer resources. Bastien Boussau acknowledges a PhD scholarship from the Centre National de la Recherche Scientifique.

References

1. Wolf YI, Rogozin IB, Grishin NV, Koonin EV: Genome trees and the tree of life. Trends Genet 2002, 18:472-479.
2. Deeds EJ, Hennessey H, Shakhnovich EI: Prokaryotic phylogenies inferred from protein structural domains. Genome Res 2005, 15:393-402.
3. Klenk HP, Meier TD, Durovic P, Schwass V, Lottspeich F, Dennis PP, Zillig W: RNA polymerase of Aquifex pyrophilus: implications for the evolution of the bacterial rpoBC operon and extremely thermophilic bacteria. J Mol Evol 1999, 48:528-541.
4. Cavalier-Smith T: The neomuran origin of archaebacteria, the negibacterial root of the universal tree and bacterial megaclassification. Int J Syst Evol Microbiol 2002, 52:7-76.
5. Coenye T, Vandamme P: A genomic perspective on the relationship between the Aquificales and the epsilon-Proteobacteria. Syst Appl Microbiol 2004, 27:313-322.
6. Griffiths E, Gupta RS: Signature sequences in diverse proteins provide evidence for the late divergence of the Order Aquificales. Int Microbiol 2004, 7:41-52.
7. Cavalier-Smith T: Rooting the tree of life by transition analyses. Biol Direct 2006, 1:19-19.
8. Kunisawa T: Dichotomy of major bacterial phyla inferred from gene arrangement comparisons. J of Theor Biol 2006, 239:367-375.
9. Cavalier-Smith T: Cell evolution and Earth history: stasis and revolution. Philos T R Soc B 2006, 361:969-1006.
10. Jain R, Rivera MC, Lake JA: Horizontal gene transfer among genomes: the complexity hypothesis. P Natl A Sci USA 1999, 96:3801-3806.
11. Felsenstein J: Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 1978, 27:401-410.
12. Brinkmann H, Giezen M van der, Zhou Y, Poncelin De Raucourt G, Philippe H: An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol 2005, 54:743-757.
13. Weisburg WG, Giovannoni SJ, Woese CR: The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction. Syst Appl Microbiol 1989, 11:128-134.
14. Foster PG, Hickey DA: Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J Mol Evol 1999, 48:284-290.
15. Schopf JW: Fossil evidence of Archaean life. Philos T R Soc B 2006, 361:869-85.
16. Galtier N, Lobry JR: Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol 1997, 44:632-636.
17. Zeldovich KB, Berezovsky IN, Shakhnovich EI: Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol 2007, 3:e5-e5.
18. Rokas A, Holland PW: Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol 2000, 15:454-459.
19. Deckert G, Warren PV, Gaasterland T, Young WG, Lenox AL, Graham DE, Overbeek R, Snead MA, Keller M, Aujay M, Huber R, Feldman RA, Short JM, Olsen GJ, Swanson RV: The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 1998, 392:353-358.
20. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O, Salzberg SL, Smith HO, Venter JC, Fraser CM: Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 1999, 399:323-329.
21. The HOGENOM database [http://pbil.univ-lyon1.fr/databases/hogenom3.html]
22. Zar JH: Biostatistical Analysis 4th edition. Upper Saddle River: Prentice Hall; 1999.
23. Leigh JW, Susko E, Baumgartner M, Roger AJ: Testing congruence in phylogenomic analysis. Syst Biol 2008, 57:104-115.
24. Bapteste E, Susko E, Leigh J, Ruiz-Trillo I, Bucknam J, Doolittle WF: Alternative methods for concatenation of core genes indicate a lack of resolution in deep nodes of the prokaryotic phylogeny. Mol Biol Evol 2008, 25:83-91.

Additional file 1
The list of species used in the study, and their abbreviated names as found in the figures of the article.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-272-S1.xls]

Additional file 2
Unrooted phylogenetic tree of Bacteria obtained after the addition of two free-living epsilon-Proteobacteria, Sulfurovum NBC37-1 and thermophilic Nitratiruptor SB155-2.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-272-S2.jpeg]

Additional file 3
The list of 56 HOGENOM gene families used to estimate species trees, with the corresponding function description.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-272-S3.xls]

25. Yang Z: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 1994, 39:306-314.
26. Lartillot N, Brinkmann H, Philippe H: Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 2007, 7(Suppl 1):S4-S4.
27. Brochier C, Philippe H: Phylogeny: a non-hyperthermophilic ancestor for bacteria. Nature 2002, 417:244-244.
28. Battistuzzi FU, Feijao A, Hedges SB: A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol Biol 2004, 4:44-44.
29. Bern M, Goldberg D: Automatic selection of representative proteins for bacterial phylogeny. BMC Evol Biol 2005, 5:34-34.
30. Daubin V, Gouy M, Perrière G: A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res 2002, 12(7):1080-1090.
31. Wagner M, Horn M: The Planctomycetes, Verrucomicrobia, Chlamydiae and sister phyla comprise a superphylum with biotechnological and medical relevance. Curr Opin Biotech 2006, 17:241-249.
32. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P: Toward automatic reconstruction of a highly resolved tree of life. Science 2006, 311:1283-1287.
33. Beiko RG, Harlow TJ, Ragan MA: Highways of gene sharing in prokaryotes. P Natl A Sci USA 2005, 102:14332-14337.
34. Ababneh F, Jermiin LS, Ma C, Robinson J: Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics 2006, 22:1225-31.
35. Bowker AH: A test for symmetry in contingency tables. J Am Stat Assoc 1948, 43:572-574.
36. Kreil DP, Ouzounis CA: Identification of thermophilic species by the amino acid compositions deduced from their genomes. Nucleic Acids Res 2001, 29:1608-15.
37. Rodríguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Philippe H: Detecting and overcoming systematic errors in genome-scale phylogenies. Syst Biol 2007, 56:389-399.
38. Raymond J, Zhaxybayeva O, Gogarten JP, Gerdes SY, Blankenship RE: Whole-genome analysis of photosynthetic prokaryotes. Science 2002, 298:1616-1620.
39. Susko E, Roger AJ: On reduced amino acid alphabets for phylogenetic inference. Mol Biol Evol 2007, 24:2139-50.
40. Nakagawa S, Takaki Y, Shimamura S, Reysenbach AL, Takai K, Horikoshi K: Deep-sea vent epsilon-proteobacterial genomes provide insights into emergence of pathogens. P Natl A Sci USA 2007, 104:12146-12150.
41. Shimodaira H: An approximately unbiased test of phylogenetic tree selection. Syst Biol 2002, 51:492-508.
42. Shimodaira H, Hasegawa M: CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 2001, 17:1246-1247.
43. Guéguen L: Segmentation by maximal predictive partitioning according to composition biases. Computational Biology, LNCS 2066, 2001:32-45.
44. Newman AJ, Linn TG, Hayward RS: Evidence for co-transcription of the RNA polymerase genes rpoBC with a ribosomal protein gene of Escherichia coli. Mol Gen Genet 1979, 169:195-204.
45. Yamamoto M, Nomura M: Cotranscription of genes for RNA polymerase subunits beta and beta' with genes for ribosomal proteins in Escherichia coli. P Natl A Sci USA 1978, 75:3891-3895.
46. Iyer LM, Koonin EV, Aravind L: Evolution of bacterial RNA polymerase: implications for large-scale bacterial phylogeny, domain accretion, and horizontal gene transfer. Gene 2004, 23:73-88.
47. Wolf YI, Aravind L, Grishin NV, Koonin EV: Evolution of aminoacyl-tRNA synthetases: analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res 1999, 9:689-710.
48. Schütz M, Brugna M, Lebrun E, Baymann F, Huber R, Stetter KO, Hauska G, Toci R, Lemesle-Meunier D, Tron P, Schmidt C, Nitschke W: Early evolution of cytochrome bc complexes. J Mol Biol 2000, 300:663-675.
49. Mira A, Pushker R, Legault BA, Moreira D, Rodríguez-Valera F: Evolutionary relationships of Fusobacterium nucleatum based on phylogenetic analysis and comparative genomics. BMC Evol Biol 2004, 4:50-50.
50. Suchard MA: Stochastic models for horizontal gene transfer: taking a random walk through tree space. Genetics 2005, 170:419-31.
51. Edwards SV, Liu L, Pearl DK: High-resolution species trees without concatenation. P Natl A Sci USA 2007, 104:5936-5941.
52. Liu L, Pearl DK: Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol 2007, 56:504-14.
53. Ané C, Larget B, Baum DA, Smith SD, Rokas A: Bayesian estimation of concordance among gene trees. Mol Biol Evol 2007, 24:412-26.
54. Nakagawa S, Takaki Y, Shimamura S, Reysenbach AL, Takai K, Horikoshi K: Deep-sea vent epsilon-proteobacterial genomes provide insights into emergence of pathogens. P Natl A Sci USA 2007, 104:12146-12150.
55. Dufayard JF, Duret L, Penel S, Gouy M, Rechenmann F, Perrière G: Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases. Bioinformatics 2005, 21:2596-2603.
56. Haft DH, Selengut JD, White O: The TIGRFAMs database of protein families. Nucleic Acids Res 2003, 31:371-373.
57. The Joint Genome Institute [http://www.jgi.doe.gov/]
58. The Institute for Genomic Research [http://www.tigr.org/]
59. The National Center for Biotechnology Information [http://www.ncbi.nlm.nih.gov/]
60. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
61. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32:1792-1797.
62. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987, 4:406-425.
63. Galtier N, Gouy M, Gautier C: SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci 1996, 12:543-548.
64. Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 2000, 17:540-552.
65. Roure B, Rodriguez-Ezpeleta N, Philippe H: SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 2007, 7(Suppl 1):S2-S2.
66. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52:696-704.
67. Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 1992, 8:275-282.
68. Dutheil J, Gaillard S, Bazin E, Glemin S, Ranwez V, Galtier N, Belkhir K: Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics 2006, 7:188-188.
69. Anisimova M, Gascuel O: Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst Biol 2006, 55:539-552.
70. Hrdy I, Hirt RP, Dolezal P, Bardonová L, Foster PG, Tachezy J, Embley TM: Trichomonas hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I. Nature 2004, 432:618-622.
71. Lanave C, Preparata G, Saccone C, Serio G: A new method for calculating evolutionary substitution rates. J Mol Evol 1984, 20:86-93.
72. Guéguen L: Sarment: Python modules for HMM analysis and partitioning of sequences. Bioinformatics 2005, 21:3427-3428.
73. Charif D, Thioulouse J, Lobry JR, Perrière G: Online synonymous codon usage analyses with the ade4 and seqinR packages. Bioinformatics 2005, 21:545-547.
74. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2005. ISBN 3-900051-07-0

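4 Improving Methods of Phylogenetic Reconstruction

The preceding article showed that compositional heterogeneity is a very difficult problem for phylogenetic reconstruction, as it cannot be erased easily. The best approach to reconstructing phylogeny when there are compositional biases involves better models of evolution.

These better models of evolution are nonhomogeneous (see section […]) and are thought to be difficult to work with, as they render the process of evolution irreversible. Here we show that this irreversibility does not prevent one from using classical and efficient algorithms. As a proof of concept, we programmed nhPhyML, an algorithm that implements a model designed to tackle compositional heterogeneity.

This article has been published in Systematic Biology.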

Syst. Biol. 55(5):756–768, 2006

Copyright © Society of Systematic Biologists

ISSN: 1063-5157 print / 1076-836X online

DOI: 10.1080/10635150600975218

Efficient Likelihood Computations with Nonreversible Models of Evolution

BASTIEN BOUSSAU AND MANOLO GOUY

Laboratoire de Biometrie et Biologie Evolutive (UMR 5558); CNRS; Universite Lyon 1, 43 boulevard 11 nov 1918, 69622, Villeurbanne Cedex, France; E-mail: [email protected] (B.B.)

Abstract. Recent advances in heuristics have made maximum likelihood phylogenetic tree estimation tractable for hundreds of sequences. Notably, these algorithms are currently limited to reversible models of evolution, in which Felsenstein's pulley principle applies. In this paper we show that by reorganizing the way likelihood is computed, one can efficiently compute the likelihood of a tree from any of its nodes with a nonreversible model of DNA sequence evolution, and hence benefit from cutting-edge heuristics. This computational trick can be used with reversible models of evolution without any extra cost. We then introduce nhPhyML, the adaptation of the nonhomogeneous nonstationary model of Galtier and Gouy (1998; Mol. Biol. Evol. 15:871–879) to the structure of PhyML, as well as an approximation of the model in which the set of equilibrium frequencies is limited. This new version shows good results both in terms of exploration of the space of tree topologies and ancestral G+C content estimation. Finally, we apply it to slowly evolving sites of rRNA sequences and conclude that the model and a wider taxonomic sampling still do not plead for a hyperthermophilic last universal common ancestor. [Efficient algorithm; LUCA; maximum likelihood; molecular phylogeny; nonreversible model of evolution; PhyML; origin of life; root of life.]

Research in molecular phylogeny aims at reconstructing historical relations between genes or species while trying to capture the true nature of the evolutionary process itself. Both can be estimated at the same time through the use of statistical modeling. Maximum likelihood or the Bayesian framework permit the estimation of parameters of the evolutionary model such as the transition/transversion ratio, the equilibrium base composition, and the tree itself, topology and branch lengths included. Optimizing all these parameters is computationally intensive: the number of possible topologies increases factorially with the number of taxa considered, which makes it necessary to use heuristics when exploring the space of tree topologies. Most recent algorithms (e.g., PhyML [Guindon and Gascuel, 2003], RAxML [Stamatakis et al., 2005]) are able to find trees with excellent likelihood scores for hundreds of sequences, but only with reversible models of evolution. All these reversible models are homogeneous and stationary, i.e., they suppose that the process of state evolution is constant all over the tree. If this hypothesis were true, sequences sharing a common ancestor would have the same expected base frequencies.

More precisely, a process of evolution is homogeneous when the state distribution probability simply depends on the time separating it from a given past state distribution probability and not on the branch in the tree: homogeneity is the feature of a process of evolution that is constant in pattern over the whole tree. On the other hand, stationarity is the feature of a process of evolution that keeps the state distribution probability constant over the whole tree: the probability of drawing a given state is the same wherever on the tree the sampling is done. A process can be stationary and not be homogeneous, as is the case for Ziheng Yang's codon model in which the nonsynonymous-to-synonymous ratio varies across branches while codon equilibrium frequencies remain constant all over the tree (Yang, 1998). Conversely, nonstationarity induces nonhomogeneity, as the process of evolution depends upon the equilibrium frequencies.

The analysis of extant sequences shows that homologous genes vary widely in their composition. As they all stem from a common ancestor, this is evidence that sequence evolution is at least not stationary: two sequences in two different species or at two different periods evolve towards different compositions.

The use of nonhomogeneous and nonstationary models that account for this variability in evolution permits minimizing compositional biases and hence improving phylogenetic reconstructions (Galtier and Gouy, 1998; Tarrio et al., 2001; Herbeck et al., 2005). Unfortunately, removing the homogeneity and stationarity hypotheses implies abandoning reversibility, and hence prevents one from using the most efficient algorithms, precisely when these parameter-rich models would need them most.

In this article we show in the general case that it is possible to use recent algorithms with nonreversible models of sequence evolution. We first explain how the likelihood of a tree is computed and how the reversibility property is used in recent heuristics to avoid dispensable calculations during tree space search. We then prove that the same computational trick can be used with nonreversible models of evolution through a reorganization of the way likelihood is computed. As reversible models of evolution can also be used in this framework, this work can be considered as a generalization of the usual formulas, which considerably broadens the range of models that can be used for a phylogenetic analysis, as all nonhomogeneous as well as nonstationary models can be used.

We then report nhPhyML, the first adaptation of PhyML (Guindon and Gascuel, 2003), a very fast and efficient algorithm, to a nonreversible model of evolution; namely, the implementation of Tamura's model at each branch of a tree introduced by Galtier and Gouy (Tamura, 1992; Galtier and Gouy, 1998). We also introduce nhPhyML-Discrete, an approximation of nhPhyML. This implementation shows better performance than nhPhyML in the exploration of the space of tree topologies and similar accuracy in the estimation of the ancestral G+C content. Finally, we apply nhPhyML to slowly evolving sites of ribosomal RNA and conclude that a wider taxonomic sampling than in Galtier et al. (1999) still does not support a hyperthermophilic last universal common ancestor.

COMPUTING THE LIKELIHOOD OF A TREE FROM ANY NODE UNDER A NONREVERSIBLE MODEL OF DNA SEQUENCE EVOLUTION

Computing the Likelihood of a Tree

We first explain how one computes the likelihood of a phylogenetic tree with DNA sequences using the following example (Fig. 1).

Most commonly, sites are assumed to evolve independently of each other: a site does not depend on its neighbors' states but only on its past state. As a consequence, the likelihood of a tree for a whole sequence is obtained by multiplying the likelihoods obtained at single sites.
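As a small illustration of the independence assumption above (the per-site values are made up), the likelihood of an alignment is the product of its site likelihoods. Implementations commonly work with the sum of logarithms instead, because the raw product underflows for long alignments:

```python
import math

# Hypothetical per-site likelihoods for a 1000-site alignment (illustrative only).
site_likelihoods = [1e-5] * 1000

# Direct product: mathematically correct, but underflows to 0.0 in double precision.
raw_product = math.prod(site_likelihoods)

# Sum of logarithms: the same quantity on a log scale, numerically stable.
log_likelihood = sum(math.log(lik) for lik in site_likelihoods)
```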

The likelihood $L_s$ of the tree given in Figure 1 for a single site $s$ is computed as follows:

$$
L_s = \sum_{x \in \Omega} \Bigg( P(R = x) \times \sum_{z \in \Omega} \big[ P_{xz}(l_A, v_A)\, L_{s,\mathrm{low}(RA)}(A = z) \big] \times \sum_{y \in \Omega} P_{xy}(l_U, v_U) \bigg\{ \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big] \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \bigg\} \Bigg) \tag{1}
$$

where $P_{xy}(l_A, v_A)$ is the probability for base $x$ to change into base $y$ along a branch of length $l_A$ with other evolutionary parameters $v_A$, $P(R = x)$ is the probability of having base $x$ at the root $R$, and $\Omega = \{A, T, C, G\}$ is the set of possible DNA bases. $L_{s,\mathrm{low}(RA)}(A = z)$ is the lower conditional likelihood (Felsenstein, 1981) of observing the data downstream from branch $RA$, conditional on the underlying subtree and on having base $z$ at node $A$. For each subtree, one can define four conditional likelihoods, one for each DNA base. Once these conditional likelihoods have been computed for a subtree, as long as its topology and branch lengths do not change, they can be re-used if one moves the whole subtree around the topology. This property is used in recent heuristics to search for the most likely phylogenetic tree. These conditional likelihoods are called lower in the sense that they do not contain the root.

FIGURE 1. Example rooted tree for likelihood computation. This tree is composed of a root $R$, an internal node $U$, three other nodes or leaves $A$, $B$, and $C$, and four branches of length $l_A$, $l_B$, $l_C$, $l_U$, and other evolutionary parameters $v_A$, $v_B$, $v_C$, $v_U$. We are here interested in the likelihood of the tree for a single site. The internal node states are unknown and are therefore represented as variables $x$ at node $R$, $y$ at node $U$, $z$ at node $A$, $q$ at node $B$, and $v$ at node $C$. Arrows represent the evolutionary direction, from the root of the tree to its leaves.

Lower conditional likelihoods are defined recursively.For a leaf C :

Ls,low(UC)(C = v) =

1 if base v is at site s of leaf C

0 otherwise(2)

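And for a subtree whose root is U:

$$
L_{s,\mathrm{low}(RU)}(U = y) = \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big] \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big]
\tag{3}
$$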
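The recursion of Equations (2) and (3), and the root sum of Equation (1), can be sketched in a few lines of Python. This is only an illustration on the tree of Figure 1: the Jukes-Cantor matrix stands in for the generic $P_{xy}(l, v)$, and the branch lengths, observed bases, and function names are assumptions made for the example, not values from the text.

```python
import math

BASES = "ACGT"

def jc69(length):
    """Jukes-Cantor transition probabilities (assumed stand-in for P_xy(l, v))."""
    e = math.exp(-4.0 * length / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {x: {y: (same if x == y else diff) for y in BASES} for x in BASES}

def leaf_lower(observed):
    """Equation (2): 1 for the observed base, 0 otherwise."""
    return {b: (1.0 if b == observed else 0.0) for b in BASES}

def internal_lower(children):
    """Equation (3): children is a list of (branch_length, lower_likelihoods)."""
    out = {}
    for y in BASES:
        prod = 1.0
        for length, low in children:
            p = jc69(length)
            prod *= sum(p[y][q] * low[q] for q in BASES)
        out[y] = prod
    return out

def site_likelihood(root_freqs, children):
    """Equation (1): sum over root states of P(R = x) times the subtree terms."""
    low_root = internal_lower(children)
    return sum(root_freqs[x] * low_root[x] for x in BASES)

# Tree of Figure 1 with hypothetical branch lengths and observed bases.
low_A, low_B, low_C = leaf_lower("A"), leaf_lower("A"), leaf_lower("G")
low_U = internal_lower([(0.1, low_B), (0.2, low_C)])
uniform = {b: 0.25 for b in BASES}
L_s = site_likelihood(uniform, [(0.3, low_A), (0.05, low_U)])
print(L_s)
```

Because `low_U` depends only on the subtree below U, it could be stored and reused when the subtree is moved elsewhere, which is the property the heuristics discussed in the text rely on.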

Computing the Likelihood When the Model of Evolution Is Reversible

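Reversibility. When computing the likelihood of a phylogenetic tree, a root R must be specified. If the model is homogeneous and reversible, the process of evolution is stationary: wherever the root is, its base proportions are the same, i.e., they are the equilibrium frequencies of the process, noted $\pi$: $P(R = x) = \pi_x$. Equation (1) can then be rewritten:

$$
\begin{aligned}
L_s = \sum_{x \in \Omega} \Bigg( & \pi_x \sum_{y \in \Omega} P_{xy}(l_U, v_U) \times \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big] \\
& \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \\
& \times \sum_{z \in \Omega} \big[ P_{xz}(l_A, v_A)\, L_{s,\mathrm{low}(RA)}(A = z) \big] \Bigg)
\end{aligned}
\tag{4}
$$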

Reversibility means that, averaged over the whole sequence, the flux from one base to another is equal to the flux from this other base back to the first one:

$$
\pi_x P_{xy}(l, v) = \pi_y P_{yx}(l, v)
$$


758 SYSTEMATIC BIOLOGY VOL. 55

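Supposing the model used is reversible, as is the case with most current models of DNA sequence evolution, we can rewrite the likelihood in (4) as:

$$
\begin{aligned}
L_s = \sum_{y \in \Omega} \Bigg( & \pi_y \sum_{x \in \Omega} P_{yx}(l_U, v_U) \times \sum_{z \in \Omega} \big[ P_{xz}(l_A, v_A)\, L_{s,\mathrm{low}(RA)}(A = z) \big] \\
& \times \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big] \\
& \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \Bigg)
\end{aligned}
\tag{5}
$$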

Expression (5) can be read as if the root were placed at node U. The root can therefore be placed at any node, on any branch of the tree, a property named the “pulley principle” by Felsenstein (1981) and widely used in heuristics to find most likely trees. Considering Figure 1, this makes the arrows meaningless.
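The equality between Equations (4) and (5) can be checked numerically. The sketch below assumes an F81-type model (reversible, with stationary frequencies `PI`) in place of the generic $P_{xy}(l, v)$; the frequencies, branch lengths, and observed bases are hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"
PI = {"A": 0.1, "C": 0.3, "G": 0.4, "T": 0.2}  # hypothetical equilibrium frequencies

def f81(length):
    """F81 transition probabilities: reversible, with stationary distribution PI."""
    e = math.exp(-length)
    return {x: {y: (e if x == y else 0.0) + PI[y] * (1.0 - e) for y in BASES}
            for x in BASES}

def down(length, low, state):
    """Sum over child states of P_{state,child}(length) times a lower likelihood."""
    p = f81(length)
    return sum(p[state][c] * low[c] for c in BASES)

def leaf(observed):
    return {b: (1.0 if b == observed else 0.0) for b in BASES}

lA, lU, lB, lC = 0.3, 0.05, 0.1, 0.2           # hypothetical branch lengths
low_A, low_B, low_C = leaf("A"), leaf("C"), leaf("G")

# Equation (4): root at R.
low_U = {y: down(lB, low_B, y) * down(lC, low_C, y) for y in BASES}
L_root_R = sum(PI[x] * down(lU, low_U, x) * down(lA, low_A, x) for x in BASES)

# Equation (5): root at U; the part of the tree beyond R now hangs below U.
low_R = {x: down(lA, low_A, x) for x in BASES}
L_root_U = sum(PI[y] * down(lU, low_R, y) * down(lB, low_B, y) * down(lC, low_C, y)
               for y in BASES)

print(L_root_R, L_root_U)  # the two values agree up to rounding error
```

The agreement relies on $\pi_x P_{xy} = \pi_y P_{yx}$; with root frequencies that differ from the equilibrium frequencies, or with branch-specific equilibria, the two sums would in general differ.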

The possibility of placing the root of the tree wherever it is needed is used extensively in recent heuristics that search for the most likely phylogenetic tree. Those heuristics usually explore the space of tree topologies by applying local rearrangements: “nearest neighbor interchange” (NNI) swaps two subtrees around an internal edge (used in PhyML [Guindon and Gascuel, 2003]), “subtree pruning and re-grafting” removes a subtree from the whole tree and places it on another edge (used in RAxML [Stamatakis et al., 2005]), and “tree bisection and reconnection” splits the tree into two subtrees that are reconnected by any of their edges. In all these rearrangements, whole subtrees remain fixed: their branches keep the same parameters and their internal topology is unchanged. By defining conditional likelihoods for fixed subtrees, and by placing the root at the rearrangement point, one can avoid much computation when exploring the space of tree topologies. As the root is placed at the rearrangement point, all the conditional likelihoods can be considered as lower from a mathematical point of view since none contains the root.

The most efficient algorithms first compute conditional likelihoods for all subtrees, then compute an approximate likelihood for the topologies obtained with a given kind of rearrangement, using the previously obtained conditional likelihoods. They apply the most promising rearrangements, either all at once (PhyML) or as soon as each one is tried (RAxML), optimize the evolutionary parameters of the new tree, and then start a new round of conditional likelihood calculation and exploration of the space of tree topologies, until convergence.

Computing the Likelihood of a Tree with a Nonreversible Model

Upper conditional likelihoods. In the nonreversible case, upper conditional likelihoods can be defined to account for the true root of the tree and the evolutionary directions of the branches.

We define the upper conditional likelihood at branch RU in the nonreversible case as:

$$
L_{s,\mathrm{upp}(RU)}(R = x) = P(R = x) \times \sum_{z \in \Omega} \big[ P_{xz}(l_A, v_A)\, L_{s,\mathrm{low}(RA)}(A = z) \big]
\tag{6}
$$

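The upper likelihoods of the underlying branches can also be defined recursively:

$$
L_{s,\mathrm{upp}(UB)}(U = y) = \sum_{x \in \Omega} \big[ P_{xy}(l_U, v_U)\, L_{s,\mathrm{upp}(RU)}(R = x) \big] \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big]
\tag{7}
$$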

The main difference lies in the incorporation of the root nucleotide frequencies in the definition of the upper conditional likelihoods. This way, the root is not moved around the topology, and the evolutionary direction is conserved.

We now prove that the expression of the tree likelihood is not changed when computed from other nodes of the tree using upper and lower likelihoods.

Recurrence. We show that for any branch, say UB,

$$
L_s = \sum_{y \in \Omega} \big[ L_{s,\mathrm{upp}(UB)}(U = y) \big] \times \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big]
$$

We initialize the recurrence with branch RU:

$$
L_{s,RU} = \sum_{x \in \Omega} L_{s,\mathrm{upp}(RU)}(R = x) \sum_{y \in \Omega} \big[ P_{xy}(l_U, v_U)\, L_{s,\mathrm{low}(RU)}(U = y) \big]
\tag{8}
$$

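We expand it, using (6) and (3):

$$
\begin{aligned}
L_{s,RU} = \sum_{x \in \Omega} \Bigg( & P(R = x) \sum_{z \in \Omega} \big[ P_{xz}(l_A, v_A)\, L_{s,\mathrm{low}(RA)}(A = z) \big] \\
& \times \sum_{y \in \Omega} P_{xy}(l_U, v_U) \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big] \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \Bigg)
\end{aligned}
$$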


2006 BOUSSAU AND GOUY—EFFICIENT TREE BUILDING WITH A NONREVERSIBLE MODEL 759

With (1):

$$
L_{s,RU} = L_s
$$

The likelihood computed on the branch RU is the same as the one computed at the root. This is also true for the edge RA.

We now suppose we know the likelihood at a branch RU and are interested in the likelihood at an underlying branch UB:

$$
L_{s,UB} = \sum_{y \in \Omega} L_{s,\mathrm{upp}(UB)}(U = y) \times \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big]
\tag{9}
$$

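We expand it, using (7):

$$
\begin{aligned}
L_{s,UB} = \sum_{y \in \Omega} \; & \sum_{x \in \Omega} \big[ L_{s,\mathrm{upp}(RU)}(R = x)\, P_{xy}(l_U, v_U) \big] \\
& \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \\
& \times \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big]
\end{aligned}
$$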

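We then rearrange it, exchanging the order of the sums and recognizing in the last two factors the lower conditional likelihood of Equation (3):

$$
L_{s,UB} = \sum_{x \in \Omega} L_{s,\mathrm{upp}(RU)}(R = x) \times \sum_{y \in \Omega} \big[ P_{xy}(l_U, v_U)\, L_{s,\mathrm{low}(RU)}(U = y) \big]
$$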

We have thus proved, with (8):

$$
L_{s,UB} = L_{s,RU} = L_s
$$

By recurrence, we have shown that the likelihood value can be computed from any branch of the tree, which, as will be seen, is particularly useful when exploring the space of tree topologies.
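The recurrence can also be verified numerically on the tree of Figure 1. The following sketch is an illustration under stated assumptions: each branch has its own F81-style matrix parameterized by a branch-specific equilibrium G+C content (a simplified stand-in for a nonhomogeneous model, not the Galtier and Gouy model itself), and all numerical values are hypothetical.

```python
import math

BASES = "ACGT"

def f81(length, gc):
    """F81-style matrix whose equilibrium G+C content is `gc` (branch-specific)."""
    pi = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}
    e = math.exp(-length)
    return {x: {y: (e if x == y else 0.0) + pi[y] * (1.0 - e) for y in BASES}
            for x in BASES}

def leaf(observed):
    return {b: (1.0 if b == observed else 0.0) for b in BASES}

def below(P, low, state):
    """Sum over child states of P[state][child] times the child's lower likelihood."""
    return sum(P[state][c] * low[c] for c in BASES)

# Hypothetical (length, equilibrium G+C) pairs for branches A, U, B, C of Figure 1.
P_A, P_U, P_B, P_C = f81(0.3, 0.3), f81(0.05, 0.7), f81(0.1, 0.6), f81(0.2, 0.4)
root = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}   # hypothetical P(R = x)
low_A, low_B, low_C = leaf("A"), leaf("C"), leaf("G")

# Lower likelihood at U, Equation (3), and likelihood from the root, Equation (1).
low_U = {y: below(P_B, low_B, y) * below(P_C, low_C, y) for y in BASES}
L_root = sum(root[x] * below(P_A, low_A, x) * below(P_U, low_U, x) for x in BASES)

# Upper likelihoods: Equation (6) on branch RU, then Equation (7) on branch UB.
upp_RU = {x: root[x] * below(P_A, low_A, x) for x in BASES}
upp_UB = {y: sum(P_U[x][y] * upp_RU[x] for x in BASES) * below(P_C, low_C, y)
          for y in BASES}

# Equation (9): likelihood evaluated on branch UB.
L_UB = sum(upp_UB[y] * below(P_B, low_B, y) for y in BASES)

print(L_root, L_UB)  # equal up to rounding, although the process is not stationary
```

Here the root never moves: `upp_UB` carries the root frequencies and the direction of the matrix `P_U` down to branch UB, which is why the equality holds without any reversibility assumption.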

Exploring the Space of Tree Topologies with a Nonreversible Model

Efficient heuristics explore the space of tree topologies by local rearrangements such as nearest neighbor interchanges (NNIs). In the nonreversible case, evolution proceeds from the root of the tree to its leaves, so one must keep this evolutionary direction unchanged. For this purpose, a distinction is made between the branch on which the root is placed and the others. Figure 2a shows that being able to compute the likelihood from branch UB makes it possible to define four subtrees whose conditional likelihoods can be used as constants to estimate the likelihoods of the three alternate topologies. In the nonreversible case, one uses the upper conditional likelihood for the root-containing subtree and lower conditional likelihoods for all other subtrees. With no loss of generality, NNIs around branch UB only require exchanges of subtrees having lower conditional likelihoods.

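The likelihood of topology 1 (Figure 2a) can be computed with:

$$
\begin{aligned}
L_{s,a1} = \sum_{x \in \Omega} L_{s,\mathrm{upp}(RU)}(R = x) \sum_{y \in \Omega} \Bigg[ & P_{xy}(l_U, v_U) \times \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \\
& \times \sum_{q \in \Omega} P_{yq}(l_B, v_B) \sum_{t \in \Omega} \big[ P_{qt}(l_D, v_D)\, L_{s,\mathrm{low}(BD)}(D = t) \big] \\
& \times \sum_{w \in \Omega} \big[ P_{qw}(l_E, v_E)\, L_{s,\mathrm{low}(BE)}(E = w) \big] \Bigg]
\end{aligned}
$$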

The likelihoods of topologies 2 and 3 are computed similarly.
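As an illustration of how the four cached conditional likelihoods serve as constants, the sketch below evaluates the three arrangements of the lower subtrees C, D, and E around branch UB. It assumes that each lower subtree has already been summarized by a vector $m_S(\text{state}) = \sum_v P_S[\text{state}][v]\, L_{\mathrm{low}}(S = v)$ and that the upper part has been carried down to node U in a vector `top`; these names, the toy numbers, and the labeling of the arrangements are assumptions for the example and need not match the numbering of Figure 2.

```python
from itertools import combinations

BASES = "ACGT"

def nni_likelihoods(top, P_B, cached):
    """Site likelihood of each of the three arrangements around branch UB.

    top[y]    : upper likelihood carried down to U (sum over x of upp_RU[x] * P_U[x][y])
    P_B       : transition matrix of the internal branch UB
    cached[S] : vector m_S for each of the three lower subtrees
    """
    results = {}
    for pair in combinations(sorted(cached), 2):
        (sister,) = set(cached) - set(pair)           # subtree left attached to U
        inner = {q: cached[pair[0]][q] * cached[pair[1]][q] for q in BASES}
        results[(sister, pair)] = sum(
            top[y] * cached[sister][y] * sum(P_B[y][q] * inner[q] for q in BASES)
            for y in BASES)
    return results

# Hypothetical cached vectors and internal-branch matrix.
top = {"A": 0.10, "C": 0.02, "G": 0.03, "T": 0.05}
P_B = {x: {y: (0.7 if x == y else 0.1) for y in BASES} for x in BASES}
cached = {
    "C": {"A": 0.8, "C": 0.1, "G": 0.05, "T": 0.05},
    "D": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "E": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
}
for arrangement, value in nni_likelihoods(top, P_B, cached).items():
    print(arrangement, value)
```

The arrangement `("C", ("D", "E"))` corresponds to the formula given for topology 1; the other two only permute which cached vector multiplies which factor, so no subtree has to be traversed again.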

When the internal branch around which NNIs are to be done carries the root, the situation is slightly different (Fig. 2b). As in the above case, three configurations can be reached through interchanges between subtrees, but here all conditional likelihoods are lower.

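The likelihood of topology 1 (Figure 2b) can be computed as follows:

$$
\begin{aligned}
L_{s,b1} = \sum_{x \in \Omega} \Bigg( & P(R = x) \sum_{z \in \Omega} P_{xz}(l_A, v_A) \times \sum_{i \in \Omega} \big[ P_{zi}(l_F, v_F)\, L_{s,\mathrm{low}(AF)}(F = i) \big] \\
& \times \sum_{j \in \Omega} \big[ P_{zj}(l_G, v_G)\, L_{s,\mathrm{low}(AG)}(G = j) \big] \\
& \times \sum_{y \in \Omega} P_{xy}(l_U, v_U) \sum_{q \in \Omega} \big[ P_{yq}(l_B, v_B)\, L_{s,\mathrm{low}(UB)}(B = q) \big] \sum_{v \in \Omega} \big[ P_{yv}(l_C, v_C)\, L_{s,\mathrm{low}(UC)}(C = v) \big] \Bigg)
\end{aligned}
$$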

The likelihoods of topologies 2 and 3 are computed similarly.

It can be shown that by only doing NNIs, the root can be moved throughout the whole tree, with the exception of leaves: as NNIs keep internal branches internal (and external branches external), the root cannot be moved to a leaf.

Thus, the exploration through NNIs of the space of rooted tree topologies with a nonreversible model is as exhaustive as the exploration of the space of unrooted trees with a reversible model, except that the root cannot go to an external branch.

FIGURE 2. Use of conditional likelihoods when applying NNIs (Nearest Neighbor Interchanges) to an internal branch. (a) NNIs are applied to internal branch UB. The root node is situated in the subtree noted “UPP.” The three other subtrees, named D, C, and E, are all lower. By swapping lower subtrees, three different topologies noted 1, 2, 3 are obtained. One can use the conditional likelihoods (upper in one case, lower in the three other cases) of the four subtrees to speed up the likelihood computation for these three alternate topologies. (b) The root node is now situated on the branch around which the NNI is done. All the conditional likelihoods are therefore lower, so any swap can be done.

Just as computing the likelihood of a tree with a nonreversible model of evolution can be considered a generalization of the reversible case, exploring the space of tree topologies by NNIs with a nonreversible model can be seen as a generalization of the reversible case. Overall, nonreversible models of evolution can easily fit into recent heuristics such as PhyML to search for the most likely rooted phylogenies.

In the next part, we report a new program built on the algorithmic architecture of PhyML, which explores the space of tree topologies under the nonhomogeneous, nonstationary model of Galtier and Gouy (1998).

NHPHYML, ADAPTATION OF PHYML ALGORITHMIC STRUCTURE TO GALTIER AND GOUY’S MODEL

We adapted the fast heuristics of PhyML (Guindon and Gascuel, 2003) to Galtier and Gouy’s nonhomogeneous and nonstationary model, and we report here results concerning the ability of the resulting nhPhyML program to explore the space of tree topologies and to estimate the ancestral G+C content.

Galtier and Gouy’s model is particularly parameter-rich: in addition to common parameters such as branch lengths and the transition/transversion ratio, it incorporates a different equilibrium G+C content for each branch and an additional G+C content at the root. This makes it a model containing 4n − 2 variables, with n the number of taxa in the tree: 2n − 3 branch lengths, 2n − 2 equilibrium G+C contents, the G+C content at the root, the transition/transversion ratio, and an additional parameter defining the position of the root on its branch, i.e., the fraction of the branch length lying on the left side of the root. All these parameters are estimated in the maximum likelihood framework, which makes the model computationally intensive. For this reason, in all the studies that used this model to find phylogenetic trees (Galtier et al., 1999; Tarrio et al., 2001; Herbeck et al., 2005), no exploration of the space of tree topologies was conducted; the model was simply used to compare a limited set of input phylogenies.
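As a quick check of this count, the itemized parameters can be summed explicitly. The value for n = 40 is added here only as an illustration, 40 being the number of leaves of the simulated trees used later in the text.

```latex
\[
\underbrace{(2n-3)}_{\text{branch lengths}}
+ \underbrace{(2n-2)}_{\text{equilibrium G+C}}
+ \underbrace{1}_{\text{root G+C}}
+ \underbrace{1}_{\text{ts/tv ratio}}
+ \underbrace{1}_{\text{root position}}
= 4n - 2,
\qquad n = 40 \;\Rightarrow\; 4 \times 40 - 2 = 158 .
\]
```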


Very efficient heuristics for finding the most likely trees are therefore required if this kind of model is to be used as more than an evaluation tool. PhyML (Guindon and Gascuel, 2003) is such an algorithm: it explores the space of tree topologies around an input phylogeny. Adapting Galtier and Gouy’s model to the algorithmic structure of PhyML makes it possible, for the first time, to use this nonstationary, nonhomogeneous model to explore the space of phylogenies with dozens of sequences.

PhyML’s code was deeply modified to produce nhPhyML. nhPhyML starts from a user-supplied rooted topology: the choice of the root is left to the user and is unchanged throughout the whole search for the most likely tree, except for its position on its branch. Although we have shown that NNIs around the root branch could be easily implemented, this has not been done in nhPhyML. Data structures had to be slightly remodeled, as in Galtier and Gouy’s model each branch has its own substitution matrix, and algorithms had to incorporate the fact that the root is fixed, both in the computation of likelihood values for alternate topologies obtained by NNIs (2) and in the computation of the conditional likelihoods themselves. Those conditional likelihoods are computed almost as in PhyML, first by a postorder (Felsenstein’s original pruning algorithm; Felsenstein, 1981) and then by a preorder tree traversal, but starting from the root of the tree, whereas in the reversible case the starting point could be any leaf.

Equations (2) and (3) show that lower conditional likelihoods depend, at a leaf, upon the base observed in the sequence, or, at an internal node, upon the lower conditional likelihoods of the underlying nodes. Lower conditional likelihoods can therefore be obtained by a postorder tree traversal starting from the root node: the tree is traversed down to its leaves, and then the conditional likelihoods of the upper nodes are computed, from the leaves up to the root. In contrast, upper conditional likelihoods depend both on the lower conditional likelihoods of underlying nodes and on the upper conditional likelihoods of above-lying nodes (Equations (6) and (7)): upper conditional likelihoods can therefore be computed once lower conditional likelihoods have been computed, with a preorder tree traversal.
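The two traversals can be sketched on a generic rooted tree as follows. This is a minimal illustration, not nhPhyML's implementation: the `Node` class, the F81-style branch matrix with a branch-specific equilibrium G+C content, and all numerical values are assumptions made for the example.

```python
import math

BASES = "ACGT"

def branch_matrix(length, gc):
    """Assumed F81-style matrix with branch-specific equilibrium G+C content."""
    pi = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}
    e = math.exp(-length)
    return {x: {y: (e if x == y else 0.0) + pi[y] * (1 - e) for y in BASES}
            for x in BASES}

class Node:
    def __init__(self, name, length=0.0, gc=0.5, children=(), observed=None):
        self.name, self.children, self.observed = name, list(children), observed
        self.P = branch_matrix(length, gc)  # branch above this node (unused at root)
        self.low = None   # lower conditional likelihoods at this node
        self.upp = None   # upper likelihoods on the branch above, by parent state

def below(child, state):
    return sum(child.P[state][q] * child.low[q] for q in BASES)

def postorder(node):
    """Lower likelihoods, Equations (2) and (3): children first, then the node."""
    for child in node.children:
        postorder(child)
    if node.observed is not None:
        node.low = {b: float(b == node.observed) for b in BASES}
    else:
        node.low = {y: math.prod(below(c, y) for c in node.children) for y in BASES}

def preorder(node, root_freqs, from_above=None):
    """Upper likelihoods, Equations (6) and (7): a node before its children."""
    for child in node.children:
        siblings = [c for c in node.children if c is not child]
        child.upp = {}
        for y in BASES:
            top = root_freqs[y] if from_above is None else from_above[y]
            child.upp[y] = top * math.prod(below(s, y) for s in siblings)
        # What the child's own children receive from above the child.
        arriving = {q: sum(child.P[y][q] * child.upp[y] for y in BASES) for q in BASES}
        preorder(child, root_freqs, arriving)

def branch_likelihood(child):
    """Equation (9): site likelihood evaluated on the branch above `child`."""
    return sum(child.upp[y] * below(child, y) for y in BASES)

# Tree of Figure 1 with hypothetical parameters.
B = Node("B", 0.1, 0.6, observed="C")
C = Node("C", 0.2, 0.4, observed="G")
A = Node("A", 0.3, 0.3, observed="A")
U = Node("U", 0.05, 0.7, children=[B, C])
R = Node("R", children=[A, U])
root_freqs = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
postorder(R)
preorder(R, root_freqs)
values = [branch_likelihood(n) for n in (A, U, B, C)]
print(values)  # the same site likelihood, whichever branch is used
```

The preorder pass needs `low` to be filled for every sibling, which is why it can only run after the postorder pass has completed.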

All the parameters of the model are optimized with the Newton-Raphson method (Felsenstein and Churchill, 1996; Galtier and Gouy, 1998). Derivatives are computed analytically, except for the shape parameter of the gamma distribution accounting for differences in substitution rates across sites (Yang, 1993), whose derivatives are computed numerically.

The topology is reorganized as in PhyML, except that when estimating the approximate likelihood of a given NNI, not only the length but also the equilibrium G+C content of the internal branch around which NNIs are done is optimized.

Results obtained with nhPhyML suggested that the program was prone to getting trapped in local maxima. This prompted us to develop an approximate version of the Galtier and Gouy model, named nhPhyML-Discrete.

We thus adopted a strategy inspired by Foster (2004) and only allowed a limited set of c equilibrium frequencies, themselves permanently set to $\frac{1}{c+1}, \ldots, \frac{c}{c+1}$. Each branch can still have its own equilibrium frequency, chosen from the few available ones.
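For concreteness, the allowed values for a given c can be listed with a one-line helper. The function name is hypothetical and chosen only for this illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def allowed_gc_frequencies(c):
    """Return the c equilibrium G+C frequencies 1/(c+1), ..., c/(c+1)."""
    return [Fraction(i, c + 1) for i in range(1, c + 1)]

for c in (1, 2, 3, 4):
    print(c, [float(f) for f in allowed_gc_frequencies(c)])
```

With c = 3 this gives 0.25, 0.50, and 0.75, and with c = 1 the single value 0.5.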

Three changes were introduced in the algorithm. First, the user sets the number c of equilibrium G+C frequencies. Second, before the exploration of the space of tree topologies, each equilibrium frequency is tested for each branch independently from the others, and the branch length is optimized for each equilibrium frequency. The best pairs (equilibrium frequency, branch length) are recorded and ordered according to the gain in likelihood they permit. When all branches have been tried, all the best values are applied simultaneously. If the likelihood does not increase, only the first half of them, according to the order previously defined, are applied, and so on until the likelihood increases. This technique is very similar to the one used in PhyML to optimize branch lengths. Third, the space of tree topologies is explored as in PhyML, except that for each NNI that is tried, all the equilibrium frequencies are tried on the internal branch, and for each one the branch length is optimized.
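The second step can be read as the following loop. This is a sketch of one possible reading of the description above, not the program's code: the function names, the representation of a state as a dictionary, and the toy likelihood are all assumptions.

```python
def apply_best_moves(state, moves, log_likelihood):
    """Apply the per-branch best values together, halving the set until it helps.

    state          : dict mapping branch name -> current value
    moves          : list of (gain, branch, value), one per branch
    log_likelihood : function taking a state dict and returning a float
    """
    ranked = sorted(moves, key=lambda m: m[0], reverse=True)  # largest gain first
    baseline = log_likelihood(state)
    keep = len(ranked)
    while keep > 0:
        trial = dict(state)
        for _gain, branch, value in ranked[:keep]:
            trial[branch] = value
        if log_likelihood(trial) > baseline:
            return trial
        keep //= 2            # retry with only the first half of the ranked moves
    return dict(state)        # no subset improved the likelihood

# Toy example: the moves conflict when all four are applied at once.
def toy_log_likelihood(state):
    return -abs(sum(state.values()) - 2.5)

start = {"b1": 0.5, "b2": 0.5, "b3": 0.5, "b4": 0.5}
# The gains below are made-up ranking values for the illustration.
moves = [(0.4, "b1", 0.75), (0.3, "b2", 0.75), (0.2, "b3", 0.75), (0.1, "b4", 0.75)]
result = apply_best_moves(start, moves, toy_log_likelihood)
print(result)
```

In the toy example, applying all four moves does not improve on the starting state, so only the two best-ranked moves are kept.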

Ability of nhPhyML to Explore the Space of Tree Topologies

In order to estimate the ability of nhPhyML to explore the tree topological space, we simulated the evolution of 1000-bp-long sequences according to Galtier and Gouy’s model with a gamma-distributed rate across sites and the rooted version of the trees containing 40 leaves that were used to test PhyML (Guindon and Gascuel, 2003). The ancestral sequence G+C content was uniformly drawn from the interval [0.2; 0.8], and the equilibrium G+C frequencies were uniformly drawn at each node from the interval [0.1; 0.8], but the transition/transversion ratio was kept constant on the whole tree. We then applied various algorithms (neighbor joining, maximum parsimony, and maximum likelihood with PhyML) to estimate their ability to find the topologies that had been used to simulate the evolution of the sequences (the “true topologies”) and compared them to nhPhyML.

Among these algorithms, we distinguished programs that do not need a starting topology (like distance-based approaches) from the ones that reorganize a user input tree to explore the space of tree topologies (like nhPhyML). PhyML and the parsimony algorithm can be said to belong to both classes, as they reorganize a starting topology that can be input a priori by the user or generated by the program itself. Two experiments were then conducted, one in which algorithms that do not need a user input topology were tested upon the simulated sequences (Fig. 3, white bars), and one in which algorithms that can run starting from a user input topology were compared (Fig. 3, grey bars). For this second experiment, input topologies were obtained by perturbing the “true topologies” by a number of NNIs uniformly drawn from [5; 20], while making sure that the ingroup and the outgroup were not mixed. This additional condition is necessary as nhPhyML is not able to question the position of the root when exploring the space of tree topologies.

FIGURE 3. Efficiency of various methods in reconstructing a phylogeny in nonhomogeneous conditions. White bars: Results obtained by phylogenetic methods that do not start from a user input topology. Grey bars: Results obtained by phylogenetic methods that start from perturbed input topologies. These results were obtained with the same 2000 different topologies taken from the PhyML test set. Error bars represent standard deviations. Asterisks are displayed where Student paired t-tests are significant at the 1% level.

To estimate the efficiency of a method, average Robinson and Foulds (R&F; Robinson and Foulds, 1979) distances between the true topologies and the reconstructed ones were computed with the PHYLIP package (Felsenstein, 1989). Results are given in Figure 3.
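For readers unfamiliar with this distance, the sketch below computes it as the number of nontrivial bipartitions present in one tree but not in the other. It is a minimal illustration on trees written as nested tuples of leaf names, and is not the PHYLIP implementation; the function names and example trees are assumptions.

```python
def leaf_set(tree):
    """All leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree, all_leaves):
    """Nontrivial bipartitions, each stored as the side without a reference leaf."""
    found = set()
    ref = min(all_leaves)

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        side = below if ref not in below else all_leaves - below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    leaves_below(tree)
    return found

def robinson_foulds(t1, t2):
    """Symmetric-difference distance between two trees on the same leaves."""
    leaves = leaf_set(t1)
    if leaves != leaf_set(t2):
        raise ValueError("trees must share the same leaves")
    return len(splits(t1, leaves) ^ splits(t2, leaves))

# Two five-leaf trees that differ by a single NNI.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(robinson_foulds(t1, t1), robinson_foulds(t1, t2))
```

Bipartitions are taken on the unrooted tree, so the position of the root does not contribute to the distance in this sketch.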

For both PhyML (version 2.4.4, under the TN93 model [Tamura and Nei, 1993]) and nhPhyML, reconstruction was made using a gamma law with eight categories to account for across-site rate variation; parameter α, the transition/transversion ratio, and the other parameters were estimated by the programs. PhyML and the parsimony method as implemented in PAUP* (Swofford, 2003) were both used from their built-in starting topology and from the perturbed input topologies. The neighbor-joining algorithm was applied to pairwise distances estimated under HKY85 (Hasegawa et al., 1985), LogDet (Lake, 1994; Lockhart et al., 1994), GG95 (Galtier and Gouy, 1995), and transversions-only observed divergence distances.

Figure 3 shows that PhyML is more efficient at finding good topologies than parsimony, which in turn gives better results than distance methods. It is surprising to note that the transversions-only observed divergence and the GG95 distances perform worse than the HKY85 distance, because these methods were devised to be resistant to G+C content biases. However, as expected, the LogDet distance provides better results than the HKY85 distance.

Figure 3 (grey bars) compares the efficiencies of the parsimony, PhyML, and nhPhyML methods. The average distance between the rearranged input topologies and the true topologies is shown. All the methods tested are able to explore the space of tree topologies to find better trees than the input ones. Results obtained by PhyML from the rearranged input phylogenies are better than when PhyML departs from its own starting topology (a distance-based tree). As the rearranged topologies are further away from the true topologies than the distance-based trees, this comes from the fact that distance trees fail on subtrees that are also difficult to solve for the maximum likelihood method, whereas the rearranged topologies can be perturbed at the level of subtrees whose solution is trivial.

It appears that PhyML is able to find better trees than parsimony and, surprisingly, also better trees than nhPhyML (Student unilateral paired t-test, P-value $< 1 \times 10^{-10}$). This might be due to the large number of parameters: nhPhyML has 2 × n − 2 additional parameters when compared to PhyML, because an equilibrium G+C content is associated with each branch. This might result in a likelihood surface with many local maxima, in which the algorithm would get trapped.

Comparing the likelihood values found by nhPhyML when launched from the rearranged topologies with the likelihood values computed on the true topologies supports this hypothesis: the log-likelihood of the true trees is on average 62.9 points higher than the log-likelihood of the trees found by nhPhyML, which have a better likelihood than the true trees in only 830 cases out of 2000. As a comparison, PhyML finds topologies with a 1.2-point higher log-likelihood score than the true topologies and finds topologies with better likelihoods than the true ones in 1562 cases out of 2000. This means that nhPhyML fails to correctly explore the space of tree topologies, because it gets trapped in local maxima and does not reach the real maximum. Being particularly parameter rich, it seems that nhPhyML can fit nearly any topology by taking advantage of its numerous parameters.

An approximate and less flexible model was developed and was named nhPhyML-Discrete. This nonhomogeneous model still has the same number of free parameters as nhPhyML, as each branch can have a particular equilibrium frequency, but is also much less “flexible,” because these equilibrium frequencies are constrained to be in the limited set defined by the user. These constraints have a positive impact on the results of the algorithm, which are shown in Figure 3 for sets of 1 to 4 equilibrium frequencies.

nhPhyML-Discrete shows a better topological accuracy than PhyML even when using only one equilibrium frequency: this can be due either to the fact that the root base distribution is estimated in nhPhyML-Discrete and not in PhyML, or to the fact that the ingroup and the outgroup cannot be swapped in nhPhyML-Discrete, thereby avoiding the exploration of unreasonable tree topologies, contrary to PhyML. Increasing the number of equilibrium frequencies further increases nhPhyML-Discrete’s topological accuracy, but this tendency quickly reverses, as using 3 equilibrium frequencies yields better results than 4 equilibrium frequencies (though the unilateral paired Student t-test is not significant). When further increasing the number of equilibrium frequencies, the topological accuracy continues dropping, with, for instance, an average distance to the true topologies of 2.23 for 10 equilibrium frequencies (data not shown), not better than when using only 1 equilibrium frequency (2.18, Student unilateral paired t-test P-value: 0.054). Overall, it seems that using 3 equilibrium frequencies might be a good choice, as it gets the best topological accuracy on the simulations, which is the same performance as with 2 equilibrium frequencies, but with a lower standard deviation.

When used with 3 G+C equilibrium frequencies, the algorithm finds topologies closer to the true ones than PhyML (nhPhyML-Discrete: 2.09, PhyML: 2.49, Student unilateral paired t-test, P-value $< 1 \times 10^{-10}$). This also has an impact on the risk of getting trapped in local optima: nhPhyML-Discrete finds topologies that have a log-likelihood on average 37.5 points higher than the true topology log-likelihoods, which is better than nhPhyML. Moreover, it appears that nhPhyML-Discrete finds trees with better likelihoods than the true tree in 1768 cases out of 2000. Overall, it seems that nhPhyML-Discrete performs as well as PhyML, with a stronger variation in log-likelihood, which might hint at a stronger discriminating power. Finally, this approximation has a great impact on computational speed: nhPhyML used on average 40 min 42 s to give its results, while nhPhyML-Discrete only needed 20 min 43 s. It is faster to choose among a limited set of equilibrium frequencies than to optimize the value of a continuous parameter.

Figure 4 shows that nhPhyML-Discrete is, like PhyML and parsimony, nearly insensitive to the distance of its input phylogeny to the true one. In contrast, nhPhyML does not seem to be able to cope with distant topologies, which is in agreement with the results above.

Estimation of Root G+C Content

The ancestral G+C content is itself a parameter of the model. We conducted tests to check the ability of the program to estimate this parameter on trees containing 40 leaves, either from the true phylogenies or from the phylogenies found by nhPhyML and nhPhyML-Discrete when they were given perturbed topologies as input. nhPhyML-Discrete results are shown in Figure 5 for 3 equilibrium frequencies.

Whether it is estimated from the true phylogenies, or from the phylogenies found by nhPhyML or nhPhyML-Discrete (Fig. 5), the ancestral G+C content is well estimated. The correlation coefficient between estimated and expected G+C contents is above 0.99 in all cases. Interestingly, results do not depend upon the number of equilibrium frequencies: performances are highly similar whether we use only 1 equilibrium frequency or whether we use nhPhyML. The average of the squared differences between the estimated and the true values is ≈0.000282 when inferred from the true phylogeny, ≈0.000292 when inferred from the phylogenies found by nhPhyML, and ≈0.0003423 when inferred by nhPhyML-Discrete with 3 equilibrium frequencies from the phylogenies it has found. It is interesting to note that even if nhPhyML-Discrete does not model evolution as precisely as nhPhyML, being limited in its choice of equilibrium frequencies, it can still provide very good estimates of the ancestral G+C contents, even from topologies that are not the true ones.
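The two summary statistics used here, the correlation coefficient and the average squared difference, can be computed as in the sketch below. The input values are made up for the illustration and are not the simulation results reported in the text.

```python
import math

def mean_squared_difference(estimated, true):
    """Average of the squared differences between estimated and true values."""
    return sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical true and estimated ancestral G+C contents.
true_gc = [0.2, 0.4, 0.6, 0.8]
estimated_gc = [0.21, 0.39, 0.61, 0.79]
print(mean_squared_difference(estimated_gc, true_gc), pearson(estimated_gc, true_gc))
```

A mean squared difference of about 0.0003, as reported above, corresponds to a typical error of roughly 0.017 on the G+C content, since $\sqrt{0.0003} \approx 0.017$.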

Overall, it appears that limiting the number and values of equilibrium frequencies has been very successful: it increases the ability of the algorithm to explore the space of tree topologies while retaining the capacity to estimate ancestral G+C content.


FIGURE 4. Ability of phylogenetic methods to explore the space of tree topologies in nonhomogeneous conditions. As the distance from the input phylogenies to the true topologies increases, results obtained by nhPhyML get worse, but nhPhyML-Discrete seems to be less dependent upon the input topologies.

FIGURE 5. Ability of nhPhyML-Discrete to estimate ancestral G+C contents. nhPhyML-Discrete was used to estimate ancestral G+C contents on sequences simulated on 2000 different topologies from the PhyML test set (see text) from the phylogenies found by the program itself, with 3 equilibrium frequency categories. In each experiment, parameter α that accounts for across-site rate variation using an 8-category discretized gamma distribution, the transition/transversion ratio, and the other parameters were estimated by maximizing the likelihood.

ESTIMATION OF THE UNIVERSAL PHYLOGENY AND OF THE ANCESTRAL G+C CONTENT

The ability of nhPhyML-Discrete to explore the space of tree topologies and to estimate the ancestral G+C content has thus been demonstrated, and the program can be applied to real data.

As rRNA stem G+C contents are known to be correlated with optimal growth temperatures in Bacteria and Archaea (Galtier and Lobry, 1997), these genes are especially good candidates for an analysis using Galtier and Gouy’s model (Galtier and Gouy, 1998), and hence nhPhyML-Discrete. Therefore, we repeated the analysis of Galtier et al. (1999), improving the taxonomic sampling and benefiting from the heuristics of PhyML to better scan the space of tree topologies.

The analysis was divided into two steps: first we searched with nhPhyML-Discrete for the topology most strongly supported by the complete rRNA genes under the model, and then, using the best topology we could find and the stem portion of rRNA sequences, we estimated the G+C content of the last universal common ancestor (LUCA) with nhPhyML.

Phylogeny Estimation

Small and large rRNA subunit sequences were downloaded from the European Ribosomal RNA database (Wuyts et al., 2004) and from generalist databases for a few missing sequences. Ninety-two species were selected, representing the three domains of life with 22 Archaea, 34 Bacteria, and 36 Eukaryota. Small- and large-subunit rRNA genes were concatenated, aligned using ClustalW (Chenna et al., 2003), and manually curated. The resulting alignment contained 2924 sites, with G+C contents ranging from 43% to 71%, which highlights the need for models robust to compositional biases.

Even if the ability of nhPhyML-Discrete to explore the space of tree topologies appears as good as PhyML’s, it is wise to run the program from various starting topologies to diminish the risk of getting trapped in local optima. Moreover, as the process of evolution is not reversible, the position of the root influences the likelihood value. For this reason, it was decided to build various topologies with distance-based and parsimony methods (as implemented in Phylo_win, Galtier et al., 1996), and then to try three different rootings for each of these topologies. The trees were rooted either on the branch leading to the Archaea, on the branch leading to the Bacteria, or on the branch leading to the Eukaryota. This produced 57 starting phylogenies, among which 9 had identical topologies.

Results obtained by nhPhyML-Discrete on these trees were then analyzed using CONSEL (Shimodaira and Hasegawa, 2001) and Treedist (PHYLIP package; Felsenstein, 1989). Eighteen phylogenies were found to be significantly more likely than the others (AU test P-values <5%) among the 55 different topologies found, which shows that even if nhPhyML-Discrete showed good capabilities to explore the space of tree topologies, on real cases, with many sequences, the algorithm can get trapped in a local maximum, even when starting from two phylogenies with the same topology but different branch lengths. None of these 18 topologies were identical, so the majority rule (extended) consensus tree of these trees was built (Fig. 6) using Consense from the PHYLIP package (Felsenstein, 1989).

The Tree of Life

Though we do not believe that a two-gene phylogeny can clarify the Tree of Life, we think the analysis of rRNAs using a nonhomogeneous model might provide some interesting insights.

Most major clades are found to be monophyletic, e.g., Proteobacteria, Metazoa, Crenarchaeota. In the Bacteria kingdom, Firmicutes and Clostridia are found associated in all trees. Planctomycetes appear monophyletic and are associated with Chlamydiales but do not get a basal position as in Brochier and Philippe (2002): that result was found by removing the fastest evolving positions of the small subunit rRNA gene and is not found when coping with compositional heterogeneities on LSU and SSU rRNA genes. Instead, hyperthermophilic Bacteria are found at the root of the bacterial clade, as in the first rRNA phylogenies (Woese, 1987).

Ancestral G+C Content Estimate

The consensus universal phylogeny found usingnhPhyML-Discrete was used to infer the ancestral G+C

contents of the small- and large-subunit rRNA genesstem regions. The inference was performed using onlyslowly evolving stem regions for two reasons. First, onlyrRNA stems, that is, the fraction of the rRNA moleculethat folds as double helices, have a G+C content stronglycorrelated with optimal growth temperature (Galtier andLobry, 1997). Second, Gowri-Shankar and Rattray (2006)have recently shown that the ancestral G+C content es-timate obtained with the Galtier and Gouy model wasbiased towards the G+C content at slowly evolving sitesand that equilibrium frequencies were biased towardsfast-evolving sites. Correlations between interacting sitesof the helices were not taken into consideration.

Stem regions were identified using the following procedure. The rRNA alignment downloaded from the European Ribosomal RNA database (Wuyts et al., 2004) was extended to the present sample of 92 sequences by manually aligning missing sequences using Seaview (Galtier et al., 1996). A total of 1896 sites were predicted in stems in Escherichia coli, Archaeoglobus fulgidus, and Schizosaccharomyces pombe. Slowly evolving sites were identified by using the COE program (Dutheil et al., 2005) from the Bio++ package (Dutheil et al., 2006) and selecting sites predicted to undergo on average less than 0.1 substitution per branch under an HKY85 model with an 8-class discretized gamma law to model rate heterogeneity. Six hundred seventy-eight slowly evolving stem sites were finally retained.
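The site selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the COE or Bio++ code: the per-site rates are taken as already estimated, and the function names, the boolean stem mask and the toy sequence are all hypothetical.

```python
def select_slow_stem_sites(is_stem, rates, max_rate=0.1):
    """Return indices of alignment columns that lie in stems and evolve slowly.

    `rates` holds the average number of substitutions per branch at each
    site (assumed to have been estimated beforehand).
    """
    if len(is_stem) != len(rates):
        raise ValueError("is_stem and rates must have the same length")
    return [i for i, (stem, rate) in enumerate(zip(is_stem, rates))
            if stem and rate < max_rate]


def gc_content(sequence, sites):
    """G+C fraction of an aligned sequence over the given columns, ignoring gaps."""
    bases = [sequence[i].upper() for i in sites]
    bases = [b for b in bases if b in "ACGTU"]
    if not bases:
        raise ValueError("no unambiguous nucleotide at the selected sites")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy data, invented for illustration.
is_stem = [True, True, False, True, True, False]
rates = [0.02, 0.30, 0.01, 0.05, 0.08, 0.50]
sites = select_slow_stem_sites(is_stem, rates)
gc = gc_content("GAUCAA", sites)
```

With these toy values, columns 0, 3 and 4 are retained, and the G+C fraction is computed over those three columns only.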

The correlation between the G+C content of slowly evolving rRNA stem sites and optimal growth temperature (Topt) is high for both Bacteria (0.815) and Archaea (0.953), which is in agreement with Galtier and Lobry (1997). Because ancestral G+C content inferences are known to be robust with respect to the tree topology (Galtier et al., 1999; and Fig. 5 herein), those estimates are expected not to depend strongly on the input phylogeny.
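For readers who wish to reproduce this kind of figure on their own data, a correlation coefficient can be computed as in the sketch below. The text does not state which coefficient was used; Pearson's r is assumed here, and the G+C and temperature values are invented for illustration and are not the data of this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical stem G+C fractions and optimal growth temperatures (in °C).
stem_gc = [0.55, 0.60, 0.70, 0.80]
topt = [30, 37, 70, 95]
r = pearson(stem_gc, topt)
```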

Because the location of the root of the universal tree is not currently known (Brown and Doolittle, 1997; Forterre and Philippe, 1999), the likelihoods of all three possible rootings, that is, on the branch leading to each one of the three domains, were computed using nhPhyML on the slowly evolving stem sites. We chose not to use the discrete version of nhPhyML in order to avoid any bias that might arise from the fact that equilibrium frequencies are set to a priori values. For instance, with three G+C categories, the equilibrium frequencies are set to 0.25, 0.50, or 0.75, which may not model the evolution of the slowly evolving stem sites appropriately, given that their G+C content ranges from 0.52 for Entamoeba histolytica to 0.83 for Methanopyrus kandleri. Four rate categories were used to model rate heterogeneity, and the parameter α of the gamma distribution and the transition/transversion ratio were optimized by maximizing the likelihood. Since studies (Huelsenbeck et al., 2002; Yap and Speed, 2005) have shown that there may be information in extant sequences for the identification of the root position of a phylogenetic tree using nonreversible models of evolution, we compared all three likelihoods to see which root the model and the data were predicting. We used the slowly evolving stem sites, as they are expected to be less prone to saturation problems. The highest likelihood was obtained by the bacterial branch rooting (log-likelihood: −15,139.94, ancestral G+C content: 71.8%), whereas the least likely rooting was found on the archaeal branch (log-likelihood: −15,147.22, ancestral G+C content: 75.1%) and could be rejected using CONSEL (Shimodaira and Hasegawa, 2001) (AU test P-value <5%). Hence, we chose to discard the archaeal rooting from subsequent analyses. There was no significant difference between the bacterial and the eukaryotic rootings (log-likelihood: −15,141.61, ancestral G+C content: 69.2%). It is interesting to note that the longest branch (the eukaryotic branch) was not found to provide the most likely rooting: the model therefore appears able to detect a signal that is independent of evolutionary distance.

FIGURE 6. Consensus tree obtained from the 18 significantly most likely trees obtained by nhPhyML-Discrete. The tree was built as described in the text. Dashed lines represent branches that were not found in all the 18 most likely trees. The alignment contained 2924 sites. Group names are in agreement with the NCBI taxonomy. Sequence G+C contents range from 43% to 71%, which highlights the need to use models robust to compositional biases.

FIGURE 7. Correlation between rRNA stem G+C contents and optimal growth temperature, and ancestral G+C content estimates. Circles: bacterial data; triangles: archaeal data. Estimates of the G+C content of the slowly evolving stem sites of rRNA molecules at the tree root are indicated by vertical lines, continuous for the bacterial branch rooting and dashed for the eukaryotic branch rooting. Confidence intervals are represented by horizontal segments, continuous for the bacterial rooting, and dashed for the eukaryotic rooting. The linear regression is represented as a dotted line. An orthogonal regression provided very similar results.
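The comparison of the three rootings can be summarized as log-likelihood differences relative to the best rooting. The sketch below uses the log-likelihoods reported in the text; the dictionary layout and the function name `delta_lnl` are assumptions made for the example. These raw differences are not a significance test: the rejection of the archaeal rooting reported above relies on the AU test as computed by CONSEL.

```python
# Log-likelihoods reported in the text for the three candidate rootings.
rootings = {
    "bacterial": -15139.94,
    "eukaryotic": -15141.61,
    "archaeal": -15147.22,
}


def delta_lnl(loglik):
    """Difference between the best log-likelihood and each candidate."""
    best = max(loglik.values())
    return {name: round(best - value, 2) for name, value in loglik.items()}


best_rooting = max(rootings, key=rootings.get)
deltas = delta_lnl(rootings)
```

The eukaryotic rooting lies 1.67 log-likelihood units below the bacterial one, and the archaeal rooting 7.28 units below it.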

To estimate the accuracy of the estimation of ancestral G+C contents, we computed the likelihoods obtained when setting the ancestral G+C content to various values, between 55% and 85%, and compared the results using CONSEL. Ancestral G+C contents with AU test P-values higher than 5% were considered to be in the confidence interval. We found that, when rooting in the eukaryotic branch, ancestral G+C contents ranging from 63% to 75% could not be rejected, whereas when rooting in the bacterial branch, the confidence interval was [67%; 76%]. The larger confidence interval found for the eukaryotic rooting might be explained by the fact that this branch is considerably longer than the bacterial one: the extra length of the eukaryotic branch may provide more latitude to accommodate nonoptimal ancestral G+C contents.
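The construction of such a confidence interval can be sketched as a simple scan over candidate values. The P-values below are invented for illustration (the study's AU test P-values are not reported in the text), and the function name is hypothetical; the sketch also assumes that the retained values form a single contiguous range.

```python
def confidence_interval(p_values, alpha=0.05):
    """Return the smallest and largest candidate values whose P-value exceeds alpha.

    `p_values` maps each candidate ancestral G+C content (in percent)
    to the AU test P-value obtained when the root is fixed to that value.
    """
    kept = sorted(value for value, p in p_values.items() if p > alpha)
    if not kept:
        raise ValueError("every candidate value was rejected")
    return kept[0], kept[-1]


# Hypothetical AU test P-values, not taken from the study.
au_p_values = {55: 0.001, 60: 0.01, 65: 0.04, 67: 0.20, 70: 0.80,
               72: 0.95, 76: 0.10, 80: 0.02, 85: 0.001}
interval = confidence_interval(au_p_values)
```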

Inferred ancestral G+C contents (Fig. 7) suggest a mesophilic (optimal growth temperature below 60°C) to thermophilic (optimal growth temperature between 60°C and 80°C) LUCA, in agreement with Galtier et al. (1999). Interestingly, neither confidence interval contains any value that seems to favour a hyperthermophilic LUCA. As a consequence, it appears that reducing site rate heterogeneity to avoid the bias put forth by Gowri-Shankar and Rattray (2006) does not contradict Galtier et al.'s conclusion.

CONCLUSION

In this article, we have shown that by reorganizing the way likelihood is computed, one can efficiently explore the space of tree topologies with a nonreversible model of evolution. We modified the PhyML algorithm (Guindon and Gascuel, 2003) to cope with the nonhomogeneous, nonstationary model of Galtier and Gouy (1998) and tested its abilities to find the right topology and to estimate the G+C content at the root. An approximate model was also tested, which showed good performance in both tree topological space exploration and ancestral G+C content estimation. We eventually used the program to estimate the topology of the Tree of Life from rRNA sequences and to estimate the ancestral stem G+C content by only selecting slowly evolving sites. The results agree with the ones obtained by Galtier et al. (1999) and support a nonhyperthermophilic last universal common ancestor.

Genome sequences vary widely in their composition between species. Therefore, when building a phylogenetic tree from such heterogeneous data, it is important to use a method robust to compositional biases. Nonhomogeneous models of evolution are particularly suitable, but their nonreversibility has excluded them from most general phylogeny packages and prevented their use in large-scale analyses. This work renders nonreversible models of evolution useful for phylogeny reconstruction, which considerably broadens the range of available models and opens new opportunities for models explicitly dealing with compositional biases.

ACKNOWLEDGEMENTS

nhPhyML is available as a LINUX executable file at http://pbil.univ-lyon1.fr/software/nhphyml/ and as source code upon request. We want to thank the reviewers and Olivier Gascuel for their constructive remarks, which resulted in considerable improvement in the quality of the manuscript. This work was supported by Action Concertée Incitative IMPBIO. We thank the Centre de Calcul de l'IN2P3 for providing computer resources. Bastien Boussau acknowledges a PhD scholarship from the Centre National de la Recherche Scientifique. We also thank Mathilde Paris for fruitful discussions.

REFERENCES

Brochier, C., and H. Philippe. 2002. Phylogeny: A non-hyperthermophilic ancestor for bacteria. Nature 417:244.

Brown, J. R., and W. F. Doolittle. 1997. Archaea and the prokaryote-to-eukaryote transition. Microbiol. Mol. Biol. Rev. 61:456–502.

Chenna, R., H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, and J. D. Thompson. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31:3497–3500.

Dutheil, J., T. Pupko, A. Jean-Marie, and N. Galtier. 2005. A model-based approach for detecting coevolving positions in a molecule. Mol. Biol. Evol. 22:1919–1928.

Dutheil, J., S. Gaillard, E. Bazin, S. Glémin, V. Ranwez, N. Galtier, and K. Belkhir. 2006. Bio++: A set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics 7:188.

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17:368–376.

Felsenstein, J. 1989. Phylogeny inference package (version 3.2). Cladistics 5:164–166.

Felsenstein, J., and G. A. Churchill. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13:93–104.

Forterre, P., and H. Philippe. 1999. Where is the root of the universal tree of life? Bioessays 21:871–879.

Foster, P. G. 2004. Modeling compositional heterogeneity. Syst. Biol. 53:485–495.

Galtier, N., and M. Gouy. 1995. Inferring phylogenies from DNA sequences of unequal base compositions. Proc. Natl. Acad. Sci. USA 92:11317–11321.

Galtier, N., and M. Gouy. 1998. Inferring pattern and process: Maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.

Galtier, N., M. Gouy, and C. Gautier. 1996. SEAVIEW and PHYLO_WIN: Two graphic tools for sequence alignment and molecular phylogeny. Cabios 12:543–548.

Galtier, N., and J. R. Lobry. 1997. Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44:632–636.

Galtier, N., N. Tourasse, and M. Gouy. 1999. A nonhyperthermophilic common ancestor to extant life forms. Science 283:220–221.

Gowri-Shankar, V., and M. Rattray. 2006. On the correlation between composition and site-specific evolutionary rate: Implications for phylogenetic inference. Mol. Biol. Evol. 23:352–364.

Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

Herbeck, J. T., P. H. Degnan, and J. J. Wernegreen. 2005. Nonhomogeneous model of sequence evolution indicates independent origins of primary endosymbionts within the enterobacteriales (gamma-Proteobacteria). Mol. Biol. Evol. 22:520–532.

Huelsenbeck, J. P., J. P. Bollback, and A. M. Levine. 2002. Inferring the root of a phylogenetic tree. Syst. Biol. 51:32–43.

Lake, J. A. 1994. Reconstructing evolutionary trees from DNA and protein sequences: Paralinear distances. Proc. Natl. Acad. Sci. USA 91:1455–1459.

Lockhart, P. J., M. A. Steel, M. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence. Mol. Biol. Evol. 11:605–612.

Robinson, D., and L. Foulds. 1979. Comparison of weighted labeled trees. Pages 119–126 in Isomorphic factorisations VI: Automorphisms, combinatorial mathematics (A. F. Horadam and W. D. Wallis, eds.). No. 748 in Lecture Notes in Mathematics, Springer, Berlin.

Shimodaira, H., and M. Hasegawa. 2001. CONSEL: For assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Stamatakis, A., T. Ludwig, and H. Meier. 2005. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463.

Swofford, D. L. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods), version 4. Sinauer Associates, Sunderland, Massachusetts.

Tamura, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C-content biases. Mol. Biol. Evol. 9:678–687.

Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.

Tarrio, R., F. Rodriguez-Trelles, and F. J. Ayala. 2001. Shared nucleotide composition biases among species and their impact on phylogenetic reconstructions of the Drosophilidae. Mol. Biol. Evol. 18:1464–1473.

Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–271.

Wuyts, J., G. Perriere, and Y. Van De Peer. 2004. The European Ribosomal RNA database. Nucleic Acids Res. 32:D101–D103.

Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396–1401.

Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568–573.

Yap, V. B., and T. Speed. 2005. Rooting a phylogenetic tree with nonreversible substitution models. BMC Evol. Biol. 5:2.

First submitted 17 October 2005; reviews returned 10 December 2005; final acceptance 28 March 2006.

Associate Editor: Olivier Gascuel




The RNA component of the small subunit of the ribosome (referred to here as SSU rRNA) has been the ‘Rosetta stone’ of modern evolutionary studies1. In particular, the discovery of the archaeal domain and establishment of the evolutionary relationships between archaeal species were based entirely on rRNA studies2–5. These analyses led to the proposal that the archaeal domain should be divided into two phyla, the Euryarchaeota (from the Greek ‘euryos’, meaning diversity) and the Crenarchaeota (from the Greek ‘crenos’, meaning spring or origin)6. At that time, the Euryarchaeota included a mixture of methanogens, extreme halophiles, thermoacidophiles and a few hyperthermophiles. By contrast, the Crenarchaeota included only hyperthermophiles (hence their name, which refers to a ‘hot origin of life’ hypothesis). This division of the Archaea was rapidly accepted, because it had been observed in the early days of archaeal research that Sulfolobales and Thermoproteales (two hyperthermophilic crenarchaeota orders) are fundamentally different to other archaea in terms of their SSU rRNA oligonucleotide catalogues7 and RNA polymerase structures8.

More recently, genomic data9 and gene phylogenies that have been obtained from combined datasets10–12 have also confirmed the division of the Archaea into two main lineages, although Euryarchaeota are sometimes paraphyletic in whole-genome trees, probably owing to artefacts that have been introduced by horizontal gene transfer (HGT) from bacteria13,14. Several genes that are involved in key cellular processes in the Euryarchaeota lack homologues in all hyperthermophilic crenarchaeota for which complete genome sequences are available15–18. For example, there are no homologues of the DNA polymerase from the D family19 and the cell-division protein FtsZ20 in hyperthermophilic crenarchaeota, both of which are present in all sequenced complete euryarchaeal genomes. Furthermore, this group of organisms lacks homologues of the eukaryotic-like histone21 and the protein MinD (involved in chromosome and plasmid partitioning15), both of which are present in most sequenced euryarchaeal genomes. This indicates that important differences in main cellular processes were established shortly after the speciation of the Euryarchaeota and Crenarchaeota14.

The discovery of mesophilic crenarchaeota

More than 20 years ago, direct PCR amplification of genes that encode the SSU rRNA from environmental samples gave rise to molecular ecology22. One of the major early outcomes of this new discipline was the discovery of many novel lineages of mesophilic or psychrophilic archaea23,24 (reviewed in REFS 25,26). The first environmental archaeal sequences were detected in marine environments, and were clearly separated into two groups (named group I and group II) in an SSU rRNA tree that was rooted by a bacterial outgroup23. Group I formed a sister group of hyperthermophilic crenarchaeota, whereas group II emerged within the Euryarchaeota23.

*Université de Provence, Aix-Marseille I, CNRS, UPR 9043, Laboratoire de Chimie Bactérienne, Institut de Biologie Structurale et de Microbiologie, 13402 Marseille, France.
‡Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 69622 Villeurbanne, France.
§Biologie Moléculaire du Gène chez les Extrêmophiles (BMGE), Département de Microbiologie, Institut Pasteur, 75015 Paris, France.
||Université Paris-Sud, 91405 Orsay, France.
Correspondence to C.B. e-mail: celine.brochier@ibsm.cnrs-mrs.fr
doi:10.1038/nrmicro1852

Mesophilic crenarchaeota: proposal for a third archaeal phylum, the Thaumarchaeota

Céline Brochier-Armanet*, Bastien Boussau‡, Simonetta Gribaldo§ and Patrick Forterre§ ||

Abstract | The archaeal domain is currently divided into two major phyla, the Euryarchaeota and Crenarchaeota. During the past few years, diverse groups of uncultivated mesophilic archaea have been discovered and affiliated with the Crenarchaeota. It was recently recognized that these archaea have a major role in geochemical cycles. Based on the first genome sequence of a crenarchaeote, Cenarchaeum symbiosum, we show that these mesophilic archaea are different from hyperthermophilic Crenarchaeota and branch deeper than was previously assumed. Our results indicate that C. symbiosum and its relatives are not Crenarchaeota, but should be considered as a third archaeal phylum, which we propose to name Thaumarchaeota (from the Greek ‘thaumas’, meaning wonder).

Hyperthermophile
An organism that has an optimal growth temperature of at least 80°C.

Paraphyletic
A group of organisms or sequences that includes an ancestor and some, but not all, of its descendants.

NATURE REVIEWS | MICROBIOLOGY VOLUME 6 | MARCH 2008 | 245

ANALYSIS

© 2008 Nature Publishing Group


Possibly because they were discovered only 2 years after the generally accepted proposal to divide Archaea into 2 phyla6, group I was classified as Crenarchaeota23,24, even though it was only a sister group of hyperthermophilic crenarchaeota and did not branch off within them. The classification of group I archaea as Crenarchaeota was further strengthened by the phylogenetic analysis of a DNA polymerase sequence from Cenarchaeum symbiosum (a marine archaeon that inhabits the tissues of a temperate water sponge27), which branched within sequences from hyperthermophilic crenarchaeota28. Consistent with this, a recent, and widely accepted, SSU rRNA tree that was published by Schleper and colleagues25,29, and has been widely used to illustrate archaeal phylogeny, shows mesophilic archaea of group I emerging within hyperthermophilic crenarchaeota. This phylogenetic placement is consistent with the assumption that mesophilic crenarchaeota evolved from hyperthermophilic ancestors through adaptation to a mesophilic lifestyle11,14,30–32. However, this placement remains controversial, because in most SSU rRNA phylogenies, such as the one recently published by Pace’s group33, group I sequences do not emerge within cultivated hyperthermophilic crenarchaeota and form a distinct lineage. Interestingly, the recent discovery of a eukaryotic-like histone gene that was probably not acquired by HGT in a genomic fragment from C. symbiosum34 suggests that mesophilic crenarchaeota might have genomic features that are substantially different from those of hyperthermophilic crenarchaeota. Indeed, homologues of this gene are present in most euryarchaeal genomes, but never in hyperthermophilic crenarchaeota.

The ecological importance of mesophilic crenarchaeota, an extremely diverse group that is widely distributed in oceans and soils35, is being increasingly recognized. Indeed, molecular environmental surveys have extended the diversity of mesophilic crenarchaeota by revealing several new lineages that are related to group I sequences, such as SAGMCG-1, FFS, marine benthic groups B and C, YNPFFA and THSC1 (reviewed in REFS 25,26). Some of these crenarchaeota might be moderate thermophiles or psychrophiles, even though the group is still designated as mesophilic crenarchaeota. Mesophilic crenarchaeota comprise organisms that are probably important participants in the global carbon and nitrogen cycles25,36,37, and might be the most abundant ammonia oxidizers in soil ecosystems37. For example, it was reported that Candidatus Nitrosopumilus maritimus, a recently isolated mesophilic crenarchaeon, can grow chemolithoautotrophically by aerobically oxidizing ammonia to nitrite, which was the first observation of nitrification in the Archaea38.

Investigating the phylogenetic position of mesophilic crenarchaeota within the archaeal phylogeny, together with their gene content and genomic features, could, therefore, provide valuable information on the evolution of the Archaea.

Can rRNA resolve deep archaeal phylogeny?

The phylogenetic position of mesophilic crenarchaeota is currently based solely on SSU rRNA sequences. The trees that were published by Schleper et al.25 and Robertson et al.33 included a large number of sequences (1,344 and 712 SSU rRNA sequences, respectively), but both showed poor resolution of the relative order of emergence of the different archaeal lineages and it was pointed out that the Crenarchaeota and Euryarchaeota appeared as polytomies (star radiations)33. This lack of resolution showed that SSU rRNA sequences do not contain enough phylogenetic signal to resolve the deepest nodes of the archaeal phylogeny, probably owing to their size, which limits the number of nucleotide positions that are available for phylogenetic analyses. However, the number of positions that can be used for phylogenetic analyses can be increased by a combined analysis of SSU and large subunit (LSU) rRNA sequences.

FIGURE 1 shows a maximum likelihood phylogenetic tree that is based on the concatenation of 226 SSU and LSU sequences from complete genomes that are representative of archaeal and bacterial diversity, as well as 18 mesophilic crenarchaeal or euryarchaeal fosmids that contain both types of sequences. Mesophilic crenarchaeal fosmid sequences belong to three distinct subgroups: groups 1.1a and 1.1b25, and the recently proposed deep-branching HWCG III group39. The bacterial part of the tree shows a phylogeny that is consistent with those previously published (that is, high statistical support for the monophyly of most bacterial phyla, but a low resolution of their relative order of emergence (not shown)). For the Archaea, the monophyly of most orders within both Euryarchaeota and Crenarchaeota is robustly recovered (FIG. 1). However, the relationships among most euryarchaeal orders are poorly resolved (bootstrap value (BV) of less than 70%) (FIG. 1), and even the monophyly of the Euryarchaeota is not significantly supported (BV of less than 16%). Importantly, both mesophilic and hyperthermophilic crenarchaeota were recovered as two robust monophyletic groups (BV of 99 and 100%, respectively), which is consistent with the SSU rRNA tree published by Robertson and colleagues33, but not with the tree that was published by Schleper and colleagues25. Moreover, mesophilic and hyperthermophilic crenarchaeota form a sister group, but with low support (BV of 36%), and the node is extremely unstable. For example, using a different evolutionary model, the position of mesophilic crenarchaeota was altered: they branched at the base of the archaeal tree and, therefore, became the sister group of a large group that included Euryarchaeota and hyperthermophilic crenarchaeota, but still with low statistical support (BV of 20%; not shown).
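The concatenation of SSU and LSU alignments on which this tree is based can be sketched as follows. The software actually used was developed in-house and is not described, so this is only an assumed minimal equivalent: alignments are represented as dictionaries mapping a taxon name to its aligned sequence, only taxa present in every alignment are kept, and the function name and toy sequences are hypothetical.

```python
def concatenate(alignments):
    """Concatenate aligned sequences taxon by taxon.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    Only taxa present in every alignment are kept.
    """
    for alignment in alignments:
        if len({len(seq) for seq in alignment.values()}) > 1:
            raise ValueError("sequences within an alignment differ in length")
    shared = set(alignments[0])
    for alignment in alignments[1:]:
        shared &= set(alignment)
    return {taxon: "".join(alignment[taxon] for alignment in alignments)
            for taxon in sorted(shared)}


# Toy SSU and LSU alignments, invented for illustration.
ssu = {"taxonA": "ACGU", "taxonB": "AC-U", "taxonC": "ACGG"}
lsu = {"taxonA": "GGC", "taxonB": "GAC"}
combined = concatenate([ssu, lsu])
```

Here `taxonC` is dropped because it lacks an LSU sequence, which mirrors the restriction to genomes and fosmids that contain both types of sequences.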

A possible explanation for such poor resolution could be the heterogeneity of G+C content among sequences. Sequences from hyperthermophilic euryarchaeota and crenarchaeota have higher G+C content compared with that of mesophilic organisms. This well-known compositional bias of RNA sequences might blur the genuine phylogenetic signal40. To investigate this possibility, we used a recently developed phylogenetic method that reduces the biases that are due to convergent G+C content (nhPHYML41). We tested three possible deep placements for mesophilic crenarchaeota, based on the rRNA archaeal phylogeny of FIG. 1: first, as a sister group of hyperthermophilic crenarchaeota; second, as a sister group of a cluster that comprises Euryarchaeota and hyperthermophilic crenarchaeota; and, third, as a sister group of Euryarchaeota (Supplementary information S1 (table)). All six tests significantly rejected the third topology, whereas only two tests rejected the second topology. This means that the tests discard the third topology, but do not allow discarding the second topology in favour of the first topology. It is likely that the phylogenetic signal which is carried by rRNA sequences is too weak to confidently resolve the position of mesophilic crenarchaeota in the archaeal phylogeny, even if the number of positions is increased by combining SSU and LSU rRNA sequences. Nevertheless, the phylogenetic analysis of the rRNAs strongly supports the separation of mesophilic and hyperthermophilic crenarchaeota into 2 distinct lineages (BV of 100 and 99% for the monophyly of each lineage, respectively). To clarify the position of mesophilic crenarchaeota in the archaeal tree further, the use of alternative markers thus becomes crucial.

Sister groups
In a phylogeny, two lineages that share an exclusive common ancestor.

Monophyletic group
Includes an ancestor and all its descendants.

Clade
A monophyletic group.

Long-branch attraction artefact
A phylogenetic artefact that is induced by differences in evolutionary rates, and results in the artificial grouping of lineages that have long branches in a phylogenetic tree.

Figure 1 | Maximum likelihood tree based on the concatenation of 226 SSU and LSU sequences from Archaea and Bacteria. For clarity, the bacterial part of the tree is not shown. Sequences were aligned using MUSCLE (multiple sequence comparison by log-expectation)58. Resulting alignments were manually refined using the MUST (Management Utilities for Sequences and Trees) package59, and only unambiguously aligned regions were kept for phylogenetic analyses. Concatenation was performed using home-developed software (C.B., unpublished data), which provided a final dataset of 3,305 nucleotide positions. The maximum likelihood tree was computed by PHYML61, using the general time-reversible model of sequence evolution by including a Γ-correction (eight categories of evolutionary rates, an estimated α-parameter and an estimated proportion of invariant sites). Numbers at nodes represent non-parametric bootstrap values (BVs) that were computed by PHYML61 (1,000 replications of the original dataset) using the same parameters. For clarity, only BVs of more than 70% are shown. The scale bar represents the average number of substitutions per site. If a different evolutionary model (Hasegawa Kishino Yano) was used, a sister grouping of hyperthermophilic crenarchaeota and euryarchaeota, and a basal branching of mesophilic crenarchaeota was recovered, albeit with weak statistical support (BV of 20%).

Analysing ribosomal proteins

Although they were first discovered 15 years ago, the isolation and cultivation of representative mesophilic crenarchaeota has proven to be a frustrating task. In fact, the first genome of a member of this group, C. symbiosum, which has still not been grown in pure culture, was published only recently42. The availability of this genome sequence now permits an investigation of the phylogenetic position of mesophilic crenarchaeota, based on markers other than SSU and LSU rRNA.

Owing to the availability of an increasing number of complete archaeal genomes, large concatenated datasets of ribosomal (R) proteins are now widely used as an alternative to SSU rRNA to study archaeal phylogeny43–45. Indeed, these proteins have the same evolutionary attributes as rRNA, and their concatenation allows the construction of larger alignments. Although the trees that were obtained using these markers were roughly congruent with the rRNA trees43, they substantially improved the archaeal phylogeny and resolved a number of important nodes (reviewed in REFS 11,14). In particular, these analyses have helped to clarify the phylogenetic positions of ‘lonely’ archaeal species (those that lack sequenced relatives), which are often misplaced, especially if they are fast-evolving or have a biased sequence composition (for example, the G+C content of rRNA sequences)46. For example, Nanoarchaeum equitans was originally proposed to represent a third (and basal) archaeal phylum based on trees that were produced using SSU rRNA47 and concatenated R proteins44. However, a subsequent analysis of R proteins and additional protein markers suggested that this species is not the earliest archaeal offshoot, but is probably a fast-evolving euryarchaeal lineage that is possibly related to Thermococcales48. Another example is the hyperthermophilic methanogen Methanopyrus kandleri, for which phylogenetic placement is crucial to obtain an understanding of the time of emergence of methanogenesis within Euryarchaeota. In fact, although M. kandleri represents the earliest euryarchaeal offshoot in SSU rRNA phylogenies25,49, in trees that are based on R-protein concatenations it robustly branches off after the non-methanogenic lineage of Thermococcales10,11,50. Further, a recent phylogenetic analysis placed this archaeon as a sister group of two other methanogen lineages (Methanococcales and Methanobacteriales)51, which is in agreement with phylogenomic studies of the genes that are involved in methanogenesis51 and gene-content analyses45. Globally, these analyses indicate that methanogenesis might not be the ancestral metabolism of euryarchaeota.

The examples of N. equitans and M. kandleri highlight the power of R-protein combined datasets for phylogenetic reconstruction. We therefore applied the same approach to study the placement of C. symbiosum in the archaeal phylogeny. FIGURE 2 shows a maximum likelihood phylogeny of the archaeal domain that is based on the concatenation of 53 R-protein sequences from 48 complete archaeal genomes and was rooted using sequences from 16 eukaryotes. The phylogeny includes C. symbiosum, 33 Euryarchaeota and 14 hyperthermophilic crenarchaeota, which represents 21 new species (11 Euryarchaeota, and 1 mesophilic and 9 hyperthermophilic crenarchaeota, respectively) with respect to previous similar analyses11. This tree is better resolved than the SSU/LSU rRNA tree in FIG. 1 (note the higher BVs at nodes in FIG. 2), and the positions of the newly included archaea are well supported and in agreement with their classification. Specifically, Thermofilum pendens, Caldivirga maquilingensis, Pyrobaculum calidifontis, Pyrobaculum arsenaticum and Pyrobaculum islandicum are grouped with Pyrobaculum aerophilum (group of Thermoproteales; BV of 100%), and Ignicoccus hospitalis, Staphylothermus marinus and Hyperthermus butylicus are grouped with Aeropyrum pernix (group of Desulfurococcales; BV of 97%), whereas Metallosphaera sedula is grouped with other Sulfolobales (BV of 100%). In Euryarchaeota, Natronomonas pharaonis, Halorubrum lacusprofundi and Haloquadratum walsbyi are grouped with other Halobacteriales (BV of 100%). The four Methanomicrobiales (Methanocorpusculum labreanum, Methanospirillum hungatei, Candidatus Methanoregula boonei and Methanoculleus marisnigri) are grouped together (BV of 100%) within a cluster that also contains Methanosarcinales (including their new representative Methanosaeta thermophila; BV of 100%) and Halobacteriales (BV of 100%). Finally, Methanosphaera stadtmanae emerges as a sister group of the other member of Methanobacteriales, Methanothermobacter thermautotrophicus (BV of 100%), whereas Methanococcus aeolicus and Methanococcus vannielii are grouped with the other Methanococcales (BV of 100%).
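The bootstrap values (BVs) quoted above come from non-parametric bootstrapping: alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-replicate, and the BV of a clade is the percentage of replicate trees that recover it. The Python lines below are a minimal sketch of those two bookkeeping steps only. The function names, the toy alignment and the replicate clade sets are illustrative assumptions, and tree inference itself (performed with PHYML in the analysis) is not reproduced.

```python
import random


def resample_columns(alignment, rng):
    """Return one bootstrap pseudo-replicate: columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def bootstrap_value(clade, replicate_clades):
    """Percentage of replicate trees whose clade set contains `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment (hypothetical sequences, not data from the study).
toy_alignment = {"taxonA": "MKVLA", "taxonB": "MKILA", "taxonC": "MRVLG"}
replicate = resample_columns(toy_alignment, random.Random(0))

# Hypothetical clade sets, as if read from four replicate trees.
ab = frozenset({"taxonA", "taxonB"})
replicate_clades = [{ab}, {ab}, set(), {ab}]
print(bootstrap_value(ab, replicate_clades))  # 3 of 4 replicates -> 75.0
```

Under this definition, a BV of 100% means that every replicate tree recovered the group, whereas a value near 50% means that about half of the resampled datasets did not.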

In contrast to the tree that is based on SSU and LSU rRNA (FIG. 1), most relationships among the archaeal orders are well resolved and in agreement with previous studies10, which highlights that R proteins are the phylogenetic markers of choice to study the archaeal phylogeny. Importantly, the monophylies of both hyperthermophilic crenarchaeota and Euryarchaeota are robustly recovered (each has a BV of 100%; FIG. 2). Interestingly, C. symbiosum constitutes a deeply branching lineage (BV of 99%), as it is a sister group of a clade that contains both Euryarchaeota (including N. equitans) and hyperthermophilic crenarchaeota. We think that this position is genuine and not the consequence of a long-branch attraction artefact, as the branch that leads to C. symbiosum is not particularly long

A N A LY S I S

248 | MARCH 2008 | VOLUME 6 www.nature.com/reviews/micro

© 2008 Nature Publishing Group

Page 102: Early Evolution and Phylogeny

[Figure 2: tree image not recoverable from the extracted text. Labelled groups: Eucarya, Thaumarchaeota (mesophilic crenarchaeota), Crenarchaeota (hyperthermophilic crenarchaeota) and Euryarchaeota. Labelled orders: Thermoproteales, Sulfolobales, Desulfurococcales, Halobacteriales, Methanosarcinales, Methanomicrobiales, Thermoplasmatales, Archaeoglobales, Methanobacteriales, Methanococcales, Methanopyrales, Thermococcales and Nanoarchaeota. Scale bar: 0.1.]

Figure 2 | Maximum likelihood tree based on the concatenation of 53 R proteins from complete archaeal genomes. Homologues of each R protein in complete genomes were retrieved by BLASTP and TBLASTN60. The concatenation included 53 alignments that harboured sequences from at least 61 of 64 taxa. The maximum likelihood phylogenetic tree was reconstructed using PHYML61, with the Jones Taylor Thornton model of sequence evolution, by including a Γ-correction (eight categories of evolutionary rates, an estimated α-parameter and an estimated proportion of invariant sites). Numbers at nodes represent non-parametric bootstrap values computed by PHYML61 (100 replications of the original dataset) using the same parameters. The use of different evolutionary models and methods did not produce differences in the resulting tree topology, at least for the archaeal part of the tree (not shown). Asterisks indicate the 21 new species (1 representative of the mesophilic crenarchaeota, Cenarchaeum symbiosum, 9 representatives of hyperthermophilic crenarchaeota and 11 representatives of Euryarchaeota) that were included in this analysis compared with previous work11. The scale bar represents the average number of substitutions per site.
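The construction of the concatenated dataset described in the caption (keep an alignment only if enough taxa are represented, then join the retained alignments end to end) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the dictionary layout and the padding of absent taxa with gap characters are hypothetical choices, and only the idea of a taxon-coverage threshold (61 of 64 taxa in the caption) comes from the text.

```python
def concatenate(alignments, taxa, min_taxa):
    """Concatenate per-protein alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence
                (all sequences within one dict have the same length).
    taxa:       ordered list of taxon names expected in the supermatrix.
    min_taxa:   minimum number of taxa an alignment must contain to be kept.
    Returns the supermatrix (taxon -> sequence) and the number of alignments kept.
    """
    kept = [aln for aln in alignments if sum(t in aln for t in taxa) >= min_taxa]
    parts = {t: [] for t in taxa}
    for aln in kept:
        width = len(next(iter(aln.values())))
        for t in taxa:
            # Assumption: a taxon missing from an alignment is padded with gaps.
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(p) for t, p in parts.items()}, len(kept)


# Toy example with three taxa and a threshold of two (hypothetical data).
taxa = ["A", "B", "C"]
alignments = [
    {"A": "MK", "B": "MR", "C": "MK"},  # 3 taxa: kept
    {"A": "LL", "B": "LV"},             # 2 taxa: kept, C padded with gaps
    {"A": "G"},                         # 1 taxon: discarded
]
supermatrix, n_kept = concatenate(alignments, taxa, min_taxa=2)
print(n_kept)       # 2
print(supermatrix)  # {'A': 'MKLL', 'B': 'MRLV', 'C': 'MK--'}
```

With the values from the caption, the call would use `min_taxa=61` on a list of 64 taxa.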


Page 103: Early Evolution and Phylogeny

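[Figure 3: diagram not recoverable from the extracted text. Set labels: Thaumarchaeota (mesophilic crenarchaeota), Euryarchaeota and Crenarchaeota (hyperthermophilic crenarchaeota). Values shown: 14, 1, 293, 25, 2 and 7.]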

in the tree (FIG. 2) or in individual R-protein trees (not shown), which indicates that its R proteins are not particularly fast-evolving. Moreover, even the fast-evolving Thermoplasmatales and lonely taxon N. equitans are not artificially attracted at the base of the tree (FIG. 2). However, a definitive exclusion of a long-branch attraction artefact52 that could affect the position of C. symbiosum in this tree will only be possible by the addition of sequences from its relatives.

In conclusion, in contrast to SSU/LSU rRNA, analysis of R proteins improves the resolution of the deepest nodes in the archaeal phylogeny and suggests that mesophilic crenarchaeota could have diverged before the speciation of Euryarchaeota and hyperthermophilic crenarchaeota.

A conserved crenarchaeal genomic core?

Our SSU/LSU rRNA analysis only weakly suggests that mesophilic and hyperthermophilic crenarchaeota are sister groups (FIG. 1). By contrast, the analysis of R proteins indicates a robust and deeper branching of C. symbiosum that occurred before the speciation between Euryarchaeota and hyperthermophilic crenarchaeota (FIG. 2). This placement implies that mesophilic crenarchaeota are not more related to hyperthermophilic crenarchaeota than they are to Euryarchaeota. Thus, we investigated the presence in C. symbiosum of genes that seem to be strictly specific to Euryarchaeota (genes that are present in at least one representative of each major order of Euryarchaeota, but are absent from all representatives of Crenarchaeota); strictly specific to hyperthermophilic crenarchaeota (genes that are present in at least one representative of each major order of thermophilic crenarchaeota, but are absent from all representatives of Euryarchaeota); or that are common to Euryarchaeota and thermophilic crenarchaeota (FIG. 3). This criterion might seem stringent, as it excludes the markers that have been secondarily lost from some lineages (for example, histones in Thermoplasmatales). However, it has the advantage of focusing on genes that comprise the strictly conserved genomic core of Euryarchaeota and hyperthermophilic crenarchaeota, but avoiding the introduction of ambiguities that are due to genes with scattered distributions.
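The presence and absence criterion described above can be written as a small classification rule. The sketch below is a hedged illustration rather than the procedure actually run on the COGs database: the function names and the toy genome labels are hypothetical, and it assumes that "common" means present in at least one representative of every major order in both phyla, which the text does not spell out.

```python
def in_every_order(present_in, orders):
    """True if the gene family occurs in at least one genome of every order."""
    return all(present_in & genomes for genomes in orders.values())


def in_any_genome(present_in, orders):
    """True if the gene family occurs in at least one genome of any order."""
    return any(present_in & genomes for genomes in orders.values())


def classify(present_in, eury_orders, cren_orders):
    """Assign a gene family to one of the three core sets, or to 'scattered'."""
    all_eury = in_every_order(present_in, eury_orders)
    all_cren = in_every_order(present_in, cren_orders)
    if all_eury and all_cren:
        return "common"
    if all_eury and not in_any_genome(present_in, cren_orders):
        return "euryarchaeal-specific"
    if all_cren and not in_any_genome(present_in, eury_orders):
        return "crenarchaeal-specific"
    return "scattered"


# Toy genome sets per order (hypothetical labels, two orders per phylum).
eury_orders = {"order_E1": {"e1a", "e1b"}, "order_E2": {"e2a"}}
cren_orders = {"order_C1": {"c1a"}, "order_C2": {"c2a", "c2b"}}

print(classify({"e1a", "e2a"}, eury_orders, cren_orders))                # euryarchaeal-specific
print(classify({"e1b", "e2a", "c1a", "c2b"}, eury_orders, cren_orders))  # common
print(classify({"e1a", "c1a"}, eury_orders, cren_orders))                # scattered
```

The third call shows why the criterion is stringent: a family that is missing from a single order, as histones are from Thermoplasmatales, falls into the scattered class and is left out of the core sets.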

Using the NCBI COGs database (see Further information)53, we identified 12 proteins that are strictly specific to Euryarchaeota, 15 proteins that are strictly specific to hyperthermophilic crenarchaeota (Supplementary information S2 (table)) and 318 proteins that are common to both phyla. Surprisingly, we found that C. symbiosum harbours 10 of the 12 euryarchaeal-specific proteins. Because HGTs from Euryarchaeota to mesophilic crenarchaeota were detected in a genome fragment from an uncultivated mesophilic crenarchaeon32, we carried out a phylogenetic analysis of the ten euryarchaeal-specific proteins that were harboured by C. symbiosum. These trees, although generally poorly resolved (not shown), revealed that only three of these proteins might be present owing to HGT, whereas the remaining seven are probably ancestral traits that are common to Euryarchaeota and C. symbiosum (FIG. 3; Supplementary information S2 (table)). By contrast, C. symbiosum lacks 14 of the 15 hyperthermophilic crenarchaeal-specific proteins (including two R proteins) (FIG. 3; Supplementary information S2 (table)). Thus, with respect to the conserved genomic core, the mesophilic crenarchaeon C. symbiosum seems to be more similar to Euryarchaeota than to hyperthermophilic crenarchaeota. Importantly, a few of the euryarchaeal-specific genes that are present in C. symbiosum encode proteins that are involved in core cellular processes, such as DNA replication and cell division (Supplementary information S2 (table)), which shows that biologically important differences distinguish this organism, and by extension all mesophilic crenarchaeota, from hyperthermophilic crenarchaeota.

In addition to the presence of most euryarchaeal-specific proteins and absence of most proteins that are specific to hyperthermophilic crenarchaeota, C. symbiosum also lacks 25 proteins that are present in both Euryarchaeota and hyperthermophilic crenarchaeota, including the R protein S24e and the type I DNA topoisomerase of the A family (IA) (FIG. 3; Supplementary information S2 (table)). The absence of topoisomerase IA from C. symbiosum is surprising, as a protein from this family is present in representatives from the three domains of life54, including archaea. Finally, C. symbiosum lacks the R protein L14e, which is present in all available genomes from hyperthermophilic crenarchaeota and basal euryarchaeota (Methanopyrales, Methanococcales, Methanobacteriales, Thermococcales and N. equitans), and the R protein L20a, which is present in all archaeal genomes except Thermoplasmatales. Moreover, we have identified potentially informative insertions and deletions (indels) in two other proteins, the R protein S27ae (hyperthermophilic crenarchaeota harbour a three-amino acid insertion that is absent from Euryarchaeota and mesophilic crenarchaeota) and the elongation factor EF-1α (both hyperthermophilic and mesophilic crenarchaeota harbour a conserved seven-amino acid insertion that is absent from Euryarchaeota). The distribution patterns of the features in the genome of C. symbiosum discussed above are puzzling, because they suggest that mesophilic crenarchaeota have a combination of traits that are either specific to hyperthermophilic crenarchaeota or Euryarchaeota.
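The protein counts reported in the two preceding paragraphs can be cross-checked with a few subtractions. The snippet below is only a bookkeeping aid with hypothetical variable names; the input numbers are those stated in the text. The six resulting values coincide with the six values that survive from Figure 3 (14, 1, 293, 25, 2 and 7), although which region of the diagram each value labels is an inference from the text.

```python
# Counts as reported in the text.
eury_specific, eury_specific_in_csym, eury_specific_via_hgt = 12, 10, 3
cren_specific, cren_specific_missing_in_csym = 15, 14
common, common_missing_in_csym = 318, 25

tally = {
    "euryarchaeal-specific, probably ancestral in C. symbiosum":
        eury_specific_in_csym - eury_specific_via_hgt,
    "euryarchaeal-specific, absent from C. symbiosum":
        eury_specific - eury_specific_in_csym,
    "crenarchaeal-specific, present in C. symbiosum":
        cren_specific - cren_specific_missing_in_csym,
    "crenarchaeal-specific, absent from C. symbiosum":
        cren_specific_missing_in_csym,
    "common to both phyla, present in C. symbiosum":
        common - common_missing_in_csym,
    "common to both phyla, absent from C. symbiosum":
        common_missing_in_csym,
}

for label, count in tally.items():
    print(f"{count:4d}  {label}")
```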

Figure 3 | Scheme showing the number of proteins shared by Euryarchaeota, mesophilic crenarchaeota and hyperthermophilic crenarchaeota.


Page 104: Early Evolution and Phylogeny

Similar genome-mining data were recently obtained independently by Makarova, Koonin and co-workers55, using an updated version of the NCBI COGs database that focused on Archaea. These authors noticed that the genome of C. symbiosum includes a much lower proportion of archaeal COGs than other archaeal genomes and groups with Euryarchaeota in a gene-content tree. They concluded from their analysis that “C. symbiosum is not a typical crenarchaeon” (REF. 55).

A third archaeal phylum?

Our SSU/LSU rRNA tree (FIG. 1) and analysis of the conserved genomic cores strongly reject the hypothesis that mesophilic crenarchaeota evolved from hyperthermophilic crenarchaeota (BV of 100%, which supports the monophyly of hyperthermophilic crenarchaeota). Moreover, our R-protein concatenation tree (FIG. 2) strongly rejects a sister-group relationship between hyperthermophilic crenarchaeota and C. symbiosum. Rather, it favours a deeper branching before the speciation of hyperthermophilic crenarchaeota and Euryarchaeota. The analysis of the genomic cores shows that C. symbiosum shares more features with Euryarchaeota than with hyperthermophilic crenarchaeota. This might indicate that C. symbiosum and its uncultivated relatives either belong to, or are sister to, Euryarchaeota. However, this is excluded by our phylogenetic analyses. Consistent with the basal emergence of mesophilic crenarchaeota, the genes of the euryarchaeal core that are shared with C. symbiosum can be interpreted as being ancestral characters that were present in the ancestor of archaea and were secondarily lost in the branch that led to hyperthermophilic crenarchaeota. We predict that the genomes of other mesophilic crenarchaeota from marine and terrestrial environments56, such as Candidatus N. maritimus, will confirm our results when they become available for analysis. Moreover, this will enable the identification of features that are specific to the group, such as a conserved genomic core. One such feature could be the presence of a type I DNA topoisomerase of the B family (IB), which we detected in the genome of C. symbiosum. Whereas members of the topoisomerase IB family had not previously been identified in archaea, they are almost universal in eukarya and rarely present in bacteria54. This probably correlates with the absence from C. symbiosum of topoisomerase IA, which is present in all other archaea. Interestingly, the topoisomerase IB of C. symbiosum branches as a sister group to eukaryotes (not shown), which suggests that it was not transferred from the sponge host. A topoisomerase IB that was present in the last common ancestor of archaea and eukaryotes could later have been lost in the lineage that led to Euryarchaeota and hyperthermophilic crenarchaeota after their divergence from mesophilic crenarchaeota.

The diversity of mesophilic crenarchaeota based on SSU rRNA sequences25,26,56,57 is comparable to that of hyperthermophilic crenarchaeota and Euryarchaeota, which suggests that they represent a major lineage that has equal status to Euryarchaeota and hyperthermophilic crenarchaeota. Indeed, environmental SSU rRNA surveys have already revealed several likely order-level subgroups within mesophilic crenarchaeota25,26,56. Moreover, the basal placement of one of their representatives in the archaeal phylogeny (FIG. 2) suggests that mesophilic crenarchaeota are an ancient lineage. This leads us to propose that mesophilic crenarchaeota represent a third archaeal phylum that we suggest naming the Thaumarchaeota (from the Greek ‘thaumas’, meaning wonder). This choice was made to avoid any name that referred to phenotypic properties, such as mesophily, that could be challenged by the future identification of non-mesophilic organisms that belong to this phylum or the discovery of mesophilic relatives of cultivated hyperthermophilic crenarchaeota.

We stress that the classification of archaeal group I and its relatives as crenarchaeota was dubious from the outset, because their sequences formed only a sister group of hyperthermophilic crenarchaeota in the first rRNA trees23. The acceptance of this classification was probably influenced by the fact that the proposal to split the archaeal domain between Crenarchaeota and Euryarchaeota had only recently been made6. Clearly, the current classification of mesophilic crenarchaeota as Crenarchaeota is misleading, just as it is misleading to call methanogens ‘methanogenic bacteria’ because all methanogens are archaea. The proposal to establish mesophilic crenarchaeota as a third archaeal phylum goes beyond purely taxonomic purposes, and will stimulate research on this group of organisms and, more generally, on the Archaea.

Further phylogenetic analyses that include new members of the Thaumarchaeota are required to confirm the position of this phylum in the archaeal phylogeny. In any case, even if the basal branching of mesophilic crenarchaeota is challenged in favour of a sister grouping with hyperthermophilic crenarchaeota, this should not, in our opinion, change their phylum status, as they would remain a highly diversified and ancient group that have peculiar genomic characteristics. If the emergence of Thaumarchaeota prior to the speciation of Crenarchaeota and Euryarchaeota (as supported by R-protein analysis) is confirmed, this will leave open the nature of the last archaeal ancestor, which might have been either a mesophilic or psychrophilic organism (such as Thaumarchaeota) or a hyperthermophilic or thermophilic organism (such as cultivated crenarchaeota and some euryarchaeota). Importantly, the nature of the archaeal ancestor provides a different meaning for the HGTs from mesophilic euryarchaeota and bacteria to Thaumarchaeota that were highlighted from environmental genomics studies32. If the ancestor of Archaea was a hyperthermophile, HGT might have enabled the adaptation of hyperthermophilic thaumarchaeal lineages towards mesophily, as has been previously suggested32. Conversely, if the archaeal ancestor was a mesophile, HGT might have occurred between organisms that were thriving in the same low-temperature environments. Further studies on Thaumarchaeota will be essential to gain fundamental insights into the origin and early evolution of Archaea.

Mesophile

This term is normally restricted to organisms that have optimal growth temperatures of between 20 and 50°C. Here, however, the term mesophilic crenarchaeota is given to all non-hyperthermophilic crenarchaeota, even though some of them (presently uncultivated) are psychrophiles (optimal growth temperature of between 0 and 20°C) or moderate thermophiles (optimal growth temperature of between 50 and 70°C).


Page 105: Early Evolution and Phylogeny

1. Pace, N. R. A molecular view of microbial diversity and the biosphere. Science 276, 734–740 (1997).
2. Woese, C. R. & Fox, G. E. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl Acad. Sci. USA 74, 5088–5090 (1977).
3. Fox, G. E. et al. The phylogeny of prokaryotes. Science 209, 457–463 (1980).
4. Woese, C. R. Bacterial evolution. Microbiol. Rev. 51, 221–271 (1987).
5. Woese, C. R. in Archaea: Evolution, Physiology and Molecular Biology (eds Garrett, R. A. & Klenk, H. P.) 1–15 (Blackwell publishing, Oxford, 2006).
    An eloquent historical review that reported, for the first time, all the steps that led to the discovery of Archaea. Forms the introduction of an interesting book that covers the various aspects of archaeal physiology and molecular biology and focuses on the similarities between Archaea and Eukaryotes.
6. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl Acad. Sci. USA 87, 4576–4579 (1990).
7. Woese, C. R., Gupta, R., Hahn, C. M., Zillig, W. & Tu, J. The phylogenetic relationships of three sulfur dependent archaebacteria. Syst. Appl. Microbiol. 5, 97–105 (1984).
8. Prangishvili, D., Zillig, W., Gierl, A., Biesert, L. & Holz, I. DNA-dependent RNA polymerase of thermoacidophilic archaebacteria. Eur. J. Biochem. 122, 471–477 (1982).
9. Makarova, K. S. & Koonin, E. V. Comparative genomics of archaea: how much have we learned in six years, and what’s next? Genome Biol. 4, 115 (2003).
10. Brochier, C., Forterre, P. & Gribaldo, S. An emerging phylogenetic core of Archaea: phylogenies of transcription and translation machineries converge following addition of new genome sequences. BMC Evol. Biol. 5, 36 (2005).
    Revealed a conserved core of vertically inherited genes in different cellular systems, which proved that reconstructing the phylogeny of species is a feasible task in Archaea.

11. Gribaldo, S. & Brochier-Armanet, C. The origin and evolution of Archaea: a state of the art. Phil. Trans. R. Soc. Lond. B 361, 1007–1022 (2006).
12. Daubin, V., Gouy, M. & Perriere, G. A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res. 12, 1080–1090 (2002).
13. Wolf, Y. I., Rogozin, I. B., Grishin, N. V., Tatusov, R. L. & Koonin, E. V. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1, 8 (2001).
14. Forterre, P., Gribaldo, S. & Brochier-Armanet, C. in Archaea: Evolution, Physiology and Molecular Biology (eds Garrett, R. A. & Klenk, H. P.) 17–29 (Blackwell publishing, Oxford, 2006).
15. Bernander, R. Chromosome replication, nucleoid segregation and cell division in Archaea. Trends Microbiol. 8, 278–283 (2000).
16. Myllykallio, H. et al. Bacterial mode of replication with eukaryotic-like machinery in a hyperthermophilic archaeon. Science 288, 2212–2215 (2000).
17. Makarova, K. S. & Koonin, E. V. Evolutionary and functional genomics of the Archaea. Curr. Opin. Microbiol. 8, 586–594 (2005).
18. Klenk, H. P. in Archaea: Evolution, Physiology and Molecular Biology (eds Garrett, R. A. & Klenk, H. P.) 75–95 (Blackwell publishing, Oxford, 2006).
19. Uemori, T., Sato, Y., Kato, I., Doi, H. & Ishino, Y. A novel DNA polymerase in the hyperthermophilic archaeon, Pyrococcus furiosus: gene cloning, expression, and characterization. Genes Cells 2, 499–512 (1997).
20. Margolin, W., Wang, R. & Kumar, M. Isolation of an ftsZ homolog from the archaebacterium Halobacterium salinarium: implications for the evolution of FtsZ and tubulin. J. Bacteriol. 178, 1320–1327 (1996).

21. Allers, T. & Mevarech, M. Archaeal genetics - the third way. Nature Rev. Genet. 6, 58–73 (2005).
22. Olsen, G. J., Lane, D. J., Giovannoni, S. J., Pace, N. R. & Stahl, D. A. Microbial ecology and evolution: a ribosomal RNA approach. Annu. Rev. Microbiol. 40, 337–365 (1986).
23. DeLong, E. F. Archaea in coastal marine environments. Proc. Natl Acad. Sci. USA 89, 5685–5689 (1992).
24. Fuhrman, J. A., McCallum, K. & Davis, A. A. Novel major archaebacterial group from marine plankton. Nature 356, 148–149 (1992).
25. Schleper, C., Jurgens, G. & Jonuscheit, M. Genomic studies of uncultivated archaea. Nature Rev. Microbiol. 3, 479–488 (2005).
    A fascinating review that underlines the diversity of Archaea and the importance of using metagenomic approaches to gain insights into their biology and physiology.
26. Schleper, C. in Archaea: Evolution, Physiology and Molecular Biology (eds Garrett, R. A. & Klenk, H. P.) 39–50 (Blackwell publishing, Oxford, 2006).
27. Preston, C. M., Wu, K. Y., Molinski, T. F. & DeLong, E. F. A psychrophilic crenarchaeon inhabits a marine sponge: Cenarchaeum symbiosum gen. nov., sp. nov. Proc. Natl Acad. Sci. USA 93, 6241–6246 (1996).
28. Schleper, C., Swanson, R. V., Mathur, E. J. & DeLong, E. F. Characterization of a DNA polymerase from the uncultivated psychrophilic archaeon Cenarchaeum symbiosum. J. Bacteriol. 179, 7803–7811 (1997).
29. Garrett, R. A. & Klenk, H. P. (eds) Archaea: Evolution, Physiology and Molecular Biology (Blackwell publishing, Oxford, 2006).
30. Barns, S. M., Delwiche, C. F., Palmer, J. D. & Pace, N. R. Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. Proc. Natl Acad. Sci. USA 93, 9188–9193 (1996).

31. Hershberger, K. L., Barns, S. M., Reysenbach, A. L., Dawson, S. C. & Pace, N. R. Wide diversity of Crenarchaeota. Nature 384, 420 (1996).
32. Lopez-Garcia, P., Brochier, C., Moreira, D. & Rodriguez-Valera, F. Comparative analysis of a genome fragment of an uncultivated mesopelagic crenarchaeote reveals multiple horizontal gene transfers. Environ. Microbiol. 6, 19–34 (2004).
33. Robertson, C. E., Harris, J. K., Spear, J. R. & Pace, N. R. Phylogenetic diversity and ecology of environmental Archaea. Curr. Opin. Microbiol. 8, 638–642 (2005).
    A recent, exhaustive archaeal phylogeny that was based on an analysis of SSU rRNA sequences. Despite the use of a large number of sequences, this marker was unable to resolve the deepest nodes of the archaeal phylogeny.
34. Cubonova, L., Sandman, K., Hallam, S. J., Delong, E. F. & Reeve, J. N. Histones in Crenarchaea. J. Bacteriol. 187, 5482–5485 (2005).
35. Ochsenreiter, T., Selezi, D., Quaiser, A., Bonch-Osmolovskaya, L. & Schleper, C. Diversity and abundance of Crenarchaeota in terrestrial habitats studied by 16S RNA surveys and real time PCR. Environ. Microbiol. 5, 787–797 (2003).
36. Wuchter, C. et al. Archaeal nitrification in the ocean. Proc. Natl Acad. Sci. USA 103, 12317–12322 (2006).
37. Leininger, S. et al. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 442, 806–809 (2006).
38. Konneke, M. et al. Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 437, 543–546 (2005).
    The first report of the isolation of a member of the Thaumarchaeota, and the first demonstration of the ability of an isolated archaeon to oxidize ammonium.
39. Nunoura, T. et al. Genetic and functional properties of uncultivated thermophilic crenarchaeotes from a subsurface gold mine as revealed by analysis of genome fragments. Environ. Microbiol. 7, 1967–1984 (2005).
40. Woese, C. R., Achenbach, L., Rouviere, P. & Mandelco, L. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts. Syst. Appl. Microbiol. 14, 364–371 (1991).

41. Boussau, B. & Gouy, M. Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55, 756–768 (2006).
42. Hallam, S. J. et al. Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proc. Natl Acad. Sci. USA 103, 18296–18301 (2006).
43. Matte-Tailliez, O., Brochier, C., Forterre, P. & Philippe, H. Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 19, 631–639 (2002).
44. Waters, E. et al. The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc. Natl Acad. Sci. USA 100, 12984–12988 (2003).
45. Slesarev, A. I. et al. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl Acad. Sci. USA 99, 4644–4649 (2002).
46. Gribaldo, S. & Philippe, H. Ancient phylogenetic relationships. Theor. Popul. Biol. 61, 391–408 (2002).
47. Huber, H. et al. A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature 417, 63–67 (2002).
48. Brochier, C., Gribaldo, S., Zivanovic, Y., Confalonieri, F. & Forterre, P. Nanoarchaea: representatives of a novel archaeal phylum or a fast-evolving euryarchaeal lineage related to Thermococcales? Genome Biol. 6, R42 (2005).
49. Burggraf, S., Stetter, K. O., Rouviere, P. & Woese, C. R. Methanopyrus kandleri: an archaeal methanogen unrelated to all other known methanogens. Syst. Appl. Microbiol. 14, 346–351 (1991).
50. Brochier, C., Forterre, P. & Gribaldo, S. Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox. Genome Biol. 5, R17 (2004).

51. Bapteste, E., Brochier, C. & Boucher, Y. Higher-level classification of the Archaea: evolution of methanogenesis and methanogens. Archaea 1, 353–363 (2005).
52. Felsenstein, J. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410 (1978).
53. Tatusov, R. L. et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28 (2001).
54. Forterre, P., Gribaldo, S., Gadelle, D. & Serre, M. C. Origin and evolution of DNA topoisomerases. Biochimie 89, 427–446 (2007).
55. Makarova, K. S., Wolf, Y. I., Sorokin, A. V. & Koonin, E. V. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea. Biol. Direct 2, 33 (2007).
    A dedicated COG database for Archaea that highlights the important differences between Thaumarchaeota and hyperthermophilic crenarchaeota.
56. Nicol, G. W. & Schleper, C. Ammonia-oxidising Crenarchaeota: important players in the nitrogen cycle? Trends Microbiol. 14, 207–212 (2006).
    Together with reference 37, this review revealed the unsuspected role of archaea in the global nitrogen cycle, which was previously assumed to be carried out by bacteria.
57. Forterre, P., Brochier, C. & Philippe, H. Evolution of the Archaea. Theor. Popul. Biol. 61, 409–422 (2002).
58. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
59. Philippe, H. MUST, a computer package of Management Utilities for Sequences and Trees. Nucleic Acids Res. 21, 5264–5272 (1993).
60. Altschul, S. F. & Koonin, E. V. Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases. Trends Biochem. Sci. 23, 444–447 (1998).
61. Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003).

AcknowledgmentsThe authors thank G. Sezonov for suggesting the name

Thaumarchaeota, E. Koonin for unpublished communications

and the referees for useful comments and suggestions.

DATABASES
Entrez Genome Project: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj

Aeropyrum pernix | Caldivirga maquilingensis | Candidatus Methanoregula boonei | Cenarchaeum symbiosum | Halorubrum lacusprofundi | Haloquadratum walsbyi | Hyperthermus butylicus | Ignicoccus hospitalis | Metallosphaera sedula | Methanococcus aeolicus | Methanococcus vannielii | Methanocorpusculum labreanum | Methanoculleus marisnigri | Methanopyrus kandleri | Methanosaeta thermophila | Methanosphaera stadtmanae | Methanospirillum hungatei | Methanothermobacter thermautotrophicus | Nanoarchaeum equitans | Natronomonas pharaonis | Pyrobaculum aerophilum | Pyrobaculum arsenaticum | Pyrobaculum calidifontis | Pyrobaculum islandicum | Staphylothermus marinus | Thermofilum pendens

FURTHER INFORMATION
Céline Brochier-Armanet’s homepage: http://www.frangun.org
Céline Brochier-Armanet’s laboratory website: http://lcb.cnrs-mrs.fr/
NCBI COGs database: http://www.ncbi.nlm.nih.gov/COG/

SUPPLEMENTARY INFORMATION
See online article: S1 (table) | S2 (table)


252 | MARCH 2008 | VOLUME 6 www.nature.com/reviews/micro

© 2008 Nature Publishing Group


6Pttr♥ Pr♦ss ♥ t r②

♦t♦♥ ♦ ♠♣rtr ♦♥ rt

♥P② ♦ ♥t r♦♠ ♠♣r♦♠♥ts ♥ ts ♣t② t♦ ①♣♦r t s♣♦ tr t♦♣♦♦s t s ♦r t♦ r♦♥strt ♣rt ♦ t ♣r♦ss ♦ ♦t♦♥ ♥ ♥♦t② ♦ t sq♥ ♦♥t♥t ♦ s ts ♣r♦r♠t♦ r♦♥strt t ♦t♦♥ ♦r r ♦♥t♥ts ♦♥ t tr ♦ ♥ ♣r ♠ ♥qrt ♥ ♦s rt♦t ♥②s t ♦t♦♥ ♦ ♣r♦t♥sq♥ ♦♠♣♦st♦♥ ♦rrt♦♥s t♥ sq♥ ♦♠♣♦st♦♥ ♥ r♦tt♠♣rtr ♦♥ ② ♥♠r s ♦ s t♦ ♣r♦♣♦s s♥r♦ ♦rt ♦t♦♥ ♦ r♦t t♠♣rtrs ♦♥ t tr ♦

r rsts s t♦ ♦♥sr t ♦♦ r♦r ♥ sr ♦ trs ♦♥ts tt ♠t s t ♦t♦♥s tt ♦sr tr sts♠t ♥t r♦♠ t ss♦t♦♥s ♦ ts t♦ s♣♥s

s rt s ♥ ♣t ♦r ♣t♦♥ ♥ tr

♦♠♣♥②♥ ♣♣♠♥tr② trs ♥ ♦♥ t t ♦♦♥ rss

tt♣♦♠sr♥②♦♥r⑦♦ssrt♣♣♠♥tr②tr♣


LETTERS

Parallel adaptations to high temperatures in the Archaean eon

Bastien Boussau1*, Samuel Blanquart2*, Anamaria Necsulea1, Nicolas Lartillot2 & Manolo Gouy1

Fossils of organisms dating from the origin and diversification of cellular life are scant and difficult to interpret1; for this reason, alternative means to investigate the ecology of the last universal common ancestor (LUCA) and of the ancestors of the three domains of life are of great scientific value. It was recently recognized that the effects of temperature on ancestral organisms left ‘genetic footprints’ that could be uncovered in extant genomes2–4. Accordingly, analyses of resurrected proteins predicted that the bacterial ancestor was thermophilic and that Bacteria subsequently adapted to lower temperatures3,4. As the archaeal ancestor is also thought to have been thermophilic5, the LUCA was parsimoniously inferred as thermophilic too. However, an analysis of ribosomal RNAs supported the hypothesis of a non-hyperthermophilic LUCA2. Here we show that both rRNA and protein sequences analysed with advanced, realistic models of molecular evolution6,7 provide independent support for two environmental-temperature-related phases during the evolutionary history of the tree of life. In the first period, thermotolerance increased from a mesophilic LUCA to thermophilic ancestors of Bacteria and of Archaea–Eukaryota; in the second period, it decreased. Therefore, the two lineages descending from the LUCA and leading to the ancestors of Bacteria and Archaea–Eukaryota convergently adapted to high temperatures, possibly in response to a climate change of the early Earth1,8,9, and/or aided by the transition from an RNA genome in the LUCA to organisms with more thermostable DNA genomes10,11. This analysis unifies apparently contradictory results2–4 into a coherent depiction of the evolution of an ecological trait over the entire tree of life.

Investigations into whether the LUCA was a hyperthermophilic (optimal growth temperature (OGT) ≥80 °C), thermophilic (OGT 50–80 °C), or mesophilic (OGT ≤50 °C) organism have relied on correlations between the species’ OGT and the composition of their macromolecular sequences. In extant prokaryotic species, the G+C content of rRNA stems (that is, double-stranded parts) has been shown to correlate with OGT12. Exploiting this correlation, support was obtained for a non-hyperthermophilic LUCA2. In contrast, studies based on correlations between the composition of the LUCA’s proteins and OGT concluded in favour of a hyperthermophilic LUCA13,14 and of hyperthermophilic ancestors for both Archaea and Bacteria. The discrepancy between these results could come from some unexplained incongruence between rRNA and proteins, or, as we shall see, from differences between evolutionary models used.
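The three thermal classes used throughout (hyperthermophilic for OGT of 80 °C or more, thermophilic for 50–80 °C, mesophilic for 50 °C or less) can be written as a small helper. This is only an illustrative sketch: the function name is hypothetical, and assigning exactly 50 °C to the mesophilic class is an assumption, since the stated ranges share that boundary.

```python
def classify_ogt(ogt_celsius: float) -> str:
    """Return the thermal class for an optimal growth temperature in °C.

    Thresholds follow the definitions in the text. The shared boundary at
    50 °C is assigned to 'mesophilic' here, which is an assumption.
    """
    if ogt_celsius >= 80:
        return "hyperthermophilic"
    if ogt_celsius > 50:
        return "thermophilic"
    return "mesophilic"


if __name__ == "__main__":
    for temperature in (20, 55, 69, 85):
        print(temperature, classify_ogt(temperature))
```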

These previous investigations2,13,14 based their conclusions on comparisons of reconstructed ancestral sequence compositions with extant ones. Accurate modelling of the evolution of compositions is therefore crucial for such approaches. Two of these studies13,14 relied on homogeneous models of evolution which make the simplifying hypothesis that substitutions occur with constant probabilities over time and across all lineages. If genomes and proteins had evolved according to a homogeneous model, they would all share the same base and amino acid compositions. Clearly, rRNA12 and protein sequences15 do not. Another approach2 has been to use a branch-heterogeneous model of RNA sequence evolution. Branch-heterogeneous models are computationally more challenging, but more realistic as they allow replacement or substitution probabilities to vary between lineages, and thus explicitly account for compositional drifts2,6,7,16,17. Accordingly, they have been shown to accurately reconstruct ancestral sequence compositions7.
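The contrast between homogeneous and branch-heterogeneous models can be illustrated with a deliberately simplified two-state (G+C versus A+U) process. This is a toy sketch, not the model implemented in nhPhyML or nhPhyloBayes: the function name, the parameter values and the time scaling (time measured in units of the inverse overall rate) are all assumptions made for illustration. With an equilibrium G+C fraction theta, the expected G+C fraction relaxes as gc(t) = theta + (gc(0) - theta) * exp(-t).

```python
import math


def gc_after(gc_start: float, gc_equilibrium: float, branch_length: float) -> float:
    """Expected G+C fraction at the end of a branch in a two-state toy model.

    gc_start       : G+C fraction at the start of the branch
    gc_equilibrium : equilibrium G+C fraction (theta) in force on this branch
    branch_length  : elapsed time, scaled by the overall substitution rate
    """
    return gc_equilibrium + (gc_start - gc_equilibrium) * math.exp(-branch_length)


if __name__ == "__main__":
    root_gc = 0.60
    # Homogeneous model: one equilibrium everywhere, so both lineages keep
    # the same expected composition.
    print([gc_after(root_gc, 0.60, t) for t in (2.0, 2.0)])
    # Branch-heterogeneous model: each lineage has its own equilibrium, so
    # compositions drift apart.
    print([gc_after(root_gc, 0.50, 2.0), gc_after(root_gc, 0.70, 2.0)])
```

In the homogeneous case every lineage is pulled toward the same composition, which is why such a model cannot account for the differing compositions seen in extant rRNAs and proteins.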

We recently developed nhPhyML7, an efficient program for the branch-heterogeneous modelling of nucleotide sequence evolution in the maximum likelihood framework, and nhPhyloBayes6, which implements a site- and branch-heterogeneous Bayesian model of protein sequence evolution. The latter combines the break-point approach17 to model variations of amino acid replacement rates along branches and the CAT18 mixture model to account for site-wise variations of these rates. These models have been shown to describe the evolution of real sequences more faithfully than homogeneous ones6,17, although neither homogeneous nor heterogeneous models ensure that inferred ancestral sequences are biologically functional. Using nhPhyML and nhPhyloBayes, we can reconstruct ancestral sequences of both rRNAs and proteins with branch-heterogeneous models, and estimate sequence compositions of all nodes of the tree of life, including the LUCA and its descendants. These compositions can be translated into approximate OGTs using the OGT/composition correlations observed in extant sequences12,15.

A nucleotide data set of concatenated small- and large-subunit rRNAs (restricted to double-stranded regions) from 456 organisms (1,043 sites), and an amino acid data set of 56 concatenated nearly universal proteins from 30 organisms (3,336 sites), were assembled, each data set sampling all forms of cellular life. Correspondence analyses of the protein data set show that eukaryotes and prokaryotes markedly differ in amino acid compositions and that an effect of temperature on proteomes is detectable only among prokaryotic species (Supplementary Figs 4 and 6b). Similarly, the correlation between rRNA G+C content and OGT has only been documented in prokaryotes12. The ability to infer ancestral OGTs from rRNA and protein compositions therefore applies only to prokaryotes. However, eukaryotic sequences were kept in the subsequent analyses because they are part of the tree of life and as such provide useful phylogenetic information for ancestral sequence inferences.

The effect of temperature on prokaryotic proteomes is independent from genomic G+C contents15, and was summarized in terms of average content in the amino acids I, V, Y, W, R, E and L (hereafter referred to as IVYWREL). Accordingly, our correspondence analysis identifies two independent factors accounting for most of the variance in amino acid compositions of prokaryotic proteins (Supplementary Fig. 5). The first factor (45.4% of the variance) highly correlates to

*These authors contributed equally to this work.

1Laboratoire de Biométrie et Biologie Évolutive, CNRS, Université de Lyon, Université Lyon I, 43 Boulevard du 11 Novembre, 69622 Villeurbanne, France. 2LIRMM, CNRS, 161 rue Ada, 34392 Montpellier, France. Present address: Département de Biochimie, Université de Montréal, C.P. 6128, succursale Centre-Ville, Montréal QC H3C3J7, Canada.

doi:10.1038/nature07393


©2008 Macmillan Publishers Limited. All rights reserved


genome G+C content (r = 0.81); the second (13.8% of the variance) is strongly correlated to OGT (r = 0.83) and to IVYWREL content (r = 0.73, Supplementary Fig. 6). The second factor was therefore used here as a molecular thermometer. The rRNA-based and the protein-based thermometers are thus independent, both because they come from distinct genome parts and because they exploit different effects of temperature on sequence composition. Furthermore, the correlation between rRNA G+C content and OGT is not expected to vary during evolutionary time because it stems from the different thermal stabilities of G–C and A–U RNA base pairs12. Thus, assuming that the relationship between temperature and amino acid composition of prokaryotes has also not varied since LUCA, the estimations of rRNA G+C content and amino acid compositions through branch-heterogeneous models provide two independent means to analyse the evolution of thermophily.
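As a concrete reading of the IVYWREL summary statistic, the fraction of residues I, V, Y, W, R, E and L in a protein sequence can be computed as below. This is a minimal sketch: the function name is hypothetical, and the handling of gaps (skipped) and of ambiguous residue codes such as X (counted in the denominator) is an assumption, not something the text specifies.

```python
IVYWREL = frozenset("IVYWREL")


def ivywrel_fraction(protein: str) -> float:
    """Fraction of residues belonging to the set {I, V, Y, W, R, E, L}.

    Non-letter characters such as alignment gaps ('-') are ignored.
    """
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(aa in IVYWREL for aa in residues) / len(residues)


if __name__ == "__main__":
    # Made-up peptide, for illustration only.
    print(ivywrel_fraction("MKVLEIRAG-WYT"))
```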

For each data set, a phylogenetic tree was inferred and rooted on the branch separating Bacteria from Archaea and Eukaryota (Supplementary Figs 7 and 8). Because the location of the root in the universal tree remains uncertain19, the alternative rooting on the eukaryotic branch was also considered. Correlations between G+C content and OGT (Fig. 1a), and between the second axis of the amino acid correspondence analysis and OGT (Fig. 1b), were used to estimate OGTs for the LUCA and its descendants (Fig. 2).
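Turning a composition into an OGT estimate amounts to reading a value off a calibration line fitted on extant species. The sketch below assumes an ordinary least-squares line, which the text does not state explicitly, and the calibration points are invented for illustration; they are not data from the study. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration points: (rRNA stem G+C %, OGT in °C).
# These values are invented for illustration only.
calibration = [(66.0, 25.0), (70.0, 37.0), (76.0, 65.0), (82.0, 85.0), (84.0, 95.0)]
slope, intercept = fit_line(*zip(*calibration))


def predict_ogt(gc_percent: float) -> float:
    """Predicted OGT (°C) for a given stem G+C content, using the toy line."""
    return slope * gc_percent + intercept


if __name__ == "__main__":
    print(round(predict_ogt(72.0), 1))
```

The same procedure would apply to the protein thermometer, with the second correspondence-analysis factor taking the place of G+C content on the x axis.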

Proteins and rRNAs support similar patterns of OGT changes for prokaryotes, so the discrepancy between previous rRNA- and protein-based investigations2,13,14 was not a result of incongruence between these molecules. Protein-derived temperature estimates are generally lower than those based on rRNAs (Fig. 1), although some protein and rRNA-based OGT estimates overlap if confidence intervals of ancestral compositions are taken into account (Supplementary Table 3). Both types of data support key conclusions (Fig. 1). First, the LUCA is predicted to be a non-hyperthermophilic organism, as previously reported2. Second, both archaeal and bacterial ancestors, as well as the common ancestor of Archaea and Eukaryota, are estimated to have been thermophilic to hyperthermophilic (Fig. 2). This result is in line with previous studies3,5. Third, within the bacterial phylogenetic tree, tolerance to heat decreased (Fig. 2). This last result is congruent with recent estimates of the evolution of OGTs in the bacterial domain based on ancestral reconstructions and characterizations of elongation factor Tu proteins4.

Support for the hypothesis of a non-hyperthermophilic LUCA and of subsequent parallel adaptations to high temperatures partly rests on a protein content depleted in IVYWREL for the LUCA and subsequently enriched in these amino acids. This is consistent with a recent report that amino acids IVYEW might be under-represented in LUCA’s proteins20. This finding has been interpreted as evidence that these five amino acids were a late addition to the genetic code,

[Figure 1. Panel a: optimal growth temperature (°C) plotted against rRNA stem G+C content (%). Panel b: optimal growth temperature (°C) plotted against the second factor of the amino-acid correspondence analysis. In both panels the LUCA, the Archaea ancestor, the Archaea–Eukaryota ancestor and the Bacteria ancestor are marked.]

Figure 1 | Correlations between sequence compositions and OGT, and estimates of key ancestral compositions. Black dots indicate extant prokaryotes positioned according to their sequence composition and OGT. Dashed coloured lines indicate predicted OGTs for various ancestors. a, Correlation between rRNA G+C content and OGT. The vertical coloured bars indicate most likely nhPhyML estimates of ancestral G+C contents with their 95% confidence intervals. b, Correlation between the second factor of the correspondence analysis on amino acid compositions and OGT. The vertical coloured bars indicate median ancestral compositions inferred by nhPhyloBayes with their 95% confidence intervals. The LUCA is significantly less thermophilic than its direct descendants (P ≤ 0.005).

[Figure 2. Phylogenetic tree with branches coloured by estimated OGT. OGT estimates printed at key nodes: 20 (1–37), 69 (64–75), 55 (45–65), 66 (59–73). Archaea: Thermoplasma (Eu), Archaeoglobus (Eu), Methanobacterium (Eu), Haloarcula (Eu), Cenarchaeum (Cr), Aeropyrum (Cr), Nanoarchaeum (Eu). Eukaryota: Giardia, Trichomonas, Tetrahymena, Plasmodium, Leishmania, Homo, Dictyostelium. Bacteria: Aquifex (Aq), Thermotoga (T), Thermus (DT), Deinococcus (DT), Rubrobacter (Ac), Bacillus (F), Phytoplasma (F), Dehalococcoides (Cf), Gloeobacter (C), Campylobacter (P), Pseudomonas (P), Agrobacterium (P), Desulfuromonas (P), Parachlamydia (Ch), Kuenenia (Pl), Cytophaga (Ba).]

Figure 2 | Evolution of thermophily over the tree of life. Protein-derived nhPhyloBayes OGT estimates (and their 95% confidence intervals for key ancestors) for prokaryotic organisms are colour-coded from blue to red for low to high temperatures. Colours were interpolated between temperatures estimated at nodes. The eukaryotic domain, in which OGT cannot be estimated, has been shaded. The colour scale is in °C; the branch length scale is in substitutions per site. A, archaeal; B, bacterial; E, eukaryotic domains. Ac, Actinobacteria; Aq, Aquificae; Ba, Bacteroidetes; C, Cyanobacteria; Cf, Chloroflexi; Ch, Chlamydiae; Cr, Crenarchaeota; DT, Deinococcus/Thermus; Eu, Euryarchaeota; F, Firmicutes; P, Proteobacteria; Pl, Planctomycetes; T, Thermotogae.


and that the proteome of the LUCA had not yet reached compositional equilibrium. Although such interpretation in terms of early genetic code evolution is possible, our hypothesis of parallel adaptations to high temperatures has the advantage of explaining the patterns observed with both rRNAs and proteins.

Additional experiments suggest that the present analyses of rRNA and protein sequences with branch-heterogeneous models of evolution uncover genuine signals of ancient temperature preferences and are not affected by systematic biases.

First, these results are robust to changes in the topology chosen for inference because analyses with alternative topologies yielded virtually identical OGT estimates (Supplementary Fig. 10). Moreover, phylogenetic trees rooted on the eukaryotic branch also suggest that OGT increased between the universal ancestor and the divergence of Archaea and Bacteria (Supplementary Figs 13–15).

Second, taxonomic sampling does not strongly affect these results. With rRNA and protein data sets in which eukaryotic sequences were removed, the signal for OGT increases between the LUCA and the domain ancestors was essentially unchanged (Supplementary Fig. 36). Moreover, both for rRNAs and proteins, two artificially biased data sets containing sequences from either thermophilic or mesophilic prokaryotes were assembled (see Supplementary Information). The signal for parallel increases in OGT is confirmed in all but one of these four data sets: the mesophilic rRNA data set. However, the longer of the two mesophilic alignments, the protein data set, supports the same pattern of OGT changes as the complete data sets (Supplementary Figs 16 and 17). Notably, analysis of the protein mesophilic data set shows that this pattern is independent of the debated position of hyperthermophilic organisms in the tree of life. Furthermore, with all rRNA and protein data sets, even with the sampling limited to thermophilic prokaryotes, the LUCA remains predicted as a non-hyperthermophilic organism (Supplementary Figs 18 and 19).

Third, dependence of the results on models used for ancestral reconstruction was investigated. Additional branch-heterogeneous evolutionary models were applied, two to the rRNA data set, and one to the protein data set (see Supplementary Information). All these alternative branch-heterogeneous models confirm our results (Supplementary Figs 21–23, 29 and 30). Compositional analyses were also conducted using branch-homogeneous models of evolution: GTR21 for rRNA and proteins, and CAT18 for proteins. All these models tend to predict parallel adaptations to higher temperatures from the LUCA to its descendants, suggesting the existence of a genuine signal for such a pattern in the data (Supplementary Figs 24, 26 and 28). However, only when models are realistic enough is the LUCA predicted as significantly less thermophilic than its two descendants. For instance, ancestral protein compositions predicted by the GTR model for the LUCA and its two descendants strongly overlap, which may explain previously published results13, whereas the CAT model better separates these ancestral node distributions, although less clearly than does the CAT–BP branch-heterogeneous model (Supplementary Figs 26, 28 and 29). These experiments show that as the evolutionary process is more accurately modelled, the support for parallel increases in OGT from the LUCA to its offspring is strengthened.

Fourth, it is known that the base compositions of fast and slowly evolving sites and, particularly, of single- and double-stranded regions of rRNA molecules differ and that this may bias ancestral sequence estimates16. To minimize this bias, only double-stranded rRNA regions have been analysed here. Moreover, if fast-evolving sites are removed, estimates still support parallel adaptations to high temperatures (Supplementary Fig. 33).

Fifth, it has been shown that some ancestral reconstruction methods might improperly estimate the frequencies of rare amino acids22. To control for that potential bias, the two rarest amino acids, cysteine and tryptophan, were discarded from estimated ancestral sequences: this had essentially no impact on results (Supplementary Fig. 34).

Sixth, the sensitivity of the OGT estimates at the tree root to the prior distribution of ancestral amino acid compositions used for Bayesian analyses was investigated (Supplementary Fig. 35). This prior distribution induces a flat, uninformative distribution over OGTs, whereas the posterior distributions estimated for LUCA and the bacterial ancestor have small variance, and thus reflect a genuine signal in the data, rather than a bias from the prior. Moreover, even with a strongly informative prior distribution that is biased towards high temperature amino acid distributions, the posterior distribution of the LUCA’s amino acid composition, although altered, is centred at lower temperatures than that of the bacterial ancestor.

The present use of molecular thermometers requires that evolution of the data sets under analysis can be modelled by a tree structure as far as reconstruction of ancestral compositions is concerned. We emphasize that our protein analyses are based on 56 genes that did not undergo between-domain transfers (see Methods), which precludes that ancestral sequence reconstructions are confounded by such gene exchanges. We do not exclude within-domain lateral transfers of these genes; however, the robustness of the inferred ancestral compositions to alternative domain phylogenies4,7 (see also Supplementary Figs 10 and 20) suggests that these potential transfers do not fundamentally affect the results for domain ancestors. Finally, because molecular thermometers measure the average environmental temperature of the hosts of ancestral genes, they apply even if ancestral genes of extant prokaryotes originate from diverse organisms19.

Thus, all our analyses support the hypothesis of a non-hyperthermophilic LUCA and of transitions to higher environmental temperatures for its descendants. Although these organisms have not yet been anchored in time23, a few geological and biological factors may explain observed changes in temperature preferences. It has already been observed4 that the general trend of decreasing OGTs from the bacterial ancestor to extant species strikingly parallels recent geological estimates of the progressive cooling down of oceans shifting from about 70 °C 3.5 billion years ago to approximately 10 °C at present24. The evolution of thermophily in the bacterial domain might therefore stem from the continuous adjustment of Bacteria to ocean temperatures, although the evidence for a hot Archaean climate remains debated25. A similar conclusion may apply to Archaea as well, but would require confirmation with additional genome sequences from mesophilic Archaea. A hot Archaean ocean may preclude the existence of a cool ‘little pond’ where the LUCA could have evolved. Therefore, a non-hyperthermophilic LUCA would suggest that moderate temperatures existed earlier in the history of the Earth.

Geological data about palaeoclimates that old are very scarce. However, some models of Hadean and early Archaean climates (3.5–4.2 billion years ago) suggest that the Earth might have been colder than it is today, possibly covered with frozen oceans1,26. Moreover, a hypothesis of brutal temperature changes involving meteoritic impacts that boiled the oceans and therefore nearly annihilated all life forms but the most heat-resistant ones has been proposed1,8,9. Huge meteorites probably impacted the Earth at least as late as 3.8–4 billion years ago, most notably during the late heavy bombardment27 and created a series of brief but very hot climates on Earth1. As life may have originated more than 3.7 billion years ago28, it is possible that early organisms, namely the LUCA’s offspring, experienced such bottlenecks.

Alternatively, under the hypothesis that life originated extra-terrestrially, the transfer of life to the Earth from another planet in ejecta created by meteorite impacts would have also entailed selection of heat-resistant cells1. Overall, geological knowledge provides several frames that might fit the predictions of our biological thermometers.

A biological hypothesis could provide an internal mechanism to explain the observed pattern. It posits that the LUCA had an RNA genome, and that its offspring lineages independently evolved the ability to use DNA for genome encoding10, possibly by co-opting it from viruses11. Although our results do not bring direct evidence in support of this hypothesis, they are compatible with it and could even


help explain such independent acquisitions of DNA in adaptive terms, as DNA is much more thermostable than RNA29.

Great care is necessary when attempting a reconstruction of events that took place more than three billion years ago. However, the strong agreement between results obtained using two types of data (proteins and rRNAs), two independent temperature proxies (protein amino acid composition and rRNA G+C content), and independently developed statistical models, is remarkable. This suggests that a similar approach could successfully be used to gain insight into other ecological features of early life. For example, it has been shown that aerobic and anaerobic bacteria differ in the amino acid composition of their proteome30; future ancestral sequence reconstructions could reveal the evolution of aerobiosis along the tree of life in relation with the geological record of oxygen atmospheric concentration.

METHODS SUMMARY

Ribosomal RNA sequences were aligned according to their shared secondary structure. Sites belonging to double-stranded stems were selected to obtain an alignment of 1,043 stem sites for 456 organisms. Protein families with wide species coverage and no or very low redundancy in all species were selected from the HOGENOM database of families of homologous genes. Only sites showing less than 5% gaps were kept, giving an alignment of 3,336 positions for 30 organisms. Phylogenetic trees were inferred using Bayesian or maximum likelihood techniques. Ancestral nucleotide and amino acid compositions were inferred for all tree nodes using the programs nhPhyML7 and nhPhyloBayes6, respectively. The G+C contents of ancestral rRNA sequences were compared to extant rRNA base compositions. The second factor of the correspondence analysis of amino acid compositions of extant prokaryotic proteins was used to estimate ancestral environmental temperatures by adding ancestral amino acid compositions as supplementary rows to the correspondence analysis. These two procedures allowed us to estimate ancestral environmental temperatures with the rRNA and the protein data sets, respectively. Confidence intervals for the estimated environmental temperatures were as follows: in the case of rRNAs, they contained 95% of the distribution obtained by a bootstrap procedure (200 replicates); for Bayesian analyses, regular 95% credibility intervals were computed from a sample of 2,000 points drawn from the posterior distribution.
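Both kinds of interval described here (95% of 200 bootstrap replicates, or a 95% credibility interval from 2,000 posterior draws) can be read as central percentile intervals of a sample. The sketch below uses a simple rank-based rule; the exact percentile convention used in the study is not stated, so the indexing choice and the function name are assumptions.

```python
def central_interval(samples, level=0.95):
    """Central interval containing roughly `level` of the sampled values.

    Uses a simple rank-based rule on the sorted sample: the same number of
    points is trimmed from each tail.
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("no samples")
    tail = (1.0 - level) / 2.0
    low_index = int(tail * (n - 1))   # number of points trimmed per tail
    high_index = n - 1 - low_index
    return ordered[low_index], ordered[high_index]


if __name__ == "__main__":
    # Stand-in for 200 bootstrap estimates of an ancestral OGT (made-up values).
    fake_replicates = [20 + (i % 40) * 0.5 for i in range(200)]
    print(central_interval(fake_replicates))
```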

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

Received 5 March; accepted 1 September 2008.

Published online 26 November 2008.

1. Nisbet, E. G. & Sleep, N. H. The habitat and nature of early life. Nature 409, 1083–1091 (2001).

2. Galtier, N., Tourasse, N. & Gouy, M. A nonhyperthermophilic common ancestor to extant life forms. Science 283, 220–221 (1999).

3. Gaucher, E. A., Thomson, J. M., Burgan, M. F. & Benner, S. A. Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins. Nature 425, 285–288 (2003).

4. Gaucher, E. A., Govindarajan, S. & Ganesh, O. K. Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451, 704–707 (2008).

5. Gribaldo, S. & Brochier-Armanet, C. The origin and evolution of archaea: a state of the art. Phil. Trans. R. Soc. Lond. B 361, 1007–1022 (2006).

6. Blanquart, S. & Lartillot, N. A site- and time-heterogeneous model of amino-acid replacement. Mol. Biol. Evol. 25, 842–858 (2008).

7. Boussau, B. & Gouy, M. Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55, 756–768 (2006).

8. Sleep, N. H., Zahnle, K. J., Kasting, J. F. & Morowitz, H. J. Annihilation of ecosystems by large asteroid impacts on the early Earth. Nature 342, 139–142 (1989).

9. Gogarten-Boekels, M., Hilario, E. & Gogarten, J. P. The effects of heavy meteorite bombardment on the early evolution–the emergence of the three domains of life. Orig. Life Evol. Biosph. 25, 251–264 (1995).

10. Mushegian, A. R. & Koonin, E. V. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl Acad. Sci. USA 93, 10268–10273 (1996).

11. Forterre, P. The origin of DNA genomes and DNA replication proteins. Curr. Opin. Microbiol. 5, 525–532 (2002).

12. Galtier, N. & Lobry, J. R. Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44, 632–636 (1997).

13. Di Giulio, M. The universal ancestor and the ancestor of bacteria were hyperthermophiles. J. Mol. Evol. 57, 721–730 (2003).

14. Brooks, D. J., Fresco, J. R. & Singh, M. A novel method for estimating ancestral amino acid composition and its application to proteins of the Last Universal Ancestor. Bioinformatics 20, 2251–2257 (2004).

15. Zeldovich, K. B., Berezovsky, I. N. & Shakhnovich, E. I. Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput. Biol. 3, 62–72 (2007).

16. Gowri-Shankar, V. & Rattray, M. On the correlation between composition and site-specific evolutionary rate: implications for phylogenetic inference. Mol. Biol. Evol. 23, 352–364 (2005).

17. Blanquart, S. & Lartillot, N. A Bayesian compound stochastic process for modeling nonstationary and nonhomogeneous sequence evolution. Mol. Biol. Evol. 23, 2058–2071 (2006).

18. Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109 (2004).

19. Zhaxybayeva, O., Lapierre, P. & Gogarten, J. P. Ancient gene duplications and the root(s) of the tree of life. Protoplasma 227, 53–64 (2005).

20. Fournier, G. P. & Gogarten, J. P. Signature of a primitive genetic code in ancient protein lineages. J. Mol. Evol. 65, 425–436 (2007).

21. Lanave, C., Preparata, G., Saccone, C. & Serio, G. A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20, 86–93 (1984).

22. Williams, P. D., Pollock, D. D., Blackburne, B. P. & Goldstein, R. A. Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput. Biol. 2, 598–605 (2006).

23. Graur, D. & Martin, W. Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet. 20, 80–86 (2004).

24. Robert, F. & Chaussidon, M. A palaeotemperature curve for the Precambrian oceans based on silicon isotopes in cherts. Nature 443, 969–972 (2006).

25. Shields, G. A. & Kasting, J. F. Evidence for hot early oceans? Nature 447, E1 (2007).

26. Kasting, J. F. & Ono, S. Palaeoclimates: the first two billion years. Phil. Trans. R. Soc. Lond. B 361, 917–929 (2006).

27. Gomes, R., Levison, H. F., Tsiganis, K. & Morbidelli, A. Origin of the cataclysmic Late Heavy Bombardment period of the terrestrial planets. Nature 435, 466–469 (2005).

28. Rosing, M. T. 13C-depleted carbon microparticles in >3700-Ma sea-floor sedimentary rocks from West Greenland. Science 283, 674–676 (1999).

29. Islas, S., Velasco, A. M., Becerra, A., Delaye, L. & Lazcano, A. Hyperthermophily and the origin and earliest evolution of life. Int. Microbiol. 6, 87–94 (2003).

30. Naya, H., Romero, H., Zavala, A., Alvarez, B. & Musto, H. Aerobiosis increases the genomic guanine plus cytosine content (GC%) in prokaryotes. J. Mol. Evol. 55, 260–264 (2002).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements This work was supported by Action Concertée Incitative IMPBIO-MODELPHYLO and ANR PlasmoExplore. We thank C. Brochier-Armanet and A. Lazcano for help and suggestions, the LIRMM Bioinformatics platform ATGC and the computing facilities of IN2P3.

Author Contributions B.B. and S.B. contributed equally to this study, designing and conducting experiments. A.N. performed statistical analyses and retrieved optimal growth temperatures. N.L. and M.G. provided guidance throughout the study, and M.G. gave the original idea. All authors participated in manuscript writing.

Author Information Reprints and permissions information is available at www.nature.com/reprints. Correspondence and requests for materials should be addressed to M.G. ([email protected]).


METHODS

rRNA data set. Prokaryotic small (SSU) and large (LSU) subunit rRNAs were retrieved in January 2007 from complete genomes available at the National Center for Biotechnology Information (NCBI). SSU and LSU rRNA sequences from ongoing genome projects or from large genomic fragments of important or poorly represented groups (for example, Archaea or hyperthermophilic bacteria) were added in June 2007. Eukaryotic SSU and LSU rRNA sequences were provided by D. Moreira; 65 slowly evolving sequences were selected from this data set31. Sequences were aligned using MUSCLE32. Resulting alignments were concatenated and manually improved using the MUST package33. Regions of doubtful alignment were removed using the MUST package; 2,239 sites were kept. A distance phylogenetic tree was computed using dnadist (Jukes and Cantor model) and neighbor from the PHYLIP package34. The final data set contained 65 eukaryotic, 60 archaeal and 331 bacterial sequences representative of the molecular diversity in each domain. An additional data set of 60 sequences sampling the diversity of the full data set was used in Bayesian analyses. Secondary structure predictions were downloaded from the rRNA database35. Sites that were predicted as double-stranded stems in Saccharomyces cerevisiae, Escherichia coli and Archaeoglobus fulgidus were selected to give an alignment of 1,043 sites.
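For readers unfamiliar with the Jukes and Cantor model mentioned above, the pairwise distance it yields is d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The sketch below applies that textbook formula to two aligned sequences. It is not the PHYLIP dnadist implementation; the function name and the treatment of gaps and ambiguous characters (such sites are skipped) are assumptions.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or a non-ACGT/U character are
    ignored. U is treated as T so RNA and DNA spellings compare equal.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")
    pairs = [(x, y) for x, y in zip(a, b) if x in valid and y in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        # The correction is undefined at saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jukes_cantor_distance("ACGUACGU", "ACGUACGA"))
```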

Protein data set. Nearly universal protein families with one member per genome were used to avoid ill-defined orthology. Protein families from the HOGENOM database of families of homologous genes (release 03, October 2005, S. Penel and L. Duret, personal communication; http://pbil.univ-lyon1.fr/databases/hogenom3.html) that displayed a wide species coverage with no or very low redundancy in all species were selected. Additional sequences from other genomes whose phylogenetic position was interesting were considered. These were downloaded from the Joint Genome Institute (Desulfuromonas acetoxidans), The Institute for Genomic Research (Giardia lamblia, Tetrahymena thermophila, Trichomonas vaginalis) or the NCBI (Kuenenia stuttgartiensis), and were searched for homologous genes using BLAST (ref. 36); only the best hit was retrieved. The protein families were subsequently aligned using MUSCLE (ref. 32) and submitted to phylogenetic analysis using the NJ algorithm (ref. 37) with Poisson distances with Phylo_Win (ref. 38). Proteins from mitochondrial or chloroplastic symbioses and families in which horizontal transfers between Bacteria and Archaea may have occurred were discarded, and so were aminoacyl-tRNA synthetases prone to transfers (ref. 39). In the rare families with two sequences from the same species, the sequence showing the longest terminal branch or whose position was most at odds with the biological classification was discarded. This provided 56 protein families (Supplementary Table 2) for 115 species, which were concatenated using SCaFoS (ref. 40). From the 9,218 concatenated sites, 3,336 positions with less than 5% gaps were conserved. The whole data set was used to compute the correspondence analysis and correlations between amino-acid composition and optimal growth temperature. For Bayesian analyses, 30 species among 115 were selected sampling the diversity of cellular life (Supplementary Table 1).
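The gap filter mentioned above (keeping positions with less than 5% gaps) can be illustrated with a minimal sketch. This is not the SCaFoS or MUST implementation; the function name `keep_low_gap_columns`, the gap characters and the toy alignment are assumptions made for the example.

```python
def keep_low_gap_columns(alignment, max_gap_fraction=0.05, gap_chars="-?"):
    """Keep only the columns whose gap fraction is strictly below the threshold.

    `alignment` is a list of equal-length strings (hypothetical input format).
    """
    if not alignment:
        return []
    n_seq = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    kept = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seq < max_gap_fraction
    ]
    return ["".join(seq[col] for col in kept) for seq in alignment]


# Toy alignment (made up): columns 2 and 4 contain gaps and are dropped.
toy = ["MK-LV", "MKALV", "MK-LV", "MR-L-"]
filtered = keep_low_gap_columns(toy, max_gap_fraction=0.05)
```

With only four sequences, a single gap already exceeds the 5% threshold, so any column containing a gap is removed.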

Multivariate data analyses. Correspondence analysis (ref. 41) was performed on the amino-acid compositions of the protein data set, using the ade4 package (ref. 42) of the R environment for statistical computing.

Phylogenetic tree construction. An rRNA phylogenetic tree was built from the 456-sequence alignment with both stems and loops with PhyML_aLRT (refs 43, 44) with the GTR model, a gamma law with eight categories and an estimated proportion of invariant sites. The tree for the 60-sequence data set was obtained in the same manner. The phylogenetic trees for the three protein data sets (Supplementary Table 1) were obtained using MrBayes 3.1.1 (ref. 45), using the GTR substitution model and a gamma law with four categories for rates across sites. Chains were run for 1,000,000 generations and samples were collected every 100 generations; a burn-in of 1,000 samples was discarded. The majority rule consensus was computed from the 9,000 remaining samples.
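The sample counts quoted above follow from simple arithmetic, sketched here as a quick check. The variable names are illustrative only and do not correspond to MrBayes options.

```python
generations = 1_000_000      # chain length, from the text
sampling_interval = 100      # one sample every 100 generations
burn_in_samples = 1_000      # samples discarded as burn-in

total_samples = generations // sampling_interval
retained_samples = total_samples - burn_in_samples
```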

Identification of fast-evolving rRNA sites. Posterior probabilities for gamma law rate categories were predicted for each site with PhyML_aLRT. Site evolutionary rates were obtained by averaging gamma law rate categories weighted by their posterior probabilities. Sites whose evolutionary rate was above the arbitrarily chosen threshold of 2.0 (Supplementary Fig. 2) were discarded, which left 940 sites.
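The site-rate computation described above (a posterior-weighted average of the gamma category rates, followed by a threshold) might look like the following sketch. The function names, the three category rates and the posterior probabilities are made up for illustration; they are not PhyML_aLRT output.

```python
def posterior_mean_rate(category_rates, category_posteriors):
    """Average the category rates, weighted by their posterior probabilities."""
    total = sum(category_posteriors)
    weighted = sum(r * p for r, p in zip(category_rates, category_posteriors))
    return weighted / total


def slow_sites(site_posteriors, category_rates, threshold=2.0):
    """Return indices of sites whose posterior mean rate does not exceed the threshold."""
    return [
        index
        for index, posteriors in enumerate(site_posteriors)
        if posterior_mean_rate(category_rates, posteriors) <= threshold
    ]


# Hypothetical values: three rate categories and three sites.
rates = [0.2, 0.8, 3.0]
posteriors = [
    [0.7, 0.2, 0.1],  # mean rate 0.60, kept
    [0.0, 0.1, 0.9],  # mean rate 2.78, discarded
    [0.1, 0.4, 0.5],  # mean rate 1.84, kept
]
kept_sites = slow_sites(posteriors, rates, threshold=2.0)
```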

Estimation of ancestral compositions. For the maximum likelihood approach, nhPhyML (ref. 7) was applied to the rRNA stem sites alignment and the phylogenetic tree described above, and used to estimate all evolutionary parameter values, except tree topology, which was fixed. Site-specific ancestral nucleotide compositions at tree root and at internal node j descendant of node i were computed by:

$$p_{\mathrm{root}}(x) = \frac{a(x)\, L_{\mathrm{low}}(x \text{ at root})}{L}; \quad a(A) = a(T) = \frac{1 - \omega}{2}; \quad a(C) = a(G) = \frac{\omega}{2}$$

$$p_j(x) = \frac{\sum_y L_{\mathrm{upp}}(y \text{ at node } i)\, p_{y \to x}\, L_{\mathrm{low}}(x \text{ at node } j)}{L}$$

where x and y are in {A, C, G, T}, L is the total tree likelihood at this site, $L_{\mathrm{low}}$ and $L_{\mathrm{upp}}$ are site lower and upper conditional likelihoods, respectively (ref. 7), $\omega$ is the maximum likelihood estimate of root G+C content, and $p_{y \to x}$ is the probability of the y to x substitution on the i to j branch. For Bayesian analyses, nhPhyloBayes (ref. 6) was applied to trees described above. Ancestral sequence reconstruction started, for each site, by drawing a state x at the root: $x \sim \omega(x)\, L_{\mathrm{low}}(x \text{ at root})$, where $\omega$ was the Markov Chain Monte Carlo (ref. 45) (MCMC) estimate of root amino acid or nucleotide frequencies. Then, states x have been recursively drawn at each node j: $x \sim p_{y \to x}\, L_{\mathrm{low}}(x \text{ at } j)$, where y was the parental node state. Given a realization of the model, this permitted the reconstruction of ancestral sequences at all nodes. Posterior distributions were sampled by 2 (for proteins) or 4 (for rRNA) independent MCMC chains, each with 1,000 to 2,000 realizations. Posterior distributions of sequence compositions combined all realizations of all chains. Protein ancestral compositions were projected on the second axis of the correspondence analysis, and rRNA ancestral compositions were summed up as G+C contents.

Statistical tests. In bootstrap analyses, all parameters but topology and branch lengths were estimated under the maximum likelihood criterion for each replicate. In tests of whether the LUCA is less thermophilic than one of its descendants, P values were the fraction of cases where the temperature estimate for LUCA in a bootstrap replicate or in an iteration of an MCMC chain was above the estimate obtained for its descendant.
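The P value used in the statistical tests is a simple fraction over replicates, which can be sketched as follows. The sketch assumes that the LUCA and descendant estimates are paired within each bootstrap replicate or MCMC iteration; the function name and the temperature values are hypothetical.

```python
def fraction_luca_above(luca_estimates, descendant_estimates):
    """Fraction of replicates in which the LUCA estimate exceeds the descendant's."""
    if not luca_estimates or len(luca_estimates) != len(descendant_estimates):
        raise ValueError("need two non-empty lists of equal length")
    above = sum(
        luca > descendant
        for luca, descendant in zip(luca_estimates, descendant_estimates)
    )
    return above / len(luca_estimates)


# Made-up temperature estimates (degrees Celsius) for four replicates.
luca = [60.0, 72.0, 55.0, 58.0]
descendant = [75.0, 70.0, 80.0, 77.0]
p_value = fraction_luca_above(luca, descendant)
```

A small value indicates that LUCA is rarely estimated as more thermophilic than its descendant across replicates.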

31. Moreira, D. et al. Global eukaryote phylogeny: Combined small- and large-subunit ribosomal DNA trees support monophyly of Rhizaria, Retaria and Excavata. Mol. Phylogenet. Evol. 44, 255–266 (2007).

32. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).

33. Philippe, H. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res. 21, 5264–5272 (1993).

34. Felsenstein, J. PHYLIP (Phylogeny Inference Package) version 3.6. (Department of Genome Sciences, 2005).

35. Wuyts, J., Perriere, G. & Van De Peer, Y. The European ribosomal RNA database. Nucleic Acids Res. 32, D101–D103 (2004).

36. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

37. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).

38. Galtier, N., Gouy, M. & Gautier, C. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12, 543–548 (1996).

39. Wolf, Y. I., Aravind, L., Grishin, N. V. & Koonin, E. V. Evolution of aminoacyl-tRNA synthetases–analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 9, 689–710 (1999).

40. Roure, B., Rodriguez-Ezpeleta, N. & Philippe, H. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7 (Suppl 1), S2 (2007).

41. Hill, M. O. Correspondence analysis: a neglected multivariate method. Appl. Statist. 23, 340–354 (1974).

42. Chessel, D., Dufour, A. B. & Thioulouse, J. The ade4 package -I- one-table methods. R News 4, 5–10 (2004).

43. Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003).

44. Anisimova, M. & Gascuel, O. Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst. Biol. 55, 539–552 (2006).

45. Huelsenbeck, J. P. & Ronquist, F. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).

doi:10.1038/nature07393


P P P ❨ ❱❯ P❯

Further tests

♣rs♥t rt rs ♥ ♦r ♦ ♥♦♥♣rs♠♦♥♦s ♣ttr♥ t ♣r♦t♦♥s t♦ t♠♣rtrs r♦♠ ❯ t♦ ts s♥♥ts t♦ srtsts ♥ ♣r♦r♠ t♦ ♥sr tt ♥ rtt s ♥♦t t t ♦r♥ ♦ ts♣ttr♥ ♦s t♦ ♠ ♦♥ ♠♦r tst s ♦♥ s♠t♦♥s qst♦♥ ♥t t♦ s s ♣rtr ♣ttr♥ ♦♠♣r t♦ t s ♥ ♦♥ ♥t r t s s♠t ♥ ♥P② r♦r t ♥ s ♣r♦r♠ t♦ ♠ s♠t♦♥s s② t ♥♦♥♦♠♦♥♦s ♠♦ ♦ sq♥♦t♦♥ ♣r♦r♠ s ♣rs♥t ♥ t ♥①t rt rt

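We based our simulations on the results found by nhPhyML on the data set containing 456 rRNA sequences, 1043 bases long. On this data set, nhPhyML predicted a G+C content at the root around 73%, and equilibrium frequencies towards the bacterial and archaea-eukaryote ancestors of around 97% and 87%.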

We therefore chose to simulate data sets as follows:

• ❲ ①trt str ♦ s r♦♠ t r tr ♦t♥ ♥ trt s♥ t ♣r♦r♠ ♣♣♣②s♠♣ r♦♠ t ♦ rr② tt s♦ tt t s♠♣♥ ♦♠♦♥♦s② ♦rs t rst②♣rs♥t ♥ t tr ❲ t♥ s ts tr t♦♣♦♦② t♦ s♠t tsts ♦ sq♥s 1043 ss ♦♥ ♦r tst t♦ s♠t

• We randomly picked a G+C content ω at the root, ω ∈ [0.2; 0.8]. Base frequencies are obtained as follows: [A] = 0.4 × (1 − ω); [C] = 0.4 × ω; [G] = 0.6 × ω; [T] = 0.6 × (1 − ω)

• ♥ r♥ ♦ t tr r♥t HKY ♠♦ s t s s ♥ qr♠ ♦♥t♥t θ s ♥ θ ∈ [0.1; 0.9] ♥qr♠ s rq♥s r ♦t♥ s ♦r t r♦♦t s rq♥ss ♥t t♦ s♠t tsts ♦♠♣r t♦ t r ♦♥ ♦r t t♦r♥s ♦♠♥ r♦♠ t r♦♦t ♠ sr tt t s♦t r♥t♥ θ ♥ ω s s♣r♦r ♦r q t♦ 0.15

• q♥s r ♦ ♥ t r♦♦t s ♦♠♣♦st♦♥s ♥ t ♠♦s♦♥ r♥ r♥ ts ♦t♦♥ srt③ ♠♠ ❨♥ t 4 t♦rs ♣s t♦r② ♦ ♥r♥t s s t ♥ ♣♣r♠tr st t♦ 1.0 ♥ t tr♥st♦♥tr♥srs♦♥ rt♦ s st t♦ 1.0

❲ r♥ 150 s s♠t♦♥s ❲ t♥ s ♥P② ♦♥ ♦ ts s♠t♦♥s s♥ ♠♠ t ♦r t♦rs ♥ st♠t ♣ ♥ tr♥st♦♥tr♥srs♦♥ ♣r♠trs t t tr t♦♣♦♦② ♥ st ♦ ♥P②r♦♥strt t ♦t♦♥ ♦ ♦♥t♥t ♥ ❯ ♥ ts t♦ s♥♥ts


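We defined three possible categories for the simulations, depending on the G+C content evolution from LUCA to its descendants:

• Parallel increases in G+C content

• Parallel decreases in G+C content

• One increase and one decrease in G+C content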

❲ ♦♠♣r t rsts ♦t♥ ② ♥P② t♦ t ♥ s♠t♥ ♦t♥ t ♦♦♥ rsts

t♦r② Pr♦♣♦rt♦♥ ♦ ♦rrt r♦♥strt♦♥

♦ t s♦s tt ♥P② rt② r♦♥strts t tr ♦t♦♥r② s♥r♦ ♥ t rr ss r t s t♦ r♦♥③ ♣r ♦t♦♥st r♦rs stt♦♥ r tr s ♥ ♥rs ♥ r♥ ♥ rs ♥t ♦tr r♥ stt♦♥

♥trst♥② s t s♠ s♠t♦♥ ♣r♦t♦♦ t ts t♠ ♦ ♥♦t s ♠♠ ♥ ♥P② t♦ r♦♥strt t ♦t♦♥r② s♥r♦ t ♣r♦r♠sr② ♦♥sr② r♦♣s s s♦♥ ♥ t ♥①t t

t♦r② Pr♦♣♦rt♦♥ ♦ ♦rrt r♦♥strt♦♥

r♦♣ s ♣rtr② str♥ ♥ t r♦♦t ♦♥t♥t s t♥ t ♦♥t♥ts ♦ ts s♥♥ts stt♦♥ ♣r♣s s ♥ ts s ts♥ ♦r rrrst② s ♠♦r

s rs tt t ♠♦ ss rt tr♦♥ts t♦ r♦r t tr ♦t♦♥r② st♦r② ♥ ♣ r♥s ♥ s r♠st♥s t ♠♦ ♥ st♥s s♦② ♦♥ r♦♠ st ♦♥ sts ♥ ♥r rq♥s t t r♦♦t♠♦r r② ② trst♥ s♦ sts ♥st st ♦♥s

s ♥tt♦♥ s ♦♥r♠ ♦♥s♥ss sq♥s ♦ t r ♥♠♥t ❲♥ ♣♦st♦♥ s♦ ♥t ♥ 80% ♦ t sq♥s t♦ ♥tr t♦♥s♥ss t ♦♥s♥ss sq♥ s ♦♥t♥t ♦ ≈ 74.0% ♥ ♥t♦ 584 sts t trs♦ s ♣t t 90% ♥st t ♦♥s♥ss sq♥ s ♦♥t♥t ♦ ≈ 71.4% ♦r ♥t ♦ 419 sts ♦♥s♥ss ♦t♥t t 90% trs♦ ♦♥t♥s sts tt ♥r♦♥ ss ssttt♦♥s ♥r ss r t♥ t ♦♥s♥ss ♦t♥ t t 80% trs♦ Prt ♦ t


s♥ ♦r ❯ r ss r t♥ ts t♦ s♥♥ts ♠② tr♦r♦♠ r♦♠ t t tt s♦② ♦♥ sts ♥ r r ss r t♥str sts

stt♦♥ s ♣r♦② t s♠ ♦r ♣r♦t♥ sq♥s s ♦r♥r t ♦rt♥ ♦♥ tt t ♠♥♦s ♦r♥ ♥ r rq♥s ♥ t♦♥ ♦r♥s♠s ♦r ♥ ♦r ♣r♦♣♦rt♦♥s ♠♦♥ ♦♥st♥t ♣♦st♦♥s t♥ ♥♦tr ♣♦st♦♥s ♦♥♦♠♦♥♦s ♠♦s ♦ ♦t♦♥ ts ♥ s♥ ♦r ♥str ♦♠♣♦st♦♥s r t s ♠♦♥ ♦ss sts


7 Towards Better Non-homogeneous Models of Sequence Evolution

Pr♦s ♦rs ♦♥♥ ♠ tt ♠♦r rsr ♥ t♦ ♦♥ ♦♥ ♠♦s t♦ ♦♣ t ♦♠♣♦st♦♥ tr♦♥t② ♦ rrs r st ♦ r♦t♥s ♦r ♠♥ ♣②♦♥t ♥②ss ♥♦t② ♥ t ♥ ♠♣♠♥t s♣♣♦rt ♦r ♥♦♥♦♠♦♥♦s ♠♦s ♥ ♦ s s♦ ♣rsrrs ①♣r♠♥t t ♥ ♠♦s ♦r ♥ ♦rt♠s ♥ ♠② ♠♦rtss ♠♦s

This article has been accepted in BMC Evolutionary Biology.


BioMed Central


BMC Evolutionary Biology

Open Access Software

Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs

Julien Dutheil*1 and Bastien Boussau2

Address: 1BiRC – Bioinformatics Research Center – University of Aarhus, C. F. Møllers Alle, Building 1110, DK-8000 Århus C, Denmark and 2UMR CNRS 5558 – Laboratoire de Biométrie et Biologie Évolutive, CNRS, Université de Lyon, Université Lyon 1, 43 Boulevard du 11 Novembre, 69622 Villeurbanne, France

Email: Julien Dutheil* - [email protected]; Bastien Boussau - [email protected]

* Corresponding author

Abstract

Background: Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial to simulate realistic data sets. Such simulation procedures are needed to estimate the null-distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that sequence evolution has changed importantly among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot be easily generated using the resulting estimated parameters.

Results: We hereby present a general implementation of non-homogeneous models of substitutions. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first one, Bio++ Maximum Likelihood (BppML), estimates parameters of any non-homogeneous model and the second one, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences from these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required.

Conclusion: We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous models of sequence evolution, including heterotachous ones, while being computer efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to an already published ribosomal RNA data set.

Published: 22 September 2008

BMC Evolutionary Biology 2008, 8:255 doi:10.1186/1471-2148-8-255

Received: 24 April 2008
Accepted: 22 September 2008

This article is available from: http://www.biomedcentral.com/1471-2148/8/255

© 2008 Dutheil and Boussau; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

In phylogenetics, simulations have been widely used to study the robustness of inference methods [1] and have been involved in parametric bootstrapping [2]. For instance, simulations have shown that maximum likelihood methods often more accurately reconstructed the evolution of an alignment than distance or parsimony methods [3,4], but could also fail in conditions where compositional biases (a condition here referred to as non-homogeneity) or rate heterogeneity along branches (a phenomenon named heterotachy, [5]) were too intense [6-8]. Similarly, simulations have been used to compare topologies with respect to an alignment [9], or to assess the fit of a model to a particular data set [10-13]. In this last case, a model has a good fit to a particular data set if the alignments it generates have properties similar to the properties of the real alignment. Both for investigating reconstruction methods and for parametric bootstrapping, it is highly desirable that simulation methods model as precisely as possible the conditions that shaped biological sequences through evolution. However, widely-used simulation programs cannot be easily tuned to precisely reproduce the peculiar evolution of a particular data set. Notably, non-homogeneity cannot be simulated by Seq-Gen [14] or PAML [15], even if these phenomena are all known to affect the evolution of many data sets [5,16-20].

The ability to estimate parameters of sequence evolution with realistic models, and then computationally evolve sequences using these fitted parameters, is crucial to better characterize the behavior of reconstruction methods in realistic settings.

Here we introduce extensions to the Bio++ package [21] that make it possible, first, to estimate parameters of evolution on a specific data set in a maximum likelihood framework and, second, to simulate the evolution of sequences using these estimated parameters. Importantly, nearly any combination of non-homogeneous (including non-stationary) and heterotachous models of evolution can be fitted to data, so that simulations may mimic very precisely the evolution of a data set. Such flexibility should enable one to probe how robust methods of phylogenetic tree or ancestral state reconstruction are to more realistic evolutionary conditions. Moreover, it offers the possibility to compare a large variety of models by assessing through parametric bootstrapping their respective ability to reproduce a given characteristic of interest, measured on a real data set.

Implementation

Molecular phylogenetic methods are used by a wide range of biologists, from bioinformaticians willing to characterize and improve models of sequence evolution to molecular biologists trying to grasp the particular evolutionary history of their gene of interest. These different types of users have different needs: the former may benefit from easy-to-assemble, high-level object-oriented code to conduct phylogenetic analysis, while the latter prefer user-friendly interfaces. However, both demand programs able to run the most recent models of evolution. The newly introduced extensions are available in two flavors that might fit different users' needs: (i) as classes in the Bio++ phylogenetic library, including a special class called SubstitutionModelSet which implements the relationships between models, parameters and branches, and (ii) through the BppML and BppSeqGen programs, which can respectively adjust these models to a data set and simulate data from these models. These programs share a common syntax for model specification and are hence fully interoperable and easy to use.

The SubstitutionModelSet class

The Bio++ libraries [21] provide data structures and algorithms dedicated to analysis of nucleotide, codon and amino acid sequences, phylogenetics and molecular evolution, and are designed in an object-oriented way. These include classes for storing phylogenetic trees, computing likelihood under various models of substitution, and estimating parameters. The likelihood classes take as input a phylogenetic tree and a substitution model, and were extended to allow the computation under non-homogeneous models (figure 1). This support is achieved through the addition of parameters for the rooting of the tree, since the likelihood may not be independent of the root position with a non-homogeneous model [6], and through a new class named SubstitutionModelSet. The SubstitutionModelSet class essentially associates a substitution model with each branch of the phylogenetic tree, and links each substitution model to a list of corresponding parameters (figure 2). It also provides a series of methods for the developer to set up the general model, to assign parameters to substitution models and substitution models to branches.

Substitution models can be totally independent of each other, or can share any number of parameters. Virtually any non-homogeneous model can thus be set up, provided the alignment is not a mix of nucleotide, amino-acid or codon sequences. All models available in Bio++ can be used with this class (e.g. K80, T92, HKY85, GTR, JTT92, etc.), including heterotachous models (Galtier's model [22] and Tuffley and Steel's model [23]) and any rates across sites model (i.e. Gamma and Gamma + invariant distributions). Developers can also use the SubstitutionModelSet class with their own substitution model through the Bio++ SubstitutionModel interface. The SubstitutionModelSet class can be used in conjunction with other Bio++ classes to reconstruct ancestral states or to map substitutions, and hence allows these analyses to be performed in the general non-homogeneous case.
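The relationship between branches, models and shared parameters can be pictured with a small data-structure sketch. It is written in Python for brevity and is not the Bio++ API: the class `ModelSetSketch`, its methods and the parameter names are all hypothetical, chosen only to mirror the idea that several branch models may point to the same parameter.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSetSketch:
    """Toy stand-in for the idea behind SubstitutionModelSet (not the Bio++ API)."""

    parameters: dict = field(default_factory=dict)       # parameter name -> value
    models: list = field(default_factory=list)           # model index -> {role: parameter name}
    branch_to_model: dict = field(default_factory=dict)  # node id -> model index

    def add_model(self, roles, node_ids):
        """Register a model whose roles (e.g. 'kappa') point to named parameters."""
        for name in roles.values():
            if name not in self.parameters:
                raise KeyError(f"unknown parameter: {name}")
        self.models.append(dict(roles))
        index = len(self.models) - 1
        for node_id in node_ids:
            self.branch_to_model[node_id] = index
        return index

    def values_for_branch(self, node_id):
        """Resolve the parameter values used on the branch identified by `node_id`."""
        roles = self.models[self.branch_to_model[node_id]]
        return {role: self.parameters[name] for role, name in roles.items()}


# One kappa shared by all branches, two theta values shared by groups of branches.
model_set = ModelSetSketch(parameters={"kappa": 2.0, "theta1": 0.4, "theta2": 0.6})
model_set.add_model({"kappa": "kappa", "theta": "theta1"}, node_ids=[0, 1])
model_set.add_model({"kappa": "kappa", "theta": "theta2"}, node_ids=[2, 3])

# Changing the shared parameter once is seen by every branch that uses it.
model_set.parameters["kappa"] = 3.0
```

Storing parameter names rather than values in each model is what lets an optimizer update a shared parameter in one place.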

Estimating parameters

Estimation of numerical parameters is performed using a modified Newton-Raphson optimization algorithm, commonly used in phylogenetics [4,24,25], and therefore requires computing derivatives with respect to parameters of the model. Because the use of the cross derivatives leads to numerical instabilities in the optimization (Nicolas Galtier, personal communication), they are set to zero in the Hessian matrix. Derivatives with respect to branch lengths are computed analytically, whereas derivatives with respect to the rates across sites distribution are computed numerically. Although the substitution model derivatives can be computed analytically in the homogeneous case as well as in Galtier and Gouy's model, they are difficult to compute analytically in the more general case, and are consequently computed numerically in Bio++. To prevent convergence issues due to erroneous derivative values we use, in the last optimization steps, Powell's multidimensional algorithm, which does not rely on parameter derivatives [26].
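Setting the cross derivatives to zero turns each Newton-Raphson step into an independent update per parameter. The sketch below illustrates this on a toy concave function standing in for a log-likelihood; it is not the Bio++ optimizer, and the function names and the toy objective are assumptions made for the example.

```python
def diagonal_newton_maximize(gradient, second_derivatives, start,
                             max_steps=100, tol=1e-12):
    """Maximize a function with Newton steps that ignore cross derivatives.

    `gradient(x)` returns first derivatives, `second_derivatives(x)` returns
    only the diagonal of the Hessian (off-diagonal terms are treated as zero).
    """
    x = list(start)
    for _ in range(max_steps):
        g = gradient(x)
        h = second_derivatives(x)
        # Per-coordinate Newton update; skip coordinates without negative curvature.
        new_x = [xi - gi / hi if hi < 0 else xi for xi, gi, hi in zip(x, g, h)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Toy objective: f(x, y) = -(x - 1)^2 - 2 (y + 0.5)^2 - 0.5 x y
def toy_gradient(p):
    x, y = p
    return [-2.0 * (x - 1.0) - 0.5 * y, -4.0 * (y + 0.5) - 0.5 * x]


def toy_second_derivatives(p):
    return [-2.0, -4.0]


solution = diagonal_newton_maximize(toy_gradient, toy_second_derivatives, [0.0, 0.0])
```

Even though the toy function has a non-zero cross derivative, the diagonal iteration still reaches the point where the gradient vanishes; it simply needs more steps than a full Newton update would.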

A general file format to describe non-homogeneous models

We introduced a new user-intuitive property file format to describe non-homogeneous substitution models. This format is an extension of PAML or NHML property file formats, and uses a syntax of the kind

property_name = property_value

A parser that automatically instantiates the appropriate SubstitutionModelSet object is included in the Bio++ libraries and is used by all programs in the Bio++ programs suite. Moreover, the same format is used for the input file of the programs and for their output, so that the output of one program (e.g. one which adjusts a model to real data) can easily be used as the input of another one (e.g. one which simulates data from a model). Figure 3 shows how the models in figure 1 are coded using this format. The core part of the description is the "model" property, which is associated to one or several nodes of the phylogenetic tree through node identifiers. These node identifiers can be obtained from the programs in the Bio++ program suite, or set by users in their own programs.
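To make the `property_name = property_value` syntax concrete, here is a minimal reader for such lines. It only illustrates the general key/value idea and is not the Bio++ parser; the function name and the property names in the sample text are placeholders, not actual BppML or BppSeqGen options.

```python
def parse_properties(text):
    """Parse lines of the form `name = value` into a dictionary.

    Blank lines are skipped; anything after the first '=' is kept as the value.
    """
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"not a property line: {raw_line!r}")
        name, value = line.split("=", 1)
        properties[name.strip()] = value.strip()
    return properties


# Placeholder property names, invented for this example.
sample = """
first_property = 1.5

second_property = value with = inside
"""
parsed = parse_properties(sample)
```

Because input and output share one format, a dictionary read this way could be written back with the same `name = value` layout and passed to another program.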

The BppML and BppSeqGen programs

Parameter estimation and simulation procedures are available as dedicated classes in the Bio++ phylogenetic library, and can hence be used in any C++ program. For users who would rather rely on existing software than program their own tools, the Bio++ program suite was designed. These programs, including BppML (for Bio++ Maximum Likelihood) and BppSeqGen (Bio++ Sequence Generator), are command line driven and fully parametrized using property files, as introduced above. They can thus easily be pipelined with scripting languages such as bash, Python or Perl. In addition to the BppML and BppSeqGen programs, the Bio++ program suite also contains programs for distance-based phylogenetic reconstruction, sequence file format conversion and tree manipulation.

Figure 1. General non-homogeneous model of substitution. The substitution model depicted here is Tamura's 1992 model of substitution, which contains two parameters: κ, the transitions/transversions ratio, and θ, the equilibrium G+C content. In the homogeneous case, θ and κ are constant over the tree (case 'a'). In Galtier and Gouy's 1998 model, κ is constant over the tree and one distinct θ is allowed per branch (case 'b'). Between these two extremes lie models with certain branches, but not all, sharing a common value of θ (case 'c'). In the most general case 'd', there are two sets of parameters, one for κ and another for θ, that are shared by the branches of the tree.

Results and Discussion

Our new general non-homogeneous model implementation was applied to Boussau and Gouy's data set of concatenated small and large subunit ribosomal RNA sequences and tree [6]. This data set contains 92 sequences and 527 complete sites. We first compare computation time, memory usage and parameter estimation for various models and software. We then show how the general non-homogeneous model introduced here can be used to study model fit through parametric bootstrapping.

In this section, we use the following model notations:

H: Homogeneous model, using a Tamura 1992 substitution model [27].

NH1: One-theta-per-branch non-homogeneous model [24]. This model uses Tamura's 1992 substitution model, with one θ (equilibrium G+C content) per branch in the tree, whereas κ (transitions/transversions ratio) is shared by all branches.

NH2: One-theta-per-kingdom non-homogeneous model. In this general model, we allowed each kingdom (Bacteria, Eukaryotes or Archaea) to have its own equilibrium G+C content, while sharing the same transitions/transversions ratio.

NH3: Same as NH2, but in addition the (hyper)thermophilic Bacteria on one hand, and the eukaryote G+C-rich genus Giardia on the other hand were allowed to have their own equilibrium G+C content.

NH4: One-kappa-per-branch non-homogeneous model. This model has one κ per branch in the tree, whereas θ is shared by all branches.
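All of the models above are built from Tamura's 1992 substitution model, whose two parameters κ and θ define a rate matrix. The sketch below uses the standard unnormalized form of that matrix (rate to nucleotide y proportional to its equilibrium frequency, multiplied by κ for transitions); the function names are illustrative and this is not Bio++ code.

```python
NUCLEOTIDES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def t92_frequencies(theta):
    """Equilibrium frequencies for an equilibrium G+C content `theta`."""
    return {"A": (1 - theta) / 2, "C": theta / 2, "G": theta / 2, "T": (1 - theta) / 2}


def t92_rate_matrix(kappa, theta):
    """Unnormalized T92 rate matrix as a nested dict q[x][y]."""
    pi = t92_frequencies(theta)
    q = {}
    for x in NUCLEOTIDES:
        row = {}
        for y in NUCLEOTIDES:
            if x != y:
                factor = kappa if (x, y) in TRANSITIONS else 1.0
                row[y] = factor * pi[y]
        row[x] = -sum(row.values())  # each row sums to zero
        q[x] = row
    return q


# A homogeneous model uses one (kappa, theta) pair; an NH1-like model keeps
# kappa shared and gives each branch its own theta (values below are made up).
shared_kappa = 2.0
theta_per_branch = {0: 0.35, 1: 0.55, 2: 0.70}
matrices = {node: t92_rate_matrix(shared_kappa, theta)
            for node, theta in theta_per_branch.items()}
```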

Performance

We compared the likelihood of our implementation with the NHML [22,24] and [nh]PhyML [4,6] programs (see table 1 and Additional file 1). Several models have been tested: Kimura two parameters (K80) for the homogeneous case, and Tamura 1992 (T92) derived models for the non-homogeneous cases, with constant rate, Gamma distributed rates (4 classes), Gamma (4 classes) + invariant and Galtier's 2001 site-specific rate variation model (covarion-like). On all tested models, the optimization algorithm in Bio++, while using numerical derivatives, leads to similar or better likelihood values than other programs, although at the price of an increase in computational time. However, this increase is not sufficient to prevent the use of complex models on data sets of usual sizes, as it takes about an hour and a quarter to optimize parameters with the richest models on a data set containing 92 sequences. It is also noteworthy that the Bio++ implementation requires less memory than other programs. This is partly explained by differences in the algorithms used to compute the likelihood [28]. The PhyML programs, including nhPhyML, use a double-recursive algorithm [6], which saves a lot of computation when exploring the space of tree topologies but results in a threefold increase in memory usage compared to the simple-recursive algorithm. Because no tree space exploration was involved, BppML computations used the simple-recursive algorithm. If desired, however, Bio++ also offers the double-recursive algorithm.

Figure 2. Relations between branches, models and parameters. In the general non-homogeneous case, model parameters are shared by different branches across the tree. These parameters are part of branch-specific substitution models, which specify branch-wise probabilities of replacement between states. Branches are here defined according to their rightmost node. The SubstitutionModelSet class stores dependencies between nodes, models and parameters.

The convergence of the optimization algorithm was assessed by two methods, using the NH3 model. First, we used 100 distinct randomly chosen initial sets of parameter values and the RNA data set (see methods). We found that the estimated values obtained in each run were the same for all parameters up to the 5th decimal. Second, we simulated 100 data sets using the NH3 model with a Gamma + invariant rate distribution, with parameter values estimated from the real data set and the same number of sites. These parameters were then re-estimated for each simulated data set using random initial values. The results are displayed in figure 4, and show that the parameter values are recovered without bias and with good precision. The only exception is the proportion of invariant sites, which is slightly overestimated. These results also validate the simulation procedure.

Example of application: parametric bootstrap and Bowker's test for non-homogeneity

As most phylogenetic reconstruction models are homogeneous, they do not properly model the evolution of homologous sequences that vary widely in their compositions. Analyzing compositionally heterogeneous data sets with homogeneous models of sequence evolution may therefore lead to incorrect inferences, provided the heterogeneity is large enough. Several tests have been developed to assess the amount of heterogeneity present in a data set (see [29] for a review).

Figure 3. Model specification in BppML and BppSeqGen. A general file format is introduced to allow for the user-friendly description of virtually any non-homogeneous model. The tree shows the node identifiers, which can be obtained from the programs or defined by users in their own programs. Each case presented here corresponds to a particular model in figure 1, and was labeled accordingly. Each parameter can be fixed to a specific value or optimized with BppML.

Estimating the amount of compositional heterogeneity in a data set

Most commonly, a matrix is assembled that contains compositions in all characters for all sequences, and this matrix is analyzed through χ2 statistics [29]. However, this approach usually does not distinguish between constant and variable sites, and therefore may underestimate the true amount of heterogeneity in a data set [29].

Recently, Ababneh et al. [30] re-introduced Bowker's pairwise test [31] for symmetry. Given two aligned sequences S1 and S2 on a given alphabet of size n and characters x1, x2, ..., xn, it compares the numbers of substitutions between xi in S1 and xj in S2, i, j ∈ [1 : n], with the numbers of substitutions between xj in S1 and xi in S2. If these pairs of numbers are equal for all i, j ∈ [1 : n], the two sequences may have evolved according to two identical processes. Otherwise, the two processes were necessarily different.

Bowker's test therefore makes it possible to assess whether compositional differences have accumulated between two sequences through non-homogeneous evolution. To apply it to more than two sequences, Rodriguez-Ezpeleta et al. [32] computed all pairwise Bowker's tests in their alignment and computed the median value; one could also have counted the number of Bowker's tests that are significant at a 5% threshold according to a χ2 table.
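Bowker's statistic for a pair of aligned sequences can be computed directly from the counts described above. The sketch below uses the usual form of the statistic, summing (n_ij − n_ji)² / (n_ij + n_ji) over unordered pairs of distinct characters and counting one degree of freedom per pair with a non-zero denominator; the function name and the toy sequences are assumptions, and the comparison with a χ2 table is left to the reader.

```python
from itertools import combinations


def bowker_statistic(seq1, seq2, alphabet="ACGT"):
    """Return (statistic, degrees_of_freedom) for Bowker's test of symmetry.

    Sites where either sequence has a character outside `alphabet`
    (gaps, ambiguity codes) are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    counts = {(a, b): 0 for a in alphabet for b in alphabet}
    for a, b in zip(seq1, seq2):
        if a in alphabet and b in alphabet:
            counts[(a, b)] += 1
    statistic = 0.0
    degrees_of_freedom = 0
    for a, b in combinations(alphabet, 2):
        n_ab, n_ba = counts[(a, b)], counts[(b, a)]
        if n_ab + n_ba > 0:
            statistic += (n_ab - n_ba) ** 2 / (n_ab + n_ba)
            degrees_of_freedom += 1
    return statistic, degrees_of_freedom


# Toy pair: three A->G differences and one C->T difference, none in reverse.
result = bowker_statistic("AAAACCGT", "GGGACTGT")
```

For an alignment with more than two sequences, the same function could be applied to every pair, and the median of the statistics taken as the heterogeneity measure.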

However, none of these tests permit to estimate if theamount of heterogeneity that they detect in a given data

Table 1: Comparison of the NHML, (NH)PhyML and BppML programs. Likelihood: -log(likelihood) of the optimized parameters, with a fixed tree topology.

Likelihood

Rate    Constant                   Γ(4)                       Γ(4) + I                   Covarion
Model   H        NH1      NH3      H        NH1      NH3      H        NH1      NH3      H        NH1      NH3
NHML    15307    15034    --       14145    13828    --       --       --       --       13750    13397    --
PhyML   15187    15011    --       14141    13824    --       14128    --       --       --       --       --
BppML   15187    14920    15109    14141    13821    14029    14128    13810    14018    13747    13399    13615

Time

Rate    Constant                   Γ(4)                       Γ(4) + I                   Covarion
Model   H        NH1      NH3      H        NH1      NH3      H        NH1      NH3      H        NH1      NH3
NHML    0:01:40  0:02:28  --       0:03:07  0:02:13  --       --       --       --       0:19:24  0:19:09  --
PhyML   0:00:07  0:01:43  --       0:00:34  0:02:29  --       0:00:35  --       --       --       --       --
BppML   0:00:27  0:11:57  0:01:12  0:00:47  0:35:46  0:00:48  0:01:01  0:29:40  0:01:38  0:02:52  1:14:32  0:14:27

Memory

Rate    Constant                   Γ(4)                       Γ(4) + I                   Covarion
Model   H        NH1      NH3      H        NH1      NH3      H        NH1      NH3      H        NH1      NH3
NHML    16.38    20.48    --       55.30    65.54    --       --       --       --       55.30    65.54    --
PhyML   10.24    28.67    --       30.73    77.82    --       30.72    --       --       --       --       --
BppML   8.19     8.19     8.19     14.34    14.34    14.34    14.34    16.38    16.38    12.29    14.34    12.29

Time is shown as hours:minutes:seconds. Numbers in bold font correspond to the best performance for each comparison. Memory corresponds to the maximum memory usage during the program execution, in megabytes. H: homogeneous case, with a K80 substitution model; NH1: theta per branch model, with a T92 substitution model; NH3: clade-specific and G+C-rich species theta model (see Methods). The PhyML program was used for the H model, and nhPhyML for the NH1 model.

BMC Evolutionary Biology 2008, 8:255 http://www.biomedcentral.com/1471-2148/8/255

set is sufficient to bias inferences made using homogeneous models, although this is likely the question an average user would like to answer.

Assessment of the fit of evolutionary models with respect to compositional heterogeneity

Here, we describe a method to assess the ability of evolutionary models to account for the compositional heterogeneity in a sequence alignment, which we measure using either the median of all pairwise Bowker's statistics or the number of significant pairwise Bowker's tests (in the following, we denote this measure of compositional heterogeneity by h). This method is tree-based and uses parametric bootstrapping [10-12]. In this respect, it is similar to the method recently introduced in [13] in the Bayesian setting. Our approach requires five steps to estimate the fit of a model M to a data set D.

1. Compute the compositional heterogeneity measure h for the data set D.

2. Estimate the parameters of model M from the data set D according to the maximum likelihood criterion.

3. Simulate a large number of data sets D' using the model M with the parameters estimated in step 2.

4. Compute the compositional heterogeneity measure h' for each simulated alignment D'.

5. Compare the measure h obtained on data set D to the measures h' obtained on data sets D'. If h is outside 95% of the distribution of h', the model does not properly reproduce the heterogeneity of data set D.
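Steps 3 to 5 can be sketched as a small Python loop. This is only an outline under stated assumptions: `simulate` and `measure` are hypothetical callables supplied by the user, standing in for a simulator run under the fitted model M (for instance a wrapper around a sequence simulation program) and for the computation of the heterogeneity measure; neither is part of Bio++.

```python
import random


def bootstrap_heterogeneity(h_obs, simulate, measure, n_rep=1000):
    """Simulate n_rep data sets, measure h' on each, and locate h_obs among them.

    Returns the sorted list of h' values and the fraction of h' values that
    are greater than or equal to the observed measure h_obs.
    """
    h_sim = sorted(measure(simulate()) for _ in range(n_rep))
    frac_ge = sum(h >= h_obs for h in h_sim) / n_rep
    return h_sim, frac_ge


# Toy usage with stand-in callables (assumptions, not a real evolutionary model):
rng = random.Random(0)
toy_simulate = lambda: [rng.random() for _ in range(10)]  # a fake "data set"
toy_measure = max                                         # a fake "h" measure
h_values, fraction = bootstrap_heterogeneity(0.95, toy_simulate, toy_measure, n_rep=200)
```

A fraction close to 0 or to 1 indicates that the observed measure sits in a tail of the simulated distribution, that is, that the model fails to reproduce the observed heterogeneity.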

Using such an approach, any model can be compared with others with respect to their ability to handle the compositional heterogeneity of a given data set: the closer the distribution of h' is to h, the better the fit. Ideally, the distribution of measures h' obtained on the parametric bootstrap replicates of a good model should be centered around the value h obtained for the real alignment, with a very low variance. If one neglects potential problems linked with over-parametrization, the inferences of the best model should be trusted in preference to those of a model that fails to account for an important feature of a data set. Overall, our approach can be used for model selection, although, contrary to criteria such as AIC or BIC [28], it does not take the number of parameters into account; more importantly, it can also be used for estimating model adequacy.

Figure 4: Assessing parameter estimation using simulations. Left: boxes show the median and quartiles of the distribution of parameter estimates for 100 simulations. The 'true' value used in the simulation is shown in red. Right: boxes show the distribution of the bias (estimated value minus real value), as a function of the (pooled) real values of the branch length. θ*: GC content; ω: GC content at root; α: shape of the Gamma distribution of rates across sites; p: proportion of invariant sites (see text for details on the model used). Left panel x-axis: substitution parameters (κ, θ1 to θ5, ω, α, p). Right panel x-axis: branch lengths, binned by interval.

Application to an rRNA data set

Our approach to assess the composition-wise fit of evolutionary models to a data set was applied to an alignment containing ribosomal RNA sequences from Archaea, Bacteria and Eukaryotes [6]. First, several homogeneous and non-homogeneous models were fitted to the data set, using a Tamura 1992 model of substitution with a four-class Gamma + invariant distribution of rates across sites. Then, 10,000 artificial data sets were simulated in each case using these estimated parameters. Finally, the real data set and the simulated data sets were compared with respect to their compositional heterogeneity: models able to simulate data sets with amounts of heterogeneity similar to that of the real data set appropriately account for this specific aspect of the data.

Results are shown in figure 5 and table 2. Both the number of significant Bowker's tests and the median of their values give similar results. For instance, both indices find that the real data set shows significantly more heterogeneity than the data sets simulated under the homogeneous model of sequence evolution (p-value = 0.0008 for the number of significant pairwise tests and p-value = 0.0028 for the median). The homogeneous model therefore lacks parameters needed to account for this particular feature of the data. Allowing different transition/transversion rates for each branch, as in model NH4, does not solve this problem, as the bootstrapped distribution also significantly underestimates the heterogeneity in the real data (p-value = 0.0015 and p-value = 0.0047, respectively). It is noteworthy, however, that the likelihood ratio test finds that this model describes the data significantly better than the homogeneous one, whereas the AIC and BIC criteria do not. In contrast, the distribution obtained from sequences simulated under the NH1 model surrounds the value obtained on the real data set (p-value > 0.7 in both cases). This suggests that Galtier and Gouy's model [24] properly accounts for the heterogeneity in rRNA data sets, and that there may be no point in using more parameter-rich models such as Yang and Roberts' [33] on these molecules. The results even suggest that NH1 might be slightly prone to over-estimating the amount of heterogeneity. For instance, the median Bowker's test values for simulated data sets are most often higher than the value obtained on the real data set. NH1's behavior may be explained by over-parametrization: it is likely that during sequence evolution, not all branches witnessed significant shifts in mutational parameters or selection pressures.

To investigate further the impact of the number of parameters on model fit, two other models were tested: NH2, in which a different equilibrium G+C content is associated with each kingdom, and NH3, which adds two further equilibrium G+C contents, one for the hyperthermophilic (G+C-rich) Bacteria and one for the G+C-rich Eukaryote Giardia. Hyperthermophilic (G+C-rich) Archaea were not considered separately from the others, as nearly all Archaea in our data set were thermophilic or hyperthermophilic. The NH2 model seems to lack parameters needed to properly account for the heterogeneity in the real data set, as its simulated data sets are less heterogeneous than the real one (p-value = 0.0040 for the number of significant pairwise tests, and 0.0141 for the median). The NH3 model improves upon NH2, as its bootstrapped distribution is more centered upon the observed value, which is no longer rejected (p-value = 0.14 and 0.27). However, the observed value is still on the right side of the null distribution, and it is very likely that the correct parametrization lies between NH1, too rich with its 182 equilibrium G+C contents, and NH3, maybe too poor with its 5 equilibrium G+C contents. However, as NH3 provides a fit nearly as good as NH1 with far fewer parameters, the best model may well have fewer than a dozen equilibrium G+C contents. Interestingly, Bowker's tests are in agreement with the Bayesian information criterion (BIC, see table 2) and favor the NH3 model. Conversely, Akaike's information criterion (AIC) and the likelihood ratio test (LRT) favor the more parameter-rich model NH1.

Although a few works have already addressed this issue in the Bayesian framework [11,13,34], automatic ways to explore and choose among heterogeneous models in a maximum likelihood framework are much needed. All the tools required for such a project are now available in the Bio++ libraries.
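The information criteria and likelihood ratio statistics discussed above follow directly from the log likelihoods and parameter counts reported in table 2. The short Python sketch below recomputes them from those reported values; the function and variable names are illustrative choices, not part of the Bio++ programs.

```python
import math

# (log likelihood, number of parameters) as reported in table 2
models = {
    "H": (-14110.628293, 185),
    "NH1": (-13810.371502, 368),
    "NH2": (-14088.739682, 189),
    "NH3": (-14018.854234, 191),
    "NH4": (-13970.841467, 368),
}
N_SITES = 527  # number of observations (complete sites)


def aic(lnl, k):
    """Akaike's information criterion: 2k - 2 lnL."""
    return 2 * k - 2 * lnl


def bic(lnl, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * lnl


def lrt(lnl_null, lnl_alt):
    """Likelihood ratio statistic for nested models: 2 (lnL_alt - lnL_null)."""
    return 2 * (lnl_alt - lnl_null)


best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], N_SITES))
```

With these inputs, the lowest AIC is obtained for NH1 and the lowest BIC for NH3, in line with the discussion above.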

Figure 5: Distributions of the Bowker's test statistics under various models. First column: number of pairwise tests significant at the 5% level. Second column: median of the pairwise statistics. First row: homogeneous model (H). Second row: one theta per branch non-homogeneous model (NH1). Third row: 3 thetas non-homogeneous model (NH2). Fourth row: 5 thetas non-homogeneous model (NH3). Fifth row: one kappa per branch non-homogeneous model (NH4). All models use the Tamura 1992 substitution model with a 4-class discrete Gamma + invariant rate distribution. The arrows indicate the observed values from the real data set and the resulting p-values.

Conclusion

Bio++ is a growing set of libraries designed for sequence, phylogenetic and molecular evolution analyses. In this article, extensions allowing the implementation of a wide variety of non-homogeneous models of sequence evolution were introduced. Combined with support for rates across sites and heterotachous models of evolution, and with routines for optimizing parameters and tree topology in the maximum likelihood framework, they provide a comprehensive platform for phylogenetic studies, either for bioinformaticians willing to develop their own software, or for biologists characterizing the evolution of a particular set of sequences using the BppML and BppSeqGen programs. Whilst being a generalist program implementing a large variety of models, BppML was shown to be of a similar quality as programs dedicated to particular homogeneous or non-homogeneous models of evolution, achieving higher likelihood scores with smaller memory requirements while conserving reasonable running times. Its joint use with BppSeqGen makes it possible to study precisely the evolution of a particular data set through parametric bootstrapping, and may be used to generate realistic artificial data sets to study the robustness of phylogenetic reconstruction methods in the presence of heterogeneity and heterotachy. Further developments may involve methods to optimize the number of models necessary to account for the heterogeneity in a data set, or methods to explore the space of tree topologies with a broad range of non-homogeneous models of sequence evolution.

Methods

Data and phylogeny reconstruction

RNA sequences from the small and the large subunit of the ribosome were aligned and concatenated. Sequences coming from 22 Archaea, 34 Bacteria and 36 Eukaryotes were selected to yield a data set containing 92 sequences and 527 complete sites, with G+C contents ranging from 43% to 71%. A phylogenetic tree was built with nhPhyML [6]. For additional information, please refer to [6].

Comparing likelihood optimizations

The NHML, (NH)PhyML and BppML programs were used to compare optimization performances. The programs were run on the data set from [6], after all alignment columns containing at least one gap or unknown character had been removed. The phylogenetic tree from [6] was used as a fixed topology, and its branch lengths were used as initial values for the optimization. To allow comparison between the three programs, the Kimura two-parameter model of substitution [35] was used for homogeneous models, and models derived from Tamura's 1992 model [27] for non-homogeneous models. Initial values were set to 1 and 0.5 for the κ and θ parameters, respectively. A Gamma (4 classes) + invariant rates-across-sites distribution was also tested, with the initial value set to 0.5 for the Gamma shape parameter and 0.2 for the proportion of invariant sites. Galtier's 2001 heterotachous model [22] was also tested, with 4 rate classes, the initial value of the shape parameter set to 0.5, and the initial value of the rate change parameter set to 0.5. The precision of the optimization algorithm was set to 0.000001 for the three programs. The total execution time was corrected according to the average CPU usage, and the memory usage corresponds to the maximum reached during program execution, as reported by the Unix "top" command. All calculations were performed on a 64-bit Intel(R) Core(TM)2 Duo CPU at 2.66 GHz.

Assessing the convergence of the optimization procedure

Different initial values were used as starting points for the optimization algorithm. The GC frequencies and the proportion of invariant sites were drawn from a uniform distribution between 0 and 1. The transition/transversion ratio and the alpha parameter of the rate distribution were drawn from uniform distributions on [0, 5] and [0.2, 2], respectively. Branch lengths were drawn from a uniform distribution between 0 and 0.1.

Computing p-values for Bowker tests

Alignment-wise tests for non-homogeneity were performed using two types of statistics:

• The number of 5% significant pairwise tests,

• The median of pairwise statistics.

In both cases, the global p-value was computed as

\[ p\text{-value} = \frac{N_2 + 1}{N_1 + 1}, \qquad (1) \]

where N1 is the number of simulations performed under the null model, and N2 is the number of values of the statistic in the simulations that were greater than or equal to the observed one, measured from the real data set. In this study, N1 was set to 10,000.

Table 2: Model comparisons.

Model  lnL            k    LRT vs H  LRT vs NH2  LRT vs NH3  AIC       BIC       Bowker (# tests)  Bowker (median)
H      -14110.628293  185  --        --          --          28591.26  29380.69  0.0008            0.0028
NH1    -13810.371502  368  600.51    556.74      416.97      28356.74  29927.07  0.7010            0.8110
NH2    -14088.739682  189  43.78     --          --          28555.48  29361.98  0.0040            0.0141
NH3    -14018.854234  191  183.55    139.77      --          28419.71  29234.74  0.1448            0.2672
NH4    -13970.841467  368  279.57    --          --          28677.68  30248.01  0.0015            0.0047

Comparison of the various non-homogeneous models with the homogeneous case, using different criteria. k is the number of parameters and lnL is the log likelihood of each model. Akaike's information criterion (AIC) of each model is defined as 2k - 2·lnL; the lowest value, corresponding to the best model according to this criterion, is in bold font. The Bayesian information criterion (BIC) is computed as k·ln(n) - 2·lnL, n = 527 being the number of observations. The lowest value is in bold font. The likelihood ratio test (LRT) can compare nested models only, and is defined as minus two times the logarithm of the ratio of likelihoods. This statistic follows a χ2 distribution with the number of additional parameters as the degrees of freedom. All LRTs are significant at the 0.1% level. The last two columns show the p-values of the two Bowker's tests introduced in this paper.
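Equation (1) can be written as a short Python function. This is a sketch with hypothetical names: `h_obs` stands for the statistic measured on the real data set and `h_sim` for the list of statistics measured on the simulated data sets.

```python
def bootstrap_p_value(h_obs, h_sim):
    """Global p-value of equation (1): (N2 + 1) / (N1 + 1).

    N1 is the number of simulated statistics and N2 the number of simulated
    statistics greater than or equal to the observed one.
    """
    n1 = len(h_sim)
    n2 = sum(h >= h_obs for h in h_sim)
    return (n2 + 1) / (n1 + 1)
```

With this definition the p-value can never be exactly zero: for N1 = 10,000 simulations its smallest possible value is 1/10,001.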

Program source code for performing Bowker's test is provided as Additional file 2. The data and scripts to run the analyses are in Additional file 3.

Availability and requirements

Project name: The Bio++ libraries (version 1.6) and programs suite (version 1.0).

Project home page: http://kimura.univ-montp2.fr/BioPP and http://home.gna.org/bppsuite

Operating systems: Any platform with a C++ compiler and supporting the Standard Template Library

Programming language: C++

Other requirements: The C++ Standard Template Library

License: The CeCILL free software license (GNU compatible)

Authors' contributions

BB and JD designed the method, implemented the software and wrote the article. JD ran the analyses.

Additional material

Acknowledgements

The authors would like to thank Manolo Gouy, Nicolas Galtier, Mathieu Emily and Matthew Spencer for helpful comments on this manuscript.

References

1. Williams PD, Pollock DD, Blackburne BP, Goldstein RA: Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2006, 2:e69.

2. Goldman N: Statistical tests of models of DNA substitution. J Mol Evol 1993, 36:182-198.

3. Kuhner MK, Felsenstein J: A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol 1994, 11:459-468.

4. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52:696-704.

5. Lopez P, Casane D, Philippe H: Heterotachy, an important process of protein evolution. Mol Biol Evol 2002, 19:1-7.

6. Boussau B, Gouy M: Efficient likelihood computations with nonreversible models of evolution. Syst Biol 2006, 55:756-768.

7. Kolaczkowski B, Thornton JW: Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 2004, 431:980-984.

8. Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F: Heterotachy and long-branch attraction in phylogenetics. BMC Evol Biol 2005, 5:50.

9. Goldman N, Anderson JP, Rodrigo AG: Likelihood-based tests of topologies in phylogenetics. Syst Biol 2000, 49:652-670.

10. Bollback JP: Bayesian model adequacy and choice in phylogenetics. Mol Biol Evol 2002, 19:1171-1180.

11. Foster PG: Modeling compositional heterogeneity. Syst Biol 2004, 53:485-495.

12. Lartillot N, Brinkmann H, Philippe H: Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 2007, 7(Suppl 1):S4.

13. Blanquart S, Lartillot N: A Site- and Time-Heterogeneous Model of Amino-Acid Replacement. Mol Biol Evol 2008.

14. Rambaut A, Grassly NC: Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Cabios 1997, 13:235-238.

15. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 2007, 24:1586-1591.

16. Sueoka N: On the genetic basis of variation and heterogeneity of DNA base composition. Proc Natl Acad Sci USA 1962, 48:582-592.

17. Galtier N, Lobry JR: Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol 1997, 44:632-636.

18. Foster PG, Jermiin LS, Hickey DA: Nucleotide composition bias affects amino acid content in proteins coded by animal mitochondria. J Mol Evol 1997, 44:282-288.

19. Zeldovich KB, Berezovsky IN, Shakhnovich EI: Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol 2007, 3:e5.

20. Wang HC, Spencer M, Susko E, Roger AJ: Testing for covarion-like evolution in protein sequences. Mol Biol Evol 2007, 24:294-305.

21. Dutheil J, Gaillard S, Bazin E, Glémin S, Ranwez V, Galtier N, Belkhir K: Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics 2006, 7:188.

22. Galtier N: Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol Evol 2001, 18:866-873.

23. Tuffley C, Steel M: Modeling the covarion hypothesis of nucleotide substitution. Math Biosci 1998, 147:63-91.

24. Galtier N, Gouy M: Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol 1998, 15:871-879.

25. Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author 2005.

26. Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in C. The Art of Scientific Computing, second edition. Cambridge University Press; 1992.

Additional file 1
Detailed results of model comparison. OpenDocument spreadsheet (.ods) file containing detailed results from table 1, with parameter estimates obtained.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-255-S1.ods]

Additional file 2
Program to compute Bowker's test. Zip archive containing the C++ program used to compute Bowker's test.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-255-S2.zip]

Additional file 3
Data set, tree and scripts for running Bowker's tests. Zip archive containing the sequence alignment and phylogenetic tree used, together with scripts for running the tests presented in this article.
[http://www.biomedcentral.com/content/supplementary/1471-2148-8-255-S3.zip]


27. Tamura K: The rate and pattern of nucleotide substitution in Drosophila mitochondrial DNA. Mol Biol Evol 1992, 9:814-825.

28. Felsenstein J: Inferring Phylogenies. Sinauer Associates, Inc; 2004.

29. Jermiin L, Ho SY, Ababneh F, Robinson J, Larkum AW: The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst Biol 2004, 53:638-643.

30. Ababneh F, Jermiin LS, Ma C, Robinson J: Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics 2006, 22:1225-1231.

31. Bowker A: A test for symmetry in contingency tables. J Am Stat Assoc 1948, 43:572-574.

32. Rodríguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Philippe H: Detecting and overcoming systematic errors in genome-scale phylogenies. Syst Biol 2007, 56:389-399.

33. Yang Z, Roberts D: On the Use of Nucleic Acid Sequences to Infer Branchings in the Tree of Life. Mol Biol Evol 1995, 12:451-458.

34. Blanquart S, Lartillot N: A Bayesian Compound Stochastic Process for Modeling Nonstationary and Nonhomogeneous Sequence Evolution. Mol Biol Evol 2006, 23:2058-2071.

35. Kimura M: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 1980, 16:111-120.

8♦♣♥ t tr♦♥♦s ♦t♦♥r②

♦s ♥ ♥ ♥

rt s♦ tt ♦♠♣♦st♦♥ tr♦♥t② s ♥♦t t ♦♥② ♣r♦♠ tt ♣②♦♥tst t♦ ♦♥r♦♥t t♦ ♣②♦♥t tr tr ♥ tr♥sr♥♦t② s r② ♠♣♦rt♥t t② tt ts ♥ t♦r ♥ ♦♥ ♣tstt ♥♦t ♦♥② ♦ ♥s r ①♥ t♥ s♣s t tt s♦ ♥ ♣rts ♦ ♥s tr♦ ♣r♦ss ♥♠ r♦♠♥t♦♥ ♥ s r♠st♥s♦♥ ♥ ♠② sr r♥t st♦rs♥ ts rt ♦♣ ♠♦s t♦ st♠t t st♦rs tt ♠② ♥♥ ♥ sq♥s ♠♥② ♥s ♥r♦♥ r♦♠♥t♦♥ ♥ts t ♠t ♠♣♦rt♥t t♦ s s ♠♦s t♦ r♦♥strt s♣s ♣②♦♥s

s rt s ♥♦t ♥ s♠tt ②t

①tr ♦ ♥ ♥ r♦ ♦ t♦ tt

♦♠♥t♦♥

st♥ ♦ss r♥t é♥ ♥ ♥♦♦ ♦②

♠r

♦rrs♣♦♥♥ t♦r❯♥rsté ②♦♥ ♥rsté ②♦♥ ❯ ♦rt♦r ♦♠étr t

♦♦ ♦t ♦r ♥♦♠r ❱r♥♥ r♥

♠ ♦ss♦♠sr♥②♦♥r

② ♦rs ♠♦r ♣②♦♥② r♦♠♥t♦♥ ♠①♠♠ ♦♦ P②

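Abstract

Homologous recombination is a pervasive biological process that affects sequences in living organisms and viruses. In the presence of recombination, the evolutionary history of an alignment of homologous sequences cannot be properly depicted by a single bifurcating tree: some sites have evolved along a specific phylogenetic tree, others have followed another path. Methods to analyse recombination in sequences usually involve painstaking analyses of the alignment through sliding windows, or are particularly demanding in computational resources, and are often limited to nucleotide sequences. In this article, we propose and implement a Mixture Model on trees and a phylogenetic Hidden Markov Model to reveal recombination breakpoints while searching for the various evolutionary histories that are present in the alignment. These models are sufficiently efficient to be applied to dozens of sequences and can handle indifferently nucleotide or protein sequences. We estimate their accuracy on simulated sequences and test them on real data.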

♥tr♦t♦♥

♦♠♦♦♦s r♦♠♥t♦♥ s ♣r♦ss tr♦ ♥s s♥♥ r♦♠ s♠ ♥st♦r①♥ ♣rts ♦ tr sq♥ ♦♥sq♥t② sq♥s ♥ ♥r♦♥ r♦♠♥t♦♥ s♣② t♦ r♥t st♦rs ♦♥ st♦r② ♦r t ♥♦♥r♦♠♥♥ ♣rt ♦ tr sq♥ ♥ ♦♥st♦r② ♦r t r♦♠♥♥ ♣rt t r♦♠♥♥ ♥s ♥ ♣rts ♦ r♥t ♥s♦♥ ♥♦ ♣r♦r t♦ ts r♦♠♥t♦♥ ♥t t r♥ ♥ t st♦rs ♦ t r♦♠♥♥♥ ♥♦♥r♦♠♥♥ ♣rts ♦ t ♥ ♠② tr♥st ♥t♦ t♦♣♦♦ ♥♦♥r♥s t♥ trrs♣t ♣②♦♥s

♦♥ ♣♣s ss ♣②♦♥t ♠t♦s ♦♥t♦ ♥ ♥♠♥t tt s ♥r♦♥ r♦♠♥t♦♥ ♦♥② ♦♥ tr r♦r t ♥♦ r♥t tt ts tr ♦rrs♣♦♥s t♦ tr♦♠♥♥ ♣rt ♦ t sq♥ t ♥♦♥r♦♠♥♥ ♣rt ♦r ♥② ♦ ts t♦ r ♠t♦s ♥ ♦♣♣ t♦ tr② ♥ tt r♦♠♥t♦♥ ♥ ♥♠♥ts ❬ ❪ s ♠t♦s ♥tr♦r s ♣r♦r t♦ ♣②♦♥t ♥②ss t♦ s tr t s ♠♥♥ t♦ sr tst♦r② ♦ ♥ ♥♠♥t ② s♥ rt♥ tr ♥ ss r ♥♦ r♦♠♥t♦♥ s ♥tt t ssq♥t ♥②ss s ss ♣②♦♥ts ♥ ss r r♦♠♥t♦♥ s ♥tt tr r ♠t♦s tt ♥ ♥②s ♥ ♥♠♥t ♥ ♣rs② ♣rt ♦tt r♦♠♥t♦♥ r♣♦♥ts ♥ t ♦t♦♥r② st♦rs ♦♥ ♥ t ♥♠♥t

♣t s ♠t♦s s ♦♥ s♥ ♥♦s tt r ♣♥st♥ ♥ ♥♥♦t ♣rs②♣♥♣♦♥t t r♦♠♥t♦♥ r♣♦♥ts t♦ r♦♣s ♣r♦♣♦s ♠t♦s t♦ ♥ ♦t tr♦♠♥t♦♥ ♣♦st♦♥s ♥ t ♣②♦♥t trs ♥ r t ❬❪ ♥s♣r ②t ♦r ♦ s♥st♥ ♥ r ❬❪ ♣r♦♣♦s ♠t♦ s ♦♥ ♥ r♦ ♠♦ ♥ t ♥ stts r t ♣②♦♥t trs t♠ss r♦r tr♥st♦♥t♥ t stts ♦t t♦ r♦♠♥t♦♥ r♣♦♥t ♦r ts rst tt♠♣t s ♣r♦♥t♦ tt♥ r♦♠♥t♦♥ ♥ts r tr s ♦♥② rt tr♦♥t② s♠r ssq♥t②t ♣♦♥ ts ♠♦ t♦ t tr♦♥ts ♥ st ♦t♦♥r② rts ❬❪ ② s♣r♠♣♦s♥♥♦tr ♦s stts ♦rrs♣♦♥ t♦ ♦t♦♥r② rts tr♦r t♦ ♥s ♦ tr♥st♦♥sr ♦ ♦♥ t ♥♠♥t tr♥st♦♥ t♥ t♦♣♦♦s ♥t ♦ r♦♠♥t♦♥ ♥ tr♥st♦♥ t♥ rts ❯♥♦rt♥t② ts ♠t♦s r ♦♠♣tt♦♥② ♠♥♥ ♥♥ ♦♥② ♣♣ ♥ ss r t s♣ ♦ tr t♦♣♦♦s s r② ♠t s t♦♣♦♦s♥ t♦ ♥ ♣r♦r st② ③rs ♥ s♠r ❬ ❪ ♣r♦♣♦s ②r ♣♣r♦ ♥ s♥ ♥♦ s rst ♣♣ t♦ t ♥♠♥t t♦ ♣②♦♥t tr strt♦♥s♦♥ t ♥♠♥t ♥ s r♥ ♦♥ t ♥♠♥t t ts ♥ stts ♥ t trstrt♦♥s t♠ss s ♣♣r♦ ♦s t♦ ♥ rtr ♥♠r ♦ sq♥s t♥ t♣r♦s ♦♥s t s s♦ ♣r♦② ss rt ♥ t tt♦♥ ♦ t r♣♦♥ts s tt♦♣♦♦② strt♦♥s r t r♦♠ s♠ rtrr② ♥♦s ♠② ♥♦t ♦rrs♣♦♥ t♦ ttr r♦♠♥t♦♥ strtr ♦ t ♥♠♥t

♥ r ♥ ♦♦rrs ❬❪ ♣r♦♣♦s ②s♥ ♠t♣♥♣♦♥t ♠♦ t♦ ttr♦♠♥t♦♥ ♥ rtr ♠♣r♦ t ② ♥ s♦♥ ♥♣♦♥t ♣r♦ss t♦ ♦♥t ♦r♥s ♥ t ssttt♦♥ ♣r♦ss ❬ ❪ s s♦♣stt ♠t♦ ♦r s♦ srs r♦♠ts ♦♠♣tt♦♥ rqr♠♥ts ♥ t ♦t ts ♠t♦ ♥ t ♦♥s ♦ s♠r ❲rt ♥♦♦rrs ♥ ♠♣♠♥t t♦ ♦♥② t sq♥s ♥ ♥ ♥♦t s tr ♥♠rs ♦ sq♥s

♦r t tt♦♥ ♦ r♦♠♥t♦♥ s♦ ♥♦t ♠t t♦ r♥t② r sq♥s❲♥ ♣r♦t♥♦♥ sq♥s r ♦♥ t♠ ♦ t ♥♦t sq♥ ♠② strt s♦ tt t ♦♠s ♠♥t♦r② t♦ rs♦rt t♦ ♠♥♦ sq♥s ♥ s ♦♥t♦♥s♥♦♥ ♦ t ♣r♦s② sr ♠t♦ ♥ s

♦st r♥t② P♦♥ ♥ ♦♦rrs ♦♣♣ ❬ ❪ s♦tr t♦ ttr♦♠♥t♦♥ t ♥② t②♣ ♦ ♣t s ♣r♦r♠ st♠ts t ♣②♦♥t trs t♥♠r ♦ r♦♠♥t♦♥ r♣♦♥ts ♥ tr ♣♦st♦♥s ♥ ♠①♠♠ ♦♦ r♠♦r ♦♦ s♦ t trs r♥t ♥♠rs ♦ r♣♦♥ts ♥ ♦r ♥♠r ss ♥t ♦rt♠ t♦st♠t t st r♣♦♥t ♣♦st♦♥s r♥ ts ♣r♦r ♣②♦♥t trs r st♠tt t ♦r♦♥♥ ♦rt♠ ❬❪ ♥ t st ♥♠r ♥ ♣♦st♦♥s ♦ r♣♦♥ts r♦s♥ ♦r♥ t♦ t rtr♦♥ s ♦♥sr ts ♥ ♥t② tr♦

♣rs rttr ♥ r♥ ♦♥ str ♦ ♦♠♣trs♥ ts rt ♣rs♥t t♦ ♥ ♠t♦s t♦ ♥♦r t r♦♠♥t♦♥ strtr ♦ ♣r♦t♥

♦r ♥♦t ♥♠♥t tt ♥ s② ♥ ♥t② r♥ ♦♥ st♦♣ ♦♠♣tr rst ♠t♦ s s ♦♥ ①tr ♦ ♥ t s♦♥ s s ♦♥ ♣②♦♥t♥ r♦ ♦ P②♦ ❲ ♥ ② ♥tr♦♥ t ♠t♠ts ♥ ts♠♦s s♦rt② ①♣♥ ♦ ts r ♠♣♠♥t ♥ ♥② ♣r♦ t♦ tst t♠ ♦♥ ♦ts♠t ♥ r ♥♠♥ts ❲ sss t ♠rts ♥ ♠ts ♦ ♦r ♠t♦s ♥ ♣r♦♣♦s r♥♠♥ts

♦♠♣t♥ t ♦♦ ♦ ♥ r

❲ rst ①♣♥ ♦ ♦♥ ♦♠♣ts t ♦♦ ♦ ♣②♦♥t tr ❬❪ t ♥♦t ♦r♣r♦t♥ sq♥s s♥ t ♦♦♥ ①♠♣

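Figure: Example rooted tree for likelihood computation. The root R is connected to leaf A (branch length lA, parameters vA) and to internal node U (lU, vU); U is connected to leaves B (lB, vB) and C (lC, vC). The states at R, A, U, B and C are denoted x, z, y, q and v, respectively.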

♦st ♦♠♠♦♥② sts r s♣♣♦s t♦ ♦ ♥♣♥♥t② ♦ ♦tr st ♦s ♥♦t♣♥ ♦♥ ts ♥♦rs stts t ♦♥② ♦♥ ts ♣st stt s ♦♥sq♥ t ♦♦ ♦ tr ♦r ♦ sq♥ s ♦t♥ ② ♠t♣②♥ t ♦♦s ♦t♥ t s♥ sts

♦♦ Ls,τ ♦ t tr τ ♥ ♥ ♦r s♥ st s s ♦♠♣t s ♦♦s

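\[
L_{s,\tau} = \sum_{g \in \Gamma} \frac{1}{|\Gamma|} \Bigg[ \sum_{x \in \Omega} \Big[ P(R = x)
\times \sum_{z \in \Omega} \big[ P_{xz}(l_A, g, v_A)\, L_{s,low(RA)}(A = z) \big]
\times \sum_{y \in \Omega} \Big( P_{xy}(l_U, g, v_U)
\times \sum_{q \in \Omega} \big[ P_{yq}(l_B, g, v_B)\, L_{s,low(UB)}(B = q) \big]
\times \sum_{v \in \Omega} \big[ P_{yv}(l_C, g, v_C)\, L_{s,low(UC)}(C = v) \big] \Big) \Big] \Bigg]
\]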

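where Pxy(lA, g, vA) is the probability for base x to change into base y along a branch of length lA, with rate category g from the Γ distribution and other evolutionary parameters vA; P(R = x) is the probability to have base x at the root R, and Ω is the set of possible states, for instance Ω = {A, T, C, G} in the case of a DNA alignment. Ls,low(RA)(A = z) is the lower conditional likelihood of observing the data downstream from branch RA, conditionally on the underlying subtree and on having base z at node A. Note that computing the likelihood of a site given a discretised distribution for the evolutionary rates amounts to averaging the likelihoods of the site obtained using each evolutionary rate category in turn.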

♦♠♣t♥ t ♦♦ t ①tr ♦ ♦♥ trs

s ♦r t ♦♦ ♦ ♠♦ r r♥t rts r ♦ ♦♥ ♥ ♦♠♣t t ♦♦♦ ♠♦ r ♦♥ ♦s r♥t trs ♦♥sq♥t② t♦ t t ♦♦ ♦ ♠♦ ♦s♣r♠trs ♦ ♥trst r t trs tt st sr t ♥♠♥t ♦♥ ♥ t t st tr ♦r t ♦♦s ♦t♥ t ♦♥ ♦ t trs tt r ♦♥srs s s♠♠♣ ♥ t ♦♦♥ ♦r♠ ♦r t ♦♦ ♦ s♥ st r T r♣rs♥tst st ♦ trs τ rr♥t② ♥ s ♥ |T | t ♥♠r ♦ trs ♥ T

\[ L_{s,T} = \sum_{\tau \in T} \frac{1}{|T|} L_{s,\tau} \]

❲t s ♦r♠ ♦t rt tr♦♥t② ♥ t♦♣♦♦② tr♦♥t② r t♥ ♥t♦ ♦♥t rs♣t② ② t ♠♠ strt♦♥ ♥ t ①tr ♦ ♦♥ t♦♣♦♦s ♥ t♦♦ ♦ ①tr ♦ ♦r trs s ♥ ♦♠♣t ♥ ♠①♠③ t s ♣♦ss t♦♣rt ♣♦str♦r t ♠♦st ② tr ♦r ♥ st s ♦ s ♣♦sst② ♥ st♦ ♥♦r t r♦♠♥t♦♥ strtr ♥ ♥ ♥♠♥t

♦② ①♠♣ t s ♣♦ss t♦ ♦♣t♠③ t t♦♣♦♦s t ①tr ♦ ♦♥ trs

♥ stt♥ r sr ♦r |T | trs τ tt sr ♥ ♥♠♥t tr② t♦ ♥ t st ♦|T | trs ♦s ♦♦ s ♦♠♣t ♦ ♥ q s ♠①♠ ♦t tt s ♦♦ ♦rs t st T ts s ♥ ♥ ♣r♥♣ t♦ s♦♠t♥ r♥t t♥ s♠♣② s♥ ♠♥② t♠st ♠①♠♠ ♦♦ t♦♣♦♦② s ♥ s♥ ♥ ts t♦② ①♠♣ r |T | = 4 t 4 sts

Topologies    Site 1 likelihood    Site 2 likelihood    Site 3 likelihood    Site 4 likelihood
Topology 1    10^-2                10^-4                10^-4                10^-4
Topology 2    10^-4                10^-2                10^-3                10^-4
Topology 3    10^-4                10^-4                10^-2                10^-4
Topology 4    10^-4                10^-4                10^-4                10^-2

♥ ts ①♠♣ t ♠♦st ② t♦♣♦♦② s Topology 2 t ♦♦♦ ♦ log(10−4 ×10−2 × 10−3 × 10−4) = −13 ♦r ♦♥ r t♦ s ①tr ♦ ♦♥ trs ♥ trs r ♦ ts s♦ ♥♦t s♠♣② rst ♥ t s♠ Topology 2 t♦♣♦♦② ♥ ♦♥ ♥t trs ♥ s ♦r st t r ♦r t ♦♦s ♦r t♦♣♦♦② s ♦♠♣t♦♥ t ♥♠♥t ♦♥ ♦t♥s t ♦♦♥ ♦♦♦

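\[
\log(L_T) = \log\left( \frac{10^{-2} + 3 \times 10^{-4}}{4} \times \frac{10^{-2} + 3 \times 10^{-4}}{4} \times \frac{10^{-2} + 2 \times 10^{-4} + 10^{-3}}{4} \times \frac{10^{-2} + 3 \times 10^{-4}}{4} \right) \approx -10.3
\]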
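The arithmetic of this toy example can be checked with a few lines of Python. The sketch assumes base-10 logarithms, which is what the value of -13 quoted for the best single topology implies, and it assumes that rows of the table are topologies and columns are sites; the variable and function names are illustrative.

```python
import math

# Per-site likelihoods of the toy example: one row per topology, one column per site.
site_likelihoods = [
    [1e-2, 1e-4, 1e-4, 1e-4],  # Topology 1
    [1e-4, 1e-2, 1e-3, 1e-4],  # Topology 2
    [1e-4, 1e-4, 1e-2, 1e-4],  # Topology 3
    [1e-4, 1e-4, 1e-4, 1e-2],  # Topology 4
]


def single_tree_log10(row):
    """log10 likelihood of the alignment when one topology is used for all sites."""
    return sum(math.log10(x) for x in row)


def mixture_log10(table):
    """log10 likelihood under the mixture: at each site, average over topologies."""
    n_trees = len(table)
    return sum(math.log10(sum(column) / n_trees) for column in zip(*table))


best_index = max(range(len(site_likelihoods)),
                 key=lambda i: single_tree_log10(site_likelihoods[i]))
best_log10 = single_tree_log10(site_likelihoods[best_index])
mix_log10 = mixture_log10(site_likelihoods)
```

Here `best_index` points at the second row (Topology 2), and the mixture log likelihood is higher than that of the best single topology.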

t s ts ♠♦r ② ♦♥ ts ①♠♣ t♦ s 4 r♥t trs rtr t♥ s♥ tr ♦r t ♥♠♥t ♥ ♦♠♦♥♦s ts ♠♦ ♦ rst ♥ t s♠ tr r♣t 4t♠s ♣♦ss② t r♥ ♥ts r♥ t♥ trs

s ①♠♣ s♦s tt ♥ s ♦ ♥ ♥♠♥t tr ② r♦♠♥t♦♥ ♥t st ♦ |T |trs ♥ ♦♣t♠③ t♦ st ♦♥t ♦r t sq♥ ♦t♦♥ t ①tr ♦ t s ♥♦t♥ssr② tt t tr t♦♣♦♦s r s♣ ♦r t sr ♦r t r♦♠♥t♦♥ r♣♦♥ts ♥rt♥

P②♦♥t ♥ r♦ ♦ t♦ tt r♦♠♥

t♦♥

①tr ♦ sr ♦ s t♦ ♦♥t ♦r ♥ ♠♣♦rt♥t ♣r♦♣rt② ♦ t ♥♠♥tt s ①♣t tt t t♦♣♦♦② tt st srs ♥ st s ♣r♦t② ♦ ♣r♦♣r②sr♥ t ♥♦r♥ sts s tr s ♣♥♥② t♥ sts tt ♥ ♠♦tr♦ t s ♦ ♥ r♦ ♦ ♦s ♥ stts r t t♦♣♦♦s t♠sss ♠♦ tr♦r ♦♥s t♦ t ♠② ♦ P②♦s rt tr♦♥t② s t♥ ♥t♦♦♥t tr♦ ♠①tr ♠♦ ♦♥ rts tr♦ t ♦♠♠♦♥② s ♠♠ strt♦♥

♦♠♣t♥ t ♦♦ t t P②♦

♦♦ ♦ t P②♦ ♥ ♦♠♣t t t ♦rr ♦rt♠ s r② ①♣♥ ♥ t ♣②♦♥ts r♠♦r ② s♥st♥ ♥ r ❬❪ ❲ r♣② ♦ tr♦ts ♦rt♠ r

♦rt♠ strts r♦♠ ♦♥ ♥ ♦ t ♥♠♥t ♥ ♥ss t t ♦tr ♥ rtrr② strt ② t ♥♥♥ ♦ t ♥♠♥t t st 1 ♥ ♥ t st n ❲ s♣♣♦s tt♥ st ♦♦s ♥ r② ♦♠♣t ♦r t trs ❲ ♥♦t s L1,τ t♦♦ ♦t♥ t s♥st♥s ♣r♥♥ ♦rt♠ ♥♦ ♣♥♥② t♥ sts t st 1♦r t tr τ ♦♦ ♦ t ♥♠♥t ♣ t♦ st k t tr τ t t♦ st k s ♥♦t

L(k)τ tr♥st♦♥ ♣r♦t② ♦ ♦♥ r♦♠ tr τ t st k t♦ tr τ ′ t st k + 1 s rtt♥

Pτ,τ ′ ❲ ♥ s |T | t t♦t ♥♠r ♦ trs ♥ t st T t t rst st t ♦♦ ♦ t ♥♠♥t ♣ t♦ st 1 ♥ tt st 1 s tr τ s

s♠♣② t ♦♦ ♦ tr τ ♦r t st 1

\[ L^{(1)}_{\tau} = L_{1,\tau} \]

t t s♦♥ st t ♦♦ ♦ t ♥♠♥t ♣ t♦ st 2 ♥ tt st 2 s tr τ ′

\[ L^{(2)}_{\tau'} = L_{2,\tau'} \times \sum_{\tau \in T} P_{\tau,\tau'}\, L^{(1)}_{\tau} \]

s ♦r♠ ssts rrs s♠

\[ L^{(k+1)}_{\tau'} = L_{k+1,\tau'} \times \sum_{\tau \in T} P_{\tau,\tau'}\, L^{(k)}_{\tau} \]

rst ♣rt ♦ t ♦r♠ ♦r t ♠t♣t♦♥ s②♠♦ s t ss ♦♦ ♦ tr ♦r st k ♥ ♦t♥ tr♦ s♥st♥s ♣r♥♥ ♦rt♠ ❬❪ s ♥ qt♦♥ ♣♥♥② t♥ sts s ♥tr♦ tr♦ t s♦♥ ♣rt ♦ t ♦r♠ t t♥ ♦ t ♥♠♥t t st n t t♦t ♦♦ ♦ t ♥♠♥t ♥ t st ♦ trs T s♦♠♣t s ♦♦s

\[ L_{Phylo\text{-}HMM} = \sum_{\tau \in T} \frac{1}{|T|} \times L^{(n)}_{\tau} \]

♥ ♦r ♠♦ t tr♥st♦♥ ♣r♦t② ♦ ♦♥ r♦♠ tr τ t st k t♦ tr τ ′ t st k + 1Pτ,τ ′ s ♥ s ♦♦s t t ♣ ♦ t t♦♦rrt♦♥ ♣r♠tr λ

\[ P_{\tau,\tau'} = \lambda\, \delta_{\tau,\tau'} + \frac{1 - \lambda}{|T|} \]

r δτ,τ ′ s t r♦♥r t ♥t♦♥ s 1 ♥ (τ = τ ′) ♥ ♦trs s♠♥s tt t ♥② st tr s ♦♥st♥t ♣r♦t② λ tt t s♠ tr s ♣t ♦r t ♥①tst ♥ ♣r♦t② 1− λ tt ♥♦tr tr s r♥ ♦r t ♥①t st t t ♣♦sst② ttt s♠ tr s r♥ ♥

♥ ♦♥ ♥ ♦♠♣t t ♦♦ ♦ t ♥♠♥t t t P②♦ ♣r♠trs♥ st♠t ♥ t ♠①♠♠ ♦♦ r♠♦r ♦r ♥ ②s♥ r♠♦r r♦r♥ ♦r ♣r♦r♠ ♦t t trs t♦♣♦♦s r♥ ♥ts ♣r♠trs ♦ t ♠♦s ♥ t♣r♠tr λ r st♠t ② ♦♣t♠③♥ t ♦♦ s ♦♠♣t ♥ qt♦♥ tr♦ ts♠ ♦rt♠ s P② ♦r ♦♠♠♦♥ ♣r♠trs ♥ tr♦ r♥ts ♥♠r ♦♣t♠③t♦♥♦rt♠ ❬❪ ♦r t t♦♦rrt♦♥ ♣r♠tr λ

Exploring the Space of Tree Topologies with the Mixture Model on trees or the Phylogenetic Hidden Markov Model

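The problem of optimizing |T| trees simultaneously is different from the problem of optimizing a single topology |T| times: at any given time, each topology has to be optimized taking into account the other topologies. If each topology were optimized independently of the other topologies, the result would be |T| identical trees, which would be equivalent to solving the single-tree optimization problem |T| times in parallel. A parallel algorithm based on a server-client architecture is used instead, which makes it possible to acknowledge the dependencies between topologies.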

Figure: Server-client architecture to efficiently find a set of topologies that best describes the alignment. The server exchanges data with the clients. One arrow corresponds to the server sending a client a matrix containing all the site likelihoods for all the topologies; the other corresponds to the client sending the server an optimized likelihood vector.

[Figure: one server connected to four clients, in charge of Tree 1 to Tree 4. Arrow labels: "All trees likelihoods" and "Updated likelihood".]

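In this algorithm, each client has a topology that it tries to change through commonly used tree search algorithms. However, in common algorithms such as PhyML, the client would simply try to maximize the likelihood of its topology, whereas here it needs to maximize the likelihood of the whole Mixture Model or of the Phylo-HMM. It does so by only modifying the topology it has been given, but taking into account the other topologies. For instance, in the Mixture Model, the likelihood function each client tries to maximize is

$$L_{\text{Tree mixture}} = \prod_{s} \sum_{\tau \in T} \frac{1}{|T|} L_{s,\tau}$$

This implies that each client needs the vectors of site likelihoods obtained by the other clients. Dependency between topologies is only taken into account through a shared matrix of likelihood vectors.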
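As a minimal illustration of why a client needs the other clients' vectors, the sketch below evaluates the Mixture Model likelihood in log space and re-evaluates it after one client proposes a new likelihood vector for its own tree. The function names and the row-per-site matrix layout are assumptions made for this example, not part of the program described here.

```python
import math


def tree_mixture_loglik(site_lik):
    """log of prod_s sum_tau (1/|T|) * L_{s,tau}; site_lik[s][t] is assumed
    to hold the likelihood of site s under tree t."""
    n_trees = len(site_lik[0])
    return sum(math.log(sum(row) / n_trees) for row in site_lik)


def loglik_with_updated_tree(site_lik, tree_index, new_vector):
    """Mixture log-likelihood if the tree at tree_index had the site
    likelihoods in new_vector, all other trees being left unchanged."""
    updated = [list(row) for row in site_lik]
    for s, value in enumerate(new_vector):
        updated[s][tree_index] = value
    return tree_mixture_loglik(updated)
```

A client would accept a change to its topology only if the second value exceeds the first, which requires the columns of the matrix belonging to the other trees.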

The algorithm is summed up in the pseudocode below.

Algorithm: Searching for the most likely set of trees

♦♦❴trs♦❳❯|T | srr ④

t ♥♠♥t ♥st❴♦❴trs ♥rt|T |♥rt |T | ♥tss♥❴ ♥♠♥ts♥ trs ♦♦❴♠tr① r❴❴♦♦❴t♦rs♦♦♠♣t❴♦♦♦♦❴♠tr①s♥❴♦♦❴♠tr①❳❯ ♦♦❴trs♦ ④

r♦♦❴t♦r♣t♦♦❴♠tr①♥♦♠♣t❴♦♦♦♦❴♠tr①♥ ♦♦♥s♥❴♦♦❴♠tr①

⑥s♥❴st♦♣❴s♥♦t♣t❴srr❴rsts

⑥s ♥t ④

r ♥♠♥tr tr♦♠♣t❴♦♦s♥♦♦❴t♦rr♦♦❴♠tr① ♥♦t st♦♣❴s♥④

♦♣t♠③tr ♦♦❴♠tr①s♥♦♦❴t♦r

⑥♦t♣t❴♥t❴rsts

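At the beginning of the program, the number of topologies to consider needs to be set, as this algorithm is not able to estimate the appropriate number of trees |T| needed to describe the history of an alignment. In the pseudocode above, it has been set to 2. In practice, setting this parameter should rarely be a problem, as a given sequence should not harbour more than two totally different evolutionary histories. It is however possible to specify more than two topologies to search for in a single alignment. At the beginning of the algorithm, the function generate divides the alignment into |T| equal parts and uses BIONJ [ ]: a tree is built for each part, which results in |T| trees used as starting topologies for the whole of the algorithm. Alternatively, the user can also provide |T| starting trees. Each client then receives the alignment and the tree it has been given in charge of, computes the likelihood of this topology and returns a vector of site likelihoods to the server. The server assembles all vectors into a matrix that is sent to all clients. Each client subsequently modifies the specific tree it is in charge of in order to maximize $L_{\text{Tree mixture}}$ or $L_{\text{Phylo-HMM}}$. Periodically, it sends an updated vector of site likelihoods to the server, which updates the likelihood matrix containing all likelihood vectors. This updated matrix is subsequently sent to all clients, so that they continue optimizing their topologies knowing the most recent changes in the other topologies. In practice, communications between the server and the clients are asynchronous, so that slowly computing clients do not slow down the other clients. For the Phylo-HMM, the autocorrelation parameter λ is also exchanged between the server and the clients, and is optimized by the server every ten times it receives a likelihood vector from one of its clients.

This algorithm has been implemented, to function both with the Mixture Model and with the Phylo-HMM, in the PhyML-Multi program, based on PhyML [ ]. This program can take advantage of multiprocessor or multicore machines by dispatching the clients in charge of trees to different processors. It has been compiled and tested on Linux machines and is available on request.

As a result, each client outputs an optimized topology, and the server outputs the matrix containing the site likelihoods computed with each topology. If there have been recombination events in the history of the alignment, there should be stretches of sites whose most likely topology is the same. By segmenting the matrix of site likelihoods, one should be able to uncover these stretches of sites with a common history. The Phylo-HMM can directly output a most likely segmentation; the Mixture Model, on the other hand, does not provide such a segmentation.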

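Segmenting the matrix of site likelihoods output by the Mixture Model

Methods to partition an alignment

Common approaches to segmentation involve the use of sliding windows, of Hidden Markov Models, or of the Maximum Predictive Partitioning algorithm (MPP algorithm) [ ]. We chose not to use sliding windows, as the fixed size of the sliding window does not allow recombination events to be precisely pinpointed. Both the MPP algorithm and the HMM approach rely on a statistical approach to segment a sequence: given a set of models, they infer the most likely partitioning of the sequence into these models. In our case, the models are the trees themselves, and the sequence is the alignment. For each model, the site likelihoods have been previously computed by the Mixture Model; the partitioning of the alignment is therefore done according to these site likelihoods, which are used during the computation of the segmentation likelihood.

The HMM approach makes it possible to directly estimate a partitioning, which depends upon the transition probabilities between models. These transition probabilities can be estimated with the Baum-Welch algorithm. However, they constrain the length of the stretches of sites that share the same model to follow a geometric distribution. In the case of the detection of recombination, this can be problematic, as there is no reason why the lengths of segments sharing a unique history should follow such a distribution.

The MPP algorithm, on the other hand, does not require that transition probabilities be estimated, and thus does not constrain the sizes of the segments. However, as a consequence, the MPP algorithm does not provide a single most likely partitioning: it outputs a most likely partitioning in two segments, three segments, four segments, and so on up to a limit given by the user. This gives a range of most likely partitionings among which a choice has to be made according to some criterion.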
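The kind of output described here, one best partitioning per number of segments, can be obtained by dynamic programming. The sketch below is written in the spirit of the MPP algorithm but is not the implementation cited in the text: the function name, the matrix layout (`site_loglik[m][s]`, log-likelihood of site s under tree m) and the rule that each segment is scored by its single best tree are assumptions made for illustration.

```python
def best_partitions(site_loglik, max_segments):
    """For j = 1..max_segments, return (best log-likelihood, breakpoints)
    of the best partitioning of the sites into j contiguous segments."""
    n_models, n_sites = len(site_loglik), len(site_loglik[0])
    # Prefix sums give the log-likelihood of any segment under any model.
    prefix = [[0.0] * (n_sites + 1) for _ in range(n_models)]
    for m in range(n_models):
        for s in range(n_sites):
            prefix[m][s + 1] = prefix[m][s] + site_loglik[m][s]

    def segment_score(a, b):
        # Best single model for sites a..b-1.
        return max(prefix[m][b] - prefix[m][a] for m in range(n_models))

    neg_inf = float("-inf")
    best = [[neg_inf] * (n_sites + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n_sites + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for j in range(1, max_segments + 1):
        for i in range(j, n_sites + 1):
            for a in range(j - 1, i):
                if best[j - 1][a] == neg_inf:
                    continue
                candidate = best[j - 1][a] + segment_score(a, i)
                if candidate > best[j][i]:
                    best[j][i], back[j][i] = candidate, a
    results = {}
    for j in range(1, max_segments + 1):
        cuts, i = [], n_sites
        for level in range(j, 0, -1):
            i = back[level][i]
            cuts.append(i)
        # Drop the leading 0, which is the start of the first segment.
        results[j] = (best[j][n_sites], sorted(cuts)[1:])
    return results
```

This naive version is cubic in the number of sites and is meant only to make the principle concrete.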

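Estimating the number of segments with the MPP algorithm

As the number of segments increases, the likelihood of the segmentation generally also increases. This is not necessarily because each new segment reveals a significant property of the alignment: a new segment may also make it possible to better fit non-significant heterogeneity in a particular part of the alignment. In other words, the improvement in likelihood observed when the number of segments increases can be due to the fitting of the noisy part of the signal rather than of its meaningful part.

Non-significant changes in likelihood can also be seen in alignments whose sites have been randomly shuffled, which erases the meaningful signal of the recombination structure but leaves the non-significant heterogeneities that are expected to be found simply by chance. The comparison between the true alignment and randomized versions of the alignment therefore makes it possible to distinguish improvements in the likelihood of a partitioning that are due to the uncovering of a homogeneous segment coming from a past recombination event from noise, that is, improvements in the likelihood due to the fitting of non-significant heterogeneities.

To get an estimate of the number of segments in an alignment, the following protocol is thus applied. For each number i of segments in [1; n], with n a number provided by the user:

• the likelihood of the most likely partitioning in i segments is computed using the MPP algorithm and stored in the variable L;

• the matrix of site likelihoods is randomized a given number of times by shuffling columns of site likelihoods, which is equivalent to shuffling sites in the alignment. For each of these replicates, the likelihood of the most likely partitioning is computed using the MPP algorithm. The average of these replicate likelihoods is computed and stored in the variable l;

• the value L* = L/l is computed and is used as a normalized likelihood for the partition in i segments.

In the end, all the normalized likelihoods can be compared: the partitioning with the highest normalized likelihood is considered the most reasonable partitioning.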
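The protocol above can be sketched as follows, under loudly stated assumptions: `score` stands for any routine returning the likelihood of the best partitioning in a given number of segments (for instance an MPP implementation), `columns` is the list of per-site likelihood columns, and the default of 100 replicates is arbitrary, since the number of replicates is not recoverable from the text. Whether L and l are likelihoods or log-likelihoods is also not specified here, so the function simply returns their ratio as written.

```python
import random


def normalized_likelihood(columns, n_segments, score, n_replicates=100, seed=0):
    """Return L* = L / l for a partitioning in n_segments segments.

    score(columns, n_segments) is a user-supplied (hypothetical) function;
    columns is a list with one entry per alignment site.
    """
    rng = random.Random(seed)
    observed = score(columns, n_segments)        # L, on the true site order
    replicate_scores = []
    for _ in range(n_replicates):
        shuffled = list(columns)
        rng.shuffle(shuffled)                    # same as shuffling sites
        replicate_scores.append(score(shuffled, n_segments))
    mean_shuffled = sum(replicate_scores) / n_replicates   # l
    return observed / mean_shuffled
```

Calling this function for each i in [1; n] and comparing the returned values reproduces the comparison step of the protocol.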

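Tests of the Mixture Model and the Phylo-HMM

In the Phylo-HMM, segment length follows a geometric law of parameter λ, the autocorrelation parameter. This might not be appropriate to model the length of segments in an alignment where there have been recombinations. The MPP approach does not introduce such a constraint on segment length, and may therefore provide different results from the segmentation. The Phylo-HMM approach and the Mixture Model with MPP approach may therefore complement each other, each detecting events that the other does not. This suggests that both approaches should be used in parallel, and their results compared. To this end, we used simulations.

Simulation procedure

The first 100 trees from the PhyML test set [ ] were selected. These trees contain 40 leaves and were designed to resemble real datasets, and should therefore provide an appropriate test set. An alignment affected by recombination is an alignment of which one part is best described by a particular tree and another part by another tree. In the most difficult instances, the two trees corresponding to the two parts of the alignment are very similar: a clade whose position is in one place in the first tree is in another place in the other tree. To obtain such pairs of trees, each of the trees was subjected to a Subtree Pruning and Regrafting operation (SPR): a subtree was cut from the tree and attached in another position. This yielded pairs of trees separated by one recombination event, with distances between the two trees of a pair ranging from small, when the SPR regrafted the pruned subtree very close to its original position, to large, when the pruned subtree was regrafted far from its original position. Alignments harbouring a recombination event were simulated by evolving a portion of an alignment according to one of the 100 trees, and the rest of the alignment according to the same tree modified by the SPR. For each pair of trees, 9 1000-nucleotide alignments were simulated, with k sites evolving according to one tree and 1000 − k sites evolving according to the other tree, with k taking the values 100, 200, 300, 400, 500, 600, 700, 800, 900. Seq-Gen [ ] was used to simulate sequences with the substitution model of [ ] and a continuous gamma distribution of rates across sites, with the parameter alpha set to 0.8.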

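Reconstruction of the recombination structure with the Mixture Model and the Hidden Markov Model

Both the Mixture Model and the Hidden Markov Model were applied to the simulated datasets. The number of trees was set to two in both cases, as neither program is able to estimate the right number of trees needed to adequately describe an alignment. The evolutionary model used was HKY [ ], with a gamma distribution discretized in four classes to account for across-site rate variation. The reconstruction model therefore does not exactly correspond to the simulation model, as would be the case in a real setting, where sequences evolve according to an unknown and complex process.

Ability to detect the right number of segments

The reconstruction models should detect two parts in the alignment. The corresponding figure shows that both models have a recovery rate that depends upon the position of the breakpoint: if the breakpoint is too close to the beginning or the end of the alignment, the recovery rate is lower than if the breakpoint is more central. This is likely because lengths as small as 100 or 200 nucleotide sites contain too little information to properly reconstruct a tree topology. This may therefore represent the statistical limit below which our models cannot detect recombination. The Phylo-HMM is superior to the Mixture Model in all cases, which indicates that acknowledging that neighbouring sites are highly likely to share the same most likely tree improves breakpoint detection.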

Figure: Ability of the Mixture Model (left) and Phylo-HMM (right) to detect the number of segments in simulated alignments.

[Figure: two panels. x-axis: Breakpoint position (100 to 900); y-axis: Per cent of correctly predicted number of partitions (0 to 100).]

Ability to detect the breakpoint position

Both the Mixture Model and the Phylo-HMM most often detect two segments in the alignment. In such cases, the corresponding figure shows that the precision with which the breakpoint is predicted displays the same dependency upon the length of the smaller segment as the ability of the models to detect the number of segments. The Phylo-HMM seems slightly better than the Mixture Model at detecting the precise breakpoint position when the smallest partition is ≥ 200 bases long. However, the Phylo-HMM is less good than the Mixture Model when the smallest partition is 100 bases long. This is likely a manifestation of the bias introduced by the geometric distribution of segment length in the HMM.

Figure: Ability of the Mixture Model (left) and Phylo-HMM (right) to detect the breakpoint position in simulated alignments.

[Figure: two panels. x-axis: Breakpoint position (100 to 900); y-axis: Reconstructed breakpoint position (0 to 1000).]

Ability to recover the true topologies

In general, the Phylo-HMM is better than the Mixture Model at recovering the trees used in the simulation. Both models find it harder to get good trees if the alignment that has been simulated on them is short. However, the quality of the reconstructed trees reaches an optimum for alignments that are 600 to 800 sites long, not longer. When one of the two topologies found in the alignment represents only 100 sites, both topologies (the one found in 100 sites and the one found in 900 sites) are less well reconstructed.

Figure: Ability of the Mixture Model (left) and Phylo-HMM (right) to recover topologies from simulated alignments.

[Figure: two panels. x-axis: Number of sites (100 to 900); y-axis: RF distance between true and reconstructed topologies (0 to 40).]

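Computation times

Computations were run on a computing center. On average, the Mixture Model implementation took longer to return a result on the simulations than the Phylo-HMM. The additional optimization of the autocorrelation parameter did not result in an increase in computation time, perhaps because it ensures that the set of sites pleading for a given topology is more stable throughout the tree space search than in the Mixture Model case. Overall, both models are fairly efficient on such datasets and can be used on single desktop computers.

Conclusions on the simulations

Overall, the Phylo-HMM is better able to uncover the recombination structure of simulated alignments, since it more often finds the right number of segments, is more accurate at pinpointing the recombination breakpoint, and also recovers trees closer to the true trees. This is probably because it takes into account the dependency between neighbouring sites. However, for the smallest segments (100 sites long), the Phylo-HMM appears less good than the Mixture Model at predicting the breakpoint position, probably because a single autocorrelation parameter is used to describe the length of both segments, the one that is 100 bases long and the one that is 900 bases long. Allowing different autocorrelation parameters for different hidden states (here phylogenetic trees) might correct this weakness, although it would also increase the number of parameters of the model. Instead, we recommend using both the Mixture Model and the Phylo-HMM to analyse datasets, so that the advantages of one compensate for the drawbacks of the other.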

Application to real protein sequences

r sts ♥ r♦♠♥t♦♥ ♥ts ♥ rss ♦r ♥st♥ ♥ ❱ rss ♥ ♦ t s♦r tt r♦♠♥t♦♥ ♥t ♥ ♠♣♥③ ♦st s t t ♦r♥ ♦ t❨ r♦♣ ❱ rs t ♥♥ ♦ t ♥♦♠ ♦ ❨ s ♠♦st ♦s② rt t♦r♦♣ rs t rst ♦ ts ♥♦♠ s ♠♦st ♦s② rt t♦ ♠♣♥③ rs ❱♣③❯② s ts ♦♥s♦♥ ♦♥ rst s♥ ♥♦ ♥②ss r r♥ t♥ ♣rs ♦sq♥s s ♦♠♣t ♥ s♦♥ t r♦♥strt♦♥ ♦ trs ♦r t♦ ♣♦rt♦♥s ♦ t ♥♠♥t♦♥ s ♦ ♣tt r♦♠♥t♦♥ r♣♦♥t ♥ ♥t ② ② ♦♦tsts ♦♥r♠ t r♦♠♥t♦♥ ♥t s♦♥ tt t rst ♣rt ♦ t ♥♠♥t rt ttr ♦t♥ ♦r t s♦♥ ♣rt ♥ rs

s st② tr♦r ♣r♦s ♦♦ tst ♦ t t② ♦ t ①tr ♦ ♥ t P②♦ t♦ tt r♦♠♥t♦♥ ♥ ♥tr ♦♥t♦♥s t♦ ♠♦s r r♥ ♦♥ t ♥♠♥tr♦♠ ♦ t stt♥ t ♥♠r ♦ trs t♦ t♦ ①tr ♦ ♣rt t♦ r♣♦♥ts♦♥ t ♣♦st♦♥ ♥ t ♦tr t ♣♦st♦♥ ♣②♦ ♣rt ♦♥② ♦♥ r♣♦♥tt ♣♦st♦♥ t♦ ♠♦s tr♦r r ♦♥ t ♣rs♥ ♦ r♣♦♥t r♦♥ ♣♦st♦♥ s r② ♦s t♦ t r♦♠♥t♦♥ r♣♦♥t tr♠♥ ② ② ♥ t ♦r♥♥②ss t ♣♦st♦♥ t♦♥ r♣♦♥t ♣rt ② t s ♠♦r ♥rt♥ st s ♥♦t tt ② t P②♦ t ♠t t♦ t r s♥stt② ♦ t ♥♦♥ ♦ t t♦ s♠♥ts s s♠ ♦r ♠t ♥♦♥s♥♥t ♥trst♥② ♦t t ♥ tP②♦ ♥♦r t st♥ ♣♦st♦♥ ♦ ❨ rst s ♦s t♦ r♦♣ sq♥s♥ t♥ ♦s t♦ ❱♣③❯ s r ♦r trs ♦♥ t t P②♦


r rs ♦♥ ② t P②♦ ♦♥ ♦ t t trs ♦♥ ② t ①tr♠♦ r ♥r② ♥t

[Figure: two trees, with clades labelled M, O and N.]

s ①♠♣ s♦s tt t P②♦ s s♦ ♥t ♦♥ r sq♥ tsts s♦ s ♣r♦r♠ ♦rs ♥ ♠♣r♦♠♥t ♦r t s♥♥♦ ♣♣r♦ t♥ ② ♦ t ♥ ♦♥ s t♦ ♦♦ ♦r r♦♠♥t♦♥ ♥t ♥ ♥② sq♥ sq♥s r t♦ ♥②st♦ ② t♦ ♦r t sq♥s ♣rs♥t ♥ t tr ♠♦♥ts t♦ ♦♦♥ t 16 ∗ 15/2 = 240♣♦ts ♦ r♥ ❲t ♣r♦r♠s s s ♦rs ♦♥② t♦ st♣s r rqr s ♦t ②♥ t ❬❪ rst sttst ♠sr t♦ tt t ♦r♥ ♦ r♦♠♥t♦♥ ♥s t♦ ♣♣ ♣♦st ♦r ♣r♦r♠s ♥ t♥ s t♦ ♣rs② ♣♥♣♦♥t t r♦♠♥t♦♥r♣♦♥t ♥ r♦♥strt ♣②♦♥t trs s ② t sq♥s r ♥②s t ♦♥♥ t sr ♥♣t s ♠♥♠ ♥t② sttst tsts s s ♠♣♠♥t ♥ ♦♥s ❬❪ ♥ ♣♣ t♦ ♦♥r♠ t ♦r♥ ♦ r♦♠♥t♦♥

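Conclusion

In this article, a Mixture Model and a Phylogenetic Hidden Markov Model to detect recombination were presented. Both methods were tested on synthetic datasets, which showed that the Phylo-HMM is superior to the Mixture Model in most circumstances, except when the recombination event only affects a small portion of the alignment. Notably, both methods are fairly efficient. The analysis of an already published dataset showed that the models could successfully uncover recombination breakpoints and topologies. Further improvements might include searching for the appropriate number of topologies to use, or constraining the topologies on each side of a breakpoint to differ by no more than one SPR rearrangement.

References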

❬❪ P♦s t♦♥ ♦ ♠t♦s ♦r tt♥ r♦♠♥t♦♥ r♦♠ s

q♥s ♠♣r t ♦ ♦ ♦

❬❪ r♥ P♣♣ r②♥t s♠♣ ♥ r♦st sttst tst ♦r

tt♥ t ♣rs♥ ♦ r♦♠♥t♦♥ ♥ts ❬❬tt♣①♦♦r♥ts❪❪

❬❪ r ❲rt Pr♥t ②s♥ ♠♦ ♦r tt♥ ♣st r♦♠

♥t♦♥ ♥ts ♥ ♠t♣ ♥♠♥ts ♦♠♣t ♦ ❬❬tt♣①♦♦r❪❪

❬❪ s♥st♥ r ♥ r♦ ♦ ♣♣r♦ t♦ rt♦♥ ♠♦♥

sts ♥ rt ♦ ♦t♦♥ ♦ ♦ ♦

❬❪ s♠r sr♠♥t♥ t♥ rt tr♦♥t② ♥ ♥trs♣

r♦♠♥t♦♥ ♥ sq♥ ♥♠♥ts t ♣②♦♥t t♦

r ♥ r♦ ♠♦s ♦♥♦r♠ts ♣♣ ❬❬tt♣①♦♦r♦♥♦r♠tst❪❪

❬❪ s♠r ❲rt ♥ tt♥ ♥trs♣ r♦♠♥t♦♥ t

♣r♥ ♣r♦st r♥ ♠sr ♦♥♦r♠ts ❬❬tt♣①♦♦r♦♥♦r♠tst❪❪

❬❪ ③rs s♠r rst ②s♥ ♠t♦ ♦r s♠♥t♥ s

q♥ ♥♠♥ts ♥ tt♥ ♥ ♦r r♦♠♥t♦♥ ♥ ♥ ♦♥rs♦♥

tt ♣♣ ♥t ♦ ♦ rt ❬❬tt♣①♦♦r❪❪

❬❪ r ❲ss ♦r♠♥ ♥s♠r r♦tr r rt t♦

②s t♦r tst ♦r r♦♠♥t♦♥ t ♥rt♥ rt ②st ♦ ❬❬tt♣①♦♦r❪❪

❬❪ ♥♥ ❱ ♦r♠♥ ♥ r ♠t♣ ♥♣♦♥t ♠♦ s

t♦ ♠♦r rt r♦♠♥t♦♥ tt♦♥ ♦♥♦r♠ts ❬❬tt♣①♦♦r♦♥♦r♠tst❪❪

❬❪ ♥♥ ❱ ♦r♠♥ ♥ r P②♦♥t ♠♣♣♥ ♦

r♦♠♥t♦♥ ♦ts♣♦ts ♥ ♠♥ ♠♠♥♦♥② rs s♣

t② s♠♦♦t ♥♣♦♥t ♣r♦sss ♥ts ❬❬tt♣①♦♦r♥ts❪❪

❬❪ P♦♥ P♦s r♥♦r ❲♦ r♦st ❲ ♥t

♦rt♠ ♦r r♦♠♥t♦♥ tt♦♥ ♦♥♦r♠ts ❬❬tt♣①♦♦r♦♥♦r♠tst❪❪

❬❪ P♦♥ P♦s r♥♦r ❲♦ r♦st ❲t♦♠t ♣②♦♥t

tt♦♥ ♦ r♦♠♥t♦♥ s♥ ♥t ♦rt♠ ♦ ♦ ♦ ❬❬tt♣①♦♦r♠♦♠s❪❪

❬❪ t♦ ♥♦r♦♥♥ ♠t♦ ♥ ♠t♦ ♦r r♦♥strt♥

♣②♦♥t trs ♦ ♦ ♦

❬❪ s♥st♥ ♦t♦♥r② trs r♦♠ sq♥s ♠①♠♠ ♦♦

♣♣r♦ ♦ ♦

❬❪ ♥♦♥ s s♠♣ st ♥ rt ♦rt♠ t♦ st♠t r

♣②♦♥s ② ♠①♠♠ ♦♦ ②st ♦


❬❪ s ♥ ♠♣r♦ rs♦♥ ♦ t ♦rt♠ s ♦♥ s♠♣

♠♦ ♦ sq♥ t ♦ ♦ ♦

❬❪ ♥ ♠♥tt♦♥ ② ♠①♠ ♣rt ♣rtt♦♥♥ ♦r♥ t♦ ♦♠

♣♦st♦♥ ss ♥ ♦♠♣tt♦♥ ♦♦② ❱♦♠ ♦ t ② ♣r♥r❱r

❬❪ ♥ r♠♥t P②t♦♥ ♠♦s ♦r ♥②ss ♥

♣rtt♦♥♥ ♦ sq♥s ♦♥♦r♠ts ❬❬tt♣①♦♦r♦♥♦r♠tst❪❪

❬❪ ♠t rss② q♥ ♥ ♣♣t♦♥ ♦r t ♦♥t r♦ s♠t♦♥

♦ sq♥ ♦t♦♥ ♦♥ ♣②♦♥t trs ♦♠♣t ♣♣ ♦s

❬❪ ♥ Pr♣rt ♦♥ r♦ ♥ ♠t♦ ♦r t♥ ♦t♦♥r②

ssttt♦♥ rts ♦ ♦

❬❪ s s♥♦ ❨♥♦ t♥ ♦ t ♠♥♣ s♣tt♥ ② ♠♦r

♦ ♦ ♠t♦♦♥r ♦ ♦

❬❪ ♥ ❳ ♦ ♥ tt♥ r♦♠♥t♦♥ ♥ ♦♥ ♥♦t

sq♥s ♦♥♦r♠ts ❬❬tt♣①♦♦r❪❪

❬❪ ♠♦r ♥ ♣♣r♦①♠t② ♥s tst ♦ ♣②♦♥t tr st♦♥ ②st

Acknowledgments

s ♦r s s♣♣♦rt ② ♥ t♦♥ r P ❴ ♥ ②t ♥tr t♦♥ r ♥tq

9 Simultaneous Inference of Species Tree and of Gene Trees

r ♥ trs r ♦r♠ s♦s ♦ t s♣s tr ♦♥ ♥ts t♦ ♥r s♣s tr t st ② t♦ ♦ s♦ s t♦ s ♠♦s ♦ ♥ ♠② ♦t♦♥ P♥t♥② r♥ç♦s ♦ss♥♦♥


rt s r strt♦♥ tt ♥ trs ♥ r r♦♠ s♣s trs ♥ts rt ♥♦♥r♥s r t♦t t♦ t♦ ♥ tr♥sr t t♦t♥② ♣r♦♦ ♥ t ♦tr ♦♦ ♣♥♦♠♥♦♥s ♥ r♥r ♥ trs r♥tr♦♠ s♣s trs ♦r ♠♦r ♦♥ ts s rt

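In the present article, we detail a model to try and infer a species tree when gene trees may differ from it because of gene duplications and gene losses. This model is implemented in a program that can run on several computers simultaneously, and that could easily be modified to cope with the other causes of incongruence between gene trees and species tree.

This article has not been submitted yet.

Simultaneous inference of gene trees and species tree in the presence of duplications and losses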

st♥ ♦ss r ♥♥r ♥♦♦ ♦② ❱♥♥t ♥

♣t♠r

♦rrs♣♦♥♥ t♦r❯♥rsté ②♦♥ ❯♥rsté ②♦♥ ❯ ♦rt♦r

♦♠étr t ♦♦ ♦t ♦r ♥♦♠r ❱r

♥♥ r♥

♠ ♦ss♦♠sr♥②♦♥r

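Abstract

Species trees are usually built as an average of the signal of several genes. However, several biological processes can affect gene families to the extent that gene trees may strongly differ from the true species tree. Duplications and losses are two such processes. In order to reconstruct a species tree from genes, we propose to model gene family evolution in the presence of gene duplication and loss, and consequently to separately infer gene trees and species tree. In this model, each branch of the species tree is associated to particular duplication and loss probabilities. We explain how one can compute the likelihood of a species tree with such a model, detail algorithms that can be used with it, and present a parallel architecture to speed up the computations. In addition to duplication and loss, this framework could easily be extended to use models of gene transfers or of trans-specific polymorphism.

Introduction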

❲♥ ♥rr♥ s♣t♦♥s tt ♦r ♠♦♥s ♦r ♦♥s ♦ ②rs ♦ ♦♥s t♠♣t t♦ s s ♠ t s ♣♦ss ♥ ♥ t rst ♠♦r s♣s tr t r♦♠ s♥ sq♥s ♦♥t♥♥ ♥♥t♥ ♠♥♦s ♦♦tt ♥ ♦♠ ts t♥♥② s ♥ tt ②♣r♦rsss ♥ sq♥♥ t♥qs ♥ ♦♠♣tr s♥ ♥t② t r♦♠♦ ♥♦♠ ♦r r s sq♥♥ ♣r♦ts ♣r♠tt t♦ ♥②s ♦③♥s ♦ ♥s ♦r ♦③♥s ♦ s♣s s♠t♥♦s② r t s t ♥♥ t ♦r t rst♥ trs r ♥♦t♣rt② rs♦ ♥ st ♦♥t♥ ② s♣♣♦rt ♣rtt♦♥s

s ♥rt♥ts ♠② rst r♦♠ ♦s② s♣ ♦♥ss ♥ts ♥♥ ♥ ♦s♥r ♠♦ ♠ss♣t♦♥ s♥st♥ ❲srt ♦r t ② ♦rt♦♦② s ♥ ♥ ♥ ♦t rr♥t② ♠t♦s t ♦♥t♥t♦♥ ♥ t s♣rtr ♣♣r♦s s t rqr tt s♣s ♦s st♦r② s t♦ r♦♥strt s r♣rs♥t ♥ ♥♠♥t ② ♥♦ ♠♦r t♥ ♦♥ ♥ s ♠♦♥ts t♦ tt♥tt ♥ ♣rs♥t ♥ s♥ ♦♣② ♥ ♥♦♠s ♥r ♦♥srt♦♥ s ♦♥

r♣rs♥tt ♦ t s♣s ♣②♦♥② ♥ s ♥♦t ♥r♦♥ srs ♦♣t♦♥s ♥ ♦sss tt ♠t ♦♠♣t ts st♦r② ♦r t s ♥♦t♥rq♥t tt ♥ ♠s ♥r♥t ♣t♦♥s s♦ tt ♥ ♠②st♦r② ♠② ♥♦t s♠♣② t ♠rr♦r ♠ ♦ t s♣t♦♥ st♦r② t♦ t♣♦♥t tt ♦♥ s♣s ♠② r♦r sr ♣r♦♦s ♦♣s ♦ t s♠ ♥♥ s ss ♦ ♦♦s ♣r♦② ♥♦t t♦♦ ♠♥② s♣s ♣♦ssss sr ♥sr♦♠ s♥ ♥ ♠② ♦r ♣②♦♥t r♦♥strt♦♥ t ♣②♦♥tst♦♦ss ♦♥ ♦♣② s ♦♥ ♣r♦r ♥♦ ♦ s♦♠ ♣②♦♥t rt♦♥s♣s ♦r s ♦♥ t ♦♥srt♦♥ ♦ s♠rt② s♦rs ♦r t♦tr srs ♠s ♦♥t♥ ♣r♦♦s ♥s ♥ t♦♥ st♣ ♦♥t♦♥s t♥②ss rss t ♠♦♥t ♦ t s♠tt t♦ ♣②♦♥t r♦♥strt♦♥♥ s ② ♣♥♥t ♣♦♥ st ♦s ♦ t rsrr

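Here we present a hierarchical probabilistic model of gene tree reconciliation and sequence evolution. It provides a robust and comprehensive approach to species phylogeny, able to analyse thousands of gene families, paralogs included, and to simultaneously reconstruct fully resolved species and gene family trees. Additionally, it also supplies reconciled gene trees, and decorates the branches of the species tree with counts of gene duplication and gene loss events. As we have not been able to produce a fully working algorithm in time, no example of application is provided.

A hierarchical model of gene family evolution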

♥ r②♦ts ♥ ♠s ♦ ♠♥② tr♦ ♣t♦♥ ♦ss ♥ sq♥ r♥ ♣r♦st ♠♦♥ ♦ sq♥ r♥ s ♥t ♦t ♦ r ♦② ♦ ttrtr strt♥ t t ♠♦ ♦ s♥ ♥t♦r s ♥ ♦♥t♥♦s② ♠♣r♦♥ t t ♥s♦♥ ♦ rt② ♦ rts ♦ ♦t♦♥ ♠♦♥ sts ❨♥ s♥st♥♥ r ♥ ♠♦♥ r♥s ② ♥ t tr rt② ♦ ♠♦s ♦ ♦t♦♥ ♠♦♥ sts P ♥ rt♦t ♥ P♣♣ ♥ ♠♦♥ r♥s ❨♥ ♥ ♦rts


tr ♥ ♦② ♦str ♦r♥r ♥ ttr② ♥qrt ♥ rt♦t ♦ss ♥ ♦② ♦r♥r ♥ ttr② ♥qrt ♥ rt♦t t♦ t ♦♥② r♥t ①♠♣s sttst ♠♦♥ ♦ ♣t♦♥ ♥ ♦ss s ♠♦r r♥t Prs♠♦♥② r♦♥strt♦♥s ♦ ♥ ♠② ♦t♦♥ r rst ♦♣♣ ♥ ♦♦♠♥t ♥ s♥ t♥ ♥ t ♦t ♦ sr rts tt♠♣t♥t♦ ♠♣r♦ t ♦rt♠s r♥ t ♦ t P ♥rst♦♥ ❩♠s ♥ ② ♥s ♥ ♥st♥ ❲t ♦r r♥t② sttst ♠♦s ♦ ♥ ♠② ♦t♦♥ ♥♦♣♣ rst t ♥ ♥ ♣t♦♥s ♥♥ ♦sss r ♠♦ ② rtt ♣r♦ss t rt ♥ t ♣r♦ts sr ② ♥s s ♦r ♠♦s ♦ sq♥ ♦t♦♥ tt r t♦ ♥r ♥ ssttt♦♥s ♥ ♦♥trr② t♦ ♠t♦s s ♦♥ ♣rs♠♦♥②t s ♦ rtt ♣r♦ss ♣r♠ts t♦ ♥r ♥ts ♦ ♥ ♣t♦♥s♥ ♥ ♦sss tt ♥♦t t ♥② tr ♦♥ t rst♥ t♦♣♦♦② s♦♦ rs♠ ♦r ♦s ♥♦t ♦♠ t♦t ♦st ♥ t s♠s t♦r ♣r♦r♠ ♠♣♠♥t♥ s ♠♦ t♦ ♥②s t♦s♥s ♦ ♥s s♠t♥♦s② ♦r ♦③♥s ♦ s♣s t♦ ♥r s♣s tr tsts ♦rr r② s ♠♦r t♥ ♦ ♥♦♠s r♦♠ r②♦ts ♥ sq♥ ♥ ♣s t♦ t ♦♦s t ♠♦ tt ♦ t ♥②ss ♦ s tsts t♦t ♦rt♥ t♦♦ ♠ ♦♥ t rs♠s s tr♦r ♥

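An assumption inherent to the models of Arvestad et al. is that duplication and loss probabilities are constant over the whole species phylogeny. However, branches in a species phylogeny have not all undergone the same amount of duplications and losses. Modelling such events with a single probability for duplications and another one for losses, independently of the position of the event in the species tree, may not be appropriate.

We therefore choose to associate a particular pair of duplication and loss rates $d_i, l_i$ to each branch i of the species tree. To compute the likelihood of a rooted gene family tree, we use the reconciliation algorithm of Zmasek and Eddy, in which nodes of the gene tree are mapped onto nodes of the species tree according to the following principle. For each node u, L(u) denotes the set of species that have a gene that is a descendant of u, and for each set of species S, lca(S) denotes the node that is the last common ancestor of all species in S. Node u in the gene tree is then mapped to λ(u) = lca(L(u)). Moreover, hidden nodes in the gene tree (nodes that are not visible in the gene tree because one of their two descendants has been lost) are also mapped to nodes of the species tree.

In Zmasek and Eddy, this mapping aims at providing a most parsimonious scenario of duplications and losses in the gene family that explains the difference between a rooted gene tree and a rooted species tree. A duplication event is associated to a node u of the gene tree if it has at least one child v such that λ(u) = λ(v), and a loss event is inferred every time v is a child of u in the gene tree while λ(u) is not the direct parent of λ(v) in the species tree.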
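The mapping λ(u) = lca(L(u)) and the duplication rule can be illustrated with a small Python sketch. It is a simplified illustration, not the reconciliation code of the program: the data structures (a `parent` dictionary for the species tree, nested tuples whose leaves are species names for the gene tree) and the function names are assumptions, and loss events and hidden nodes are not handled.

```python
def lca(parent, a, b):
    """Last common ancestor of species-tree nodes a and b.
    parent maps each node to its parent; the root maps to None."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b


def reconcile(gene_tree, parent, events):
    """Return lambda(u) for the root of gene_tree and append, in post-order,
    one (mapped species node, event type) pair per internal gene node."""
    if isinstance(gene_tree, str):       # a leaf: the species carrying the gene
        return gene_tree
    left, right = gene_tree
    mapped_left = reconcile(left, parent, events)
    mapped_right = reconcile(right, parent, events)
    mapped = lca(parent, mapped_left, mapped_right)
    # Duplication if at least one child is mapped to the same species node.
    is_duplication = mapped in (mapped_left, mapped_right)
    events.append((mapped, "duplication" if is_duplication else "speciation"))
    return mapped
```

For the species tree ((A,B),C) and the gene tree ((A,B),(A,C)), the root of the gene tree is mapped to the root of the species tree and flagged as a duplication, because its right child is mapped to the same node.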

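Here we do not infer a scenario, but integrate over all scenarios that may explain this difference, to compute a likelihood. For example, when a loss is inferred by the algorithm of Zmasek and Eddy, we consider the possibility of a duplication followed by two losses, or even more unparsimonious scenarios. This integration thus accounts for hidden events, just as models of sequence evolution account for hidden substitution events.

Figure: Mapping between the species tree and a gene tree. Species tree whose nodes are numbered from 1 at the root to higher numbers as nodes are farther from the root. Gene tree rooted at the node showing the most parsimonious duplication-loss scenario on the tree; nodes are numbered in reference to the species tree. Same gene tree rooted in a random position, with numbering in reference to the species tree. The most parsimonious rooting position is on a branch linked at either extremity to a node numbered 1.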

♦ ♥♦ u ♦ t s♣s tr t T (u) t s♦rst ♦ t ♥ tr s t r♣ ♥ ② ♥♦s ♠♣♣ t♦ u ② t ♥t♦♥ λ ♦r ♦♠♣♦♥♥t T ♦ T (u) rt① s ♥ ①♠♣r t s ♦r ♥ ♥tr♥ ♥♦♦ r ♦r t r♦♦t t s r r② ♦♠♣♦♥♥t T ♦rrs♣♦♥s t♦ ♥♠r ♦ ♣t♦♥s ♥ t ♠♦st ♣rs♠♦♥♦s s♦t♦♥ ♦ ❩♠s ♥ ② s q t♦ t ♥♠r ♦ ①♠♣rs ♠♥s ♦♥ ♥ ♣rtr Ts ♦♠♣♦s ♦ s♥ ♥♦ t ♦rrs♣♦♥s t♦ ♥ ♦t♦♥ t♦t ♣t♦♥♥t ♥ t r♥ i ♦ t s♣s tr ♥ t♦ u ♥♠r ♦ ①♠♣rs♦ ♦♠♣♦♥♥t T ♦ T (u) s t ♥♠r ♦ ♣r♦♦s ♥s ♦t♥ ②t ♣t♦♥ ♣r♦ss ♦♥ T s s♥ ♥♦

♦ t ♥t♦ ♦♥t t ♣♦sst② ♦ ♥♣rs♠♦♥♦s s♥r♦s s rtt ♣r♦ss r rt ♦rrs♣♦♥s t♦ ♣t♦♥ ♥ t ♦rrs♣♦♥s t♦ ♦ss ♦t ♣♦ss s♥r♦s r ②t t♥ ♥t♦ ♦♥t s t♦r r♥t t ♥♠r ♦ ♣r♦♦s ♥s tr ♣t♦♥ ♣r♦ss t♥ss② t ①♠♣rs ♦ ♦♠♣♦♥♥t ♦ T (u) ♥ s ♦♥t 0 ①♠♣r ♥ ♦♠♣♦♥♥t ♥t t ♣r♦t② tt ♠♦r ♣r♦♦s ♦♣s ♥♥rt t ♥♦ u ♥ ♦st tr ❲ ts ♣♣r♦①♠t♦♥ ♠② ♥♦t t♦♦ r♠ t♦ ♦r ♠♦

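Formulas for computing the probability $P_u(k)$ that a component T of T(u) has k exemplars, or that the loss of a gene is inferred at node u (this corresponds to the case k = 0), can be found in Thorne et al., who used a birth-death process to model insertions and deletions in sequences:

$$P_u(0) = l_u \times \beta$$

$$P_u(k) = (1 - d_u \times \beta) \times (1 - l_u \times \beta) \times (d_u \times \beta)^{k-1} \quad \text{with } k \in [1; +\infty[$$

where

$$\beta = \frac{1 - e^{d_u - l_u}}{l_u - d_u \times e^{d_u - l_u}}$$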
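These formulas translate directly into code. The following Python sketch is an illustration only; the function names are hypothetical, and the special case d = l, where the expression for β becomes 0/0, is handled with the limit value 1/(1 + d), which is an addition of this sketch rather than something stated in the text.

```python
import math


def beta(d, l):
    """beta for duplication rate d and loss rate l on one branch."""
    if d == l:
        return 1.0 / (1.0 + d)   # limit of the formula when d tends to l
    e = math.exp(d - l)
    return (1.0 - e) / (l - d * e)


def prob_exemplars(k, d, l):
    """P_u(k): probability of k exemplars (k = 0 means the gene is lost)."""
    b = beta(d, l)
    if k == 0:
        return l * b
    return (1.0 - d * b) * (1.0 - l * b) * (d * b) ** (k - 1)
```

A quick sanity check is that the probabilities over k = 0, 1, 2, ... sum to 1, since the terms for k ≥ 1 form a geometric series of ratio d × β.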

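Using these branchwise probabilities, and assuming that the evolution of a gene on a branch is independent of its evolution on another branch, one can compute a likelihood for a reconciliation, that is, for the duplication-loss evolution of a gene family, L(reconciliation), as follows:

$$L(\text{reconciliation}) = \prod P_u(k_u) \quad \text{with } k_u \in [0; +\infty[$$

This product is computed over all nodes u of the species tree and, for each node u, over all components of T(u) and all loss events inferred by the algorithm of Zmasek and Eddy; $k_u$ is the number of exemplars in component T of T(u), or 0 for a gene loss.

This equation gives the likelihood of a reconciliation, that is, of losses and duplications, but does not take sequence evolution into account. A family likelihood combines the likelihood of a reconciliation and the classical likelihood of a gene tree (Felsenstein), and is obtained as follows:

$$L(\text{family}) = L(\text{reconciliation}) \times L_{\text{Felsenstein}}(\text{gene tree})$$

L(family) makes it possible to compute the likelihood of a gene family and a species tree. This likelihood can then be maximized with respect to the gene tree or with respect to the species tree. If many gene families are to be analysed in parallel, assuming that they evolve independently from each other, both gene trees and species tree can be optimized by maximizing the following likelihood:

$$L(\text{Species \& gene trees}) = \prod_{G \in \text{All gene families}} L_G(\text{family})$$

This joint search requires specific algorithms.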

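Algorithms to simultaneously infer species and gene trees

Finding the best reconciliation between a gene tree and a species tree: rooting the gene tree

For a given rooted binary gene tree and a given rooted binary species tree, finding the reconciliation mapping λ is achieved through an algorithm akin to that of Zmasek and Eddy. However, gene trees are not naturally rooted, unless they are inferred through molecular clock models (Zuckerkandl and Pauling; Kimura) or through non-reversible models of evolution (Yang and Roberts; Galtier and Gouy; Huelsenbeck et al.; Yap and Speed; Boussau and Gouy), and even in such cases there may be little signal for the position of the root. Here we therefore propose to root gene trees by searching for the root position on the basis of the species tree. For a given rooted species tree, the best root position on the gene tree can be found by computing the reconciliation likelihood for all possible root positions, and then taking the root position that maximizes the likelihood. With this procedure, the most likely root may differ from the most parsimonious root. However, this approach is also very time-consuming, as for a gene tree with n leaves and a species tree with $n_s$ leaves, 2n − 3 root positions need to be tried. In the algorithm of Zmasek and Eddy, a mapping between nodes of the gene tree and nodes of the species tree must be obtained through a tree traversal, which has a complexity of O(n) in most cases. The simplest algorithm to compute reconciliations for the 2n − 3 possible root positions would thus imply 2n − 3 tree traversals, with a complexity in O(n²). Scores then need to be computed for each of the 2n − 3 reconciliations. Computing the score of a reconciliation with the reconciliation likelihood involves O($n_s$) operations. If we consider that $n_s$ ≈ n, we obtain a total complexity on the order of O(n³) to get the likelihood scores for all possible rootings and choose the most likely one. To minimize the number of tree traversals, one could use a double-recursive tree-traversal algorithm, which would make it possible to compute all scores associated with the 2n − 3 reconciliations in O(n²). Instead, we use another approach, trying only a few root positions, on the assumption that the most likely one should not be too distant from the most parsimonious one. This saves time by not computing all 2n − 3 root scores. The heuristic we use is based on the first step of Zmasek and Eddy's algorithm, which maps species tree nodes onto gene tree nodes. If the species tree has been numbered so that nodes close to the root have smaller numbers than nodes far from the root, then, when an arbitrary root position is chosen on the gene tree, the most parsimonious rooting is found on a branch linked to a node whose index is the smallest on the gene tree (proof not shown).

Therefore, once a mapping has been computed given an arbitrary rooting, to get the most parsimonious rooting one only has to look at branches linked to nodes with the smallest index. In addition, in our model, the most parsimonious rooting may not be the most likely one. However, we assume that the most likely rooting is not very distant from the most parsimonious one. Consequently, to find the root of a gene tree, the following procedure is applied:

1. Choose an arbitrary root and compute the mapping.

2. Compute the reconciliation scores obtained when rooting on branches linked to nodes whose index is inferior to a certain threshold.

3. Choose the most likely rooting.

The threshold is user-defined through a parameter t: for a gene tree whose smallest node index is s, threshold = s + t.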

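Finding the best gene tree given a species tree: optimizing the gene family likelihood

The preceding section explained how a particular gene tree and a species tree can be compared to compute the most likely reconciliation score, by looking for the gene tree root. If gene trees were known a priori, this reconciliation score would be enough to search for the species tree, by maximizing the product of reconciliation scores over all families. However, gene trees are not data and can only be estimated based on a sequence alignment, for instance through maximization of the Felsenstein likelihood. The species tree can then be obtained according to the joint likelihood over gene families, which requires computing the best family likelihoods given gene and species trees, where both reconciliation and Felsenstein likelihoods are taken into account. To search for the best family likelihood between a gene tree and a species tree, only the gene tree is modified. This modification can be obtained through commonly used tree search heuristics. For the sake of rapidity, we use a simple heuristic based on nearest-neighbour interchanges (NNIs), strictly as follows: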

♦r r♥ ♦ t rr♥t ♥ tr t♦♣♦♦② ♠♦st ② r♦♥t♦♥ s♦r s ♦♠♣t ♦r t t♦ ♣♦ss s

r♦♥t♦♥ s♦r s ttr t♥ t st rr♥t r♦♥t♦♥ s♦rt r♦♥t♦♥ s♦r s ♦♠♣t

r♦♥t♦♥ s♦r s ttr t♥ t rr♥t r♦♥t♦♥ s♦rt s ♣t ♥ t ♦rt♠ rs♠s t t ♥ ♥ trt♦♣♦♦②

t♣ 1 ♥ ts ♦rt♠ ♠♣t② ss♠s tt t strt♥ ♥ tr s t ♠♦st② ♦r♥ t♦ s♥st♥ ♦♦ s t ♦♥② ♦♠♣ts t r♦♥t♦♥ s♦r ♠♦♥ts t♦ ♦♥sr♥ tt ♦♥② t r♦♥t♦♥ s♦r♥ ♥rs ♥ ♣rt strt♥ trs ♥ ♦t♥ ② P② ♥♦♥♥ s ♦r ♥st♥

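Finding the best species tree given several gene family trees: optimizing the likelihood

The joint likelihood over all gene families can be used to compute the likelihood of a species tree given gene trees and sequence alignments. As the preceding sections showed how one can compute the likelihood of a gene family, what remains is to explain how to maximize this likelihood. This is done by exploring the species tree topology, as well as by exploring the space of parameters regarding the branchwise probabilities of gene loss and gene duplication.

Finding the most likely species tree topology

To find the most likely species tree topology, classical algorithms can be used, with the exceptions that the species tree does not have branch lengths and that it needs to be rooted. The chosen algorithm is as follows:

1. For each subtree in the species tree:

• prune the subtree;

• regraft the subtree in all possible positions in the species tree, and compute the associated likelihood;

• if this Subtree Pruning and Regrafting (SPR) increases the likelihood, keep it.

2. On each branch of the current species tree topology, root the species tree; as soon as a rooting improves the likelihood of the species tree, adopt it.

3. For each branch of the current species tree topology, perform NNIs; as soon as an NNI improves the likelihood of the species tree, adopt it.

4. Iterate points 1 to 3 until no improvement is observed for a fixed number of steps.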

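Getting the branchwise duplication and loss probabilities

In parallel to the species tree topology search, the branchwise rates of gene duplication and loss have to be found. This search is eased by the fact that branchwise duplication and loss rates are independent of each other: finding the most likely $d_i$ and $l_i$ for branch i therefore only requires considering the counts of events on this branch i. To do this, one can either use numerical optimization techniques or use an analytical solution. For the sake of rapidity, we chose an approximate analytical solution. Formally, one should consider the counts of times where there are k genes at the end of branch i, with k ∈ [0; ∞[. Instead, we only use the numbers of times 0, 1 and 2 genes have been found at the end of branch i. Using these counts only, approximate maximum likelihood values of $d_i$ and $l_i$ can be computed as follows:

$$d_i = -\frac{\ln\left(\frac{y + 2z}{x + y + z}\right) k (x + y + z)}{zx - yz - z^2 + yx}$$

$$l_i = -\frac{\ln\left(\frac{y + 2z}{x + y + z}\right) x (y + 2z)}{zx - yz - z^2 + yx}$$

where x, y, z are the numbers of times 0, 1, 2 genes are found at the end of branch i, respectively.

Algorithm as a whole

To search for the most likely species tree, and simultaneously for the most likely gene family reconciliations under our model, we rely on a server-client architecture.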

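Figure: Server-client architecture of the program. The server is in charge of the species tree search as well as of the duplication and loss rates. It communicates with the clients, each one in charge of one or more gene families, for which they find reconciliations and compute family likelihoods using sequence alignments.

The complete algorithm for estimating the species tree simultaneously with the gene trees is summarized in the following pseudocode.

Algorithm: Optimizing the species tree as well as gene trees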

♦♦❴trs♦|T | srr ④

t ♥t ♣s tr ♦r r♥♦♠ ♦♥ ♥ st♦r t ♥t♦ currentS♥ ♥t♦ oldS

t t st ♦ ♥ ♠s t♦ ♥②srt n ♥tss♥ ♥t t st ♦ ♥ ♠s t② r ♥ r ♦s♥ ♥t currentS ♥ rts ♦ ♣t♦♥ ♥ ♦ssrr♥t ❨ trt♦♥s❴t♦t❴♠♣r♦♠♥t ♠t ④

r ♠② ♦♦s ♥ ♦♥ts ♦ ♥ ♣t♦♥s ♥ ♦sssr♦♠ t ♥ts

♦♠♣t t♦t ♦♦ ♥ ♦r currentS ♥ > rr♥t

t♥ ④oldS = currentS trt♦♥s❴t♦t❴♠♣r♦♠♥t ⑥

s ④currentS = oldS trt♦♥s❴t♦t❴♠♣r♦♠♥t ⑥

♥ t t♦♣♦♦② ♦ currentS ♥ ♣t rts ♦ ♣t♦♥ ♥♦ss

s♥ ♥t currentS ♥ rts ♦ ♣t♦♥ ♥ ♦ss⑥

⑥s ♥t ④

r st ♦ ♥ ♠sr currentS ♥ rts ♦ ♣t♦♥ ♥ ♦ssr ♥♠♥ts ♥ P② ♣r♦♠♣t trs ♦r ♥ ♠② trt♦♥s❴t♦t❴♠♣r♦♠♥t ♠t ④

♦♠♣t ♠② ♦♦ss♥ t♦ t srr ♠② ♦♦s ♥ ♥♠rs ♦ ♥ ♣t♦♥s

♥ ♦sss ♣r s♣s tr r♥r currentS ♥ rts ♦ ♣t♦♥ ♥ ♦ss

⑥⑥

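Communications between the server and clients regarding the iterations_without_improvement variable are not shown.

Details of implementation

As the algorithm starts from a random species tree, reconciling gene family trees with it can take a very long time at the beginning and would waste computation time. During the first iterations, only reconciliation likelihoods are computed, and gene family trees are not modified. During these steps, branchwise duplication and loss probabilities are the same for all the branches of the species tree, in order not to be trapped in a local maximum. When the species tree has been optimized, a second optimization phase is started, where gene trees are modified and duplication and loss rates are truly branchwise. The program has been implemented with the help of the Bio++ library (Dutheil et al.) and of the Boost libraries, and can run on clusters of computers using the Message Passing Interface (MPI).

Conclusion

We detail a program that can run on several computers in parallel to reconstruct a species tree simultaneously with gene trees in a maximum likelihood framework. Several modifications or improvements could be easily implemented. First, gene trees are considered to have branch lengths independent of the branch lengths of other genes. This is very unrealistic; instead, one could use an approach similar to that of Rasmussen and Kellis, by associating branch lengths to the species tree and having only a single parameter per gene family. This would also allow for the very desirable possibility of improving the species tree using better models of sequence evolution, notably non-homogeneous models that better tolerate datasets showing heterogeneities in base composition. With such models, different models of evolution could be associated to different branches of the tree; this would bring the addition of another signal for rooting the species tree, as non-homogeneous models are non-reversible. Other events than gene duplications could be modelled, such as gene transfer and trans-specific polymorphisms. With these events, only the reconciliation likelihood would change. It would also be interesting to incorporate dependencies between gene trees. Two genes may share a history because they remain close to each other on a chromosome throughout their history, in which case Hidden Markov Models such as those of Hobolth et al. may be used, or because they interact as part of their function, in which case the more general framework of Ané et al. may be relevant. Searching for correlations between genes may require heavier changes in the program, however, as our algorithm structure relies on the fact that genes can be considered as independent of each other. Our program could also be easily modified to compute Bayesian posterior probabilities instead of likelihoods, and to sample from posterior distributions through Markov Chain Monte Carlo simulations instead of providing maximum likelihood estimates. We feel that approaches such as this one are bound to be really useful in the future: estimating the species tree from gene trees may be costly, but it is a necessary step towards a proper estimation of the tree of life.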

References

♥é é rt rt ♠ ♠t t② ♥ ♦s ♥t♦♥s ②s♥ st♠t♦♥ ♦ ♦♥♦r♥ ♠♦♥ ♥ trs ♦ ♦

rst rs r♥ ♥♥r♦tt rr♥ ♥s ♥ ♥♥♥t ②s♥ ♥s♣s tr r♦♥t♦♥ ♥ ♦rt♦♦② ♥②ss s♥ ♦♥♦r♠ts ♣♣

♥s ♥ ♥st♥ r ♠t♣ ♥ ♣t♦♥♣r♦♠ rst ♦♥♦r♠ts

♥qrt ♠ ♥ rt♦t ♦s ②s♥ ♦♠♣♦♥ st♦st ♣r♦ss ♦r ♠♦♥ ♥♦♥stt♦♥r② ♥ ♥♦♥♦♠♦♥♦s sq♥ ♦t♦♥ ♦ ♦ ♦

♥qrt ♠ ♥ rt♦t ♦s t ♥ ♠tr♦♥♦s ♦ ♦ ♠♥♦ ♣♠♥t ♦ ♦ ♦ ♥

♦♦st ♦♦st rrs tt♣♦♦st♦r

♦ss st♥ ♥ ♦② ♥♦♦ ♥t ♦♦ ♦♠♣tt♦♥st ♥♦♥rrs ♠♦s ♦ ♦t♦♥ ②st ♦

♥ r♥ ♥ r♦t♦♥ ❯ ♣r♦r♠ ♦rt♥ ♥ ♣t♦♥s ♥ ♦♣t♠③♥ ♥ ♠② trs ♦♠♣t ♦

r r♥s ♦rs ♦s ♦♥ r♥ rst♥ r②rst♦♣r ♥ r♥ ♥ ♦r Pr ♦r t♦♠t r♦♥strt♦♥ ♦ ② rs♦ tr ♦ ♥

♥♥ ♠s ♥ ♦s♥r ♦ s♦r♥ ♦ s♣s trst tr ♠♦st ② ♥ trs P♦ ♥t

s réér r♥♠♥♥ ♥♥r ♥ P♣♣ ré P②♦♥♦♠s ♥ t r♦♥strt♦♥ ♦ t tr ♦ t ♥t

s réér r♥♠♥♥ ♥♥r ♦rr♦t ♥ ♥ P♣♣ ré ♥ts ♥ ♥♦t ♣♦♦rts r t ♦sst ♥ rts ♦rtrts tr

♦♦tt ♥ ♦♠ ❯ ❱ PP ❱❯ ❱❯❨ P tr ♣r

♥s② ♦♦ ♦ ♦ ♥ ♠② ♦t♦♥ Ptss ❯♥rst② ♦ ❲s♥t♦♥

♥♥ s② ❲ ♥♦ ♥rs ts P♥ ♥ r♦♥❲♠ ♠t t♣♥ r ♥ ♦s r ❲ st tts♦♠ r♦r② ør♥s♥ rt♥ ❱ ♦ t♥ ♠ts ♥rs s ♦ rst♥s♥ ♥rt ør ❲r


❲r rt♥ r ♥ rt ♦♥③♦ r♦ ♣②♦♥♦♠ s♠♣♥ ♠♣r♦s rs♦t♦♥ ♦ t ♥♠ tr ♦ tr

t ♥ r ②♥ ③♥ r é♠♥ ②♥ ♥③ ❱♥♥t tr ♦s ♥ r ♦ st ♦ rrs ♦r sq♥ ♥②ss ♣②♦♥ts ♠♦r ♦t♦♥ ♥ ♣♦♣t♦♥ ♥ts ♦♥♦r♠ts

s♥st♥ ♦t♦♥r② trs r♦♠ sq♥s ♠①♠♠♦♦ ♣♣r♦ ♦ ♦

s♥st♥ ♥ r ♥ r♦ ♦ ♣♣r♦t♦ rt♦♥ ♠♦♥ sts ♥ rt ♦ ♦t♦♥ ♦ ♦ ♦

s♥st♥ ♦ ss ♥ ♣rs♠♦♥② ♦r ♦♠♣tt② ♠t♦s ♣♦st② ♠s♥ ②st ❩♦♦

♦str Ptr ♦♥ ♦♠♣♦st♦♥ tr♦♥t② ②st ♦

tr ①♠♠♦♦ ♣②♦♥t ♥②ss ♥r ♦r♦♥ ♠♦ ♦ ♦ ♦

tr ♥ ♦② ♥rr♥ ♣ttr♥ ♥ ♣r♦ss ♠①♠♠♦♦ ♠♣♠♥tt♦♥ ♦ ♥♦♥♦♠♦♥♦s ♠♦ ♦ sq♥ ♦t♦♥ ♦r ♣②♦♥t ♥②ss ♦ ♦ ♦

♦♦♠♥ ③s♥ ♦♦r ♦♠r♦rrr ♥ ts tt♥ t ♥ ♥ ♥t♦ ts s♣s ♥ ♣rs♠♦♥② strt② strt ② ♦r♠s ♦♥strt r♦♠ ♦♥ sq♥s ②st♠t

❩♦♦♦②

♦r♥r ❱ ♥ ttr② ♥ t ♦rrt♦♥ t♥ ♦♠♣♦st♦♥ ♥ sts♣ ♦t♦♥r② rt ♠♣t♦♥s ♦r ♣②♦♥t ♥r♥ ♦ ♦ ♦

♦r♥r ❱ ♥ ttr② rs ♠♣ t♦ ♦r②s♥ P②♦♥t ♥r♥ t ♦♥♦♠♦♥♦s sttt♦♥ ♦♦ ♦ ♦

♦ ♥ ♥ ♠t ♦♥strt♦♥ ♦ ♥♥t♠♦r ♣②♦♥② ♦ P②♦♥t ♦

♥♦♥ té♣♥ ♥ s r s♠♣ st ♥ rt♦rt♠ t♦ st♠t r ♣②♦♥s ② ♠①♠♠ ♦♦ ②st ♦

♦♦t sr rst♥s♥ ♥ ♦♠s ♥ r♣ ♥♦♠ rt♦♥s♣s ♥ s♣t♦♥ t♠s ♦ ♠♥ ♠♣♥③ ♥♦r ♥rr r♦♠ ♦s♥t ♥ r♦ ♠♦ P♦ ♥t

s♥ ♦♥ P ♦ ♦♥t♥ P ♥ ♥ ♠② ♥rr♥t r♦♦t ♦ ♣②♦♥t tr ②st ♦


s ♥t♦r ♦t♦♥ ♦ ♣r♦t♥ ♠♦s Ps ♦

♥r♦ ♠♠♥ Pr♦t♥ t♦s♠ ♦ ♠ Prss ❨♦r

♠r r ♦r ♦s ♦r s ♦ ♦t♦♥ t

♥t

rt♦t ♦s ♥ P♣♣ ré ②s♥ ♠①tr ♠♦ ♦rr♦ssst tr♦♥ts ♥ t ♠♥♦ r♣♠♥t ♣r♦ss ♦ ♦

♦♦s ♦♥st♥t♥♦s r♦♠ts ♦♥st♥t♥♦s r♥rs tr♦s ♥②r♣s ♦s ♥♦♠s ♥ ♥ ts ♥ stts ♦ ♥♦♠ ♥ ♠t♥♦♠ ♣r♦ts ♥ tr ss♦t ♠tt s s ts ss

r♥ ♥ ♥ ♠t ♦♦② ♦♥sst♥t ♠♦♦r ♦♠♣r♥ ♠♦r ♣②♦♥s ♦♠♣t ♦

P ♥ rst♦♥ r♦♠ ♥ t♦ ♦r♥s♠ ♣②♦♥②r♦♥ trs ♥ t ♥ trs♣s tr ♣r♦♠ ♦ P②♦♥t ♦

P r ♥ ♥r ♣②♦♥t ♠①tr ♠♦ ♦rtt♥ ♣ttr♥tr♦♥t② ♥ ♥ sq♥ ♦r rtrstt t ②st♦

s♠ss♥ tt ♥ s ♥♦s rt ♥tr r♦♥strt♦♥ ② r♥♥ ♥ ♥ s♣ss♣ ssttt♦♥ rts r♦ss ♠t♣ ♦♠♣t ♥♦♠s ♥♦♠ s

♦r♥ s♥♦ ♥ s♥st♥ ♥ ♦t♦♥r② ♠♦♦r ♠①♠♠ ♦♦ ♥♠♥t ♦ sq♥s ♦ ♦

② ♥ t ♦♥ t ♦r♦♥ ②♣♦tss ♦ ♥♦tssttt♦♥ t ♦s

❲ ♥ré ♥s r ♦r♦♥ ♥ ♥st♥ r ♣r ♣r♦r♠ ♦r rs ♣②♦♥t ♥②ss s♥ ♥tr ♣rs♠♦♥② ♦♥♦r♠ts

❲sr ❲ ♦♥♥♦♥ ♥ ❲♦s ♥♦♦sr♠s ♣②♠ ♥ t t ♦ r ♦♠♣♦st♦♥ ♦♥ ♣②♦♥t tr♦♥strt♦♥ ②st ♣♣ r♦♦

❲ rst♥ ❩♦ ②♥ ♥♥♥ ♥ ♦r♦r ♥s ♣r♦t② ♥ r♦♠♦s♦♠ ①t♥t ♦ tr♥ss♣ ♣♦②♠♦r♣s♠ ♥ts

❨♥ ❩ ①♠♠ ♦♦ ♣②♦♥t st♠t♦♥ r♦♠ sq♥s t r rts ♦r sts ♣♣r♦①♠t ♠t♦s ♦ ♦


❨♥ ❩ ♥ ♦rts ♥ t ❯s ♦ q♥s t♦ ♥rr♥♥s ♥ t r ♦ ♦ ♦ ♦

❨♣ ❱♦♥ ♥ ♥ ♣ rr② ♦♦t♥ ♣②♦♥t tr t ♥♦♥rrs ssttt♦♥ ♠♦s ♦ ♦

❩♠s ♥ ② s♠♣ ♦rt♠ t♦ ♥r ♥ ♣t♦♥ ♥ s♣t♦♥ ♥ts ♦♥ ♥ tr ♦♥♦r♠ts

❩r♥ ♠ ♥ P♥ ♥s ♦r③♦♥s ♥ ♦♠str② ♠ Prss ❨♦r Ps

10 Problems and Perspectives for the Evolutionary Study of Genomes

My thesis addresses issues related to phylogenetic reconstruction as well as issues related to the reconstruction of the mechanisms of evolution. Both are expected to make much progress in the next years, and this last article attempts to foresee the coming changes by analysing recent literature.

This article has not been submitted yet.

Genomes as documents of evolutionary history

st♥ ♦ss ❱♥♥t ♥

t♦r

❯♥rsté ②♦♥ ♥rsté ②♦♥ ❯ ♦rt♦r ♦♠étr t ♦♦

♦t ♦r ♥♦♠r ❱r♥♥ r♥

♦♥t♥ts

♥tr♦t♦♥

♦rt♦♦♦s ♥ ♠② ♠②t

P②♦♥ts

P♦♣t♦♥ ♥ s♣s st♦rs

♣t♦♥s ♥ ♦sss

rtt tr

♦♠♣tt♦♥ ♥s

♥♠♥t ②r

♦♠♥t♦♥ ♥ ♦♠♦♦②

♥♦♠ ♥♠♥t

P♥♦♥♦♠s

♦♦♥♦♠s

♦♥♦♠s

♦♥s♦♥s ♥ ♣rs♣ts

Abstract

s ♣r♠r② s♠♥ts ❩r♥ t P♥ ♥♦♠s ♦♥ st ♥trt r♦r ♦ trrrrs s♥t ♥ ♦t♦♥ ♦ ♥r ts ♥♦r♠t♦♥ P②♦♥ts ♠st ss♠t s♣ts♦ ♥♦♠ ♦t♦♥ ♥ rtr♥ ♠ ♦♦② ♥ r♥ ♠r♥ ♦ P②♦♥♦♠s♥r♥ts ts ♣♦r ♦ ♥♦♠s ♥ ♦t♦♥ t♦ ♠t② t ♦tr ♣♣r♦s♥trt♥ ♣♦♣t♦♥ ♥ts ♥♦♠ ♦t♦♥ ♦r♣② ♦♦② ♥♦r ♦♦② ♥t♦ ♣②♦♥t♠♦s r ♥♦ ♠r♥ ♥ r② ♥t♣t t tr ♦ ts s♣♥ ♥ ts rt rr♥t ♥s ♥ sss ♣♦ss ♦♣♠♥ts t♦rs ♦♠♣r♥s r♦♥strt♦♥ ♦ t st♦r②♦

Introduction

♠♥② r♥t s t tr ♦ s s♠♣ ♦♥♣t ❲♥ rs r♥ ♥ ♦♥ t♠♥tt ② s ♣rrs♦r ♦t♦♥sts ♦rt ♣♦♥ t ♠t♣♦r ♦ tr s t ts ♥

r♦♥ r♥s t rst ♦ t rt ♥ ♦rs t sr t ts r r♥♥ ♥ t r♠

t♦♥s r♥ s ♣r♦② ♣tr♥ rtr ♣♥ rt♥ ♦s♦♥② ♠trt♥♣ttr♥ ♦ rt♦♥s♣s ♠♦♥ s♣s ♦ ♦t tt ♥ ♠r t t t♥♥② ♦ ♥ s②st♠st♦ ♣r♦ ①♣t♦♥s t♦ ♠♦st rs ♦ ♥ t♦ rtr ♥t♣t t ①st♥ ♦ ♦♠♣①②♥ ♣ts s s ♠rs♠ ♥ ♥♦s②♠♦ss ♦r t t② ♦ t ts ♦ r♦♥strt♥ ② rs♦ ♥ t st♦r② ♦ ♦ r② ♦rs♥ ♦♥② s ♦ rt ♦ srst② t ♥tr♦♥ r♦ts ♦ ♦t♦♥ ♥ t ♦♠♣①t② ♦ ♣②♦♥t② ♦♣rt♦♥ rtrsst ♠♣r tt♠♣ts ♦ ♦t♦♥sts t♦ ♣r♦ ♦♠♣r♥s ♣tr ♦ ♦♦ ♦t♦♥ ♦♠t♠st♦ t ♣♦♥t ♦ rs♥t♦♥ ♦♦tt ♦s t rr♦ ♥ ♥ t rr♥t ♦ ♥♦♠sq♥s t s ♦♠ ♠♦r ♥ ♠♦r t t♦ rtrt r♥ tt ♠♦r t r ♥

♦♥ ♦r ❩r♥ ♥ P♥ ♣r♦♣♦s tt ♥s ♦ s s ♦♠♥ts ♦ ♦t♦♥r②st♦r② ❩r♥ t P♥ trt♥t ♥ ♦③♥s② trt♥t t ♦③♥s② r② t t s t♦ ♦r ♥♦ t rst ♥♦♠ ♣②♦♥② tr ♦ r♦s♦♣ ♣s♦♦sr

str♥s s ♦♥ t ♣rs♠♦♥♦s r♦♥strt♦♥ ♦ r r♦♠♦s♦♠ ♥rs♦♥ s♥r♦s ♥ t♥♠♥② r♥t ♣♣r♦s ♥ s t♦ ①♣♦t t ♥♦r♠t♦♥ ♦♥t♥ ♥ ♥♦♠s ♥ s♣s trs s ♥ t s t ♦r r♥t rs ♦② tt ♥rs♦ ♦♠♣t ♥♦♠ sq♥s t rsts ♦ ts ♣②♦♥♦♠ ♣♣r♦s r ♠① sr ②rs♦ rs ♦ ♥ ♣s tt ♦♥t t ♦tr r♦♥ t ttst③③t r t ♥ tr ♥tr♣rtt♦♥ s ♥ ♦♥tr♦rs ♣tst t ♥ t rt♥ ♦r ts ♣♣r♥t ♥ ♦ s♣ ② ♣♥ ♠t♦♦♦♦s r♥t ttrtr s ♦rs♥ t rst tt♠♣ts t ♦♣♥ ♥r♥ ♦ s♣s ♥ ♥♣②♦♥s t ♠♦s ♦ ♣♦♣t♦♥ ♥ts ♦r ♥♦♠ ♦t♦♥ s ♣♣r♦s ♣♦♥r t tr②♥trt s♥ tt ♣②♦♥♦♠s ♦t t♦ s♥ tt ♦♠♥s r♦s s ♦ ♦♠♣①t②♥ r♥t s ♦ ♦t♦♥r② ♦♦② ♥ ts ♣♣r ♣r♦♣♦s t♦ r s♣② ts r♥trtr♦s ❲ s♦ sss ♥ tr② t♦ ♥t♣t t rtr ♦♣♠♥ts tt r ♥ t♦r♦♥strt ♦♠♣r♥s st♦r② ♦

Orthologous gene family myth

❲② ♣②♦♥♦♠s ♠t♦s ♥ ♥sss t ♣r♦♥ ♥ ♥s♣t tr ♦ t♦♥♥t ♠♦r rtrs ♣♣r ♦ ♦♥sr ♦♠♣①t② ♦r ♣②♦♥ts ♦♥ ② ❲trt t t tr♠ ♦rt♦♦ s♥ts ♥s tt r rt tr♦ s♣t♦♥ ♥ts s♦♣♣♦s t♦ ♣r♦s r t rst ♦ ♣t♦♥s r♦r t t ♥t♥t ♦ r♦♥strt♥ ♣②♦♥② ♦ s♣s t ♥tr♣rtt♦♥ ♦ trs s ♦♥ ♦rt♦♦♦s ♥s s♦ str♦rr tt ♦♠♥ t ♦ ♥ ♣r♦s tr ♥ tr♥sr tr♥ss♣ ♣♦②♠♦r♣s♠ P♥ ♣②♦♥t rtts ♦① ♠s t ♣r♦ss ♦ r♦♥strt♥ t st♦r② ♦ s♣s t

♦♠♠♦♥ ♦♣♥♦♥ ♥ t ♦ ♣②♦♥♦♠s s tt t r s♦ ♥♥t tt ♦♥ ♥ ♥r♦s②t ♥t♦ t♠ t♦ t ♥ tst ♥ ♦♥ ♦ t rs♦♥s ② rr♥t ♣♣r♦s ♥ ②t♥sss t ♣r♦♥ ♦♠♣r♥s ♦ t ♦t♦♥ ♦ s ♣rs② s t② r s②

♥♦t ♦♠♣r♥s ❲♥ r♦♥strt♥ t tr ♦ tr♦ t ♦♠♥t♦♥ ♦ ♥ ♠s ♦♥ s②♦ss t♦s ♥s ♥ r♣rs♥tts ♥ ♠♦st s♣s ♥r st② ♥ s♦♥ ♥♦ ♦♦s ♥ ♦r♦♠♣①②♥ ♥ts ♥ tr st♦rs s s ♣t♦♥ ♦r ♦ ♥r ♣ ♣②♦♥t rt♦♥s♣s♦♥② ♥ ♦ ♥s r tr♦r s t t ♦♣ tt t ♣②♦♥t s♥ ♦r t s♣s tr ♣r ♦r ♦♠♥♥ t ♥ t ♣rs♥ ♦ ♣r♦② ♦r P ♥ ♣♦st② ♠s♥r♦♥ t ♥♥ t ♦s♥r r②♥ t♦ rst r♠♦ ♦♥t ♠♦♥ ♦♠♥ tstsrtr rss t ♥♠r ♦ ♥s ♥r st② ♠♥ t rst♥ tr ♣♣r s ♥ ♥♦t ♣tr♦ t st♦r② ♦

t t ♦tr ①tr♠♠ ①st ♥ r♣rt♦rs ♥ s ♦r ♣②♦♥t ♥r♥ t tt ♣r ♦ ♥♦r♥ t ♣②♦♥t ♥♦r♠t♦♥ rr ② sq♥s t♦ ♦s ♦♥ t ♣rs♥ ♥ s♥♦ ♥s ♥ ♥♦♠s ♦r ♥♦t ♦♥sr♥ ♥ st♦rs s♠s ♥s♦♥ ♥ ♥♦ s♦ tt ♦tt ♣r♦ts ♦ ♥ tr♥srs ♥ ♦sss ♥ t ♠♦s ♦ ♦t♦♥ ♣♣ ♦r s r♦♥strt♦♥sr s② ♦r② s♠♣ t ♥s ♥ t s♠ ♣r♦t② ♦ ♥ qr ♦r ♦st ♦♥ t ♥trtr t♦ ♦tr rr ♥♦♠ ♥s ♦s t ♦♥ ♥ s ♦r ♣②♦♥t♥r♥ ♦ ♣ ♣②♦♥s ♠♦st ♦ t♠ r ①♣t t♦ st s s♥st t♦ ♥ ♣r♦② ♥tr ♥ tr♥sr

♠♣t ♦ ♠♦st ♣r♦sss sr ♥ ♦① s ♦♥② ①♣t t♦ ♥rs t ♠♦r t ♦ t ♥ ♦ ♥ ♦♠ t♦ ♠t tt ♥ s♣t ♦ t ♦ ♥♦♠ sq♥s tr s ♥♦ ♥ ♥r ♣rt tst ♦ ♦ tr ♥ tr♥sr ♥♦♠♣t ♥ s♦rt♥ ♥ ♦r ♣♣r♥t ♣r♦s♦♥r♥t ♥ ♦sss s②st♠t② s ♦r rt ♦t♦♥r② rts t ♥sr s ♣r♦②①♣♦t t ♦t♦♥r② s♥♥ ♦ ts ♥ts ♣r♦♣r ♠♦♥ ♥ ♥♦t ♦♥② ♠♣r♦ ♣②♦♥tr♦♥strt♦♥ t s♦ r♥ rtr ♥st ♥t♦ t ♦t♦♥ ♦ tr ♥ tr♥sr ♠② ♦r ♥st♥♣r♦ str♦♥ s♣♣♦rt ♦r t ♠♦♥♦♣②② ♦ s♦♠ r♦♣s ♦ s♣s ♥ ♥ ♥♦r♠t♦♥ ♦t trt t♠♥ ♦ rst♦♥ ♥ ♦♦ ♥ts ♣t♦♥s ♥ ♦sss ♥ t ♠rs♦ ♥♦♠ ②♥♠s ♥ s t♦ ttr ♥rst♥ t rt♦♥s♣s t♥ ♥♦♠s strtr ♥s♣s rst♦♥ ♦r ♦♦② ♥♦♠♣t ♥ s♦rt♥ ♦rs ♣r♦s② ♥♦rs♥ ♦♣♣♦rt♥ts t♦st♠t ♥str ♣♦♣t♦♥ s③s ♥ r♥ t♠s ♥trst♥② ts tr s♦rts ♦ ♣♥♦♠♥ ♥ ♠♦ s♠r②

Phylogenetics

r♦♥strt♦♥ ♦ ♥ st♦rs trt♦♥② rs s♦② ♦♥ ♥ ♥♠♥t ♥ ♠♦ ♦ ♥♦t♦r ♠♥♦ ssttt♦♥ r r ♦r t♦♥♥ ♦♥str♥ts tt ♥ ♥♦r t♦ ♥ trr♥ t ♣r♦ss ♦ r♦♥strt♦♥ t ♠♦st ♦♦s ♦♦ ♥♦ tt ♥ s t♦ ♠♣r♦ ♥tr r♦♥strt♦♥ s t t tt r② ♥ ♦s t♥ t ♥s ♦ s♣s ♣②♦♥② ♦r♥t♦ ts ♥ tr s ♦r♠t♦♥ ♦ t s♣s tr tr♦ t ♣rs♠ ♦ t ♦t♦♥r② ♥tssr ♦ r t♦ ♦rrt② ♠♦ t r♥t ♣r♦sss tt ♠ ♥s ♥s♣s tr r ♥ sr ♥s tt ♦ ♥r ts s♠ ♦♥str♥t ♥ trt t♦trt♥ ♥ ♥ s♣s trs ♥ sr s♠t♥♦s② s♦ rst ♥ ttr trs s s♥rs ♥♦ ♦ t ♣r♦sss s♦♥ r t

st♠t♥ ♥s ♥ s♣s st♦r② ♥ ts tr♦ rr strtr ♦♥ t♦♣ ♦ s♣s tr s ♥rr r♦♠ ♥ trs tr♦ ♠♦s ♦ ♥ ♠② ♦t♦♥ t♠ss ♥rrr♦♠ sq♥ ♥♠♥ts tr♦ ♠♦s ♦ sq♥ ♦t♦♥ rt♦♥s♣ t♥ ts♣s tr ♥ ♥ trs s t♦②s t s♣s tr ♥s ♣r♦t② strt♦♥ ♦r ♥ trss♦♠ ♥ trs r ♠♦r ② t♥ ♦trs ♥ ♣rtr s♣s tr ♥ ♥ rtr♥ t ♥rrstrt♦♥ ♦ ♥ trs ♥♦r♠s ♦t t s♣s tr ♦♥ t② r ♥rt

♦♥sr♥ tt ♥s ♦ ♥ t ♦♥t①t ♦ s♣s tr s♦ rst ♥ ttr ♥ trs ♥tr r sr rs♦♥s ② r♦♥strt ♥ tr ♠② r r♦♠ t tr ♥ tr ♥ sq♥s ♠② ♥r♦♥ t♦♦ ♠♥② ssttt♦♥s ♣♦ss② ♥ t♦ ♦♥ r♥ ttrt♦♥ sq♥ ♦♠♣♦st♦♥s♠② ② r r♦♠ ♦♥ ♥ t♦ t ♦tr ♦♠♣♦st♦♥ tr♦♥t② ♦r ♥ sq♥s ♠② s♦♦♥str♥ ♦r s♦ s♠ tt tr s s♠♣② ♥♦t ♥♦ s♥ ♥ ts t♦ r♦♥strt tr st♦r② ♥ sss ss ♣②♦♥t ♠t♦s ♦t♣t ♥ trs tt ♠② r② r♥t r♦♠ t tr trs♥t♥ s♣s tr ♥t♦ t ♣r♦ss ♦ ♥r♥ ♥ ♦♥tr♥ r♦♥strt♦♥ rtts ♥ s♦ts rst ♥ ttr ♥ trs

r ♥ trs r ♦r♠ rt♦♥s ♦ t s♣s tr tt ♦♥str♥ tr ♦t♦♥ rtr♥ç♦s ♦ss

r ♥tr ♣r♦sss ♠② ♣r♦ ♥ trs r♥t r♦♠ s♣s trs tr♥ss♣ ♣♦②♠♦r♣s♠s♣t♦♥s ♥ tr ♥ r♥srs ♦① ♦♥sq♥t② tr r♥t ♠♦s ♦ ♥ ♠②♦t♦♥ ♥ s t♦ ♠♦ ♥ ♠② ♦t♦♥ ♥ t ♥①t ♣rr♣s ♣rs♥t ♠♦s♥ ♦rt♠s tt ♥ ♦♣ t♦ ♦♥t ♦r t ♣r♦sss ♦ ♥ ♠② ♦t♦♥

Population and species histories

t ♠t ♥♦t ♦♦s ♣r♦r ② ♣r♦sss t♥ t t ♣♦♣t♦♥ ♥♥ s♣s ♣②♦♥s♥t ♦r ♥♥ t ♦s♥r ♦r s♦ tt ♥ s♦♠ ♦♥t♦♥s ♦ ♣♦♣t♦♥ s③s♣♦♣t♦♥ strtrs ♥ r♥ t♠s ♠♦st ♥ trs r r♦♠ t s♣s tr ♥ ♦♥t♥t♥ts ♥s ♦♥r t♦ ♥ ♥♦rrt st♠t ♦ t s♣s tr t♦ t ♠♣t ♦ ♣♦♣t♦♥♥ts ♣r♦sss ♦♥ ♥ tr t♦♣♦♦s ♠② ♥♦t ②s s♦ sr ♦s♥t t♦r② ♥♠♥ ♣rts tt ♥r ♥tr ♦t♦♥ ♣r♦♣♦rt♦♥ ♦ ♥ trs r r♦♠ t s♣s tr tr♦tr♥ss♣ ♣♦②♠♦r♣s♠s P ♦r ♣rs② t ♥♠r ♦ ♥rt♦♥s s♣rt♥t♦ s♣t♦♥s s ♥♦t r② r ♦♠♣r t♦ t ♣♦♣t♦♥ s③ ♦sr t♥ ts t♦ s♣t♦♥st ♦♠s ② tt t ♦s♥ ♦ ♥s ♣rs♥t ♥ t♦ s♣s s ♠♦r ♥♥t t♥ t ♣r♦ss♣t♦♥s rsts ♥ ♥ tr r♥t r♦♠ t s♣s tr ♠♦♥t ♥ t②♣s ♦ ♥ tr s♣s tr ♥♦♥r♥s ts ♥♦r♠ ♦t r♥ t♠s ♥ ♥str ♣♦♣t♦♥ s③s ♦s ♦♣♦♣t♦♥ ♥ts ♠♣r♦ t r♦♥strt♦♥ ♦ s♣s tr ♥ ♠② ♥ t ♦♥② ② t♦ t ♦rrt s♣s tr ♥ s♦♠ ss ♥♥ t ♦s♥r t♦ t ♥♥ ♦s♥r t ♦
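The coalescent argument in this paragraph (gene trees disagree with the species tree when the interval between two speciations is short relative to the population size) has a standard closed form in the simplest case. The sketch below is only an illustration under stated assumptions that are not in the original text: three species, one gene copy sampled per species, a neutral diploid population of constant effective size `effective_size` along the internal branch, and a branch length given in generations. The function name is hypothetical.

```python
import math


def discordance_probability(generations: float, effective_size: float) -> float:
    """Probability that a 3-species gene tree differs from the species tree.

    Assumes neutrality, one sampled lineage per species, and a diploid
    population of constant effective size on the internal branch.
    """
    if generations < 0 or effective_size <= 0:
        raise ValueError("generations must be >= 0 and effective_size > 0")
    # Internal branch length in coalescent units (2N generations, diploid).
    t = generations / (2.0 * effective_size)
    # exp(-t): the two sister lineages fail to coalesce on the internal branch.
    # Then three lineages coalesce in random order in the ancestral population,
    # and 2 of the 3 equally likely topologies disagree with the species tree.
    return (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    for gens in (1_000, 10_000, 100_000):
        p = discordance_probability(gens, effective_size=10_000)
        print(f"{gens:>7} generations: P(discordant gene tree) = {p:.3f}")
```

With a vanishing internal branch the probability tends to 2/3, so the species-tree topology is then no more frequent than either alternative; with a long branch relative to the population size it tends to 0.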

♦s ♦ P s ♦♥ t ♦s♥t r♠♦r ♥♠♥ ♥ ♣r♦♣♦s sr t♠sr t s♥st♥ ♥♥ t ❨♥ s♦♥ t ♥♦s t Pr rst♥st ♥♦s rst ♠♦s ss♠ ♥♦♥ s♣s ♣②♦♥② ♥ st♠t r♥ t♠s ♦♥

r P②♦♥t r♥ss t t♦ ♣ts r♦♠ sq♥s t♦ s♣s tr ♥ t ♥r ♣t ttrt♦♥ ② ♦ ♥rr♥ s♣s ♣②♦♥s st ♦ t ♣②♦♥t ♥r♥ s ss♥t② ♥♣♥♥t r♦♠ t st♣s ♣ ♥ ♦♥str♠ ♥ t♦♥ sq♥ ♥♠♥ts t♦ ♣ss r♥t trs ♥♦rr t♦ ♠ ♥ trs r② ♥rst♥ s s♣s trs s♥ ♦ ♣ts ♥ ♦♥trstt r ♣t ♠♦s t ♣♥♥② t♥ st♣ ♥ r ♦ ♦♠♣①t② s♥ ♥♦ r♦♠r♥t s ♦ ♦♦② r ♣ss t st s ♥♦t ①st ♥♠♥ts ♥ sttst② st♠t s♠t♥♦s② t ♥ trs s♥ ♠♦s ♦ sq♥ ♦t♦♥ tt ♥♦r♣♦rt ♥srt♦♥t♦♥ ♥ts♥ ♠♦s ♦ ♥ ♠② ♦t♦♥ ♥♦r♣♦rt♥ ♣t♦♥ ♥♦r ♥♦♠♣t ♥ s♦rt♥ s♣② t ♣♥♥② t♥ ♥ trs ♥ s♣s tr ♦ ②s rr♦s r♣rs♥t ts ♣♥♥s♥ ♦♥t♥♦s rr♦s r♣rs♥t ♥ tr ♥ s♣s tr srs ♣♥♥② t♥ ♥ ♠②♥♥♦tt♦♥ ♥♠♥t ♥ ♣②♦♥ts s ♥♦t ♥ ②t ①♣♦r t ♦ t♦rt② ♠♦ st①t ♦r sss♦♥ s♠t r♣rs♥tt♦♥ ♦ t s②♥r♦♥♦s sr ♦r s♣s trs ♥ trs♥ ♥♠♥ts t ssts ♥ ♦♦s ttr ♦r ♣r③♥ ts sr

t ♥str ♣♦♣t♦♥ s③s r t s♥st♥ ♥♥ t ❨♥ s♦♥ t ♥♦s ♠♦r r♥t ♦♥s ♥ s♦ st♠t t s♣s ♣②♦♥② t Pr rst♥s t ♥♦s ❯♣ t♦ ♥♦ ♥♦ ♣♣r♦ s ♥rr ♦t ♥ trs ♥ s♣s trs s♠t♥♦s② t♦ ts♦ t ♥tr t ♦st② ♣♣r♦ t♦ t ♦r ♥st♥ rst♥s t ♥♦s rst st♠t♥ trs ♥ t ♠①♠♠ ♦♦ r♠♦r r♦♠ sq♥ ♥♠♥ts t② t♥ s ts trs t♦♦♠♣t t ♦♦s ♦ ♥t s♣s tr t♦♣♦♦s ♦r♥ t♦ t ♦s♥t ♠♦ ts♣s trs r t♥ ♦♠♣r s♥ ♦♦ rt♦ tsts ♠t♦♥s s♦ tt ts ♣♣r♦ ♦♥sr② ♦t♣r♦r♠ ♦♥t♥t♦♥ s♣s trs r ♦rrt② r♦♥strt s♥ t ♦s♥ts♣♣r♦ ♥ ♦r r② r♥t ♥ ♦s s♣t♦♥s rs ♦♥t♥t♦♥ trs r ♠♦st ♦t♥ r♥tr♦♠ t tr s♣s tr ♥ t s♠ ♦♥t♦♥s ♥st ♦ ♠①♠♠ ♦♦ t Pr s

②s♥ r♠♦r tr♦ r♦ ♥ ♦♥t r♦ s♠♣♥ ② ♥t st♣rtr t♦rs t ♦♥t r♦♥trt♦♥ ♦ ♥ trs ♥ s♣s trs ② r♦♥③♥ tt s♣s tr ♥♥str ♣♦♣t♦♥ s③s ♥ st♠t s ♣♦♥ ♥ trs t② s♦ ♥♥ ♥ trs ♥ rtr♥r ♣♣r♦ s ♦♠♣♦s ♦ tr st♣s rst ♥ trs r r♦♥strt s♥ r②s s♥t ♦♥qst ♣♣r♦①♠t② ♦♥t♥ ♦r ♥ ♥♥♦♥ s♣s tr s♦♥ s♣s tr s ♥rrs ♦♥ t strt♦♥ ♦ ♥ trs ♦t♥ ♣r♦s② tr t ♣♣r♦①♠t♦♥ ♠ r♥ t rstst♣ s ♦rrt ♥ t ♥ ♦r sq♥ ♥♠♥t strt♦♥ ♦ ♥ trs s ♦t♥ s s strt♦♥s ♦ s♣s trs ♥ ♥str ♣♦♣t♦♥ s③s s ♦r t ♦r♠r② sr ♣♣r♦ ts♠t♦ s ♦♥ t♦ ♠ ttr t t♦ t t t♥ ♥ ♦♥t♥t♦♥ ♠♦r♦r ts ②s♥♣♣r♦ s s♣r♦r t♦ t ♠①♠♠ ♦♦ ♦♥ ♥ sr rs♣ts s t ♥ ♥②s rr ♥♠r♦ s♣s ♦s r♥t ♥str ♣♦♣t♦♥ s③s ♦r r♥t ♥♦s ♦ t s♣s tr ♥ ♥ st♠t ♣r♠trs ♦ t ♠♦

r ♦♥♦♥♥ t♦rs r♦♠ ♣♦♣t♦♥ ♥ts ♦r ♠② ♥ t♦ ♠♦ t♦ t ♦rrtst♠ts ♦ s♣s trs ♥ trs r♥ t♠s ♥ ♥str ♣♦♣t♦♥ s③s ♦t② st♦♥ t t ♥ trs ♥ ② tt ♥ ♠s♥tr♣rt ♥ tr♠s ♦ ♣♦♣t♦♥ s③ ♥♥ st♦♥ ♦r♥st♥ ♦r Ps ♥ ♠♥♥r rt② ♥♣♥♥t r♦♠ ♣♦♣t♦♥ s③ ♥ ♠♠ t ♦t♦♥ ♦ ♥tr ♥ ♥ r ♣♦♣t♦♥ str♦♥ ♣r②♥ st♦♥ ♦♥ t ♦tr ♥ ♠rr♦r trs ♦t♥♦♥ ♥tr ♦ ♥ s♠ ♣♦♣t♦♥s ♠r② ♠rt♦♥ s♦ ♥tr♦ s♦r♥ ♥ trs ♥tr♦r s ♦r ♠♦r ♦♠♣① ♠♦s ♥♥♥ t ❲t♥

Duplications and losses

♦♠♥ t♦♥ ♦ ♥ ♣t♦♥ ♥ ♦ss ♦♥sr② ♦♠♣①s ♥ trs ♥ ♥ ♠② t ♦♥② ♦♥ r♣rs♥tt ♣r s♣s ♠② r♦r ♥ts ♦ ♣t♦♥ ♥ ♦ss ②♥ ♥ tr r♥t r♦♠ t s♣s tr ♦s ♦ ♥ ♠② ♦t♦♥ t♥ ♥t♦ ♦♥t ♣t♦♥s♥ ♦sss ♥ ♣r♦♣♦s ② rst t ♥ ♥ ♦t ss t ♦t♦♥♦ ♥ ♠② s ♠♦ ② rtt ♣r♦ss r♥♥♥ ♦♥ s♣s tr rt ♦rrs♣♦♥s t♦♥ ♣t♦♥ ♥ t t♦ ♥ ♦ss ♥ tr ♦r♥ ♠♦ ♦t ♥ trs ♥ s♣s trsr ① rst ♥ ♦♦rrs rst t tr ♦♠♥ tr ♠♦ ♦ ♥ ♠②♦t♦♥ t ♠♦ ♦ sq♥ ♦t♦♥ s♦ tt ♥ s♣s tr ♦♦s ♦ ♥ trs ♥ ♦♠♣t ♠♦ s ♥ ♠♣♠♥t ♥ ♣r♦r♠ tt ♥ st♠t ♥ trs ♥ ♥♦rt♦♦②♣r♦② ♣r♦ts ♥ s♣s tr tr♦ ②s♥ ♥trt♦♥ st♠t♦♥♦ s♣s tr ♦r s ♦♥② t t♦rt② ♦r ♦♠♣tt♦♥ rs♦♥s t ♣r♦r♠ tr♦r♥s sr♥♣t s♣s ♣②♦♥② t ♥ ♦♠♣t ♥ trs tt r ♦♦② ♠♦r ♠♥♥ t♥ t② ♥ ♥rr s ♦♥ sq♥ ♥♠♥t ♦♥ ♥ t♦♥ ts ♠♦ ♣r♦s ♣♦str♦r♣r♦ts ♦ ♦rt♦♦② ♥ ♣r♦② ♦r ♣r ♦ ♥s ♦ rtr s s ♥ ♦r♥t♦♥ ♣rt♦♥ ♦r t♦ t ts ♠♦s s s♥ ♣t♦♥ ♣r♦t② ♥ s♥ ♦ss♣r♦t② ♦r r♥s ♦ tr ♥ t♦ t s ♥♦♥ tt r♥t ♥s ♥r♦ r♥t rts♦ ♣t♦♥ ♥ ♦ss tr ♠♦s s♦ ♦♣ t ts ♥♦♠♦♥t② ♦ t ♦t♦♥r② ♣r♦ss t♦♣r♦♣r② ♣t ♥ ♠② ♦t♦♥ ♦r♦r r②♥ ♦♥ ♥♦♥ s♣s tr ♦r ♥ rt ♥trs s rt♥② ♦♣t♠st ♥ sr ss st s rst t ♣r♦r♠ ♥trts ♦r s♥r♦s♦ ♣t♦♥s ♥ ♦sss t rs♣t t♦ ♥ s♣s tr ttr sttst st♠t♦♥ ♦ ♥ tr ♥♦ ♦rt♦♦②♣r♦② ♣r♦ts s♦ ♦t♥ ② ♥trt♥ ♦r t strt♦♥ ♦ s♣s trs ♠♦ ♦ ♣r♦ s♣s tr t s♥ ♠♦r t♥ t ♥s tt ♣♣♥ t♦ s♥♦♣②♥ ♠♦st ♥♦♠s ♥ ♠② ♣ rs♦ s♦♠ t ♣②♦♥s ♥♥ t ♥ t t s♠ t♠r② t ②♥♠s ♦ ♥♦♠ ①♣♥s♦♥sr♥♥ ♦r t ♥tr tr

Reticulated tree

❲♥ ♠♦♥ ♥ s♦rt♥ ♥ ♣t♦♥ t s♠s rs♦♥ t♦ ♦♥sr tt tr ①sts ♥ ♥r②♥ s♣s tr tr ♣t♥ t st♦r② ♦ rt ♥rt♥ ♥ t ♥♦♠ ♥ t s ♦tr ♥ r♥sr ♦r t ss ♦ tr t ♦♥♣t ♦ s♣s tr ♣♣s s ♥ sss

t ♥t ♦♦tt ♠♥ t r♥ t ♥ t rt t ♦② t s♠s r tt t ♣②♦♥t ♥♦r♠t♦♥ ♦ ♥ trs s strtr ♥ ② tt ssts tst s♦♠ rt s♥ ♥ t st♦r② ♦ t t t st ♦s♥t ♣♣r t♦ stt rr♥t♦♥s♦♥ ♦♥ ts t♦♣ ♠t ♦♥② t♦ ♥♣♣r♦♣rt ♠t♦s ♥ ♠♦st ♥ ♠s t♠ r♥ tr st♦r② t♦ tr♥srr ♠♦♥ st♥t ♦r♥s♠s ♥ ♦♥ ♦ ♥ ♠② ♦♦ ♥② ♥ ♦ tr♥sr t♦② rtr s♠♣♥ ♦ ♥t rst② ♦ rt♥② r♥ s♦♠ ♦r♦s t ♠♥ tt tr s ♥♦ rt s♥ t♦ ♦♥ ♥ ♥ trs ♥ ♣♣r♦♣rt ♠♦♥ ♦ ♦ ♣r♦② ♣ t♦ r② t ss

r r rst st ♥ t ♦rt♦♥ ♦ s ♠♦ ♥ s ♠♦ s♣s tr♣r♦s strt♦♥ ♦ ♥ trs tr♦ t♦♣♦♦ rrr♥♠♥ts ♠♠♥ ♥ ♥ trs rt t rs♣t t♦ ♥ ♥♠♥ts ♦ ♠♦ s ♠♣♠♥t ♥ t ②s♥ r♠♦rt s♠♣♥ ♥②ss ♣r♦ rtr ♥ ♦r t ♦♠♣①t② ②♣♦tss tt ♥s r ♣rt ♦ r ♣r♦t♥ ♦♠♣①s r ss st t♦ ♦r t ♠♥s♦♥ ♦ t t♦♣♦♦s♣ tt ♥ r tr♦ s♠t♦♥ r♦♠ s♥ s♣s tr ♠t t ♣♣t② ♦ t♠t♦ t♦ s① s♣s tr ♣♣r♦s s♥ st ♦rt♠s r♦♥t♥ ♥ ♥ s♣s trs ♥r ♠♦ ♠② ♣ t ts ♣r♦♠ r♦rr② t tt t rst ♦ ♣r♦r♠ ♠♣♠♥t♥ s ♠♦ ♦ s♣s tr ♦rt t rq♥s ♥ ♦ts ♣r♠t t♦ ♦r♠② tst t tr ♦ ②♣♦tss r t rq♥s ♦ tr♥srs ♦♥ ♦♥ r♥ s tt ♥♦ tr s♦ ♣rrr ♦r t ♦tr ♣♦ssts r tr tr t♦♣♦♦stt r ♦♥ s♥♥t② ♠♦r ♦t♥ t♥ ♦trs ♥ rt ts t♦♣♦♦s t♦ ♥♥t ♥ts ♦♥♦s②♠♦ss ♦r ♦♦ sts r s♣s sr♥ s♠r ♦♦ ♥s ①♥♥ r ♥♠rs ♦♥s r r♥t t②♣s ♦ ♥s r♥t② tr♥srr s qst♦♥s r t ♦t ♦ ♥ ♠♣rss♠♦♥t ♦ ♦r ♦ t t ❩①②② t ♦ t ♠ ♥t rt♥ tt ♦ str♦♥② ♥t r♦♠ ♣r♦♣r sttst r♠♦r tt ♦s ♥♦t t ♥trs s t t s sttst st♠ts r♦♠ ♥ sq♥s t♠ss ♥ t♦♥ ♥ tr♥sr ♥ts♦♥sttt ♥♦r♠t rtrs ♦r ♣②♦♥t r♦♥strt♦♥ ♥ t ♦rt♥ ♥ ♣r♦rt ts ♦r ♥♦s ♦ s♣s tr s♥♥t r♦♠ ♥♦ s ♥ t♦ ♥ ♥st♦r ♦ ♥♦ ts ♠♥s tt ♥♦ s ♠♦r r♥t t♥ ♥♦ ♦♥sr♥ t ♠♠♥s t② ♦ t♥ ♥♦s♥ t ♣r♦r②♦t tr ♦ r ♦sss r s♥t ♥ t st t t♦ ♦♠♣r t♦ ①t♥t s♣s s rt t♥ ♦ rt♥② ②

Computational challenges

♦sr ♠♦s ♥♦t ♥r t t s♠ t♠ s♣s tr ♥ trs ♥ ♣r♠trs ♦ t♠♦ ♦ ♥ ♠② ♦t♦♥ ♦r s♦ ♦r r② ♠♦st ♥♠r ♦ s♣s r ♥ t trr strtr ♥♥ ♠♦s ♦ ♥ ♠② ♦t♦♥ r♣rs♥ts ♥ ♥trst♥ ♥♦r ♦rt♠s r♥ ♦r ♥ tr s ♦♥ ♥ ♥♠♥t s r② ♥ ♥t♠t♥ ts s t♥♠r ♦ t♦♣♦♦s ♥rss ♠♦r t♥ t♦r② t t ♥♠r ♦ s sr♥ ♦r ♦t s♣str ♥ sr ♥ trs t t s♠ t♠ s ♥ ♠♦r t ♦r s♣s tr ♦♥ ♥s t♦st♠ts♠♣ ♦rrs♣♦♥♥ ♥ trs s ♦♠♣tt♦♥ s♣ ①♣♥s ② ♥♦ ♣r♦r♠ s ②t ♥s tt ♦ ♥t② ♥r ♦t ♥ ♥ s♣s trs s♠t♥♦s② r s ♦r ♥ ♦♦s② t♦ ♣rs ♦♠♣tt♦♥s tr♦ ♥ rttr s ♦♥ srr ♥ sr ♥ts srr ♥♦♦ sr♥ ♦r s♣s tr ♥t ♥♦s ♦ ♦r s♣s tr sr ♦rrs♣♦♥♥♥ trs ♣rst♦♥ ♦ ♥ssr② t♦ ♦♠♣t trs s ♥♦t ♦♥ ♥s ♦r s♣s t ♦♥ ♦ ♥♦♠s ♠② rqr ♥rs ♦r t♦s♥s ♦ ♦♠♣trs r♥♥♥ ♦r sr②s

♠♦st ♦♠♣t ♠♦ ♦ ♥ ♠② ♦t♦♥ ♦ ♦♥t ♦r ♥ tr♥srs ♣t♦♥s ♦sss♥ sr♣♥s ♥ ♥ s♦rt♥ t ♠② t t♦ s ♦♥ ♦r♣r♠tr③t♦♥ ♦① ♦r ♣r♦rss ♥ ♦rt♠s ♠② r♥r s ♠♦s ♦♠♣tt♦♥② trt tt t ♥ t ♥ ♦ ♥r♥ ♦ t rt ♦♥trt♦♥s ♦ ts ♥ts t♦ t r ♠♦♥ts♦ ♥♦♥r♥s ♦sr ♥ Pr♦r②♦ts ♦r ♥st♥ ♦ ♦♦② rst s ♠♦s s♦ss♦t r♥t ♣r♦ts ♦ ♥ ♣t♦♥ ♦ss ♥ tr♥sr t♦ r♥t r♥s ♦ t s♣s tr ♠♦♥ tr t②♣s ♦ ♥ts ♦r ♥ r♥ tr♥s ♦t t♦ t♦♦ t ♥ ♦♥♦♠ tr♥t

♦ ♦ r♥t ♥s ♦ ♥ts ♦r r♥t ♣rts ♦ t tr ♦r ♠♦r ♥t② t♦♠t② t♠♦s t♦ r♦♥s ♦ t tr

♥ ♠♦ ♥♦r♣♦rt♥ tr♥ss♣ ♣♦②♠♦r♣s♠s ♥ ♣t♦♥♦ss ♠② ♦rs♠♣st t ♦s ♥♦t ♦♥t ♦r ♣♥♥s t♥ ♥s ♦r ♥st♥ t♦ ♥♦r♥ ♥sr♦♠ tr ♦♣r♦♥ ♦r r♦♠ r②♦t r♦♠♦s♦♠ r ♠♦r ② t♦ sr s♠r st♦r② t♥♥s r♦♠ r♥t r♦♥s ♦ t ♥♦♠ ♦♦t♦♥ ♠② s♦ t ♥s tt ♥trt s ♣rt ♦ tr♥t♦♥ rr t P é♠♦♥ t ❲♦ r② t ♦♥t♥ ♦r ts ♦ s♣t♣r♦①♠t② ♦♥ ♥ st♦rs ♥ tr♦ ♥ r♦ ♦s s r♥t② s② ♦♦t t t♦ ♥r r♦♠♥t♦♥ ♦ts♣♦ts s♠t♥♦s② t ♥str ♣♦♣t♦♥ s③s ♥r♥ t♠s ♥♦tr ♠♦r ♥r ♣♣r♦ t♦ ♠♦ ♦♦t♦♥ ♠② s ♥t♦tr ♠♣s ♥ét t qt ♠♦s ♦ ♣♥♥② ♥t♦tr ♠♣s r ♦ts ss♦t ♥s tstrt♦♥s ♦ trs t♦ ♥s tt ♦♦ ♦r ♣rt ♦ tr st♦r② s♦ ♣rt② s♠rtr t♦♣♦♦s ♦ s♠r tr t♦♣♦♦s ♣♥s ♦♥ t t②♣ ♦ ♦♦t♦♥ ♥ ts ♥ ♥t ♥t♦ sttst ♠♦ tr♦ ♣r♦r ♣r♦ts ♦r ♥st♥ t♦ ♥trt♥ ♥s r ♣r♦r

♠♦r ② t♦ sr tr t♦♣♦♦s t♥ ♥s tt r ♣rt ♦ t♦ t♦t② r♥t ♣t②sr ♦♥t♥ ♦r t ♦t♦♥r② ♣r♦sss t♥ t t ♥ ♠② s ♠♥t♦r② ♦♥

♥ts t♦ ♦ s ♥ st♠t♥ s♣s tr ♥ ♥ trs t♦♥② ♣r♦♣r ♠♦s ♦ ♥ ♠②♦t♦♥ ♦ ♣r♦ ♠♦r ♥♦r♠t♦♥ t♥ t ♠r ♣ttr♥ ♦ s♣s rst♦♥ s s r♥t♠s ♥ ♥str s♣s t ♣♦♣t♦♥ s③s rt② r♦♥strt♥ ♥ trs ♦ ♥ ♦r♥ ♣♦ssts t♦ st② t ♦t♦♥ ♦ ♥♦♠ ♦♥t♥ts ♣ t♦ ♥♦ ♥♦♠ ♦♥t♥t r♦♥strt♦♥s ♦♥②r ♦♥ ♦♥ts ♦ ♥s ♥ t ♦ss t ♦ t ♦♥ ❯s♥ ♥ trs♥st ♦ ♣r♦② rt② ♠♣r♦ ♥r♥s

Alignment layer

♦ ♠ttr ♦ s♦♣stt ♠♦s ♦ ♥ ♠② ♦t♦♥ ♠② t t ♥♠♥ts ss♦t t♥ ♠s r ♥♦rrt s♦ ♣r♦② t rst♥ ♥ ♥ s♣s trs ss t①t♦♦r♣rs♥tt♦♥ ♦ ♣②♦♥t r♦♥strt♦♥ s tr st♣ ♣r♦ss rst ♦♠♦♦♦s ♥s r stt♥ ♥ t ♣②♦♥tst ♠② ♥tr♥ r ♥ stt② ♣♦s t ♥♠♥t ♥ ♥② trs t r♦♠ t ♥♠♥t s ♦r ♠♦♥ts t♦ ♦♥sr♥ sq♥ ♥♠♥t s trs t s ♥ st♠t ♠♦st ♥♠♥t ♣r♦r♠s s rsts t♦ ♣ ♣ rtrs ♥t♦ sq♥s♥s s♦ s t♦ ♦r♥③ ♣tt② ♦♠♦♦♦s sts ♥t♦ ♦♠♥s ♦♣t♠t② ♦ ♣ ♣♠♥t sssss t rs♣t t♦ s♦r ♣♥③s ♣ ♥srt♦♥s ♣ ①t♥s♦♥s ♥ ssttt♦♥s ♥t ♥ t ♥♠♥t s t st st♠t ♦ t tr ♥♠♥t ♦r♥ t♦ rtrr② ♣♥ts ♠② ♥rst ♦r t t ♥r st② ♥ ♦r♥ t♦ ♣rtr rsts ♦♠♣s♦♥ t ♦tr♠ t r ö②t②♥♦ t ♦♠♥ ♥ t ♣♥ts r♣rt② t♥ t♦ t t ♠② ♥♦t ♥ t ♦♣t♠♠ ♥♠♥t st t ♦♣t♠♠ ♥♠♥t s ♦♥tr s ♥♦ r♥t tt t s t tr ♥♠♥t s s♦r♥ ♦♥srt♦♥s ♦♥ ♥ ♥♦♥ ♥t ♠tt♦♥s ♥r♥t t♦ r②♥ ♦♥ s♥ ♥♠♥t t♦ ♥r ♣②♦♥t tr r ♥♦ ♣t ♥♥ t rr ❲♦♥ t ♦r ♦♥② r♥t② ♣♦♣ tr t♦ s♦t ♣r♦♠ t sttst② s♦♥ ♣♣r♦ rst t ♣r♦st ♠♦ ♦ ♥srt♦♥ ♥ t♦♥♥ts ♦♠♥ t ss ssttt♦♥ ♠trs ♥ s♦♥ ② ♦♥t② st♠t♥ sq♥ ♥♠♥ts♥ ♣②♦♥t trs

rst ♣r♦st ♠♦ ♦r t ♠①♠♠ ♦♦ ♥♠♥t ♦ t♦ sq♥s s s ②s♦♣ t ♦♠♣s♦♥ ♠♦r rst ♠♦ s tr ♣r♦♣♦s ② ♦r♥ t t♦t ♦♥sr ♦♥② ♣♦♥t ♥srt♦♥s ♥ t♦♥s t♥ t♦ sq♥s ②r tr t s ♥r t♦♦♥t ♦r ♠t♣st ♥srt♦♥s ♥ t♦♥s ♦r♥ t ♠♦s ♦ sttst ♥♠♥t♥ s♥ s r②♥ ♦♥ ♥ r♦ ♦s ♦r ♦♥ t ♦s② rt tr♥srs r② t ♦♠s ♥ stts r ♠t ♥srt♦♥ ♥ t♦♥ ♦r r♥t② s ②s♥ ♠t♦s ♦♠ ♥rs♥② ♣♦♣r sr ♦rt♠s ♠♣♠♥t♥ ②s♥ ♦♥t s♠♣♥s ♦ ♠t♣♥ ♥♠♥ts ♥ ♥ trs ♥ ♣r♦♣♦s ts♦♥ ♥ ♦♠s t r♥♦ t③r ♥tr t ♥s t r ss♦t♥ ♣rs ♥♠♥ts s♦♥ s t♦ r♥ ♦ ♣②♦♥t tr ♣r♠ts t♦ s② ♦♠♣t t ♦♦ ♦ ♠t♣♥♠♥t ♦♠s t r♥♦ t ♥trt♥ ♦r t strt♦♥ ♦ ♣r♦ ♥♠♥ts ♥ trs s

r② ♦♠♣tt♦♥② ♥t♥s ♦r t ♣♣rs s t st ♣♣r♦ ♦r st♠t♥ ♣②♦♥t trs ♦♥t♥ ♦r t ♥rt♥t② ♥ ♥ ♥♠♥t

♦r♦r ②s♥ ♦♥t st♠t♦♥s ♦ sq♥ ♥♠♥ts ♥ ♣②♦♥t trs ♦r t ♣♦sst②t♦ ttr rtr③ t sq♥ ♦t♦♥r② ♣r♦ss s ♣r♦ts ♦ ♥srt♦♥s ♥ t♦♥s ♥ s♠t♥♦s② st♠t ♥ ♦♠♣r t ssttt♦♥ ♣r♦ts ♦♥trr② t♦ ♠♦st ♦♠♠♦♥② ss♦tr ♣s ♣r♦r♠s tt s♠t♥♦s② st♠t ♥♠♥ts ♥ ♣②♦♥s ♦ ♥♦t trt ♣s s♥♥♦♥ rtrs t ♥ s ♥srt♦♥t♦♥ s ♣②♦♥t② ♥♦r♠t ♥ts ♥ t s ♦ ❱♥ ❱ rss ts s ♥ s♦♥ t♦ ♠♣r♦ rs♦t♦♥ ♦ t ♣②♦♥t tr ♥s t r s t rt ♦ ♥srt♦♥t♦♥ s t♦ ♦r t♥ ssttt♦♥ rts tr ♥♦r♣♦rt♦♥♥t♦ ♣②♦♥t r♦♥strt♦♥ ♠② s♦ ♣ rs♦ ♥♥t r♥s ♦r♦r s t rtr♥♥♦t r♥srt ♦trs t s ♥♦t ♦♥sr ♦♠♦♦♦s ♥srt♦♥t♦♥ ♥ts ♥ ♠♣♦s rt♦♥ ♦♥ ♣②♦♥t tr ♥ tr♦r ♣♦♥t t♦ ts r♦♦t ♥ r♦♦t♥ ♣②♦♥t tr s ♥♦t♦r♦s②t s♥ t ❨♣ t ♣ ♥srt♦♥t♦♥ ♥ts ♦♥t♥ ♥♦r♠t♦♥ ♦rt①♣♦t♥ ♦r r♦♦t♥ ♥ trs t s♦ ♦r r♦♦t♥ s♣s trs ♥ ts rs♣t ♦r♥ ♦♥ ♦rrr ♠♦ tt ♦ ♦ r♦♠ sq♥s t♦ s♣s trs tr♦ ♥♠♥ts ♥ ♥ trs ♠② ♥♦t r♠ t♦♦ ♠ t ♦♠♣tt♦♥ ♥② ♦ ♥♠♥t s♠♣♥ ♥ t ♥②ss ♦ s♥♥♠♥t ♠② t ♥ ♣rt s tr s ♥♦t ♠ ♥♦r♠t♦♥ ♦♥t♥ ♥ ts sq♥s tr♠② tr♦r ♦t ♦ ♥rt♥t② ♦♥ t ♣②♦♥t tr s♦ tt ♥ ②s♥ stt♥ r ♥♠r♦ ♥ trs ♥ t♦ st ♥ ♦r ♦ ts trs r ♥♠r ♦ ♥♠♥ts ♥ t♦ s♠♣♦r sr ♥s r ♥ ♥ ♣r ♥ ♦ ♥t r♦♠ t ♥♦r♠t♦♥ ♦ ♦tr♥s tr♦ tr ♦♠♠♦♥ s♣s tr t strt♦♥ ♦ ♣r♦ ♥ trs ♦ ♥rr♦r ♥ ♥♦♥sq♥ t ♠t t ss t♠ t♦ ♦♥t② s♠♣ ♥♠♥ts ♥ trs

♥ t♦♥ t♦ ♣②♦♥t r♦♥strt♦♥ ♦tr sq♥s ♥r♥s ♥ ♥t r♦♠ r♥♦t t ♥♠♥t ♦r ♥st♥ t tt♦♥ ♦ sts ♥r ♣♦st st♦♥ s r♥t② ♥ s♦♥ t♦♣♥ ♣♦♥ t ♠t♦ ♦ ♥♠♥t s ❲♦♥ t ♥ t rt② ♦ ♣r♦t♥ strtr ♣rt♦♥ s ♥rs② ♦rrt t♦ ♥♠♥t ♠t② ós t s♥ s♥ ♥♠♥t t♦ ♣rt♣r♦t♥ strtr s tr♦r ② t♦ t♦ ♥♣r♦♣r ♥r♥s ♥ ♣♦rt♦♥s ♦ ♣r♦t♥s tt r ♥s② t♦♥ ♠r ♦♥s♦♥s ♥ r♥ ♦r ♣②♦♥t ♦♦t♣r♥t♥ t♥qs s ♠t♦s ♥tr♦♠ t ♦♠♣rs♦♥ ♦ sr ♥♦♠s t♦ tt ♣tt② ♥t♦♥ r♦♥s ♣♦rt♦♥s ♦ sq♥stt r ♠♦r ♦♥sr t♥ ①♣t ♥ tr♦r ♠st ♥r ♣r②♥ st♦♥ rt t r r②♥ ♦♥ s♥ ♥♠♥t ♦r ts t qt② ♦ ♥r♥s ♥ t♦ r♥t ♥♠♥t♦rt♠s r s t♦ ♥♥♦tt tr♥sr♣t♦♥ t♦r ♥♥ sts ♥ r♦s♦♣ ♥♦♠s t② r♣♦♥ ss t♥ tr t ♦♥sq♥t② t t s ♥ ♦rt♠ t♦ tts♦② ♦♥ r♦♥s r t ♥♠♥t s ♥♦t ① t ♥trt ♦r ♥ s♦ tt ts ♥trt♦♥ ♦ s♥♥t② ♠♣r♦ ♥♥ st tt♦♥ ♥ t ♥♥ sts r ♥♦t ♣rt② ♦♥sr

Recombination and homology

s t ts ♦ r♦♥strt♥ ♥ ♥ s♣s st♦rs s ♥♦t ♦♠♣t ♥♦ ♦♥sr♥ ♥♠s s t ♥s rs ♦ ♣②♦♥t r♦♥strt♦♥ s ♥♦rrt s♥ ♦ ♥t ♠trtr♦ t ♣r♦sss ♦ r♦♠♥t♦♥ ♥ ♥ s♦♥ rq♥t② ♣r♦s ♥s t ♠① ♣②♦♥ts♥s ♦r ♥ tr♦♦♦s ♣rts ♦♠♦♦♦s r♦♠♥t♦♥ t r♣♠♥t ♦ ♣rt ♦ sq♥ ② rt s♠♣② rst ♥ ♥ ♥♠♥ts t ♦♥tt♦r② ♣②♦♥t s♥ ♦r tr ♥t ♥s♦♥ ♣r ♠♣t ♥ t t rr st♣s ♦ ♥r♥ ♦ ♥ ♦♠♦♦② s ♦♥② ♣rts ♦♣r♦t♥ sq♥s ♥ ♦♥sr ♦♠♦♦♦s ♥ ♥② s t r♦♥strt♦♥ ♦ ♥ ♠② st♦r②s ♦♥ ts ♥tr ♥t ♠② t st ♣rt ♦r ♦♠♣t② r♦♥ t♦ t ♦♥② tr② rr♦♠♦♦♦s rtr s t ♥♦t ts ♥ts ♠② ♦♠♣rs rt② ♦♥ strs ♦ sq♥ ♥t ♦♥t♥ s♥s ♥ ♥t

♥② ♣♣r♦s ♥ ♦t t♦ ♥t②♥ ♥ts ♦ ♦♠♦♦♦s r♦♠♥t♦♥ ♥ ♠t♣ ♥♥♠♥ts ♥ r♥t ♠♦s ♥ s♠t♥♦s② sr ♦r s♠♥t ♦♥rs ♥ st♦rs ♥ ♥ ♥♠♥t♥♥ t ③rs t s♠r P♦♥ t t♦ ts st♣ s ♥♦t ♥♥rt♥ ②t s♣s tr r♦♥strt♦♥ ♠♦ s♥ ♠t♣ ♥s ♦ s ♥tr♦s

ts ♠♦s ♦ ♥ r♦♠♥t♦♥ ♥t♦ t ♠♦ sr ♦♥ s♦♥ ♥ ♦♠♥ s♥ ♣r♦② ♠♦r ♦♠♣t t♦ ♠♦ s t② ♥ ♠♣t ♦♥

t ♣r♠r② ②♣♦tss ♦ ♦♠♦♦② t ttrt♦♥ ♦ ♣r♦t♥ t♦ ♠② Pr♦t♥ ♦♠♦♦② s t②♣② ♥rr r♦♠ tr ♦r s♠rt② ♥ sr ♣ tss ♣r♦♣♦s t♦♠t② r♦♥strt♥ ♠s s ♦♥ ts rtr♦♥ t♦ t str♥ ♠t♦ s t♦ r♦♣ ♣r♦t♥s ♠② r②♦ s♠rts r s② s♠ss ♥ ♣r♦t♥ sq♥s sr♥ ♦♠♦♦♦s s♠♥ts ♥ t②♣②♣ ♥ r♥t tr♦♦♦s ♠s s♥♥ ♦ ts ♣r♦t♥ ♠♦rt② ♥ s ♦ ♦♠♦♦② ♣r♦♠ ♥ r♦♠ t t tt tr ♠② 19% ♦ r②♦t①♦♥s tt ♥r♦♥ r♦♠♥t♦♥ t ♥♦♥♦♠♦♦♦s ♣♦rt♦♥ ♦ t ♥♦♠ ♦♥ t ♥trst♥② ♦♠♣r t♦ ♥tr ♣r♦t♥s ♦♠♥s s♦ ♣r♦ ♥♦r♠t♦♥ ♦♥ ♣r ♣②♦♥t rt♦♥s♣s ♥ ② ♦ ♥ t ts ss ♦ t♦ ♦♣ t ♣r♦sss ♦ ♦♠♦♦② ss♥♠♥tsq♥ ♥♠♥t ♥ ♣②♦♥t r♦♥strt♦♥ ♥t♦ ♠♦ t♦ r♦♥strt ♥ ♦♠♥ trs tr♥t s ♦ ♦♠♦♦②

Genome alignment

♦♥strt♥ t ♦t♦♥ ♦ ♥♦♠s s ♥♦t ♦♥② r♦♥strt♥ t st♦r② ♦ tr ♥s ♥ts ss ♥rs♦♥s t♥♠ ♣t♦♥s r♦♠♦s♦♠ ss♦♥s♦♥ s♦ t ♥♦♠s ♥ ♥ tr♦r t♦ ♠♦ t♦ ♣r♦♣r② ♣t tr ♦t♦♥ ♦r t ♦♥srt♦♥ ♦ s ♥ts ♦♥sr② ♦♠♣①s t ♥♠♥t ♣r♦♠ ♥ ♦♥sq♥t② ♠♦st ♣s ♣♣r♦s ♥ s ♦♥ ♣rs♠♦♥②♥t② rt t s ②s♥ ♣r♦r♠ t♦ st♠t strt♦♥s ♦ ♥str ♥♦♠ rr♥♠♥ts s s t ♥♦♠ ♣②♦♥② ♥ 87 ♥♠ s♣s s♥ t 37 ♥t ♠rrs ♥ t ♠t♦♦♥r ♥♦♠ ♥r②♥ ♠♦ ♦♥sr ♦♥② ♥rs♦♥s s ♣♦ss rrr♥♠♥ts❯s♥ t s♠ ♣r♦r♠ r♥ t ♥②s t strt♦♥ ♦ rrr♥♠♥t s♥r♦s ♥ ♥②rtr③ t ②♥♠s ♦ 8 ♥♦♠s ♦ ♦s② rt tr str♥s ♦♥t♥♥ 78 ♦♥sr r♦♣s♦ ♥s

s♥ ♠♦ ♦ ♥♦♠ ♦t♦♥ t♥ ♥t♦ ♦♥t ♥rs♦♥s ♣t♦♥s ♦sss r♦♠♦s♦♠ss♦♥s♦♥ ♦ ♣r♦② ♦r ♠♦r ♥st ♥t♦ ♥♦♠ ♦t♦♥ t ♦♥sttts ♦♥sr ♥ rtr ♥♦r♣♦rt♥ ssttt♦♥s ♥ ♥srt♦♥st♦♥s ♦ ♠♦ t♦ ♥ ♦♥♦♠s ♥ sttst r♠♦r ♥ s♦ rst ♥ ♠♣r♦ ♥rst♥♥ ♦ ♥♦♠ ②♥♠s s ts ♥♦♥ tt t ♥♦♠ ♥r♦♥♠♥t ♥♥s ssttt♦♥ ♣ttr♥ ②r❲r t rst ♥t Prrèr ♦r② s t ♦r② s ♠② ♣ tst t♦rs ♦ s♣t♦♥ ② r♦♠♦s♦♠ rrr♥♠♥ts sr ♣ tt ♥ rtr③ ♦ ♥♦♠ ♣t♦♥s é♠♦♥t ❲♦ ♥ ♣ t ② ♦r t r♦♥strt♦♥ ♥ st② ♦ ♥str ♥♦♠s trt♥t t♦③♥s② rt t t♦ t r♦s ♦r♦r s ♣♣r♦s ♦ ♦r ♥♥trt r♠♦r ♦r t st② ♦ ♥♦♠ ♦t♦♥ ♥ s♦ ♣ tst ♦r ♥st♥ tr ♥♦♠♦♠♣①t② ♦r♥ts ♥ s♠ t ♣♦♣t♦♥ s③s ②♥ t♦♥② t ♦ ♣r♦ ♥①♥t ♣t♦r♠ ♦r ♥♦♠ ♥♥♦tt♦♥ r♥ ♦t ♥r♦♥ ♥♠♥t ♠♣r♦s r♦st♥ss ♥r② ♦r r♥ ♦ st♠ts ❲♦♥ t t t ós t ♦♥ ♥ ①♣ttt r♥ ♦t ♥♦♠ ♥♠♥t ♠♣r♦ ♥♥♦tt♦♥ r② s ♥♥♦tt♦♥ s r② ♥♦♥ t♦♥t r♦♠ t ♦♠♣rt ♣♣r♦ ② t ♥rt t t♥ t

Phenogenomics

②♦♥ ♥♦r♠t♦♥ ♦♥ ♠♦r ♦t♦♥ ♥ ♣②♦♥② ♥♦♠s ♦♥ t ♦♦t♣r♥ts ♦ ♥♥t ♥t♦♥s ♥r♦♥♠♥t ♦♥str♥ts ♥ st♦♥ ♣rssrs ♦♥strt♥ t ♥♦ ♣♥♦t②♣ ♦ ♥str♦r♥s♠s s ♦♥ t ♥②ss ♦ t ♥♦♠s ♦ ①t♥t ♥s ♠② t t♠t ♥ ♦ ♦t♦♥r② ♥♦♠s ♥t tt♠♣ts t ♦♠♥♥ ♠♦s ♦ ♣♥♦t②♣ ♥ ♥♦♠ ♦t♦♥ ♦rs♦t ♥trt ♣♣r♦ tt ♦ s♦ ts ♣r♦♠ t s ♣♦ss t♦ ♥r ♣rts ♦ ♥ ♥♥t ♦r♥s♠♣♥♦t②♣ ② r♦♥strt♥ ♥ rtr③♥ ♦♥ ♦ ts ♥ ♦r ♥st♥ r t ♦r② sttst② ♣rt♥ ♥ ♥t♦♥ t ♠♦ ♦ ♥t♦♥ ♦t♦♥ ♥rt t ♥ ♣♣r♦ ♦ ①tr♣♦t t♦ t ♥tr ♥ r♣rt♦r ts ♠♣r♦♥ ♣r♦s ♣♣r♦s tt♥rr ♣♥♦t②♣ s ♦♥ ♥ ♦♥t♥t r♥ t ♦ss t r♦ t

♦r ♠ ♦ ♣♥♦t②♣ ♠rs r♦♠ ♥t♦r ♦ ♥trt♦♥s ♠♦♥ ♥s ❯s♥ ♣♣r♦ss s t♦s sr ♥ ♣r♥ st♦♥s t ♥tr sq♥ ♦ ♥ ♥str ♥♦♠ ♠② ♥rrt♦ t rt ♥rt♥t② ♦r ② ♦♥str♥ ♥♦♠ ♣rts ♦r ♦r r② ♥♥t ♦r♥s♠s r♦♠s sq♥s ♥ ♥str ♥ ♥t♦r ♥ rs♦♥strt t♦ ♥r t ♠t♦ ts ♦ ♥ ♥♥t♦r♥s♠ ♦r t ♠♣s t ts ♠♦r♣♦♦②

r ♦s ♦ ♥♦♠ ♦t♦♥ r♦♠ r ♥♦♠ sq♥s t♦ ♦s②st♠s r♥s♠s ♥ sr t r♥t s ♦ ♦r♥st♦♥ ♥ ♦r r♥t ♠♦s ♦ ♦t♦♥ ♥ s♦♥t② s♥ ts ♠♦s ♦ ♦t♦♥ ♥ ♥t t r♦♥strt♦♥ ♦ rtrsts ♦ ♥♥t ♦r♥s♠s ♥r♦♥♠♥t ♥ ♦r♥s♠s ♦♣rts rt st ♣rssr ♦♥ t ♦r♥s♠ ♦♥t♥♦srr♦ s r♣rss♦♥s ♦♥ ♦r♥st♦♥ s rr♦s ♥♦♠s ♠② ♣t t♦ t♥r♦♥♠♥t ② st♥ t ♦♠♣♦st♦♥ ♦ tr ♥s ♥ ♥t♦rs tr♦ t ♦ss ♦r qst♦♥ ♦ ♥♥s ♥ ♥ ♥trt♦♥s t♥ ♥s ♥ ♦♦ ♥t♦rs ② t ♣♣r♥ ♦r s♣♣r♥ ♦♥ ♦♥♥t♦♥s t♥ ♦r♥s♠s ♦r t♦tr ♥ ♦r♥s♠s r s ts rt♦♥ t♥ t ②♦r♥s♠s ♥t♦♥ ♥ tr ♥r♦♥♠♥t ♥ ♥♣t ♥t♦ ♠♦s ♦ ♦t♦♥ ♦r ♠♦r r②

♦ r♦♥strt t ♥♥r ♦r♥s ♦ ♠♦♥②rs ♦ s ♦♥ ♦ t♦ ♥r t ♥tr s♥♥♣t②s ♥ ♦♠♣t ♠t♦ ②s r♦♠ ♥♦♠ t r♦♥strt♦♥ ♦ rt♦♥ ♥♠t♦ ♥t♦rs s ♥ s♦♥ t♦ rt② ♥t r♦♠ ♥ ①♣t ♦t♦♥r② ♠♦ ❲ t t♠♥♥ t P♥♥② t P♥♥② t s ♠♦ ♦ ♥t♦r ♦t♦♥ t♦r♦♥strt t ♥trt♦♥ ♥t♦r ♠♦♥ ❩P tr♥sr♣t♦♥ t♦rs ♥ t ♦rt ♥st♦r rst ♣②♦♥t tr ♦ ♦rt ❩P ♣r♦t♥s s t ♥ ♥♦ s ♥t s ♣t♦♥ ♦r s♣t♦♥♥ts s ♥♦r♠t♦♥ s t♥ r♦ss t ♥trt♦♥ t r♦♠ ①t♥t ♣r♦t♥s ♥ ♠♦ ♦ ♥t♦r♦t♦♥ s s t♦ ♥r ♥trt♦♥s ♥ r♦s ♥st♦rs ♥ t ♦rt tr ♥trst♥② ts t♦rs r t♦ s♦ t r♦st♥ss ♦ ts ♣♣r♦ t♦ ①♣r♠♥t rtts ♥ tr♠♥♥ ♥trt♦♥st♥ ♣r♦t♥s sts ♠♦♥strt t ♠♣♦rt♥ ♦ ♦t♦♥r② ♠♦s t♦ ♠♣r♦ ♣rt♦♥s♦ ♣r♦t♥ ♥trt♦♥s ♥ ♥♦♠ ♥♥♦tt♦♥ ♥ ♥r s s ♥str ♥t♦r r♦♥strt♦♥

♥ r♦♥strt ♥str ♥t♦rs ♥ ♥ t♦ ♣♥♦t②♣ rtrsts ♦t② ♠t♦♥t♦rs ♥ ss♦t t♦ t ♦♦② ♦ ♠r♦♦r♥s♠ tr♦ ♦♥str♥ts r♦♥strt♦♥ ♥♥②ss Pr t ❯s♥ s ♥ ♣♣r♦ Pá t s♠t t rt ♦t♦♥ ♦♥trr ♥♦s②♠♦t tr ♥ ssss t rt ♠♣♦rt♥ ♦ st♦stt② ♥ st♦♥ ♥t ♣r♦ss ② r♥♦♠② t♥ ♥s r♦♠ t ♥♦♠ ♦ r ♥ tr ♠♦♥t♦r♥ t rst♥♠♣t ♦♥ t ♠t♦ ♥t♦r ♥ ♦♥st♥t② st♥ ♦r ♥tr♠ts ts t♦rs r t♦ s♠t r ♥♠r ♦ rt ♥♦s②♠♦♥t ♥♦♠s rst♥ rt ♥♦♠s r s♠rt♦ ♦tr ♥ t♦ t ♥♦s②♠♦♥t ♥♦♠s rt st♦♥ ♦r s ♦r t②s♦ s♦ ♥trst♥ r♥s ♥r♥t t♦ t r♥♦♠ ♦rr ♦ ♥ t♦♥s ♥ t♦♥ t♦ s♠t♦♥s♦♥str♥ts r♦♥strt♦♥ ♥ ♥②ss ♦ s ♦r r♦♥strt♦♥ r♦♥strt ♥str

♥♦♠ ♦ st♠t ♥r t ♦♥str♥t tt t ♠t♦ ♥t♦r t ♥♦s s♦ t♦sst♥

♥♥ ♥t♦rs ♠② s♦ ♥ t♦ ♣♥♦t②♣ rs s♦ tt ♦t ♦ ♦♥t② r♦♥strt ♦r ♥st♥ s ♦♥ ♠♣r sts ♥ ♠♦s ♥ t t ♠♦ ♦ t♦♦t♦♣♠♥t ♣rts t ♥♠r ♥ t②♣ ♦ ♠♦rs s ♦♥ t rt s♦♥ ♦ t♦ t②♣s ♦♣r♦t♥s ♥ t ♠s♥②♠ ♠♦ ♥ ♣rt ♥tt♦♥ ♣ttr♥s ♥ ♠r♥ r♦♥ts ♠t t♦ ♣rt t♦♦t ♦♣♠♥t ♠♦♥ ♠♠♠s P♦② s♥♥ ♣t② r♥ t♦♣♠♥t ♦ tt ts s♠s t♦ ♥ r♦② ♦♥sr ♦r t♥s ♦ ♠♦♥ ②rs ♦r ♥♦r ♥t♦rs tt ♥r♦♥ ♣r♦♦♥ rrr♥♠♥ts ♥ ♦t♦♥ t ♠② ♣♦ss t♦ r♦♥strt♥str s♥♥ ♥t♦rs ♦♥t② t ♣♥♦t②♣ q♥tts s♥ ♠♦s ♦ ♦t♦♥ ♦ ♦♥t♥♦s ♦rsrt rtr P P t

Ecologenomics

♠♦♥ t srs tt ♦t♦♥ s ♥r ♥ ♥♦♠s r t trs ♦ rt ♥♥t ♦s②st♠s ♥♠ts ♦r♣ ♦sts ♥ ♦♦ tstr♦♣s t r♠r② ♠♦s ♦ sq♥ ♦t♦♥♥ s♦ ♣ ssss ♥♥t ♥r♦♥♠♥ts tr♥st♦r② ♥ts ♥ ♥trt♦♥s ♠♦♥ ♦r♥s♠s

♦r ♥st♥ ♠ss ①t♥t♦♥s ♠② t trs ♥ t rst♦♥ ♣ttr♥ ♦ s♣s ② ♣r♦♠♦t♥t ♦♣♠♥t ♦ ♥ r♥s ♦ t tr ♦ ♦♥ t ss ♦ t sstr st♥ ts ♣rt♦♥♥ ♠♠♠s ♥♥♠♦♥s t ♦♥ tt t ♥rt♦s ①t♥t♦♥ ♦ ♥♦srs ♥♦t♥rs tr rst♦♥ rt ♦r r♦r ♥♦♠s s♠♣s ttr ♠♦s ♦ tr r♦♥strt♦♥♦♠♥ t ♠♦s ♦ rst♦♥ ♥ ①t♥t♦♥ s♦ ♠♣r♦ ♦r ♥rst♥♥ ♦ t t♦rs♥♥♥ s♣t♦♥ ♥ ①t♥t♦♥ rts s

♠r② ② st♥s♥ s♣s tr ♥ ♥ trs ♠♦s ♦ tr♥ss♣ ♣♦②♠♦r♣s♠s ♥tt ♣♦♣t♦♥ s ♥ t ♣r♦ss ♦ s♣t♦♥ ♥♦s t rst♥s t②♥ s♣t♦♥ ♥ tsr♠♦r ♦ ♦ ♦♣♥ ①tr♥ rs t♦ ssss t r♦ ♦ ♦r♣ ♦r ♦♦ t♦rs ♥ t♣r♦ss ♥trt sts ♦♠♥♥ ♦r♣② ♦♦② ♥ sq♥ ♦t♦♥ r rr♥t② ♣♣r♥♥ rr② ♠ ♣r♦♠s ❲♥s t ♦③ t

♥ ♦tr t②♣ ♦ ①tr♥ r tt ♦ t t ♦t♦♥ ♦ s♣s s ♦♥ ♥ ♦tr s♣st t ♥trts ♦r ♥st♥ t s ♥ ②♣♦ts③ tt t rs ♦ ♥♦s♣r♠s trr trst♦♥ ♦ ♥ts ❲s♦♥ t ö♦r ♦r t ♥r② ♦♦t♦♥ t♥s♣s s ♥ ♣rs ♥ t st♦r② ♦ r♦♠ ①♠♣s s ♥trt s ♥♦s②♠♦ss ♦ ♠t♦♦♥r♦♥ ♦r ♦r♦♣st t♦ ♦♦t♦♥ t♥ ♦st ♥ ts ♣rst t ♥③ t st ♦t♦♥ ♦ t♦ ♥trt♥ ♣rt♥rs ♥♦r♠s ♦trs ①♣t② ♠♦♥ s ♥trt♦♥s ♦♥t t ♥r♥ ♦ trs ♥ trts ♦s ♦ ♦♦ ♦t♦♥ ♦ s♣r♠♣♦s ♦♥ ♠♦s♦ ♥♦♠ ♦t♦♥ t♦ ♠♥t ♥str ♦s②st♠s tr♦ t r♦♥strt♦♥ ♦ ♥str ♦♦♥t♦rs ♥t② ♥♥ t s♦ tt ♦♦ s ♥rr r♦♠ ♣rsr ♦ssss♠s t♥ r♦♠ t ♠r♥ ♠♦♥ ②rs ♦ r r② s♠r t♦ ①t♥t ♦♥s ♠ssrs ts ♦♥str♥ ♦♦ ♥t♦rs sst♥ tt ♠♦s ♦ ♦t♦♥ ♦ s t♦ r♦♥strtrst ♥str ♦♦ s r♦♠ t st② ♦ ♦♦♥ s♣s ♦♥ ♦ ♦ t♦ t st② ♦ ♦♥t♦r ♦ ♥trt♥ ♦r♥s♠s ♥ ♥r ♦ t② ♦ tr♦ t♠ t r t ♥s ♥ t♥trt♥ ♣rt♥rs ♦r ♥ ♥ s♣s ♥tr ♦r t t ♥t♦r ② ♥trt♥ ♦t♦♥ ♥ tst② ♦ ♦♦② ♠② ♣r♦ s r t♦ ♣rt ♦ ♦♠♠♥ts ♠② rt ♥ t ② str♥ ♥ tr♦♣ s ♦r ② r♣t ♠t ♥s

Geogenomics

s♦♠ s♣s t ①t♥t ♥ ♦♥r♦♥t t rst ♥r♦♥♠♥t ♥ ♦trs sr ♥ ♦②s t♦ ♦♣ t ♥ ♥r♦♥♠♥t rss ♠② ♥♦t tt ♦♥② ② ts ♦♦t♣r♥t ♦♥ t ♣ttr♥ ♦rst♦♥ ♥ ①t♥t♦♥ t s ♥♦♠s ♣t trs ♦ ♥ ♣tt rs♣♦♥s ♦r ♥st♥rst♥ t tst t ②♣♦tss tt r♦♣ ♥ t♠♦s♣r CO2 ♠♦♥ ②rs ♦ trrt ♦♣♠♥t ♦ C4 ♠t♦s♠ ♥ rsss s C4 ♠t♦s♠ s ♠♦r ♥t t♥ t ♠♦r ♦♠♠♦♥

C3 ♠t♦s♠ ♦ ts ♥ t② t r t ♣②♦♥② ♦ ①t♥t rss s♣s ♥ st♠t ♥st♦rs r② C4 ♠t♦s♠ tr♦ ♣r♦st ♠♦ ♥ t② tst tr C4

♠t♦s♠ ♣♣r s♥♥t② ♠♦r ♦t♥ tr t r♦♣ ♥ t♠♦s♣r CO2 t♥ ♦r rrsts s♦ tt tr ♠♦♥ ②rs tr♥st♦♥s t♦ C4 ♠t♦s♠ ♦r t r② rt ♥tr♥st♦♥s t♦ C3 ♠t♦s♠ t ♥r② ♥ rt rs ♦r tt t t rt ♦ tr♥st♦♥s t♦ C4

♠t♦s♠ s ♥ ts ♦♥r♠s tr s ♦rrt♦♥ t♥ t♠♦s♣r CO2 ♥ t ♦t♦♥ ♦ C4

♠t♦s♠Pr♦r②♦t ♦r♥s♠s ♣t t♦ t♠♣rtr tr♦ ♥s ♥ t ♥♦t ♦♥t♥t ♦ tr r♦s♦♠

s rs tr t ♦r② s t ♠♥♦ ♦♥t♥t ♦ tr ♣r♦t♥s ❩♦ t ♦♥sq♥t② ② r♦♥strt♥ ♥♥t ♥ sq♥s ♦♣t♠ r♦t t♠♣rtr ♦ ①t♥t♦r♥s♠s ♥ ♥rr tr t r♦♥strt ♥♥t r ♦♠♣♦st♦♥s ♦ t st♥rs ♦♠♠♦♥ ♥st♦r ❯ ♥ ♦♥ s♣♣♦rt ♦r ts ♠s♦♣② ♦r r♥t② ♦ss t

st♠t r♦t t♠♣rtrs ♦r ♣r♦r②♦ts ♥ t tr ♦ s♥ ♦t rs ♥ ♣r♦t♥sq♥s ♥ ♦♥ ♥ ♦r t♦ ♣ss ♥ t st♦r② ♦ ♥r♦♥♠♥t t♠♣rtrs tr♠♦t♦r♥rst ♥rs r♦♠ ♠s♦♣ ❯ t♦ tr♠♦♣ ♥st♦rs ♦ tr ♥ r ♥ t♥ st②rs ♥ t tr ♥♦♠ s s♦♥ ♣s ♥ ♣r♦s② r♣♦rt ② ♥♦tr st② r②♥♦♥ ♥♥t ♥ rsrrt♦♥ r t tt s♦ tt ts rs s r② s♠r t♦ t♦t♦♥ ♦ ♦♥ t♠♣rtrs ♦r t st ♦♥ ②rs ♦rt t ss♦♥ tr s ♦ ♠② ts ♦♥t♥♦s② ♣t tr ♦♣t♠ r♦t t♠♣rtrs t♦ t r t♠♣rtr♦♥ rt t♦ ts ②♣♦tss s ♣♣♥ ♥ s♠s t♦ t t t r② t ♦ ♥ ♠♦r♦♥♥♥ t r ♦♥ sttst tst ♦ t ♦rrt♦♥ t♥ ts t♦ t♥♥s ♦ ts ♥ ♠♦ ♦ t tt r♦♥strts ♥♥t ♥ ♦♠♣♦st♦♥ ♥ s♠t♥♦s② sssss t ♦rrt♦♥t ♥r♦♥♠♥t t♠♣rtr s ♥rr r♦♠ ♦♦② rt♦t

r ♠♦♥ ♥trt♦♥s t♥ s♣s ♥ t tr ♥r♦♥♠♥t tr♦ s♣t ♦r ♦♦rs ♦ rt② ♥t t st② ♦ ♦t♦♥ s s♦ ♥ sttst tsts ♦ t s♥♥♦ ♦rrt♦♥s t♥ ♥r♦♥♠♥t ♦♥t♦♥s ♥ ♦♦ ♣♥♦♠♥ ♥ ♥t♥ ♦ rt♥ ts ♦s♣r s♣ ♦tr ♥ tr ♦♥s ♦ ②rs ♦ ♦①st♥

Conclusions and perspectives

♣st s ♥r ts ♥♦t ♥ ♣st ♥r r♦♥s ♦ rs♦♥ts ♥ ①t♥t♥♦♠s ♥ ♦♥② strt t♦ ①♣♦t t st♦r ♣♦t♥t ♦ ♠r♦♠♦s sttst♠♦s ♦ ♦t♦♥ tt ② ①♣♦t ssttt♦♥s ♥ ♣t♦♥ ♦ss ♥ tr♥sr ♥srt♦♥t♦♥s♥ rrr♥♠♥ts ♠② ♥♦t ♦♥② ② ttr rs♦ tr ♦ t s♦ t♦r♦ r♣rs♥tt♦♥ ♦t ♦ ♣r♦ss r♦ ♦rrt♦♥s t♥ ♥♦♠ ♣r♦♣rts ♥ ♥♦♥♥♦♠ rs ♥♦♠srtr ♦♠♥t t st♦r② ♦ ♥trt♦♥s ♦ ♦r♥s♠s t ♦tr ♥ tr ♥r♦♥♠♥t ❲ ♥ ♥♦♥s♦♥ t r♦♥strt♦♥ ♦ t st♦r② ♦ ♥♦♠s ♦♥ t t♦s ♦ ♠t♦ ♦r s♥♥ ♥t♦rs♣♥♦t②♣s ♥ ♥r♦♥♠♥ts

Boxes

Box: Discordance between species tree and gene trees

♥tt♦♥ ♦ ♦rt♦♦♦s ♥s s ♥♦t ②s ♥q♦ rst P②♦♥tsts s② r② ♦♥t s♥ ♦ ♣t ♦♣s ♥ t tsts ♥r st② t ♣t♦♥s ♠② ♦rr r♥t st♦r② ♦ ♥ ♠② t♦t ♥ ♦♦s trs s s ♣rtr② r♠t ♥ t ♥t ♦r♣r♦ ♦sss ♥ t♦ s♣s ♦s r♥t ♦♣s ♦ ♥ ♥str② ♣t ♥ ♠♣t ♦ ts♣♥♦♠♥♦♥ ♥♦♥ s ♥ ♣r♦② s t t♦ st♠t ♦♥ r s t r♣r♦ ♦sss ♥ s♦♥ t♦ rq♥t tr ♦ ♥♦♠ ♣t♦♥s ♥ ②sts ♥ ss é♠♦♥ t ❲♦ ♥♥ t ♦♥ tr ♥ tr♥sr s ♥ s♦♥ t♦ ♣rs ♥ t st♦r②♦ ♥ t s ♥s t♦ ss♠ ♣r♦r tt t st♦r② ♦ ♥ s ♦ ♦ s ♥ts trts ♥t♦♥ r ♥ ♥s tt ♦ ♦♥sr ♥♥ ♦rt♦♦s ♠② ♥♦t rtr t st♦r② ♦s♣s t ♣rsst♥ ♦ r♥t ♦r♠s ♦ ♥ r♥ ♦♥ ♣r♦s ♦ t♠ rt t♦ t ♣st♥ s♣t♦♥ ♥ts ♣♥♦♠♥♦♥ ♥♦♥ s tr♥ss♣ ♣♦②♠♦r♣s♠s P ❲ t ♠② rst ♥ r♥s ♠♦♥ ♥ trs ♥♦♠♣t ♥ s♦rt♥ ♥ ♥ t s♥ ♦♣r♦② ♦r ss♠ ♦ ts ♣r♦sss ♠s t t t♦ ①♣t tt s♥ ♥ st♦r②♦ t② ♠rr♦r tr ♦ s♣s tr♦♦t sr ♦♥ ②rs ♦ ♦t♦♥ ♥ t♦♥ t♦ ts♦♦ ♣r♦♠s ♥ t ♠♦st ♥ ♣②♦♥t ♠t♦s r ♦t♥ ♥ t♦ rt② ♠♦ t♦t♦♥ ♦ ♦♦ sq♥s ♥ rst ♥ t ♥r♥ ♦ rr♦♥♦s trs

Box: Choosing the right numbers of parameters

♥② ♣r♦sss s♥♥t② ♦♥trt t♦ t ♦t♦♥ ♦ ♥♦♠s s♦♠ rt t♦ t ♥tr♥♠♥s ♦ t ♦trs t♦ t ♥trt♦♥ ♦ t ♦r♥s♠ t ts ♥r♦♥♠♥t ♥ ♦tr ♦r♥s♠st ♠② ♥rs♦♥♥ t♦ s ♠♦ ♦♥t♥ ♦r ♣r♦sss t ♦♥ ♦♥② ♠t q♥tt②♦ ♣r♠trs ♥ st♠t r♦♠ ♥t ♠♦♥t ♦ t t ②s♥ t♥qs♥ rt♥② t♦rt r ♠♥s♦♥ts t♥ ①♠♠ ♦♦ ♣♣r♦s s ♥trts ♦r t strt♦♥s ♦ ♣r♠trs ♥ ♦♥② ss ♣♦♥t st♠ts ♥ ts ttr s

♥② s♠ rr♦r ♦r t ♦ r ♥ s♥♦ ts ♦r t r② ♦ ♦tr st♠tss♣② ♥ r ♥♠r ♦ ♣r♠trs r s ♦r ♠♦st rt♥② ♦s t♦ ♠♥ r♥t ♠♦s s ♣♥♥ ♦♥ t qst♦♥ ♥r srt♥② r ♦♥② t ♠♦st r♥t♣r♠trs r ♥ t♦♦ ♣r♠trs ♥ st♠ts r s t♦♦ ♠♥② ♥ tr r r♥s♣r♦t ♥② ♦♥s♦♥ ♥ ts rs♣t sss ♦ ♠♦ st♦♥ ♣rtr② ♣rss♥ ♥t②sr ♦rs st♠t t s ♦ ♣r♠trs t s♦ tr ♥♠rs tr♦ t♥qs s s rtPr♦ss ♣r♦rs rt♦t t P♣♣ s♥ t rrs ♠♣ r♦ ♥ ♦♥tr♦ ♠t♦s r t s♥ t ♦r P♦ss♦♥ ♣r♦sss s♥ t ♥qrt t rt♦t t♥qs t♦rt tr ♥♠r ♦ ♣r♠trs ♥ssr② t♦ s ♦♠♣① ♠♦s t r ♠♦♥ts ♦ t

Box: Software

Species and gene trees

• st ②s♥ st♠t♦♥ ♦ ♣s r ②s♥ ♣r♦r♠ t♦ r♦♥strt s♣s trs r♦♠ ♥♥♠♥ts ♦♥t♥ ♦r r♥s♣ ♣♦②♠♦r♣s♠s tt♣stt♦s ♣

• ② ②s♥ ❯♥t♥♥ ♦ ♦♥♦r♥ ♥♦ts ②s♥ ♣r♦r♠ ♣r♠tt♥ t♦ ♥②s sr♥ ♠s s♠t♥♦s② ♦♥t♥ ♦r s♦♠ ♦rrt♦♥s t♥ ♥ st♦rs tr♦ ♥t♦trs ♠♣s tt♣stts rt②t♠

• Pr♠ t ♦ s♦tr t♦ ♥②s ♥ ♠s ♥ t ♣rs♥ ♦ ♣t♦♥s ♥ ♦sss ♦♥t♥♦r ♥♦♥ s♣s tr tt♣♣r♠sss

Alignment and phylogeny

• P② ②s♥ ♥♠♥t ♥ P②♦♥② st♠t♦♥ ②s♥ ♣r♦r♠ t♦ r♦♥strt ♥♠♥ts ♥ ♣②♦♥t trs tt♣♦♠t♠sr♣②♥①♣♣

• tt♥ ②s♥ ♣r♦r♠ t♦ r♦♥strt ♥♠♥ts ♥ ♣②♦♥t trs tt♣♣②♦♥②ttt♥

• ♠♦ ②s♥ ♣r♦r♠ t♦ r♦♥strt strtr ♥♠♥t s s ♣②♦♥t trstt♣s r♠trs♠♦

• rt ♠♥♦ ♥ sts ♦tr ♣ t♦ ♥ ♥②s ♥♠♥ts ♥ ♣②♦♥t trs tr♦ tr♥srs ♥♦t② ♦r sq♥s s s s♦♥r② strtrstt♣♦♦r

Inversions and phylogeny

• r ②s♥ ♥②ss t♦ sr ♥♦♠ ♦t♦♥ ② rr♥♠♥t r s ②s♥♣r♦r♠ t♦ ♥②s ♥♦♠ ♦t♦♥ tr♦ ♥rs♦♥s tt♣rq

Character evolution

• tr ttst ♥r♥ ♦ ♥t♦♥ r♦ ♦t♦♥r② t♦♥s♣s tr ♣rts t ♥t♦♥ ♦ ♥s ♥ ♥ ♠② s ♦♥ ♠♦ ♦ ♥t♦♥ ♦t♦♥ ♥ ♦♥ ♣②♦♥t tr ♦ t♥ ♠② tt♣strr②

• sqt sqt s ♠♦r s♦tr tr♥ sr ♣s ♦♥ t♦ r♥ r♦s t②♣s ♦♥②ss t ♦s ♦♥ t♦ ♥②s t ♦t♦♥ ♦ srt ♦r ♦♥t♥♦s rtrs ♦♥ ♣②♦♥② s s t s♣ ♦ ♣②♦♥② tt♣♠sqt♣r♦t♦r♠sqt♠sqtt♠

• ②srts ②s♥ ♣r♦r♠ ♦♥ ♦♥ t♦ ♥②s t ♦t♦♥ ♦ srt ♦r ♦♥t♥♦s rtrs ♦♥ strt♦♥ ♦ ♣②♦♥s tt♣♦t♦♥r♥②srtst♠

• ♣ P ♦ ♥t♦♥s t♦ s ♥ t sttst s♦tr ♣ ♥♦t② ♣r♠ts ♥②s♥ t♦t♦♥ ♦ srt ♦r ♦♥t♥♦s rtrs ♦♥ ♣②♦♥② ♦r st②♥ s♣s ♦ ♣②♦♥stt♣♣♠♣rr

Definitions

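Trans-specific polymorphisms (TSPs): Sharing among species of alleles inherited from an ancestor. These alleles diverged prior to speciation, so that gene trees reconstructed using these genes may be different from the species tree.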

Incomplete lineage sorting: Observed discrepancy between a gene tree and the species tree due to the conservation of ancestral polymorphisms in different species (trans-specific polymorphisms).

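Markov Model: Probabilistic model of a process in which the state at time t + 1 only depends on the state at time t, not on the state at time t − 1. Models of substitution assume that the substitution process is Markovian: a substitution x → y does not depend on the state preceding x.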
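As an illustration of the Markov property applied to substitutions, here is a minimal sketch that simulates one nucleotide site through discrete time steps. It assumes a Jukes-Cantor-like scheme in which every change is equally likely; the per-step change probability `p_change` and the function names are illustrative choices, not taken from the text.

```python
import random

BASES = "ACGT"


def substitution_step(state: str, p_change: float, rng: random.Random) -> str:
    """Draw the state at time t + 1 using only the state at time t."""
    if rng.random() < p_change:
        # All three alternative bases are equally likely (Jukes-Cantor-like).
        return rng.choice([base for base in BASES if base != state])
    return state


def simulate_site(start: str, n_steps: int, p_change: float, seed: int = 0) -> list:
    """Return the succession of states visited by one site, start included."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        # Only path[-1] is consulted: earlier states never enter the update.
        path.append(substitution_step(path[-1], p_change, rng))
    return path


if __name__ == "__main__":
    print("".join(simulate_site("A", n_steps=40, p_change=0.1)))
```

Because `substitution_step` receives only the current state, a substitution x → y cannot depend on what preceded x, which is exactly the assumption stated in the definition.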

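Hidden Markov Model: Probabilistic model used to describe a succession of states by associating hidden states with observed ones. A Markov Model is used to describe transitions between these hidden states. Hidden states may be intron, intergenic or exon for models predicting gene structure, slow or fast for models predicting evolutionary rate, or different tree topologies for models predicting gene trees or recombination.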

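Maximum Likelihood inference: For a given probabilistic model M with specific parameters and a particular data set D, the Maximum Likelihood values of these parameters correspond to the values under which it is most probable that the model has generated the data. If one were to simulate new data with the model M, it is by using these values that the data D would be obtained most often.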
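A small worked case may help make the Maximum Likelihood idea concrete. The sketch below assumes the Jukes-Cantor substitution model and two aligned sequences that differ at `n_diff` of `n_sites` positions; both the choice of model and the function names are assumptions made for illustration. It finds the evolutionary distance maximizing the likelihood by brute-force search and compares it with the known closed-form estimate.

```python
import math


def jc_log_likelihood(distance: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood of a distance (substitutions per site) under Jukes-Cantor."""
    # Probability that a site differs between the two sequences.
    p = 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def jc_ml_distance(n_sites: int, n_diff: int) -> float:
    """Closed-form Maximum Likelihood distance under Jukes-Cantor."""
    frac = n_diff / n_sites
    if not 0.0 < frac < 0.75:
        raise ValueError("proportion of differing sites must lie in (0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * frac / 3.0)


def grid_ml_distance(n_sites: int, n_diff: int,
                     step: float = 1e-4, upper: float = 3.0) -> float:
    """Brute-force search for the distance with the highest likelihood."""
    grid = (step * i for i in range(1, int(upper / step) + 1))
    return max(grid, key=lambda d: jc_log_likelihood(d, n_sites, n_diff))


if __name__ == "__main__":
    print("closed form:", round(jc_ml_distance(1000, 100), 4))
    print("grid search:", round(grid_ml_distance(1000, 100), 4))
```

The grid search is deliberately naive; real programs use numerical optimizers, but the quantity being maximized is the same.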

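Bayesian inference: Likelihood is the probability of the data given the model. Bayesian inference investigates the probability of the model given the data, also named posterior probability. This posterior probability of a model is proportional to the product of the likelihood and of a prior probability. The prior probability makes it possible to incorporate external knowledge into an analysis: for instance, one could assume that the prior probability for the transition/transversion ratio in a particular data set follows a uniform distribution on [1; 10]. Contrary to Maximum Likelihood inference, the common practice in Bayesian inference is not to return the parameter values of highest posterior probability; instead, whole distributions of parameter values are returned. To obtain these distributions, MCMC techniques are often used.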

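Markov Chain Monte Carlo (MCMC): Algorithm used to sample from a probability distribution by building a Markov model whose equilibrium distribution is the desired probability distribution. This means that when the chain has been run for a sufficiently long time, a state is visited with a frequency equal to its probability.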
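The following sketch shows one common way to build such a chain, the Metropolis algorithm, on a deliberately simple Bayesian problem. All specifics are assumptions chosen for illustration: the parameter is the probability `p` that a site differs between two sequences, the data are `n_diff` differences among `n_sites` sites, the prior is uniform on (0, 1), and proposals are Gaussian steps of standard deviation `step`.

```python
import math
import random


def log_posterior(p: float, n_sites: int, n_diff: int) -> float:
    """Unnormalized log-posterior: binomial likelihood times a uniform prior."""
    if not 0.0 < p < 1.0:
        return float("-inf")  # zero prior probability outside (0, 1)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def metropolis(n_sites: int, n_diff: int, n_iter: int = 20000,
               burn_in: int = 2000, step: float = 0.05, seed: int = 1) -> list:
    """Sample p from its posterior distribution with a Metropolis chain."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, n_sites, n_diff)
    samples = []
    for i in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples


if __name__ == "__main__":
    draws = metropolis(n_sites=100, n_diff=30)
    print("posterior mean estimate:", round(sum(draws) / len(draws), 3))
    print("analytic posterior mean:", round(31 / 102, 3))
```

The chain returns a whole sample of parameter values rather than a single point estimate; for this toy problem the exact posterior is a Beta(n_diff + 1, n_sites - n_diff + 1) distribution, which gives an analytic mean to check the sampler against.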

Last Universal Common Ancestor (LUCA): The most recent ancestor of Bacteria, Archaea and Eukaryotes.

References

r♦rr② ♦ tt t rr♥ ♥s ♦rs ♥t②♥ tr ♥r♥sr ♥ts Ps ♦ P ②♠♣♦s♠ ♦♥ ♦♦♠♣t♥

♥é é rt rt ♠ ♠t t② t ♦s ♥t♦♥s ②s♥ st♠t♦♥♦ ♦♥♦r♥ ♠♦♥ ♥ trs ♦ ♦ ♦

rst rs r♥ ♥♥r♦tt rr♥ ♥s t ♥♥ ♥t ②s♥ ♥s♣str r♦♥t♦♥ ♥ ♦rt♦♦② ♥②ss s♥ ♦♥♦r♠ts ♣♣

rst rs r♥ ♥♥r♦tt rr♥ ♥s t ♥♥ ♥t ♥ tr r♦♥strt♦♥ ♥ ♦rt♦♦② ♥②ss s ♦♥ ♥ ♥trt ♠♦ ♦r ♣t♦♥s ♥ sq♥ ♦t♦♥ ♥

r② ♥r ♦♥ r rt r♥t ♦ ♥♠♥ ♥ r P♦r t♥ ér♥sétr ♥ ❱♥♥t ♥t♦r ❱ér♦♥q t r♥③ r t ♥ ss♦♥ ♥♥ ♥ s ♦♦ â♠r r♥s♦ r♦rt ♥r ♦ ♦r♦♥ ♣♥ t♥ r ♥♥r ss♠ ♦♥ ♦t③ tr♥ ♦r♥ ♦ë ♥♥ ♣èr rs♥ ♥s② ♦♣ ♦ rs③ ♦ Ptt♥r ♠t P♦♥ ③ r♥ç♦s rr♥♦ ❱♥♥t ❩s r ss♥ P♣♣ étr♠r r ❲ss♥ ♥ r♣ ätr ❱♥♥t ♣r♥ ♥ ②r r♦♥ ♥ t ❲♥r Ptr ♦ tr♥s ♦ ♦♥♦♠ ♣t♦♥s r ② tt Pr♠♠ ttrr tr

♣tst s♦ ♦ r♦s t ♦♦tt ❲ ♦ ♦rt♦♦♦s♥ ♣②♦♥s r② s♣♣♦rt trt♥♥ ♦ ♦

rr ♥ t P r Prt♥ ♥t♦♥ ♥ ♥s r♦♠ ♣②♦♥tsttst ♥②ss♦ ♦ ♥♦♠s P♦ ♦♠♣t ♦

ttst③③ ❯ ♦ ♥r t s r ♥♦♠ t♠s ♦ ♣r♦r②♦t ♦t♦♥♥sts ♥t♦ t ♦r♥ ♦ ♠t♥♦♥ss ♣♦t♦tr♦♣② ♥ t ♦♦♥③t♦♥ ♦ ♥ ♦ ♦♦

r P t s♥st♥ ①♠♠♦♦ st♠t♦♥ ♦ ♠rt♦♥ rts ♥ t ♣♦♣t♦♥♥♠rs ♥ t♦ ♣♦♣t♦♥s s♥ ♦s♥t ♣♣r♦ ♥ts

♦ ♦rt r♦ ♠♦t② t ♥ r ②s ♦ ♥ sr♥ ♥ ♣r♦r②♦tsPr♦ t ❯

♦♠♥ r♠♠♦♥ ① t P♦ss r② rs rs ♣♦♣t♦♥ strtr ♥ r♥t♠♦r♣ st♦r② ♦ ts r♥♦r ♦st ♥

♥♥♠♦♥s P r♦ r ♦♥s t P ♦ss ♦♥ r♥②rr Pr ♠♥t ❱♦s tr tt♠♥ ♦♥ t Prs ♥② ② rs♦ ♣rs♥t② ♠♠♠s tr

s♦♣ t ♦♠♣s♦♥ ①♠♠ ♦♦ ♥♠♥t ♦ sq♥s ♦ ♦

♥qrt ♠ t rt♦t ♦s ②s♥ ♦♠♣♦♥ st♦st ♣r♦ss ♦r ♠♦♥ ♥♦♥stt♦♥r② ♥ ♥♦♥♦♠♦♥♦s sq♥ ♦t♦♥ ♦ ♦ ♦

♦ss st♥ rr ♦ r♥ r♦♥ t ♦rs♥t♦♥ t ♥rss♦♥ ♦♠♣tt♦♥ ♥r♥ ♦ s♥r♦s ♦r ♣♣r♦t♦tr ♥♦♠ ♦t♦♥ Pr♦ t

♦ss st♥ ♥qrt ♠ s ♥♠r rt♦t ♦s t ♦② ♥♦♦ Pr ♣tt♦♥s t♦ ♠♣rtrs ♥ t r♥ ♦♥ tr

r② ♦rt t ♦♠s ♥ r♥srs ♥ ♠r♥ ♣r♦st r♠♦r ♦r ♠♦♥♥s ♦♥ trs ♦♥♦r♠ts

r♦♥ ♦② t rs ❲ t t♥♦♣ ❯♥rs trs s♦♥ r ♦♠♥ ♣r♦t♥ sq♥ t sts t ♥t

rt ❲ r② ♥♥ ♦♥s ♠ ♦rr Pt♦♥ ♠t ❲♥s♦r ③♥♦ rs t ❲♥t♦♥ ②♥♠s ♦ r♦♠♦s♦♠♦t♦♥ ♥ rs ♥ ♠♠♠s tr

rst♥s r②♥ t ♥♦s ② st♠t♥ s♣s ♣②♦♥② r♦♠ ♥tr ♣r♦tss♣t ♥♦♠♣t ♥ s♦rt♥ ♥ ①♠♣ r♦♠ ♥♦♣s rss♦♣♣rs ②st ♦

♦ ♥♦ t ♠ ♥♦ ♦ ①t♥t ♦ ♦r③♦♥t ♥ tr♥sr Pr♦ t ❯

rst♥ Ps♥t♦♥ s♥r ♠ ♠rt♥ ♠♥ ♥ ♦♥s♦♥r♦r ♦♥♥ ❱♥♥t t ♠♥ ♦s ♦♥ ♥ ♣r♦♠♦t ♣♦t♦s②♥tss ♥ rsss rr ♦

r r♥s ♦rs ♦s ♦♥ r♥ rst♥ r② rst♦♣r ♥ r♥ t♦r Pr ♦r t♦♠t r♦♥strt♦♥ ♦ ② rs♦ tr ♦ ♥

♥ t rt♥ ❲♠ tr ♦ ♦♥ ♣r♥t ♥♦♠ ♦

♥ t rt♥ ❲♠ ♥str ♥♦♠ s③s s♣② t ♠♥♠♠ rt ♦ tr ♥tr♥sr r♥ ♣r♦r②♦t ♦t♦♥ Pr♦ t ❯

r♥ r♦♥ ós stá♥ t ♥ r ②♥♠s ♦ ♥♦♠ rrr♥♠♥t ♥ tr♣♦♣t♦♥s P♦ ♥t

r♥ rs ♥ t ♦r♥ ♦ s♣s ② ♠♥s ♦ ♥tr st♦♥ ♦r t ♣rsrt♦♥ ♦

♦r rs ♥ t str ♦r st ♥ ♦♥♦♥ ♦♥ rr②

♥ ❱♥♥t t Prrèr ② strtr♥ ♦♥ t ♥♦♠ ♦♠♠♦♥ tr ♥ ♣r♦r②♦ts ♦ ♦ ♦

♥ ❱♥♥t ♦r♥ ♥② t ♠♥ ♦r P②♦♥ts ♥ t ♦s♦♥ ♦ tr♥♦♠s ♥

♥♥ ♠s t ♦s♥r ♦ s♦r♥ ♦ s♣s trs t tr ♠♦st ② ♥trs P♦ ♥t

s réér r♥♠♥♥ ♥♥r t P♣♣ ré P②♦♥♦♠s ♥ t r♦♥strt♦♥ ♦t tr ♦ t ♥t

② ♦♥ ❲ ♥ ② ♠♦♥ ①♥rss♦♥ r♥ s r t Ptr ♦r rt ♥tt♦♥ ♦ ♥♦ ♠♥ ♥s tr♦ s♠t♥♦s ♥ ♣rt♦♥ ♥ ♠♥ ♠♦s ♥rt ♥♦♠ s

♦♦tt ❲ P②♦♥t sst♦♥ ♥ t ♥rs tr ♥

♥s② ♦♦ ♦ ♦ ♥ ♠② ♦t♦♥ P tss ❯♥rst② ♦ ❲s♥t♦♥

Page 186: Early Evolution and Phylogeny

♥♥ s② ❲ ♥♦ ♥rs ts P♥ ♥ r♦♥ ❲♠ ♠t t♣♥ r ♥ ♦s r ❲ st tts ♦♠ r♦r② ør♥s♥ rt♥ ❱ ♦t♥ ♠ts ♥rs s ♦ rst♥s♥ ♥rt ør ❲r ❲r rt♥ r t rt ♦♥③♦ r♦ ♣②♦♥♦♠ s♠♣♥ ♠♣r♦s rs♦t♦♥ ♦ t♥♠ tr ♦ tr

♥♥ ♥♥r ❲♠s r rt♥③ ♦ ❲♦♦ t r♥ ♦s ♦♠♣t♦♥ ♥ ♥t♦r ♥②ss ♦ ♠r♥ ♦♦ s P♦ ♦

rt ♦r t tr tr♦♥ ♦♥srt♦♥ ♦ ♥♦♥♦♥ sq♥s r♥ rtrts♦t♦♥ ♣♦t♥t ♥♦♠♥t ♥ ♣♦sttr♥sr♣t♦♥ rt♦♥ ♦ ♥ ①♣rss♦♥ s s

r ♦rt ❯ ♠t♣ sq♥ ♥♠♥t ♠t♦ t r t♠ ♥ s♣♦♠♣①t② ♦♥♦r♠ts

♥rt rr ♦r♥ rt♦r tr②♥ t r♥♥r t♥ Pr♦t♥♠♦r ♥t♦♥ ♣rt♦♥ ② ②s♥ ♣②♦♥♦♠s P♦ ♦♠♣t ♦

②r❲r t rst ♦t♦♥ ♦ s♦♦rs t ♥t

♥r ❲♠ q♠ ♦r ♥ P♥♥ ♦♦s

t ❲ st♥s♥ ♦♠♦♦♦s r♦♠ ♥♦♦s ♣r♦t♥s ②st ❩♦♦

tr t ♦r② t♦♥s♣s t♥ ♥♦♠ ♦♥t♥t s♦♥r② strtrs♥ ♦♣t♠ r♦t t♠♣rtr ♥ ♣r♦r②♦ts ♦ ♦

tr ♦rss t ♦② ♥♦♥②♣rtr♠♦♣ ♦♠♠♦♥ ♥st♦r t♦ ①t♥t ♦r♠s♥

r ♦♠s♦♥ r♥ t ♥♥r ♥rr♥ t ♣♦♥r♦♥♠♥t ♦♥♥t tr ♦♥ t ss ♦ rsrrt ♣r♦t♥s tr

r r ♦♥r♥ rr t ♥s ♠♦② P♦t♠♣rtr tr♥ ♦r Pr♠r♥ ♥rr r♦♠ rsrrt ♣r♦t♥s tr

♥ ❲♥ ♥ t ♠ ♥②♦♥ ♦ ♦ r ② ♥♦♠s st♠ts ♦♦r③♦♥t ♥ tr♥sr P♦ ♦

tt rr♥ t ♦ ♠t♦s ♥tt♦♥ ♦ ♥ ♣t♦♥ ♥ ♦r③♦♥t r♥sr ♥ts ♥

♦ ❲♦♥ t ♦♥ r♥ ❯♥♦r♥ rt rt♦♥ ♦ tr ♥ tr♥sr r♥ tr♥♦♠ ♦t♦♥ ♥♦♠s

♥ ♥ ♦rt♠ ♦r sttst ♥♠♥t ♦ sq♥s rt ② ♥r② tr P ②♠♣

♦♦♠♣t

♦♦t sr rst♥s♥ ♥ ♦♠s t r♣ ♥♦♠ rt♦♥s♣s♥ s♣t♦♥ t♠s ♦ ♠♥ ♠♣♥③ ♥ ♦r ♥rr r♦♠ ♦s♥t ♥ r♦ ♠♦P♦ ♥t

♦♠s t r♥♦ ❲ ♦t♦♥r② s ②s♥ ♣♣r♦ t♦ ♠t♣ ♥♠♥t ♦♥♦r♠ts

♥ ♥♥ t ♦rt♥ ♦♥♥ Ptr ♥♥t ♦r③♦♥t ♥ tr♥sr ♥ ♥t ♣②♦♥tr♦♥strt♦♥ r♥s ♥t

s♥ P t ♦♥qst ❨ ②s♥ ♥r♥ ♦ ♣②♦♥t trs ♦♥♦r

♠ts

Page 187: Early Evolution and Phylogeny

s♥ P rt t ♦♦r ♦♠♣♦♥ ♣♦ss♦♥ ♣r♦ss ♦r r①♥ t ♠♦r♦ ♥ts

s♥ ♦♥ P rt rt r r t ♦♥qst rr P♦t♥t ♣♣t♦♥s ♥♣ts ♦ ②s♥ ♥r♥ ♦ ♣②♦♥② ②st ♦

s♥ ♦♥ P rt rt t r♦ ②s♥ ♣②♦♥t ♠♦ st♦♥ s♥rrs ♠♣ r♦ ♥ ♦♥t r♦ ♦ ♦ ♦

s♥ ♦♥ P ♥ ♦♥ r♦st ♠♦♥ ❲ t P♦♥ r ♦s♦s② rt♣r♦ss ♠♦ ♦r tt♥ ♣♦st st♦♥ ♥ ♣r♦t♥♦♥ sq♥s Pr♦ t ❯

♥♥♥ t ❲t♥ ♠ t ♦ ♥ ♦ ♦♥ t ♦s♥t t♠ ♥ t ♠♥♠♣♥③ ♥str ♣♦♣t♦♥ ♦ ♦ ♦

♥ tr②♥ ♥s str t r♥ Prt♥ ♦t♦♥r② ♣ttr♥s ♦♠♠♠♥ tt r♦♠ ♦♣♠♥t tr

③rs ♥♥ t s♠r r rst ②s♥ ♠t♦ ♦r s♠♥t♥ sq♥♥♠♥ts ♥ tt♥ ♥ ♦r r♦♠♥t♦♥ ♥ ♥ ♦♥rs♦♥ tt ♣♣ ♥t ♦ ♦ rt

♥♠♥ ♥ t ♥♦② ♦ r ♣♦♣t♦♥s ♦r♥ ♦ ♣♣ Pr♦t②

♥♦s ② t rst♥s r②♥ ♠t♥ s♣s t♦t ♠♦♥♦♣②t ♥ trs ②st

♦③ ♥♥t r♠ tr♥ t ❲♥s ♦♥ ♥trt♥ s ♥r♦♥♠♥tt ♥t♦ ♦t♦♥r② ♦♦② r♥s ♦ ♦

t♦ r tr t ♥♥ ♠s ♥♦♥sst♥② ♦ ♣②♦♥t st♠ts r♦♠ ♦♥t♥tt ♥r ♦s♥ ②st ♦

r♥ ♥ t r tt♦ ♦r③♦♥t ♥ tr♥sr rt Pr♦ t

♦rr ♦ sq♥ ♥♠♥t ♥ s t st♦♥ ♦ tr t♦♣♦♦② ♦ ♦ ♦

♥♥ ② t rr ♥ s ♦r ts s♠♣ rt② ♦r ♠t♣ sq♥ ♥♠♥ts ♦ ♦ ♦

rt rt ♥ ♦s♣ t ♠♦♥ ♦♥ ②s♥ ♣♣r♦ t♦ t st♠t♦♥ ♦♥str ♥♦♠ rr♥♠♥ts ♦ P②♦♥t ♦

rt♦t ♦s Prs♦♥ ♦♠♠♥t♦♥

rt♦t ♦s t P♣♣ ré ②s♥ ♠①tr ♠♦ ♦r r♦ssst tr♦♥ts ♥ t♠♥♦ r♣♠♥t ♣r♦ss ♦ ♦ ♦

rt ♠♠♥ ♥ ❱♥♥t t ♦r♥ ♥② r♦♠ ♥ trs t♦ ♦r♥s♠ ♣②♦♥②♥ ♣r♦r②♦ts t s ♦ t ♠♠Pr♦t♦tr P♦ ♦

♥③ ♦♦ ♦① r♥ç♦s ♦♦② ❨♦s♥ ♥ ♥r ♦♠♥ P♣♣ s♥ t♠r rst♥ Pr♥♦ r♥ ♥ r r ❲ ❨♠♦ ❨♦s♦ r♠ ❨ Pr③rr♦ ♠♦ ❲str♦♠ ♦r r♠ st♥ t t♠♥ r ♥r♥ ♦r♥ ♦r t ♥t♠t ss♦t♦♥ t♥ ♠♥s ♥ ♦tr ♣②♦r tr

Page 188: Early Evolution and Phylogeny

♥ t Pr ♥♥s ♣s trs r♦♠ ♥ trs r♦♥strt♥ ②s♥ ♣♦str♦rstrt♦♥s ♦ s♣s ♣②♦♥② s♥ st♠t ♥ tr strt♦♥s ②st ♦

♦r② s②♠♠tr ssttt♦♥ ♣ttr♥s ♥ t t♦ str♥s ♦ tr ♦ ♦ ♦

♦♥ ♥②♥ trá♥ str ♦r♥t♦♥ ♥ t ❲♥ ❲♥ ♦r♥ ♦ ♥ ♥s ♠♣ssr♦♠ t ②♦♥ ♥ ♦ t ♥t

♥tr rt♦♥ ós stá♥ r♠♠♦♥ ① ♥s♥ ♥s t t ♥ ♦t♥ ②s♥♦st♠t♦♥ ♦ ♣②♦♥② ♥ sq♥ ♥♠♥t ♦♥♦r♠ts

②♥ r♥s ♦ ♥♦♠ rttr ♥r ss♦s ♥ ♥r♥

ö②t②♥♦ r t ♦♠♥ P②♦♥②r ♣ ♣♠♥t ♣r♥ts rr♦rs ♥ sq♥ ♥♠♥t ♥ ♦t♦♥r② ♥②ss ♥

s♦♥ ❲②♥ P ♥ trs ♥ s♣s trs ②st ♦

s♦♥ ❲②♥ P t ♥♦s ② ♥rr♥ ♣②♦♥② s♣t ♥♦♠♣t ♥ s♦rt♥②st ♦

r♦ sr ❲♦ ❨ ♦r♦♥ r♥ ♦♦♥♥ P♦ P♦ r♠② ❱ P♦♦♥ ♦ ❱ r♦r ♦ ❨ ♦sr s ♥ ♦♦st♥ ♥s P♥② ❱ ❲r s ♦ ❨ ♥s♦♥ ♥ í③ñ③ ♦st ♠♥♦ ❱ ❲tr ❲ r♦t ♦r tr♠♥♥ rr♥♦ ♥s♥ ❳ ❨ st♦r♥ ♠r Prr rt r♦♥t t♥s ♥ t ❯♥ r ♥♠♠r rs♦♥ P ♦③②♥ ❲♠r t s ♦♠♣rt ♥♦♠s ♦ t t tr Pr♦ t

t③r r ttst ♥♠♥t s ♦♥ r♠♥t ♥srt♦♥ ♥ t♦♥ ♠♦s ♦♥♦r♠ts

ós stá♥ ♦á á♠ ♦♠ á③s t ♥ ♦t♥ ♦ r② ♥ ♣rt trt② ♦ ♣r♦t♥ strtr ♣rt♦♥s ♦♥♦r♠ts

♥♥ ❱♠r ♦r♠♥ r♥ ♥ ♥ t r r ♠t♣ ♥♣♦♥t♠♦ s t♦ ♠♦r rt r♦♠♥t♦♥ tt♦♥ ♦♥♦r♠ts

r♥ ♦rs ♥♥r r♦r ♣r♥ ❨ t ♦♦♥♥ ♥ ❱ ♦rt♠s ♦r ♦♠♣t♥ ♣rs♠♦♥♦s ♦t♦♥r② s♥r♦s ♦r ♥♦♠ ♦t♦♥ t st ♥rs ♦♠♠♦♥ ♥st♦r ♥♦♠♥♥ ♦ ♦r③♦♥t ♥ tr♥sr ♥ t ♦t♦♥ ♦ ♣r♦r②♦ts ♦ ♦ ♥

ts♦♥ ♣r♦st trt♠♥t ♦ ♣②♦♥② ♥ sq♥ ♥♠♥t ♦ ♦

♦r ♦rr rs ❱ ♦r r r t Pr ♦♠ P②♦♥②♦ t ♥ts rst♦♥ ♥ t ♦ ♥♦s♣r♠s ♥

t♦ tt t r♦s s ♦st P♦♥♦♠s ♥ rtrts ♦r t r♦r② ♦ ♦st♥♦♠s r♦♠ t ♠st ♦ t♠ ♦ss②s

s ♥♠r t ♦r② ♥ ♥ ♠t♦ ♦r ssss♥ t t ♦ r♣t♦♥ ♦♥ s ♦♠♣♦st♦♥ s②♠♠tr② ♦ ♦ ♦

♦tr♠ ♥s t r♥ ♦ ♥♦ ♠t♦ ♦r st ♥ rt ♠t♣sq♥ ♥♠♥t ♦ ♦

Page 189: Early Evolution and Phylogeny

♠♥ r♥ t r♦s♠♥ tr ♥ tr♥sr ♥ t ♥tr ♦ tr♥♥♦t♦♥ tr

P ♥rr♥ t st♦r ♣ttr♥s ♦ ♦♦ ♦t♦♥ tr

P r ♥r t rr ♥ ②s♥ st♠t♦♥ ♦ ♥str rtr stts ♦♥♣②♦♥s ②st ♦

P♥♥② ♦♥ ❲ ♠♦t③s r♦rs ttr② ♥s t ♦rts♦♥ ♦♥strt♦♥♦ ♥str ♣r♦t♥ ♥trt♦♥ ♥t♦rs ♦r t ❩P tr♥sr♣t♦♥ t♦rs Pr♦ t ❯

P♦② P ♦t♦♥r② ♦♦② ♦♣♠♥t t t tr

P♦♥ r ♦s♦s② P♦s r♥♦r ❲♦ rst♦♣r t r♦st ♠♦♥ ❲ t♦♠t ♣②♦♥t tt♦♥ ♦ r♦♠♥t♦♥ s♥ ♥t ♦rt♠ ♦ ♦ ♦

Pr t♥ ♥♥r t Pss♦♥ r♥r Ø ♥♦♠s ♠♦s ♦ ♠r♦ st♥ t ♦♥sq♥s ♦ ♦♥str♥ts t r♦♦

Pá s P♣♣ á③s rr rt♥ sr♠② Pétr r t♣♥ t rst r♥ ♥ ♥ ♥sst② ♥ t ♦t♦♥ ♦ ♠♥♠ ♠t♦ ♥t♦rs tr

♥♥ r t ❨♥ ❩♥ ②s st♠t♦♥ ♦ s♣s r♥ t♠s ♥ ♥str ♣♦♣t♦♥s③s s♥ sq♥s r♦♠ ♠t♣ ♦ ♥ts

t♠♥♥ r ør♥s♥ ♥② r♦r t♠♣ rs♦♥ ② t ❲ rst♥ ❯s♥ ♦♦r ♥r♥ t♦ ♦♠♣r ♦t♦♥r② ②♥♠s ♦ t ♣r♦t♥ ♥t♦rs ♦ ♣②♦r ♥ P ♣r♠ P♦ ♦♠♣t ♦

♥s ♥♠♥ t r r ♦♥t ②s♥ st♠t♦♥ ♦ ♥♠♥t ♥ ♣②♦♥②②st ♦

♥s ♥♠♥ t r r ♥♦r♣♦rt♥ ♥ ♥♦r♠t♦♥ ♥t♦ ♣②♦♥② st♠t♦♥♦r r♣② ♠r♥ ♣t♦♥s ♦ ♦

s ♦rt st♠t♥ rst♦♥ rts r♦♠ ♣②♦♥t ♥♦r♠t♦♥ r♥s ♦ ♦

sr r♦♠♦s♦♠ rrr♥♠♥ts ♥ s♣t♦♥ r♥s ♦ ♦

♦rt t ss♦♥ ♣♦t♠♣rtr r ♦r t Pr♠r♥ ♦♥s s ♦♥ s♦♥s♦t♦♣s ♥ rts tr

♦s t ♦♥ r ♥♦♠ ♥s s t♦♦ ♦r ♣②♦♥ts r♥s ♦ ♦

♦s ♥t♦♥s t rr♦ ♥ ss ♥ t tr ♦ P♦ ♦

♦s♥r ♦ t ♦ ♥ s♦r♥ ♦ s♣s trs t tr ♠♦st ② ♥ trs ts ♦ t① ②st ♦

t Ptr ♦r t ♥ ♦t♥ ♦♠♥♥ sttst ♥♠♥t ♥ ♣②♦♥t ♦♦t♣r♥t♥ t♦ tt rt♦r② ♠♥ts ♦♥♦r♠ts

♥♥ ♥ r♥ r♦♥ ♦♥♥t ♥ ②r♥ ♥ P ❲♦♦t ♥ t ❲♦ ♥♥t ♥♣♥♥t s♦rt♥♦t ♦ t♦s♥s ♦ ♣t ♥ ♣rs ♥ t♦ ②st s♣s s♥r♦♠ ♦♥♦♠ ♣t♦♥ Pr♦ t ❯

Page 190: Early Evolution and Phylogeny

♥ r♥ ♦r Pr t ②♥♥ rt♥ ♥♦♠s ♥ ① t ♦t♦♥ ♦ r ♥♣r♦t♦tr ♥ ♦♥t♥t ♥♦♠ s

♥ r♥ ②♥♥ rt♥ t t s ♥♦♠ trs ♥ t ♥tr ♦ ♥♦♠ ♦t♦♥♥♥ r♦♦

t♥ r♦ ③t♦ ♥ t ♦r♥str♥ rr ❯❯❯ t P s♥ ♣r♦t♥ ♥ ♥♦♠ ♥♠♥ts ♦r ♠♣r♦ ♥ ♣rt♦♥ ♥ t ♠♥ ♥♦♠ ♥♦♠ ♦ ♣♣

tr ①♥r ♥ r♣♦r P♦② Prs♥ ♦ Prts ♦♣♦ rs♦♥♦s♣ ❲ r♦s② ♥ s♠ss♥ tt ♦② s♠t ♦rs ♠② ② r♠ r♥♥ s rt♦rs rr ②s Pr♦t r② r♦s♦♣ ♥♦♠ ♦s ♠②♥rs ♥ s♣ ♥t Pt♥ ♥t Pr ♥❲♦♥ ♥ r ❱ r ♦r♥ P♦♥s② ♥♠♥ ♦s♦♥ r②♥♥ rts t♥ ♥ ♥ qs ss♥ ss♠ rt♦♥ st♠♥ ♦r ❲r ♥ tt ❲ Pr ❨♦♥② ②♦♥ Ptr ♦r ♥t ❲ ♠s ssr r rt P ♥♥♦♥ r♦r② ♠♥ ♦♠s s♥ r ♥r ♠t ♦s ♥r s♥ rt❲♠ t s ♥♦s s♦r② ♦ ♥t♦♥ ♠♥ts ♥ r♦s♦♣ ♥♦♠s s♥♦t♦♥r② s♥trs tr

t ♦ ♣②♦♥t ♠♦s tr②♥ t♦ t ♥ ♣♥t r♥s ♥t

trt♥t t ♦③♥s② ♥rs♦♥s ♥ t r r♦♠♦s♦♠ ♦ ❲ s ♦ r♦s♦♣Ps♦♦sr ♥ r ❯s ♥ t t② ♦ t st♦r② ♦ t ♣s Pr♦ t ❯

r ❲ss t ♥s♠r ②s♥ st♦♥ ♦ ♦♥t♥♦st♠ r♦ ♥♦t♦♥r② ♠♦s ♦ ♦ ♦

r r t♦st ♠♦s ♦r ♦r③♦♥t ♥ tr♥sr t♥ r♥♦♠ tr♦ trs♣ ♥ts

r r t♥ rst♥ ♥s♠r ♥t t ❲ss ♦rt rr♣②♦♥t ♠♦s ♦r ♥②③♥ ♠t♣rtt sq♥ t ②st ♦

é♠♦♥ r t ❲♦ ♥♥t ♦♥sq♥s ♦ ♥♦♠ ♣t♦♥ rr ♣♥ ♥t

é♠♦♥ r t ❲♦ ♥♥t ♣r♦ ♥ ♦ss t♥ tr♦♦♥ ♥ ③rs tr♦ ♥♦♠ ♣t♦♥ ♥ tr ♥st♦r r♥s ♥t

♥ ♦♥ ts r ♥♥♥ t ② ♦♥♦♥♥ t♦rs ♥ tt♦♥sttst rr♦r ♦s♥t ts ♥ ♠t♣ s♦t♦♥s ♦♠♣t ♦

♦♠♣s♦♥ ♥s t s♦♥ ❯❲ ♠♣r♦♥ t s♥stt② ♦ ♣r♦rss♠t♣ sq♥ ♥♠♥t tr♦ sq♥ t♥ ♣♦st♦♥s♣ ♣ ♣♥ts ♥ t ♠tr①♦ s s

♦r♥ s♥♦ t s♥st♥ ♥ ♦t♦♥r② ♠♦ ♦r ♠①♠♠ ♦♦ ♥♠♥t♦ sq♥s ♦ ♦

♦r♥ s♥♦ t s♥st♥ ♥♥ t♦r rt② ♥ ♠♣r♦ ♦♦ ♠♦ ♦sq♥ ♦t♦♥ ♦ ♦

❲♥s ♦♥ ③②♥s t♥ ♠♥ ❲♠ t r ♦ ❲ ♦ss ♥ r♦t♦♥♦ ♦♠♣① ②s ♥ ♠rs♣ r♦s ♦s ♥str trt r♦♥strt♦♥ ♠s ♦t♦♥

Page 191: Early Evolution and Phylogeny

❲s♦♥ r t ö♦r rt rs ♦ t ♥ts ♣②♦♥t ♥ ♦♦ ①♣♥t♦♥Pr♦ t ❯

❲ rst♥ ❩♦ ②♥ ♥♥♥ t ♦r♦r ♥s ♣r♦t② ♥ r♦♠♦s♦♠①t♥t ♦ tr♥ss♣ ♣♦②♠♦r♣s♠ ♥ts

❲ rst♥ r♠r rs r sr t t♠♣ P ♦♦ ♣♣r♦ t♦♥②ss ♦ ♥t♦r t Pr♦ t ❯

❲♦♥ r♥ r r t s♥ ♦♥ P ♥♠♥t ♥rt♥t② ♥ ♥♦♠♥②ss ♥

❨♣ ❱♦♥ ♥ t ♣ rr② ♦♦t♥ ♣②♦♥t tr t ♥♦♥rrs ssttt♦♥ ♠♦s ♦ ♦

❩♦ ♦♥st♥t♥ r③♦s② ♦r t ♥♦ ♥ Pr♦t♥ ♥ sq♥tr♠♥♥ts ♦ tr♠♦♣ ♣tt♦♥ P♦ ♦♠♣t ♦

❩①②② ♦rt♥ Ptr r♦s ♦rt ♦♦tt ❲ ♦r t P♣ ♥ P②♦♥t ♥②ss ♦ ②♥♦tr ♥♦♠s q♥tt♦♥ ♦ ♦r③♦♥t ♥ tr♥sr ♥ts♥♦♠ s

❩r♥ t P♥ ♦s s ♦♠♥ts ♦ ♦t♦♥r② st♦r② ♦r ♦

Page 192: Early Evolution and Phylogeny
Page 193: Early Evolution and Phylogeny

11♦♥s♦♥

s tss s tt♠♣t t♦ t sss rt t♦ t r② ♦t♦♥ ♦ ②♥②s♥ t ♥♦♠s ♦ ①t♥t ♦r♥s♠s t r ♦♥ sttst ♣♣r♦ss t ts ♠ts t♦ rt② r ♥♦♠s

♥ rt t sttst ♣♣r♦ t♦ ♥♦♠s t♦ t ♦♥s♦♥ tt strt ♥ r♠ ♦♥t♦♥s t♥ ♣t t♦ r t♠♣rtrs ♦r rs♥ ♥ s rt♦♥s ♦ ♠t t ②♣♦tss r♦♠ ♦♦sts

♥ ♦♥s♦♥s s ♦♥ ♦♠♣rt ♥♦♠s ♥ t♦ ♦♥r♦♥tt ♥♦ ♦♠♥ r♦♠ ♦tr s♣♥s ♦r r② ♦t♦♥ ♦♦② st ♦♥② r♥t ♣♦♥t ♦ ♦♠♣rs♦♥ ♥ ♠♦s ♦ ♦t♦♥ s♦♥t r♦♠ ♦t s ♦ st② ♥ ♦♠♥ ♠♦s ♦ ♥♦♠ ♦t♦♥ t♠♦s ♦ t ♦t♦♥ ♦ t rt

♥ ts ♦♥t①t ♠♥② ①♠♣s ♦ ♥trt ♣②♦♥♦♠s ♦♠♥♥ ♣②♦♦r♣② ♠t♦♦② ♦r ♦♦② t ♥♦♠ ♦t♦♥ ♠② ♣♣r ♥ t♥①t ②rs ♥ ♣r♦ ttr ♣tr ♦ t ♦t♦♥ ♦ t rt ♥ ♦ ts♥t♥ts s ♦r ♠ t sr t♦ ♠r ♦♥ s ♣r♦ts



♣♣♥s

t♦ rts ♦♥trt t♦ r♥ ♠② ♥s ♥ ②♦♥ ♥ r♥t rst ②r ♦ ♠② ♠str ♥ ❯♣♣s ♥ ♥ rst rt st♦♥ tt♠♣ts t♦ ttr rtr③ t r ♠♦♥ts♦ ♥ ♣t♦♥s tt ♦r t t s ♦ t rtrt t sr② ♥♦♥ tt ♥ ♥t♥s ♣s ♦ ♥ ♣t♦♥ ♦r ♥ t ♦rt ♥ tr t s♣t r♦♠ ♣♦♦rts ♠♣♦①s t t♥ t s♥♥♦♥ ♥ ts ♣s ♥ ② s②st♠t② ♥ ♥ ♥②s♥ ♣②♦♥s ♦r ♥s r♦♠ ♦♥rt②♥s srs r②s ♥ ♠rs ttr ♣rs♥t ♥ tss t t t♠ st♠t tt ts ♣s ♦ ♥t♥s♣t♦♥ ♥ ♦r t ♦♥rt②♥s s♣rt r♦♠ ♦tr rtrts

s♦♥ rt st♦♥ ♦ss ♦♥ r♦♥strt♥ ♥str ♥♦♥t♥ts ♥ ♣Pr♦t♦tr t tr ♠② tt rt t♦ ♠t♦♦♥r ♦ ts ♥ s ♠①♠♠ ♣rs♠♦♥② ♥ s♠♣② ♦♥sr t♥♠r ♦ ♥s ♣rs♥t ♥ ①t♥t ♥♦♠s ♥♦t tr sq♥s s♣t ts r②r ♥tr ts ♣r♦r ♦ s t♦ st♠t tt t ♥st♦r ♦ ♣Pr♦t♦tr s r♥ r♦ ♥ ♣r♠tt t♦ s ♥♦♠ rt♦♥st t ♦r♥ ♦ ♣rsts ♥ ♥♦♠ ①♣♥s♦♥s t t ♦r♥ ♦ ♣♥t♥trt♥s♣s


❯P

♥♦♠ ♣t♦♥s ♥ srs


Phylogenetic Dating and Characterization of Gene Duplications in Vertebrates: The Cartilaginous Fish Reference

Marc Robinson-Rechavi,1 Bastien Boussau, and Vincent Laudet

Laboratoire de Biologie Moleculaire de la Cellule, UMR CNRS5161, Ecole Normale Superieure de Lyon, Lyon, France

Vertebrates originated in the lower Cambrian. Their diversification and morphological innovations have been attributed to large-scale gene or genome duplications at the origin of the group. These duplications are predicted to have occurred in two rounds, the “2R” hypothesis, or they may have occurred in one genome duplication plus many segmental duplications, although these hypotheses are disputed. Under such models, most genes that are duplicated in all vertebrates should have originated during the same period. Previous work has shown that indeed duplications started after the speciation between vertebrates and the closest invertebrate, amphioxus, but has not set a clear ending. Consideration of chordate phylogeny immediately shows the key position of cartilaginous vertebrates (Chondrichthyes) to answer this question. Did gene duplications occur as frequently during the 45 Myr between the cartilaginous/bony vertebrate split and the fish/tetrapod split as in the previous approximately 100 Myr? Although the time interval is relatively short, it is crucial to understanding the events at the origin of vertebrates. By a systematic appraisal of gene phylogenies, we show that significantly more duplications occurred before than after the cartilaginous/bony vertebrate split. Our results support rounds of gene or genome duplications during a limited period of early vertebrate evolution and allow a better characterization of these events.

1 Present address: Joint Center for Structural Genomics, University of California, San Diego, La Jolla.

Key words: shark, ray, genome duplication, 2R hypothesis, phylogeny, Chondrichthyes.

E-mail: [email protected].

Mol. Biol. Evol. 21(3):580–586. 2004

DOI: 10.1093/molbev/msh046

Advance Access publication December 23, 2003

Molecular Biology and Evolution vol. 21 no. 3

Society for Molecular Biology and Evolution 2004; all rights reserved.

Introduction

Vertebrates originated in the lower Cambrian (Shu et al. 2001), and their diversification and morphological innovations have been attributed to large-scale gene or genome duplications at the origin of the group (Ohno 1970; Holland et al. 1994). These duplications are predicted to have occurred in two rounds, the “2R” hypothesis, although it may have been one genome duplication plus many segmental duplications (Gu, Wang, and Gu 2002; McLysaght, Hokamp, and Wolfe 2002; Panopoulou et al. 2003). An interesting prediction of this hypothesis is that most genes that are duplicated in all vertebrates should have originated during the same period (for a discussion of predictions of the model, see Durand [2003]). Gene phylogenies consistent with this model are predicted to contain most duplications during a given speciation interval. The comparison of gene complexes, such as hox (Holland et al. 1994; Force, Amores, and Postlethwait 2002) or MHC (Abi-Rached et al. 2002), between species chosen for their key positions in the phylogeny of chordates, thus consistently dates a large number of gene duplications after the divergence between the amphioxus and vertebrates (fig. 1). The choice of complexes of linked genes limits the insight these studies bring into the evolution of the whole genome, because each group of linked genes only samples one locus. Studies of the distribution and age of duplicated genes in the whole human genome sequence have established that gene duplications were indeed a massive phenomenon at the origin of vertebrates (Gu, Wang, and Gu 2002; McLysaght, Hokamp, and Wolfe 2002). However, because of their reliance on only one complete genome from a chordate and their reliance on the molecular clock, these studies cannot be very precise with respect to the dating and to the order of events, although efforts were made to add more species to the gene trees. In a pioneering comparison of phylogenies of unlinked genes, the tree topologies obtained were inconsistent with a simple scenario of two rounds of tetraploidization (Hughes 1999), but no dating of events was proposed. Phylogenies of gene families from various chordates show similar numbers of duplications before and after the lamprey/hagfish/gnathostome split, but results are not explained simply by two tetraploidizations (Escriva et al. 2002). All of these results are consistent with periods of intensive gene duplication, rather than genome duplication (Gu, Wang, and Gu 2002), although a recent phylogenetic study challenges even this scenario (Friedman and Hughes 2003).

Overall, there is support for a large number of gene duplications after the divergence between cephalochordates and vertebrates (Panopoulou et al. 2003), both before and after the lamprey/hagfish/gnathostome split (Escriva et al. 2002). This possibility leaves an important question mark on the ending time of the duplication events, which could represent a punctual event or could have occurred gradually over a period of 160 to 300 Myr. Consideration of chordate phylogeny (fig. 1) immediately shows the key position of chondrichthyans: if the massive gene duplications occurred almost exclusively before or after the chondrichthyan (cartilaginous vertebrates)/teleostome (bony vertebrates) split, this event supports “rounds” of duplications during a limited period of early vertebrate evolution. Otherwise, if gene duplications are evenly spread over the period between the cephalochordate/vertebrate split and the actinopterygian/sarcopterygian split, there is no evidence for these “rounds,” but rather for a long period during which duplication was more frequent than in sarcopterygian evolution. Most studies do not include chondrichthyans, with the exception of two genes linked to the MHC, which were shown to be duplicated before the divergence of chondrichthyans and teleostomes (Abi-Rached et al. 2002).

Lack of chondrichthyan genome data has led us to use the gene phylogeny approach to solve the question of when vertebrate-specific gene duplications did happen, by constructing phylogenetic trees of many protein-coding genes sequenced in Chondrichthyes. As mentioned above, if there were two major rounds of duplication, whether of genes or genomes, we would expect most gene families to show similar relative timing of speciation and duplication events. It should be noted that we are only interested in vertebrate-specific duplications here. Duplications that predate the chordate/arthropod/nematode split (approximately the origin of bilaterian animals), or more recent duplications such as frequently observed in actinopterygian fishes (Robinson-Rechavi et al. 2001), are outside the scope of this study.
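The question of whether duplications were concentrated before the chondrichthyan/teleostome split can be framed as a comparison against a constant-rate null. The following is a minimal sketch in Python, assuming the interval lengths quoted in the abstract (about 100 Myr before the split, 45 Myr after). The function names are hypothetical, and this is an illustration of the reasoning, not necessarily the test used in the study.

```python
from math import comb

# Interval lengths in Myr, taken from the abstract and treated here as assumptions.
MYR_BEFORE = 100.0
MYR_AFTER = 45.0

# Under a constant duplication rate, the chance that a given duplication
# falls before the split is proportional to the length of that interval.
P_BEFORE_NULL = MYR_BEFORE / (MYR_BEFORE + MYR_AFTER)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def excess_before_split(n_before: int, n_after: int) -> float:
    """One-sided p-value: are there more duplications before the split
    than a constant rate over both intervals would predict?"""
    return binomial_upper_tail(n_before, n_before + n_after, P_BEFORE_NULL)
```

Calling `excess_before_split` with counts of dated duplications (before, after) returns a small value only when the pre-split excess is larger than the difference in interval length alone would explain.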

Materials and Methods

Data Set

A first selection of gene families was done on Hovergen (Duret, Mouchiroud, and Gouy 1994) version 42 (April 2002), with the following criteria: at least one Chondrichthyes sequence, sequences from at least two Teleostome classes (to distinguish vertebrate-specific and class-specific duplications), and exclusion of mitochondrion-encoded genes. These criteria selected 149 gene families, as defined in Hovergen, including 415 chondrichthyan protein sequences. Protein alignments corresponding to the selected families were saved from Hovergen and checked using Seaview (Galtier, Gouy, and Gautier 1996). Outgroup sequences were added by Blast (Altschul et al. 1990) searches on Swissprot+TrEMBL (Boeckmann et al. 2003), excluding results from Vertebrata and from viruses, as implemented at PBIL (Perriere et al. 2003), and by Blast searches on the genome sequences of Drosophila melanogaster (Adams et al. 2000), Caenorhabditis elegans (The C. elegans Sequencing Consortium 1998), Ciona intestinalis (Dehal et al. 2002), and Anopheles gambiae (Holt et al. 2002). Twelve gene families for which no outgroup sequence could be reliably identified were excluded.
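The three taxonomic selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the `GeneFamily` record, the class labels, and the `TELEOSTOME_CLASSES` set are hypothetical stand-ins and do not reflect the actual Hovergen schema or query interface.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical labels for bony vertebrate (teleostome) classes.
TELEOSTOME_CLASSES = {"Actinopterygii", "Amphibia", "Reptilia", "Aves", "Mammalia"}


@dataclass(frozen=True)
class GeneFamily:
    name: str
    taxon_classes: Tuple[str, ...]  # one class label per sequence in the family
    mitochondrial: bool             # True if the gene is mitochondrion-encoded


def passes_selection(family: GeneFamily) -> bool:
    """Apply the three criteria: no mitochondrion-encoded genes, at least one
    Chondrichthyes sequence, and at least two distinct teleostome classes."""
    if family.mitochondrial:
        return False
    classes = set(family.taxon_classes)
    has_chondrichthyan = "Chondrichthyes" in classes
    n_teleostome_classes = len(classes & TELEOSTOME_CLASSES)
    return has_chondrichthyan and n_teleostome_classes >= 2
```

Requiring two teleostome classes is what lets a duplication shared by, say, mammals and actinopterygians be told apart from one confined to a single class.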

Gene families with duplications predating the arthropod/nematode/chordate divergence (fig. 2A) were split into subfamilies, which were then evaluated separately for vertebrate-specific duplications. In cases of a vertebrate gene without any known mammalian ortholog, additional Blast searches were done on the human genome (International Human Genome Sequencing Consortium 2001). In all Blast searches, an expect value of 0.01 and the default filter for repeated sequences were used, and potential new genes were assessed for relevance to our study by a phylogenetic analysis. Once gene trees were built (see below), 86 gene families were found to yield phylogenies that could not be interpreted for dating of events at the origin of vertebrates (see Results). Notably, insufficient phylogenetic resolution was diagnosed when the gene tree was strongly inconsistent with the expected species phylogeny (for example, lamprey grouping with chicken and mammals not monophyletic [NPY gene family]) with very low bootstrap support (i.e., under 50%).

Phylogeny

All analyses were done using only complete sites (nogap, no X). When the inclusion of partial sequences led toless than 50 complete sites in the alignment, thesesequences were excluded manually in Phylo_win (Galtier,Gouy, and Gautier 1996), taking care to keep representa-tives of each taxonomic group (i.e., actinopterygians,sarcopterygians, chondrichthyans, and outgroup) and ofeach paralog, as much as possible. Sequences that did notpass a v2 test for homogeneity of amino acid composition(as implemented in Tree-Puzzle [Schmidt et al. 2002])

FIG. 1.—Possible timing of duplication events in chordate phylogeny. Schematic view of phylogenetic relations between chordates and possibletiming of rounds of gene or genome duplication according to recent results (not including this work). The black bar represents relative confidence that

duplications occurred essentially after the cephalochordate/vertebrate split, whereas the gray area represents the incertitude over the period when theduplication ended. Divergence dates are according to the fossil record (Samson, Smith, and Smith 1996; Shu et al. 1999; Zhu, Xiaobo, and Janvier1999; Basden et al. 2000; Shu et al. 2001); molecular clock dates are shown in parentheses (Nikoh et al. 1997; Kumar and Hedges 1998). Although thetopology (urochordates, [cephalochordates, vertebrates]) is well established, the corresponding dates of divergence are not known, apart from estimatesof the date of apparition of chordates, given here as a conservative estimate of the first divergence among chordates.

Cartilaginous Fish Reference for Gene Duplications 581

Page 213: Early Evolution and Phylogeny

were excluded. This exclusion meant that six gene familiesno longer fulfilled the conditions set in terms of speciessampling and were thus excluded from the data set. Treeswere constructed using Neighbor-Joining (Saitou and Nei1987) with distances corrected for multiple substitutionsunder a gamma model of rate heterogeneity (Yang 1996);the alpha parameter of the gamma model was estimated foreach alignment by Tree-Puzzle version 5.1 (Schmidt et al.2002) with eight rate categories, using default parameters.The following topologies were systematically comparedby an SH likelihood test (Shimodaira and Hasegawa1999), under the VT substitution model (Muller andVingron 2000) with a c model of rate heterogeneity, asimplemented in Tree-Puzzle 5.1 (Schmidt et al. 2002): (1)species tree with no duplication (fig. 2A), (2) duplicationafter the chondrichthyan/teleostome split (fig. 2B), (3)

duplication before the chondrichthyan/teleostome split (fig. 2C). When there were more than two vertebrate paralogs, all relative positions of the chondrichthyan/teleostome split and the duplications were compared; for example (Chondr (Teleos-a (Teleos-b, Teleos-c))) versus (Teleos-a (Chondr (Teleos-b, Teleos-c))) versus (Teleos-a (Teleos-b (Chondr, Teleos-c))) versus ((Teleos-a, Chondr), (Teleos-b, Teleos-c)) and so on. Results were considered supported if the likelihood of the favored topology was significantly higher than that of the best alternative topology (SH test; P < 0.05). Other results are classified as “not supported.” It should be noted that we are only interested in the relative order of events of gene duplication and the chondrichthyan/teleostome split. Thus, teleost fish-specific duplications, as well as contradictions between gene phylogeny and teleostome phylogeny, as long as the latter were not statistically supported (they never were), were not taken into consideration to classify phylogenetic results, as far as they do not hamper interpretation of the trees. Moreover, when there were inaccuracies in teleostome phylogeny in the Neighbor-Joining tree, likelihood tests were performed under both the Neighbor-Joining and the species topology; significance of results was robust to the change.
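The set of alternative topologies compared when there are several vertebrate paralogs can be sketched as follows. This is only an illustration, not the procedure actually used to build the trees submitted to Tree-Puzzle: it assumes a rooted binary tree of teleostome paralogs written as nested tuples, and the function names `placements` and `to_newick` are hypothetical. The chondrichthyan sequence is attached in turn to every branch of the paralog tree, which reproduces the example topologies listed above.

```python
def placements(tree, new_leaf):
    """Yield every rooted tree obtained by attaching new_leaf to one branch of tree.

    tree is either a leaf name (str) or a 2-tuple of subtrees (assumed layout).
    """
    # Attach the new leaf on the branch leading to this node.
    yield (new_leaf, tree)
    if isinstance(tree, tuple):
        left, right = tree
        # Attach somewhere inside the left subtree, keeping the right one intact.
        for sub in placements(left, new_leaf):
            yield (sub, right)
        # Attach somewhere inside the right subtree, keeping the left one intact.
        for sub in placements(right, new_leaf):
            yield (left, sub)


def to_newick(tree):
    """Write a nested-tuple tree as a Newick-like string (no branch lengths)."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(t) for t in tree) + ")"
    return tree


# Three teleostome paralogs, with paralog a branching first (illustrative input).
paralog_tree = ("Teleos-a", ("Teleos-b", "Teleos-c"))
for candidate in placements(paralog_tree, "Chondr"):
    print(to_newick(candidate))
```

For a paralog tree with three leaves this yields five candidate positions, one per branch including the root branch; each candidate would then be one of the topologies compared by the likelihood test.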

Results

We selected gene families for the study in three steps: (1) selection on taxonomic criteria (sampling of cartilaginous and bony vertebrates, outgroup sequence); (2) manual consideration of phylogenetic trees, to assess whether the gene families are appropriate to the question being asked; and (3) evaluation of phylogenetic robustness. Notably, a total of 86 gene families were eliminated in step 2. The main causes limiting interpretation were (1) after splitting into vertebrate-specific subfamilies, some genes no longer fulfill the conditions set in terms of species sampling (typically the chondrichthyan sequence fell in a subtree with mammalian sequences and no other taxa); (2) very short sequences (NPY genes for example) with no

FIG. 2.—Classification of gene family phylogenies. Three schematic phylogenies, illustrating the possible interpretations of the order of the timing of gene duplications in a gene family. The taxon names represent gene sequences from these taxa, and “outgroup” represents sequences from nonvertebrate species. The branch(es) that should be tested for the classification of the gene family to be supported are in boldface. (A) No vertebrate-specific duplication occurred, although gene duplications may (or may not) have occurred before the divergence of chordates from other animal lineages. (B) Vertebrate-specific gene duplication after the chondrichthyan/teleostome split. (C) Vertebrate-specific gene duplication before the chondrichthyan/teleostome split; the broken line indicates that the conclusion can be reached even if only one chondrichthyan homolog has been sequenced.

FIG. 3.—Gene duplication history in chordates. Present knowledge (including this work) of rounds of gene duplication mapped on the schematic view of phylogenetic relations between chordates. Black boxes represent characterized rounds of duplication, white boxes represent characterized periods with little accumulation of duplicate genes, and question marks represent lack of data to characterize duplications.

582 Robinson-Rechavi et al.

Page 214: Early Evolution and Phylogeny

phylogenetic resolution; (3) extremely conserved sequences with no phylogenetic resolution (histones for example); (4) clustered multigene families for which conversion and recombination are well documented, typically from the immunological system; and (5) other genes with no phylogenetic resolution, such as hox genes, which include a very conserved homeodomain, with little information, the rest of the sequence being very divergent and with little information also (see a discussion in Force, Amores, and Postlethwait [2002]). It may be noted that while this selection mostly reduced the number of gene families used, splitting families with duplications predating the arthropod/nematode/chordate divergence increased the number of phylogenies analyzed (two additional “families” of proteasome beta subunit genes and one additional “family” of tyrosine phosphatase genes [see table 1 in Supplementary Material online]). Overall, the three steps of selection led us from 149 gene families with cartilaginous and bony vertebrate homologous sequences to 48 gene families whose evolutionary history can be used to date duplication events at the origin of vertebrates (eliminated gene families in table 3 of Supplementary Material online), a figure very similar to the numbers of genes analyzed in recent studies using the same approach in other organisms (e.g., Langkjaer et al. 2003; Taylor et al. 2003).

Results for each gene family are detailed in the first table and the figures in the Supplementary Material online at www.mbe.oupjournals.org. Gene families with a duplication before the chondrichthyan/teleostome split (fig. 2C) clearly represent the majority of gene families we analyzed, including all 19 genes with significant phylogenetic resolution (table 1). Among the other 29 gene families, phylogenetic resolution is not significant at the chondrichthyan/teleostome divergence level (table 1). These include the only two gene families indicating a duplication after the chondrichthyan/teleostome split: a mannose-binding lectin, or tetranectin (HBG008208), and the PTP1D tyrosine phosphatase (“tyrosines phosphatases (1)” in the Supplementary Material online). Of note, a different result was found for PTP1D in a previous study that did not include all available mammalian sequences (Ono-Koyanagi et al. 2000). Finally, 15 genes show no evidence for any vertebrate-specific duplication. Our classification of these trees as “not supported” means that the species tree was not significantly more likely than other positions of chondrichthyans. This is consistent with a previous study in which individual nuclear genes had low power in solving the phylogenetic position of chondrichthyans (Martin 2001). The low phylogenetic resolution for the position of chondrichthyans among vertebrates is also consistent with the small divergence time between chondrichthyans and teleostomes reported in the fossil record (fig. 1). By contrast, the good phylogenetic resolution for the position of vertebrate-specific gene duplications may imply that the divergence time between these duplications and the chondrichthyan/teleostome split was substantial and that the duplications occurred early in vertebrate evolution.

It is possible that the observed distribution of gene duplications simply reflects the difference between the time intervals considered as “before the chondrichthyan/teleostome split” and “after the chondrichthyan/teleostome split.” To test this, let us consider only the 27 gene families for which we have a vertebrate-specific duplication and a chordate outgroup (table 1: 16 + 9 + 2 = 27), since they allow a more precise dating of events. If we use paleontological datings (fig. 1), the interval between chordate diversification and the chondrichthyan/teleostome split is 98 Myr, whereas the interval between this and the sarcopterygian/actinopterygian split is 45 Myr. Then we expect 31% (45/[98 + 45]) of vertebrate-specific gene duplications to be after the chondrichthyan/teleostome (C/T) split, under the assumption of a constant rate of gene duplication; the 95% confidence interval of this estimate is 14% to 49% ($f \pm 1.96\sqrt{\mathrm{var}}$, with $\mathrm{var} = f(1 - f)/N$; $f = 0.31$; $N = 27$). If we use molecular clock estimates of divergence dates (fig. 1), we expect 26% of gene duplications after C/T (confidence interval = 9.5% to 43%). The observed proportion of 7.4% (2/27) is significantly lower than expected by chance in either dating system (outside of the 95% confidence intervals). This conclusion holds true if we only use the 16 significantly supported phylogenies with a chordate outgroup (table 1): the observed proportion of duplications after the C/T split is 0%, whereas the expected value’s confidence interval is either 8.7% to 54% (paleontological dates), or 4.5% to 47% (molecular clock dates). Thus, gene duplications are significantly less frequent after than before the chondrichthyan/teleostome split, taking into account evolutionary time.
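The expected proportion and its confidence interval can be recomputed with a few lines of code. The sketch below assumes the normal approximation to a binomial proportion with variance f(1 − f)/N, which is the reading of the formula most consistent with the intervals quoted in the text; the function name `expected_fraction_ci` is illustrative and not from the original study.

```python
import math


def expected_fraction_ci(interval_after, interval_before, n_families, z=1.96):
    """Expected fraction of duplications after the split, with a normal-approximation CI.

    Assumes a constant duplication rate, so the expected fraction is the
    share of total time that falls after the split.
    """
    f = interval_after / (interval_before + interval_after)
    half_width = z * math.sqrt(f * (1.0 - f) / n_families)
    return f, f - half_width, f + half_width


# Paleontological dates from the text: 98 Myr before and 45 Myr after the C/T split.
for n in (27, 16):
    f, low, high = expected_fraction_ci(45, 98, n)
    print(f"N={n}: expected {f:.1%}, 95% CI {low:.1%} to {high:.1%}")

# Observed proportion among the 27 families with a chordate outgroup.
observed = 2 / 27
print(f"observed {observed:.1%}")
```

With these inputs the interval for N = 27 is roughly 14% to 49% and for N = 16 roughly 8.7% to 54%, matching the values quoted in the text; the observed 7.4% falls below the lower bound for N = 27.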

Although our data set is not meant for detailed testing of duplication hypotheses in other branches of the tree, it is interesting to compare duplications that appear specific to either of the two major branches of teleostomes: out of 48 gene families, there are three with sarcopterygian-specific duplications and eight with actinopterygian-specific

Table 1
Distribution of Duplication Histories of Gene Families

                                    Duplication Timing
                   None                 Before C/T Split     After C/T Split
Outgroup           Chordate   Other     Chordate   Other     Chordate           Total
Significant        0          0         16         3         0                  19
Not significant    12         3         9          3         2                  29
Total              15 (31%)             31 (65%)             2 (4%)             48

NOTE.—Numbers of gene families supporting each evolutionary history; no gene is counted twice. Phylogenetic support is noted as “significant” if the position of the chondrichthyan gene(s) is supported by a likelihood test (P < 0.05). The “C/T” split is the divergence between Chondrichthyes and Teleostomi. Chordate outgroups include ascidians, amphioxuses, lampreys, and hagfishes.


Page 215: Early Evolution and Phylogeny

duplications (see the second table in the Supplementary Material online at www.mbe.oupjournals.org), consistent with previous observations (Robinson-Rechavi et al. 2001). Interestingly, these more recent duplications concern 28% of the 32 gene families for which we have observed gene duplications ancestral to vertebrates but only 12.5% of the 16 gene families without vertebrate-specific duplications.

Discussion

The “2R” hypothesis, modified from Ohno (1970), can be summarized by the idea that major duplication events occurred specifically in chordate genomes before the emergence of bony vertebrates. This hypothesis predicts that duplications should have occurred over a short period of time, in much greater numbers than in the previous or following periods. This prediction is shared by more recent hypotheses that there may have been one genome duplication and one major wave of segmental duplications (Gu, Wang, and Gu 2002; McLysaght, Hokamp, and Wolfe 2002; Panopoulou et al. 2003). The beginning time has been relatively well established, with studies showing that gene duplications occurred after the cephalochordate/vertebrate split and both before and after the gnathostome/jawless vertebrate split (Pennisi 2001; Wolfe 2001; Abi-Rached et al. 2002; Escriva et al. 2002; Gu, Wang, and Gu 2002; McLysaght, Hokamp, and Wolfe 2002; Panopoulou et al. 2003), but these studies did not set an ending time to these events. Given the prevalence of gene duplications in actinopterygian fishes (Wittbrodt, Meyer, and Schartl 1998; Robinson-Rechavi et al. 2001; Taylor et al. 2001), this raises the question of whether something specific really happened at the origin of vertebrates or whether gene duplications have been a common phenomenon throughout chordate evolutionary history, with the exception of sarcopterygians.

It is indeed noticeable that there has been no report of genome duplications ancestral to sarcopterygians (Pennisi 2001; Wolfe 2001; Durand 2003) or to any of the well-studied groups therein (e.g., tetrapods, mammals, or sauropsids). Our own data are consistent with previous observations (Robinson-Rechavi et al. 2001; Taylor et al. 2001) that duplicate genes are significantly less abundant in sarcopterygians than in actinopterygians. Analysis of invertebrate chordate data also indicates that gene duplications are not abundant in these lineages (Dehal et al. 2002; Panopoulou et al. 2003).

Comparison of MHC-associated genes gave limited evidence for duplications before the chondrichthyan/teleostome split from two genes (Abi-Rached et al. 2002). Our results show that this pattern is general, with almost all vertebrate-specific gene duplications occurring before the chondrichthyan/teleostome split (table 1). This, added to all the previously published evidence, implies three waves of gene or genome duplications, two between the cephalochordate split and the chondrichthyan split and the other in actinopterygian fishes, separated by a period of “duplication calm” of about 45 Myr (which continued for 400 Myr in tetrapods), which, although short, is significant. A major prediction of Ohno’s (1970) original hypothesis, that of intense gene or genome duplication activity before the origin of vertebrates, is thus confirmed by the study.

Moreover, our results show that these gene duplications characterize all the jawed vertebrates and predict similar genetic complexity in sharks and rays as in tetrapods. Consistent results are found for the evolution of hox clusters, which allow a direct connection between block duplications and morphological adaptations. Although hox genes are very poor phylogenetic markers, as illustrated by the difficulty in resolving the events that led to the different clusters of gnathostomes and lampreys (Force, Amores, and Postlethwait 2002; Irvine et al. 2002), partial sequences from the horn shark indicate that the duplications that led to four hox clusters in teleostomes occurred before the chondrichthyan/teleostome divergence (Kim et al. 2000). Moreover, horn shark and human hoxA clusters are remarkably conserved (Chiu et al. 2002). Thus, hox cluster analysis and our phylogenetic results are consistent in establishing no relation between gene duplications and the larger diversity of bony vertebrates than of cartilaginous vertebrates.

Although the basal branching of chondrichthyans among jawed vertebrates is considered extremely well supported by morphological and paleontological data (Janvier 1996), the analysis of complete mitochondrial sequences suggests a very different phylogeny, with chondrichthyans branching among bony ray-finned fishes (Actinopterygii) (Rasmussen and Arnason 1999). This surprising result has not been confirmed by any other source of data, and molecular phylogenies based on nuclear-encoded genes either are not informative (Martin 2001; this study) or strongly support the conventional branching position of chondrichthyans (Takezaki et al. 2003). In any case, our results show that vertebrate-specific gene duplications occurred before the divergence between chondrichthyans, actinopterygians, and sarcopterygians, whatever the order of these latter events.

Our results are at odds with a recent study that used a similar approach, dating gene duplications by their phylogenetic position relative to speciation events (Friedman and Hughes 2003). There are several differences between our methodology and that of Friedman and Hughes, but the main difference is the criterion for classifying gene duplications within speciation intervals. We consider genes to be duplicated within a given interval (i.e., between chordate diversification and the chondrichthyan/teleostome split) only if all relevant taxonomic groups (and thus speciations) are represented in the gene tree (i.e., a urochordate or a cephalochordate, a chondrichthyan, and a teleostome). Friedman and Hughes (2003) classify duplications as soon as they can be dated before or after one speciation. Moreover they used very distant dating points (i.e., the primate/rodent, amniote/amphibian, and deuterostome/protostome splits). It is unclear why they did not date duplications relative to the actinopterygian/sarcopterygian split, because this speciation would have been more relevant to the “2R” controversy, while taking advantage of genome data. As amphibians are the only lineage involved for which a genome sequence is not available, this may lead them


Page 216: Early Evolution and Phylogeny

to include in the “before primate/rodent” category gene duplications that occurred before the amphibian/amniote split but for which they do not have amphibian sequences in the tree. This in turn may introduce a bias in their argument that the abundance of “before primate/rodent” versus “before amniote/amphibian” duplications is evidence against a peak of gene duplications at the origin of vertebrates. We believe that in our study, the division of the sequences into major taxonomic units, and our separation of the results according to the outgroup sequences used (table 1), preserve our results from such biases. Thus, differences in the conclusions between that study (Friedman and Hughes 2003) and ours probably reflect different sampling strategies.

An interesting side observation from our data set is that observations of gene duplications at the origin of vertebrates, and more recently in either the actinopterygian or sarcopterygian lineage, appear correlated. This may be the result of sampling; for example, better detection of duplications in more studied genes. Alternatively, it may indicate that the function of certain genes makes them more prone to persisting as duplicate copies. Such a tendency has indeed been recently shown in yeasts, where certain genes are retained independently as duplicates in different species (Hughes and Friedman 2003).

This study and other recent studies draw an increasingly precise picture of gene or genome duplication waves in chordates (fig. 3), although questions remain. Among the six branches of the chordate tree for which sufficient data are available, three are characterized by abundant preservation of duplicate genes, all of them in vertebrates. It has also been suggested on the basis of chromosome counts that polyploidy played an important part in lamprey evolution (Potter and Rothwell 1970). Of course it is probable that small-scale duplications have been continuous on all branches of the tree (Lynch and Conery 2000; Gu, Wang, and Gu 2002). However, large-scale duplications seem to have been frequent in vertebrate evolution, and the branches where they are absent, such as the origin of bony vertebrates, appear as the exception rather than the rule.

Acknowledgments

We thank Hector Escriva and Manolo Gouy for critical reading. This work was supported by the CNRS and the ENS Lyon.

Literature Cited

Abi-Rached, L., A. Gilles, T. Shiina, P. Pontarotti, and H. Inoko. 2002. Evidence of en bloc duplication in vertebrate genomes. Nat. Genet. 22:22.

Adams, M. D., S. E. Celniker, R. A. Holt et al. (195 co-authors). 2000. The genome sequence of Drosophila melanogaster. Science 287:2185–2195.

Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

Basden, A. M., G. C. Young, M. I. Coates, and A. Ritchie. 2000. The most primitive osteichthyan braincase? Nature 403:185–188.

Boeckmann, B., A. Bairoch, R. Apweiler et al. (12 co-authors). 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365–370.

Chiu, C.-H., C. Amemiya, K. Dewar, C.-B. Kim, F. H. Ruddle, and G. P. Wagner. 2002. Molecular evolution of the hoxA cluster in the three major gnathostome lineages. Proc. Natl. Acad. Sci. USA 99:5492–5497.

Dehal, P., Y. Satou, R. K. Campbell et al. (87 co-authors). 2002. The draft genome of Ciona intestinalis: insights into Chordate and Vertebrate origins. Science 298:2157–2167.

Durand, D. 2003. Vertebrate evolution: doubling and shuffling with a full deck. Trends Genet. 19:2–5.

Duret, L., D. Mouchiroud, and M. Gouy. 1994. HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360–2365.

Escriva, H., L. Manzon, J. Youzon, and V. Laudet. 2002. Analysis of lamprey and hagfish genes reveals a complex history of gene duplications during early vertebrate evolution. Mol. Biol. Evol. 19:1440–1450.

Force, A., A. Amores, and J. H. Postlethwait. 2002. Hox cluster organization in the jawless vertebrate Petromyzon marinus. J. Exp. Zool. 294:30–46.

Friedman, R., and A. L. Hughes. 2003. The temporal distribution of gene duplication events in a set of highly conserved human gene families. Mol. Biol. Evol. 20:154–161.

Galtier, N., M. Gouy, and C. Gautier. 1996. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12:543–548.

Gu, X., Y. Wang, and J. Gu. 2002. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31:205–209.

Holland, P. W., J. Garcia-Fernandez, N. A. Williams, and A. Sidow. 1994. Gene duplications and the origins of vertebrate development. Development (suppl):125–133.

Holt, R. A., G. M. Subramanian, A. Halpern et al. (123 co-authors). 2002. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298:129–149.

Hughes, A. L. 1999. Phylogenies of developmentally important proteins do not support the hypothesis of two rounds of genome duplication early in vertebrate history. J. Mol. Evol. 48:565–576.

Hughes, A. L., and R. Friedman. 2003. Parallel evolution by gene duplication in the genomes of two unicellular fungi. Genome Res. 13:794–799.

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

Irvine, S. Q., J. L. Carr, W. J. Bailey, K. Kawasaki, N. Shimizu, C. T. Amemiya, and F. H. Ruddle. 2002. Genomic analysis of hox clusters in the sea lamprey Petromyzon marinus. J. Exp. Zool. 294:47–62.

Janvier, P. 1996. Early vertebrates. Clarendon Press, Oxford.

Kim, C.-B., C. Amemiya, W. Bailey, K. Kawasaki, J. Mezey, W. Miller, S. Minoshima, N. Shimizu, G. Wagner, and F. Ruddle. 2000. Hox cluster genomics in the horn shark, Heterodontus francisci. Proc. Natl. Acad. Sci. USA 97:1655–1660.

Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.

Langkjaer, R. B., P. F. Cliften, M. Johnston, and J. Piskur. 2003. Yeast genome duplication was followed by asynchronous differentiation of duplicated genes. Nature 421:848–852.

Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155.

Martin, A. 2001. The phylogenetic placement of chondrichthyes: inferences from analysis of multiple genes and implications for comparative studies. Genetica 111:349–357.


Page 217: Early Evolution and Phylogeny

McLysaght, A., K. Hokamp, and K. H. Wolfe. 2002. Extensive genomic duplication during early chordate evolution. Nat. Genet. 31:200–204.

Muller, T., and M. Vingron. 2000. Modeling amino acid replacement. J. Comput. Biol. 7:761–776.

Nikoh, N., N. Iwabe, K. Kuma et al. (11 co-authors). 1997. An estimate of divergence time of Parazoa and Eumetazoa and that of Cephalochordata and Vertebrata by aldolase and triose phosphate isomerase clocks. J. Mol. Evol. 45:97–106.

Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Heidelberg.

Ono-Koyanagi, K., H. Suga, K. Katoh, and T. Miyata. 2000. Protein tyrosine phosphatases from amphioxus, hagfish, and ray: divergence of tissue-specific isoform genes in the early evolution of vertebrates. J. Mol. Evol. 50:302–311.

Panopoulou, G., S. Hennig, D. Groth, A. Krause, A. J. Poustka, R. Herwig, M. Vingron, and H. Lehrach. 2003. New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res. 13:1056–1066.

Pennisi, E. 2001. Genome duplications: the stuff of evolution? Science 294:2458–2460.

Perriere, G., C. Combet, S. Penel et al. (11 co-authors). 2003. Integrated databanks access and sequence/structure analysis services at the PBIL. Nucleic Acids Res. 31:3393–3399.

Potter, I. C., and B. Rothwell. 1970. The mitotic chromosomes of the lamprey, Petromyzon marinus L. Experientia 26:429–430.

Rasmussen, A. S., and U. Arnason. 1999. Molecular studies suggest that cartilaginous fishes have a terminal position in the piscine tree. Proc. Natl. Acad. Sci. USA 96:2177–2182.

Robinson-Rechavi, M., O. Marchand, H. Escriva, P.-L. Bardet, D. Zelus, S. Hughes, and V. Laudet. 2001. Euteleost fish genomes are characterized by expansion of gene families. Genome Res. 11:781–788.

Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

Samson, I. J., M. M. Smith, and M. P. Smith. 1996. Scales of thelodont and shark-like fishes from the Ordovician of Colorado. Nature 379:628–630.

Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.

Shu, D. G., L. Chen, J. Han, and X. L. Zhang. 2001. An early Cambrian tunicate from China. Nature 411:472–473.

Shu, D. G., H. L. Luo, S. Conway Morris, X. L. Zhang, S. X. Hu, L. Chen, L. Han, M. Zhu, Y. Li, and L. Z. Chen. 1999. Lower Cambrian vertebrates from south China. Nature 402:42–46.

Takezaki, N., F. Figueroa, Z. Zaleska-Rutczynska, and J. Klein. 2003. Molecular phylogeny of early vertebrates: monophyly of the agnathans revealed by sequences of 35 genes. Mol. Biol. Evol. 20:287–292.

Taylor, J. S., I. Braasch, T. Frickey, A. Meyer, and Y. Van de Peer. 2003. Genome duplication: a trait shared by 22,000 species of ray-finned fish. Genome Res. 13:382–390.

Taylor, J. S., Y. Van de Peer, I. Braasch, and A. Meyer. 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos. Trans. R. Soc. Lond. B Biol. Sci. 356:1661–1679.

The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012–2018.

Wittbrodt, J., A. Meyer, and M. Schartl. 1998. More genes in fish? Bioessays 20:511–515.

Wolfe, K. H. 2001. Yesterday’s polyploids and the mystery of diploidization. Nat. Rev. Genet. 2:333–341.

Yang, Z. 1996. Among-site variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11:367–371.

Zhu, M., Y. Xiaobo, and P. Janvier. 1999. A primitive fossil fish sheds light on the origin of bony fishes. Nature 397:607–610.

Herve Phillippe, Associate Editor

Accepted October 25, 2003


Page 218: Early Evolution and Phylogeny


Page 219: Early Evolution and Phylogeny

Computational inference of scenarios for α-proteobacterial genome evolution

Bastien Boussau, E. Olof Karlberg, A. Carolin Frank, Boris-Antoine Legault, and Siv G. E. Andersson†

Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, S-752 36 Uppsala, Sweden

Edited by Stanley Falkow, Stanford University, Stanford, CA, and approved May 18, 2004 (received for review February 11, 2004)

The α-proteobacteria, from which mitochondria are thought to have originated, display a 10-fold genome size variation and provide an excellent model system for studies of genome size evolution in bacteria. Here, we use computational approaches to infer ancestral gene sets and to quantify the flux of genes along the branches of the α-proteobacterial species tree. Our study reveals massive gene expansions at branches diversifying plant-associated bacteria and extreme losses at branches separating intracellular bacteria of animals and humans. Alterations in gene numbers have mostly affected functional categories associated with regulation, transport, and small-molecule metabolism, many of which are encoded by paralogous gene families located on auxiliary chromosomes. The results suggest that the α-proteobacterial ancestor contained 3,000–5,000 genes and was a free-living, aerobic, and motile bacterium with pili and surface proteins for host cell and environmental interactions. Approximately one third of the ancestral gene set has no homologs among the eukaryotes. More than 40% of the genes without eukaryotic counterparts encode proteins that are conserved among the α-proteobacteria but for which no function has yet been identified. These genes that never made it into the eukaryotes but are widely distributed in bacteria may represent bacterial drug targets and should be prime candidates for future functional characterization.

Fundamental questions subjected to much debate concern the extent to which microbial genomes are related by vertical descent versus horizontal gene transfer (1–5). A direct approach to address these questions is to estimate frequencies of deletions/duplications and horizontal gene transfers for closely related species and compare these estimates with estimates of nucleotide substitution rates. The α-proteobacteria provide an excellent model system for such studies because genome size variation in this subdivision spans the entire size range for bacteria, from 1 Mb in Rickettsia spp. to 9 Mb in Bradyrhizobium japonicum (6–12). Furthermore, there is an amazing variation in lifestyle characteristics in this subdivision, including both obligate (Rickettsia and Wolbachia) and facultative (Bartonella and Brucella) intracellular bacteria as well as soil-borne plant symbionts and pathogens (Sinorhizobium, Agrobacterium, and Bradyrhizobium), which enables correlations between gene contents and lifestyle features to be examined.

The α-proteobacterial group has also attracted much interest because one of its descending lineages is thought to be the ancestor of mitochondria (13, 14). The acquisition of mitochondria represents one of the earliest and most extreme cases of horizontal gene transfer events known in the history of life. Phylogenetic studies suggest that 630 eukaryotic genes were transferred from the α-proteobacteria to the eukaryotes, including many genes coding for modern mitochondrial protein functions (15). For the majority of mitochondrial proteins, however, no bacterial homologs were identified, indicating that they were derived from nuclear, eukaryotic genomes via intragenomic duplication and sequence divergence (14–16).

Based on results from pairwise genome comparisons, it has been suggested that there is a correlation between genome size alterations, microbial population sizes, and growth habitats (17). For example, it has been shown that free-living bacterial species of large population sizes accumulate insertion/deletion and rearrangement mutations relative to nucleotide substitutions at much higher frequencies than host-dependent bacteria of small population sizes, in which the influence of horizontal gene transfers has been negligible (17). Algorithms for mapping the presence and absence of genes onto inferred species trees in multiple genome comparisons (18, 19) have been used to reconstruct ancestral gene sets and to obtain estimates of the flow of genes along each of the individual branches. By using such approaches, 500 genes have been assigned to the last universal common ancestor (LUCA) (19), and 2,000 genes have been assigned to the ancestor of the Archaea (18).
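The general idea of mapping gene presence and absence onto a species tree can be illustrated with a small parsimony sketch. The algorithms cited above (18, 19) are not described in this passage, so the code below substitutes the classical Fitch bottom-up pass on a binary character as a stand-in; it should not be read as the method of those studies. The tree layout (nested tuples), the function name `fitch`, and the toy species labels are assumptions made for the example.

```python
def fitch(tree, presence):
    """Bottom-up Fitch pass for one gene on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees (assumed layout).
    presence: dict mapping each leaf name to 1 (gene present) or 0 (absent).
    Returns (candidate_states_at_this_node, minimum_number_of_changes_below_it).
    """
    if not isinstance(tree, tuple):
        return {presence[tree]}, 0
    left_states, left_changes = fitch(tree[0], presence)
    right_states, right_changes = fitch(tree[1], presence)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change is needed here.
        return shared, left_changes + right_changes
    # Children disagree: keep both candidate states and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Toy example: a gene present in two sister species and absent in two others.
species_tree = (("A", "B"), ("C", "D"))
gene = {"A": 1, "B": 1, "C": 0, "D": 0}
root_states, changes = fitch(species_tree, gene)
print(root_states, changes)
```

In this toy case the root state is ambiguous and a single gain or loss event explains the pattern. Repeating the pass for every gene family gives a candidate ancestral gene set at each node, although unweighted parsimony treats gains and losses as equally likely, which is a strong simplification for bacterial genomes.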

In this study, we used the α-proteobacteria as a model system to examine the contents of ancestral genomes along with the evolutionary basis for genome size differences. Our results suggest that the α-proteobacterial ancestor contained several thousand genes and was metabolically highly versatile. The flux of genes along the individual branches of the tree highlights the role of the auxiliary chromosomes as mediators of genome size expansions and contractions in response to alterations in environmental conditions.

Materials and Methods

Genome Analysis. The sizes and GenBank accession numbers of α-proteobacterial genomes included in this analysis are given in Table 1. The assignment of functional categories for proteins in Rickettsia prowazekii, Rickettsia conorii, Brucella melitensis, Brucella suis, Caulobacter crescentus, Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti was taken from The Institute for Genomic Research (www.tigr.org). Uncategorized proteins and proteins from Bartonella henselae, Bartonella quintana, and B. japonicum were assigned a functional category according to the best hit in similarity searches using BLASTP (E < 1 × 10^-10) against all classified proteins from The Institute for Genomic Research (www.tigr.org). Additional proteobacterial genomes included as outgroups in the analyses were Campylobacter jejuni (NC_002163), Escherichia coli (NC_000913), Helicobacter pylori (NC_000913), Pseudomonas aeruginosa (NC_002516), Ralstonia solanacearum (NC_003296), Salmonella typhimurium (NC_003197 and NC_003277), and Xylella fastidiosa (NC_002490).

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: LUCA, last universal common ancestor; COGs, Clusters of Orthologous Groups; BeT, symmetric best hit.

†To whom correspondence should be addressed at: Department of Molecular Evolution, Norbyvagen 18C, S-752 36 Uppsala, Sweden. E-mail: [email protected].

© 2004 by The National Academy of Sciences of the USA

9722–9727 PNAS June 29, 2004 vol. 101 no. 26 www.pnas.org/cgi/doi/10.1073/pnas.0400975101

Phylogenetic Inference. The species phylogeny was estimated by using a data set of concatenated proteins that were selected on the basis that they are encoded by genes located in segments with largely conserved gene order structures in B. henselae, B. quintana, B. melitensis, A. tumefaciens, S. meliloti, and M. loti (see Fig. 6, which is published as supporting information on the PNAS web site). Homologs of the selected B. quintana proteins were inferred by BLASTP (20) searches (E < 1 × 10^-20) against the protein data set of each α-proteobacterial genome. To exclude paralogs, we included in the analysis only genes without a second BLAST hit with an E value of <1 × 10^-20. Another selection criterion for inclusion was that orthologs should be present in at least 12 of the 20 taxa, resulting in a final set of 38 proteins (Table 3, which is published as supporting information on the PNAS web site).
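The two selection rules above can be sketched in a few lines of Python. This is only one possible reading of the text, not the authors' pipeline: the input layout (a mapping from genome name to the list of BLAST E values obtained for one query protein) and the names `E_CUTOFF`, `MIN_TAXA`, and `keep_protein` are assumptions made for illustration.

```python
from typing import Dict, List

E_CUTOFF = 1e-20  # E-value threshold as read from the text
MIN_TAXA = 12     # orthologs required in at least 12 of the 20 taxa


def significant_hits(evalues: List[float], cutoff: float = E_CUTOFF) -> int:
    """Count the BLAST hits in one genome whose E value is below the cutoff."""
    return sum(1 for e in evalues if e < cutoff)


def keep_protein(hits_per_genome: Dict[str, List[float]],
                 cutoff: float = E_CUTOFF,
                 min_taxa: int = MIN_TAXA) -> bool:
    """Apply both criteria to one query protein (hypothetical helper).

    hits_per_genome maps a genome name to the E values of all hits
    found for the query in that genome.
    """
    counts = [significant_hits(ev, cutoff) for ev in hits_per_genome.values()]
    # A second significant hit in any genome suggests a paralog: reject.
    if any(c > 1 for c in counts):
        return False
    # Otherwise require a single significant hit in enough taxa.
    return sum(1 for c in counts if c == 1) >= min_taxa
```

A protein with exactly one significant hit in 12 or more genomes is kept; a single genome with two significant hits is enough to discard it under this reading.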

The alignment was performed by using CLUSTALW (21) on individual protein sequences that were later concatenated. Maximum-likelihood phylogenies were constructed by using PHYML (version 2.1 beta) (22) assuming the Jones–Taylor–Thornton model of protein evolution and four Γ-distributed rate categories with the α parameter and proportion of invariable sites estimated from the data. To assess the variation in the data, 100 bootstrap replicates were generated from the data set with SEQBOOT from the PHYLIP 3.5c package (J. Felsenstein, Department of Genetics, University of Washington, Seattle). Maximum-likelihood trees were estimated from the bootstrap matrices as described above, and a majority-rule consensus tree was generated from them by using CONSENSE, also from the PHYLIP 3.5c package.
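The bootstrap step can be illustrated with a short, self-contained Python sketch. It is not a substitute for SEQBOOT or CONSENSE: it only shows the underlying idea of resampling alignment columns with replacement and keeping the groupings seen in more than half of the replicate trees. The function names and the representation of a tree as a collection of taxon sets are assumptions made for this example.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = lengths.pop()
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


def majority_rule(replicate_splits: List[Iterable[FrozenSet[str]]]) -> Dict[FrozenSet[str], float]:
    """Return the groupings present in more than 50% of replicate trees.

    Each replicate tree is given as a collection of taxon sets (clades);
    the value is the fraction of replicates that contain the grouping.
    """
    counts = Counter(s for splits in replicate_splits for s in set(splits))
    n = len(replicate_splits)
    return {s: c / n for s, c in counts.items() if c / n > 0.5}
```

In practice each replicate alignment would be passed to a tree-building program, and the resulting trees would be decomposed into clades before calling `majority_rule`.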

Inference of Ancestral Gene Sets. The homologous groups were created by using the Clusters of Orthologous Groups (COGs) database (23) in its 66-genomes version. Proteomes classified in COGs were retrieved from the COGs database. Six unclassified proteomes (B. henselae, B. quintana, B. suis, B. japonicum, Rhodopseudomonas palustris, and Wolbachia pipientis) were assigned COGs according to the following procedure: the proteins in each unclassified proteome were used first as queries and then as databases in separate BLAST searches with all proteomes in the COGs database. Each unclassified protein was added to the COG to which it had the highest number of symmetric best hits (BeTs) and BeTs 1. Because this procedure expanded the COGs, the same was done for all the unclassified proteins from the other species so as to also include proteins with BeTs to the newly assigned proteins. New clusters were then created from uncategorized proteins forming triangles of BeTs as described in ref. 23. Finally, clusters containing only two proteins were made from linear BeT relations, after which the remaining proteins were included as single genes.
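A minimal Python sketch of the symmetric-best-hit idea follows. It assumes that BLAST results have already been reduced to a score per (query, subject) pair; the names `best_hits`, `symmetric_best_hits`, and `assign_cog` are hypothetical, and the minimum number of BeTs is left as a parameter because the comparison symbol is lost in the source text.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> BLAST score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the highest-scoring subject for every query."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def symmetric_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


def assign_cog(bet_partners: Set[str], cog_of: Dict[str, str],
               min_bets: int = 1) -> Optional[str]:
    """Pick the COG holding most of a protein's BeT partners.

    min_bets is an assumption; ties are broken arbitrarily.
    """
    counts = Counter(cog_of[p] for p in bet_partners if p in cog_of)
    if not counts:
        return None
    cog, n = counts.most_common(1)[0]
    return cog if n >= min_bets else None
```

Proteins for which `assign_cog` returns `None` would correspond to the uncategorized proteins that the text later groups into triangles, linear pairs, or single genes.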

The most parsimonious scenarios of α-proteobacterial genome evolution and the α-proteobacterial ancestor were reconstructed by character mapping by using generalized parsimony as implemented in PAUP* (version 4.0b10 for Unix) (24) on a rooted species tree, with ACCTRAN (accelerated transformation) (see Fig. 3) and DELTRAN (delayed transformation) (Fig. 7, which is published as supporting information on the PNAS web site) options for parsimony analysis. Fig. 3 shows the results for penalties for duplications, deletions, and gene genesis of 1, 1, and 5, respectively. The selection of penalty values and results obtained for different penalty values are described in Fig. 7.
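To make the role of the penalties concrete, here is a small Sankoff-style dynamic program in Python for a single gene family. It is a sketch under stated assumptions, not a reimplementation of PAUP*: the character state is taken to be the family's copy number (capped at `max_copies`), the step costs are built from the penalties 1, 1, and 5 quoted above, and the ACCTRAN/DELTRAN choice among equally parsimonious reconstructions is not modeled. The tree encoding (nested tuples with leaf names as strings) is also an assumption.

```python
INF = float("inf")


def step_cost(parent: int, child: int, dup: int = 1, loss: int = 1, genesis: int = 5) -> int:
    """Cost of changing copy number from parent to child along one branch."""
    if child == parent:
        return 0
    if child < parent:
        return (parent - child) * loss          # deletions
    if parent == 0:
        return genesis + (child - 1) * dup      # gene genesis, then duplications
    return (child - parent) * dup               # duplications only


def sankoff(tree, leaf_state, max_copies: int = 3):
    """Return the minimal cost of the subtree for every state at its root.

    tree is a leaf name (str) or a tuple of subtrees; leaf_state maps
    leaf names to observed copy numbers.
    """
    states = range(max_copies + 1)
    if isinstance(tree, str):
        return [0 if s == leaf_state[tree] else INF for s in states]
    child_costs = [sankoff(child, leaf_state, max_copies) for child in tree]
    return [
        sum(min(step_cost(s, t) + costs[t] for t in states) for costs in child_costs)
        for s in states
    ]
```

For a family present once in two sister taxa and absent from the other two, the cheapest scenario places one copy in the root and one loss on the branch to the taxa lacking it (total cost 1), whereas an empty root requires a genesis event (cost 5). This shows how a high genesis penalty pushes families toward the ancestor.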

The ancestral proteomes were inferred separately for protein families assigned to auxiliary (mega-COG) and main (main-COG) chromosomes. The criterion for inclusion in the mega-COG family was that 30% of the protein members were encoded on auxiliary replichores or symbiosis islands in the Rhizobiales. By using this criterion, 43% of the proteins encoded by the auxiliary replichores and 6% of chromosomally encoded proteins were members of the mega-COG families on average. Because many of the species-specific genes are located on the auxiliary replichores, we used the complete α-proteobacterial proteome for this analysis. The gene content of the inferred α-proteobacterial ancestral genome was compared with the estimated gene content of protomitochondria (15) and the LUCA (19) by using the presence or absence of a COG rather than the absolute numbers of genes.

Results and Discussion

Gene Function of α-Proteobacterial Genomes. To explore expansions in gene function with genome size for the α-proteobacteria (Table 1), we examined gene content statistics for 14 functional categories (Fig. 1). The relationships between gene content and genome size can be approximated with linear functions, with slopes ranging from four genes per megabase for basic information processes such as transcription and translation to 80 genes per megabase for energy metabolism, transport, and regulatory functions. Functional categories associated with environmental interactions (e.g., transport and regulation) were found to be the most variable among bacteria with different lifestyles. For example, the small genomes of obligate and facultative intracellular parasites have only a few regulatory and transport genes, whereas the larger genomes of free-living soil bacteria that alternate between environments of different nutritional quality contain hundreds of such genes. A rapid increase in the number of regulatory genes in relation to gene content has been observed (25, 26) and may be a general feature of all bacterial genomes.

Extrapolation to the intercept of the y axis provides a measure of the minimal set of genes shared among the α-proteobacteria, which here is estimated at 250 genes (Table 4, which is published as supporting information on the PNAS web site). This set includes 200 genes for DNA, RNA, and protein biosynthesis and another 40 genes for nucleotide and cofactor biosynthesis. This is comparable with the minimal set of core genes in endosymbiotic bacteria (27) as well as with minimal gene numbers inferred by computational approaches (28) and experimental knockout mutants of Bacillus subtilis (29).
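The slope and y-axis intercept discussed above come from an ordinary least-squares line fitted per functional category. The sketch below shows that calculation in plain Python. The genome sizes are those of Table 1 (W. pipientis and R. palustris omitted, as in Fig. 1), but the gene counts are synthetic placeholders generated from a known line, since the per-category counts are not given in the text; `fit_line` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Genome sizes in Mb for the 11 species plotted in Fig. 1 (from Table 1).
genome_sizes_mb = [1.1, 1.3, 1.6, 1.9, 3.3, 3.3, 4.0, 5.6, 6.7, 7.6, 9.1]
# Synthetic counts for one category (placeholder data, not from the paper).
gene_counts = [80 * size + 20 for size in genome_sizes_mb]

slope, intercept = fit_line(genome_sizes_mb, gene_counts)
```

Under this reading, the intercept of each category's line estimates the number of genes of that category expected in a genome of vanishing size, and the sum over categories would give the minimal shared set.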

The Species Tree for α-Proteobacteria. To place the dramatic shifts in genome size in an evolutionary context, we needed an underlying reliable species tree onto which the gene sets could be mapped. Because a few of the divergence nodes were not conclusively resolved in our rRNA tree (data not shown), we inferred the tree topology by using concatenated protein sequences (Fig. 2). To minimize topology inconsistencies caused by horizontal gene transfer and gene paralogy, we selected for this analysis a set of 38 genes sampled from regions with conserved gene order structures in the Rhizobiales (Fig. 6 and Table 3).

The phylogenetic tree (Fig. 2), constructed by using the maximum-likelihood method, provided strong support for a clustering of the Rhizobiales to the exclusion of the earlier-diverging lineages B. japonicum, C. crescentus, and the Rickettsiales. The two Bartonella species formed a clade with Brucella with high bootstrap support, as did A. tumefaciens and S. meliloti, which formed a separate clade. M. loti was placed with high support (90%) close to the root of the Bartonella/Brucella clade. However, the branches separating M. loti from its neighboring clades are very short, and the placement of M. loti in the tree was found to be sensitive both to the methods used and to the genes and species sampled (data not shown). For all other divergences, the tree topology was robust. The branching order depicted in Fig. 2 represents our best estimate of the underlying species tree.

Table 1. α-Proteobacterial species included in the reconstruction analysis

Species          Total size, Mb   GenBank accession no. (size, Mb)
R. prowazekii    1.1              NC_000963 (1.1)
R. conorii       1.3              NC_003103 (1.3)
W. pipientis     1.3              NC_002987 (1.3)
B. quintana      1.6              BX897700 (1.6)
B. henselae      1.9              BX897699 (1.9)
B. melitensis    3.3              NC_003317 (2.1), NC_003318 (1.2)
B. suis          3.3              NC_004310 (2.1), NC_004311 (1.2)
C. crescentus    4.0              NC_002696 (4.0)
R. palustris     5.5              NC_005296 (5.5)
A. tumefaciens   5.6              NC_003062 (2.8), NC_003063 (2.1), NC_003064 (0.5), NC_003065 (0.2)
S. meliloti      6.7              NC_003047 (3.6), NC_003037 (1.4), NC_003078 (1.7)
M. loti          7.6              NC_002678 (7.0), NC_002679 (0.4), NC_002682 (0.2)
B. japonicum     9.1              NC_004463 (9.1)

Computational Inference of Ancestral Gene Sets. We inferred ancestral α-proteobacterial proteomes and estimated the number of gene losses, duplications, and genesis events along each branch of the topology shown in Fig. 2 with character mapping using generalized parsimony (Figs. 3 and 7). Following the routines of previous work (18, 19), we included in the analysis proteins already classified in the COGs database (23) along with proteins encoded by genomes not yet incorporated in the COGs database but related to existing COGs by BeTs. This process resulted in a first data set of 56,337 proteins, to which we added 384 COGs containing proteins not related to any existing COGs but present in three or more species and internally related by BeTs. With the inclusion of these proteins, the data set amounted to 58,171 proteins, and the α-proteobacterial ancestral proteome was estimated at 3,300 proteins (Fig. 3a). The remaining proteins were assigned to single or linear protein COGs, which resulted in a data set that included all 73,658 proteins and yielded an ancestral proteome of 5,000 proteins (Fig. 3b). Because some of the species-specific genes may be rapidly evolving or incorrectly annotated as genes, their inclusion probably results in an overestimate of the ancestral proteome size (Fig. 3b), just as their exclusion may yield an underestimate (Fig. 3a). Thus, we define the lower and upper boundaries of the ancestral α-proteobacterial proteome as 3,000 and 5,000 proteins, respectively.

Metabolic Expansions and Contractions. The analyses of gene content alterations at the branches of the tree revealed two major trends that are observed irrespective of the different data sets and methods used (Fig. 4). First, massive genome size expansions accompanied the divergence of the plant-associated Rhizobiales, particularly the evolution of M. loti and B. japonicum. There seems to have been a gradual increase of genes encoding transcriptional regulators and proteins involved in the transport and metabolism of amino acids, nucleotides, carbohydrates, coenzymes, lipids, inorganic ions, and secondary metabolites. These expansions argue in favor of ancestral cells being visited by highly dynamic plasmids that introduced novel genes by duplication and/or genesis, some of which were maintained selectively in response to the increased use of soil compounds and the refined interactions with the progenitors of modern plant cells.

Fig. 1. Plot of genome size against gene content for each of the functional categories. RP, R. prowazekii; RC, R. conorii; BQ, B. quintana; BH, B. henselae; BM, B. melitensis; BS, B. suis; CC, C. crescentus; AT, A. tumefaciens; SM, S. meliloti; ML, M. loti; and BJ, B. japonicum. See Table 1 for genome sizes. The data were separated into two sections (a and b) to prevent overcrowding.

Fig. 2. Phylogenetic relationship of 13 α-proteobacterial species (highlighted by the purple background) with 7 species from other proteobacterial subdivisions as outgroups. The topology, branch lengths, and bootstrap support are according to maximum-likelihood reconstructions with the Jones–Taylor–Thornton + Γ4 + I model. Similar results were obtained with the neighbor-joining method and after removal of positions with gaps. A list of genes used for the phylogenetic reconstructions is given in Table 5. Abbreviations for species names are as described in the legend to Fig. 1 with the addition of the following taxa: WP, W. pipientis; RhP, R. palustris; CJ, C. jejuni; EC, E. coli; HP, H. pylori; PA, P. aeruginosa; RS, R. solanacearum; ST, S. typhimurium; and XF, X. fastidiosa.

Extreme reductions of size occurred twice independently: in the ancestor of the obligate intracellular lineages Rickettsia and Wolbachia and in the ancestor of the facultative intracellular lineages Bartonella and Brucella. These losses have largely affected protein families for transcription regulation, transport, and metabolism of amino acids, nucleotides, carbohydrates, lipids, and other small molecules. Particularly notable is the independent loss of genes involved in secretory pathways, pilus assembly, and flagellar biosynthesis. The loss of genes associated with the transition from interactions with plants to animals in the ancestor of Bartonella and Brucella was not balanced by a corresponding gain of genes; no genes have homologs solely in Bartonella and Brucella (E < 0.001).

The number of genes eliminated before the split of Rickettsia and Wolbachia was estimated at 2,300–3,800 genes, as compared with 200–700 lost genes per lineage after the split (Fig. 3). The inverse correlation between gene loss and branch lengths for this part of the tree (compare Figs. 2 and 3) makes the lower frequency of gene-elimination events in recent times all the more striking. On average, the ratio of deletions to nucleotide substitutions was 25-fold higher before the split of Rickettsia and Wolbachia. A high frequency of gene loss relative to nucleotide substitutions was also observed immediately before the emergence of the intracellular lineages Bartonella and Brucella, which is reminiscent of the more rapid loss of genes at an early stage of genome reduction in aphid endosymbiont lineages, followed by genomic stasis (17). Overall, we observed no correlation between frequencies of amino acid substitutions and gene loss (r² = 0.14), gene duplication (r² = 0.02), or gene genesis (r² = 0.05), indicating dramatically different fixation rates for these mutations in the different lineages over time.

Fig. 3. Inference of deletions/duplications and gene-genesis events based on the α-proteobacterial tree was made by using different clustering levels and penalty values. The inference was based on proteins already classified in COGs (23) to which we added COGs containing proteins in three or more species internally related by best hits (58,171 proteins in total) (a) and the complete set of proteins (73,658 proteins in total) (b). Inference of gene contents was made by using the ACCTRAN option for parsimony analysis in PAUP* with penalties for duplication, deletion, and gene genesis set to 1, 1, and 5, respectively. Numbers along branches refer to the number of duplications/losses/genesis, respectively. Numbers at nodes refer to the putative number of genes in the inferred genome at the node. Outgroup sequences are as described for Fig. 2, but they were pruned from the tree shown here. Abbreviations for species names are as described in the legends to Figs. 1 and 2.

Fig. 4. Net gene loss or gain throughout the evolution of the α-proteobacterial species. Arrows pointing upward indicate net gains of genes (G), and arrows pointing downward indicate net losses of genes (L). Colors and sizes of arrows refer to the net number of genes gained or lost at each branch. Colors of circles refer to the relative fraction of genes assigned to the different functional groups in the modern and inferred genome at the node. Yellow, information storage and processing; green, metabolism; red, cellular processes; blue, poorly characterized. Clustering groups and estimated frequencies are as described for Fig. 3a. Abbreviations for species names are as described in the legends to Figs. 1 and 2.

Gene Flux on Chromosomes and Auxiliary Replicons. Many species in the Rhizobiales contain auxiliary chromosomes (Table 1) that are characterized by less gene synteny than the main chromosomes (Fig. 6). To quantify the differences in mutational rates and patterns for genes located on different replicons, we inferred ancestral proteomes separately for COGs assigned to the auxiliary replicons (mega-COG) versus those assigned to the main chromosomes (main-COG). We classified a COG as a mega-COG if 30% of its protein members were encoded on an auxiliary replicon in A. tumefaciens, Brucella spp., or S. meliloti, or on the symbiosis islands in M. loti and B. japonicum. In total, we classified 13% of the COGs as mega-COGs, which corresponds to 2,349 COGs (8,662 proteins) out of the complete set of 17,669 COGs (73,658 proteins) included in the analysis.
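The classification rule can be written as a one-line test per COG, sketched here in Python. Two things are assumptions: the threshold is read as "at least 30%" because the comparison symbol is missing from the extracted text, and the inputs (a list of member protein identifiers and a set of identifiers encoded on auxiliary replicons or symbiosis islands) are a hypothetical data layout. The last lines simply recompute the quoted share of mega-COGs from the counts given in the text.

```python
from typing import Iterable, Set


def is_mega_cog(members: Iterable[str], auxiliary: Set[str],
                threshold: float = 0.30) -> bool:
    """True if the share of members on auxiliary replicons reaches the threshold.

    The 'at least' comparison is an assumption; the source only states 30%.
    """
    members = list(members)
    if not members:
        raise ValueError("a COG must contain at least one protein")
    on_auxiliary = sum(1 for m in members if m in auxiliary)
    return on_auxiliary / len(members) >= threshold


# Counts quoted in the text.
MEGA_COGS, ALL_COGS = 2_349, 17_669
MEGA_PROTEINS, ALL_PROTEINS = 8_662, 73_658

mega_cog_share = MEGA_COGS / ALL_COGS              # about 0.13 of all COGs
mega_protein_share = MEGA_PROTEINS / ALL_PROTEINS  # about 0.12 of all proteins
```

The recomputed COG share agrees with the 13% stated above.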

The results showed that 20–24% of the losses that occurred immediately before the Bartonella/Brucella divergence were associated with mega-COGs (Fig. 8, which is published as supporting information on the PNAS web site). Likewise, a substantial fraction of the identified duplications involved proteins in mega-COG families, as observed for example on the branch leading to the Rhizobiales (23%) and also on the branch separating these from R. palustris and B. japonicum (55%). In the terminal branches for S. meliloti and A. tumefaciens, all three types of mutational events were frequent for proteins classified in the mega-COG family, including 30% of duplications, 25% of losses, and 60% of gene-genesis events. Overall, mega-COGs accounted for 21% of changes below the α-proteobacterial ancestor. Considering that the mega-COGs account for only 13% of all COGs, the relative frequencies of deletions, duplications, and gene genesis were considerably higher for proteins classified in these families. We speculate that the auxiliary replicons were derived from plasmids that expanded by reiterative processes of duplication/deletion and horizontal gene-transfer events in the Rhizobiales.
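The comparison of 21% of changes against 13% of COGs can be turned into a rough per-COG rate ratio. The short Python sketch below does this arithmetic; it assumes that both percentages refer to the same set of COGs and that changes can be averaged per COG, which the text does not state explicitly, so the resulting figure is illustrative only.

```python
def relative_change_rate(share_of_changes: float, share_of_cogs: float) -> float:
    """Changes per COG in one class relative to changes per COG in the rest."""
    if not (0 < share_of_changes < 1 and 0 < share_of_cogs < 1):
        raise ValueError("shares must lie strictly between 0 and 1")
    rate_in_class = share_of_changes / share_of_cogs
    rate_in_rest = (1 - share_of_changes) / (1 - share_of_cogs)
    return rate_in_class / rate_in_rest


# Shares quoted in the text: 21% of changes, 13% of COGs.
mega_vs_main = relative_change_rate(0.21, 0.13)
```

Under these assumptions a mega-COG experiences somewhat less than twice as many inferred changes as a main-COG.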

Inferred Metabolism of the α-Proteobacterial Ancestor. Our pathway analysis of the core ancestral gene set identified in all the analyses (Table 5, which is published as supporting information on the PNAS web site) suggests that it contained genes for glycolysis and a complete system for aerobic respiration, as expected for a unicellular organism that was well adapted to the aerobic environment. Notable was its broad biosynthetic capability and the presence of multiple genes for regulatory and transport functions. The analysis further identified genes for flagellar biosynthesis and type III and type IV secretion systems. Thus, the ancestor was probably a free-living, aerobic, and motile bacterium that had evolved elaborate communication mechanisms with other cells. Also present in the ancestor were genes for phage-related functions; however, these genes may incorrectly have been assigned to the ancestor because of multiple independent acquisitions of phage genes by horizontal gene transfer in some of the derived lineages.

A comparison of the α-proteobacterial ancestral genome with the gene content of the LUCA identified a small set of genes inferred to be present in the LUCA (13) but absent from our ancestral set. The number and identity of such genes depend on penalty values, but even for the highest penalty values it was observed that a set of genes, including those for homoserine kinase, uridine kinase, endonuclease IV, and glutamyl-tRNA reductase, were predicted to be present in the LUCA but were absent from the α-proteobacterial ancestor. These might have been lost before the divergence of the α-proteobacterial ancestor or, alternatively, been incorrectly assigned to the LUCA.

Comparing the α-Proteobacterial Ancestor with the Mitochondrial Ancestor. The endosymbiotic theory postulates that mitochondria evolved by massive gene loss and transfer of genes from the common ancestor to the nuclear genome of the host cell. A total of 630 orthologous groups display a close phylogenetic relationship between eukaryotes and α-proteobacteria (15). These represent a minimal estimate of the protomitochondrial proteome, because some gene transfers may have been missed because of weak phylogenetic signals and others may have been lost from the eukaryotic genomes included in the analysis. We compared the 630 α-proteobacterial gene groups with the set of COGs inferred to be putatively present in the α-proteobacterial ancestor. The protomitochondrial set includes 487 genes in 412 COG-associated groups (15), all of which belong to the 3,300 genes in the 3,100 COGs of our ancestor (Fig. 3a). Of the 143 protomitochondrial groups not associated with a COG, 92 are represented in the ancestral gene pool. Most of the 51 groups missing from our data set consist of hypothetical proteins or proteins with unknown functions.

Phylogenetic analyses of rRNA sequences, protein subunits of the respiratory chain complexes, and concatenated protein alignments suggest that mitochondria evolved from the α-proteobacteria, with no evidence for multiple independent acquisitions (12, 13, 30–32). Although several studies have placed mitochondria as a deeply diverging sister clade near to the Rickettsiales (30–32), the exact position is still debated. Here, we consider the gene set of the reconstructed α-proteobacterial ancestor as an upper limit of the protomitochondrial proteome. To estimate how many of these ancestral genes may, at the most, have been transferred to the host nuclear genome, we selected the complete set of COGs present in the α-proteobacterial ancestor and used them as queries in sequence-similarity searches against eukaryotic genomes. As expected, the number of COGs showing significant sequence similarity to eukaryotic genes decreased with increasing BLAST scores from 1,700 (score > 50) to 850 (score > 150) (Fig. 5). The remaining 1,144 ancestral COGs without eukaryotic homologs (score < 40) represent putative gene losses. The genes in these COGs display a broad taxonomic distribution in bacteria (data not shown), and surprisingly many (45%) encode proteins of unknown or poorly characterized function (Table 2). Future functional analyses of these genes may explain why they were not transferred to the eukaryotes.

Concluding Remarks

This study represents an attempt to quantify the different mutational changes that underlie genome size alterations in the α-proteobacteria. We observed no correlation between nucleotide substitution rates and fixation rates for mutations that affect genome contents. On the contrary, our results strongly suggest that the inferred frequencies of deletions, duplications, and horizontal gene transfers depend on population sizes and bacterial lifestyle features. In particular, the data support the suggested correlation between transitions to intracellular growth habitats and genome size reductions, with the highest frequencies of gene loss at early stages of the transition (17).

Fig. 5. Number of COGs in the α-proteobacterial ancestor (Fig. 3a) with sequence similarity to eukaryotic genes for different BLAST score values. Estimated number of COGs that show similarity to eukaryotic genes in the inferred proteomes of the α-proteobacterial ancestor (upper curve) and the minimal protomitochondrial ancestor (lower curve) (15).

The stability of the main chromosomes of the Rhizobiales, displayed as segments with conserved gene synteny, contrasts with otherwise high substitution rates and extensive gene-content differences. Expansions and contractions in the genomic repertoire have mostly affected genes involved in environmental interactions; these typically are located on the auxiliary replichores and evolve with very high turnover rates. It is possible that we have underestimated these rates at the internal branches of the tree because of multiple insertion/deletion events. High intrinsic rates for duplications/deletions and horizontal gene transfers may serve as an efficient mutational engine that enables rapid responses to alterations in the environmental conditions when subjected to strong selective pressures.

Although the estimated frequencies of duplication and gene-genesis events depend on the penalties assigned to these events, our study clearly demonstrates the importance of gene duplications for expanding and diversifying the metabolic and regulatory capacities of the bacterial cell. A consequence of high duplication and deletion rates is that the number of paralogous proteins may be much larger than previously anticipated. In effect, the many different protein variants do not necessarily trace back to one ancestral giant gene pool but may have arisen throughout evolution via reiterative processes of duplication and loss. The continuous generation of novel paralogs may provide one explanation for the difficulty of obtaining congruent single-gene trees in phylogenomic surveys (1–5).

Computational inference of ancestral genomes with refined models that account for the relative frequencies of the different types of mutational events in the different lineages will provide more detailed scenarios of genome size evolution in the α-proteobacteria and other bacterial subdivisions.

This research was supported by grants from the Swedish Research Council, the Swedish Foundation for Strategic Research, and the Wallenberg Foundation.

1. Doolittle, W. F. (1999) Science 284, 2124–2128.
2. Snel, B., Bork, P. & Huynen, M. (1999) Nat. Genet. 21, 108–110.
3. Sicheritz-Ponten, T. & Andersson, S. G. E. (2001) Nucleic Acids Res. 29, 545–552.
4. Kurland, C. G., Canback, B. & Berg, O. G. (2003) Proc. Natl. Acad. Sci. USA 100, 9658–9662.
5. Daubin, V., Moran, N. A. & Ochman, H. (2003) Science 301, 829–832.
6. Andersson, S. G. E., Zomorodipour, A., Andersson, J. O., Sicheritz-Ponten, T., Alsmark, U. C. M., Podowski, R. M., Naslund, K., Eriksson, A.-S., Winkler, H. H. & Kurland, C. G. (1998) Nature 396, 133–140.
7. Ogata, H., Audic, S., Renesto-Audiffren, P., Fournier, P. E., Barbe, V., Samson, D., Roux, V., Cossart, P., Weissenbach, J., Claverie, J. M. & Raoult, D. (2001) Science 293, 2093–2098.
8. Goodner, B., Hinkle, G., Gattung, S., Miller, N., Blanchard, M., Qurollo, B., Goldman, B. S., Cao, Y., Askenazi, M., Halling, C., et al. (2001) Science 294, 2323–2328.
9. Wood, D. W., Setubal, J. C., Kaul, R., Monks, D. E., Kitajima, J. P., Okura, V. K., Zhou, Y., Chen, L., Wood, G. E., Almeida, N. F., Jr., et al. (2001) Science 294, 2317–2322.
10. Galibert, F., Finan, T. M., Long, S. R., Puhler, A., Abola, P., Ampe, F., Barloy-Hubler, F., Barnett, M. J., Becker, A., Boistard, P., et al. (2001) Science 293, 668–672.
11. Kaneko, T., Nakamura, Y., Sato, S., Minamisawa, K., Uchiumi, T., Sasamoto, S., Watanabe, A., Idesawa, K., Iriguchi, M., Kawashima, K., et al. (2002) DNA Res. 9, 189–197.
12. Wu, M., Sun, L. V., Vamathevan, J., Riegler, M., Deboy, R., Brownlie, J. C., McGraw, E. A., Martin, W., Esser, C., Ahmadinejad, N., et al. (2004) PLoS Biol. 2, 327–341.
13. Gray, M., Burger, G. & Lang, B. F. (1999) Science 283, 1476–1481.
14. Karlberg, O. & Andersson, S. G. E. (2003) Nat. Rev. Genet. 4, 391–397.
15. Gabaldon, T. & Huynen, M. A. (2003) Science 301, 609.
16. Karlberg, E. O., Canback, B., Kurland, C. G. & Andersson, S. G. E. (2000) Yeast 17, 170–187.
17. Tamas, I., Klasson, L., Canback, B., Naslund, A. K., Eriksson, A.-S., Wernegreen, J. J., Sandstrom, J. P., Moran, N. A. & Andersson, S. G. E. (2002) Science 296, 2376–2379.
18. Snel, B., Bork, P. & Huynen, M. (2002) Genome Res. 12, 17–25.
19. Mirkin, B. G., Fenner, T. I., Galperin, M. Y. & Koonin, E. V. (2003) BMC Evol. Biol. 3, 2.
20. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389–3402.
21. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673–4680.
22. Guindon, S. & Gascuel, O. (2003) Syst. Biol. 52, 696–704.
23. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. (1997) Science 278, 631–637.
24. Swofford, D. L. (1998) Phylogenetic Analysis Using Parsimony (PAUP) (Sinauer, Sunderland, MA), Version 4.0b10.
25. Nimwegen, E. (2003) Trends Genet. 19, 479–484.
26. Konstantinidis, K. T. & Tiedje, J. M. (2004) Proc. Natl. Acad. Sci. USA 101, 3160–3165.
27. Klasson, L. & Andersson, S. G. E. (2004) Trends Microbiol. 12, 37–43.
28. Koonin, E. V. (2000) Annu. Rev. Genomics Hum. Genet. 1, 99–116.
29. Kobayashi, K. (2003) Proc. Natl. Acad. Sci. USA 100, 4678–4683.
30. Olsen, G. J., Woese, C. R. & Overbeek, R. (1994) J. Bacteriol. 176, 1–6.
31. Viale, A. & Arakaki, A. K. (1994) FEBS Lett. 341, 146–151.
32. Emelyanov, V. (2003) Arch. Biochem. Biophys. 420, 130–141.

Table 2. Relative fraction of COGs in the α-proteobacterial ancestor (Fig. 3b) sorted according to broad functional categories

Functional category     Hom* (with homologs)   Min†   Hom* (without homologs)
Cellular processes      17                     15     12
Information processes   15                     15     6
Metabolism              45                     53     14
Poorly characterized    20                     17     45
New clusters‡           3                      0      23

*Values are percentages of COGs in the α-proteobacterial ancestor with homologs (score > 50) (Hom) and without homologs (score < 40) (Hom) in eukaryotic genomes.
†Values are percentages of COGs in the minimal (Min) protomitochondrial genome (15) with homologs in eukaryotic genomes (score > 50).
‡Uncategorized clusters created in this analysis.


Early Evolution and Phylogeny

During this thesis, I studied the early evolution of life, from the Last Universal Common Ancestor (LUCA) to the ancestors of the three kingdoms Archaea, Bacteria, and Eukarya. I notably attempted to place organisms in the tree of life, namely the bacterium Aquifex aeolicus and the archaeon Cenarchaeum symbiosum, and also studied the evolution of optimal growth temperatures over the last four billion years. To do this, I developed algorithms to reconstruct ancestral gene sequences and used these sequences to predict the optimal growth temperatures of now-extinct organisms. My colleagues and I estimated that LUCA did not live in a very hot environment, but that its descendants, the ancestors of Bacteria and of the group containing Archaea and Eukarya, lived at higher temperatures. This implies that the two lineages descending from LUCA underwent the same kind of evolution in parallel, perhaps caused by one and the same selection pressure. This pressure may have resulted from an intense meteoritic bombardment billions of years ago and been accompanied by the transition from an RNA genome in LUCA to DNA genomes in its descendants. Subsequently, in the bacterial lineage, optimal growth temperatures dropped, which may correspond to the evolution of ocean temperatures over the last billions of years.