Module 4: Introduction to concepts and methods in molecular phylogenetics
Céline Poux, Laboratoire EEP – UMR 8198, Université de Lille
Samuel Blanquart, Inria, Laboratoire CRIStAL – UMR 9189, Université de Lille
Introduction to bioinformatics
Part 1: Introduction – Preparing the dataset
Part 2: Phylogenetic reconstruction – ML phylogenetic reconstruction
Part 3: Reconstruction biases
Part 4: Phylogenetic reconstruction – Bayesian reconstruction
Part 5: Molecular dating – Bayesian dating
Reconstruction biases
Signal: orthologous characters
Data
Noise: where does it come from?
POOR SIGNAL + LOT OF NOISE => WRONG TREE
Data | Alignment & cleaning | Tree building | Evolutionary models | Biases
Reconstruction biases
1. Stochastic errors: sampling errors
2. Systematic errors: methodological errors
3. Biological errors: gene trees and species trees are discordant
Tree reliability
Reconstruction biases: the stochastic errors
Rokas et al. 2003
106 orthologous genes from 8 yeast genomes
Stochastic errors are sampling errors: the sample of analyzed characters is too small.
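The effect of a small character sample can be illustrated by resampling alignment columns with replacement, as in a nonparametric bootstrap. The sketch below is only an illustration: the toy alignment, the function names `bootstrap_alignment` and `p_distance`, and the number of replicates are assumptions, not material from the slides.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Toy alignment (hypothetical data, 12 sites only).
toy = {
    "sp1": "ACGTACGTACGT",
    "sp2": "ACGTACGAACGT",
    "sp3": "ACGAACGAACTT",
}

rng = random.Random(42)
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
dists = [p_distance(r["sp1"], r["sp3"]) for r in replicates]

# With so few sites, the estimated distance varies widely between replicates.
print("original:", p_distance(toy["sp1"], toy["sp3"]))
print("bootstrap range:", min(dists), max(dists))
```

With longer alignments the spread between replicates narrows, which is the intuition behind adding more characters to reduce stochastic error.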
Reconstruction biases: the stochastic errors
Rokas et al. 2003
[Figure: 106-gene dataset; labels 1 to 5, Node 3, Node 5]
Stochastic errors are sampling errors
Reconstruction biases: the stochastic errors
Delsuc et al. 2005
Methods for phylogenomic inference
Reconstruction biases: the stochastic errors
=> Adding more sequences (phylogenomics) is not always enough to resolve inconsistencies.
Philippe et al. 2011
Reconstruction biases: the systematic errors
Systematic errors: the method of inference is inconsistent
✓ The hypotheses of the reconstruction methods are violated.
✓ The models of sequence evolution are not accurate.
=> Multiple substitutions (homoplasy) can go undetected or be wrongly inferred.
• Rate variation among lineages => Long branch attraction
• Rate variation among sites => Saturation at some sites
• Heterogeneity of nucleotide composition among species => Compositional biases
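How multiple substitutions go undetected can be made concrete with the simplest substitution model. The sketch below assumes the Jukes-Cantor (JC69) model, which the slides do not name; it compares the true number of substitutions per site with the proportion of differences actually observed. The function names are hypothetical.

```python
import math

def expected_p(d):
    """Expected proportion of differing sites under JC69 for true distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """JC69-corrected distance from an observed proportion of differences p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The observed proportion plateaus near 0.75 while the true distance keeps
# growing: the difference is the share of substitutions hidden by homoplasy.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = expected_p(d)
    print(f"true d = {d:.1f}  observed p = {p:.3f}  hidden = {d - p:.3f}")
```

Once the observed proportion approaches the plateau, the correction becomes unstable, which is one way to picture saturation at fast-evolving sites.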
Reconstruction biases: the systematic errors
Long Branch Attraction (LBA)
Yang & Rannala 2012
Reconstruction biases: the systematic errors
Long Branch Attraction (LBA)
Philippe et al. 2007
MP, 11959 AA sites
[Figure labels: Tunicates, Platyhelminthes]
Reconstruction biases: the systematic errors
Long Branch Attraction (LBA)
Philippe et al. 2007
CAT mixture model
Reconstruction biases: the systematic errors
Compositional biases & Saturation
Jeffroy et al. 2006
[Figure panels: BI nt3, BI nt, BI nt12, BI AA]
Reconstruction biases: the systematic errors
Compositional biases & Saturation
(A) Correct phylogeny.
(B) Classical reconstruction artifact where mesophilic bacteria with similar GC content cluster in the tree.
16S rRNA
Blanquart & Lartillot 2006
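A first check for compositional bias is simply to compare GC content across taxa before building a tree. The following is a minimal sketch on made-up sequences; the function name `gc_content` and the taxon names are hypothetical, and a real analysis would use the full alignment and a proper composition test.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous nucleotides (gaps and N ignored)."""
    counted = [c for c in seq.upper() if c in "ACGTU"]
    if not counted:
        return float("nan")
    return sum(c in "GC" for c in counted) / len(counted)

# Toy sequences (hypothetical data).
toy = {
    "taxonA": "GGCCGCGC-ATGC",
    "taxonB": "ATATTAGCATNA",
    "taxonC": "GCGCGGCCATAT",
}

for name, seq in toy.items():
    print(name, round(gc_content(seq), 2))
```

Taxa with similar GC content that group together against other evidence are candidates for the artifact shown in panel (B).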
Reconstruction biases: biological errors
Gene phylogeny = Species phylogeny
Is this hypothesis correct?
@Emmanuel Douzery
Reconstruction biases: Gene duplication
Phol et al. 2009
Reconstruction biases: Gene duplication
3.6. Orthology and paralogy
Duplication, Speciation
Copy A, Copy B
@Guy Perrière
Reconstruction biases: Gene duplication
True phylogeny
Reconstructed phylogeny
@Guy Perrière
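A common safeguard against comparing paralogs is to keep only gene families that have exactly one copy in every species. The sketch below assumes gene families are already clustered and stored as lists of (species, gene identifier) pairs; this data layout, the function name `single_copy_families` and the example identifiers are all hypothetical.

```python
def single_copy_families(families, species):
    """Keep gene families with exactly one gene copy in every listed species."""
    kept = {}
    for fam, genes in families.items():
        counts = {sp: 0 for sp in species}
        for sp, _gene_id in genes:
            if sp in counts:
                counts[sp] += 1
        # Reject families with a missing or duplicated copy in any species.
        if all(n == 1 for n in counts.values()):
            kept[fam] = genes
    return kept

# Toy gene families (hypothetical data).
families = {
    "fam1": [("human", "h1"), ("mouse", "m1"), ("fish", "f1")],
    "fam2": [("human", "h2a"), ("human", "h2b"), ("mouse", "m2"), ("fish", "f2")],
    "fam3": [("human", "h3"), ("mouse", "m3")],
}

print(list(single_copy_families(families, ["human", "mouse", "fish"])))
```

This filter is not sufficient on its own: after a duplication followed by losses of different copies in different lineages, a family can look single-copy while still mixing paralogs, so individual gene trees should still be checked against the species tree.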
Reconstruction biases: Horizontal gene transfer
Cover of MMBR, December 2009
• Horizontal gene transfer: transmission of genes between different taxa.
• The phenomenon is frequent among prokaryotes.
• It involves various mechanisms:
  – Transformation,
  – Conjugation,
  – Transduction.
An estimated 17.6% of E. coli genes were acquired by transfer.
Reconstruction biases: Horizontal gene transfer
Calteau et al. 2004
[Figure labels: Bacteria, Archaea]
Reconstruction biases: Incomplete lineage sorting & Ancestral polymorphism
Adapted from Leliaert et al. 2014
Sp. A, Sp. B, Sp. C
Gene tree = Species tree
Gene tree ≠ Species tree
[Figure labels: BC, AB, ABC]
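How often incomplete lineage sorting makes a gene tree disagree with the species tree can be quantified for three species. The sketch below uses a standard result of the multispecies coalescent that is not stated on the slides: for the species tree ((A,B),C) with an internal branch of length T in coalescent units, each of the two discordant gene trees has probability exp(-T)/3. The function name, the parameter names and the diploid scaling T = t / (2 Ne) are assumptions made for illustration.

```python
import math

def gene_tree_probs(t_generations, ne):
    """Gene-tree topology probabilities for the species tree ((A,B),C).

    t_generations: length of the internal branch, in generations.
    ne: effective population size of the ancestral AB population (diploid).
    """
    T = t_generations / (2.0 * ne)  # internal branch in coalescent units
    p_discordant_each = math.exp(-T) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_discordant_each,  # matches the species tree
        "((B,C),A)": p_discordant_each,
        "((A,C),B)": p_discordant_each,
    }

# Short internal branches or large ancestral populations increase discordance.
for t in (1_000, 20_000, 100_000):
    probs = gene_tree_probs(t, ne=10_000)
    print(t, {k: round(v, 3) for k, v in probs.items()})
```

When the internal branch is very short relative to the population size, the three topologies become almost equally likely, so many genes are needed before the species tree stands out.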
Reconstruction biases: Hybridization & Genome duplication
Marcet-Houben & Gabaldón 2015
Concluding remarks
• The phylogenetic signal is carried by orthologous sequences displaying synapomorphies characteristic of a shared evolutionary history. This signal increases with the number of analyzed characters.
• The non-phylogenetic signal can be due to:
  • Misaligned sequences,
  • Saturated sequences,
  • GC content biases,
  • Simplistic or inappropriate models of sequence evolution,
  • Comparison of paralogous, xenologous or ohnologous sequences,
  • Incomplete lineage sorting & ancestral polymorphism.
Concluding remarks
Strong phylogenetic signal + weak non-phylogenetic signal => Correct topology
Weak phylogenetic signal + weak non-phylogenetic signal => Results are non-significant
Weak phylogenetic signal + strong non-phylogenetic signal => Artefactual topology robustly inferred
Reconstruction biases
Advice
In general:
✓ Use single-copy genes,
✓ Check incongruences of individual gene trees with the species tree.
For deep-time phylogenies:
✓ Check for compositional biases (PhyloBayes) => recode the data,
✓ Check for homoplasy (PhyloBayes) => remove fast-evolving sites,
✓ Include more species (slowly evolving sequences, closely related outgroup).
For deep coalescence problems:
✓ Multispecies coalescent models (*BEAST).
For gene duplication and lateral gene transfer:
✓ Phylogenetic reconciliation (PHYLDOG).