James Degnan, University of Canterbury, 11/1/11
(Joint work with Elizabeth Allman and John Rhodes)
Thanks to
1. Background
   A. Gene trees vs. species trees
   B. Coalescence and incomplete lineage sorting
2. Rooted gene tree probabilities as polynomials
3. Unrooted gene tree probabilities
Population genetics: traditionally used to analyze single populations.
Phylogenetics: What is the best way to infer relationships between populations/species?
Graphic by Mark A. Klinger, Carnegie Museum of Natural History, Pittsburgh
[Figure: two panels of gene trees within species trees; time axes labeled Past and Present]
Incomplete lineage sorting
[Figure: species tree with tips A, B, C, D]
The gene tree is a random variable. The gene tree distribution is parameterized by the species tree topology and internal branch lengths.
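Using transformed branch lengths $X_1 = e^{-t_1}$, $X_2 = e^{-t_2}$, etc. (with $t_i$ in coalescent units), gene tree probabilities can be written as linear combinations of monomials
$X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_{n-2}^{\alpha_{n-2}}$,
where $n$ is the number of tips.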
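To make the monomial form concrete, here is a minimal Python sketch (not code from the talk) that writes the standard coalescent transition probability $p_{i,j}(t)$, the chance that $i$ lineages become exactly $j$ lineages along a branch of length $t$, as a polynomial in $X = e^{-t}$ using Tavaré's (1984) closed form. The names `p_ij`, `rising`, and `falling` are our own.

```python
from fractions import Fraction
from math import factorial


def rising(a: int, k: int) -> int:
    """Rising factorial a(a+1)...(a+k-1); equals 1 when k == 0."""
    out = 1
    for m in range(k):
        out *= a + m
    return out


def falling(a: int, k: int) -> int:
    """Falling factorial a(a-1)...(a-k+1); equals 1 when k == 0."""
    out = 1
    for m in range(k):
        out *= a - m
    return out


def p_ij(i: int, j: int) -> dict[int, Fraction]:
    """Probability that i lineages become exactly j lineages (1 <= j <= i)
    along a branch of length t, as a polynomial in X = exp(-t).

    Returns {exponent: coefficient}. Each term exp(-k(k-1)t/2) of Tavare's
    formula is the monomial X**(k(k-1)/2).
    """
    poly: dict[int, Fraction] = {}
    for k in range(j, i + 1):
        coef = Fraction(
            (2 * k - 1) * (-1) ** (k - j) * rising(j, k - 1) * falling(i, k),
            factorial(j) * factorial(k - j) * rising(i, k),
        )
        e = k * (k - 1) // 2
        poly[e] = poly.get(e, Fraction(0)) + coef
    return {e: c for e, c in poly.items() if c != 0}


# p_{3,2}(y) = 3/2 Y - 3/2 Y^3, the factor that appears in the history probabilities.
print(p_ij(3, 2))  # prints {1: Fraction(3, 2), 3: Fraction(-3, 2)}
```

Because every exponent $k(k-1)/2$ is an integer, products of these factors over the branches of a species tree are automatically polynomials in the $X_i$.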
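[Figure: species tree with internal branch lengths x and y]

Probability (with $X = e^{-x}$, $Y = e^{-y}$):
$\frac{1}{9}\,p_{2,2}(x)\,p_{3,2}(y) = \frac{1}{6}XY - \frac{1}{6}XY^3$

History        Probability
h1: (1,2,3)    $(1-X)(1-Y)$
h2: (1,3,3)    $\frac{1}{3}(1-X)Y$
h3: (2,2,3)    $\frac{1}{3}X\left(1 - \frac{3}{2}Y + \frac{1}{2}Y^3\right)$
h4: (2,3,3)    $\frac{1}{6}XY - \frac{1}{6}XY^3$
h5: (3,3,3)    $\frac{1}{18}XY^3$
Total          $1 - \frac{2}{3}X - \frac{2}{3}Y + \frac{1}{3}XY + \frac{1}{18}XY^3$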
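As a check on the table, the sketch below rebuilds each history probability from transition probabilities and sums them. It assumes the species tree is the caterpillar (((A,B):x,C):y,D), the gene tree is (((A,B),C),D), and that a history label (i, j, k) lists the branch (1 = x, 2 = y, 3 = root) on which each of the three coalescences happens; this reading is our interpretation of the slide, but it reproduces every row. Polynomials in X and Y are plain dictionaries, and `mul`, `add`, and `const` are hypothetical helpers.

```python
from fractions import Fraction as F

# Polynomial in X = exp(-x), Y = exp(-y): {(a, b): coefficient of X**a * Y**b}.
Poly = dict[tuple[int, int], F]


def mul(*polys: Poly) -> Poly:
    """Multiply polynomials in (X, Y)."""
    out: Poly = {(0, 0): F(1)}
    for p in polys:
        nxt: Poly = {}
        for (a1, b1), c1 in out.items():
            for (a2, b2), c2 in p.items():
                key = (a1 + a2, b1 + b2)
                nxt[key] = nxt.get(key, F(0)) + c1 * c2
        out = nxt
    return {k: c for k, c in out.items() if c != 0}


def add(*polys: Poly) -> Poly:
    """Add polynomials in (X, Y), dropping zero coefficients."""
    out: Poly = {}
    for p in polys:
        for k, c in p.items():
            out[k] = out.get(k, F(0)) + c
    return {k: c for k, c in out.items() if c != 0}


def const(c) -> Poly:
    return {(0, 0): F(c)}


# Transition probabilities p_{i,j} on branch x (in X) and on branch y (in Y).
p21_x = {(0, 0): F(1), (1, 0): F(-1)}                       # 1 - X
p22_x = {(1, 0): F(1)}                                      # X
p21_y = {(0, 0): F(1), (0, 1): F(-1)}                       # 1 - Y
p22_y = {(0, 1): F(1)}                                      # Y
p31_y = {(0, 0): F(1), (0, 1): F(-3, 2), (0, 3): F(1, 2)}   # 1 - 3/2 Y + 1/2 Y^3
p32_y = {(0, 1): F(3, 2), (0, 3): F(-3, 2)}                 # 3/2 Y - 3/2 Y^3
p33_y = {(0, 3): F(1)}                                      # Y^3

# Constants: chance that the required pair coalesces first (1/3 per choice
# among three lineages; 1/18 for the full ordering among four at the root).
histories = {
    "h1 (1,2,3)": mul(p21_x, p21_y),
    "h2 (1,3,3)": mul(p21_x, p22_y, const(F(1, 3))),
    "h3 (2,2,3)": mul(p22_x, p31_y, const(F(1, 3))),
    "h4 (2,3,3)": mul(p22_x, p32_y, const(F(1, 9))),
    "h5 (3,3,3)": mul(p22_x, p33_y, const(F(1, 18))),
}
total = add(*histories.values())
print(total)  # should match the Total row of the table
```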
Given the set of gene tree probabilities, can the species tree be recovered?
In many cases, the highest-probability gene tree has the same topology as the species tree, but not always.
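A small numerical illustration, under our own assumptions: take the caterpillar species tree (((A,B):x,C):y,D), use the Total polynomial from the table for the matching gene tree (((A,B),C),D), and compare it with the balanced gene tree ((A,B),(C,D)), whose probability $\frac{1}{3}Y - \frac{1}{6}XY - \frac{1}{18}XY^3$ we obtained by summing its coalescent histories in the same way (this formula is not shown in the talk). The function name `caterpillar_probs` is hypothetical.

```python
import math


def caterpillar_probs(x: float, y: float) -> tuple[float, float]:
    """(Pr[(((A,B),C),D)], Pr[((A,B),(C,D))]) under the species tree
    (((A,B):x,C):y,D), with X = exp(-x) and Y = exp(-y)."""
    X, Y = math.exp(-x), math.exp(-y)
    matching = 1 - 2 * X / 3 - 2 * Y / 3 + X * Y / 3 + X * Y**3 / 18
    balanced = Y / 3 - X * Y / 6 - X * Y**3 / 18
    return matching, balanced


for x, y in [(2.0, 2.0), (0.01, 0.01)]:
    matching, balanced = caterpillar_probs(x, y)
    print(f"x = y = {x}: matching {matching:.3f}, balanced {balanced:.3f}")
```

With long internal branches the matching gene tree dominates; with very short ones (x = y = 0.01) the balanced gene tree is more probable than the gene tree that matches the species tree.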
The most likely rooted triple for any set of three taxa is a rooted triple on the species tree, so the species tree can be recovered by marginalizing gene trees to their rooted triples.
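The marginalization step can be sketched as follows, assuming rooted gene trees are written as nested Python tuples and their probabilities are already known. For each set of three taxa it sums the probabilities of all gene trees that induce each rooted triple and keeps the most probable one. The distribution `toy` is made up for illustration, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def leaves(tree):
    """Leaf labels of a rooted tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Leaf sets below every internal node of the tree."""
    if isinstance(tree, str):
        return set()
    out = {leaves(tree)}
    for child in tree:
        out |= clusters(child)
    return out


def rooted_triple(tree, trio):
    """Pair of taxa from `trio` that forms the cherry of the induced rooted triple."""
    trio = frozenset(trio)
    for cluster in clusters(tree):
        pair = cluster & trio
        if len(pair) == 2:
            return pair
    raise ValueError(f"{sorted(trio)} is unresolved in {tree}")


def most_likely_triples(dist):
    """Marginalize a rooted gene tree distribution onto every set of three
    taxa and return the most probable rooted triple (as its cherry pair)."""
    taxa = sorted(leaves(next(iter(dist))))
    best = {}
    for trio in combinations(taxa, 3):
        marginal = defaultdict(float)
        for tree, prob in dist.items():
            marginal[rooted_triple(tree, trio)] += prob
        best[frozenset(trio)] = max(marginal, key=marginal.get)
    return best


# Toy distribution (made-up numbers, not from the talk) over three rooted gene trees.
toy = {
    ((("A", "B"), "C"), "D"): 0.5,
    (("A", "B"), ("C", "D")): 0.3,
    ((("A", "C"), "B"), "D"): 0.2,
}
for trio, pair in sorted(most_likely_triples(toy).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(trio), "->", sorted(pair))
```

On the toy input every selected triple is displayed by (((A,B),C),D); assembling triples into a tree is a separate step (for example with the BUILD algorithm of Aho et al.).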
To infer rooted gene trees, you need a molecular clock or an outgroup; otherwise only unrooted gene trees can be inferred.
Species tree methods using the coalescent currently assume rooted gene trees. Can species trees be inferred using unrooted gene trees?
The probability of an unrooted gene tree is the sum of the probabilities of all rooted gene trees with the same unrooted topology:
$\Pr[AB|CD] = \Pr[(((AB)C)D)] + \Pr[(((AB)D)C)] + \Pr[(((CD)A)B)] + \Pr[(((CD)B)A)] + \Pr[((AB)(CD))]$,
where AB|CD denotes the unrooted tree that separates {A, B} from {C, D}.
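The sum over rootings can be checked mechanically. This sketch (our own, with hypothetical function names) enumerates all 15 rooted binary trees on A, B, C, D, forgets each root by recording the tree's nontrivial splits, and groups rooted trees by unrooted topology. Each of the three unrooted quartets collects exactly 5 rooted trees, one per edge on which the root can sit, matching the five terms in the sum for $\Pr[AB|CD]$.

```python
from collections import defaultdict


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    if isinstance(tree, str):
        return set()
    out = {leaves(tree)}
    for child in tree:
        out |= clusters(child)
    return out


def unrooted_splits(tree):
    """Nontrivial splits of the unrooted tree obtained by forgetting the root.
    Each split is stored as the side holding the alphabetically first taxon."""
    taxa = leaves(tree)
    anchor = min(taxa)
    return frozenset(
        c if anchor in c else taxa - c
        for c in clusters(tree)
        if 2 <= len(c) <= len(taxa) - 2
    )


def rooted_trees(taxa):
    """All rooted binary trees on `taxa`, built by attaching each new taxon
    to every edge, including a new edge above the root."""
    if len(taxa) == 1:
        return [taxa[0]]

    def attach(tree, leaf):
        out = [(tree, leaf)]
        if not isinstance(tree, str):
            left, right = tree
            out += [(new, right) for new in attach(left, leaf)]
            out += [(left, new) for new in attach(right, leaf)]
        return out

    return [t for smaller in rooted_trees(taxa[:-1]) for t in attach(smaller, taxa[-1])]


groups = defaultdict(list)
for tree in rooted_trees("ABCD"):
    groups[unrooted_splits(tree)].append(tree)

for split_set, members in groups.items():
    (side,) = split_set  # a resolved quartet has exactly one nontrivial split
    rest = frozenset("ABCD") - side
    print(f"{''.join(sorted(side))}|{''.join(sorted(rest))}: {len(members)} rooted trees")
```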
Probabilities of unrooted gene trees are linear combinations of probabilities of rooted gene trees.
Expressions for probabilities of unrooted gene trees are often simpler than those for rooted gene trees on the same number of species.
Unrooted gene tree    Probability
AB|CD                 $1 - \frac{2}{3}X$
AC|BD                 $\frac{1}{3}X$
AD|BC                 $\frac{1}{3}X$
Unrooted gene tree    Probability
AB|CD                 $1 - \frac{2}{3}XY$
AC|BD                 $\frac{1}{3}XY$
AD|BC                 $\frac{1}{3}XY$
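The two tables follow one pattern, which the sketch below encodes: if the unrooted species tree is AB|CD and its internal edge has length $t$, the matching unrooted gene tree has probability $1 - \frac{2}{3}e^{-t}$ and each alternative has $\frac{1}{3}e^{-t}$. We read the first table as $t = x$ and the second as $t = x + y$ (for instance a balanced species tree ((A,B):x,(C,D):y), whose root lies on the internal edge); that reading of the missing figures is an assumption, and the function names are hypothetical.

```python
import math


def unrooted_quartet_probs(t: float) -> dict[str, float]:
    """Unrooted gene tree probabilities when the unrooted species tree is AB|CD
    and its internal edge has length t (coalescent units)."""
    X = math.exp(-t)
    return {"AB|CD": 1 - 2 * X / 3, "AC|BD": X / 3, "AD|BC": X / 3}


def internal_edge_length(p_match: float) -> float:
    """Invert p_match = 1 - (2/3) exp(-t); valid for 1/3 < p_match < 1."""
    return -math.log(1.5 * (1 - p_match))


x, y = 0.4, 0.7
first = unrooted_quartet_probs(x)        # entries in X = exp(-x)
second = unrooted_quartet_probs(x + y)   # entries in XY = exp(-(x + y))
print(first)
print(second)
print(internal_edge_length(second["AB|CD"]))  # recovers x + y, not x and y separately
```

Inverting the formula recovers $t$, but nothing in the unrooted distribution splits $t = x + y$ into its two parts.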
If the species tree topology is known, the unrooted gene tree distribution only has information about one internal edge (or a sum of edges).
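The claim that only one internal edge matters can also be probed by simulation. This Monte Carlo sketch (our own, not the authors' code) runs the multispecies coalescent on a hypothetical caterpillar species tree (((A,B):x,C):y,D) only up to the first coalescence, because for four taxa the first pair to coalesce already fixes the unrooted gene tree. Changing y should leave the estimated frequencies near $1 - \frac{2}{3}e^{-x}$ and $\frac{1}{3}e^{-x}$, up to Monte Carlo error.

```python
import math
import random
from itertools import combinations

TAXA = frozenset("ABCD")


def first_coalescing_pair(x: float, y: float, rng: random.Random) -> frozenset:
    """Simulate the coalescent on (((A,B):x,C):y,D) until the first
    coalescence and return the pair of taxa that coalesced."""
    # Branch x: lineages A and B form one pair, so the waiting time is Exp(1).
    if rng.expovariate(1.0) < x:
        return frozenset("AB")
    # Branch y: lineages A, B, C form three pairs, so the waiting time is Exp(3).
    if rng.expovariate(3.0) < y:
        return frozenset(rng.choice(list(combinations("ABC", 2))))
    # Root: A, B, C, D; the first coalescence joins a uniformly random pair.
    return frozenset(rng.choice(list(combinations("ABCD", 2))))


def unrooted_quartet(pair: frozenset) -> str:
    """For four taxa, the first coalescing pair fixes the unrooted gene tree."""
    side = pair if "A" in pair else TAXA - pair
    return "".join(sorted(side)) + "|" + "".join(sorted(TAXA - side))


def estimate(x: float, y: float, n: int = 50_000, seed: int = 1) -> dict[str, float]:
    """Monte Carlo frequencies of the three unrooted gene trees."""
    rng = random.Random(seed)
    counts: dict[str, int] = {}
    for _ in range(n):
        q = unrooted_quartet(first_coalescing_pair(x, y, rng))
        counts[q] = counts.get(q, 0) + 1
    return {q: c / n for q, c in sorted(counts.items())}


x = 0.5
for y in (0.1, 2.0):
    print(f"y = {y}:", estimate(x, y), "theory AB|CD:", round(1 - 2 * math.exp(-x) / 3, 4))
```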
If the species tree is unknown, the unrooted gene tree distribution only identifies the unrooted species tree topology. The following species trees induce the same unrooted gene tree distribution:
[Figure: rooted species trees that share the same unrooted topology]
Whathappenswithfivetaxa?
• 15 unrooted topologies
• 3 rooted species tree shapes: caterpillar, balanced, pseudocaterpillar
Given a distribution of 15 five-taxon unrooted gene tree probabilities, what information can we recover about the species tree?
(1) Unrooted species tree topology?
(2) Rooted species tree topology?
(3) Branch lengths?
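The counts used here can be reproduced with a short sketch: there are $(2n-5)!!$ unrooted and $(2n-3)!!$ rooted binary topologies on $n$ labeled taxa, and the unlabeled rooted shapes can be enumerated recursively. The canonical-string encoding of shapes and the function names are our own choices.

```python
def double_factorial(m: int) -> int:
    """m!! = m (m-2) (m-4) ...; equals 1 for m <= 1."""
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out


def n_rooted(n: int) -> int:
    return double_factorial(2 * n - 3)


def n_unrooted(n: int) -> int:
    return double_factorial(2 * n - 5)


def shapes(n: int) -> list[str]:
    """Unlabeled rooted binary shapes with n leaves as canonical strings:
    '*' is a leaf, and children are sorted so each shape has one spelling."""
    if n == 1:
        return ["*"]
    found = set()
    for k in range(1, n // 2 + 1):
        for left in shapes(k):
            for right in shapes(n - k):
                a, b = sorted([left, right])
                found.add(f"({a},{b})")
    return sorted(found)


for n in (4, 5):
    print(n, "taxa:", n_unrooted(n), "unrooted,", n_rooted(n), "rooted,", len(shapes(n)), "shapes")
```

For five taxa this gives 15 unrooted topologies, 105 rooted topologies, and the 3 shapes named on the slide.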
[Figure: example five-taxon species trees on taxa A, B, C, D, E, each with a bar chart of the probabilities of the unrooted gene trees T1, T2, T3, ...; y-axis: probability]
(1) If unrooted gene trees that are tied in probability are grouped into equivalence classes, the sizes of these classes depend on the unlabeled, rooted species tree topology on five taxa:
Caterpillar class sizes: 1, 1, 1, 2, 2, 2, 6
Balanced class sizes: 1, 2, 2, 4, 6
Pseudocaterpillar class sizes: 1, 2, 2, 2, 8
(2) Since the three patterns of class sizes are distinct, the unlabeled, rooted species tree topology can be determined from the class sizes.
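Step (2) amounts to a table lookup. The sketch below assumes exact (or very accurately estimated) probabilities for the 15 unrooted gene trees, with no accidental ties beyond those forced by the species tree; it groups trees whose probabilities agree within a tolerance `tol` (a hypothetical parameter) and matches the sorted class sizes against the three patterns listed for the caterpillar, balanced, and pseudocaterpillar shapes.

```python
# Class-size patterns from the talk, keyed by the sorted tuple of sizes.
CLASS_SIZES = {
    (1, 1, 1, 2, 2, 2, 6): "caterpillar",
    (1, 2, 2, 4, 6): "balanced",
    (1, 2, 2, 2, 8): "pseudocaterpillar",
}


def tie_classes(probs: dict[str, float], tol: float = 1e-9) -> list[list[str]]:
    """Group gene trees whose sorted probabilities agree to within `tol`."""
    classes: list[list[str]] = []
    last = None
    for tree, p in sorted(probs.items(), key=lambda kv: kv[1]):
        if last is not None and abs(p - last) <= tol:
            classes[-1].append(tree)
        else:
            classes.append([tree])
        last = p
    return classes


def shape_from_probs(probs: dict[str, float], tol: float = 1e-9) -> str:
    """Unlabeled rooted species tree shape implied by the tie pattern."""
    sizes = tuple(sorted(len(c) for c in tie_classes(probs, tol)))
    return CLASS_SIZES.get(sizes, "unrecognized")
```

With estimated rather than exact probabilities, the choice of `tol` matters: too small and true ties are split, too large and distinct classes merge.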
(3) From the four-taxon results, the unrooted species tree topology can be identified by determining the most probable quartet for each subset of four species.
Therefore we know (for five taxa): (i) the labeled, unrooted species tree topology and (ii) the unlabeled, rooted species tree topology.
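Step (3) can be sketched for five taxa as follows, assuming the most probable unrooted quartet for each of the five 4-taxon subsets has already been found. Each unrooted five-taxon tree is written as (cherry, middle taxon, cherry), and the sketch returns the only one of the 15 trees that displays all five quartets. The representation and function names are hypothetical.

```python
TAXA = "ABCDE"


def quartet(side1, side2):
    """Unrooted quartet side1|side2 as an unordered pair of 2-taxon sets."""
    return frozenset([frozenset(side1), frozenset(side2)])


def five_taxon_trees():
    """The 15 unrooted binary trees on TAXA, each as (cherry, middle, cherry)."""
    trees = []
    for middle in TAXA:
        rest = [t for t in TAXA if t != middle]
        for partner in rest[1:]:
            cherry = frozenset([rest[0], partner])
            trees.append((cherry, middle, frozenset(rest) - cherry))
    return trees


def displayed_quartets(tree):
    """Map each 4-taxon subset to the quartet that `tree` displays on it."""
    c1, middle, c2 = tree
    shown = {c1 | c2: quartet(c1, c2)}  # dropping the middle taxon
    for cherry, other in ((c1, c2), (c2, c1)):
        for dropped in cherry:
            (kept,) = cherry - {dropped}
            shown[other | {kept, middle}] = quartet({kept, middle}, other)
    return shown


def tree_from_quartets(best):
    """Return the unique five-taxon tree that displays every quartet in `best`."""
    matches = [t for t in five_taxon_trees() if displayed_quartets(t) == best]
    if len(matches) != 1:
        raise ValueError("the quartets are not displayed by exactly one tree")
    return matches[0]


example = (frozenset("AB"), "C", frozenset("DE"))  # hypothetical species tree AB|C|DE
print(tree_from_quartets(displayed_quartets(example)))
```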
(4) The labeled, rooted species tree topology can be determined by further considering invariants or inequalities in unrooted gene tree probabilities.
Example: Given the unrooted species tree, and given that the species tree is balanced, the rooted species tree is one of:
[Figure: the unrooted species tree T1 on A, B, C, D, E and two candidate balanced rooted species trees, R1 and R2]
R1 and R2 imply different invariants and inequalities for the unrooted gene tree probabilities.
Under R1, T7 is in the 6-element class, and Pr(T7) < Pr(T5).
Under R2, T7 is in the 4-element class, and Pr(T7) > Pr(T5).
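Combining (i) and (ii) leaves only the rootings of the unrooted species tree that have the known shape. The sketch below (our own construction, with hypothetical names) roots an unrooted five-taxon tree on each of its 7 edges and groups the results by shape: for a tree of the form AB|C|DE, four rootings are caterpillars, one is a pseudocaterpillar, and two are balanced, which is why the balanced example has exactly two candidates, R1 and R2.

```python
def rooted_at_edge(adj, u, v):
    """Root the unrooted tree `adj` (adjacency lists) on edge u-v; nested tuples."""
    def build(node, parent):
        kids = [w for w in adj[node] if w != parent]
        return node if not kids else tuple(build(w, node) for w in kids)
    return (build(u, v), build(v, u))


def shape(tree):
    """Canonical unlabeled shape string ('*' is a leaf)."""
    if isinstance(tree, str):
        return "*"
    a, b = sorted(shape(child) for child in tree)
    return f"({a},{b})"


SHAPE_NAMES = {
    shape(((("A", "B"), "C"), ("D", "E"))): "balanced",
    shape((((("A", "B"), "C"), "D"), "E")): "caterpillar",
    shape(((("A", "B"), ("C", "D")), "E")): "pseudocaterpillar",
}


def five_taxon_unrooted(cherry1, middle, cherry2):
    """Adjacency lists for the unrooted tree cherry1 | middle | cherry2,
    with hypothetical internal nodes u1, u2, u3."""
    a, b = sorted(cherry1)
    d, e = sorted(cherry2)
    return {
        a: ["u1"], b: ["u1"], middle: ["u2"], d: ["u3"], e: ["u3"],
        "u1": [a, b, "u2"], "u2": ["u1", middle, "u3"], "u3": ["u2", d, e],
    }


def rootings_by_shape(adj):
    """Group the 2n-3 rootings of an unrooted binary tree by rooted shape."""
    groups = {}
    for edge in {frozenset((u, v)) for u in adj for v in adj[u]}:
        u, v = sorted(edge)
        rooted = rooted_at_edge(adj, u, v)
        groups.setdefault(SHAPE_NAMES[shape(rooted)], []).append(rooted)
    return groups


for name, trees in sorted(rootings_by_shape(five_taxon_unrooted("AB", "C", "DE")).items()):
    print(name, len(trees), trees)
```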
Similar arguments can be used to identify the caterpillar and pseudocaterpillar. Thus five-taxon rooted species trees are identifiable from five-taxon unrooted gene tree probabilities.
The results generalize immediately to larger trees: the unrooted gene tree distribution for each subset of five taxa can be obtained by summing the probabilities of all unrooted gene trees that display the same five-taxon tree.
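The summation can be sketched by representing each unrooted gene tree by its set of nontrivial splits (each split given by one of its sides). Restricting the splits to a five-taxon subset gives the displayed five-taxon tree, and probabilities of gene trees with the same restriction are added. The six-taxon distribution `toy` is invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def restrict(splits, subset):
    """Nontrivial splits that an unrooted tree with `splits` displays on
    `subset`, each stored as the side holding the first taxon of `subset`."""
    subset = frozenset(subset)
    anchor = min(subset)
    out = set()
    for side in splits:
        part = frozenset(side) & subset
        if 2 <= len(part) <= len(subset) - 2:
            out.add(part if anchor in part else subset - part)
    return frozenset(out)


def marginalize(dist, subset):
    """Sum probabilities of unrooted gene trees that display the same tree on `subset`."""
    out = defaultdict(float)
    for splits, prob in dist.items():
        out[restrict(splits, subset)] += prob
    return dict(out)


# Hypothetical six-taxon unrooted gene trees, each given by one side of its splits.
toy = {
    frozenset({frozenset("AB"), frozenset("ABC"), frozenset("EF")}): 0.5,
    frozenset({frozenset("AB"), frozenset("ABC"), frozenset("DF")}): 0.3,
    frozenset({frozenset("AD"), frozenset("ABD"), frozenset("EF")}): 0.2,
}
for splits, prob in marginalize(toy, "ABCDE").items():
    print(sorted("".join(sorted(s)) for s in splits), round(prob, 3))
```

The first two gene trees differ only in where F attaches, so on {A, B, C, D, E} they display the same tree and their probabilities are pooled.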
Thus all rooted quintets on the species tree are identifiable. The rooted species tree topology can be constructed from the rooted quintets.
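One standard way to assemble a rooted tree from smaller rooted trees is the BUILD algorithm of Aho, Sagiv, Szymanski and Ullman (1981), applied to the rooted triples that the quintets display; the talk does not say which construction it uses, so this is an illustrative substitute. The sketch assumes the rooted quintets are already known, here obtained by restricting a hypothetical six-taxon species tree.

```python
from collections import defaultdict
from itertools import combinations


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    if isinstance(tree, str):
        return set()
    out = {leaves(tree)}
    for child in tree:
        out |= clusters(child)
    return out


def restrict(tree, keep):
    """Rooted subtree induced by the taxa in `keep` (unary nodes suppressed)."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def triples_of(tree):
    """Rooted triples ((a, b), c), meaning ab|c, displayed by a rooted binary tree."""
    found = set()
    cl = clusters(tree)
    for trio in combinations(sorted(leaves(tree)), 3):
        for cluster in cl:
            pair = cluster & set(trio)
            if len(pair) == 2:
                (c,) = set(trio) - pair
                found.add((tuple(sorted(pair)), c))
                break
    return found


def build(taxa, triples):
    """BUILD: a rooted tree displaying every triple, or None if they conflict."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    parent = {t: t for t in taxa}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    # Join a and b whenever some triple ab|c lies entirely inside `taxa`.
    inside = [((a, b), c) for (a, b), c in triples if {a, b, c} <= set(taxa)]
    for (a, b), _ in inside:
        parent[find(a)] = find(b)
    components = defaultdict(list)
    for t in taxa:
        components[find(t)].append(t)
    if len(components) == 1:
        return None  # the triples conflict
    children = [build(comp, inside) for comp in components.values()]
    return None if None in children else tuple(children)


species = (((("A", "B"), "C"), ("D", "E")), "F")  # hypothetical species tree
quintets = [restrict(species, set(five)) for five in combinations("ABCDEF", 5)]
triples = set().union(*(triples_of(q) for q in quintets))
rebuilt = build("ABCDEF", triples)
print(rebuilt, clusters(rebuilt) == clusters(species))
```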
All branch lengths on five-taxon species trees can be recovered. Example: (((a,b):x,c):y,(d,e):z).
Theorem.
(i) The unrooted gene tree distribution determines the rooted species tree and its branch lengths when there are 5 or more taxa.
(ii) For a four-taxon species tree, the unrooted gene tree distribution determines the unrooted species tree, but not the rooted species tree.
Balanced: (((AB)C)(DE))
Caterpillar: ((((AB)C)D)E)
Pseudocaterpillar: (((AB)(CD))E)
-- A gene tree distribution has only $n-2$ parameters (species tree branch lengths).
-- But there are $(2n-3)!!$ gene tree probabilities.
-- There are many ties in probability among different gene trees.
-- There are other linear constraints. For the species tree (((AB)C)D), we have
   $\Pr[((AB)(CD))] - \Pr[(((AB)D)C)] - \Pr[(((AD)B)C)] = 0$
-- How many linear constraints are there? How does the number of linear constraints depend on the number of taxa and the species tree topology?
-- What are some nonlinear constraints?
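The linear constraint for (((AB)C)D) can be checked with exact rational arithmetic. The sketch writes the three gene tree probabilities as sums over coalescent histories on the species tree (((A,B):x,C):y,D), with $X = e^{-x}$ and $Y = e^{-y}$; these expressions are our own derivation rather than formulas from the talk, and `caterpillar_probs` is a hypothetical name.

```python
from fractions import Fraction


def caterpillar_probs(X: Fraction, Y: Fraction) -> dict[str, Fraction]:
    """Probabilities of three rooted gene trees under (((A,B):x,C):y,D),
    summed over where each coalescence happens (branch x, branch y, or root)."""
    p32 = Fraction(3, 2) * (Y - Y**3)  # 3 -> 2 lineages on branch y
    return {
        # AB on x, then C and D join first at the root (1/3);
        # or AB on y (1/3), then C and D first (1/3);
        # or nothing before the root: 2 of the 18 equally likely orderings.
        "((AB)(CD))": (1 - X) * Y / 3 + X * p32 / 9 + X * Y**3 / 9,
        # Same cases, but D must join (AB) before C: 1 of 18 orderings at the root.
        "(((AB)D)C)": (1 - X) * Y / 3 + X * p32 / 9 + X * Y**3 / 18,
        # A and D can only meet at the root, so A, B, C must all reach it.
        "(((AD)B)C)": X * Y**3 / 18,
    }


for X, Y in [(Fraction(1, 2), Fraction(1, 3)), (Fraction(9, 10), Fraction(1, 7))]:
    p = caterpillar_probs(X, Y)
    print(p["((AB)(CD))"] - p["(((AB)D)C)"] - p["(((AD)B)C)"])  # prints 0 each time
```

Expanding the three sums, the $\frac{1}{3}(1-X)Y$ and $\frac{1}{9}X\,p_{3,2}(y)$ terms cancel, and the remaining root terms give $\frac{1}{9} - \frac{1}{18} - \frac{1}{18} = 0$ times $XY^3$.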
Probabilities of coalescent histories, and therefore of gene trees, are polynomials in the transformed branch lengths. An example polynomial constraint: