Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.
![Page 1: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/1.jpg)
Molecular Evolution
Distance Methods
Biol. Luis Delaye
Facultad de Ciencias, UNAM
![Page 2: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/2.jpg)
Mainly a STATISTICAL problem!
![Page 3: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/3.jpg)
a) Models of sequence evolution
b) Sequence similarity
c) Estimating the number of substitutions between two sequences
d) Phylogenetic reconstruction
![Page 4: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/4.jpg)
Evolution at the molecular level is the substitution of one allele by another
[Figure: allele frequency (0 to 1) against time for Allele A, Allele B and Allele C]
The basic forces are mutation, genetic drift and natural selection.
![Page 5: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/5.jpg)
By this process, a DNA sequence accumulates substitutions through time
ATCGCATCC
ATTGCGTAC
TAGCGTAGG
TAACCCATG
t
![Page 6: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/6.jpg)
In the study of molecular evolution, these changes in a DNA sequence are used for both:
Estimating the rate of molecular evolution
Reconstructing the evolutionary history
![Page 7: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/7.jpg)
Models of sequence evolution
![Page 8: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/8.jpg)
Models of DNA evolution
To study the dynamics of nucleotide substitution we must make assumptions regarding the probability (p) of substitution of one nucleotide by another at the end of time interval t.
[Figure: nucleotide A changing to nucleotide C, labeled $p_t$]
![Page 9: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/9.jpg)
For instance, $p_{AC}$ represents the probability that a site that started with nucleotide i (A in this case) has nucleotide j (C in this case) at the end of interval t.
![Page 10: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/10.jpg)
Models of DNA evolution using matrix theory
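Substitution probability matrix:
$$P_t = \begin{pmatrix} p_{AA} & p_{AC} & p_{AG} & p_{AT} \\ p_{CA} & p_{CC} & p_{CG} & p_{CT} \\ p_{GA} & p_{GC} & p_{GG} & p_{GT} \\ p_{TA} & p_{TC} & p_{TG} & p_{TT} \end{pmatrix}$$
Base composition of sequences:
$$f = [f_A \; f_C \; f_G \; f_T]$$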
![Page 11: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/11.jpg)
The Jukes and Cantor’s One-Parameter Model
[Figure: substitution scheme among the four nucleotides A, G, C and T]
![Page 12: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/12.jpg)
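The Jukes and Cantor’s One-Parameter Model
Substitution probability matrix:
$$P_t = \begin{pmatrix} * & \alpha & \alpha & \alpha \\ \alpha & * & \alpha & \alpha \\ \alpha & \alpha & * & \alpha \\ \alpha & \alpha & \alpha & * \end{pmatrix}$$
Base composition of sequences:
$$f = [\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}]$$
where * denotes $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$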
![Page 13: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/13.jpg)
The Jukes and Cantor’s One-Parameter Model
What is the probability of having an A at a site in a DNA sequence at time t = 1, if that site started with an A at time t = 0?
[Figure: a site with A at t = 0 and A at t = 1]
$p_A(0) = 1$, since we started with A.
$p_A(1) = 1 - 3\alpha$, the probability that the nucleotide has remained unchanged.
![Page 14: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/14.jpg)
The Jukes and Cantor’s One-Parameter Model
What is the probability of having an A in a site in a DNA sequence at time t = 2?
[Figure: two scenarios for a site from t = 0 to t = 2. Scenario 1: A, A, A (no substitution, then no substitution). Scenario 2: A, not A, A (substitution, then substitution).]
(After Li, 1997)
![Page 15: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/15.jpg)
The Jukes and Cantor’s One-Parameter Model
What is the probability of having an A in a site in a DNA sequence at time t = 2?
[Figure: the two scenarios annotated with probabilities. Scenario 1 (A, A, A): $p_A(1)$ from t = 0 to t = 1, then $(1 - 3\alpha)$ from t = 1 to t = 2. Scenario 2 (A, not A, A): $[1 - p_A(1)]$ from t = 0 to t = 1, then $\alpha$ from t = 1 to t = 2. The probabilities of the two scenarios are added.]
(After Li, 1997)
![Page 17: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/17.jpg)
The Jukes and Cantor’s One-Parameter Model
What is the probability of having an A in a site in a DNA sequence at time t = 2?
$$p_A(2) = (1 - 3\alpha)\, p_A(1) + \alpha\, [1 - p_A(1)]$$
where:
$(1 - 3\alpha)$ is the probability of not having a substitution from t = 1 to t = 2
$p_A(1)$ is the probability of not having a substitution from t = 0 to t = 1
$\alpha$ is the probability of having a substitution from not A to A, from t = 1 to t = 2
$[1 - p_A(1)]$ is the probability of having a substitution from A to not A, from t = 0 to t = 1
The first term is the probability of no change; the second term is the probability of reversible change.
![Page 18: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/18.jpg)
The Jukes and Cantor’s One-Parameter Model
The following recurrence equation holds for any t:
$$p_A(t + 1) = (1 - 3\alpha)\, p_A(t) + \alpha\, [1 - p_A(t)]$$
![Page 19: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/19.jpg)
The Jukes and Cantor’s One-Parameter Model
Rewriting this equation in terms of the amount of change:
$$p_A(t + 1) - p_A(t) = (1 - 3\alpha)\, p_A(t) + \alpha\, [1 - p_A(t)] - p_A(t)$$
Doing some algebra:
$$p_A(t + 1) - p_A(t) = p_A(t) - 3\alpha\, p_A(t) + \alpha\, [1 - p_A(t)] - p_A(t)$$
$$\Delta p_A(t) = -3\alpha\, p_A(t) + \alpha\, [1 - p_A(t)]$$
$$\Delta p_A(t) = -4\alpha\, p_A(t) + \alpha$$
![Page 26: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/26.jpg)
The Jukes and Cantor’s One-Parameter Model
Rewriting this equation for a continuous time model:
$$\frac{d\, p_A(t)}{dt} = -4\alpha\, p_A(t) + \alpha$$
The solution is given by:
$$p_A(t) = \tfrac{1}{4} + \left(p_A(0) - \tfrac{1}{4}\right) e^{-4\alpha t}$$
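As a rough numerical check, the discrete recurrence and the continuous-time solution can be compared side by side. This is only a sketch: α stands for the per-site substitution rate of the one-parameter model, and the value of `alpha` and the function names are illustrative assumptions, not taken from the slides.

```python
import math

def pA_discrete(alpha, steps, p0=1.0):
    """Iterate p(t+1) = (1 - 3*alpha)*p(t) + alpha*(1 - p(t)) for `steps` time units."""
    p = p0
    for _ in range(steps):
        p = (1.0 - 3.0 * alpha) * p + alpha * (1.0 - p)
    return p

def pA_continuous(alpha, t, p0=1.0):
    """Continuous-time solution 1/4 + (p0 - 1/4) * exp(-4*alpha*t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4.0 * alpha * t)

alpha = 1e-3  # hypothetical rate per time unit, chosen small for illustration
for t in (1, 10, 100, 1000):
    print(t, pA_discrete(alpha, t), pA_continuous(alpha, t))
```

For small α the two columns stay close, which is why the difference equation can reasonably be replaced by a differential equation.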
![Page 28: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/28.jpg)
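The Jukes and Cantor’s One-Parameter Model
Since we started with A, $p_A(0) = 1$:
$$p_A(t) = \tfrac{1}{4} + \left(1 - \tfrac{1}{4}\right) e^{-4\alpha t} = \tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t}$$
And if we start with non-A, $p_A(0) = 0$:
$$p_A(t) = \tfrac{1}{4} + \left(0 - \tfrac{1}{4}\right) e^{-4\alpha t} = \tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t}$$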
![Page 29: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/29.jpg)
The Jukes and Cantor’s One-Parameter Model
We can write the equations in a more explicit form.
The probability of initially having A, and still having A at time t is:
$$p_{AA}(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t}$$
The probability of initially having G, and then having A at time t is:
$$p_{GA}(t) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t}$$
![Page 30: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/30.jpg)
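The Jukes and Cantor’s One-Parameter Model
And since all nucleotides are equivalent under the JC model, $p_{GA}(t) = p_{CA}(t) = p_{TA}(t)$. In general:
$$p_{ii}(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t}$$
$$p_{ij}(t) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t}, \quad \text{where } i \neq j$$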
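The two probabilities are easy to evaluate numerically. The sketch below writes α for the substitution rate of the one-parameter model; the function name `jc_probs` and the chosen values of α and t are illustrative assumptions.

```python
import math

def jc_probs(alpha, t):
    """Return (p_ii, p_ij) under the Jukes-Cantor model after time t."""
    decay = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * decay   # probability the site shows the same nucleotide
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different nucleotide
    return p_same, p_diff

# Illustrative values: alpha in substitutions/site/year, t in years.
p_same, p_diff = jc_probs(5e-9, 100e6)
print(p_same, p_diff)
# Each row of the matrix holds one p_ii and three p_ij, so it should sum to 1.
print(p_same + 3 * p_diff)
```

As t grows, both values approach ¼, so the starting nucleotide is eventually forgotten.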
![Page 31: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/31.jpg)
$p_A(t)$ can also be interpreted as the frequency of A in a DNA sequence. For example, if we start with a sequence made of A's only, then $p_A(0) = 1$, and $p_A(t)$ is the expected frequency of A in the sequence at time t.
![Page 32: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/32.jpg)
The Jukes and Cantor’s One-Parameter Model
[Figure: probability (0 to 1, with ¼ marked) against time (0 to 200 million years), with curves for $p_{ii}$ and $p_{ij}$]
Temporal changes in the probability of having a certain nucleotide at a given nucleotide site ($\alpha = 5 \times 10^{-9}$ substitutions/site/year).
![Page 33: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/33.jpg)
Other models of sequence evolution
![Page 34: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/34.jpg)
The Kimura two-Parameter Model
[Figure: substitution scheme among A, G, C and T. Transitions: A ↔ G and C ↔ T. Transversions: all other changes.]
![Page 35: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/35.jpg)
The Kimura two-Parameter Model
[Figure: base pair differences (0 to 100) against time since divergence (0 to 25 Myr), plotted separately for transitions and transversions]
Number of transitions and transversions between pairs of bovid mammal mitochondrial sequences (684 base pairs from the COII gene) against the estimated time of divergence.
![Page 36: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/36.jpg)
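The Kimura two-Parameter Model
Substitution probability matrix (rows and columns in the order A, C, G, T; $\alpha$ for transitions, $\beta$ for transversions):
$$P_t = \begin{pmatrix} * & \beta & \alpha & \beta \\ \beta & * & \beta & \alpha \\ \alpha & \beta & * & \beta \\ \beta & \alpha & \beta & * \end{pmatrix}$$
Base composition of sequences:
$$f = [\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}]$$
where * denotes $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$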
![Page 37: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/37.jpg)
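The Felsenstein (1981) Model
Substitution probability matrix:
$$P_t = \begin{pmatrix} * & \pi_C & \pi_G & \pi_T \\ \pi_A & * & \pi_G & \pi_T \\ \pi_A & \pi_C & * & \pi_T \\ \pi_A & \pi_C & \pi_G & * \end{pmatrix}$$
Base composition of sequences:
$$f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]$$
where * denotes $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$
This model assumes that there is variation in base composition.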
![Page 38: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/38.jpg)
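The Hasegawa, Kishino and Yano (1985) Model
Substitution probability matrix (rows and columns in the order A, C, G, T; $\alpha$ for transitions, $\beta$ for transversions):
$$P_t = \begin{pmatrix} * & \beta\pi_C & \alpha\pi_G & \beta\pi_T \\ \beta\pi_A & * & \beta\pi_G & \alpha\pi_T \\ \alpha\pi_A & \beta\pi_C & * & \beta\pi_T \\ \beta\pi_A & \alpha\pi_C & \beta\pi_G & * \end{pmatrix}$$
Base composition of sequences:
$$f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]$$
where * denotes $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$
This model assumes that there is variation in base composition and that transitions and transversions occur at different rates.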
![Page 39: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/39.jpg)
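The General Reversible (REV) Model
Substitution probability matrix:
$$P_t = \begin{pmatrix} * & \pi_C a & \pi_G b & \pi_T c \\ \pi_A a & * & \pi_G d & \pi_T e \\ \pi_A b & \pi_C d & * & \pi_T f \\ \pi_A c & \pi_C e & \pi_G f & * \end{pmatrix}$$
Base composition of sequences:
$$f = [\pi_A \; \pi_C \; \pi_G \; \pi_T]$$
where * denotes $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$
This model assumes that there is variation in base composition and that each substitution has its own probability.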
![Page 40: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/40.jpg)
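Comparing the Models
Jukes-Cantor → Kimura 2 parameter: allow for transition/transversion bias
Jukes-Cantor → Felsenstein (1981): allow for base frequency to vary
Kimura 2 parameter → Hasegawa, Kishino and Yano (1985): allow for base frequency to vary
Felsenstein (1981) → Hasegawa, Kishino and Yano (1985): allow for transition/transversion bias
Hasegawa, Kishino and Yano (1985) → General Reversible (REV): allow all six pairs of substitutions to have different rates
From Page and Holmes (1998)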
![Page 41: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/41.jpg)
Among site rate variation
![Page 42: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/42.jpg)
Among site rate variation
For protein coding sequences not all sites have the same probability of change (there is among site rate variation). If this effect is not taken into account, the number of substitutions per site between two sequences can be underestimated (Li and Graur, 1991).
![Page 43: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/43.jpg)
Effect of among site rate variation in sequence divergence
(A) Substitution rate of 0.5 % per Myr and 80 % of the sites free to vary
(B) Substitution rate of 2 % per Myr and 50 % of the sites free to vary
(Page and Holmes, 1998)
![Page 44: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/44.jpg)
Gamma distribution
$$f(r) = \frac{b^a}{\Gamma(a)}\, e^{-br}\, r^{a-1}$$
where:
$$\Gamma(a) = \int_0^\infty e^{-t}\, t^{a-1}\, dt$$
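The density can be evaluated directly with the standard library. In this sketch `gamma_density` is a hypothetical helper name, and setting b = a (so that the mean rate is 1) is a common convention assumed here rather than something stated on the slide.

```python
import math

def gamma_density(r, a, b):
    """f(r) = b^a / Gamma(a) * exp(-b*r) * r^(a-1), for r > 0."""
    return (b ** a) / math.gamma(a) * math.exp(-b * r) * r ** (a - 1)

# Assumption: b = a, so the mean relative rate is 1.
for a in (0.5, 1.0, 5.0):
    values = [round(gamma_density(r, a, a), 3) for r in (0.25, 1.0, 2.0)]
    print("a =", a, "f(0.25), f(1), f(2) =", values)
```

Small values of a put most of the density near r = 0 (many nearly invariant sites and a few fast ones), while large values concentrate the rates around the mean.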
![Page 45: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/45.jpg)
The a shape parameter
![Page 46: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/46.jpg)
Time reversibility
![Page 47: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/47.jpg)
Time reversibility in the Jukes and Cantor’s One-Parameter Model
[Figure: an ancestral A giving rise to two descendant A's, each after time t with probability $p_{AA}(t)$, for a joint probability of $p_{AA}(t)^2$; compared with a single lineage A, A, A from t = 0 to t = 2, also with probability $p_{AA}(t)^2$]
![Page 51: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/51.jpg)
Time reversibility in the Jukes and Cantor’s One-Parameter Model
A substitution process is said to be time reversible if the probability of starting from nucleotide i and changing to nucleotide j in a time interval t is the same as the probability of starting from j and going backward to i in the same time duration.
$$p_{ij}(t)\, \pi_i = p_{ji}(t)\, \pi_j$$
where $\pi_i$ is the frequency of nucleotide i.
![Page 52: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/52.jpg)
Sequence similarity between two sequences
![Page 53: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/53.jpg)
Divergence Between DNA sequences
[Figure: an ancestral sequence diverging into Sequence 1 and Sequence 2, each after time t]
![Page 54: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/54.jpg)
I(t)
The expected value of the proportion of identical nucleotides between the two sequences under study is equal to the probability, I(t), that the nucleotide at a given site at time t is the same in both sequences.
![Page 55: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/55.jpg)
Sequence Similarity
[Figure: an ancestral A giving rise to A in both descendant sequences after time t; each lineage keeps A with probability $p_{AA}(t)$, so both show A with probability $p_{AA}(t)^2$]
The two sequences can also be identical at a site because of parallel substitutions:
[Figure: ancestral A, both descendants C, with probability $p_{AC}(t)^2$]
[Figure: ancestral A, both descendants G, with probability $p_{AG}(t)^2$]
[Figure: ancestral A, both descendants T, with probability $p_{AT}(t)^2$]
![Page 62: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/62.jpg)
Sequence Similarity in the JC Model
Therefore,
$$I(t) = p_{AA}(t)^2 + p_{AT}(t)^2 + p_{AC}(t)^2 + p_{AG}(t)^2$$
And from the JC model,
$$I(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-8\alpha t}$$
This equation also holds if the initial nucleotide was different from A, and represents the expected proportion of identical nucleotides between two sequences that diverged t time units ago.
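The step from the sum of squares to the closed form can be checked numerically. This is a sketch only: α denotes the JC substitution rate, and the function names and test values are assumptions made for illustration.

```python
import math

def identity_from_paths(alpha, t):
    """I(t) as p_AA(t)^2 plus three equal parallel-substitution terms p_Aj(t)^2."""
    decay = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same ** 2 + 3.0 * p_diff ** 2

def identity_closed_form(alpha, t):
    """I(t) = 1/4 + 3/4 * exp(-8*alpha*t)."""
    return 0.25 + 0.75 * math.exp(-8.0 * alpha * t)

for t in (0.0, 10e6, 50e6, 200e6):
    print(t, identity_from_paths(5e-9, t), identity_closed_form(5e-9, t))
```

Both columns agree, and both tend to ¼: two very old sequences still match at about a quarter of their sites by chance alone.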
![Page 63: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/63.jpg)
Sequence similarity in the Jukes and Cantor’s One-Parameter Model
[Figure: proportion of identical nucleotides (0 to 1, with ¼ marked) against time (0 to 200 million years)]
Temporal changes in the expected proportion of identical nucleotides between two sequences that diverged t years ago ($\alpha = 5 \times 10^{-9}$ substitutions/site/year).
![Page 64: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/64.jpg)
Estimating the number of nucleotide substitutions between two sequences
![Page 65: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/65.jpg)
Number of nucleotide substitutions between two sequences
$$K = N / L$$
K: substitutions per nucleotide site.
N: total number of substitutions.
L: number of sites compared between two sequences.
![Page 66: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/66.jpg)
A simple measure of genetic distance between two sequences is p:
$$p = n_d / n$$
p: proportion of different sites.
$n_d$: total number of differences.
n: number of sites compared between two sequences.
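A minimal sketch of this calculation for two aligned sequences follows. The function name and the two example strings are illustrative assumptions; gaps and ambiguous bases are not handled.

```python
def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2) or len(seq1) == 0:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    n = len(seq1)                                          # sites compared
    n_d = sum(1 for x, y in zip(seq1, seq2) if x != y)     # observed differences
    return n_d / n

# Illustrative aligned sequences (14 sites, 2 of which differ).
print(p_distance("ACTGAACGTAACGC", "ACTGAACGAATCGC"))
```

Note that p counts only the differences that can still be observed, not every substitution that occurred.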
![Page 67: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/67.jpg)
Divergence Between DNA sequences
[Figure: an ancestral sequence diverging over time t into Sequence 1 and Sequence 2, illustrating single, multiple, coincidental, parallel, convergent and back substitutions]
Although there have been 12 mutations, only 3 can be detected.
![Page 69: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/69.jpg)
Sequence dissimilarity
$$D = 1 - I(t)$$
Due to multiple substitutions, the observed number of differences between two sequences is less than the true number of substitutions.
[Figure: proportion of differences (0 to 1) against time, with one curve for the proportion of observed differences and one for the proportion of actual differences]
![Page 70: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/70.jpg)
Sequence dissimilarity
$$D = 1 - I(t)$$
Models of sequence evolution can be used to “correct” for multiple hits.
[Figure: sequence dissimilarity (0 to 1) against time, showing the distance correction]
![Page 71: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/71.jpg)
Estimating the number of nucleotide substitutions under the Jukes and Cantor’s One-Parameter Model
As we have seen, the expected proportion of identical nucleotides between two sequences that diverged t time units ago is given by:
$$I(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-8\alpha t}$$
And the probability that the two sequences are different at a site at time t is:
$$p = 1 - I(t)$$
Doing some algebra:
$$p = 1 - \left(\tfrac{1}{4} + \tfrac{3}{4} e^{-8\alpha t}\right)$$
$$p = \tfrac{3}{4}\left(1 - e^{-8\alpha t}\right)$$
$$8\alpha t = -\ln\left(1 - \tfrac{4}{3} p\right)$$
And since in the JC model $K = 2(3\alpha t)$ between two sequences:
$$K = -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} p\right)$$
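The final formula translates directly into code. In this sketch `jc_distance` is a hypothetical helper name; the range check reflects the fact that the logarithm is undefined once p reaches ¾.

```python
import math

def jc_distance(p):
    """Jukes-Cantor estimate K = -(3/4) * ln(1 - (4/3) * p) from the observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the JC correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for p in (0.01, 0.10, 0.30, 0.50):
    print(p, jc_distance(p))
```

For small p the corrected distance is almost equal to p; as p grows, K becomes increasingly larger than p because more multiple hits are being accounted for.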
![Page 74: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/74.jpg)
Estimating the number of nucleotide substitutions under the Kimura two-Parameter Model
$$K = \tfrac{1}{2} \ln(a) + \tfrac{1}{4} \ln(b)$$
where:
$$a = \frac{1}{1 - 2P - Q}$$
$$b = \frac{1}{1 - 2Q}$$
and P and Q are the proportions of transitional and transversional differences between the two sequences.
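A small sketch of the two-parameter calculation, counting transitional and transversional differences first. The function name and the example strings are assumptions; the code expects aligned sequences containing only A, C, G and T.

```python
import math

# Transitions are changes within purines (A, G) or within pyrimidines (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance K = 1/2 ln(a) + 1/4 ln(b)."""
    if len(seq1) != len(seq2) or len(seq1) == 0:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    n = len(seq1)
    ts = sum(1 for pair in zip(seq1, seq2) if pair in TRANSITIONS)
    tv = sum(1 for x, y in zip(seq1, seq2) if x != y and (x, y) not in TRANSITIONS)
    P, Q = ts / n, tv / n          # proportions of transitional / transversional differences
    a = 1.0 / (1.0 - 2.0 * P - Q)
    b = 1.0 / (1.0 - 2.0 * Q)
    return 0.5 * math.log(a) + 0.25 * math.log(b)

# Illustrative pair: one transition (A/G) and one transversion (A/T) in 10 sites.
print(k2p_distance("AAAAAAAAAA", "GTAAAAAAAA"))
```

When transitions and transversions are equally likely per type of change (P = p/3, Q = 2p/3), the expression reduces to the Jukes-Cantor formula.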
![Page 75: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/75.jpg)
Estimating the number of amino acid substitutions using the Poisson Correction for protein sequences
![Page 76: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/76.jpg)
Estimating the number of amino acid substitutions using the Poisson Correction for protein sequences
[Figure: a protein sequence M C A N T P L …]
P (substitutions):
$$P(k) = \frac{e^{-rt} (rt)^k}{k!}$$
$$P(0) = e^{-rt}$$
$$P(1) = e^{-rt}\, rt$$
$$P(2) = \frac{e^{-rt} (rt)^2}{2!}$$
$$P(n) = \frac{e^{-rt} (rt)^n}{n!}$$
![Page 77: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/77.jpg)
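Estimating the number of amino acid substitutions using the Poisson Correction for protein sequences
[Figure: an ancestral sequence SecA diverging into Sec1 and Sec2; each lineage has no substitution at a site with probability $e^{-rt}$]
The probability that neither of the sequences has suffered a substitution is:
$$q = (e^{-rt})^2 = e^{-2rt} = 1 - p$$
Doing a little algebra, with $K = 2rt$:
$$e^{-K} = 1 - p$$
$$K = -\ln(1 - p)$$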
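The correction itself is a one-liner. In this sketch `poisson_distance` is a hypothetical helper name, and p is the observed proportion of differing amino acid sites between two aligned protein sequences.

```python
import math

def poisson_distance(p):
    """Poisson-corrected distance K = -ln(1 - p) for an observed proportion p of differing sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)

for p in (0.05, 0.20, 0.50, 0.80):
    print(p, poisson_distance(p))
```

As with the nucleotide corrections, K is close to p for similar sequences and grows faster than p as the sequences diverge.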
![Page 78: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/78.jpg)
Genetic distance using Poisson Correction
![Page 79: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/79.jpg)
Trees
![Page 80: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/80.jpg)
A phylogeny and the three basic kinds of tree used to depict that phylogeny
[Figure: a phylogeny of A, B and C drawn against time and character change, shown alongside a cladogram, an additive tree and an ultrametric tree (scale 0 to 5) for the same three taxa]
After Page and Holmes (1998)
![Page 81: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/81.jpg)
Distance Methods for Phylogenetic Inference
![Page 82: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/82.jpg)
[ 1 2 3 4 5 6 7 8 9 10]
[ 1]
[ 2] 0.009
[ 3] 0.000 0.009
[ 4] 0.000 0.009 0.000
[ 5] 0.000 0.009 0.000 0.000
[ 6] 0.009 0.019 0.009 0.009 0.009
[ 7] 0.009 0.019 0.009 0.009 0.009 0.000
[ 8] 0.098 0.108 0.098 0.098 0.098 0.108 0.108
[ 9] 0.098 0.108 0.098 0.098 0.098 0.108 0.108 0.000
[ 10] 0.088 0.098 0.088 0.088 0.088 0.098 0.098 0.009 0.009
Distance Matrix
![Page 83: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/83.jpg)
For a distance measure to be used to build phylogenies, it must satisfy some basic requirements:
It must be metric
It must be additive
![Page 84: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/84.jpg)
Metric distances
A distance d(a,b) between sequence a and sequence b is metric if:
1. d(a,b) ≥ 0 (non-negativity)
2. d(a,b) = d(b,a) (symmetry)
3. d(a,c) ≤ d(a,b) + d(b,c) (triangle inequality)
4. d(a,b) = 0 if and only if a = b (distinctness)
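The four conditions can be tested mechanically on a square distance matrix. The sketch below is one possible check, not a standard library routine; `is_metric` and both example matrices are hypothetical, and each row/column index is assumed to stand for a distinct sequence.

```python
from itertools import permutations

def is_metric(d, tol=1e-12):
    """Check non-negativity, symmetry, distinctness and the triangle inequality."""
    n = len(d)
    for i in range(n):
        for j in range(n):
            if d[i][j] < 0:                      # 1. non-negativity
                return False
            if abs(d[i][j] - d[j][i]) > tol:     # 2. symmetry
                return False
            if (d[i][j] == 0) != (i == j):       # 4. zero distance only between a sequence and itself
                return False
    for i, j, k in permutations(range(n), 3):    # 3. triangle inequality
        if d[i][k] > d[i][j] + d[j][k] + tol:
            return False
    return True

good = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]   # 5 > 1 + 1 breaks the triangle inequality
print(is_metric(good), is_metric(bad))
```

A matrix with zero distance between two different sequences would fail the distinctness test, which is worth keeping in mind when identical sequences appear in a data set.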
![Page 85: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/85.jpg)
Ultrametric distances
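A distance is ultrametric if, in addition to the four metric conditions:
5. $d(a,b) \leq \max[d(a,c),\ d(b,c)]$
[Figure: three sequences a, b and c with d(a,b) = 4 and d(a,c) = d(b,c) = 6]
An ultrametric distance has the property of implying a constant evolutionary rate.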
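Condition 5 has to hold for every ordering of every triplet, which is the same as saying that the two largest of the three distances in each triplet are equal. A minimal sketch of that check follows; the function name `is_ultrametric` and the tolerance are assumptions, and the first example uses the values 4, 6 and 6 from the three-sequence figure.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triplet the two largest
    pairwise distances must be equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if largest - middle > tol:
            return False
    return True

# d(a,b) = 4, d(a,c) = d(b,c) = 6
clock_like = [[0, 4, 6],
              [4, 0, 6],
              [6, 6, 0]]
# Hypothetical counter-example: the two largest distances (7 and 6) differ.
not_clock_like = [[0, 3, 7],
                  [3, 0, 6],
                  [7, 6, 0]]

print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```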
![Page 86: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/86.jpg)
Additive distances
Four-point condition:
$d(a,b) + d(c,d) \leq \max[d(a,c) + d(b,d),\ d(a,d) + d(b,c)]$
[Figure: tree relating the four sequences a, b, c and d]
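Because the four-point condition must hold for every labelling of a quartet, it amounts to requiring that the two largest of the three pairwise sums are equal. The sketch below checks this; the function name `is_additive`, the tolerance and both example matrices are illustrative assumptions rather than material from the slides.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Four-point condition: for every quartet, the two largest of the
    three sums d(a,b)+d(c,e), d(a,c)+d(b,e), d(a,e)+d(b,c) are equal."""
    n = len(d)
    for a, b, c, e in combinations(range(n), 4):
        sums = sorted([d[a][b] + d[c][e],
                       d[a][c] + d[b][e],
                       d[a][e] + d[b][c]])
        if sums[2] - sums[1] > tol:
            return False
    return True

# Illustrative matrix for four sequences (order a, b, c, d).
tree_like = [[0, 6, 7, 14],
             [6, 0, 3, 10],
             [7, 3, 0, 9],
             [14, 10, 9, 0]]   # sums are 15, 17, 17

# Same matrix with d(c,d) changed from 9 to 12: sums become 18, 17, 17.
not_tree_like = [[0, 6, 7, 14],
                 [6, 0, 3, 10],
                 [7, 3, 0, 12],
                 [14, 10, 12, 0]]

print(is_additive(tree_like))      # True
print(is_additive(not_tree_like))  # False
```

Every ultrametric matrix also passes this check, since ultrametric distances are a special case of additive ones.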
![Page 87: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/87.jpg)
     a    b    c
b    2
c    6    6
d   10   10   10
[Figure: ultrametric tree (((a,b),c),d) with branch lengths a = 1, b = 1, (ab) = 2, c = 3, (abc) = 2, d = 5]
An ultrametric distance matrix between four sequences and the corresponding ultrametric tree
![Page 88: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/88.jpg)
     a    b    c
b    6
c    7    3
d   14   10    9
[Figure: additive tree (((a,b),c),d) with branch lengths a = 5, b = 1, (ab) = 1, c = 1, (abc) = 2, d = 6]
An additive distance matrix between four sequences and the corresponding additive tree
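The link between a tree and its distance matrix can be made concrete by summing branch lengths along the path between each pair of tips. The sketch below does this for both four-sequence examples, assuming the branch lengths in the two figures are read as a = 1, b = 1, (ab) = 2, c = 3, (abc) = 2, d = 5 for the ultrametric tree and a = 5, b = 1, (ab) = 1, c = 1, (abc) = 2, d = 6 for the additive tree. The node names `x`, `y` and `root` and the function name `leaf_distances` are hypothetical.

```python
from collections import defaultdict

def leaf_distances(edges, leaves):
    """Path-length distance between every pair of leaves of a tree.

    edges: list of (node, node, branch_length) tuples.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def distances_from(src):
        # Depth-first walk accumulating branch lengths from src.
        seen = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + w
                    stack.append(v)
        return seen

    return {(p, q): distances_from(p)[q]
            for p in leaves for q in leaves if p < q}

leaves = ["a", "b", "c", "d"]
ultrametric_tree = [("a", "x", 1), ("b", "x", 1), ("x", "y", 2),
                    ("c", "y", 3), ("y", "root", 2), ("d", "root", 5)]
additive_tree = [("a", "x", 5), ("b", "x", 1), ("x", "y", 1),
                 ("c", "y", 1), ("y", "root", 2), ("d", "root", 6)]

print(leaf_distances(ultrametric_tree, leaves))
print(leaf_distances(additive_tree, leaves))
```

In the ultrametric tree every tip is at the same distance (5) from the root, which is what a constant rate implies; in the additive tree the tips lie at different distances from the root.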
![Page 89: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/89.jpg)
Unweighted Pair-group Method using Arithmetic averages (UPGMA)
OTU   A      B      C
B     dAB
C     dAC    dBC
D     dAD    dBD    dCD
![Page 90: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/90.jpg)
![Page 91: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/91.jpg)
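The pair of OTUs with the smallest distance (here A and B) is clustered first, and their branch point is placed at a depth of $d_{AB}/2$.
[Figure: A and B joined at a node at depth dAB / 2]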
![Page 92: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/92.jpg)
OTU   (AB)      C
C     d(AB)C
D     d(AB)D    dCD

$d_{(AB)C} = (d_{AC} + d_{BC})/2$
$d_{(AB)D} = (d_{AD} + d_{BD})/2$
![Page 93: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/93.jpg)
![Page 94: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/94.jpg)
[Figure: C joined to the cluster (AB) at a node at depth d(AB)C / 2]
![Page 95: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/95.jpg)
$d_{(ABC)D}/2 = [(d_{AD} + d_{BD} + d_{CD})/3]/2$
[Figure: D joined to the cluster (ABC) at a node at depth d(ABC)D / 2]
![Page 96: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/96.jpg)
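In general, the distance between two clusters X and Y is the average of all pairwise distances between their members:
$d_{XY} = \sum_{i \in X} \sum_{j \in Y} d_{ij} / (n_X n_Y)$
where $n_X$ and $n_Y$ are the numbers of OTUs in clusters X and Y.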
Assumes a constant molecular clock
Estimates tree topology and branch length
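The clustering steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `upgma`, the nested-tuple tree representation and the tie-breaking rule (first minimum found) are all choices made here, and the example matrix is an ultrametric four-sequence matrix with distances 2, 6 and 10.

```python
from itertools import combinations

def upgma(labels, d):
    """UPGMA on a square symmetric distance matrix d.

    Returns (tree, merges): tree is a nested tuple of labels and merges
    lists each clustering step as (left, right, node_height).
    """
    # cluster id -> (subtree, indices of member OTUs)
    clusters = {i: (labels[i], [i]) for i in range(len(labels))}

    def avg(x, y):
        # Average of all pairwise distances between members of x and y.
        mx, my = clusters[x][1], clusters[y][1]
        return sum(d[i][j] for i in mx for j in my) / (len(mx) * len(my))

    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        x, y = min(combinations(sorted(clusters), 2), key=lambda p: avg(*p))
        height = avg(x, y) / 2          # node placed at half the distance
        tx, mx = clusters.pop(x)
        ty, my = clusters.pop(y)
        merges.append((tx, ty, height))
        clusters[next_id] = ((tx, ty), mx + my)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, merges

labels = ["a", "b", "c", "d"]
d = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]

tree, merges = upgma(labels, d)
print(tree)                          # ('d', ('c', ('a', 'b')))
print([h for _, _, h in merges])     # [1.0, 3.0, 5.0]
```

The node heights 1, 3 and 5 are half of 2, 6 and 10, as expected when the input is ultrametric. When the rate is not constant, the same procedure still returns a tree, but it may not be the correct one.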
![Page 97: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/97.jpg)
Minimum Evolution Method
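In this method, the sum (S) of all branch length estimates is computed for all topologies, or for all plausible ones, and the topology with the smallest S value is chosen as the best tree.
$S = \sum_{i=1}^{T} b_i$
where $b_i$ is the estimate of the length of the i-th branch and T is the total number of branches.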
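For four sequences there are only three unrooted topologies, so the criterion can be evaluated exhaustively. The sketch below assumes that branch lengths are estimated by ordinary least squares, which the slide does not specify; under that assumption the sum S for the topology ((i,j),(k,l)) reduces to the closed form used in `tree_length_4`. The function names and the example matrix are illustrative.

```python
def tree_length_4(d, pair1, pair2):
    """Sum of least-squares branch lengths for the four-taxon
    topology that groups pair1 against pair2."""
    (i, j), (k, l) = pair1, pair2
    within = d[i][j] + d[k][l]                          # distances inside each pair
    between = d[i][k] + d[i][l] + d[j][k] + d[j][l]     # distances across the pairs
    return within / 2 + between / 4

def minimum_evolution_4(d):
    """Score the three possible topologies and return the one with smallest S."""
    topologies = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    scores = {t: tree_length_4(d, *t) for t in topologies}
    best = min(scores, key=scores.get)
    return best, scores

# Illustrative additive matrix for sequences 0..3.
d = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]

best, scores = minimum_evolution_4(d)
print(best)    # ((0, 1), (2, 3))
print(scores)  # S = 16.0 for the best topology, 16.5 for the other two
```

The number of topologies grows very quickly with the number of sequences, which is why in practice only plausible topologies are examined.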
![Page 98: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/98.jpg)
Neighbor-Joining Method
The principle of the N-J method is to find neighbors sequentially so as to minimize the total length of the tree.
This method starts with a starlike tree:
[Figure: star tree with OTUs 1 to 8 all joined to a single central node X]
The first step is to separate a pair of OTUs from all others:
[Figure: OTUs 1 and 2 joined to a new node Y, which is connected to node X; OTUs 3 to 8 remain joined to X]
Among all the possible pairs of OTUs, the one that gives the smallest sum of branch lengths is chosen. This procedure is repeated until all interior branches are found.
[Figure: tree relating OTUs 1 to 8]
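The procedure can be sketched as follows. Instead of computing the sum of branch lengths for every candidate pair directly, this sketch uses the Q criterion, $Q_{ij} = (n-2)\,d_{ij} - \sum_k d_{ik} - \sum_k d_{jk}$, which ranks pairs in the same order as the sum of branch lengths; that substitution, the function name `neighbor_joining`, the internal node names `node0`, `node1`, ... and the example matrix are assumptions made for illustration.

```python
def neighbor_joining(labels, d):
    """Neighbor joining on a square symmetric distance matrix d.

    Returns (joins, last_edge): joins lists (i, length_i, j, length_j, new_node)
    for each pair joined; last_edge connects the final two nodes.
    Labels must not clash with the generated names 'node0', 'node1', ...
    """
    nodes = list(labels)
    dist = {a: {b: d[i][j] for j, b in enumerate(labels)}
            for i, a in enumerate(labels)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}   # row sums
        best = None
        for idx, a in enumerate(nodes):
            for b in nodes[idx + 1:]:
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:                  # ties: keep first
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to the new interior node.
        len_a = dist[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dist[a][b] - len_a
        new = f"node{len(joins)}"
        joins.append((a, len_a, b, len_b, new))
        # Distances from the new node to every remaining node.
        dist[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[new][c] = dist[c][new] = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return joins, (a, b, dist[a][b])

labels = ["a", "b", "c", "d"]
d = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]

joins, last_edge = neighbor_joining(labels, d)
print(joins)      # [('a', 5.0, 'b', 1.0, 'node0'), ('c', 1.0, 'd', 8.0, 'node1')]
print(last_edge)  # ('node0', 'node1', 1.0)
```

Unlike UPGMA, this procedure does not assume a constant rate: with the additive example above it returns unequal branch lengths (5 and 1) for the neighbors a and b, and the resulting tree is unrooted.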
![Page 99: Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.](https://reader036.fdocuments.us/reader036/viewer/2022062600/5a4d1b6f7f8b9ab0599b4dd4/html5/thumbnails/99.jpg)