1
Introduction to molecular evolution
Lecture 13, Statistics 246
March 4, 2004
2
Evolution using molecules: implicit assumptions
Our DNA is inherited from our parents more or less unchanged.
Molecular evolution is dominated by mutations that are neutral from the standpoint of natural selection.
Mutations accumulate at fairly steady rates in surviving lineages.
We can study the evolution of (macro) molecules and reconstruct the evolutionary history of organisms using their molecules.
3
Some important dates in history (billions of years ago)
Origin of the universe               15 ± 4
Formation of the solar system        4.6
First self-replicating system        3.5 ± 0.5
Prokaryotic-eukaryotic divergence    1.8 ± 0.3
Plant-animal divergence              1.0
Invertebrate-vertebrate divergence   0.5
Mammalian radiation beginning        0.1
From Doolittle et al. (CSH, 1986)
4
The three kingdoms
5
Two important early observations
Different proteins evolve at different rates, and this seems more or less independent of the host organism, including its generation time.
It is necessary to adjust the observed percent difference between two homologous proteins to get a distance more or less linearly related to the time since their common ancestor. (Later we offer a rational basis for doing this.)
A striking early version of these observations is next.
6
Rates of macromolecular evolution
[Figure, after Dickerson (1971): "Evolution of the globins". Corrected amino acid changes per 100 residues (vertical axis, 0 to 220) plotted against millions of years since divergence (horizontal axis, 100 to 1400), with lines labelled fibrinopeptides (1.1 MY), hemoglobin (5.8 MY) and cytochrome c (20.0 MY). Marked divergence points include mammals, birds/reptiles, reptiles/fish, carp/lamprey, mammals/reptiles, vertebrates/insects, and the separation of ancestors of plants and animals. The time axis also carries geological periods from Huronian to Pliocene.]
7
Different rates of change for different proteins
Protein                               PAMs(a)/100 residues/10^8 years   Theoretical lookback time(b)
Pseudogenes                           400                               45 million years
Fibrinopeptides                       90                                200 million years
Lactalbumins                          27                                670 million years
Lysozymes                             24                                750 million years
Ribonucleases                         21                                850 million years
Hemoglobins                           12                                1.5 billion years
Acid proteases                        8                                 2.3 billion years
Triosephosphate isomerase             3                                 6 billion years
Phosphoglyceraldehyde dehydrogenase   2                                 9 billion years
Glutamate dehydrogenase               1                                 18 billion years
(a) PAMs: accepted point mutations (explained shortly). (b) Useful lookback time = 360 PAMs.
From Doolittle 1986
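The lookback times in this table can be reproduced with a one-line rule. The sketch below assumes that the useful lookback time is reached when the two diverging lineages together have accumulated 360 PAMs, so the rate is doubled; that factor of 2 is inferred from the tabulated values and is not stated in the lecture. The function name is hypothetical.

```python
# Assumption (inferred from the table, not stated in the source):
# lookback time = 360 PAMs / (2 * rate), because change accumulates
# along both lineages descending from the common ancestor.
USEFUL_PAMS = 360  # "useful lookback time" threshold from the table footnote


def lookback_million_years(rate):
    """rate: PAMs per 100 residues per 10^8 years, along one lineage."""
    # The rate is per 100 million years, hence the factor 100.
    return USEFUL_PAMS * 100 / (2 * rate)


for protein, rate in [("Pseudogenes", 400), ("Fibrinopeptides", 90),
                      ("Hemoglobins", 12), ("Glutamate dehydrogenase", 1)]:
    print(f"{protein:25s} {lookback_million_years(rate):8.0f} million years")
```

Under this assumption the slow proteins are the ones that remain informative over billions of years, while fast-evolving sequences saturate quickly.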
8
Rates of change in protein families
Protein                          Rate(a)   Protein                          Rate
Fibrinopeptides                  90        Thyrotropin beta chain           7.4
Growth hormone                   37        Parathyrin                       7.3
Ig kappa chain C region          37        Parvalbumin                      7.0
Kappa casein                     33        BPTI Protease inhibitors         6.2
Ig gamma chain C region          31        Trypsin                          5.9
Lutropin beta chain              30        Melanotropin beta                5.6
Ig lambda chain C region         27        Alpha crystallin A chain         5.0
Complement C3a                   27        Endorphin                        4.8
Lactalbumin                      27        Cytochrome b5                    4.5
Epidermal growth factor          26        Insulin                          4.4
Somatotropin                     25        Calcitonin                       4.3
Pancreatic ribonuclease          21        Neurophysin 2                    3.6
Lipotropin beta                  21        Plastocyanin                     3.5
Haptoglobin alpha chain          20        Lactate dehydrogenase            3.4
Serum albumin                    19        Adenylate cyclase                3.2
Phospholipase A2                 19        Triosephosphate isomerase        2.8
Protease inhibitor PST1 type     18        Vasoactive intestinal peptide    2.6
Prolactin                        17        Corticotropin                    2.5
Pancreatic hormone               17        Glyceraldehyde 3-P DH            2.2
Carbonic anhydrase C             16        Cytochrome C                     2.2
Lutropin alpha chain             16        Plant ferredoxin                 1.9
Hemoglobin alpha chain           12        Collagen                         1.7
Hemoglobin beta chain            12        Troponin C, skeletal muscle      1.5
Lipid-binding protein A-II       10        Alpha crystallin B-chain         1.5
Gastrin                          9.8       Glucagon                         1.2
Animal lysozyme                  9.8       Glutamate DH                     0.9
Myoglobin                        8.9       Histone H2B                      0.9
Amyloid A                        8.7       Histone H2A                      0.5
Nerve growth factor              8.5       Histone H3                       0.14
Acid proteases                   8.4       Ubiquitin                        0.1
Myelin basic protein             7.4       Histone H4                       0.1
(a) Percent per 100 My. From Nei (1987); Dayhoff et al. (1978)
9
Some terminology
In evolution, homology (here of proteins) means similarity due to common ancestry.
A common mode of protein evolution is by duplication. Depending on the relations between duplication and speciation dates, we have two different types of homologous proteins. Loosely,
Orthologues: the “same” gene in different organisms; common ancestry goes back to a speciation event.
Paralogues: different genes in the same organism; common ancestry goes back to a gene duplication.
Lateral gene transfer gives another form of homology.
10
Beta-globins (orthologues)
                 10                  20                  30                  40
M V H L T P E E K S A V T A L W G K V N V D E V G G E A L G R L L V V Y P W T Q  BG-human
- . . . . . . . . N . . . T . . . . . . . . . . . . . . . . . . . . . . . . . .  BG-macaque
- - M . . A . . . A . . . . F . . . . K . . . . . . . . . . . . . . . . . . . .  BG-bovine
- . . . S G G . . . . . . N . . . . . . I N . L . . . . . . . . . . . . . . . .  BG-platypus
. . . W . A . . . Q L I . G . . . . . . . A . C . A . . . A . . . I . . . . . .  BG-chicken
- . . W S E V . L H E I . T T . K S I D K H S L . A K . . A . M F I . . . . . T  BG-shark

                 50                  60                  70                  80
R F F E S F G D L S T P D A V M G N P K V K A H G K K V L G A F S D G L A H L D  BG-human
. . . . . . . . . . S . . . . . . . . . . . . . . . . . . . . . . . . . N . . .  BG-macaque
. . . . . . . . . . . A . . . . N . . . . . . . . . . . . D S . . N . M K . . .  BG-bovine
. . . . A . . . . . S A G . . . . . . . . . . . . A . . . T S . G . A . K N . .  BG-platypus
. . . A . . . N . . S . T . I L . . . M . R . . . . . . . T S . G . A V K N . .  BG-chicken
. Y . G N L K E F T A C S Y G - - - - - . . E . A . . . T . . L G V A V T . . G  BG-shark

                 90                 100                 110                 120
N L K G T F A T L S E L H C D K L H V D P E N F R L L G N V L V C V L A H H F G  BG-human
. . . . . . . Q . . . . . . . . . . . . . . . . K . . . . . . . . . . . . . . .  BG-macaque
D . . . . . . A . . . . . . . . . . . . . . . . K . . . . . . . V . . . R N . .  BG-bovine
D . . . . . . K . . . . . . . . . . . . . . . . N R . . . . . I V . . . R . . S  BG-platypus
. I . N . . S Q . . . . . . . . . . . . . . . . . . . . D I . I I . . . A . . S  BG-chicken
D V . S Q . T D . . K K . A E E . . . . V . S . K . . A K C F . V E . G I L L K  BG-shark

                130                 140
K E F T P P V Q A A Y Q K V V A G V A N A L A H K Y H  BG-human
. . . . . Q . . . . . . . . . . . . . . . . . . . . .  BG-macaque
. . . . . V L . . D F . . . . . . . . . . . . . R . .  BG-bovine
. D . S . E . . . . W . . L . S . . . H . . G . . . .  BG-platypus
. D . . . E C . . . W . . L . R V . . H . . . R . . .  BG-chicken
D K . A . Q T . . I W E . Y F G V . V D . I S K E . .  BG-shark

. means same as reference sequence
- means deletion
11
Beta-globins: uncorrected pairwise distances
DISTANCES between protein sequences, calculated over: 1 to 147
Below diagonal: observed number of differences
Above diagonal: number of differences per 100 amino acids
       hum   mac   bov   pla   chi   sha
hum   ----     5    16    23    31    65
mac      7  ----    17    23    30    62
bov     23    24  ----    27    37    65
pla     34    34    39  ----    29    64
chi     45    44    52    42  ----    61
sha     91    88    91    90    87  ----
12
Beta-globins: corrected pairwise distances
DISTANCES between protein sequences, calculated over: 1 to 147
Below diagonal: observed number of differences
Above diagonal: estimated number of substitutions per 100 amino acids
Correction method: Jukes-Cantor
       hum   mac   bov   pla   chi   sha
hum   ----     5    17    27    37   108
mac      7  ----    18    27    36   102
bov     23    24  ----    32    46   110
pla     34    34    39  ----    34   106
chi     45    44    52    42  ----    98
sha     91    88    91    90    87  ----
13
UPGMA tree
[Figure: UPGMA tree of the six beta-globin sequences, with leaves BG-bovine, BG-human, BG-macaque, BG-platypus, BG-chicken and BG-shark.]
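For readers who want to see how such a tree arises, here is a minimal sketch of UPGMA (average-linkage clustering) applied to the corrected beta-globin distances. The function and variable names are hypothetical, and this is a textbook version of the method rather than the program that drew the slide's figure, so details such as tie-breaking may differ.

```python
from itertools import combinations

# Corrected beta-globin distances (substitutions per 100 amino acids),
# taken from the upper triangle of the corrected-distance table.
NAMES = ["hum", "mac", "bov", "pla", "chi", "sha"]
CORRECTED = {
    ("hum", "mac"): 5, ("hum", "bov"): 17, ("hum", "pla"): 27,
    ("hum", "chi"): 37, ("hum", "sha"): 108, ("mac", "bov"): 18,
    ("mac", "pla"): 27, ("mac", "chi"): 36, ("mac", "sha"): 102,
    ("bov", "pla"): 32, ("bov", "chi"): 46, ("bov", "sha"): 110,
    ("pla", "chi"): 34, ("pla", "sha"): 106, ("chi", "sha"): 98,
}


def upgma(names, pair_dist):
    """Return the merges in order as (cluster_label, node_height) pairs."""
    d = {frozenset(pair): float(v) for pair, v in pair_dist.items()}
    size = {name: 1 for name in names}  # cluster label -> number of leaves
    merges = []
    while len(size) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        new = f"({a},{b})"
        height = d[frozenset((a, b))] / 2  # node sits halfway between them
        na, nb = size.pop(a), size.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for c in size:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        size[new] = na + nb
        merges.append((new, height))
    return merges


MERGES = upgma(NAMES, CORRECTED)
for label, height in MERGES:
    print(f"{height:6.2f}  {label}")
```

UPGMA implicitly assumes a constant rate of change along all lineages, which ties it to the steady-rate assumption listed at the start of the lecture.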
14
Human globins (paralogues)
                   10                  20                  30
- V L S P A D K T N V K A A W G K V G A H A G E Y G A E A L E R M F L S F P T T  alpha-human
V H . T . E E . S A . T . L . . . . - - N V D . V . G . . . G . L L V V Y . W .  beta-human
V H . T . E E . . A . N . L . . . . - - N V D A V . G . . . G . L L V V Y . W .  delta-human
V H F T A E E . A A . T S L . S . M - - N V E . A . G . . . G . L L V V Y . W .  epsilon-human
G H F T E E . . A T I T S L . . . . - - N V E D A . G . T . G . L L V V Y . W .  gamma-human
- G . . D G E W Q L . L N V . . . . E . D I P G H . Q . V . I . L . K G H . E .  myo-human

40                   50                            60                  70
K T Y F P H F - D L S H G S A - - - - - Q V K G H G K K V A D A L T N A V A H V  alpha-human
Q R F . E S . G . . . T P D . V M G N P K . . A . . . . . L G . F S D G L . . L  beta-human
Q R F . E S . G . . . S P D . V M G N P K . . A . . . . . L G . F S D G L . . L  delta-human
Q R F . D S . G N . . S P . . I L G N P K . . A . . . . . L T S F G D . I K N M  epsilon-human
Q R F . D S . G N . . S A . . I M G N P K . . A . . . . . L T S . G D . I K . L  gamma-human
L E K . D K . K H . K S E D E M K A S E D L . K . . A T . L T . . G G I L K K K  myo-human

           80                  90                 100                 110
D D M P N A L S A L S D L H A H K L R V D P V N F K L L S H C L L V T L A A H L  alpha-human
. N L K G T F A T . . E . . C D . . H . . . E . . R . . G N V . V C V . . H . F  beta-human
. N L K G T F . Q . . E . . C D . . H . . . E . . R . . G N V . V C V . . R N F  delta-human
. N L K P . F A K . . E . . C D . . H . . . E . . . . . G N V M V I I . . T . F  epsilon-human
. . L K G T F A Q . . E . . C D . . H . . . E . . . . . G N V . V T V . . I . F  gamma-human
G H H E A E I K P . A Q S . . T . H K I P V K Y L E F I . E . I I Q V . Q S K H  myo-human

          120                 130                 140
P A E F T P A V H A S L D K F L A S V S T V L T S K Y R - - - - - -  alpha-human
G K . . . . P . Q . A Y Q . V V . G . A N A . A H . . H . . . . . .  beta-human
G K . . . . Q M Q . A Y Q . V V . G . A N A . A H . . H . . . . . .  delta-human
G K . . . . E . Q . A W Q . L V S A . A I A . A H . . H . . . . . .  epsilon-human
G K . . . . E . Q . . W Q . M V T A . A S A . S . R . H . . . . . .  gamma-human
. G D . G A D A Q G A M N . A . E L F R K D M A . N . K E L G F Q G  myo-human
15
Human globins: uncorrected pairwise distances
DISTANCES between protein sequences, calculated over 1 to 154
Below diagonal: observed number of differences
Above diagonal: number of differences per 100 amino acids
        alpha   beta  delta  epsil  gamma    myo
alpha    ----     55     55     60     57     74
beta       82   ----      7     25     27     75
delta      82     10   ----     27     29     74
epsil      89     35     39   ----     20     77
gamma      85     39     42     29   ----     76
myo       116    117    116    119    118   ----
16
Human globins: corrected pairwise distances
DISTANCES between protein sequences, calculated over 1-141
Below diagonal: observed number of differences
Above diagonal: estimated number of substitutions per 100 amino acids
Correction method: Jukes-Cantor
        alpha   beta  delta  epsil  gamma    myo
alpha    ----    281    281    281    313    208
beta       82   ----      7     30     31   1000
delta      82     10   ----     34     33    470
epsil      89     35     39   ----     21    402
gamma      85     39     42     29   ----    470
myo       116    117    116    119    118   ----
17
                 10                  20                  30                  40                  50                  60
C C G A C A G G C A C G G T G G C T C A C A C C T G T A A T C C C A G T A C T T T G G G A G G C T G A G G C G A G A G G  hum-1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-3
. . . . . . . A . . . . . . . . . . . . . . . . . . . C . . . . . . . C . . . . . . . . . . . . . . . . . . . G . . . .  chimp
. . . . . . . A . . . . . . . . . . . . . . . . . . . C . . . . . . . C . . . . . . . . . . . . . . . . . . . G . . . .  bonob
. . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . C . . . . . . . . G . . . .  goril
. G . . . . . A . . . . . . . . . . . . . G . . . . . . . . . . . . . C . . . . . . . . . . . . C . . . . T . G . C . .  orang

                 70                  80                  90                 100                 110                 120
A T C A C C T G A G G T C G G G A G T T T G A G A C C A G C C T G A C C A A T A T G G A G A A A C C C C A G T T A T A C  hum-1
. . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C . . . . .  hum-3
. . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  chimp
. . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  bonob
. . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . .  goril
. . . . . . . . . . . . T . . . . . . . C . . A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C . . .  orang

                130                 140                 150                 160                 170                 180
T A A A A A T A C A A A A T T A G C T G G G T G T G G T G G C G C A T G C C T G T A A T C C T A G C T A C T A G G A A G  hum-1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-3
. . . . . . . . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . G . .  chimp
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . G . .  bonob
. . . . . . . . . . . . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . G . .  goril
. . . . . . . . . . . . . . . . . . . . . . C . . . . . . . . . . . . . . . . . . . . . T . C . . . . . . . . . . G . .  orang

                190                 200                 210                 220                 230                 240
G C T G A G G C A G G A G A A T C G C T T G A A C C C G G G A G G T G G A G G T T G A G G T G A G C T G A G A T C A C G  hum-1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-3
. . . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  chimp
. . . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  bonob
. . . . . . . . . . . . . . . . . A . . . . . . . . . . . . . A . . . . . . . . . T . . . . . . . . . . . . . . . . . A  goril
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . G . .  orang

                250                 260                 270                 280                 290                 300
C C A T T G C A C T C C A G C C T G G G C A A C A A G A G C A A A A C T C C G T C T C A A A A A A T A A A T A A A T A A  hum-1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  hum-3
. . . C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A . . . . . . . . . . . . . . . . C . . . .  chimp
. . . C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . .  bonob
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A . . . . . . . . . . . . . . . . . . . . .  goril
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A . . . T . . . . . . . . . . . . . . . . .  orang

Alu sequences (α-globin 2 Alu 1, Knight et al., 1996)
18
Alu sequences: uncorrected pairwise distances
DISTANCES between nucleic acid sequences, calculated over: 1 to 300, considering all base positions
Below diagonal: observed number of differences
Above diagonal: number of differences per 100 bases
        hum-1  hum-2  hum-3  chimp  bonob  goril  orang
hum-1    ----      0      0      4      3      5      7
hum-2       1   ----      1      4      4      5      7
hum-3       1      2   ----      4      4      5      7
chimp      12     13     13   ----      1      5      6
bonob      10     11     11      2   ----      4      5
goril      14     15     15     14     12   ----      7
orang      20     21     21     18     16     22   ----
19
Alu sequences: corrected pairwise distances
DISTANCES between nucleic acid sequences, calculated over: 1 to 300, considering all base positions
Below diagonal: observed number of differences
Above diagonal: estimated number of substitutions per 100 bases
Correction method: Jukes-Cantor
        hum-1  hum-2  hum-3  chimp  bonob  goril  orang
hum-1    ----      0      0      4      3      5      7
hum-2       1   ----      1      4      4      5      7
hum-3       1      2   ----      4      4      5      7
chimp      12     13     13   ----      1      5      6
bonob      10     11     11      2   ----      4      6
goril      14     15     15     14     12   ----      8
orang      20     21     21     18     16     22   ----
20
Correcting distances between DNA and protein sequences
We mentioned earlier that it is necessary to adjust observed percent differences to get a distance measure which scales linearly with time. This is because we can have multiple and back substitutions at a given position along a lineage.
All of the correction methods (with names like Jukes-Cantor, Kimura 2-parameter, etc.) are justified by simple probabilistic arguments involving Markov chains, whose basis is worth mastering.
The same molecular evolutionary models are used in scoring sequence alignments.
21
Markov chain
State space = {A, C, G, T}.
$p(i,j)$ = pr(next state $S_j$ | current state $S_i$)
Markov assumption:
$p(i,j)$ = pr(next state $S_j$ | current state $S_i$ & any configuration of states before this)
Only the present state, not previous states, affects the probabilities of moving to the next state.
22
A simple 4-state Markov chain: the Kimura 2-parameter model for nucleotide change
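[Diagram: the four nucleotides at the corners of a square, A and G on top, T and C below.]
Transition probabilities: horizontal (A↔G, T↔C): $a$; diagonal & vertical: $b$; self: $c = 1 - a - 2b$.
      A   G   C   T
A     c   a   b   b
G     a   c   b   b
C     b   b   c   a
T     b   b   a   c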
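As a small illustration, the one-step matrix of this model can be built directly from the two parameters. The sketch assumes the self-transition probability is c = 1 - a - 2b, which is what makes each row sum to 1; the numerical values of a and b are arbitrary choices for the example, and the function name is hypothetical.

```python
NUCLEOTIDES = "AGCT"  # row and column order used on the slide


def k2p_step_matrix(a, b):
    """One-step Kimura 2-parameter matrix in the order A, G, C, T."""
    c = 1.0 - a - 2.0 * b  # staying put takes whatever probability is left
    if min(a, b, c) < 0:
        raise ValueError("a, b and 1 - a - 2b must all be non-negative")
    return [
        [c, a, b, b],
        [a, c, b, b],
        [b, b, c, a],
        [b, b, a, c],
    ]


# Illustrative values only: A<->G and C<->T changes four times as likely
# as each of the other changes.
P = k2p_step_matrix(a=0.02, b=0.005)
for base, row in zip(NUCLEOTIDES, P):
    print(base, row, "row sum =", sum(row))
```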
23
The multiplication rule
pr(state after next is $S_k$ | current state is $S_i$)
= $\sum_j$ pr(state after next is $S_k$, next state is $S_j$ | current state is $S_i$) [addition rule]
= $\sum_j$ pr(next state is $S_j$ | current state is $S_i$) × pr(state after next is $S_k$ | current state is $S_i$, next state is $S_j$) [multiplication rule]
= $\sum_j p(i,j)\,p(j,k)$ [Markov assumption]
= $(i,k)$-element of $P^2$, where $P = (p(i,j))$.
More generally,
pr(state $t$ steps from now is $S_k$ | current state is $S_i$) = $(i,k)$-element of $P^t$.
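The identity between the two-step sum and the matrix square can be checked numerically. This is a minimal sketch in plain Python; the matrix is a Kimura 2-parameter step matrix with arbitrary illustrative values a = 0.02 and b = 0.005, and the helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, t):
    """t-step transition matrix P^t for a non-negative integer t."""
    n = len(P)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(t):
        result = mat_mul(result, P)
    return result


a, b = 0.02, 0.005          # illustrative values, not from the lecture
c = 1.0 - a - 2.0 * b
P = [[c, a, b, b],          # order A, G, C, T
     [a, c, b, b],
     [b, b, c, a],
     [b, b, a, c]]

# Two-step probability A -> G, summing over the intermediate state j.
two_step_A_to_G = sum(P[0][j] * P[j][1] for j in range(4))
P2 = mat_pow(P, 2)
print(two_step_A_to_G, P2[0][1])
```

The two printed numbers agree, and for this matrix both equal 2ac + 2b².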
24
Continuous-time version
For any $s, t$ write $p_{ij}(t)$ = pr($S_j$ at time $t+s$ | $S_i$ at time $s$) for the stationary (time-homogeneous) transition probabilities.
Write $P(t) = (p_{ij}(t))$ for the matrix of the $p_{ij}(t)$.
Then for any $t, u$: $P(t+u) = P(t)\,P(u)$.
It follows that $P(t) = \exp(Qt)$, where $Q = P'(0)$ is the derivative of $P(t)$ at $t = 0$.
$Q$ is called the infinitesimal matrix of $P(t)$, and satisfies
$P'(t) = Q\,P(t) = P(t)\,Q$.
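A quick numerical check of P(t+u) = P(t)P(u) is possible with a truncated power series for the matrix exponential. The sketch below uses a hypothetical rate matrix (rate 2 for A↔G and C↔T, rate 1 otherwise) chosen only for illustration; the series truncation at 40 terms is adequate for the small values of t used here but is not a general-purpose method.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(Q, t, terms=40):
    """Approximate exp(Q t) by the truncated series sum_m (Q t)^m / m!."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # (Q t)^0 / 0! = I
    for m in range(1, terms):
        # Turn (Q t)^(m-1)/(m-1)! into (Q t)^m/m!.
        term = [[x * t / m for x in row] for row in mat_mul(term, Q)]
        result = [[r + x for r, x in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


# Hypothetical infinitesimal matrix, order A, G, C, T; each row sums to 0.
Q = [[-4, 2, 1, 1],
     [2, -4, 1, 1],
     [1, 1, -4, 2],
     [1, 1, 2, -4]]

P_sum = mat_exp(Q, 0.5)                               # P(t + u)
P_prod = mat_mul(mat_exp(Q, 0.3), mat_exp(Q, 0.2))    # P(t) P(u)
max_gap = max(abs(P_sum[i][j] - P_prod[i][j])
              for i in range(4) for j in range(4))
print("largest entrywise difference:", max_gap)
```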
25
Interpretation of Q
Roughly, $q(i,j)$ is the rate of transitions from $i$ to $j$ (for $j \neq i$), while $q(i,i) = -\sum_{j \neq i} q(i,j)$, so each row sum is 0. If, under some initial conditions, we have a Markov chain evolving in continuous time with infinitesimal matrix $Q$, and $p_j(t)$ = pr($S_j$ at time $t$), then
$p_j(t+h) = \sum_i$ pr($S_i$ at $t$, $S_j$ at $t+h$)
$= \sum_i$ pr($S_i$ at $t$) pr($S_j$ at $t+h$ | $S_i$ at $t$)
$\approx p_j(t)\,(1 + h\,q(j,j)) + \sum_{i \neq j} p_i(t)\,h\,q(i,j)$,
i.e., $h^{-1}[p_j(t+h) - p_j(t)] \approx p_j(t)\,q(j,j) + \sum_{i \neq j} p_i(t)\,q(i,j)$,
which becomes $P' = PQ$ (equivalently $QP$, since $P$ and $Q$ commute) as $h \to 0$.
Important approximation: when $t$ is small, $P(t) \approx I + Qt$.
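The small-step update in this derivation can be iterated directly to approximate the distribution at a later time. The sketch below is an illustration under stated assumptions: it uses the rate matrix with every off-diagonal rate equal to 1 and diagonal -3 (the Jukes-Cantor matrix of the next slide), a step size h = 0.0001 chosen arbitrarily, and a hypothetical function name. It compares the result with the exact Jukes-Cantor probabilities (1 + 3e^(-4t))/4 and (1 - e^(-4t))/4.

```python
import math

# Rate matrix with every off-diagonal rate 1 and diagonal -3.
Q = [[-3.0 if i == j else 1.0 for j in range(4)] for i in range(4)]


def small_step_distribution(p0, Q, h, steps):
    """Iterate p_j(t+h) = p_j(t)(1 + h q_jj) + sum_{i != j} p_i(t) h q_ij."""
    p = list(p0)
    n = len(p)
    for _ in range(steps):
        p = [p[j] * (1.0 + h * Q[j][j])
             + sum(p[i] * h * Q[i][j] for i in range(n) if i != j)
             for j in range(n)]
    return p


t = 0.5
p = small_step_distribution([1.0, 0.0, 0.0, 0.0], Q, h=1e-4, steps=5000)
exact_same = (1.0 + 3.0 * math.exp(-4.0 * t)) / 4.0
exact_other = (1.0 - math.exp(-4.0 * t)) / 4.0
print("approximate:", p[0], p[1])
print("exact:      ", exact_same, exact_other)
```

Shrinking h brings the approximation closer to the exact values, which is the content of the limit h → 0 in the derivation.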
26
The Jukes-Cantor model (1969)
Q =
  -3   1   1   1
   1  -3   1   1
   1   1  -3   1
   1   1   1  -3
P(t) =
   r   s   s   s
   s   r   s   s
   s   s   r   s
   s   s   s   r
where $r = (1 + 3e^{-4t})/4$ and $s = (1 - e^{-4t})/4$.
[Diagram: common ancestor of human and orang, $t$ time units before human (now).]
Consider e.g. the 2nd position in α-globin 2 Alu 1.
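The closed form for P(t) is easy to explore numerically. This minimal sketch (function name hypothetical) builds the matrix from r and s, and checks three properties: P(0) is the identity, a difference quotient at a small h recovers the rates 1 and -3 of Q, and for large t every entry approaches 1/4.

```python
import math


def jc_matrix(t):
    """Jukes-Cantor transition matrix P(t) with unit off-diagonal rates."""
    e = math.exp(-4.0 * t)
    r = (1.0 + 3.0 * e) / 4.0   # probability of the same nucleotide
    s = (1.0 - e) / 4.0         # probability of one specific other nucleotide
    return [[r if i == j else s for j in range(4)] for i in range(4)]


h = 1e-6
Ph = jc_matrix(h)
rate_off = Ph[0][1] / h             # approximates the off-diagonal entry of Q
rate_diag = (Ph[0][0] - 1.0) / h    # approximates the diagonal entry of Q
print("off-diagonal rate ~", rate_off, " diagonal rate ~", rate_diag)
print("long-run row:", jc_matrix(10.0)[0])
```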
27
Definition of PAM
Let $P(t) = \exp(Qt)$. Then the (A,G) element of $P(t)$ is
pr(G now | A then) = $(1 - e^{-4t})/4$.
Same for all pairs of different nucleotides.
Overall rate of change $k = 3t$.
When $k = 0.01$, this is described as 1 PAM (PAM = accepted point mutation).
Put $t = 0.01/3 = 1/300$. Then the resulting $P = P(1/300)$ is called the PAM(1) matrix.
Why use PAMs?
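To make the definition concrete, the entries of the Jukes-Cantor PAM(1) matrix can be computed from the closed form with t = 1/300. The variable names below are hypothetical. The probability that a site shows a different nucleotide after 1 PAM comes out slightly below 1%, because a few sites change more than once or change back.

```python
import math

t = 1.0 / 300.0                           # k = 3t = 0.01, i.e. 1 PAM
s = (1.0 - math.exp(-4.0 * t)) / 4.0      # off-diagonal entry of PAM(1)
r = 1.0 - 3.0 * s                         # diagonal entry of PAM(1)
prob_observed_change = 3.0 * s            # site looks different after 1 PAM

print("PAM(1) diagonal entry:    ", r)
print("PAM(1) off-diagonal entry:", s)
print("expected substitutions per site: 0.01")
print("probability of an observed difference:", prob_observed_change)
```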
28
Evolutionary time, PAM
Since sequences evolve at different rates, it is convenient to rescale time so that 1 PAM of evolutionary time corresponds to 1% expected substitutions.
For Jukes-Cantor, $k = 3t$ is the expected number of substitutions per site in $[0,t]$, so it is a distance. (Show this.)
Set $3t = 1/100$, or $t = 1/300$, so 1 PAM corresponds to $t = 1/300$ time units.
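One way to approach the "Show this" exercise is sketched below. It relies on the Jukes-Cantor rate matrix having the same diagonal entry, -3, in every row, so that substitutions at a site occur at constant total rate 3 regardless of the current nucleotide. The symbol $N(t)$ is notation introduced here, not in the lecture.

```latex
% N(t): number of substitutions at one site during [0, t].
% Every state is left at total rate -q(i,i) = 3, so the substitution
% times form a Poisson process of rate 3.
\begin{align*}
N(t) &\sim \mathrm{Poisson}(3t), \\
k &= \mathbb{E}\,N(t) = 3t, \\
\mathbb{E}\,[N(t+u) - N(t)] &= 3u \quad \text{(additive along a lineage, as a distance should be).}
\end{align*}
```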
29
Distance adjustment
For a pair of sequences, $k = 3t$ is the desired metric, but it is not observable. Instead, pr(different) is observed. So we use a model to convert pr(different) to $k$.
This is completely analogous to the conversion of
$\theta$ = pr(recombination)
to genetic (map) distance $d$ (= expected number of crossovers) using the Haldane map function
$\theta = \tfrac{1}{2}(1 - e^{-2d})$,
assuming the no-interference (Poisson) model.
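The genetic-mapping analogy can be made concrete with the Haldane map function and its inverse. In this sketch the recombination fraction is called theta and the map distance d is measured in Morgans; both names are choices made here for illustration.

```python
import math


def haldane(d):
    """Recombination fraction for map distance d (expected crossovers)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def inverse_haldane(theta):
    """Map distance implied by an observed recombination fraction."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


for d in (0.01, 0.1, 0.5, 2.0):
    theta = haldane(d)
    print(f"d = {d:4.2f}  theta = {theta:.4f}  recovered d = {inverse_haldane(theta):.4f}")
```

As with sequence differences, the observable quantity saturates (here at 1/2) while the underlying distance keeps growing, which is why the conversion is needed.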
30
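Towards Jukes-Cantor adjustment
[Diagram: a common ancestor with two descendant lineages, orang (G) and human (C); still the 2nd position in α-globin 2 Alu 1.]
Assume that the common ancestor has A, G, C or T with probability 1/4 each.
Then the chance of the nucleotides differing is
$p_{\neq} = \tfrac{3}{4}(1 - e^{-8t}) = \tfrac{3}{4}(1 - e^{-4k/3})$, since $k = 2 \times 3t$.
[Plot of $p_{\neq}$ against $t$, with the level 3/4 marked.]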
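The formula for the chance of a difference can be checked by brute force: sum over the ancestral nucleotide (probability 1/4 each) and over every pair of different descendant nucleotides, with each lineage evolving independently for time t under Jukes-Cantor. The function names are hypothetical.

```python
import math


def jc_matrix(t):
    """Jukes-Cantor transition matrix P(t) with unit off-diagonal rates."""
    e = math.exp(-4.0 * t)
    r, s = (1.0 + 3.0 * e) / 4.0, (1.0 - e) / 4.0
    return [[r if i == j else s for j in range(4)] for i in range(4)]


def prob_sites_differ(t):
    """pr(two descendants differ) when each is t time units from the ancestor."""
    P = jc_matrix(t)
    total = 0.0
    for ancestor in range(4):            # uniform ancestral nucleotide
        for x in range(4):               # nucleotide in the first descendant
            for y in range(4):           # nucleotide in the second descendant
                if x != y:
                    total += 0.25 * P[ancestor][x] * P[ancestor][y]
    return total


def closed_form(t):
    return 0.75 * (1.0 - math.exp(-8.0 * t))


for t in (0.01, 0.1, 1.0):
    print(t, prob_sites_differ(t), closed_form(t))
```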
31
Jukes-Cantor adjustment
If we suppose all nucleotide positions behave identically and independently, and $n_{\neq}$ differ out of $n$, we can invert this, obtaining
$\hat{k} = -\tfrac{3}{4} \log\left(1 - \tfrac{4}{3}\, n_{\neq}/n\right)$.
This is the corrected or adjusted fraction of differences (under this simple model). Multiply by 100 to get PAMs.
The analogous simple model for amino acid sequences has
$\hat{k} = -\tfrac{19}{20} \log\left(1 - \tfrac{20}{19}\, n_{\neq}/n\right)$.
Multiply by 100 for PAMs.
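The inversion step can be written out explicitly, starting from the expression for the chance of a difference in terms of $k$. The hat on $k$ is notation assumed here for the estimate obtained by plugging in the observed fraction of differing sites.

```latex
\begin{align*}
p_{\neq} &= \tfrac{3}{4}\left(1 - e^{-4k/3}\right) \\
e^{-4k/3} &= 1 - \tfrac{4}{3}\,p_{\neq} \\
k &= -\tfrac{3}{4}\,\log\left(1 - \tfrac{4}{3}\,p_{\neq}\right),
\qquad
\hat{k} = -\tfrac{3}{4}\,\log\left(1 - \tfrac{4}{3}\,\frac{n_{\neq}}{n}\right).
\end{align*}
```

The logarithm is defined only when the observed fraction of differences is below 3/4 (below 19/20 in the amino acid version), the saturation level of the model.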
32
Illustration
1. Human and bovine beta-globins are aligned with no deletions at 145 out of 147 sites. They differ at 23 of these sites. Thus $n_{\neq}/n = 23/145$, and the corrected distance using the Jukes-Cantor formula is (natural logs)
$-\tfrac{19}{20} \log\left(1 - \tfrac{20}{19} \times \tfrac{23}{145}\right) \approx 17.4 \times 10^{-2}$.
2. The hum-1 and gorilla sequences are aligned without gaps across all 300 bp, and differ at 14 sites. Thus $n_{\neq}/n = 14/300$, and the corrected distance using the Jukes-Cantor formula is
$-\tfrac{3}{4} \log\left(1 - \tfrac{4}{3} \times \tfrac{14}{300}\right) \approx 4.8 \times 10^{-2}$.
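Both worked examples can be reproduced with one small function that covers the nucleotide (4-state) and amino acid (20-state) versions of the adjustment. The function and variable names are hypothetical, and the error raised for saturated data is a design choice of this sketch rather than something specified in the lecture.

```python
import math


def corrected_distance(n_diff, n_sites, n_states):
    """Jukes-Cantor style corrected distance (substitutions per site)."""
    b = (n_states - 1) / n_states      # 3/4 for DNA, 19/20 for proteins
    p = n_diff / n_sites               # observed fraction of differing sites
    if p >= b:
        raise ValueError("observed difference is at or beyond saturation")
    return -b * math.log(1.0 - p / b)


beta_globin = corrected_distance(23, 145, n_states=20)   # human vs bovine
alu = corrected_distance(14, 300, n_states=4)            # hum-1 vs gorilla

print(f"human/bovine beta-globin: {100 * beta_globin:.1f} PAMs")
print(f"hum-1/gorilla Alu:        {100 * alu:.1f} PAMs")
```

The two values come out at about 0.17 and 0.048 substitutions per site, in line with the corrected-distance tables for these pairs.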
33
Correspondence between observed a.a. differences and the evolutionary distance (Dayhoff et al., 1978)
Observed percent difference    Evolutionary distance in PAMs
 1                               1
 5                               5
10                              11
15                              17
20                              23
25                              30
30                              38
35                              47
40                              56
45                              67
50                              80
55                              94
60                             112
65                             133
70                             159
75                             195
80                             246
85                             328
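For observed differences that fall between the tabulated values, one simple option is linear interpolation. This is only a convenience sketch: the lecture gives the table, not an interpolation rule, and the function name is hypothetical.

```python
from bisect import bisect_left

# Dayhoff et al. (1978) correspondence, as tabulated above.
OBSERVED_PCT = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45,
                50, 55, 60, 65, 70, 75, 80, 85]
PAM_DISTANCE = [1, 5, 11, 17, 23, 30, 38, 47, 56, 67,
                80, 94, 112, 133, 159, 195, 246, 328]


def pam_from_observed(pct):
    """Linearly interpolate the PAM distance for an observed % difference."""
    if not OBSERVED_PCT[0] <= pct <= OBSERVED_PCT[-1]:
        raise ValueError("outside the tabulated range of 1% to 85%")
    i = bisect_left(OBSERVED_PCT, pct)
    if OBSERVED_PCT[i] == pct:
        return float(PAM_DISTANCE[i])
    x0, x1 = OBSERVED_PCT[i - 1], OBSERVED_PCT[i]
    y0, y1 = PAM_DISTANCE[i - 1], PAM_DISTANCE[i]
    return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)


for pct in (7.5, 16, 50, 85):
    print(f"{pct:5.1f}% observed  ->  about {pam_from_observed(pct):.1f} PAMs")
```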