Pairwise Sequence Comparison
![Page 1: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/1.jpg)
Pairwise Sequence Comparison
Stat 246, Spring 2002, Week 5,
![Page 2: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/2.jpg)
Sequence comparison: topics
General concepts
Dot plots
Global alignments
Scoring matrices
Gap penalties
Dynamic programming
Chance or common ancestry?
![Page 3: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/3.jpg)
Dot Plot
This is the earliest, simplest and most complete method for comparing two sequences
It is possible to filter the plot to minimise noise whilst preserving the obvious relationship
This plot can identify
• regions of similarity
• internal repeats
• rearrangement events
![Page 4: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/4.jpg)
[Figure: dot plot grid. Sequence 1 (a = AGCACACA) runs down the side; sequence 2 (b = ACACACTA) runs along the top.]
A dot goes where the two sequences match.
(Add a “guard” row and column.)
Connect the dots along diagonals.
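As a minimal sketch of the basic dot plot (the function name `dot_plot` and the text rendering with `*` and `.` are illustrative choices, not part of the original slides), each cell is marked when the residue of sequence 1 equals the residue of sequence 2:

```python
def dot_plot(seq1: str, seq2: str) -> list[str]:
    """Return one text row per residue of seq1; '*' marks a match with seq2."""
    return ["".join("*" if x == y else "." for y in seq2) for x in seq1]


a = "AGCACACA"  # sequence 1, written down the side
b = "ACACACTA"  # sequence 2, written along the top

rows = dot_plot(a, b)
print("  " + b)
for residue, row in zip(a, rows):
    print(residue + " " + row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree.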
![Page 5: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/5.jpg)
Extensions to dot plots
Modern dot plots are more sophisticated, using the notions of
window : size of diagonal strip centered on an entry, over which matching is accumulated, and
stringency: the extent of agreement required over the window, before a dot is placed at the central entry.
e.g. for a window of size 5, we might require at least 3 matches, and then we put a dot in the central spot. More complex scoring rules can be used.
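The window and stringency rule can be sketched as follows. This is an illustration under stated assumptions: the window length is odd so that it has a central entry, positions too close to the ends for a full window are skipped, and the name `windowed_dot_plot` is hypothetical.

```python
def windowed_dot_plot(seq1: str, seq2: str, window: int = 5, stringency: int = 3):
    """Return the set of (i, j) centres whose diagonal window has enough matches."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window length")
    half = window // 2
    dots = set()
    for i in range(half, len(seq1) - half):
        for j in range(half, len(seq2) - half):
            # Count matches along the diagonal strip centred on (i, j).
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(-half, half + 1))
            if matches >= stringency:
                dots.add((i, j))
    return dots


dots = windowed_dot_plot("AGCACACA", "ACACACTA", window=5, stringency=3)
print(sorted(dots))
```

Raising the stringency for a fixed window removes isolated chance matches first, which is the filtering effect the slides describe.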
![Page 6: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/6.jpg)
Human globin vs. human myoglobin
[Dot plot. Axis label: beta-human.pep ck: 1,242, 1 to 146]
![Page 7: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/7.jpg)
Human LDL receptor vs. itself (w=30, s=9)
[Dot plot. Axis label: ldlrecep.pep ck: 3,641, 1 to 860]
![Page 8: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/8.jpg)
Human LDL receptor vs. itself (40, 15)
[Dot plot of ldlrecep.pep ck: 3,641, 1 to 860 against itself. COMPARE Window: 40 Stringency: 15.0 Points: 5,287]
![Page 9: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/9.jpg)
Human LDL receptor vs. itself (40, 17.5)
[Dot plot of ldlrecep.pep ck: 3,641, 1 to 860 against itself. COMPARE Window: 40 Stringency: 17.5 Points: 3,079]
![Page 10: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/10.jpg)
Human LDL receptor vs. itself (40, 20)
[Dot plot of ldlrecep.pep ck: 3,641, 1 to 860 against itself. COMPARE Window: 40 Stringency: 20.0 Points: 2,295]
![Page 11: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/11.jpg)
Plasmodium falciparum MSP3 vs. itself (30,9)
[Dot plot. Axis label: msp3.pep ck: 4,247, 1 to 380]
![Page 12: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/12.jpg)
Plasmodium falciparum MSP3 vs. itself (20,9)
[Dot plot of msp3.pep ck: 4,247, 1 to 380 against itself. COMPARE Window: 20 Stringency: 9.0 Points: 15,619]
![Page 13: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/13.jpg)
Plasmodium falciparum MSP3 vs. itself (10,9)
[Dot plot of msp3.pep ck: 4,247, 1 to 380 against itself. COMPARE Window: 10 Stringency: 9.0 Points: 1,263]
![Page 14: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/14.jpg)
Global alignment
An alignment of two sequences a and b is an arrangement of a and b by position, where a and b can be padded with gap symbols to achieve the same length:
a: AGCACAC-A or AG-CACACA
b: A-CACACTA ACACACT-A
If we read the alignment column-wise, we have a protocol of edit operations that lead from a to b.
Left: Match (A,A) Right: Match (A,A)
Delete (G,-) Replace (G,C)
Match (C,C) Insert (-,A)
Match (A,A) Match (C,C)
Match (C,C) Match (A,A)
Match (A,A) Match (C,C)
Match (C,C) Replace (A,T)
Insert (-,T) Delete (C,-)
Match (A,A) Match (A,A)
The left-hand alignment shows one Delete, one Insert, and the other edit operations are Matches.
The right-hand alignment shows one Insert, one Delete, two Replaces, and five Matches.
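Reading an alignment column-wise can be sketched in a few lines. The function name `edit_transcript` and the use of `-` as the gap symbol are assumptions for illustration:

```python
from collections import Counter


def edit_transcript(row_a: str, row_b: str, gap: str = "-") -> list[str]:
    """Name the edit operation implied by each column of an alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    ops = []
    for u, v in zip(row_a, row_b):
        if u == gap and v == gap:
            raise ValueError("a column may not contain two gaps")
        if u == gap:
            ops.append("Insert")   # (-, v)
        elif v == gap:
            ops.append("Delete")   # (u, -)
        elif u == v:
            ops.append("Match")
        else:
            ops.append("Replace")
    return ops


left = Counter(edit_transcript("AGCACAC-A", "A-CACACTA"))
right = Counter(edit_transcript("AG-CACACA", "ACACACT-A"))
print(left)
print(right)
```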
![Page 15: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/15.jpg)
Cost (scoring) of global alignments; optimal global alignments
Next we turn the edit protocol into a measure of distance by assigning a “cost” or “weight” S to each operation. For example, for arbitrary characters u,v from A we may define
S(u,u) = 0; S(u,v) = 1 for u ≠ v; S(u,-) = S(-,v) = 1. (Unit Cost)
This scheme is known as the Levenshtein distance, also called the unit cost model. Its predominant virtue is its simplicity. In general, more sophisticated cost models must be used. For example, replacing an amino acid by a biochemically similar one should cost less than a replacement by an amino acid with totally different properties. Details shortly. Now we are ready to define the most important notion for sequence analysis:
The cost of an alignment of two sequences a and b is the sum of the costs of all the edit operations that lead from a to b.
An optimal alignment of a and b is an alignment which has minimal cost among all possible alignments.
The edit distance of a and b is the cost of an optimal alignment of a and b under a cost function S. We denote it by d(a,b).
Using the unit cost model for S in our previous example, we obtain the following cost:
a: AGCACAC-A or AG-CACACA
b: A-CACACTA ACACACT-A
cost: 2 cost: 4
Here it is easily seen that the left-hand alignment is optimal under the unit cost model, and hence the edit distance d(a,b) = 2.
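The cost of an alignment is just a sum over its columns, so it can be sketched with a pluggable cost function. The names `unit_cost` and `alignment_cost` are illustrative; the gap symbol `-` is treated as an ordinary character, which gives S(u,-) = S(-,v) = 1 under the unit cost model.

```python
def unit_cost(u: str, v: str) -> int:
    """Unit cost model: 0 for a match, 1 for a replace, insert or delete."""
    return 0 if u == v else 1


def alignment_cost(row_a: str, row_b: str, cost=unit_cost) -> int:
    """Sum the cost of every column of the alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    return sum(cost(u, v) for u, v in zip(row_a, row_b))


print(alignment_cost("AGCACAC-A", "A-CACACTA"))  # left-hand alignment
print(alignment_cost("AG-CACACA", "ACACACT-A"))  # right-hand alignment
```

Passing a different `cost` function is one way to plug in a more sophisticated cost model without changing the summation.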
![Page 16: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/16.jpg)
More general scores = - costs: see later.
C 9
S -1 4
T -1 1 5
P -3 -1 -1 7
A 0 1 0 -1 4
G -3 0 -2 -2 0 6
N -3 1 0 -2 -2 0 6
D -3 0 -1 -1 -2 -1 1 6
E -4 0 -1 -1 -1 -2 0 2 5
Q -3 0 -1 -1 -1 -2 0 0 2 5
H -3 -1 -2 -2 -2 -2 1 -1 0 0 8
R -3 -1 -1 -2 -1 -2 0 -2 0 1 0 5
K -3 0 -1 -1 -1 -2 0 -1 1 1 -1 2 5
M -1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5
I -1 -2 -1 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4
L -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2 2 2 4
V -1 -2 0 -2 0 -3 -3 -3 -2 -2 -3 -3 -2 1 3 1 4
F -2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3 0 0 0 -1 6
Y -2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7
W -2 -3 -2 -4 -3 -2 -4 -4 -3 -2 -2 -3 -3 -1 -3 -2 -3 1 2 11
C S T P A G N D E Q H R K M I L V F Y W
134 LQQGELDLVMTSDILPRSELHYSPMFDFEVRLVLAPDHPLASKTQITPEDLASETLLI
    |     |||         |       |        ||||||    |    || ||
137 LDSNSVDLVLMGVPPRNVEVEAEAFMDNPLVVIAPPDHPLAGERAISLARLAEETFVM
D:D = +6
D:R = -2
From Henikoff 1996
![Page 17: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/17.jpg)
Scoring Matrices
Physical/Chemical similarities
comparing two sequences according to the properties of their residues may highlight regions of structural similarity
Identity matrices
by stressing only identities in the alignment, stretches of sequence that may have diverged will not penalise any remaining common features
![Page 18: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/18.jpg)
Scoring Matrices (ctd)
As the direct source of residue-by-residue comparison scores, the scoring matrix you choose will have a major impact on the alignment calculated
The most commonly used will be one of the mutation matrices: PAM or BLOSUM
Von Bing will explain the derivation of these and other mutation matrices next Tuesday.
The matrix that performs best will be the matrix that best reflects the evolutionary separation of the sequences being aligned.
![Page 19: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/19.jpg)
Statistical motivation for alignment scores
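Alignment (data):
AGCTGATCA...
AACCGGTTA...
Hypotheses:
H = homologous (indep. sites, Jukes-Cantor)
R = random (indep. sites, equal freq.)
$$\mathrm{pr}(\text{data}\mid H) = \mathrm{pr}(\text{column 1}\mid H)\times\mathrm{pr}(\text{column 2}\mid H)\times\cdots = (1-p)^{a}\,p^{d}$$
where d = # disagreements, a = # agreements, $p = \tfrac{3}{4}\,(1-e^{-8\alpha t})$.
$$\mathrm{pr}(\text{data}\mid R) = \mathrm{pr}(\text{column 1}\mid R)\times\mathrm{pr}(\text{column 2}\mid R)\times\cdots = \left(\tfrac{1}{4}\right)^{a}\left(\tfrac{3}{4}\right)^{d}$$
$$\log\left\{\frac{\mathrm{pr}(\text{data}\mid H)}{\mathrm{pr}(\text{data}\mid R)}\right\} = a\,\log\frac{1-p}{1/4} + d\,\log\frac{p}{3/4}$$
Since $p < \tfrac{3}{4}$, $\log\frac{p}{3/4} < 0$ and $\log\frac{1-p}{1/4} > 0$.
$$\text{score} = a\,\sigma + d\,(-\mu), \qquad \sigma = \log\frac{1-p}{1/4} > 0 \text{ (match score)}, \qquad -\mu = \log\frac{p}{3/4} < 0 \text{ (mismatch penalty)}$$
Note that if $t \approx 0$, then $p \approx 6\alpha t$ and $1-p \approx 1$, and so $\sigma \approx \log 4$, while $-\mu \approx \log 8\alpha t$ is large and negative: a big difference in the two scores.
Conversely, if t is large, write $\varepsilon = e^{-8\alpha t}$. Then $p = \tfrac{3}{4}(1-\varepsilon)$, $\frac{p}{3/4} = 1-\varepsilon$, and $-\mu = \log(1-\varepsilon) \approx -\varepsilon$, while $1-p = \tfrac{1}{4}(1+3\varepsilon)$, $\frac{1-p}{1/4} = 1+3\varepsilon$, and so $\sigma = \log(1+3\varepsilon) \approx 3\varepsilon$. Thus the scores are about 3:1.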
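The two limiting cases can be checked numerically. This sketch assumes the Jukes-Cantor disagreement probability p = (3/4)(1 - exp(-8αt)), with match score log((1-p)/(1/4)) and mismatch score log(p/(3/4)); the function name and the argument `alpha_t` (the product αt) are illustrative.

```python
import math


def jukes_cantor_scores(alpha_t: float) -> tuple[float, float]:
    """Return (match score, mismatch score) as log-odds under Jukes-Cantor."""
    p = 0.75 * (1.0 - math.exp(-8.0 * alpha_t))  # pr(disagreement | homologous)
    match = math.log((1.0 - p) / 0.25)           # positive
    mismatch = math.log(p / 0.75)                # negative
    return match, mismatch


for alpha_t in (0.001, 0.05, 0.5, 1.0):
    match, mismatch = jukes_cantor_scores(alpha_t)
    print(f"alpha*t={alpha_t:<6} match={match:.4f} mismatch={mismatch:.4f} "
          f"ratio={match / -mismatch:.3f}")
```

For small αt the match score sits near log 4 while the mismatch score is strongly negative; for large αt both shrink towards zero and their ratio approaches 3:1.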
![Page 20: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/20.jpg)
We can do the same with any other Markov substitution matrix for molecular evolution. E.g. with a PAM or BLOSUM matrix of probabilities,
data = a gap-free alignment of two a.a. sequence fragments:
$a_1 \dots a_m$
$b_1 \dots b_m$
$$\mathrm{pr}(\text{data}\mid H) = \prod_{i=1}^{m} \pi_{a_i}\,p_{a_i b_i}(2t), \qquad \mathrm{pr}(\text{data}\mid R) = \prod_{i=1}^{m} \pi_{a_i}\,\pi_{b_i}$$
$$\log\left\{\frac{\mathrm{pr}(\text{data}\mid H)}{\mathrm{pr}(\text{data}\mid R)}\right\} = \sum_{i=1}^{m} \log\left\{\frac{p_{a_i b_i}(2t)}{\pi_{b_i}}\right\}$$
The elements of a log-odds score matrix are typically > 0 on the diagonal and < 0 off the diagonal, but not always.
Also the relative sizes of match and mismatch penalties increase as #PAMs (t) decreases. Thus PAM(120) is more stringent than PAM(250), while PAM(360) is less stringent than it.
PAM(0) = the identity matrix is the toughest.
There are plenty of score matrices based on other principles.
![Page 21: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/21.jpg)
Below diagonal: BLOSUM62 substitution matrix
Above diagonal: Difference matrix obtained by subtracting the PAM 160 matrix entrywise.
From Henikoff & Henikoff 1992
C S T P A G N D E Q H R K M I L V F Y W
0 -1 1 0 2 1 1 2 1 2 0 0 2 4 1 5 1 2 -2 5 C
2 0 -2 0 -1 0 0 0 1 0 0 0 1 0 1 -1 1 1 -1 S
C 9 2 -1 -1 -1 0 0 0 0 0 0 -1 0 -1 1 0 1 1 3 T
S -1 4 2 -2 -1 -1 0 0 -1 -1 -1 1 1 0 -1 0 0 2 1 P
T -1 1 5 2 -1 -2 -2 -1 0 0 1 1 0 0 1 0 1 1 2 A
P -3 -1 -1 7 2 0 -1 -2 0 1 1 0 0 -1 0 -1 1 2 4 G
A 0 1 0 -1 4 3 -1 -1 0 0 1 -1 0 -1 0 -1 0 0 0 N
G -3 0 -2 -2 0 6 2 -1 -1 -1 0 -1 0 0 0 0 2 1 3 D
N -3 1 0 -2 -2 0 6 1 0 0 2 2 1 -1 0 0 2 2 4 E
D -3 0 -1 -1 -2 -1 1 6 0 -2 0 1 1 -1 0 0 1 3 3 Q
E -4 0 -1 -1 -1 -2 0 2 5 2 -1 0 1 0 -1 0 1 2 2 H
Q -3 0 -1 -1 -1 -2 0 0 2 5 -1 -1 0 -1 1 0 1 3 -4 R
H -3 -1 -2 -2 -2 -2 1 -1 0 0 8 1 -2 -1 1 1 2 3 1 K
R -3 -1 -1 -2 -1 -2 0 -2 0 1 0 5 -2 -1 -1 0 1 2 4 M
K -3 0 -1 -1 -1 -2 0 -1 1 1 -1 2 5 -1 1 0 0 1 3 I
M -1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5 -1 0 -1 1 2 L
I -1 -2 -1 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4 0 1 2 4 V
L -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2 2 2 4 -1 -2 1 F
V -1 -2 0 -2 0 -3 -3 -3 -2 -2 -3 -3 -2 1 3 1 4 -1 2 Y
F -2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3 0 0 0 -1 6 -1 W
Y -2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7
W -2 -3 -2 -4 -3 -2 -4 -4 -3 -2 -2 -3 -3 -1 -3 -2 -3 1 2 11
C S T P A G N D E Q H R K M I L V F Y W
![Page 22: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/22.jpg)
Above diagonal: SG scoring system (Feng et al., 1985)
Below diagonal: Log-odds matrix for 250 PAMs (Dayhoff et al., 1978)
C S T P A G N D E Q H R K M I L V F Y W
6 4 2 2 2 3 2 1 0 1 2 2 0 2 2 2 2 3 3 3 C
6 5 4 5 5 5 3 3 3 3 3 3 1 2 2 2 3 3 2 S
C 12 6 4 5 2 4 2 3 3 2 3 4 3 3 2 3 1 2 1 T
S 0 2 6 5 3 2 2 3 3 3 3 2 2 2 3 3 2 2 2 P
T -2 1 3 6 5 3 4 4 3 2 2 3 2 2 2 5 2 2 2 A
P -3 1 0 6 6 3 4 4 2 1 3 2 1 2 2 4 1 2 3 G
A -2 1 1 1 2 6 5 3 3 4 2 4 1 2 1 2 1 3 0 N
G -3 1 0 -1 1 5 6 5 4 3 2 3 0 1 1 3 1 2 0 D
N -4 1 0 -1 0 0 2 6 4 2 2 4 1 1 1 4 0 1 1 E
D -5 0 0 -1 0 1 2 4 6 4 3 4 2 1 2 2 1 2 1 Q
E -5 0 0 -1 0 0 1 3 4 6 4 3 1 1 3 1 2 3 1 H
Q -5 -1 -1 0 0 -1 1 2 2 4 6 5 2 2 2 2 1 1 2 R
H -3 -1 0 0 -1 -2 2 1 1 3 6 6 2 2 2 3 0 1 1 K
R -4 0 0 0 -2 -3 0 -1 -1 1 2 6 6 4 5 4 2 2 3 M
K -5 0 0 -1 -1 -2 1 0 0 1 0 3 5 6 5 5 4 3 2 I
M -5 -2 -1 -2 -1 -3 -2 -3 -2 -1 -2 0 0 6 6 5 4 3 4 L
I -2 -1 0 -2 -1 -3 -2 -2 -2 -2 -2 -2 -2 2 5 6 4 3 3 V
L -6 -3 -2 -3 -2 -4 -3 -4 -3 -2 -2 -3 -3 4 2 6 6 5 3 F
V -2 -1 0 -1 0 -1 -2 -2 -2 -2 -2 -2 -2 2 4 2 4 6 3 Y
F -4 -3 -3 -5 -4 -5 -4 -6 -5 -5 -2 -4 -5 0 1 2 -1 9 6 W
Y 0 -3 -3 -5 -3 -5 -2 -4 -4 -4 0 -4 -4 -2 -1 -1 -2 7 10
W -8 -2 -5 -6 -6 -7 -4 -7 -7 -5 -3 2 -3 -4 -5 -2 -6 0 0 17
C S T P A G N D E Q H R K M I L V F Y W
![Page 23: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/23.jpg)
Gap penalties
Gap penalties are usually composed of two parts:
Gap opening penalty
Opening a gap reduces the alignment score, so a gap is only worthwhile if it leads to a sufficiently better alignment downstream than would be obtained without it
The size of the penalty is usually of the order of one to three times the size of values in the scoring matrix
![Page 24: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/24.jpg)
Gap penalties (ctd)
Gap extension penalty
If a gap has been created then extending it should not be as hard to do
On the other hand we want to limit the size of the gap to practical lengths
A smaller gap extension penalty may allow an alignment to resolve situations where complete loops may be missing between one structure and another
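A two-part gap penalty can be sketched as below. The parameter values are hypothetical, and conventions differ between programs on whether the opening penalty already covers the first gap position; this sketch assumes it does.

```python
def affine_gap_penalty(length: int, gap_open: float = 10.0, gap_extend: float = 0.5) -> float:
    """Penalty for one gap of the given length: open once, then extend."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


one_long_gap = affine_gap_penalty(4)
four_short_gaps = 4 * affine_gap_penalty(1)
print(one_long_gap, four_short_gaps)
```

With a small extension penalty, one long gap (for example a missing loop) costs far less than the same number of gap positions scattered as separate gaps.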
![Page 25: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/25.jpg)
Low gap penalty
eclustalw May 24, 1999 18:44
lgb1_pea.pep ck: 2970 from: 1 to: 147 Length: 147
hbhu.pep ck: 3588 from: 1 to: 147 Length: 147
Pairwise similarity parameter: K-Tuple length: 1; Gap Penalty: 3; Number of diagonals: 5; Diagonal window size: 5; Scoring Method: Percentage
Multiple alignment parameter: Gap Penalty (fixed): 1.00; Gap Penalty (varying): 0.05; Gap separation penalty range: 8; Percent. identity for delay: 40%; List of hydrophilic residue: GPSNDQEKR; Protein Weight Matrix: blosum
                     10        20        30        40        50        60
                      .         .         .         .         .         .
LGB1_PEA.pep --GFTDKQE-ALVNSSSEFKQNLPGYSILFYTIVLEKAPAAKGLF-SF--LKDTAGVEDS
HBHU.pep     MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY--PWTQRFFESFGDLSTPDAVMGN
             * . *. * * .*. * .. * ** * *
LGB1_PEA.pep PKLQAHAEQVFGLVRDSAAQLR-TKGEVVLGNATLGAIHVQKGVTNP-HFVVVKEALLQT
HBHU.pep     PKVKAHGKKVLGAFSDGLAHLDNLKGTF----ATLSELHCDKLHVDPENFRLLGNVLVCV
             **..** .* * * *.* ** *** .* * * .* .. *.
LGB1_PEA.pep IKKASGNNWSEELNTAWEVAYDGLATAIKKAMKTA
HBHU.pep     LAHHFGKEFTPPVQAAYQKVVAGVANAL--AHKYH
             . . * . ...* . *.*.*. * *
![Page 26: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/26.jpg)
Middling gap penalty
eclustalw May 24, 1999 18:50
lgb1_pea.pep ck: 2970 from: 1 to: 147 Length: 147
hbhu.pep ck: 3588 from: 1 to: 147 Length: 147
Pairwise similarity parameter: K-Tuple length: 1; Gap Penalty: 3; Number of diagonals: 5; Diagonal window size: 5; Scoring Method: Percentage
Multiple alignment parameter: Gap Penalty (fixed): 25.00; Gap Penalty (varying): 0.05; Gap separation penalty range: 8; Percent. identity for delay: 40%; List of hydrophilic residue: GPSNDQEKR; Protein Weight Matrix: blosum
                     10        20        30        40        50        60
                      .         .         .         .         .         .
LGB1_PEA.pep ----GFTDKQEALVNSSSEFKQNLPGYSILFYTIVLEKAPAAKGLFSFLKDTAGVEDSPK
HBHU.pep     MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
             .* . * .. .* . * * * **
LGB1_PEA.pep LQAHAEQVFGLVRDSAAQLRTKGEVVLGNATLGAIHVQKGVTNP-HFVVVKEALLQTIKK
HBHU.pep     VKAHGKKVLGAFSDGLAHLDN---LKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAH
             ..** .* * * *.* . . *** .* * * .* .. *. . .
LGB1_PEA.pep ASGNNWSEELNTAWEVAYDGLATAIKKAMKTA
HBHU.pep     HFGKEFTPPVQAAYQKVVAGVANALAHKYH--
             * . ...* . *.*.*. . .
![Page 27: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/27.jpg)
Very high gap penalty
eclustalw May 24, 1999 18:52
lgb1_pea.pep ck: 2970 from: 1 to: 147 Length: 147
hbhu.pep ck: 3588 from: 1 to: 147 Length: 147
Pairwise similarity parameter: K-Tuple length: 1; Gap Penalty: 3; Number of diagonals: 5; Diagonal window size: 5; Scoring Method: Percentage
Multiple alignment parameter: Gap Penalty (fixed): 50.00; Gap Penalty (varying): 0.05; Gap separation penalty range: 8; Percent. identity for delay: 40%; List of hydrophilic residue: GPSNDQEKR; Protein Weight Matrix: blosum
                     10        20        30        40        50        60
                      .         .         .         .         .         .
LGB1_PEA.pep ----GFTDKQEALVNSSSEFKQNLPGYSILFYTIVLEKAPAAKGLFSFLKDTAGVEDSPK
HBHU.pep     MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
             .* . * .. .* . * * * **
LGB1_PEA.pep LQAHAEQVFGLVRDSAAQLRTKGEVVLGNATLGAIHVQKGVTNPHFVVVKEALLQTIKKA
HBHU.pep     VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPEN--FRLLGNVLVCVLAHH
             ..** .* * * *.* . . * . ... * * .. *. . .
LGB1_PEA.pep SGNNWSEELNTAWEVAYDGLATAIKKAMKTA
HBHU.pep     FGKEFTPPVQAAYQKVVAGVANALAHKYH--
             * . ...* . *.*.*. . .
![Page 28: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/28.jpg)
Dynamic Programming
This is a mathematical implementation that can be seen as an extension of the dotplot method
Rather than dots, the comparison matrix positions are assigned values that reflect the scores in the scoring matrix
It is used to obtain optimal alignments
![Page 29: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/29.jpg)
Dynamic Programming
The optimum alignment is obtained by tracing the highest scoring path from the top left-hand corner to the bottom right-hand corner of the matrix
When the alignment steps away from the diagonal this implies an insertion or deletion event, the impact of which can be assessed by the application of a gap penalty
![Page 30: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/30.jpg)
[Matrix of unit replacement costs; rows: sequence a (down), columns: sequence b (along)]
  A C A C A C T A
A 0 1 0 1 0 1 1 0
G 1 1 1 1 1 1 1 1
C 1 0 1 0 1 0 1 1
A 0 1 0 1 0 1 1 0
C 1 0 1 0 1 0 1 1
A 0 1 0 1 0 1 1 0
C 1 0 1 0 1 0 1 1
A 0 1 0 1 0 1 1 0
![Page 31: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/31.jpg)
Dynamic programming: the formula
Suppose that our two sequences are $a=(a_1,\dots,a_m)$ and $b=(b_1,\dots,b_n)$, and that we denote by $d_{ij}$ the edit distance between the initial segments $a^i=(a_1,\dots,a_i)$ and $b^j=(b_1,\dots,b_j)$ of a and b.
Extend this to i=j=0 by writing $d_{00}=0$.
Supposing that a deletion or an insertion incurs a penalty of +1, the following formula summarizes our verbal argument:
$$d_{ij} = \min\left( d_{i-1,j-1} + s(a_i,b_j),\; d_{i,j-1} + 1,\; d_{i-1,j} + 1 \right).$$
(More is needed to give a complete algorithm: what is it?)
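One possible completion, offered as a sketch rather than as the lecturer's intended answer: the recurrence needs boundary values for the first row and column (here d(i,0) = i and d(0,j) = j, i.e. pure deletions or insertions), and a traceback to recover an alignment. The function names and the tie-breaking order in the traceback are illustrative choices.

```python
def edit_distance_table(a: str, b: str) -> list[list[int]]:
    """Fill the (m+1) x (n+1) table of unit-cost edit distances between prefixes."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                       # delete the first i characters of a
    for j in range(1, n + 1):
        d[0][j] = j                       # insert the first j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + s, d[i][j - 1] + 1, d[i - 1][j] + 1)
    return d


def traceback(a: str, b: str, d: list[list[int]]) -> tuple[str, str]:
    """Walk back from the bottom-right corner to recover one optimal alignment."""
    i, j = len(a), len(b)
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1   # delete
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1   # insert
    return "".join(reversed(row_a)), "".join(reversed(row_b))


a, b = "AGCACACA", "ACACACTA"
table = edit_distance_table(a, b)
aligned_a, aligned_b = traceback(a, b, table)
print(table[-1][-1])
print(aligned_a)
print(aligned_b)
```

Other optimal alignments may exist; which one is returned depends on the order in which ties are tested.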
![Page 32: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/32.jpg)
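[Table of edit distances between initial segments; rows: sequence a (down), columns: sequence b (along)]
    A C A C A C T A
  0 1 2 3 4 5 6 7 8
A 1 0 1 2 3 4 5 6 7
G 2 1 1 2 3 4 5 6 7
C 3 2 1 2 2 3 4 5 6
A 4 3 2 1 2 2 3 4 5
C 5 4 3 2 1 2 2 3 4
A 6 5 4 3 2 1 2 3 3
C 7 6 5 4 3 2 1 2 3
A 8 7 6 5 4 3 2 2 2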
![Page 33: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/33.jpg)
Chance or common ancestry?
Idea: calculate optimal alignment scores for pairs of sequences where one is a randomized (shuffled) version of the original. This will give a distribution of random scores, representing chance similarity rather than homology.
The score from our original pair of sequences can be referred to this distribution and assigned a Z-score (subtract mean of randoms and divide by SD of randoms), or (better) a p-value.
Criticism: Such random a.a. sequences might have plausible a.a. compositions but are quite unlike real protein sequences.
Partial reply: a) restrict the randomization to blocks; or, b) create a distribution of chance similarity scores using real a.a. sequences known or assumed not to be homologous to our query sequence. [Other approaches use theory, but this is still subject to the criticism above.]
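The shuffling idea can be sketched as follows. Everything specific here is an assumption for illustration: the simple match/mismatch/gap scores, the number of shuffles, the fixed seed, and the function names. The empirical p-value uses the common (1 + count) / (1 + shuffles) form.

```python
import random
import statistics


def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score by dynamic programming (linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_test(a: str, b: str, n_shuffles: int = 200, seed: int = 0):
    """Compare the observed score with scores against shuffled versions of b."""
    rng = random.Random(seed)
    observed = alignment_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)             # keeps composition, destroys order
        null_scores.append(alignment_score(a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_shuffles + 1)
    return observed, z, p_value


observed, z, p_value = shuffle_test("AGCACACA", "ACACACTA")
print(observed, z, p_value)
```

Restricting `rng.shuffle` to blocks, or replacing the shuffled sequences with real non-homologous sequences, would correspond to replies a) and b) above.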
![Page 34: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/34.jpg)
Dynamic Programming
Based on notes by George Rudy, formerly WEHI.
![Page 35: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/35.jpg)
“Life must be lived forwards and understood backwards.”
Søren Kierkegaard
![Page 36: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/36.jpg)
What is DP?
Operations research: “A mathematical formalism applicable to problems involving optimization of decisions over time.”
(after R. Bellman and S. Dreyfus)
Bioinformatics: “An algorithm for finding optimal sequence alignments given an additive alignment score.”
(after R. Durbin, et al.)
Computer programming: “An approach to algorithm design whereby the target problem is decomposed into smaller problems that are then solved independently.”
(after R. Sedgewick)
![Page 37: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/37.jpg)
Where did DP come from?
- Richard Bellman
- The RAND Corporation
- “Dynamic” and “Programming”
![Page 38: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/38.jpg)
Where can DP be applied?
- Both discrete and continuous problems concerning deterministic, stochastic, or adaptive processes
- Multiple fields: research, industry, finance,…
- Examples: allocation processes
smoothing and scheduling processes
optimal search and stopping techniques
optimal trajectories
multistage production processes
feedback control processes
Markovian decision processes
![Page 39: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/39.jpg)
DP in biomedical literature (1)
[Chart; vertical axis ticks 0 to 25, horizontal axis: Years]
![Page 40: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/40.jpg)
DP in biomedical literature (2)
- A symmetric-iterated multiple alignment of protein sequences.
[Brocchieri, L. and Karlin S., J. Mol. Biol. 276(1):249-64, 1998.]
- Sequence assembly validation by multiple restriction digest fragment coverage analysis.
[Rouchka, E.C. and States, D.J., ISMB. 6:140-7, 1998.]
- Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment.
[Gracy, J. and Argos, P., Bioinformatics 14(2):164-73, 1998.]
- A segment-based dynamic programming algorithm for predicting gene structure.
[Wu, T.D., J. Comput. Biol. 3(3):375-94, 1996.]
- Automatic detection of cardiac contours on MR images using fuzzy logic and dynamic programming.
[Lalande A. et al., Proc. AMIA Annu. Fall Symp. :474-8, 1997.]
- Process models for production of beta-lactam antibiotics.
[Bellgardt, K.H., Adv. Biochem. Eng. Biotechnol. 60:153-94, 1998.]
- Dynamic programming approach for newborn’s incubator humidity control.
[Bouattoura, D. et al., IEEE Trans. Biomed. Eng. 45(1):48-55, 1998.]
- Minimum energy trajectories of the swing ankle when stepping over obstacles of different heights.
[Chou L.S. et al., J. Biomech. 30(2):115-20, 1997.]
- A theoretical study of the socioecology of ungulates. II. A dynamic programming study of the stochastic formulation.
[Paveri-Fontana, S.L. and Focardi, S. Theor. Popul. Biol. 46(3):279-99, 1994.]
![Page 41: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/41.jpg)
What problems are suitable for DP?
- Essential components (common to all OR problems):
a decision-maker
access to results of decisions
- Additionally:
decisions are sequential
later decisions are affected by earlier ones
effect of a decision can be calculated independently of other decisions
![Page 42: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/42.jpg)
The Stagecoach Problem (1)
[Figure: a network of 16 vertices labelled A to P, joined by 24 weighted edges. After S. E. Dreyfus.]
![Page 43: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/43.jpg)
Some terminology
- Vertex
- Edge
- Path
- Monotonic-to-the-right
- (Admissible) path
- Stage
- State
![Page 44: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/44.jpg)
The Stagecoach Problem (2)
[Figure: the network of vertices A to P with its edge weights; value added so far: 0.]
![Page 45: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/45.jpg)
The Stagecoach Problem (2)
[Figure: the network of vertices A to P with its edge weights; values added so far: 2, 1, 0.]
![Page 46: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/46.jpg)
The Stagecoach Problem (2)
[Figure: the network of vertices A to P with its edge weights; values added so far: 2, 4, 1, 0.]
![Page 47: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/47.jpg)
The Stagecoach Problem (2)
[Figure: the network of vertices A to P with its edge weights; values added so far: 10, 8, 7, 2, 4, 6, 7, 5, 1, 0.]
![Page 48: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/48.jpg)
The Stagecoach Problem (2)
[Figure: the network of vertices A to P with its edge weights; all 16 values added: 10, 9, 12, 13, 14, 8, 8, 7, 2, 4, 6, 11, 7, 5, 1, 0.]
![Page 49: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/49.jpg)
Some more terminology
- Optimal value function
- Policy
- Optimal policy function
![Page 50: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/50.jpg)
The Stagecoach Problem (3)
[Figure: the network of vertices A to P with its edge weights and the 16 values 10, 9, 12, 13, 14, 8, 8, 7, 2, 4, 6, 11, 7, 5, 1, 0.]
![Page 51: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/51.jpg)
The Stagecoach Problem (3)
[Figure: the network of vertices A to P with its edge weights and the 16 values 10, 9, 12, 13, 14, 8, 8, 7, 2, 4, 6, 11, 7, 5, 1, 0.]
![Page 52: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/52.jpg)
The Stagecoach Problem (3)
[Figure: the network of vertices A to P with its edge weights and the 16 values 10, 9, 12, 13, 14, 8, 8, 7, 2, 4, 6, 11, 7, 5, 1, 0.]
![Page 53: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/53.jpg)
The Stagecoach Problem (4)
[Figure: the network of vertices A to P with its edge weights and the 16 values 10, 9, 12, 13, 14, 8, 8, 7, 2, 4, 6, 11, 7, 5, 1, 0.]
![Page 54: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/54.jpg)
Efficiency of the DP approach
- At each of 9 vertices where a real choice existed: 2 additions and 1 binary comparison
- At the other 6 vertices: 1 addition
- Total: 24 additions and 9 comparisons
- Compare this with direct evaluation of the original problem by enumeration of all 20 admissible paths: 5 additions/path = 100 additions, plus 20 comparisons
![Page 55: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/55.jpg)
Efficiency (2), and the Curse of Dimensionality
In general, for the n-stage problem treated here,
DP involves $(n^2/2) + n$ additions.
Direct enumeration generates $\binom{n}{n/2} = \dfrac{n!}{(n/2)!\,(n/2)!}$ paths, or $(n-1)\,\dfrac{n!}{(n/2)!\,(n/2)!}$ additions.
Thus, for n=20, DP requires 220 additions while direct enumeration would demand 3,510,364 additions.
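These counts are easy to check numerically. The sketch below assumes n is even, as in the n-stage network of the slides, and the function names are illustrative.

```python
from math import comb


def dp_additions(n: int) -> int:
    """Additions needed by dynamic programming: n^2/2 + n."""
    return n * n // 2 + n


def enumeration_paths(n: int) -> int:
    """Number of admissible paths: n choose n/2."""
    return comb(n, n // 2)


def enumeration_additions(n: int) -> int:
    """Additions needed by direct enumeration: (n - 1) per path."""
    return (n - 1) * enumeration_paths(n)


for n in (6, 20):
    print(n, dp_additions(n), enumeration_paths(n), enumeration_additions(n))
```

For n = 6 this reproduces the 24 additions versus 20 paths and 100 additions quoted for the stagecoach example.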
![Page 56: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/56.jpg)
The Stagecoach Problem (5)
[Figure: the network of vertices A to P drawn on coordinate axes, with x ticks 1 to 6 and y ticks -3 to 3.]
![Page 57: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/57.jpg)
The Principle of Optimality, or Bellman’s Principle
“An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.” (Bellman)
or, “An optimal sequence of decisions in a multistage decision process problem has the property that whatever the initial stage, state, and decision are, the remaining decisions must constitute an optimal sequence of decisions for the remaining problem, with the stage and state resulting from the first decision considered as initial conditions.” (Dreyfus)
or, “An optimal policy must have the property that no matter what path is taken to enter a particular state, the remaining stages (decisions) taken must constitute an optimal policy for departure from that state.”
or, “An optimal policy is comprised of optimal subpolicies.”
or, “An optimal policy from any state is independent of the path taken to that state, and is made up entirely of optimal subpolicies.”
or, ...
![Page 58: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/58.jpg)
The optimal value function
S(x,y) = the value of the minimum-value admissible path connecting the vertex (x,y) and the terminal vertex (6,0)
e_u(x,y) = the value of the edge connecting the vertices (x,y) and (x+1, y+1)
e_d(x,y) = the value of the edge connecting the vertices (x,y) and (x+1, y-1)
S(x,y) = min { e_u(x,y) + S(x+1, y+1), e_d(x,y) + S(x+1, y-1) }
S(6,0) = 0.
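The recurrence can be sketched on a network of the same shape. The edge weights below are hypothetical (generated by arbitrary formulas), because the weights of the slide's figure cannot be mapped to edges from this transcript; the vertex test `in_network` and the use of memoisation are also illustrative choices.

```python
from functools import lru_cache

N = 6  # number of stages; the terminal vertex is (N, 0)


def in_network(x: int, y: int) -> bool:
    """True if (x, y) is a vertex of the diamond-shaped network."""
    return 0 <= x <= N and abs(y) <= min(x, N - x) and (x + y) % 2 == 0


def edge_up(x: int, y: int) -> int:
    """Hypothetical weight of the edge from (x, y) to (x+1, y+1)."""
    return 1 + (x + 2 * y) % 4


def edge_down(x: int, y: int) -> int:
    """Hypothetical weight of the edge from (x, y) to (x+1, y-1)."""
    return 1 + (2 * x - y) % 3


@lru_cache(maxsize=None)
def S(x: int, y: int) -> int:
    """Value of the minimum-value admissible path from (x, y) to (N, 0)."""
    if (x, y) == (N, 0):
        return 0
    options = []
    if in_network(x + 1, y + 1):
        options.append(edge_up(x, y) + S(x + 1, y + 1))
    if in_network(x + 1, y - 1):
        options.append(edge_down(x, y) + S(x + 1, y - 1))
    return min(options)


print(S(0, 0))
```

Each vertex value is computed once and reused, which is where the saving over enumerating every path comes from.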
![Page 59: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/59.jpg)
A more formal restatement of common features of DP problems
A physical system characterized at any stage by a small set of parameters, the state variables;
At each stage of the process there is a choice of a number of decisions;
The effect of a decision is a transformation of the state variables;
The past history of the system is of no importance in determining future actions;
The purpose of the process is to maximize some function of the state variables.
![Page 60: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/60.jpg)
The practice of DP
Imbed the specific given problem in a more general family of problems;
Define the optimal value function which associates a value with each of the various possible initial conditions of problems in that family;
Invoke the principle of optimality in order to deduce a recurrence relation characterizing that function;
Seek the solution of the recurrence relation in order to obtain the optimal policy function which furnishes the solution to the specific given problem and all other problems in the more general family as well.
![Page 61: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/61.jpg)
More practically speaking,
Determine the decision-maker and the decisions to be made;
Determine the stages;
Determine the possible states;
Formulate the optimal value function in the form of a recurrence relation;
Calculate and tabulate the optimal value function for each stage and state;
Find the optimal policy or policies for the problem.
![Page 62: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/62.jpg)
New problem, new terminology
Edit operations: M(atch), R(eplacement), I(nsert), D(elete).
Edit transcript: A string over the alphabet M, R, I, D that describes a transformation of one string into another. Example:
R D I M D M
M A - T H S
A - R T - S
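Edit (Levens(h)tein) distance: The minimum number of edit operations necessary to transform one string into another. (Note: matches are not counted.) Example: the cost of the transcript above is
R D I M D M
1+ 1+ 1+ 0+ 1+ 0 = 4
This transcript is not optimal: the edit distance between MATHS and ARTS is 3, as the tabulation shows.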
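A minimal Python sketch of how an edit transcript can be replayed against two strings to recover the gapped alignment and its cost. The function name `apply_transcript` and its error handling are illustrative choices, not part of the lecture:

```python
def apply_transcript(transcript, source, target):
    """Replay an edit transcript; return (aligned_source, aligned_target, cost)."""
    i = j = cost = 0
    top, bottom = [], []
    for op in transcript:
        if op in ("M", "R"):
            a, b = source[i], target[j]
            # M must pair equal characters, R must pair different ones.
            if (op == "M") != (a == b):
                raise ValueError(f"{op} does not fit {a!r}/{b!r}")
            top.append(a)
            bottom.append(b)
            i += 1
            j += 1
            cost += 1 if op == "R" else 0   # matches are not counted
        elif op == "D":
            top.append(source[i])
            bottom.append("-")
            i += 1
            cost += 1
        elif op == "I":
            top.append("-")
            bottom.append(target[j])
            j += 1
            cost += 1
        else:
            raise ValueError(f"unknown operation {op!r}")
    if i != len(source) or j != len(target):
        raise ValueError("transcript does not consume both strings")
    return "".join(top), "".join(bottom), cost


print(apply_transcript("RDIMDM", "MATHS", "ARTS"))
# -> ('MA-THS', 'A-RT-S', 4)
```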
![Page 63: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/63.jpg)
Once again,
Imbed the problem in the more general family;
Define the optimal value function;
Deduce the recurrence relation;
Solve the recurrence relation to obtain the optimal policy function.
![Page 64: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/64.jpg)
The recurrence
Stage: position in the edit transcript;
State: I, D, M, or R;
Optimal value function: D(i, j)
where D(i, j) = edit distance of Seq1[1...i] and Seq2[1...j]
Recurrence relation:
D(i, j) = min { 1 + D(i-1, j), 1 + D(i, j-1), t(i, j) + D(i-1, j-1) },
where t(i, j) = 0 if Seq1(i) = Seq2(j), and t(i, j) = 1 otherwise.
Boundary conditions: D(i, 0) = i and D(0, j) = j.
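The recurrence translates almost line for line into a memoized recursive function. A minimal Python sketch; the boundary values D(i, 0) = i and D(0, j) = j are taken from the first row and column of the tabulation, and the name `edit_distance` is an illustrative choice:

```python
from functools import lru_cache


def edit_distance(seq1, seq2):
    """Edit distance via the recurrence for D(i, j), evaluated top-down."""

    @lru_cache(maxsize=None)
    def D(i, j):
        if i == 0:
            return j          # only insertions remain
        if j == 0:
            return i          # only deletions remain
        # t(i, j): 0 for a match, 1 for a replacement (strings are 0-indexed).
        t = 0 if seq1[i - 1] == seq2[j - 1] else 1
        return min(1 + D(i - 1, j), 1 + D(i, j - 1), t + D(i - 1, j - 1))

    return D(len(seq1), len(seq2))


print(edit_distance("MATHS", "ARTS"))   # -> 3
```

For long sequences the recursion depth can exceed Python's default limit; filling the table iteratively avoids that.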
![Page 65: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/65.jpg)
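The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 |  |  |  |  |  |
| M | 1 |  |  |  |  |  |
| A | 2 |  |  |  |  |  |
| T | 3 |  |  |  |  |  |
| H | 4 |  |  |  |  |  |
| S | 5 |  |  |  |  |  |
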
![Page 66: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/66.jpg)
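The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 |  |  |  |  |
| M | 1 |  |  |  |  |  |
| A | 2 |  |  |  |  |  |
| T | 3 |  |  |  |  |  |
| H | 4 |  |  |  |  |  |
| S | 5 |  |  |  |  |  |
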
![Page 67: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/67.jpg)
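The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 |  |  |  |
| M | 1 |  |  |  |  |  |
| A | 2 |  |  |  |  |  |
| T | 3 |  |  |  |  |  |
| H | 4 |  |  |  |  |  |
| S | 5 |  |  |  |  |  |
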
![Page 68: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/68.jpg)
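The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 |  |  |
| M | 1 |  |  |  |  |  |
| A | 2 |  |  |  |  |  |
| T | 3 |  |  |  |  |  |
| H | 4 |  |  |  |  |  |
| S | 5 |  |  |  |  |  |
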
![Page 69: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/69.jpg)
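The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 |  |  |  |  |
| A | 2 | 2 |  |  |  |  |
| T | 3 | 3 |  |  |  |  |
| H | 4 | 4 |  |  |  |  |
| S | 5 | 5 |  |  |  |  |
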
![Page 70: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/70.jpg)
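The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 |  |  |  |
| A | 2 | 2 |  |  |  |  |
| T | 3 | 3 |  |  |  |  |
| H | 4 | 4 |  |  |  |  |
| S | 5 | 5 |  |  |  |  |
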
![Page 71: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/71.jpg)
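The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 |  |  |
| A | 2 | 2 |  |  |  |  |
| T | 3 | 3 |  |  |  |  |
| H | 4 | 4 |  |  |  |  |
| S | 5 | 5 |  |  |  |  |
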
![Page 72: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/72.jpg)
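The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 | 3 | 4 |
| A | 2 | 2 | 1 | 2 | 3 | 4 |
| T | 3 | 3 |  |  |  |  |
| H | 4 | 4 |  |  |  |  |
| S | 5 | 5 |  |  |  |  |
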
![Page 73: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/73.jpg)
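The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 | 3 | 4 |
| A | 2 | 2 | 1 | 2 | 3 | 4 |
| T | 3 | 3 | 2 | 2 | 2 | 3 |
| H | 4 | 4 |  |  |  |  |
| S | 5 | 5 |  |  |  |  |
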
![Page 74: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/74.jpg)
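The tabulation, D(i, j)

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 | 3 | 4 |
| A | 2 | 2 | 1 | 2 | 3 | 4 |
| T | 3 | 3 | 2 | 2 | 2 | 3 |
| H | 4 | 4 | 3 | 3 | 3 | 3 |
| S | 5 | 5 | 4 | 4 | 4 | 3 |
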
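The fill order used in the tabulation (first row and first column, then row by row) can be sketched bottom-up in Python. The function name `edit_distance_table` is an illustrative choice; the returned list of lists is indexed as `D[i][j]`:

```python
def edit_distance_table(seq1, seq2):
    """Fill the full (len(seq1)+1) x (len(seq2)+1) table of D(i, j)."""
    n, m = len(seq1), len(seq2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                      # first column: i deletions
    for j in range(1, m + 1):
        D[0][j] = j                      # first row: j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            D[i][j] = min(1 + D[i - 1][j],       # delete Seq1(i)
                          1 + D[i][j - 1],       # insert Seq2(j)
                          t + D[i - 1][j - 1])   # match or replace
    return D


for row in edit_distance_table("MATHS", "ARTS"):
    print(*row)
```

For MATHS and ARTS the printed rows agree with the completed table, and the bottom-right entry, 3, is the edit distance.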
![Page 75: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/75.jpg)
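The traceback

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 | 3 | 4 |
| A | 2 | 2 | 1 | 2 | 3 | 4 |
| T | 3 | 3 | 2 | 2 | 2 | 3 |
| H | 4 | 4 | 3 | 3 | 3 | 3 |
| S | 5 | 5 | 4 | 4 | 4 | 3 |

Path, read backwards from D(5,4) as (i, j): (5,4) → (4,3) → (3,2) → (2,1) → (1,0) → (0,0)
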
![Page 76: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/76.jpg)
The solutions - #1
1+ 0+ 1+ 1+ 0 = 3
D M R R M
M A T H S
- A R T S
![Page 77: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/77.jpg)
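The traceback

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 | 3 | 4 |
| A | 2 | 2 | 1 | 2 | 3 | 4 |
| T | 3 | 3 | 2 | 2 | 2 | 3 |
| H | 4 | 4 | 3 | 3 | 3 | 3 |
| S | 5 | 5 | 4 | 4 | 4 | 3 |

Path, read backwards from D(5,4) as (i, j): (5,4) → (4,3) → (3,3) → (2,2) → (2,1) → (1,0) → (0,0)
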
![Page 78: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/78.jpg)
The solutions - #2
1+ 0+ 1+ 0+ 1+ 0 = 3
D M I M D M
M A - T H S
- A R T - S
![Page 79: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/79.jpg)
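The traceback

| Seq2(j) |  |  | A | R | T | S |
|---|---|---|---|---|---|---|
| Seq1(i) |  | 0 | 1 | 2 | 3 | 4 |
|  | 0 | 0 | 1 | 2 | 3 | 4 |
| M | 1 | 1 | 1 | 2 | 3 | 4 |
| A | 2 | 2 | 1 | 2 | 3 | 4 |
| T | 3 | 3 | 2 | 2 | 2 | 3 |
| H | 4 | 4 | 3 | 3 | 3 | 3 |
| S | 5 | 5 | 4 | 4 | 4 | 3 |

Path, read backwards from D(5,4) as (i, j): (5,4) → (4,3) → (3,3) → (2,2) → (1,1) → (0,0)
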
![Page 80: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/80.jpg)
The solutions - #3
1+ 1+ 0+ 1+ 0 = 3
R R M D M
M A T H S
A R T - S
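The three solutions can be recovered mechanically by walking back from D(5,4) and following every predecessor that attains the minimum. A minimal Python sketch, assuming unit costs as in the recurrence; the name `all_optimal_transcripts` is illustrative, and the enumeration can grow exponentially for long, dissimilar strings:

```python
def all_optimal_transcripts(seq1, seq2):
    """Return every minimum-cost edit transcript, sorted alphabetically."""
    n, m = len(seq1), len(seq2)
    # Fill the table of D(i, j) bottom-up.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t = 0 if seq1[i - 1] == seq2[j - 1] else 1
            D[i][j] = min(1 + D[i - 1][j], 1 + D[i][j - 1], t + D[i - 1][j - 1])

    def walk(i, j):
        # Yield all optimal transcripts that end at cell (i, j).
        if i == 0 and j == 0:
            yield ""
            return
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            for prefix in walk(i - 1, j):
                yield prefix + "D"
        if j > 0 and D[i][j] == D[i][j - 1] + 1:
            for prefix in walk(i, j - 1):
                yield prefix + "I"
        if i > 0 and j > 0:
            same = seq1[i - 1] == seq2[j - 1]
            if D[i][j] == D[i - 1][j - 1] + (0 if same else 1):
                for prefix in walk(i - 1, j - 1):
                    yield prefix + ("M" if same else "R")

    return sorted(walk(n, m))


print(all_optimal_transcripts("MATHS", "ARTS"))
# -> ['DMIMDM', 'DMRRM', 'RRMDM']
```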
![Page 81: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/81.jpg)
DP, in general (well, for a discrete, deterministic, additive process, anyway)
F(t, s) = Opt { r(t, s, x) + a F(t′, s′) : x in X(t, s) and s′ = T(t, s, x) }
The process need not be additive. For a stochastic process, r and F are expected values; the state transform is random, with probability distribution
P[T(t, s, x) = s′ | s, x],
and F(t′, s′) is replaced by
∑_{s′} F(t′, s′) P[T(t, s, x) = s′ | s, x]
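The deterministic form of this equation can be sketched as a small generic solver in Python. Several things here are assumptions rather than part of the lecture: the next stage is taken to be t + 1, the value at the final stage (`horizon`) is taken to be 0, the multiplier a is treated as a constant `discount`, and the names `solve_dp`, `decisions`, `reward` and `transition` stand in for X, r and T:

```python
from functools import lru_cache


def solve_dp(horizon, decisions, reward, transition, discount=1.0, opt=min):
    """Return F with F(t, s) = opt over x of r(t, s, x) + a * F(t + 1, T(t, s, x))."""

    @lru_cache(maxsize=None)
    def F(t, s):
        if t == horizon:
            return 0.0                     # assumed terminal value
        return opt(
            reward(t, s, x) + discount * F(t + 1, transition(t, s, x))
            for x in decisions(t, s)
        )

    return F


# Toy usage (hypothetical problem): walk on the integers for three stages,
# paying the squared position reached after each step.
F = solve_dp(
    horizon=3,
    decisions=lambda t, s: (-1, +1),
    reward=lambda t, s, x: (s + x) ** 2,
    transition=lambda t, s, x: s + x,
)
print(F(0, 0))   # -> 2.0
```

States must be hashable so that each (t, s) pair is solved once and cached; passing `opt=max` turns the same sketch into a maximization.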
![Page 82: Pairwise Sequence Comparison](https://reader036.fdocuments.us/reader036/viewer/2022070417/568153f2550346895dc1f538/html5/thumbnails/82.jpg)
“Life must be lived forwards and understood backwards.”
Søren Kierkegaard