Dynamic Programming and Biological Sequence Comparison
Dynamic Programming and Biological Sequence Comparison
Part I
\course\eleg667-01-f\Topic-2a.ppt 2
Topic II – Biological Sequence Alignment and Database Search
Part I (Topic-2a): Dynamic programming and Sequence comparison
Part II (Topic-2b): Heuristic sequence alignment and database search (e.g. FASTA, BLAST)
Part III (Topic-2c): Multiple sequence alignment
Outline
Concept of alignment
Two algorithm design techniques
Dynamic Programming: Examples
Applying DP to Sequence Comparison
The database search problem
Heuristic algorithms to database search
Alignment
An alignment places one sequence above the other, inserting spaces where needed, so that:
The two sequences have the same length after spaces are inserted in either or both of them
No space in one sequence is aligned with a space in the other
Spaces may be inserted at the beginning or end of the sequences
Biological Sequence Alignment and Database Search
1. We have two sequences over the same alphabet, both about the same length (tens of thousands of characters), and the sequences are almost equal. Differences are rare, say one per hundred characters. We want to find the places where the differences occur.
2. We have two sequences over the same alphabet with a few hundred characters each. We want to know whether there is a prefix of one that is similar to a suffix of the other.
Biological Sequence Alignment and Database Search (cont’d)
3. We have the same problem as in (2), but now we have several hundred sequences that must be compared (each one against all the others). In addition, we know that the great majority of sequence pairs are unrelated, that is, they will not have the required degree of similarity.
4. We have two sequences over the same alphabet with a few hundred characters each. We want to know whether there are two substrings, one from each sequence, that are similar.
5. We have the same problem as in (4), but instead of two sequences we have one sequence that must be compared to thousands of others.
Two Related Algorithm Design Techniques
Breaking Problems Down:
Divide and Conquer: Starting with the complete instance of a problem, divide it into smaller subinstances, solve each of them recursively and combine the partial solutions into a solution to the original problem.
Dynamic Programming: Starting with the smallest subinstances of a problem, solve and combine them until the complete instance of the original problem is solved.
Divide and Conquer – Example 1
Quick Sort
9 1 25 4 15 becomes 4 1 | 9 | 25 15 (pivot 9)
4 1 becomes 1 4
25 15 becomes 15 25
Result: 1 4 9 15 25
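The quick sort example can be sketched in a few lines of Python. This is only an illustration of the divide-and-conquer pattern, assuming the first element is used as the pivot (as the value 9 is in the example); the function name quick_sort is hypothetical.

```python
def quick_sort(items):
    """Return a sorted copy of items using divide and conquer."""
    if len(items) <= 1:
        # Smallest subinstance: nothing left to divide.
        return list(items)
    pivot = items[0]  # assumption: first element is the pivot
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    # Solve each subinstance recursively, then combine.
    return quick_sort(smaller) + [pivot] + quick_sort(larger)


print(quick_sort([9, 1, 25, 4, 15]))  # [1, 4, 9, 15, 25]
```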
Divide and Conquer – Example 2
The Fibonacci numbers
F(1) = 1, F(2) = 1
F(n) = F(n-1) + F(n-2)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …
Fib(n) {
    if (n <= 2) return 1;
    else return Fib(n-1) + Fib(n-2);
}
Divide and Conquer – Example 2
F(1) = 1, F(2) = 1
F(n) = F(n-1) + F(n-2)
[Figure: recursion tree for F(7). The root F(7) splits into F(6) + F(5), and every node splits again down to the leaves F(2) and F(1), so the subtrees for F(5), F(4) and F(3) appear several times.]
n:    1 2 3 4 5 6 7  8  9  10 11 …
F(n): 1 1 2 3 5 8 13 21 34 55 89 …
F(n) / F(n-1) ≈ 1.6, so F(n) ≈ 1.6^n for n >> 1
T(n) ≈ #internal_nodes = #leaves - 1, but #leaves = F(n)
T(n) = O(1.6^n): exponential time!
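To make the size of the recursion tree concrete, the following Python sketch counts how many calls the naive recursion makes. It assumes the base cases F(1) = F(2) = 1 stated above; the name fib_calls is hypothetical.

```python
def fib_calls(n):
    """Return (F(n), number of calls made by the naive recursion)."""
    if n <= 2:
        return 1, 1  # a leaf of the recursion tree
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    # One internal node plus everything below it.
    return a + b, calls_a + calls_b + 1


for n in (7, 10, 20):
    value, calls = fib_calls(n)
    print(n, value, calls)
```

The call count equals leaves plus internal nodes, that is 2·F(n) - 1, so it grows as fast as F(n) itself.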
How to Compute the Fib Function Using the Dynamic Programming Method?
Dynamic Programming – Example 1
Fib(n) {                      /* assumes n >= 2 */
    int tab[n + 1];           /* tab[0] is unused */
    tab[1] = 1; tab[2] = 1;
    for (j = 3; j <= n; j++)
        tab[j] = tab[j-1] + tab[j-2];
    return tab[n];
}
Start by solving the smallest problems
Use the partial solutions to solve bigger and bigger problems
Extra memory to store intermediate values
tab: 1 1 2 3 5 8 13 21 34 55 89 …
Linear time! T(n) = O(n)
Space-time tradeoff
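A runnable Python version of the table-based computation might look as follows. It is a sketch rather than the slide's exact code: fib_table mirrors the tab array, and fib_two_vars is an added variant (not in the slides) showing that only the last two entries are ever needed.

```python
def fib_table(n):
    """Bottom-up Fibonacci with an explicit table, F(1) = F(2) = 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tab = [0] * (max(n, 2) + 1)  # tab[0] is unused
    tab[1] = tab[2] = 1
    for j in range(3, n + 1):
        tab[j] = tab[j - 1] + tab[j - 2]
    return tab[n]


def fib_two_vars(n):
    """Same result, keeping only the two most recent values."""
    prev, curr = 1, 1
    for _ in range(3, n + 1):
        prev, curr = curr, prev + curr
    return curr


print([fib_table(n) for n in range(1, 12)])
```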
Sequence Comparison
Molecular sequence data are at the heart of Computational Biology
DNA sequences
RNA sequences
Protein sequences
We can think of these sequences as strings of letters
DNA: alphabet of 4 letters (A, T, C, G)
RNA: alphabet of 4 letters (A, U, C, G)
Protein: alphabet of 20 letters
code  full name
A     alanine
C     cysteine
D     aspartate
E     glutamate
F     phenylalanine
G     glycine
H     histidine
I     isoleucine
K     lysine
L     leucine
M     methionine
N     asparagine
P     proline
Q     glutamine
R     arginine
S     serine
T     threonine
V     valine
W     tryptophan
Y     tyrosine
Sequence Comparison – (Cont.)
Why compare sequences?
Find similar genes/proteins
  Allows us to predict function and structure
Locate common subsequences in genes/proteins
  Identify common recurrent patterns
Locate sequences that might overlap
  Help in sequence assembly
Sequence Comparison – (Cont.)
To compare the sequences we need to quantify the similarity
matches = 1, mismatches = 0
Sequence X = A T A A G T
Sequence Y = A T G C A G T
Score        1 1 0 0 0 0 0
Total = 2
Sequence Comparison – (Cont.)
Taking positions of the letters into account
matches = 1, mismatches = 0
Sequence Y = A T G C A G T
Sequence X =   A T A A G T
Score        0 0 0 0 1 1 1
Total = 3
Sequence Comparison – (Cont.)
How to take possible mutations into account?
matches = 1, mismatches = 0, gap = -1
Sequence Y = A T G C A G T
Sequence X = A T A - A G T
Score        1 1 0 -1 1 1 1
Total = 4
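The column-by-column scoring used in these examples can be written as a short Python function. This is a minimal sketch assuming the scheme shown above (match = 1, mismatch = 0, gap = -1) and "-" as the space character; the name alignment_score is hypothetical.

```python
def alignment_score(row_x, row_y, match=1, mismatch=0, gap=-1):
    """Score two already-aligned rows of equal length, column by column."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" and b == "-":
            # A space may never be aligned with a space.
            raise ValueError("space aligned with space")
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("ATA-AGT", "ATGCAGT"))  # 4
```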
Applying DP to Sequence Comparison
Sequence X = GA
Sequence Y = AG
[Figure: tree enumerating every possible alignment of GA with AG, one column at a time, with the score of each partial alignment written beside its node.]
Scores of the partial alignments in the tree:
-1 -1
-2 -2 0 -2 -2
-3 0 -3 -3 -1 -1 -3 -3 0 -3
Scores of the 13 complete alignments:
-4 -1 -4 -2 -4 -2 0 -2 -4 -2 -4 -1 -4
Choose the best score at each step, i.e. max(-2, 0, -2), max(-3, 0, -1), max(-1, 0, -3), max(-1, 0, -1)
Total score = 0
T(n,n) = O(k^n): exponential time!
Applying DP to Sequence Comparison
Sequence X = GA
Sequence Y = AG
[Figure: the same tree of all alignments of GA with AG as on the previous slide. DP keeps only the best score for each pair of prefixes, which fills the table below.]
        G   A
    0  -1  -2
A  -1   0   0
G  -2   0   0
T(n,n) = O(n^2): polynomial time!
Questions
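Question: when the DP comparison ends, how many possible distinct paths have been explored in total for this example?
Similarity matrix:
        G   A
    0  -1  -2
A  -1   0   0
G  -2   0   0
Cell numbering:
1 2 4
3 5 7
6 8 9
Question: from 1 to 9 how many paths?
[Figure: tree of all paths from cell 1 to cell 9. Cell 1 leads to cells 2, 3 and 5, and every branch continues right, down or diagonally until it reaches cell 9.]
Answer: let us count. Total = 13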
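The count of 13 can be checked with a short recursion over the grid. The sketch below assumes that a path moves only right, down, or diagonally from the top-left cell to the bottom-right cell, as in the numbered example; the name count_paths is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_paths(rows, cols):
    """Number of paths from cell (0, 0) to cell (rows, cols)
    using right, down and diagonal steps."""
    if rows == 0 or cols == 0:
        # Along the top row or left column there is a single path.
        return 1
    return (count_paths(rows - 1, cols)        # last step was down
            + count_paths(rows, cols - 1)      # last step was right
            + count_paths(rows - 1, cols - 1)) # last step was diagonal


# Two letters against two letters gives a 3 x 3 matrix of cells.
print(count_paths(2, 2))  # 13
```

DP fills each of the 9 cells once, yet implicitly covers all 13 paths; the gap between the two numbers widens quickly as the sequences grow.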
DP algorithm for Sequence Comparison
int S[0..m, 0..n]
m = length(X)
n = length(Y)
for i = 0 to m do S[i,0] = i * g
for j = 0 to n do S[0,j] = j * g
for i = 1 to m do
    for j = 1 to n do
        S[i,j] = max( S[i-1,j] + g, S[i-1,j-1] + sb[X[i],Y[j]], S[i,j-1] + g )
return S[m,n]
g - gap penalty
sb[a,b] - Substitution Matrix (score for aligning letter a with letter b)
    A T C G
A   1 0 0 0
T   0 1 0 0
C   0 0 1 0
G   0 0 0 1
Start by solving the smallest problems
Use the partial solutions to solve bigger and bigger problems
Extra memory to store intermediate values
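The pseudocode translates almost directly into Python. The sketch below assumes the identity substitution matrix (match = 1, mismatch = 0) and a gap penalty g = -1, the values used in the examples of these slides; the name global_similarity is hypothetical.

```python
def global_similarity(x, y, match=1, mismatch=0, gap=-1):
    """Fill and return the (len(x)+1) x (len(y)+1) similarity matrix S."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap  # prefix of x aligned against spaces only
    for j in range(1, n + 1):
        S[0][j] = j * gap  # prefix of y aligned against spaces only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j] + gap,      # space in y
                          S[i - 1][j - 1] + sub,  # letter against letter
                          S[i][j - 1] + gap)      # space in x
    return S


S = global_similarity("ATAAGT", "ATGCAGT")
print(S[-1][-1])  # 4
```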
The Substitution Matrix
For DNA we usually use identity matrices:
    A T C G
A   1 0 0 0
T   0 1 0 0
C   0 0 1 0
G   0 0 0 1
For proteins, more sensitive matrices, derived empirically, are used:
A B C D E F G H I K L M N P Q R S T V W Y Z
A 2 0 -2 0 0 -4 1 -1 -1 -1 -2 -1 0 1 0 -2 1 1 0 -6 -3 0
B 0 2 -4 3 2 -5 0 1 -2 1 -3 -2 2 -1 1 -1 0 0 -2 -5 -3 2
C -2 -4 12 -5 -5 -4 -3 -3 -2 -5 -6 -5 -4 -3 -5 -4 0 -2 -2 -8 0 -5
D 0 3 -5 4 3 -6 1 1 -2 0 -4 -3 2 -1 2 -1 0 0 -2 -7 -4 3
E 0 2 -5 3 4 -5 0 1 -2 0 -3 -2 1 -1 2 -1 0 0 -2 -7 -4 3
F -4 -5 -4 -6 -5 9 -5 -2 1 -5 2 0 -4 -5 -5 -4 -3 -3 -1 0 7 -5
G 1 0 -3 1 0 -5 5 -2 -3 -2 -4 -3 0 -1 -1 -3 1 0 -1 -7 -5 -1
H -1 1 -3 1 1 -2 -2 6 -2 0 -2 -2 2 0 3 2 -1 -1 -2 -3 0 2
I -1 -2 -2 -2 -2 1 -3 -2 5 -2 2 2 -2 -2 -2 -2 -1 0 4 -5 -1 -2
K -1 1 -5 0 0 -5 -2 0 -2 5 -3 0 1 -1 1 3 0 0 -2 -3 -4 0
L -2 -3 -6 -4 -3 2 -4 -2 2 -3 6 4 -3 -3 -2 -3 -3 -2 2 -2 -1 -3
M -1 -2 -5 -3 -2 0 -3 -2 2 0 4 6 -2 -2 -1 0 -2 -1 2 -4 -2 -2
N 0 2 -4 2 1 -4 0 2 -2 1 -3 -2 2 -1 1 0 1 0 -2 -4 -2 1
P 1 -1 -3 -1 -1 -5 -1 0 -2 -1 -3 -2 -1 6 0 0 1 0 -1 -6 -5 0
Q 0 1 -5 2 2 -5 -1 3 -2 1 -2 -1 1 0 4 1 -1 -1 -2 -5 -4 3
R -2 -1 -4 -1 -1 -4 -3 2 -2 3 -3 0 0 0 1 6 0 -1 -2 2 -4 0
S 1 0 0 0 0 -3 1 -1 -1 0 -3 -2 1 1 -1 0 2 1 -1 -2 -3 0
T 1 0 -2 0 0 -3 0 -1 0 0 -2 -1 0 0 -1 -1 1 3 0 -5 -3 -1
V 0 -2 -2 -2 -2 -1 -1 -2 4 -2 2 2 -2 -1 -2 -2 -1 0 4 -6 -2 -2
W -6 -5 -8 -7 -7 0 -7 -3 -5 -3 -2 -4 -4 -6 -5 2 -2 -5 -6 17 0 -6
Y -3 -3 0 -4 -4 7 -5 0 -1 -4 -1 -2 -2 -5 -4 -4 -3 -3 -2 0 10 -4
Z 0 2 -5 3 3 -5 -1 2 -2 0 -3 -2 1 0 3 0 0 -1 -2 -6 -4 3
Sequence Comparison revisited
Similarity Matrix (X = ATAAGT down the side, Y = ATGCAGT across the top; match = 1, mismatch = 0, g = -1)
        A   T   G   C   A   G   T
    0  -1  -2  -3  -4  -5  -6  -7
A  -1   1   0  -1  -2  -3  -4  -5
T  -2   0   2   1   0  -1  -2  -3
A  -3  -1   1   2   1   1   0  -1
A  -4  -2   0   1   2   2   1   0
G  -5  -3  -1   1   1   2   3   2
T  -6  -4  -2   0   1   1   2   4
int S[0..m, 0..n]
m = length(X)
n = length(Y)
for i = 0 to m do S[i,0] = i * g
for j = 0 to n do S[0,j] = j * g
for i = 1 to m do
    for j = 1 to n do
        S[i,j] = max( S[i-1,j] + g, S[i-1,j-1] + sb[X[i],Y[j]], S[i,j-1] + g )
return S[m,n]
Cells of the first row (X[1] = A), computed left to right:
 1 = max( -1 + (-1),  0 + (+1), -1 + (-1) )
 0 = max( -2 + (-1), -1 + ( 0),  1 + (-1) )
-1 = max( -3 + (-1), -2 + ( 0),  0 + (-1) )
-2 = max( -4 + (-1), -3 + ( 0), -1 + (-1) )
-3 = max( -5 + (-1), -4 + (+1), -2 + (-1) )
-4 = max( -6 + (-1), -5 + ( 0), -3 + (-1) )
-5 = max( -7 + (-1), -6 + ( 0), -4 + (-1) )
What To Do Next?
Answer: Finding alignments
But, How?
Finding the Alignment(s)
Similarity Matrix
        A   T   G   C   A   G   T
    0  -1  -2  -3  -4  -5  -6  -7
A  -1   1   0  -1  -2  -3  -4  -5
T  -2   0   2   1   0  -1  -2  -3
A  -3  -1   1   2   1   1   0  -1
A  -4  -2   0   1   2   2   1   0
G  -5  -3  -1   1   1   2   3   2
T  -6  -4  -2   0   1   1   2   4
Traceback from S[6,7] = 4. Each step shows max(up, diagonal, left) and the columns aligned so far:
4 = max( 2 + (-1), 3 + (+1), 2 + (-1) )
    T
    T
3 = max( 1 + (-1), 2 + (+1), 2 + (-1) )
    G T
    G T
2 = max( 1 + (-1), 1 + (+1), 2 + (-1) )
    A G T
    A G T
1 = max( 0 + (-1), 1 + ( 0), 2 + (-1) )   (tie: diagonal and left)
    C A G T        C A G T
    A A G T        - A G T
Diagonal branch:
1 = max( -1 + (-1), 0 + ( 0), 2 + (-1) )
    G C A G T
    - A A G T
Left branch:
2 = max( 1 + (-1), 2 + ( 0), 1 + (-1) )
    G C A G T
    A - A G T
Both branches:
2 = max( 0 + (-1), 1 + (+1), 0 + (-1) )
    T G C A G T    T G C A G T
    T - A A G T    T A - A G T
1 = max( -1 + (-1), 0 + (+1), -1 + (-1) )
Global Alignments
    A T G C A G T    A T G C A G T
    A T - A A G T    A T A - A G T
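The traceback can be sketched in Python as follows. The code fills the similarity matrix and then walks back from the bottom-right cell, reporting a single optimal alignment. The tie-breaking order (diagonal first, then up, then left) is an assumption made here for illustration, and the name global_alignment is hypothetical.

```python
def global_alignment(x, y, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j] + gap, S[i - 1][j - 1] + sub, S[i][j - 1] + gap)

    # Walk back from S[m][n], re-checking which neighbour produced each value.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if x[i - 1] == y[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + sub:  # diagonal: letter vs letter
                row_x.append(x[i - 1]); row_y.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:  # up: space in y
            row_x.append(x[i - 1]); row_y.append("-")
            i -= 1
        else:                                       # left: space in x
            row_x.append("-"); row_y.append(y[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


score, ax, ay = global_alignment("ATAAGT", "ATGCAGT")
print(score)
print(ay)
print(ax)
```

The walk visits at most m + n cells. Reporting all optimal alignments instead of one would require following every branch at a tie.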
How to Break a Tie?
Should one report all?
Or, report only one?
Advantage of DP Alignment Algorithms
Build up the solution by determining all similarities between arbitrary prefixes of the two sequences
Start with the shorter prefixes and use previously computed results to solve for longer prefixes
The Complexity of the DP Alignment Algorithm?
Construction of the similarity matrix:
O (m • n)
Find an optimal alignment (traceback through the completed matrix):
O (m + n)
Global versus Local Alignments
A global alignment attempts to match all of one sequence against all of another
LGPSTKQFGKGSSSRIWDN
|      ||||   |  |
LNQIERSFGKGAIMRLGDA
A local alignment attempts to match substrings (contiguous regions) of the two sequences
-------FGKG--------
       ||||
-------FGKG--------
How to Compute Local Alignment?
Applying DP to Local Alignment
Similarity Matrix Computation:
a[i,j] = max( a[i,j-1] + g, a[i-1,j-1] + sb(i,j), a[i-1,j] + g, 0 )
a[i,0] = 0 ; for i = 0…m
a[0,j] = 0 ; for j = 0…n
[Figure: similarity matrix with row 0 and column 0 filled with zeros.]
If the best alignment up to some point has a negative score, it’s better to start a new one, rather than extend the old one.
Don’t penalize gaps on left and right ends!
Criteria of Finding a Local Alignment
Find the entries with maximum values in the similarity matrix
For each such entry, construct a local alignment
See next example
We may also be interested in near-optimal alignments
Applying DP to Local Alignment
Similarity Matrix Computation:
a[i,j] = max( a[i,j-1] + g, a[i-1,j-1] + sb(i,j), a[i-1,j] + g, 0 )
Similarity Matrix
        A   T   G   C   A   G   T
    0   0   0   0   0   0   0   0
A   0   1   0   0   0   1   0   0
T   0   0   2   1   0   0   1   1
A   0   1   1   2   1   1   0   1
A   0   1   1   1   2   2   1   0
G   0   0   1   2   1   2   3   2
T   0   0   1   1   2   1   2   4
A T G C A G T
A T - A A G T

A T G C A G T
A T A - A G T

A T G CA A G T
Local Alignment using DP
Substitution matrix sb (match = 1, mismatch = -1), g = -2
    A  T  C  G
A   1 -1 -1 -1
T  -1  1 -1 -1
C  -1 -1  1 -1
G  -1 -1 -1  1
a[i,j] = max( a[i,j-1] + g, a[i-1,j-1] + sb(i,j), a[i-1,j] + g, 0 )
Similarity Matrix
        T   G   A   T   G   G   A   G   G   T
    0   0   0   0   0   0   0   0   0   0   0
G   0   0   1   0   0   1   1   0   1   1   0
A   0   0   0   2   0   0   0   2   0   0   0
T   0   1   0   0   3   1   0   0   1   0   1
A   0   0   0   1   1   2   0   1   0   0   0
G   0   0   1   0   0   2   3   1   2   1   0
G   0   0   1   0   0   1   3   1   2   3   1
Example cells:
0 = max( 0 + (-2), 0 + (-1), 0 + (-2), 0 )
1 = max( 0 + (-2), 0 + (+1), 0 + (-2), 0 )
Local alignments, one for each entry equal to the maximum value 3:
T G A T G G A G G T
            A G G

T G A T - G G A G G T
  G A T A G G

T G A T G G A G G T
  G A T A G

T G A T G G A G G T
  G A T
How to Break a Tie?
Should one report all?
Or, report only one?
Extension to the Basic DP Method
Improving space complexity
Introduce general gap functions
  That is, a run of k consecutive spaces is more likely than k isolated spaces
  Affine gap functions: w(k) = h + gk (h = cost of opening a gap, g = cost per space)
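The effect of an affine gap function can be seen by comparing it with the linear penalty used so far. In the Python sketch below, the values h = -3 and g = -1 are hypothetical and chosen only for illustration (penalties are written as negative scores, as in the earlier examples); the function names are hypothetical as well.

```python
def linear_gap(k, g=-1):
    """Penalty of a run of k spaces when every space costs g."""
    return g * k


def affine_gap(k, h=-3, g=-1):
    """Affine penalty w(k) = h + g*k for a run of k >= 1 spaces."""
    return 0 if k == 0 else h + g * k


# One run of three spaces versus three isolated single spaces.
print(linear_gap(3), 3 * linear_gap(1))  # -3 -3  (no difference)
print(affine_gap(3), 3 * affine_gap(1))  # -6 -12 (the single run is cheaper)
```

With the linear function both layouts score the same, whereas the affine function favours one long gap over several short ones.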