alicia-thompson -
SMAWK
REVISE
Global alignment (Revise)
Alignment graph for S = aacgacga, T = ctacgaga
Complexity: O(n^2)
$V(i,j) = \max\{\, V(i-1,j-1) + \delta(S[i], T[j]),\; V(i-1,j) + \delta(S[i], -),\; V(i,j-1) + \delta(-, T[j]) \,\}$
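The recurrence above can be sketched as a small dynamic program. The scoring function `example_delta` (match +2, mismatch and gap -1) is purely an assumption for illustration; the slides do not state the scores they use. The function names are hypothetical as well.

```python
def example_delta(x, y):
    """Hypothetical scoring: +2 for a match, -1 for a mismatch or a gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


def global_alignment_score(s, t, delta):
    """Fill the table V row by row in O(len(s) * len(t)) time."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: align a prefix against gaps only.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + delta(s[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + delta("-", t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + delta(s[i - 1], t[j - 1]),  # match or mismatch
                V[i - 1][j] + delta(s[i - 1], "-"),           # gap in t
                V[i][j - 1] + delta("-", t[j - 1]),           # gap in s
            )
    return V[n][m]


if __name__ == "__main__":
    print(global_alignment_score("aacgacga", "ctacgaga", example_delta))
```

With these assumed scores, aligning "acg" against "ag" gives 3 (two matches and one gap).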
DIST and OUT matrix (Revise)
[Figure: block G of the alignment graph, with input border I (I0…I5) and output border O (O0…O5)]
Block – sub-sequences “acg”, “ag”
DIST matrix (△ = undefined entry)
0 1 2 3 4 5
I0 0 -1 -2 -3 △ △
I1 -1 -1 -2 -1 -3 △
I2 -2 0 0 1 -1 -3
I3 △ -2 -2 0 -2 -2
I4 △ △ -2 0 -1 -1
I5 △ △ △ -2 -1 0
I (input borders)
I0=1, I1=2, I2=3, I3=2, I4=1, I5=3
OUT matrix (OUT[i,j] = I[i] + DIST[i,j]; undefined entries are padded)
0 1 2 3 4 5
I0 1 0 -1 -2 -∞ -∞
I1 1 1 0 1 -1 -∞
I2 1 3 3 4 2 0
I3 -12 0 0 2 0 0
I4 -13 -13 -1 1 0 0
I5 -14 -14 -14 1 2 3
O (output borders) = max of each column of OUT
O0 O1 O2 O3 O4 O5
1 3 3 4 2 3
Compute O without explicit OUT
[Figure: block G of the alignment graph, with input border I (I0…I5) and output border O (O0…O5)]
Block – sub-sequences “acg”, “ag”
DIST matrix
0 1 2 3 4 5
I0 0 -1 -2 -3 △ △
I1 -1 -1 -2 -1 -3 △
I2 -2 0 0 1 -1 -3
I3 △ -2 -2 0 -2 -2
I4 △ △ -2 0 -1 -1
I5 △ △ △ -2 -1 0
I (input borders)
I0=1, I1=2, I2=3, I3=2, I4=1, I5=3
O0 O1 O2 O3 O4 O5
1 3 3 4 2 3
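Computing O without storing OUT might look like the sketch below: each OUT entry is evaluated on demand as I[i] + DIST[i][j], and O[j] is the maximum over column j. The DIST and I values are the ones from the slide. Treating every undefined (△) entry as minus infinity is an assumption made here for simplicity, and the function names are hypothetical.

```python
NEG_INF = float("-inf")

# DIST matrix of the block for "acg" / "ag"; None stands for the undefined (△) entries.
DIST = [
    [0, -1, -2, -3, None, None],
    [-1, -1, -2, -1, -3, None],
    [-2, 0, 0, 1, -1, -3],
    [None, -2, -2, 0, -2, -2],
    [None, None, -2, 0, -1, -1],
    [None, None, None, -2, -1, 0],
]
I = [1, 2, 3, 2, 1, 3]  # input border values I0..I5


def out_entry(i, j):
    """OUT[i][j] evaluated lazily; undefined DIST entries count as -infinity (assumption)."""
    d = DIST[i][j]
    return NEG_INF if d is None else I[i] + d


def output_border():
    """O[j] = max over i of OUT[i][j], by brute force over every entry."""
    return [max(out_entry(i, j) for i in range(len(I))) for j in range(len(DIST[0]))]


if __name__ == "__main__":
    print(output_border())
```

This brute-force version still looks at every entry. The point of SMAWK is that, for a totally monotone matrix, the same column maxima can be found while querying only O(n) entries through a lookup such as `out_entry`.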
SMAWK
• Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays.
• Definition: a matrix M[0…m,0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d
1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$
2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
SMAWK
• Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
Presentation Outline
• What are Monge arrays?
  – Monge ⇒ totally monotone
• Why is the DIST alignment matrix a Monge array?
• How to compute totally monotone arrays efficiently?
  – SMAWK
    • Given a totally monotone array
    • Compute all column maxima in O(n)
MONGE AND TOTALLY MONOTONE PROPERTIES
Monge
• A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d
1. $M[a,c] + M[b,d] \le M[a,d] + M[b,c]$
2. $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$
  c d z
a M[a,c] M[a,d] …
b M[b,c] M[b,d] …
x … …
Totally monotone
• A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d
1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$
2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
• Monge ⇒ Totally monotone
  c d z
a M[a,c] M[a,d] …
b M[b,c] M[b,d] …
x … …
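The claim that every Monge matrix is totally monotone follows from a one-line rearrangement. A sketch for the concave case, with a < b and c < d as in the definitions (the convex case is the same argument with every inequality reversed):

```latex
\begin{align*}
M[a,c] + M[b,d] \ge M[a,d] + M[b,c]
  &\iff M[b,d] - M[a,d] \ge M[b,c] - M[a,c]. \\
\text{If } M[a,c] \le M[b,c] \text{, then } M[b,c] - M[a,c] \ge 0
  &\implies M[b,d] - M[a,d] \ge 0
  \implies M[a,d] \le M[b,d].
\end{align*}
```

The converse does not hold in general: total monotonicity only constrains the order of entries, not their sums.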
Intuition
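• Monge: quadrangle inequality
[Figure: points a, b, c, d, x, z illustrating the quadrangle inequality]
  c d z
a M[a,c] M[a,d] …
b M[b,c] M[b,d] …
x … …
$M[a,c] + M[b,d] \le M[a,d] + M[b,c]$ (convex case), or with $\ge$ (concave case)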
History
• Computational Geometry
• All nearest neighbor problem
  – Shamos and Hoey proved Θ(n log n) in 1975
• All farthest neighbor problem
  – F. P. Preparata proved Θ(n log n) in 1977
• All farthest neighbor problem in convex polygon
  – Lee and Preparata proved O(n) in 1978
SMAWK
• Aggarwal et al. proved O(n) for the all farthest neighbor problem in a convex polygon in 1987
• Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
DIST AND OUT MATRICES
• Assumption
  – row and column maxima of a totally monotone matrix can be computed in O(n)
• Why are the DIST and OUT matrices of the alignment problem totally monotone?
SMAWK
DIST is Monge
[Figure: block G of the alignment graph, with input border I and output border O]
DIST is Monge array
• Monge
  • $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$
• Totally monotone by concave condition:
  • $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
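As a sanity check, the concave Monge inequality can be tested directly on the example DIST matrix for the block "acg" / "ag". The sketch below is only an illustration: skipping every quadruple that touches an undefined (△) entry is an assumption made here, and the function name is hypothetical.

```python
from itertools import combinations

# Example DIST matrix from the slides; None stands for the undefined (△) entries.
DIST = [
    [0, -1, -2, -3, None, None],
    [-1, -1, -2, -1, -3, None],
    [-2, 0, 0, 1, -1, -3],
    [None, -2, -2, 0, -2, -2],
    [None, None, -2, 0, -1, -1],
    [None, None, None, -2, -1, 0],
]


def is_concave_monge(M):
    """True if M[a][c] + M[b][d] >= M[a][d] + M[b][c] for all a < b, c < d.

    Quadruples containing an undefined entry are skipped (assumption).
    """
    for a, b in combinations(range(len(M)), 2):
        for c, d in combinations(range(len(M[0])), 2):
            quad = (M[a][c], M[b][d], M[a][d], M[b][c])
            if any(v is None for v in quad):
                continue
            if quad[0] + quad[1] < quad[2] + quad[3]:
                return False
    return True


if __name__ == "__main__":
    print(is_concave_monge(DIST))
```

This checks every pair of rows and columns, so it takes O(m^2 n^2) time; it is meant for verifying small examples, not as part of the algorithm.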
Comment on this approach
• Advantages
  – Easy to parallelize
  – Easy to combine
• Disadvantages
  – Need to compute/keep more information
Applications
• Parallel sequence alignment
  – O(log m log n) time
  – Using O(m n / log m) processors (CREW PRAM)
• Best non-overlapping alignment score
  – O(n^2 log^2 n) time
• Tandem approximate repeat
  – O(n^2 log n) time
• Common Substring Alignment
SMAWK
Find all column minima of the following totally monotone array
0 1 2 3 4 5 6 7 8 9
1 25 42 57 78 90 103 123 142 151
2 21 35 48 65 76 85 105 123 130
3 13 26 35 51 58 67 86 100 104
4 10 20 28 42 48 56 75 86 88
5 20 29 33 44 49 55 73 82 80
6 13 21 24 35 39 44 59 65 59
7 19 25 28 38 42 44 57 61 52
8 35 37 40 48 48 49 62 62 49
9 37 36 37 42 39 39 51 50 37
10 41 39 37 42 35 33 44 43 29
11 58 56 54 55 47 41 50 47 29
12 66 64 61 61 51 44 52 45 24
13 82 76 72 70 56 49 55 46 23
14 99 91 83 80 63 56 59 46 20
15 124 116 107 100 80 71 72 58 28
16 133 125 113 106 86 75 74 59 25
17 156 146 131 120 97 84 80 65 31
18 178 164 146 135 110 96 92 73 39
For every 2×2 sub-matrix [a b; c d] (a, b in the upper row; c, d in the lower row):
$b < d \Rightarrow a < c$
$b = d \Rightarrow a \le c$
For every 2×2 sub-matrix [a b; c d] (a, b in the upper row; c, d in the lower row):
$a > c \Rightarrow b > d$
$a = c \Rightarrow b \ge d$
Observation 1
Observation 2
• SMAWK is a recursive algorithm of 2 steps
  – REDUCE
  – INTERPOLATE
• REDUCE removes rows
• INTERPOLATE removes half of the columns
REDUCE
[Animation frames: entries that cannot be a column minimum are eliminated one comparison at a time; rows 1, 2, 3, 5, 8, 9, 15, 17 and 18 are removed entirely]
Rows and entries remaining after REDUCE (· = eliminated entry)
0 1 2 3 4 5 6 7 8 9
4 10 20 28 42 48 56 75 86 88
6 · 21 24 35 39 44 59 65 59
7 · · 28 38 42 44 57 61 52
10 · · · 42 35 33 44 43 29
11 · · · · 47 41 50 47 29
12 · · · · · 44 52 45 24
13 · · · · · · 55 46 23
14 · · · · · · · 46 20
16 · · · · · · · · 25
INTERPOLATE
Remove all odd indexed columns
0 2 4 6 8
4 20 42 56 86
6 21 35 44 65
7 · 38 44 61
10 · 42 33 43
11 · · 41 47
12 · · 44 45
13 · · · 46
14 · · · 46
16 · · · ·
RECURSIVE
Find all column minima
Candidate entries left after the recursive call on the even columns (· = eliminated entry)
0 1 2 3 4 5 6 7 8 9
4 10 20 28 · · · · · ·
6 · · 24 35 39 · · · ·
7 · · · · 42 · · · ·
10 · · · · 35 33 44 43 29
11 · · · · · · · · 29
12 · · · · · · · · 24
13 · · · · · · · · 23
14 · · · · · · · · 20
16 · · · · · · · · 25
Column minima (columns 1 to 9): 10 20 24 35 35 33 44 43 20, found in rows 4 4 6 6 10 10 10 10 14
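The walkthrough above can be reproduced with a compact SMAWK sketch. It uses the common stack-based formulation that finds row minima, and is applied here to the transposed lookup so that it returns column minima as in the slides. Its tie-breaking (keep the upper row on ties) is an assumption and may eliminate different rows than the animation does, although the minima are the same. All function names are hypothetical.

```python
def smawk_row_minima(rows, cols, value):
    """Return {row: col} with the leftmost minimum of each row.

    Requires: for r < r2 and c < c2,
    value(r, c) > value(r, c2) implies value(r2, c) > value(r2, c2).
    """
    if not rows:
        return {}
    # REDUCE: keep at most len(rows) candidate columns.
    stack = []
    for c in cols:
        while stack:
            r = rows[len(stack) - 1]
            if value(r, stack[-1]) <= value(r, c):
                break
            stack.pop()
        if len(stack) < len(rows):
            stack.append(c)
    cols = stack
    # Recurse on every second row.
    result = smawk_row_minima(rows[1::2], cols, value)
    # INTERPOLATE: fill in the skipped rows between known minima.
    j = 0
    for i in range(0, len(rows), 2):
        r = rows[i]
        stop = result[rows[i + 1]] if i + 1 < len(rows) else cols[-1]
        best = cols[j]
        while cols[j] != stop:
            j += 1
            if value(r, cols[j]) < value(r, best):
                best = cols[j]
        result[r] = best
    return result


def column_minima(matrix):
    """0-based row index of the minimum of each column (matrix columns play the 'rows' role)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    res = smawk_row_minima(list(range(n_cols)), list(range(n_rows)),
                           lambda col, row: matrix[row][col])
    return [res[c] for c in range(n_cols)]


# Example matrix from the slides (rows 1..18, columns 1..9).
M = [
    [25, 42, 57, 78, 90, 103, 123, 142, 151],
    [21, 35, 48, 65, 76, 85, 105, 123, 130],
    [13, 26, 35, 51, 58, 67, 86, 100, 104],
    [10, 20, 28, 42, 48, 56, 75, 86, 88],
    [20, 29, 33, 44, 49, 55, 73, 82, 80],
    [13, 21, 24, 35, 39, 44, 59, 65, 59],
    [19, 25, 28, 38, 42, 44, 57, 61, 52],
    [35, 37, 40, 48, 48, 49, 62, 62, 49],
    [37, 36, 37, 42, 39, 39, 51, 50, 37],
    [41, 39, 37, 42, 35, 33, 44, 43, 29],
    [58, 56, 54, 55, 47, 41, 50, 47, 29],
    [66, 64, 61, 61, 51, 44, 52, 45, 24],
    [82, 76, 72, 70, 56, 49, 55, 46, 23],
    [99, 91, 83, 80, 63, 56, 59, 46, 20],
    [124, 116, 107, 100, 80, 71, 72, 58, 28],
    [133, 125, 113, 106, 86, 75, 74, 59, 25],
    [156, 146, 131, 120, 97, 84, 80, 65, 31],
    [178, 164, 146, 135, 110, 96, 92, 73, 39],
]

if __name__ == "__main__":
    print([r + 1 for r in column_minima(M)])  # 1-based rows, as labelled in the slides
```

Passing a lookup function instead of a stored matrix is a deliberate choice: the same routine can then be driven by entries computed on demand, such as I[i] + DIST[i][j], negated if maxima are wanted.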
APPROXIMATE TANDEM REPEAT
Application of DIST and SMAWK
Tandem repeat
• IRQI QLWLR QIWIR LRQL
Social City
Observation
• Approximate tandem repeat
  – With the mid-point c
  – Alignments
    • start at column c
    • end at row c
[Figure: n × n array with axes running from 0 to n, with column c and row c marked]
• 4 cases
  – Cross column n/2
  – Cross row n/2
  – Inside sub-triangle [0,n/2]
  – Inside sub-triangle [n/2,n]
Algorithm
1. Find all repeats that cross
  – row n/2
  – column n/2
2. Recursively solve the
  – sub-array [0..n/2, 0..n/2]
  – sub-array [n/2..n, n/2..n]
[Figure: array split at n/2, with mid-points c1, c2 and c3 marked]
Cross column n/2
• Combine
  – Best path from column c to (k,n/2)
  – Best path from (k,n/2) to row c
[Figure: n × n array with column c, row c and column n/2 marked]
Cross column n/2
• Sub-problems:
  – DIST_col(c,n/2)[i,j]
  – DIST_row(c,n/2)[i,j]
[Figure: array split at n/2, with mid-points c1 and c2 marked]
Cross column n/2
• DIST_col(c,n/2)[i,j]: O(n^3) words
• Encode in array of binary trees
  • Using O(n^2 log n) words
  • B[j,c] is a binary tree
  • B[j,c](i) is a leaf of the tree
  • Read an entry of DIST_col(c,n/2)[i,j] in O(log n)
[Figure: array split at n/2, with mid-points c1 and c2 marked]
Algorithm
1. Find all repeats: O(n^2 log n)
  – cross row n/2
  – cross column n/2
2. Recursively solve the
  – sub-array [0..n/2, 0..n/2]
  – sub-array [n/2..n, n/2..n]
[Figure: array split at n/2, with mid-points c1, c2 and c3 marked]
References
• Aggarwal, A. and Park, J. Notes on Searching in Multidimensional Monotone Arrays. IEEE.
• Schmidt, J. P. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM.
• Larmore, L. L. The SMAWK Algorithm. UNLV.
• Apostolico, A., Atallah, M. J., Larmore, L. L. and McFaddin, S. Efficient Parallel Algorithms for String Editing and Related Problems. SIAM J. Comput.
• Landau, G. M. and Ziv-Ukelson, M. On the Common Substring Alignment Problem. J. of Algorithms.