Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann
RNA sequences
[Figure: the RNA sequence C C G U A G U A C C A C A G U G U G G and its secondary structure (base pairs C-G, C-G, G-C, U-A, A-U, C-G).]
Alignment of Strings
Global Alignment: $O(nm)$
S1 = U C A C C G _ A _ G
S2 = U C G C G G U A U G
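A minimal sketch of the $O(nm)$ global alignment dynamic program, assuming unit costs for a substitution and for a gap (the slide does not state a scoring scheme). The function name `global_alignment_cost` and its parameters are illustrative only.

```python
def global_alignment_cost(s1, s2, sub=1, gap=1):
    """Cheapest global alignment of s1 and s2 (costs are assumptions)."""
    n, m = len(s1), len(s2)
    # cost[i][j] = cheapest alignment of s1[:i] with s2[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap          # s1[:i] aligned against gaps only
    for j in range(1, m + 1):
        cost[0][j] = j * gap          # s2[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else sub)
            cost[i][j] = min(diag,
                             cost[i - 1][j] + gap,   # s1[i-1] against a gap
                             cost[i][j - 1] + gap)   # s2[j-1] against a gap
    return cost[n][m]


print(global_alignment_cost("UCACCGAG", "UCGCGGUAUG"))
```

The table has (n+1)(m+1) cells and each cell takes constant time, which is where the $O(nm)$ bound comes from. Under these unit costs the alignment shown on the slide (two substitutions, two gaps) costs 4.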
Alignment of RNA sequences
A A G G C C C U G A U
A G A C C G U UA U
Alignment of RNA sequences
RNA Global Alignment via tree edit distance:
A A G G C C C U G A U
A G A C C G U U U
$O(m^2 n^2) = O(n^4)$ [SZ 1989]
$O(m^2 n \lg n) = O(n^3 \log n)$ [K 1998]
$O(m^2 n (1 + \lg \frac{n}{m})) = O(n^3)$ [DMRW 2006]
Theorem: All these algorithms compute the edit distance between any two arcs provided we match these arcs.
The Alignment graph
[Figure: the alignment graph, a grid with columns labeled U C A C C G A G and rows labeled U C G C G G U A U G.]
Theorem: There is a one to one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.
Theorem: There is a one to one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2 in which all arcs are deleted.
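The correspondence for arc-free alignments can be sketched as follows, assuming a vertex (i, j) means "i characters of R1 and j characters of R2 consumed" and that a path uses only diagonal, horizontal, and vertical unit edges. The helper `path_to_alignment` and the gap symbol are hypothetical, introduced only for illustration.

```python
def path_to_alignment(r1, r2, path, gap="_"):
    """Read a path of grid vertices as two gapped rows of an alignment."""
    top, bottom = [], []
    for (i, j), (i2, j2) in zip(path, path[1:]):
        step = (i2 - i, j2 - j)
        if step == (1, 1):        # diagonal edge: r1[i] aligned with r2[j]
            top.append(r1[i])
            bottom.append(r2[j])
        elif step == (1, 0):      # r1[i] aligned with a gap
            top.append(r1[i])
            bottom.append(gap)
        elif step == (0, 1):      # r2[j] aligned with a gap
            top.append(gap)
            bottom.append(r2[j])
        else:
            raise ValueError(f"not a unit edge: {(i, j)} -> {(i2, j2)}")
    return "".join(top), "".join(bottom)


# Path for the string alignment shown earlier in the deck.
path = [(k, k) for k in range(7)] + [(6, 7), (7, 8), (7, 9), (8, 10)]
print(path_to_alignment("UCACCGAG", "UCGCGGUAUG", path))
```

A path that starts and ends in the interior of the grid yields an alignment of substrings in the same way, since only the edges it traverses are read.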
Theorem: There is a one to one correspondence between HEAVIEST paths in the alignment graph and OPTIMAL alignments of substrings of R1 and R2.
The Local Alignment algorithms
We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known metrics:
• Smith-Waterman: the highest-scoring alignment between any pair of substrings of the input RNAs.
• Its normalized version.
Standard Local Similarity (Smith-Waterman)
The score is computed via a dynamic program:
$Score(i,j) = \max \{ Score(i',j') + \text{weight of the incoming edge from } (i',j'),\ 0 \}$
Time complexity: $O(mn)$ + one run of a global algorithm = $O(m^2 n (1 + \lg \frac{n}{m}))$
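A minimal sketch of this recurrence over the alignment graph. The scores (match +2, mismatch -1, gap -1) are assumptions, and the weights of edges that match arcs, which the deck obtains from a run of a global algorithm, are not computed here: they can only be passed in through the hypothetical `extra_edges` argument, mapping a vertex to a list of (predecessor, weight) pairs.

```python
def local_score(r1, r2, match=2, mismatch=-1, gap=-1, extra_edges=None):
    """Score(i,j) = max(0, Score(i',j') + w) over incoming edges; return the best."""
    extra_edges = extra_edges or {}
    n, m = len(r1), len(r2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = [0]                      # a path may start here
            if i > 0 and j > 0:                   # diagonal edge
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                candidates.append(score[i - 1][j - 1] + w)
            if i > 0:                             # r1[i-1] against a gap
                candidates.append(score[i - 1][j] + gap)
            if j > 0:                             # r2[j-1] against a gap
                candidates.append(score[i][j - 1] + gap)
            # Precomputed edges (e.g. matched arcs); predecessors must lie
            # strictly above and to the left so they are already filled in.
            for (pi, pj), w in extra_edges.get((i, j), []):
                candidates.append(score[pi][pj] + w)
            score[i][j] = max(candidates)
            best = max(best, score[i][j])
    return best


print(local_score("UCACCGAG", "UCGCGGUAUG"))
```

Without `extra_edges` this is the plain string version and runs in $O(mn)$ time; the remaining cost in the bound above comes from producing the arc-matching edge weights.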
Normalized Local Similarity
The weakness of the Smith-Waterman approach [AP 2001]
Solution: look for the substrings $R_1', R_2'$ (with their arcs) that maximize
$\frac{ED(R_1', R_2')}{|R_1'| + |R_2'|}$
and $ED(R_1', R_2') \ge$ some given value.
Normalized Local Similarity
Again, dynamic program:
Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.
• The best k/Length(k,i,j) over all i,j,k is the normalized score.
For every k,i,j compute
$Length(k,i,j) = \min \{ Length(k-w,i',j') + |i'-i| + |j'-j| \}$, where w = weight of the incoming edge from (i',j').
Time complexity: $O(n^2 m)$ + one run of a global algorithm = $O(n^2 m) + O(m^2 n (1 + \lg \frac{n}{m})) = O(n^2 m)$
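A rough sketch of the Length(k,i,j) recurrence, restricted to diagonal, horizontal, and vertical edges (arc-matching edges are omitted). It assumes integer scores (match +2, mismatch -1, gap -1) and treats the "given value" as a minimum weight `min_score`; both are assumptions, not taken from the deck. Each vertex stores a dictionary from weight k to the shortest path length with that weight.

```python
def normalized_local_score(r1, r2, match=2, mismatch=-1, gap=-1, min_score=1):
    """Best k / Length(k,i,j) over all i, j and all k >= min_score."""
    n, m = len(r1), len(r2)
    # length[i][j][k] = shortest path ending at (i, j) with weight exactly k;
    # the empty path (weight 0, length 0) may start at any vertex.
    length = [[{0: 0} for _ in range(m + 1)] for _ in range(n + 1)]
    best = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            preds = []
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                preds.append((i - 1, j - 1, w))
            if i > 0:
                preds.append((i - 1, j, gap))
            if j > 0:
                preds.append((i, j - 1, gap))
            cell = length[i][j]
            for pi, pj, w in preds:
                step = (i - pi) + (j - pj)        # rows + columns spanned
                for k0, l0 in length[pi][pj].items():
                    k, l = k0 + w, l0 + step
                    if l < cell.get(k, float("inf")):
                        cell[k] = l
            for k, l in cell.items():
                if l > 0 and k >= min_score:
                    best = max(best, k / l)
    return best


print(normalized_local_score("ACGU", "ACU", min_score=5))
```

Raising `min_score` rules out very short, high-ratio paths such as a single matching pair, which would otherwise always achieve the maximum ratio.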
Open Problems
• Arc deletion
• Improve global tree edit distance