Linear Sequence Alignment
Travis Hillenbrand

Methods of Comparison
Dot Matrix
Dynamic Programming Algorithm
Greedy X-drop Approach
Linear Alignment

Dot Matrix Method
http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html

Sequence Alignment
ATCGATACG, ATGGATTACG
3 possibilities for each aligned position:
Mismatch
…C…
…G…
Indel
…C…
…-…
Match
…C…
 |
…C…

Global Pairwise Alignment
ATCGATACG, ATGGATTACG
ATCGAT-ACG
|| ||| |||
ATGGATTACG
Matches: +1 +1 +1 +1 +1 +1 +1 +1 = +8
Mismatches: -1 = -1
Gaps: -2 = -2
Total score = +5
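The tally above can be reproduced with a short sketch. The function name `score_alignment` and its parameter names are illustrative, not from the slides; the scores (match +1, mismatch -1, gap -2) are the ones used on this slide.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two equal-length gapped strings column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # indel column
        elif a == b:
            total += match      # match column
        else:
            total += mismatch   # mismatch column
    return total


print(score_alignment("ATCGAT-ACG", "ATGGATTACG"))  # 8 - 1 - 2 = 5
```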

Dynamic Programming
Global alignment (Needleman-Wunsch) algorithm
       -    G    A    T    C
  -    0
  G
  A
  C

Dynamic Programming
Global alignment (Needleman-Wunsch) algorithm
       -    G    A    T    C
  -    0   -2   -4   -6   -8
  G
  A
  C

Dynamic Programming
Global alignment (Needleman-Wunsch) algorithm
       -    G    A    T    C
  -    0   -2   -4   -6   -8
  G   -2
  A   -4
  C   -6

Dynamic Programming
Global alignment (Needleman-Wunsch) algorithm
Each cell is the maximum of three candidates: diagonal + MATCH (or mismatch), above + GAP, left + GAP
       -    G    A    T    C
  -    0   -2   -4   -6   -8
  G   -2
  A   -4
  C   -6
Cell (G, G): diagonal 0 + 1 = +1, above -2 - 2 = -4, left -2 - 2 = -4
Max = 1
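The three candidates feeding each cell can be written as a recurrence. This is a sketch in notation that is not on the slides: F(i,j) is the cell in row i and column j, s(a_i, b_j) is the match/mismatch score, and g is the gap penalty, assuming the scores used earlier (match +1, mismatch -1, gap -2).

```latex
F(0,0) = 0, \qquad F(i,0) = i\,g, \qquad F(0,j) = j\,g

F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(a_i, b_j) & \text{(diagonal: match or mismatch)} \\
F(i-1,\,j) + g & \text{(from above: gap)} \\
F(i,\,j-1) + g & \text{(from the left: gap)}
\end{cases}

% First cell (G against G), with s = +1 and g = -2:
F(1,1) = \max(0 + 1,\; -2 - 2,\; -2 - 2) = 1
```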

Dynamic Programming
Global alignment (Needleman-Wunsch) algorithm
       -    G    A    T    C
  -    0   -2   -4   -6   -8
  G   -2    1   -1   -3   -5
  A   -4   -1    2    0   -2
  C   -6   -3    0    1    1
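A minimal sketch of the matrix fill, assuming the scores from the earlier scoring slide (match +1, mismatch -1, gap -2). The function name `nw_matrix` is hypothetical; rows follow GAC and columns follow GATC, as in the matrix above.

```python
def nw_matrix(rows, cols, match=1, mismatch=-1, gap=-2):
    """Fill a Needleman-Wunsch score matrix for two sequences."""
    n, m = len(rows), len(cols)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: leading gaps only.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Every other cell: best of diagonal, above, and left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    return F


for row in nw_matrix("GAC", "GATC"):
    print(row)
```

The bottom-right cell holds the score of the best global alignment, here 1.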

Dynamic Programming
Global alignment (Needleman-Wunsch algorithm)
       -    G    A    T    C
  -    0   -2   -4   -6   -8
  G   -2    1   -1   -3   -5
  A   -4   -1    2    0   -2
  C   -6   -3    0    1    1
GATC
|| |
GA-C
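The alignment is read off the filled matrix by walking back from the bottom-right cell. The sketch below assumes the same scores (match +1, mismatch -1, gap -2) and prefers the diagonal move when several moves tie; that tie-breaking rule is an assumption, not something the slides specify.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: at each cell, find a move that explains its value.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])      # letter of a against a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")           # gap against a letter of b
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GATC", "GAC"))  # (1, 'GATC', 'GA-C')
```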

Greedy X-drop Alignment
Aligns sequences that differ by sequencing errors
Works with measure of difference
Restricts indel penalty: indel = mis - mat/2
Zhang et al. 2000
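The restriction on the indel penalty is read here as indel = mis - mat/2, which agrees with the parameters quoted later in the deck (mat=10, mis=-6, indel=-11). Under that assumption, a match or mismatch consumes two letters and an indel consumes one, so the score of an alignment depends only on how many letters it covers (i + j) and how many differences d it contains. The sketch below checks that arithmetic; the function names are illustrative.

```python
from fractions import Fraction


def indel_penalty(mat, mis):
    """Indel penalty implied by the restriction indel = mis - mat/2."""
    return mis - Fraction(mat, 2)


def score_from_differences(i, j, d, mat, mis):
    """Score of an alignment covering i and j letters with d differences.

    Derived from: i + j = 2*matches + 2*mismatches + indels,
    d = mismatches + indels, and indel = mis - mat/2.
    """
    return Fraction((i + j) * mat, 2) - d * (mat - mis)


print(indel_penalty(10, -6))                    # -11
# GATC vs GA-C: 3 matches and 1 indel, so 3*10 - 11 = 19.
print(score_from_differences(4, 3, 1, 10, -6))  # 19
```

This is why a greedy method can track differences instead of scores: fewer differences at a given i + j always means a higher score.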

Greedy X-drop Alignment
Zhang et al. 2000

Greedy X-drop Alignment
C
A 0
G 0
- 0
- G A T C

Greedy X-drop Alignment
C 1 1 1
A 0 1
G 0
- 0
- G A T C
X-drop condition saves computation
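To illustrate only the X-drop idea, here is a deliberately simplified sketch: an ungapped extension that stops once the running score falls more than X below the best score seen so far. This is not the greedy algorithm of Zhang et al., which applies the pruning to gapped alignments; the function name and the small X values in the usage lines are assumptions for illustration.

```python
def xdrop_extend(a, b, x, match=10, mismatch=-6):
    """Ungapped extension from position 0 with X-drop termination.

    Returns (best_score, length_of_best_prefix).
    """
    score = best = 0
    best_len = 0
    for k, (ca, cb) in enumerate(zip(a, b), start=1):
        score += match if ca == cb else mismatch
        if score > best:
            best, best_len = score, k
        elif best - score > x:
            break  # fell more than x below the best: stop extending
    return best, best_len


print(xdrop_extend("AAAATAAAA", "AAAACAAAA", x=5))   # (40, 4)
print(xdrop_extend("AAAATAAAA", "AAAACAAAA", x=20))  # (74, 9)
```

A small X stops early and saves work but can miss a better extension beyond a mismatch; a large X explores further.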

Linear Alignment
Index of coincidence
– Maximum number of matches between two sequences
– Ungapped alignment
ATCGATACG
ATGGATTACG
ATCGATACG
ATGGATTACG
ATCGATACG
ATGGATTACG |
ATCGATACG
ATGGATTACG || |||
ATCGATACG
ATGGATTACG …
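The index of coincidence as defined above can be sketched by sliding one sequence along the other and counting matching positions at each ungapped offset. The function name and the offset sign convention (a positive offset shifts the first sequence to the right) are assumptions made for this example.

```python
def index_of_coincidence(a, b):
    """Return (max_matches, offset) over all ungapped offsets of a against b."""
    best = (-1, 0)
    for offset in range(-(len(a) - 1), len(b)):
        # a[i] is compared with b[i + offset] wherever both exist.
        matches = sum(
            1
            for i, ch in enumerate(a)
            if 0 <= i + offset < len(b) and ch == b[i + offset]
        )
        if matches > best[0]:
            best = (matches, offset)
    return best


print(index_of_coincidence("ATCGATACG", "ATGGATTACG"))  # (5, 0)
```

For the two example sequences the best ungapped placement is offset 0 with 5 matches; shifting the first sequence right by one gives 4.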

Linear Alignment
Attempt to increase similarity
ATCGATACG
ATGGATTACG || |||
-ATCGATACG
ATGGATTACG ||||
ATCGATACG
-ATGGATTACG |
Window score: 2 -3 -3
ATCGATACG
ATGGATTACG || |||

Comparison of alignments
9 human/mouse homologous gene CDS pairs retrieved (Jareborg et al. 1999)
Greedy alignment run first: mat=10, mis=-6, X=2200 (indel=-11)
Dynamic Programming and Linear alignment run using truncated sequences

Comparison of alignments
Similarity scores
[Chart: score (axis 0 to 45000) by alignment method (IOC, Linear, Greedy, DynProg) for AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]

Comparison of alignments
Similarity percentage
[Chart: similarity (%) (axis 0 to 100) by alignment method (IOC, Linear, Greedy, DynProg) for AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]

Comparison of alignments
[Chart: time (ms) (logarithmic axis 1 to 100000) for Dyn Prog, Greedy, and Linear on AHSG, PANK3, PBX2, Protein C, Cyp21, CREB-RP, H2 TAP1, C4, notch4]

Comparison of alignments
[Chart: similarity (%) (axis 0 to 100), w/ IOC vs. w/o IOC, for PACAP, PANK3, CD4, PBX2, Protein C, AHSG, Cyp21, H2 TAP1, CREB-RP, C4, notch4]

Comparison of alignments
Maximum coincidence alignment: Offset -72 yielded 1642 matches of 2175 possible (75.4943% similarity), score 6611
ACAGTACTGCTACTTCTCGCCGACTGGGTGCTGCTCCGGACCGCGCTGCCCCGCATATTCTCCCTGCTGGTGCCCACCGCGCTGCCACTGCTCCGGGT
| | || | | | ||||||| | | | | | | || | || | | ||| |
ATGGCTGCGCACGTCTGGCTGGCGGCCGCCCTGCTCCTTCTGGTGGACTGGCTGCTGCTGCGGCCCATGCTCCCGGGAATCTTCTCCCTGTTGGTTCC
ACGGGCCGCCTCACTGACTGGATTCTACAAGATGGCTCAGCCGATACCTTCACTCGAAACTTAACTCTCATGTCCATTCTCACCATAGCCAGTGCAGT
||||||||| |||||||||||||||| || ||| ||| || |||||| || ||| || |||||||||||||||||||||||||| |||
ACGGGCCGCATCACTGACTGGATTCTTCAGGATAAGACAGTTCCTAGCTTCACCCGCAACATATGGCTCATGTCCATTCTCACCATAGCCAGCACAGC
Decreasing the gap penalty allows similar regions to be aligned without using IOC
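The similarity percentage quoted for the maximum coincidence alignment is matches divided by compared positions. A small sketch of that arithmetic (the function name is illustrative; the reported score of 6611 is not recomputed here):

```python
def percent_similarity(matches, possible):
    """Percentage of compared positions that match."""
    return 100 * matches / possible


print(f"{percent_similarity(1642, 2175):.4f}%")  # 75.4943%
```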
References
Needleman, S. B. and Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48:443-453.
Setubal, J. and Meidanis, J. 1997. Introduction to Computational Molecular Biology. Pacific Grove, California: Brooks/Cole.
Zhang, Z., Schwartz, S., Wagner, L., and Miller, W. 2000. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology 7:203-214.