Sequ en ce An alysis: Part I. Pairwise alignment and...
Transcript of Sequ en ce An alysis: Part I. Pairwise alignment and...
![Page 1: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/1.jpg)
Bioinformatics
Sequence Analysis: Part I. Pairwisealignment and database searching
Fran Lewitter, Ph.D.Head, BiocomputingWhitehead Institute
![Page 2: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/2.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 2
Bioinformatics Definitions“The use of computational methods to make biologicaldiscoveries.”
Fran Lewitter
![Page 3: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/3.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 2
Bioinformatics Definitions“The use of computational methods to make biologicaldiscoveries.”
Fran Lewitter
“An interdisciplinary field involving biology, computerscience, mathematics, and statistics to analyze biologicalsequence data, genome content, and arrangement, and topredict the function and structure of macromolecules.”
David Mount
![Page 4: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/4.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 3
Course SyllabusJan 7 Sequence Analysis I. Pairwise alignments, database searching including
BLAST (FL) [1, 2, 3]Jan 14 Sequence Analysis II. Database searching (continued), Pattern searching(FL)[7]Jan 21 No Class - Martin Luther King HolidayJan 28 Sequence Analysis III. Hidden Markov models, gene finding algorithms (FL)[8]
Feb 4 Computational Methods I. Genomic Resources and Unix (GB)Feb 11 Computational Methods II. Sequence analysis with Perl. (GB)Feb 18 No Class - President's BirthdayFeb 25 Computational Methods III. Sequence analysis with Perl and BioPerl (GB)
Mar 4 Proteins I. Multiple sequence alignments, phylogenetic trees (RL) [4, 6]Mar 11 Proteins II. Profile searches of databases, revealing protein motifs (RL) [9]Mar 18 Proteins III.Structural Genomics:structural comparisons and predictions (RL)
Mar 25 Microarrays: designing chips, clustering methods (FL)
![Page 5: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/5.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 4
Course Information
• Lectures• Text book• Supplemental reading• Homework• Course project• Office hours - M 2-4, W 2-4• http://fladda.wi.mit.edu/bioinfo• Mailing list, [email protected], Subject: course
![Page 6: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/6.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 5
Topics to Cover
• Introduction• Scoring alignments• Alignment methods• Significance of alignments• Database searching methods• Demo
![Page 7: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/7.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 6
Topics to Cover
• Introduction– Why do alignments?– A bit of history– Definitions
• Scoring alignments• Alignment methods• Significance of alignments• Database searching methods• Demo
![Page 8: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/8.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 7
Doolittle RF, Hunkapiller MW, Hood LE, DevareSG, Robbins KC, Aaronson SA, Antoniades
HN. Science 221:275-277, 1983.
Simian sarcoma virus onc gene, v-sis, is derivedfrom the gene (or genes) encoding a platelet-derived
growth factor.
![Page 9: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/9.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 8
Cancer Gene Found
Homology to bacterial and yeast genes shed newlight on human disease process
![Page 10: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/10.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 9
Evolutionary Basis of SequenceAlignment
• Similarity - observable quantity, such as percent identity
• Homology - conclusion drawn from datathat two genes share a commonevolutionary history; no metric is associatedwith this
![Page 11: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/11.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 10
Some Definitions
An alignment is a mutual arrangement of twosequences, which exhibits where the twosequences are similar, and where they differ.An optimal alignment is one that exhibits themost correspondences and the leastdifferences. It is the alignment with thehighest score. May or may not bebiologically meaningful.
![Page 12: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/12.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 11
Alignment Methods
• Global alignment - Needleman-Wunsch(1970) maximizes the number of matchesbetween the sequences along the entirelength of the sequences.
• Local alignment - Smith-Waterman (1981)is a modification of the dynamicprogramming algorithm gives the highestscoring local match between two sequences.
![Page 13: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/13.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 12
Alignment Methods
Global vs Local
Modular proteins
Fn2 EGF Fn1 EGF Kringle CatalyticF12
EGFFn1 Kringle CatalyticKringlePLAT
![Page 14: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/14.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 13
Database Searching Methods:Local Alignments
FASTA
Smith-Waterman
Gapped BLAST
OriginalBLAST
SENSITIVITY
SPEE
D
OriginalFASTA
![Page 15: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/15.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 14
Possible AlignmentsA: T C A G A C G A G T GB: T C G G A G C T G
![Page 16: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/16.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 14
Possible AlignmentsA: T C A G A C G A G T GB: T C G G A G C T G
I. T C A G A C G A G T GT C G G A - - G C T G
![Page 17: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/17.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 14
Possible AlignmentsA: T C A G A C G A G T GB: T C G G A G C T G
I. T C A G A C G A G T GT C G G A - - G C T G
II. T C A G A C G A G T G
T C G G A - G C - T G
![Page 18: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/18.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 14
Possible AlignmentsA: T C A G A C G A G T GB: T C G G A G C T G
I. T C A G A C G A G T GT C G G A - - G C T G
II. T C A G A C G A G T G
T C G G A - G C - T G III. T C A G A C G A G T G
T C G G A - G - C T G
![Page 19: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/19.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 15
Topics to Cover
• Introduction• Scoring alignments
– Nucleotide vs Proteins• Alignment methods• Significance of alignments• Database searching methods• Demo
![Page 20: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/20.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 16
Amino Acid SubstitutionMatrices
PAM - point accepted mutation based onglobal alignment [evolutionary model]
BLOSUM - block substitutions based on localalignments [similarity among conservedsequences]
![Page 21: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/21.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 17
Substitution Matrices
BLOSUM 30
BLOSUM 62
BLOSUM 80
% identity
PAM 250 (80)
PAM 120 (66)
PAM 90 (50)
% change
Lesschange
![Page 22: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/22.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 18
Part of BLOSUM 62 MatrixC S T P A G N
C 9 S -1 4T -1 1 5P -3 -1 -1 7A 0 1 0 -1 4G -3 0 -2 -2 0 6N -3 1 0 -2 -2 0
Log-odds = obs freq of aa substitutions freq expected by chance
![Page 23: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/23.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 18
Part of BLOSUM 62 MatrixC S T P A G N
C 9 S -1 4T -1 1 5P -3 -1 -1 7A 0 1 0 -1 4G -3 0 -2 -2 0 6N -3 1 0 -2 -2 0
Log-odds = obs freq of aa substitutions freq expected by chance
![Page 24: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/24.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 18
Part of BLOSUM 62 MatrixC S T P A G N
C 9 S -1 4T -1 1 5P -3 -1 -1 7A 0 1 0 -1 4G -3 0 -2 -2 0 6N -3 1 0 -2 -2 0
Log-odds = obs freq of aa substitutions freq expected by chance
![Page 25: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/25.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 18
Part of BLOSUM 62 MatrixC S T P A G N
C 9 S -1 4T -1 1 5P -3 -1 -1 7A 0 1 0 -1 4G -3 0 -2 -2 0 6N -3 1 0 -2 -2 0
Log-odds = obs freq of aa substitutions freq expected by chance
![Page 26: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/26.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 19
Part of PAM 250 MatrixC S T P A G N
C 1 2 S 0 2T -2 1 3P -3 1 0 6A -2 1 1 1 2G -3 1 0 -1 1 5N -4 1 0 -1 0 0
Log-odds = pair in homologous proteins pair in unrelated proteins by chance
![Page 27: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/27.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 19
Part of PAM 250 MatrixC S T P A G N
C 1 2 S 0 2T -2 1 3P -3 1 0 6A -2 1 1 1 2G -3 1 0 -1 1 5N -4 1 0 -1 0 0
Log-odds = pair in homologous proteins pair in unrelated proteins by chance
![Page 28: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/28.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 19
Part of PAM 250 MatrixC S T P A G N
C 1 2 S 0 2T -2 1 3P -3 1 0 6A -2 1 1 1 2G -3 1 0 -1 1 5N -4 1 0 -1 0 0
Log-odds = pair in homologous proteins pair in unrelated proteins by chance
![Page 29: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/29.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 19
Part of PAM 250 MatrixC S T P A G N
C 1 2 S 0 2T -2 1 3P -3 1 0 6A -2 1 1 1 2G -3 1 0 -1 1 5N -4 1 0 -1 0 0
Log-odds = pair in homologous proteins pair in unrelated proteins by chance
![Page 30: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/30.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 20
Gap Penalties
• Insertion and Deletions (indels)• Affine gap costs - a scoring system for gaps
within alignments that charges a penalty forthe existence of a gap and an additional per-residue penalty proportional to the gap’slength
![Page 31: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/31.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 21
Example of simple scoringsystem for nucleic acids
• Match = +1 (ex. A-A, T-T, C-C, G-G)• Mismatch = -1 (ex. A-T, A-C, etc)• Gap opening = - 2• Gap extension = -1
![Page 32: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/32.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 21
Example of simple scoringsystem for nucleic acids
• Match = +1 (ex. A-A, T-T, C-C, G-G)• Mismatch = -1 (ex. A-T, A-C, etc)• Gap opening = - 2• Gap extension = -1
T C A G A C G A G T GT C G G A - - G C T G
![Page 33: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/33.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 21
Example of simple scoringsystem for nucleic acids
• Match = +1 (ex. A-A, T-T, C-C, G-G)• Mismatch = -1 (ex. A-T, A-C, etc)• Gap opening = - 2• Gap extension = -1
T C A G A C G A G T GT C G G A - - G C T G
![Page 34: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/34.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 21
Example of simple scoringsystem for nucleic acids
• Match = +1 (ex. A-A, T-T, C-C, G-G)• Mismatch = -1 (ex. A-T, A-C, etc)• Gap opening = - 2• Gap extension = -1
T C A G A C G A G T GT C G G A - - G C T G+1 +1 -1 +1 +1 -2 -1 -1 -1 +1 +1 = 0
![Page 35: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/35.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 22
Scoring for BLAST 2 SequencesScore = 94.0 bits (230), Expect = 6e-19Identities = 45/101 (44%), Positives = 54/101 (52%), Gaps = 7/101 (6%)
Query: 204 YTGPFCDV----DTKASCYDGRGLSYRGLARTTLSGAPCQPWASEATYRNVTAEQ---AR 256 Y+ FC + + CY G G +YRG T SGA C PW S V Q A+Sbjct: 198 YSSEFCSTPACSEGNSDCYFGNGSAYRGTHSLTESGASCLPWNSMILIGKVYTAQNPSAQ 257
Query: 257 NWGLGGHAFCRNPDNDIRPWCFVLNRDRLSWEYCDLAQCQT 297 GLG H +CRNPD D +PWC VL RL+WEYCD+ C TSbjct: 258 ALGLGKHNYCRNPDGDAKPWCHVLKNRRLTWEYCDVPSCST 298
Position 1: Y - Y = 7Position 2: T - S = 1Position 3: G - S = 0Position 4: P - E = -1 . . .Position 9: - - P = -11Position 10: - - A = -1
. . . Sum 230
Based onBLOSUM62
![Page 36: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/36.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 23
Topics to Cover• Introduction• Scoring alignments• Alignment methods
– Dot matrix analysis– Exhaustive methods; Dynamic programming algorithm
(Smith-Waterman (Local), Needleman-Wunsch(Global)
– Heuristic methods; Approximate methods; word or k-tuple (FASTA, BLAST)
• Significance of alignments• Database searching methods• Demo
![Page 37: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/37.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 24
Database Searching Methods:Local Alignments
FASTA
Smith-Waterman
Gapped BLAST
OriginalBLAST
SENSITIVITY
SPEE
D
OriginalFASTA
![Page 38: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/38.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 25
Dot Matrix Comparison
CoFaX11
Window Size = 8 Scoring Matrix: pam250 matrixMin. % Score = 50Hash Value = 2
100 200 300 400 500 600
100
200
300
400
500
F1EKK
Cata
lytic
CatalyticF2 E F1 E K
![Page 39: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/39.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 26
Dot Matrix Comparison
FLO11
Window Size = 16 Scoring Matrix: pam250 matrixMin. % Score = 60Hash Value = 2
200 400 600 800 1000 1200
200
400
600
800
1000
1200
![Page 40: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/40.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 26
Dot Matrix Comparison
FLO11
Window Size = 16 Scoring Matrix: pam250 matrixMin. % Score = 60Hash Value = 2
200 400 600 800 1000 1200
200
400
600
800
1000
1200
FLO11
Window Size = 16 Scoring Matrix: pam250 matrixMin. % Score = 60Hash Value = 2
950 1000 1050 1100 1150 1200 1250 1300 1350
900
1000
1100
1200
1300
![Page 41: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/41.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 26
Dot Matrix Comparison
FLO11
Window Size = 16 Scoring Matrix: pam250 matrixMin. % Score = 60Hash Value = 2
200 400 600 800 1000 1200
200
400
600
800
1000
1200
FLO11
Window Size = 16 Scoring Matrix: pam250 matrixMin. % Score = 60Hash Value = 2
950 1000 1050 1100 1150 1200 1250 1300 1350
900
1000
1100
1200
1300
FLO11
Window Size = 16 Scoring Matrix: pam250 matrixMin. % Score = 60Hash Value = 2
200 220 240 260 280 300 320 340 360 380
200
220
240
260
280
300
320
340
360
380
400
![Page 42: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/42.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 27
Dynamic Programming
![Page 43: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/43.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 27
Dynamic Programming
• Provides very best or optimal alignment
![Page 44: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/44.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 27
Dynamic Programming
• Provides very best or optimal alignment• Compares every pair of characters (e.g.
bases or amino acids) in the two sequences
![Page 45: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/45.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 27
Dynamic Programming
• Provides very best or optimal alignment• Compares every pair of characters (e.g.
bases or amino acids) in the two sequences• Puts in gaps and mismatches
![Page 46: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/46.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 27
Dynamic Programming
• Provides very best or optimal alignment• Compares every pair of characters (e.g.
bases or amino acids) in the two sequences• Puts in gaps and mismatches• Maximum number of matches between
identical or related characters
![Page 47: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/47.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 27
Dynamic Programming
• Provides very best or optimal alignment• Compares every pair of characters (e.g.
bases or amino acids) in the two sequences• Puts in gaps and mismatches• Maximum number of matches between
identical or related characters• Generates a score and statistical assessment
![Page 48: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/48.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
Match = +2
Mismatch = -1
Gap = -3
![Page 49: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/49.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
Match = +2
Mismatch = -1
Gap = -3
![Page 50: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/50.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
GT
= -1
- TG -
T-- G
= -6
= -6 -1
Match = +2
Mismatch = -1
Gap = -3
![Page 51: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/51.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
Match = +2
Mismatch = -1
Gap = -3
![Page 52: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/52.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
Match = +2
Mismatch = -1
Gap = -3
![Page 53: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/53.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
-4 -9-4 -4
Match = +2
Mismatch = -1
Gap = -3
![Page 54: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/54.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
-4 -9-4 -4
Match = +2
Mismatch = -1
Gap = -3
![Page 55: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/55.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
-4 -9-4 -4
-8 -5-13 -5
Match = +2
Mismatch = -1
Gap = -3
![Page 56: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/56.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
-4 -9-4 -4
-8 -5-13 -5
Match = +2
Mismatch = -1
Gap = -3
![Page 57: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/57.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
-4 -9-4 -4
-8 -5-13 -5
-6 -6-5 -5
Match = +2
Mismatch = -1
Gap = -3
![Page 58: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/58.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 28
Dynamic Programming
-27A-24A-21T-18G-15C-12A-9T-6T-3G
-18-15-12-9-6-30GapTAATATGap
-1 -6-6 -1
-4 -9-4 -4
-8 -5-13 -5
-6 -6-5 -5
Match = +2
Mismatch = -1
Gap = -3
![Page 59: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/59.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 29
Dynamic Programming
-2-7-12-17-22-27A-3-5-4-9-14-19-24A0-5-7-6-11-16-21T0-2-4-6-8-13-18G-21-1-3-10-15C-4-12-3-2-7-12A-6-6-30-2-4-9T
-11-8-5-2-2-1-6T-16-13-10-7-3G-18-15-12-9-6-300TAATAT0
-1 -6-6 -1
-4 -9-4 -4
-8 -5-13 -5
-6 -6-5 -5
![Page 60: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/60.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 30
Dynamic Programming
-2-7-12-17-22-27A-3-5-4-9-14-19-24A0-5-7-6-11-16-21T0-2-4-6-8-13-18G-21-1-3-10-15C-4-12-3-2-7-12A-6-6-30-2-4-9T
-11-8-5-2-2-1-6T-16-13-10-7-3G-18-15-12-9-6-300TAATAT0
-1 -6-6 -1
-4 -9-4 -4
-8 -5-13 -5
-6 -6-5 -5
- T - A - - T A A T
G T T A C G T A A -
- - T A - - T A A T
![Page 61: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/61.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 31
Global vs Local Alignment
Examples of aligning the same two proteins bothglobally and locally.
See Chapter 3, example 1 on the online site forBioinformatics by Mount.
![Page 62: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/62.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 32
Original “Ungapped” BLASTAlgorithm
• To improve speed, use a word based hashingscheme to index database
• Limit search for similarities to only the regionnear matching words
• Use Threshold parameter to rate neighbor words
• Extend match left and right to search for highscoring alignments
![Page 63: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/63.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 33
Original BLAST AlgorithmQuery word (W=3)
Query: GSVEDTTGSQSLAALLNKCKTPQGQRLVNQWIKQPLMPQG 18 PHG 13PEG 15 PMG 13PNG 13 PTG 12PDG 13 Etc.
NeighborhoodScore threshold(T=13)
Query: 325 SLAALLNKCKTPQGQRLVNQWIKQPLMDKNRIEERLNLVEA+LA++L+ TP G R++ +W+ P+ D + ER I A
Sbjct: 290 TLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGA
Neighborhoodwords
![Page 64: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/64.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 34
Recent BLAST Refinements
• “two-hit” method for extending word pairs• Gapped alignments• Iterate with position-specific matrix (PSI-
BLAST)• Pattern-hit initiated BLAST (PHI-BLAST)
![Page 65: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/65.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 35
Gapped BLAST
15(+) > 1322(•) > 11
From: NucleicAcids Research,1997, Vol. 25,
No. 173389–3402
![Page 66: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/66.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 35
Gapped BLAST
15(+) > 1322(•) > 11
From: NucleicAcids Research,1997, Vol. 25,
No. 173389–3402
![Page 67: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/67.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 36
Gapped BLAST
From: NucleicAcids Research,1997, Vol. 25,
No. 173389–3402
![Page 68: Sequ en ce An alysis: Part I. Pairwise alignment and ...barc.wi.mit.edu/education/bioinfo2003/lecture1-color.pdfBioinformatics Sequ en ce An alysis: Part I. Pairwise alignment and](https://reader035.fdocuments.us/reader035/viewer/2022063004/5f7ae7558186d60a193bdb77/html5/thumbnails/68.jpg)
WIBR Bioinformatics Course, © Whitehead Institute, 2002 37
Programs to Compare twosequences
Macintosh– MacVector - Pustell Protein Matrix (DotPlot)
Web– BLAST 2 Sequences– RepeatFinder– lalign
GCG/Unix– BestFit - Smith-Waterman (randomize)– Gap - Needleman -Wunsch (randomize)– Dotter (dot plot)