Pairwise alignment Lesson 6 Based on presentation by Irit Gat-Viks, which is based on presentation...
Pairwise alignment
Lesson 6
Based on a presentation by Irit Gat-Viks, which is based on a presentation by Amir Mitchel, Introduction to Bioinformatics course, Bioinformatics Unit, Tel Aviv University, and by Benny Shomer, Bar-Ilan University.
Definition
Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences.
VLSPADKTNVKAAWAKVGAHAAGHG
||| |    |   |||| |  ||||
VLSEAEWQLVLHVWAKVEADVAGHG
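The identity line under the example alignment can be regenerated mechanically: a bar marks each column in which the two aligned residues are identical. A minimal Python sketch; the function name `match_line` is hypothetical, not from the slides:

```python
def match_line(top: str, bottom: str) -> str:
    """Return '|' under each column where the two aligned rows are identical."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    # A gap aligned to a gap is not counted as an identity.
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))


top = "VLSPADKTNVKAAWAKVGAHAAGHG"
bottom = "VLSEAEWQLVLHVWAKVEADVAGHG"
print(top)
print(match_line(top, bottom))
print(bottom)
```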
Sequence comparisons
Goal: similarity search on a sequence database
Multiple pairwise comparisons
We wish to optimize for speed, not accuracy
BLAST, FASTA programs
Next goal: refine the database search; are the reported matches really interesting?
Goal: comparing two specific sequences
Single pairwise comparison
We wish to optimize for accuracy, not speed
Dynamic programming methods (Smith-Waterman, Needleman-Wunsch)
Identify homologs, common domains, common active sites, etc.
How similar are two sequences?
• The most common measure of sequence similarity is the alignment score
• Simpler measures, e.g., % identity, are also common
• Both require an algorithm that computes the optimal alignment between the sequences
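Percent identity is simple, but its value depends on what goes in the denominator, and programs differ on this. A small sketch, assuming two already-aligned rows with "-" for gaps; `percent_identity` and its `count_gaps` switch are illustrative names, not a specific tool's convention:

```python
def percent_identity(top: str, bottom: str, count_gaps: bool = True) -> float:
    """Percent identity of two aligned rows of equal length.

    count_gaps=True divides by the full alignment length; count_gaps=False
    divides only by the columns in which neither row has a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    columns = list(zip(top, bottom))
    if not count_gaps:
        columns = [(a, b) for a, b in columns if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("VLSPADKTNVKAAWAKVGAHAAGHG", "VLSEAEWQLVLHVWAKVEADVAGHG"))
print(percent_identity("AC-GT", "ACTGA", count_gaps=False))
```

For the gapless example from the definition slide, 14 of the 25 columns are identical, so both conventions give 56%; they diverge only once gaps appear.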
Comparison methods
• Global alignment – Finds the best alignment across the entire length of both sequences.
• Local alignment – Finds regions of similarity in parts of the sequences.
(Figure: schematic comparison of global and local alignment.)
Pairwise Alignment - Scoring
• The final alignment score is the sum of the positive scores and the penalty scores (each term weighted by its value in the scoring scheme):
+ Number of identities
+ Number of similarities
- Number of gap openings (insertions)
- Number of gap extensions
= Alignment score
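The sum above can be computed column by column once an alignment is fixed. The similarity groups and penalty values below are assumptions chosen purely for illustration, not a real substitution matrix; real programs take similarity scores from a matrix such as BLOSUM62 and use their own gap parameters. Note also that some programs charge a gap of length L as `open + (L - 1) * extend` (as here) and others as `open + L * extend`.

```python
# Hypothetical toy scheme (not from the slides): +2 for an identity,
# +1 for a "similar" pair, 0 for any other mismatch.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]
IDENTITY, SIMILARITY = 2, 1


def pair_score(a: str, b: str) -> int:
    """Score one aligned pair of residues under the toy scheme."""
    if a == b:
        return IDENTITY
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return SIMILARITY
    return 0


def alignment_score(top: str, bottom: str, gap_open: int = 5, gap_extend: int = 1) -> int:
    """Add pair scores; subtract gap_open for the first column of each gap run
    and gap_extend for every further column of the same run."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_in_top = gap_in_bottom = False  # are we currently inside a gap run?
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-":
            score -= gap_extend if gap_in_top else gap_open
            gap_in_top, gap_in_bottom = True, False
        elif b == "-":
            score -= gap_extend if gap_in_bottom else gap_open
            gap_in_top, gap_in_bottom = False, True
        else:
            score += pair_score(a, b)
            gap_in_top = gap_in_bottom = False
    return score


print(alignment_score("VLS--AD", "VLSEEAE"))  # prints 3
```

Here three identities (+6), one gap opening (-5), one gap extension (-1), one identity (+2) and one similar pair D/E (+1) add up to 3.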
Intuition of Dynamic Programming
If we already have the optimal alignment of XY with AB,
then the next column of the alignment will pair Z with C, a gap with C, or Z with a gap:
XYZ   or   XY-   or   XYZ
ABC        ABC        AB-
(where “-” indicates a gap).
So we can extend the alignment by determining which of these has the highest score.
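$V(i,j)$ := the optimal score of an alignment of $S' = s_1 \ldots s_i$ and $T' = t_1 \ldots t_j$, for $0 \le i \le n$, $0 \le j \le m$.
$V(i,j)$ has the following properties:
• Base conditions (alignment of a prefix with 0 elements, i.e., against gaps only):
$$V(i,0) = \sum_{k=1}^{i} \sigma(s_k, -), \qquad V(0,j) = \sum_{k=1}^{j} \sigma(-, t_k)$$
• Recurrence relation, for $1 \le i \le n$ and $1 \le j \le m$:
$$V(i,j) = \max \begin{cases} V(i-1,j-1) + \sigma(s_i, t_j) \\ V(i-1,j) + \sigma(s_i, -) \\ V(i,j-1) + \sigma(-, t_j) \end{cases}$$
The three cases correspond to:
– aligning $s_1 \ldots s_{i-1}$ with $t_1 \ldots t_{j-1}$, then $s_i$ with $t_j$;
– aligning $s_1 \ldots s_{i-1}$ with $t_1 \ldots t_j$, then $s_i$ with ‘-’;
– aligning $s_1 \ldots s_i$ with $t_1 \ldots t_{j-1}$, then ‘-’ with $t_j$.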
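The base conditions and recurrence translate directly into a table fill. This sketch assumes a hypothetical scoring function σ: +1 for a match, -1 for a mismatch and -2 for a residue against a gap (a linear gap cost, so no separate opening and extension penalties). Python strings are 0-indexed, so $s_i$ is `s[i - 1]`.

```python
from typing import Callable, List

GAP = "-"


def simple_sigma(a: str, b: str) -> int:
    """Hypothetical scoring: +1 match, -1 mismatch, -2 residue vs. gap."""
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def global_score_table(s: str, t: str,
                       sigma: Callable[[str, str], int] = simple_sigma) -> List[List[int]]:
    """Fill V(i, j) for 0 <= i <= len(s), 0 <= j <= len(t)."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]  # V(0, 0) = 0 (empty sum)
    # Base conditions: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + sigma(s[i - 1], GAP)
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + sigma(GAP, t[j - 1])
    # Recurrence: the best of the three ways to end the alignment.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),  # s_i with t_j
                V[i - 1][j] + sigma(s[i - 1], GAP),          # s_i with '-'
                V[i][j - 1] + sigma(GAP, t[j - 1]),          # '-' with t_j
            )
    return V


V = global_score_table("ACGT", "ACT")
print(V[4][3])  # optimal global score, here from ACGT / AC-T
```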
Optimal Alignment - Tabular Computation
• Fill the table V, and add back pointer(s) from each cell (i,j) to the parent cell(s) that realize V(i,j).
• Trace back the pointers from (n,m) to (0,0) to recover the optimal alignment.
• Needleman-Wunsch, 1970
Backtracking the alignment
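The back pointers and traceback can be sketched as follows, again under the hypothetical +1 / -1 / -2 scoring. When several parent cells realize V(i,j), this sketch simply prefers the diagonal, then "up", then "left", so it returns one optimal alignment among possibly several.

```python
GAP = "-"


def sigma(a: str, b: str) -> int:
    """Hypothetical scoring: +1 match, -1 mismatch, -2 residue vs. gap."""
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def needleman_wunsch(s: str, t: str):
    """Return (score, aligned s, aligned t) for one optimal global alignment."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]  # pointer to the parent cell
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + sigma(s[i - 1], GAP)
        back[i][0] = "up"
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + sigma(GAP, t[j - 1])
        back[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (V[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]), "diag"),
                (V[i - 1][j] + sigma(s[i - 1], GAP), "up"),
                (V[i][j - 1] + sigma(GAP, t[j - 1]), "left"),
            ]
            # max() keeps the first maximal option, so ties favour "diag".
            V[i][j], back[i][j] = max(options, key=lambda option: option[0])
    # Follow the pointers from (n, m) back to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            top.append(s[i - 1])
            bottom.append(GAP)
            i -= 1
        else:
            top.append(GAP)
            bottom.append(t[j - 1])
            j -= 1
    return V[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "ACT"))  # (1, 'ACGT', 'AC-T')
```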
PAM vs. BLOSUM
• Choosing n
– Different BLOSUM matrices are derived from blocks with different identity percentages (e.g., BLOSUM62 is derived from blocks in which sequences sharing at least 62% identity are clustered together). Larger n → smaller evolutionary distance.
– A single PAM matrix (PAM1) was constructed from a dataset of sequences with at least 85% identity. The other PAM matrices were computationally extrapolated from it by repeated matrix multiplication. Larger n → larger evolutionary distance.
• BLOSUM is derived from more sequences
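| Observed % difference | Evolutionary distance (PAM) | % identity (BLOSUM) |
|---|---|---|
| 1 | 1 | 99 |
| 10 | 11 | 90 |
| 20 | 23 | 80 |
| 30 | 38 | 70 |
| 40 | 56 | 60 |
| 50 | 80 | 50 |
| 60 | 120 | 40 |
| 70 | 159 | 30 |
| 80 | 250 | 20 |

Highlighted on the slide: 62, 120, 250.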
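The observed-difference / PAM-distance pairs listed on this slide can serve as a rough lookup for picking a PAM matrix. The sketch below assumes linear interpolation between neighbouring rows, which is only a crude approximation of a nonlinear relationship; `pam_distance` is a hypothetical helper name.

```python
from bisect import bisect_left

# (observed % difference, PAM distance) pairs as listed on the slide
PAM_TABLE = [(1, 1), (10, 11), (20, 23), (30, 38), (40, 56),
             (50, 80), (60, 120), (70, 159), (80, 250)]


def pam_distance(observed_difference: float) -> float:
    """Estimate the PAM distance for an observed % difference by linear
    interpolation between the two neighbouring table rows."""
    xs = [diff for diff, _ in PAM_TABLE]
    if not xs[0] <= observed_difference <= xs[-1]:
        raise ValueError("outside the tabulated range (1-80% difference)")
    k = bisect_left(xs, observed_difference)
    x1, y1 = PAM_TABLE[k]
    if x1 == observed_difference:
        return float(y1)  # exact table row
    x0, y0 = PAM_TABLE[k - 1]
    return y0 + (y1 - y0) * (observed_difference - x0) / (x1 - x0)


print(pam_distance(40))  # 56.0 (a table row)
print(pam_distance(45))  # 68.0 (halfway between 56 and 80)
```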
DNA scoring matrices
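• Substitutions between nucleotides are not scored uniformly: transitions (A↔G, C↔T) are penalized less than transversions (purine↔pyrimidine).

| From \ To | A | G | C | T |
|---|---|---|---|---|
| A | 2 | | | |
| G | -4 | 2 | | |
| C | -6 | -6 | 2 | |
| T | -6 | -6 | -4 | 2 |

Match: 2; mismatch (transition): -4; mismatch (transversion): -6.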
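The matrix above depends only on whether two bases are identical, in the same chemical class (a transition) or in different classes (a transversion), so it can be written as a small function. A sketch using the slide's values as defaults; `dna_score` and `ungapped_score` are hypothetical helper names:

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def dna_score(a: str, b: str, match: int = 2, transition: int = -4,
              transversion: int = -6) -> int:
    """Score one nucleotide pair: match, transition or transversion."""
    a, b = a.upper(), b.upper()
    if a not in "ACGT" or b not in "ACGT":
        raise ValueError("only A, C, G, T are scored")
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def ungapped_score(x: str, y: str) -> int:
    """Sum of pair scores for two equal-length, gap-free sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(dna_score(a, b) for a, b in zip(x, y))


print(ungapped_score("ACGT", "GCGA"))  # -4 + 2 + 2 - 6 = -6
```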
Topics to be Covered
• Introduction
• Comparison methods – Global, local alignment
• Alignment parameters
• Alignment scoring matrices – proteins
• Alignment scoring matrices – DNA
• Evaluation
• Comparison programs
• Choosing between global / local alignment
Example: Global or local?
• Two human transcription factors:
1. SP1 factor, binds to GC-rich regions.
2. EGR-1 factor, active at the differentiation stage.
(FASTA formats from http://us.expasy.org/sprot/)
>sp|P08047|SP1_HUMAN Transcription factor Sp1 - Homo sapiens (Human).
MSDQDHSMDEMTAVVKIEKGVGGNNGGNGNGGGAFSQARSSSTGSSSSTGGGGQESQPSP
LALLAATCSRIESPNENSNNSQGPSQSGGTGELDLTATQLSQGANGWQIISSSSGATPTS
KEQSGSSTNGSNGSESSKNRTVSGGQYVVAAAPNLQNQQVLTGLPGVMPNIQYQVIPQFQ
TVDGQQLQFAATGAQVQQDGSGQIQIIPGANQQIITNRGSGGNIIAAMPNLLQQAVPLQG
LANNVLSGQTQYVTNVPVALNGNITLLPVNSVSAATLTPSSQAVTISSSGSQESGSQPVT
SGTTISSASLVSSQASSSSFFTNANSYSTTTTTSNMGIMNFTTSGSSGTNSQGQTPQRVS
GLQGSDALNIQQNQTSGGSLQAGQQKEGEQNQQTQQQQILIQPQLVQGGQALQALQAAPL
SGQTFTTQAISQETLQNLQLQAVPNSGPIIIRTPTVGPNGQVSWQTLQLQNLQVQNPQAQ
TITLAPMQGVSLGQTSSSNTTLTPIASAASIPAGTVTVNAAQLSSMPGLQTINLSALGTS
GIQVHPIQGLPLAIANAPGDHGAQLGLHGAGGDGIHDDTAGGEEGENSPDAQPQAGRRTR
REACTCPYCKDSEGRGSGDPGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERPFMCTW
SYCGKRFTRSDELQRHKRTHTGEKKFACPECPKRFMRSDHLSKHIKTHQNKKGGPGVALS
VGTLPLDSGAGSEGSGTATPSALITTNMVAMEAICPEGIARLANSGINVMQVADLQSINI
SGNGF
>sp|P18146|EGR1_HUMAN Early growth response protein 1 (EGR-1) (Krox-24 protein) (ZIF268) (Nerve growth factor-induced protein A) (NGFI-A) (Transcription factor ETR103) (Zinc finger protein 225) (AT225) - Homo sapiens (Human).
MAAAKAEMQLMSPLQISDPFGSFPHSPTMDNYPKLEEMMLLSNGAPQFLGAAGAPEGSGS
NSSSSSSGGGGGGGGGSNSSSSSSTFNPQADTGEQPYEHLTAESFPDISLNNEKVLVETS
YPSQTTRLPPITYTGRFSLEPAPNSGNTLWPEPLFSLVSGLVSMTNPPASSSSAPSPAAS
SASASQSPPLSCAVPSNDSSPIYSAAPTFPTPNTDIFPEPQSQAFPGSAGTALQYPPPAY
PAAKGGFQVPMIPDYLFPQQQGDLGLGTPDQKPFQGLESRTQQPSLTPLSTIKAFATQSG
SQDLKALNTSYQSQLIKPSRMRKYPNRPSKTPPHERPYACPVESCDRRFSRSDELTRHIR
IHTGQKPFQCRICMRNFSRSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLR
QKDKKADKSVVASSATSSLSSYPSPVATSYPSPVTTSYPSPATTSYPSPVPTSFSSPGSS
TYPSPVHSGFPSPSVATTYSSVPPAFPAQVSSFPSSAVTNSFSASTGLSDMTATFSPRTI
EIC
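Records like the two above are usually read with a FASTA parser before being passed to an alignment program. A minimal sketch, assuming the record ID is the first whitespace-separated word of the header and that any whitespace inside sequence lines (such as spaces between 60-residue blocks in a copied record) should be ignored; `parse_fasta` is a hypothetical helper, and a record with a repeated ID overwrites the earlier one:

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its joined sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data before the first header")
            chunks.append("".join(line.split()))  # drop internal spaces
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

Applied to the two records above, the keys would be `sp|P08047|SP1_HUMAN` and `sp|P18146|EGR1_HUMAN`.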
SP1 at Swiss-Prot
EGR1 at Swiss-Prot
Available software…
• http://en.wikipedia.org/wiki/Sequence_alignment_software
• http://fasta.bioch.virginia.edu/fasta_www/home.html
– LAlign (local alignment), PLalign (dot plot)
– PRSS / PRFX (significance by Monte Carlo)
• http://bioportal.weizmann.ac.il/toolbox/overview.html (many useful tools), including Needle (global alignment) and Water (local alignment).
• Bl2seq (NCBI)
Using LAlign
• http://www.ch.embnet.org/software/LALIGN_form.html
• http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_006758.2
• http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_066300.1
Bl2Seq at NCBI
http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi
Bl2seq results
Conclusions
• The proteins share only a limited region of sequence similarity. Therefore, local alignment is recommended.
• The local alignment we found points to a possible structural similarity, which in turn suggests a possible functional similarity.
• Reasons to perform a global alignment:
– Checking minor differences between close homologs.
– Analyzing polymorphisms.
– A good reason
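Local alignment in the Smith-Waterman style uses the same kind of table as the global recurrence, with two changes: every cell is floored at 0, so an alignment can start fresh anywhere, and the result is read from the best cell in the whole table rather than from (n,m). A sketch under assumed toy scoring (+2 match, -1 mismatch, -2 per gap column, all hypothetical):

```python
GAP = "-"
GAP_PENALTY = -2  # hypothetical linear gap cost


def sigma(a: str, b: str) -> int:
    """Hypothetical substitution score: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def smith_waterman(s: str, t: str):
    """Return (best local score, aligned segment of s, aligned segment of t)."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,  # start a fresh local alignment here
                H[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),
                H[i - 1][j] + GAP_PENALTY,
                H[i][j - 1] + GAP_PENALTY,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP_PENALTY:
            top.append(s[i - 1])
            bottom.append(GAP)
            i -= 1
        else:
            top.append(GAP)
            bottom.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTTACGTAAAA", "GGGACGTGGG"))  # (8, 'ACGT', 'ACGT')
```

The unrelated flanks (the T and A runs against the G runs) are simply left out, whereas a global alignment would have to score them as mismatches or gaps.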