COMP 571 Bioinformatics: Sequence Analysis Pairwise Sequence Alignment (II)
Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can...
![Page 1: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/1.jpg)
Pairwise Sequence Alignment (PSA)
Why and How
![Page 2: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/2.jpg)
Goals of sequence alignment
• From the alignment we can learn about:
– The function of a new protein
– New members of a gene family
– Evolutionary relationships between genes
– Position and function of coding genes and of regulatory regions in a genomic sequence
– Comparison of sequences between individuals can detect changes that are related to diseases
Slide by Vered Caspi, BGU. 21.12.2005
![Page 3: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/3.jpg)
Similarity vs. homology
Sequence alignment algorithms enable us to identify similarity between sequences
From sequence similarity (and additional biological knowledge) we may deduce sequence homology.
![Page 4: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/4.jpg)
Homology: common ancestry of genes
![Page 5: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/5.jpg)
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
speciation
Homologous genes: orthologs and paralogs
![Page 6: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/6.jpg)
Retinol-binding protein, human(NP_006735)
b-lactoglobulin, cow(P02754)
Slide from J. Pevsner - Page 42
Similar genes: possible similarity in structure/function
Homologous genes
![Page 7: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/7.jpg)
EEELTKPRLLWALYFNMRDALSSG
VEKPRILYALYFNMRDSSDE
Pairwise Sequence Alignment
You can find a list of abbreviations at: http://en.wikipedia.org/wiki/Amino_acids#Table_of_standard_amino_acid_abbreviations_and_side_chain_properties
![Page 8: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/8.jpg)
Alignment: The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
EEELTKPRLLWALYFNMRDALSSG
VEKPRILYALYFNMRDSSDE
EEELTKPRLLWALYFNMRDALSSG
----VEKPRILYALYFNMRD--SSDE
Pairwise Sequence Alignment
Slide by Vered Caspi, BGU. 21.12.2005
![Page 9: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/9.jpg)
EEELTKPRLLWALYFNMRDALSSG
VEKPRILYALYFNMRDSSDE
EEELTKPRLLWALYFNMRDALSSG
----VEKPRILYALYFNMRD--SSDE
Labels on the alignment: end gap, mismatch, match, conserved substitution, gap
Pairwise Sequence Alignment
Slide by Vered Caspi, BGU. 21.12.2005
![Page 10: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/10.jpg)
Examples
• Pairwise alignment servers:
– LALIGN (http://www.ch.embnet.org/software/LALIGN_form.html)
– NEEDLE / WATER global and local PSA (http://www.ebi.ac.uk/Tools/emboss/align/index.html)
![Page 11: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/11.jpg)
Pairwise sequence alignment
• NEEDLE results for example:
EMBOSS_001 1 EEELTKPRLLWALYFNMRDALSSG 24
:.|||:|:||||||||:...
EMBOSS_001 1 VEKPRILYALYFNMRDSSDE 20
• Alternative:
EMBOSS_001 1 EEELTKPRLLWALYFNMRDALSSG 24
:.|||:|:|||||||| ||.
EMBOSS_001 1 VEKPRILYALYFNMRD--SSDE 20
Labels on the match line: mismatch, gap, small positive score, score > 1.0, identity
![Page 12: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/12.jpg)
A different format you might find
mismatch Similarity
identity
Tyrosine (Y)
Tryptophan(W)
gap
![Page 13: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/13.jpg)
How do we find the best alignment?
• One can find the optimal alignment by trying all possible alignments and choosing the best one.
• There are two approaches to do that:
– Graphically display the sequences in a way that will help us find the best alignment by eye
– Let the computer compute a score for each possible alignment and choose the alignment with the highest score.
![Page 14: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/14.jpg)
Sequence alignment process
• Choose strategy
– Compare DNA or protein sequences
– Global or local alignment
• Execute an algorithm to determine the optimal alignment of the sequences
– Choose algorithm
– Give parameters to the algorithm (gap penalties, scoring matrix)
• Interpret the results
– Is the alignment “good” (score, % identity)?
– Is it possible that the alignment was achieved by chance (statistical significance, e-value)?
– Does the alignment represent a true biological relationship between the sequences?
![Page 15: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/15.jpg)
Graphical representation of sequence alignment
Dot Plot
![Page 16: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/16.jpg)
DotPlot of Sequences
[Figure: dot plot of EEELTKPRLLWALYFNMRDALSSG (horizontal axis) against VEKPRILYALYFNMRDSSDE (vertical axis). Legend: Match, Mismatch, Gap]
![Page 17: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/17.jpg)
Remove noise: Windows
• Usually, one AA identity holds little biological meaning
• We are interested in contiguous identities.
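Effect of window size on background noise and missed identities:
– Large window: less background noise, more missed identities
– Small window: more background noise, fewer missed identities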
![Page 18: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/18.jpg)
Remove noise: Windows
• For nucleotide sequences, there are only 4 possible letters so windows should be larger
• For AA sequences, there are 20 possible letters and windows can be smaller.
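The windowing idea can be sketched in a few lines of Python. This is only an illustration, assuming that a dot is kept when `window` consecutive residues are identical; the function names `dot_plot` and `render` are made up for this example, and real tools such as Dotlet use scored windows rather than exact matches.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return the (i, j) start positions where `window` consecutive
    residues of seq_a and seq_b are identical (0-based indices)."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots, window=1):
    """Draw the plot as text: seq_a across the top, seq_b down the side."""
    marked = {(i + k, j + k) for i, j in dots for k in range(window)}
    lines = ["  " + seq_a]
    for j, residue in enumerate(seq_b):
        row = "".join("*" if (i, j) in marked else "." for i in range(len(seq_a)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


top = "EEELTKPRLLWALYFNMRDALSSG"
side = "VEKPRILYALYFNMRDSSDE"
for w in (1, 3):
    print(f"window={w}: {len(dot_plot(top, side, window=w))} dots")
print(render(top, side, dot_plot(top, side, 3), 3))
```

With a window of 1 every chance identity produces a dot; with a window of 3 only the dots on the main diagonal of the two example sequences survive.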
![Page 19: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/19.jpg)
Window size: 2
Window size: 3
Remove noise: Windows
![Page 20: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/20.jpg)
DotPlots for detection of Repeats
![Page 21: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/21.jpg)
Dotlet
• An application for viewing dotplots.
![Page 22: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/22.jpg)
Dot Plots
• More on dot plots in the hands-on session.
![Page 23: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/23.jpg)
Principles of sequence alignment
![Page 24: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/24.jpg)
Major strategies for SA
• Global alignment
– Attempt to align every residue in the sequences.
• Local alignment
– Identify regions of similarity within their larger sequence context.
![Page 25: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/25.jpg)
Global vs. Local Alignment
• Global alignment advantages:
– Easy to understand, complete seqs. in output.
– Checking minor differences between 2 seqs.
– Finding polymorphisms between 2 seqs.
• Local alignment advantages:
– mRNA vs. genomic DNA: introns/exons
– Genes/proteins are modular
– Finding repeat elements within 1 sequence.
– Possible to determine E-values.
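The difference between the two strategies shows up clearly in a small dynamic-programming sketch. It assumes a deliberately simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) chosen for illustration; NEEDLE and WATER use substitution matrices and separate gap opening/extension penalties, and the function name `align_score` is hypothetical.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style), every residue
                 of both sequences must be aligned or gapped.
    local=True:  local alignment (Smith-Waterman style), only the best
                 matching region counts.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(align_score("XXXXKPRXXXX", "KPR"))              # global: pays for 8 gap positions
print(align_score("XXXXKPRXXXX", "KPR", local=True))  # local: only the KPR region
```

The short motif embedded in a longer sequence scores poorly in a global alignment but well in a local one, which is the "genes/proteins are modular" argument from the slide.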
![Page 26: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/26.jpg)
What degree of similarity between sequences indicates homology?
• It has been shown empirically that protein sequences which can be aligned along 100 amino acids or more, where in the aligned region at least 35% of the amino acids are identical, are homologous.
Orengo, Jones & Thornton (2003) “Bioinformatics. Genes, Proteins & Computers” BIOS. p. 30
Aligned residues
![Page 27: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/27.jpg)
What degree of similarity between sequences indicates homology?
• Usually, PSA is used to identify or study close homologs (>35% identity).
• Twilight zone: Seqs. with 25%-35% identity. They may have evolutionary relatedness, but this has to be checked carefully.
• To study the evolutionary relatedness of more distant proteins, one has to apply more advanced methods such as multiple sequence alignment (MSA), profile searches, threading and so on. Some of these methods will be taught later in the course.
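As a rough illustration of these thresholds, the sketch below computes percent identity for a gapped alignment and applies the rules of thumb from the slides. Two assumptions are made here: identity is measured over columns where neither sequence has a gap (other definitions exist), and the gapped rows of the example are reconstructed from the NEEDLE match line shown earlier. The names `percent_identity` and `interpret` are illustrative only.

```python
def percent_identity(row_a, row_b):
    """Return (percent identity, number of aligned columns), counting only
    columns where neither row has a gap ('-')."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def interpret(identity, aligned_length):
    """Apply the rules of thumb quoted on the slides."""
    if aligned_length < 100:
        return "too short for the 100-residue rule"
    if identity >= 35:
        return "likely homologous"
    if identity >= 25:
        return "twilight zone"
    return "no conclusion from identity alone"


# Example rows; the gap placement is an assumption made for this sketch.
row_a = "EEELTKPRLLWALYFNMRDALSSG"
row_b = "---VEKPRILYALYFNMRDSSDE-"
identity, length = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identity over {length} columns: {interpret(identity, length)}")
```

The example peptides are far shorter than 100 residues, so a high percent identity on its own does not satisfy the rule.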
![Page 28: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/28.jpg)
What degree of similarity between sequences indicates homology?
[Figure: % identity plotted against evolutionary distance. From Pevsner]
![Page 29: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/29.jpg)
Scoring a sequence alignment
• Quantitative indication of the quality of the alignment.
• Quantitative comparison of alignments in search algorithm.
• Nucleotide sequence: Nucleotides are either identical or not.
• Amino acid sequence: AAs may also be similar, i.e. close chemical properties.
![Page 30: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/30.jpg)
• The score depends on penalizing two kinds of differences between the sequences:
– Point mutations (with a substitution matrix)
– Indels (gap penalties)
Slide by Vered Caspi, BGU. 21.12.2005
Scoring a sequence alignment
![Page 31: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/31.jpg)
Amino Acid Substitution Matrices
• We use a two-dimensional matrix (table) of 20 X 20, where each cell in the matrix contains a number indicating the similarity between a pair of amino acids.
• Positive values indicate high similarity.
• Negative values indicate low similarity.
• We will see later how the matrices are developed.
• Many different matrices exist.
Orengo, Jones & Thornton (2003) “Bioinformatics. Genes, Proteins & Computers” BIOS. Chapter 4
![Page 32: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/32.jpg)
Amino Acid Substitution Matrices
• A positive score is given to the more likely substitutions while a negative score is given to the less likely substitutions.
• Every identity or substitution is assigned a score based on its observed frequencies in the alignment of related proteins.
• Scores within a BLOSUM matrix are log-odds scores: the logarithm of the ratio between the likelihood that two AAs are aligned because one substituted for the other in related proteins and the likelihood that the same two AAs are aligned by chance.
Orengo, Jones & Thornton (2003) “Bioinformatics. Genes, Proteins & Computers” BIOS. Chapter 4
![Page 33: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/33.jpg)
Amino Acid Substitution Matrices
• Substitution matrices are constructed by assembling a large and diverse sample of verified pairwise alignments (or multiple sequence alignments) of AAs.
• Substitution matrices should reflect the true probabilities of mutations occurring through a period of evolution.
• The two major types of substitution matrices are PAM and BLOSUM.
From Pevsner
BLOSUM62
BLOCKS Database: multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins
![Page 34: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/34.jpg)
BLOSUM62 Matrix
Small hydrophilic
Acid, acid amide and hydrophilic
Basic
Small hydrophobic
Aromatic
![Page 35: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/35.jpg)
BLOSUM62 Substitution Matrix
Common amino acids have low weights
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X
Rare amino acids have high weights
Negative for less likely substitutions
Positive for more likely substitutions
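Because the matrix is symmetric, only the lower triangle needs to be stored. The sketch below parses the lower-triangular BLOSUM62 values listed on this slide into a lookup table that works in either order; the names `LOWER_TRIANGLE` and `load_matrix` are illustrative, not part of any library.

```python
# Lower triangle of BLOSUM62 as listed on the slide (X = unknown residue).
LOWER_TRIANGLE = """\
A 4
R -1 5
N -2 0 6
D -2 -2 1 6
C 0 -3 -3 -3 9
Q -1 1 0 0 -3 5
E -1 0 0 2 -4 2 5
G 0 -2 0 -1 -3 -2 -2 6
H -2 0 1 -1 -3 0 0 -2 8
I -1 -3 -3 -3 -1 -3 -3 -4 -3 4
L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1
"""


def load_matrix(text):
    """Build a symmetric {(a, b): score} table from a lower-triangular listing."""
    order = []   # residues seen so far, in row order
    scores = {}
    for line in text.strip().splitlines():
        residue, *values = line.split()
        order.append(residue)
        # The k-th value on a row pairs this residue with the k-th residue seen.
        for other, value in zip(order, values):
            scores[(residue, other)] = scores[(other, residue)] = int(value)
    return scores


BLOSUM62 = load_matrix(LOWER_TRIANGLE)
print(BLOSUM62[("W", "W")], BLOSUM62[("Y", "W")], BLOSUM62[("A", "W")])
```

Rare residues such as tryptophan carry a large identity score, while an unlikely substitution such as A/W is negative.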
Scoring Systems - Proteins
From NCBI Field Guide
![Page 36: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/36.jpg)
Gap penalties
Example gap penalties:
Gap opening: -10
Gap extension: -0.5
Slide by Vered Caspi, BGU. 21.12.2005
• The presence of a gap is ascribed more significance than the length of the gap. (Because a single mutational event may cause the insertion or deletion of more than one residue.)
• Gap opening – penalty for presence of gap
• Gap extension – penalty for gap length.
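One way to write this affine scheme as a formula, assuming the convention that the opening penalty $o$ is charged for the first gap position and the extension penalty $e$ for every further position (some programs instead charge opening plus extension for the first position):

$$g(k) = -\bigl(o + (k-1)\,e\bigr), \qquad k \ge 1$$

With the example values $o = 10$ and $e = 0.5$, a single-residue gap costs $g(1) = -10$ and a three-residue gap costs $g(3) = -(10 + 2 \times 0.5) = -11$, far less than three separate gaps at $-30$.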
![Page 37: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/37.jpg)
Alignment scoring
$$S = \sum(\text{identities}, \text{mismatches}) - \sum(\text{gap penalties})$$
V D S - - - C Y H V
V E S L T G C Y A L
4 2 4 -10 -0.5 -0.5 9 7 -2 3
Score = (4+2+4+9+7-2+3) - (10+0.5+0.5) = 16
Slide by Vered Caspi, BGU. 21.12.2005
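The same arithmetic can be sketched in Python. The per-pair scores in `PAIR_SCORES` are copied from the columns of the slide rather than looked up in a particular matrix, the gap penalties are the example values (opening 10, extension 0.5), and the function name `score_alignment` is made up for this illustration. The sketch treats any run of gap columns as one gap, without distinguishing which sequence carries it.

```python
def score_alignment(row_a, row_b, pair_scores, gap_open=10.0, gap_extend=0.5):
    """Sum substitution scores and subtract affine gap penalties, column by column."""
    total = 0.0
    in_gap = False
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            # First gap column pays the opening penalty, later ones the extension.
            total -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += pair_scores[frozenset((x, y))]
            in_gap = False
    return total


# Scores as shown under each column of the slide (order of the pair is irrelevant).
PAIR_SCORES = {
    frozenset("V"): 4,    # V/V
    frozenset("DE"): 2,   # D/E
    frozenset("S"): 4,    # S/S
    frozenset("C"): 9,    # C/C
    frozenset("Y"): 7,    # Y/Y
    frozenset("HA"): -2,  # H/A
    frozenset("VL"): 3,   # V/L, value as given on the slide
}

print(score_alignment("VDS---CYHV", "VESLTGCYAL", PAIR_SCORES))  # 16.0
```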
![Page 38: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/38.jpg)
Substitution Matrices
• In principle, there are two approaches to constructing an AA substitution matrix:
1. Based on careful study of the physico-chemical structures of the amino acids.
2. A more empirical approach, based on inspection of groups of proteins that are known in advance to be homologous.
• The second approach was found to give better results, and is the one used today in the popular BLOSUM substitution matrices.
Slide by Vered Caspi, BGU. 21.12.2005
![Page 39: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/39.jpg)
Substitution Matrices
1. Based on careful study of the physico-chemical structures of the amino acids.
![Page 40: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/40.jpg)
Conservative Substitutions - Definition
• Substitutions that
– Conserve the physical and chemical properties of the amino acids
– Limit disruptions in protein structure/function.
Orengo, Jones & Thornton (2003) “Bioinformatics. Genes, Proteins & Computers” BIOS. Chapter 4
![Page 41: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/41.jpg)
Slide from S. Pietrokovsky
![Page 42: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/42.jpg)
Percent accepted mutations matrix (PAMs)
• A matrix of weights that is derived from how often different AAs replace other AAs in evolution.
• Based on a database of 1,572 changes in 71 groups of closely related proteins.
• PAM-1 would correspond to roughly 1% divergence in a protein (one amino acid replacement per hundred).
![Page 43: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/43.jpg)
Percent accepted mutations matrix (PAMs)
• To derive a mutational probability matrix for a protein sequence that has undergone N percent accepted mutations, a PAM-N matrix, the PAM-1 matrix is multiplied by itself N times.
• This results in a family of scoring matrices: the PAM matrices.
• By trial and error it was found that for weighting purposes a PAM-250 matrix works well.
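The repeated multiplication can be illustrated with a toy example. The 3-letter `TOY_PAM1` matrix below is entirely hypothetical (the real PAM-1 matrix is 20 x 20 and derived from the observed changes described above), and it is assumed here that rows index the original residue and columns the replacement, so each row sums to 1.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Multiply m by itself n times, starting from the identity matrix."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical mutation probabilities for a 3-letter alphabet over 1 PAM:
# about 1% of residues change, 99% stay the same.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.002, 0.008, 0.990],
]

toy_pam250 = mat_power(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

After 250 steps the diagonal entries have dropped well below 0.99, which is why a PAM-250 matrix is suited to more diverged sequences than PAM-1.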
![Page 44: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/44.jpg)
Percent accepted mutations matrix (PAMS)
original amino acid
replacement amino acid
![Page 45: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/45.jpg)
Odds matrix
• What? The ratio Ma,b/Pb, where Ma,b is the probability that some AA a will change to AA b in some PAM interval.
• Ma,b – The probability that the aligned pair a and b represents an authentic alignment.
• Pb – The probability that residue b was aligned by chance (= the normalized frequency).
![Page 46: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/46.jpg)
Normalized Frequencies of Amino Acids
Gly (G) 8.9%    Arg (R) 4.1%
Ala (A) 8.7%    Asn (N) 4.0%
Leu (L) 8.5%    Phe (F) 4.0%
Lys (K) 8.1%    Gln (Q) 3.8%
Ser (S) 7.0%    Ile (I) 3.7%
Val (V) 6.5%    His (H) 3.4%
Thr (T) 5.8%    Cys (C) 3.3%
Pro (P) 5.1%    Tyr (Y) 3.0%
Glu (E) 5.0%    Met (M) 1.5%
Asp (D) 4.7%    Trp (W) 1.0%
![Page 47: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/47.jpg)
Log odds matrix
• Why? Logarithms are easier to use for a scoring system. They allow us to sum the scores of aligned residues (rather than having to multiply them).
$$S(a,b) = 10 \log_{10}\left(\frac{M_{ab}}{P_b}\right)$$
![Page 48: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/48.jpg)
Pevsner Page 57
How do we go from mutation-probability to log-odds matrices?
• The cells in a log odds matrix consist of an “odds ratio”: the probability that an alignment is authentic, divided by the probability that the alignment was random.
• The score S for an alignment of residues a,b is given by:
$$S(a,b) = 10 \log_{10}\left(\frac{M_{ab}}{P_b}\right)$$
• Example, for tryptophan (W):
• S(W,W) = 10 log10 (0.55/0.010) = 17.4
– Probability of alignment W-W: 0.55 (according to PAM250 matrix)
– Probability of chance appearance of Trp: 0.01
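The tryptophan example can be checked with a one-line function. The name `log_odds` is chosen for this sketch; the inputs are the two probabilities quoted on the slide.

```python
import math


def log_odds(m_ab, p_b):
    """Score S(a, b) = 10 * log10(M_ab / P_b)."""
    return 10 * math.log10(m_ab / p_b)


# W aligned with W: M = 0.55 (PAM250), P = 0.01 (normalized frequency of Trp).
print(round(log_odds(0.55, 0.010), 1))  # 17.4
```

Because the scores are logarithms, the score of a whole alignment is the sum of its column scores, which corresponds to multiplying the underlying odds ratios.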
![Page 49: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/49.jpg)
![Page 51: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/51.jpg)
What do the numbers mean in a log odds matrix?
S(W,W) = 10 log10 (0.55/0.01) = 17.4
A score of +17 for tryptophan (W) means that this alignment is 50 times more likely than a chance alignment of two Trp residues.
Let S(a,b) = 17 and x = probability of replacement (Mab/Pb). Then:
10 log10 x = 17
log10 x = 1.7
x = 10^1.7 ≈ 50
![Page 52: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/52.jpg)
What do the numbers mean in a log odds matrix?
• A score of +2 : The AA replacement occurs 1.6 times as frequently as expected by chance.
• A score of 0 : Replacement is as frequent as chance alignment.
• A score of –10 : The correspondence of the two AAs in an alignment that accurately represents homology (evolutionary descent) is one tenth as frequent as the chance alignment of these AAs.
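Going in the other direction, a score can be converted back into an odds ratio by inverting the log-odds formula. A minimal sketch, with the function name `odds_ratio` chosen for this example:

```python
def odds_ratio(score):
    """Invert S = 10 * log10(x): return x = 10 ** (S / 10)."""
    return 10 ** (score / 10)


for s in (17, 2, 0, -10):
    print(f"score {s:+d}: {odds_ratio(s):.2f} times chance")
```

This reproduces the values quoted on the slides: roughly 50 for +17, 1.6 for +2, 1 for 0 and one tenth for -10.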
![Page 53: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/53.jpg)
BLOSUM Matrices
• BLOSUM matrices are based on local alignments.
• BLOSUM stands for Blocks Substitution Matrix.
• BLOSUM62 is a matrix calculated from comparisons of sequences with no more than 62% identity (more similar sequences are clustered and counted as one).
• BLOSUM matrix values are given as the log-odds scores (Same as PAM matrices)
![Page 54: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/54.jpg)
Closely related proteins
Distant proteins(“Twilight zone”)
Substitution matrices
![Page 55: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/55.jpg)
Slide by Jonathan Pevsner
Comparing protein sequences can be more informative than DNA
• Protein is more informative (20 vs. 4 characters).
• Many amino acids share related biophysical properties.
• Codons are degenerate: changes in the third position often do not alter the amino acid that is specified.
• Protein sequences offer a longer “look-back” time.
• DNA sequences can be translated into protein and then used in pairwise alignments.
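Codon degeneracy and translation can be illustrated with the standard genetic code. This sketch builds the codon table from the conventional TCAG ordering and translates one reading frame only; the names `CODON_TABLE` and `translate` are illustrative, and real pipelines would also consider the other five frames and alternative genetic codes.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGGCTTGGTAA"))  # MAW*
print(translate("GCTGCCGCAGCG"))  # AAAA: four different codons, one amino acid
```

The second call shows the third-position degeneracy mentioned above: the four DNA codons differ, yet the protein sequences are identical, so a protein alignment would not be penalized for these changes.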
![Page 56: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/56.jpg)
Slide by Jonathan Pevsner
Comparing protein sequences can be more informative than DNA
• However, many times, DNA alignments are appropriate
– to confirm the identity of a cDNA
– to study noncoding regions of DNA
– to study DNA polymorphisms
![Page 57: Pairwise Sequence Alignment (PSA) Why and How. Goals of sequence alignment From the alignment we can learn about: –The function of a new protein –New.](https://reader036.fdocuments.us/reader036/viewer/2022062320/56649d8b5503460f94a718ce/html5/thumbnails/57.jpg)
Summary
• Graphical alignment: Dot plots
• Algorithmic alignment:
– Global alignment (=“needle”)
– Local alignment (=“water”)
• For proteins: Based on substitution matrices:
– PAM
– BLOSUM