Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE...
![Page 1: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/1.jpg)
Medical Natural Sciences Year 2: Introduction to Bioinformatics
Lecture 9: Multiple sequence alignment (III)
Centre for Integrative Bioinformatics VU
![Page 2: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/2.jpg)
Intermezzo: Symmetry-derived secondary structure prediction using
multiple sequence alignments (SymSSP)
Victor Simossis, Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit
Amsterdam, The Netherlands
![Page 3: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/3.jpg)
Symmetry-derived secondary structure prediction using multiple sequence alignments (SymSSP)
• Modern state-of-the-art methods use multiple sequence alignments
• Methods like PHD, Profs, SSPro, etc., predict only for the top sequence in the alignment, cutting out the positions where the top sequence has gaps
• What if two helices ‘out of phase’ are pasted together? Or a strand and a helix?
• Approach: correct by permuting alignments and consensus prediction
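The permutation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_predict` is a hypothetical, context-free stand-in for a real predictor such as PHD, the three-sequence alignment is invented, and the consensus is a plain majority vote rather than any of the weighting schemes discussed later.

```python
from collections import Counter

def with_top(alignment, top):
    """Put sequence `top` first and drop every column where it has a gap."""
    order = [top] + [i for i in range(len(alignment)) if i != top]
    keep = [c for c, ch in enumerate(alignment[top]) if ch != "-"]
    block = ["".join(alignment[i][c] for c in keep) for i in order]
    return block, keep

def toy_predict(block):
    """Hypothetical predictor: labels each residue of the top sequence."""
    helix, strand = set("AELM"), set("VIY")
    return "".join("H" if r in helix else "E" if r in strand else "C"
                   for r in block[0])

def project(prediction, keep, width):
    """Map a per-residue prediction back onto alignment columns ('?' at gaps)."""
    row = ["?"] * width
    for state, col in zip(prediction, keep):
        row[col] = state
    return "".join(row)

def consensus_for(alignment, target):
    """Predict with every sequence on top, then vote at the target's columns."""
    width = len(alignment[0])
    rows = []
    for top in range(len(alignment)):
        block, keep = with_top(alignment, top)
        rows.append(project(toy_predict(block), keep, width))
    target_cols = [c for c, ch in enumerate(alignment[target]) if ch != "-"]
    out = []
    for col in target_cols:
        votes = Counter(r[col] for r in rows if r[col] != "?")
        out.append(votes.most_common(1)[0][0])
    return "".join(out), rows

alignment = ["AV-LE", "AVIL-", "-VILE"]   # invented toy alignment
consensus, library = consensus_for(alignment, 0)
print(library)    # one projected prediction per permutation
print(consensus)  # consensus for sequence 0
```

Each permutation contributes one row, with `?` where the top sequence of that permutation has a gap, so the rows play the role of the prediction library for the target sequence.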
![Page 4: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/4.jpg)
Secondary structure periodicity patterns
Buried β-strand
Edge β-strand
α-helix
hydrophobic hydrophilic
![Page 5: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/5.jpg)
Symmetry-derived Secondary structure prediction using MA (SymSSP)
Alignment permutations (sequence order): 1234, 2134, 3124, 4123
EEEEE HHHHHH EEEEE HH
EEEE? ?HHHHH EEE H
EEEEE HHHHH? ??EE HH
EEEEEE ?HHHHH EEEE HH
EEEEE HHHHHH EEE HH
EEEE? ?HHHHH EEE H
EEEEE HHHHH? ??EE HH
EEEEE ?HHHHH EEEE HH
EEEEE HHHH EEE HH
EEEE? ?HHH EEE H
EEEEE HHH? ??EE HH
EEEEE HHH? EEEE HH
EEEEE HHHHHH EEE HHHH
EEEE? ?HHHHH EEE ?HHH
EEEEE HHHHH? ??EE HHHH
EEEEE ?HHHHH EEEE HHHH
1111
EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH
![Page 6: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/6.jpg)
Optimal segmentation of predicted secondary structures
1234
EEEEE HHHHHH EEEEE HH
EEEE? ?HHHHH EEE H
EEEEE HHHHH? ??EE HH
EEEEEE ?HHHHH EEEE HH
H score: 0 0 0 0 0 …
E score: 3 4 4 4 3 …
C score: 1 0 0 0 0 …
? score: 0 0 0 0 1 …
Region: 0 1 1 1 0 …
C, E, H
Each sequence within an alignment gives rise to a library of n secondary structure predictions, where n is the number of sequences in the alignment.
The predictions are recorded by secondary structure type and region position in a single matrix.
1->1, 1->2, 1->3, 1->4
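As a rough illustration of recording a prediction library in a single matrix, the sketch below counts, per position, how many predictions vote for each state. The four-row library is invented; it was chosen so that the counts reproduce the first five columns of the H, E, C and ? scores shown on the slide, with coil written explicitly as `C`. The function name `score_matrix` is an assumption, not part of SymSSP.

```python
def score_matrix(library, states="HEC?"):
    """Count, for every position, the number of predictions in each state."""
    length = len(library[0])
    counts = {state: [0] * length for state in states}
    for prediction in library:
        for pos, state in enumerate(prediction):
            counts[state][pos] += 1
    return counts

# Hypothetical library: four predictions for the same five positions.
library = ["CEEEE", "EEEE?", "EEEEE", "EEEEE"]
counts = score_matrix(library)
for state in "HEC?":
    print(state, "score", counts[state])
```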
![Page 7: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/7.jpg)
Optimal segmentation of predicted secondary structures by Dynamic Programming
sequence position × window size
Max score, Offset, Label
H score, E score, C score
The recorded values are fed into a function, weighted by secondary structure type, which gives each position a window-specific score. The more probable the secondary structure element, the higher the score.
Restrictions: H only if ws >= 4; E only if ws >= 2
Segmentation score (total score of each path)
? score, Region
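A minimal sketch of segmentation by dynamic programming under the two window-size restrictions. The slide does not specify the weighted scoring function, so this sketch assumes the score of a segment is simply the sum of the per-position scores for its label; the score vectors, `MIN_LEN` and `segment` are illustrative names and values, not the SymSSP implementation.

```python
MIN_LEN = {"H": 4, "E": 2, "C": 1}   # H only if ws >= 4, E only if ws >= 2

def segment(scores, min_len=None):
    """Return the best-scoring labelling and its total segmentation score."""
    min_len = min_len or MIN_LEN
    n = len(scores["C"])
    # Prefix sums let us score any window [start, end) in constant time.
    prefix = {}
    for label, values in scores.items():
        running = [0]
        for v in values:
            running.append(running[-1] + v)
        prefix[label] = running
    best = [0] + [float("-inf")] * n      # best[j]: best score for positions < j
    back = [None] * (n + 1)               # back[j]: (start, label) of last segment
    for end in range(1, n + 1):
        for label, shortest in min_len.items():
            for start in range(0, end - shortest + 1):
                cand = best[start] + prefix[label][end] - prefix[label][start]
                if cand > best[end]:
                    best[end] = cand
                    back[end] = (start, label)
    # Trace back from the last position to recover the segments.
    labels = [""] * n
    pos = n
    while pos > 0:
        start, label = back[pos]
        labels[start:pos] = [label] * (pos - start)
        pos = start
    return "".join(labels), best[n]

scores = {                                # invented per-position scores
    "H": [0, 0, 0, 0, 0, 4, 4, 0],
    "E": [4, 4, 4, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 1, 1, 1, 1, 4],
}
print(segment(scores))
```

In this toy input only two positions favour helix, so the helix is stretched over two weaker positions to satisfy the minimum window of four, because that still scores higher than calling the whole stretch coil.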
![Page 8: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/8.jpg)
Example of an optimally segmented secondary structure prediction library for sequence 3chy
3chy ---------------GYVV-----KPFTAATLEEKLNKIFEKLGM------
3chy <- 1fx1 ??????????????? ee ?? hhhhhhhhhhhhhh ????????
3chy <- FLAV_DESDE ??????????????? ee ?? hhhhhhhhhhhhhhh ????????
3chy <- FLAV_DESVH ??????????????? ee ?? hhhhhhhhhhhhhh ????????
3chy <- FLAV_DESGI ??????????????? eee ?? ??hhhhhhhhhhhhh ????????
3chy <- FLAV_DESSA ??????????????? eee ?? ??hhhhhhhhhhhhh ????????
3chy <- 4fxn ??????????????? eee ?? hhhhhhhhhhhhh ?????????
3chy <- FLAV_MEGEL ????????????????eee ?? hh?hhhhhhhhhhh ?????????
3chy <- 2fcr e ? eeeeeee hhhhhhhhhhhhhhh ??????
3chy <- FLAV_ANASP ? eeeeeee hhhhhhhhhhhhhhh ??????
3chy <- FLAV_ECOLI eeeeeee hhhhhhhhhhhhhhh hhhhh
3chy <- FLAV_AZOVI ? eeeeeee hhhhhhhhhhhhhhh ????
3chy <- FLAV_ENTAG e eeeeeeee hhhhhhhhhhhhhhhh? ??????
3chy <- FLAV_CLOAB eeeeeee hhhhhhhhhh ???????????
3chy <- 3chy --------------- ----- hhhhhhhhhhhhhh ------
Consensus ---------------EEEE----- HHHHHHHHHHHHH ------
Consensus-DSSP ...............****.....****xx***************......
PHD --------------- ----- HHHHHHHHHHHHHH ------
PHD-DSSP ...............xxxx.....******************x**......
DSSP ...............EEEE.....SS HHHHHHHHHHHHHHHT ......
LumpDSSP ...............EEEE..... HHHHHHHHHHHHHHH ......
![Page 9: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/9.jpg)
Symmetry-derived secondary structure prediction (SymSSP)
• Tried over 120 different consensus weighting schemes (global, regional, positional)
• Tested on ~2700 HOMSTRAD alignments and compared to PHD: on average 0.5% better
• 60% of the alignments are improved, 20% are not affected and 20% are made worse
• Tried to correlate schemes with “cheap” a priori data (pairwise identities, sequence lengths, number of sequences, etc.)
![Page 10: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/10.jpg)
Integrating secondary structure prediction and multiple sequence alignment
• Low-key example shown of fairly homogeneous data (strings of letters in both cases)
• But already difficult to do and methods are not easily tunable
• How to scale up to knowledge-integrating and inference engines?
![Page 11: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/11.jpg)
Strategies for multiple sequence alignment
• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
![Page 12: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/12.jpg)
Globalised local alignment
• Aim: fill each cell of the DP search matrix with the score of the highest-scoring local alignment going through that cell
• Problem: Forward calculation + traceback for each local alignment is too slow
• Solution: Double dynamic programming
1. Local DP in forward and reverse direction (no traceback) + matrix summation
2. Global DP over matrix from step 1 + traceback
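The two steps can be sketched as follows. This is a simplified illustration, not the lecture's implementation: it assumes a linear gap penalty instead of separate open and extension penalties, a toy identity score in place of BLOSUM62, and it subtracts the cell's own exchange value once after summation so that it is not counted twice. All function names are hypothetical.

```python
def identity(x, y):
    """Toy exchange score standing in for a real matrix such as BLOSUM62."""
    return 2 if x == y else -1

def local_match_scores(a, b, score, gap):
    """F[i][j]: best local alignment score that ends by aligning a[i] with b[j]."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # Smith-Waterman matrix
    F = [[0] * m for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i - 1][j - 1] = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0, F[i - 1][j - 1], H[i - 1][j] - gap, H[i][j - 1] - gap)
    return F

def through_scores(a, b, score, gap):
    """Step 1: forward + reverse local DP, summed per cell (no traceback)."""
    n, m = len(a), len(b)
    fwd = local_match_scores(a, b, score, gap)
    rev = local_match_scores(a[::-1], b[::-1], score, gap)
    return [[fwd[i][j] + rev[n - 1 - i][m - 1 - j] - score(a[i], b[j])
             for j in range(m)] for i in range(n)]

def global_trace(a, b, T):
    """Step 2: global DP over T with no exchange matrix or gap penalties."""
    n, m = len(a), len(b)
    G = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1], G[i - 1][j], G[i][j - 1])
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and G[i][j] == G[i - 1][j - 1] + T[i - 1][j - 1]:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and (j == 0 or G[i][j] == G[i - 1][j]):
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))

a, b = "ACDEF", "CDE"
T = through_scores(a, b, identity, gap=2)
print(global_trace(a, b, T))
```

Only the second pass needs a traceback, which is the point of the double dynamic programming scheme: every cell gets its best "through" score from two matrix fills rather than from one full local alignment per cell.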
![Page 13: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/13.jpg)
Globalised local alignment
1. Local (SW) alignment (M + Po,e)
[forward local matrix] + [reverse local matrix] = [summed matrix]
2. Global (NW) alignment (no M or Po,e)
Double dynamic programming
![Page 14: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/14.jpg)
M = BLOSUM62, Po= 0, Pe= 0
![Page 15: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/15.jpg)
M = BLOSUM62, Po= 12, Pe= 1
![Page 16: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/16.jpg)
M = BLOSUM62, Po= 60, Pe= 5
![Page 17: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/17.jpg)
Strategies for multiple sequence alignment
• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
![Page 18: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/18.jpg)
Integrating alignment methods and alignment information with T-Coffee
• Integrating different pair-wise alignment techniques (NW, SW, ..)
• Combining different multiple alignment methods (consensus multiple alignment)
• Combining sequence alignment methods with structural alignment techniques
• Plug in user knowledge
![Page 19: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/19.jpg)
Matrix extension
T-Coffee
Tree-based Consistency Objective Function For alignmEnt Evaluation
Cedric Notredame, Des Higgins, Jaap Heringa. J. Mol. Biol., 302, 205-217; 2000
![Page 20: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/20.jpg)
Using different sources of alignment information
Sources: structure alignments, Clustal, Lalign, Manual, Dialign
T-Coffee
![Page 21: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/21.jpg)
Progressive multiple alignment
[Figure: 5 sequences → pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) → 5×5 similarity matrix → guide tree → multiple alignment]
![Page 22: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/22.jpg)
Default T-COFFEE
• Uses information from all sequences for each pair-wise alignment
• Reconciles global and local alignment information
![Page 23: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/23.jpg)
T-Coffee matrix extension
Sequence pairs: 12, 13, 14, 23, 24, 34
![Page 24: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/24.jpg)
Search matrix extension
![Page 25: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/25.jpg)
T-Coffee
• Combine different alignment techniques by adding scores:
$$W(A(x), B(y)) = \sum S(A(x), B(y))$$
– A(x) is residue x in sequence A
– summation is over the scores S of the global and local alignments containing the residue pair (A(x), B(y))
– S is sequence identity percentage of the associated alignment
• Combine direct alignment seqA-seqB with each seqA-seqI-seqB:
$$W'(A(x), B(y)) = W(A(x), B(y)) + \sum_{I \neq A,B} \min\bigl(W(A(x), I(z)),\, W(I(z), B(y))\bigr)$$
– Summation over all third sequences I other than A or B
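A small sketch of the extension step on a toy library. The data layout (residues as `(sequence, position)` tuples, a dictionary of pair weights) and the weights 88, 77 and 100 are assumptions made for illustration; they are not the T-Coffee file format or values from the lecture.

```python
from collections import defaultdict
from itertools import combinations

def extend_library(primary):
    """Add, for every residue pair, the support found via third sequences.

    `primary` maps (residue_u, residue_v) -> weight W, where a residue is a
    (sequence_name, position) tuple and the two residues come from different
    sequences.
    """
    partners = defaultdict(dict)
    for (u, v), w in primary.items():
        partners[u][v] = partners[u].get(v, 0) + w   # W sums all scores for a pair
        partners[v][u] = partners[u][v]
    extended = defaultdict(int)
    # Direct term W(A(x), B(y)).
    for u, linked in partners.items():
        for v, w in linked.items():
            if u < v:
                extended[(u, v)] += w
    # Indirect terms: z plays the role of I(z), aligned to both u and v.
    for z, linked in partners.items():
        for u, v in combinations(sorted(linked), 2):
            if u[0] != v[0]:                          # u and v in different sequences
                extended[(u, v)] += min(linked[u], linked[v])
    return dict(extended)

A1, B1, C1 = ("A", 1), ("B", 1), ("C", 1)
primary = {(A1, B1): 88, (A1, C1): 77, (B1, C1): 100}   # invented weights
for pair, weight in sorted(extend_library(primary).items()):
    print(pair, weight)
```

Here the pair A1-B1 keeps its direct weight of 88 and gains min(77, 100) = 77 through C1, so pairs that are consistent with the rest of the library end up with higher weights than their direct alignment alone would give.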
![Page 26: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/26.jpg)
T-Coffee
Direct alignment
Other sequences
![Page 27: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/27.jpg)
T-Coffee library system
Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35
![Page 28: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/28.jpg)
T-Coffee progressive alignment
MDAGSTVILCFVG
MDAASTILCGS
Amino Acid Exchange Matrix
Gap penalties (open, extension)
Search matrix
MDAGSTVILCFVG-
MDAAST-ILC--GS
![Page 29: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/29.jpg)
Kinase nucleotide binding sites
![Page 30: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/30.jpg)
Comparing T-Coffee with other methods
![Page 31: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/31.jpg)
but.....
T-COFFEE (V1.23) multiple sequence alignment
Flavodoxin-cheY
1fx1 ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESVH ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESGI ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-----
FLAV_DESSA ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-----
FLAV_DESDE ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK-----
4fxn ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK-----
FLAV_MEGEL -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK-----
FLAV_CLOAB ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL-----
2fcr -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-----
FLAV_ENTAG ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-----
FLAV_ANASP ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL-----
FLAV_AZOVI ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT-----
FLAV_ECOLI ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL-----
3chy ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV
:. . . : . ::

1fx1 ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI--------
FLAV_DESVH ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI--------
FLAV_DESGI ---------VGVFGCGDSS--YTYFCGA-VDVIEKKAEELGATLVASS---------------------LKIDGEPDSA----EVLDWAREVLARV--------
FLAV_DESSA ---------VSVFGCGDSD--YTYFCGA-VDAIEEKLEKMGAVVIGDS---------------------LKIDGDPE----RDEIVSWGSGIADKI--------
FLAV_DESDE ---------VAAFASGDQE--YEHFCGA-VPAIEERAKELGATIIAEG---------------------LKMEGDASND--PEAVASFAEDVLKQL--------
4fxn ---------VALFGS------YGWGDGKWMRDFEERMNGYGCVVVETP---------------------LIVQNEPD--EAEQDCIEFGKKIANI---------
FLAV_MEGEL ---------VGLFGS------YGWGSGEWMDAWKQRTEDTGATVIGTA---------------------IV--NEMP--DNAPECKELGEAAAKA---------
FLAV_CLOAB ---------GAAFSTANSI--AGGSDIA-LLTILNHLMVKGMLVY----SGGVAFGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF-----------
2fcr ---------VAIFGLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDG-KFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ENTAG ---------VALFGLGDQLNYSKNFVSA-MRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL-------
FLAV_ANASP ---------VAYFGTGDQIGYADNFQDA-IGILEEKISQRGGKTVGYWSTDGYDFNDSKALRNG-KFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------
FLAV_AZOVI ---------VALFGLGDQVGYPENYLDA-LGELYSFFKDRGAKIVGSWSTDGYEFESSEAVVDG-KFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----
FLAV_ECOLI ---------VALFGCGDQEDYAEYFCDA-LGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
3chy TAEAKKENIIAAAQAGASGYVVKPFT---AATLEEKLNKIFEKLGM----------------------------------------------------------
.
![Page 32: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/32.jpg)
Evaluating multiple alignments
• Conflicting standards of truth
– evolution
– structure
– function
• With orphan sequences no additional information
• Benchmarks depending on reference alignments
• Quality issue of available reference alignment databases
• Different ways to quantify agreement with reference alignment (sum-of-pairs, column score)
• “Charlie Chaplin” problem
![Page 33: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/33.jpg)
Evaluating multiple alignments
• As a standard of truth, often a reference alignment based on structural superpositioning is taken
![Page 34: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/34.jpg)
Evaluation measures
Query, Reference
Column score
Sum-of-Pairs score
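The slide names the two measures without defining them. The sketch below uses one common benchmark-style reading, which is an assumption here: the sum-of-pairs agreement is the fraction of residue pairs aligned in the reference that are also aligned in the query, and the column score is the fraction of reference columns reproduced exactly in the query. The toy alignments and function names are invented.

```python
from itertools import combinations

def columns(alignment):
    """Turn each column into a tuple of residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols

def residue_pairs(alignment):
    """Set of (seq_i, residue_i, seq_j, residue_j) aligned in some column."""
    pairs = set()
    for col in columns(alignment):
        for (i, x), (j, y) in combinations(enumerate(col), 2):
            if x is not None and y is not None:
                pairs.add((i, x, j, y))
    return pairs

def sum_of_pairs_agreement(query, reference):
    ref = residue_pairs(reference)
    return len(ref & residue_pairs(query)) / len(ref)

def column_score(query, reference):
    query_cols = set(columns(query))
    # Only reference columns that align at least two residues are counted.
    ref_cols = [c for c in columns(reference)
                if sum(x is not None for x in c) > 1]
    return sum(c in query_cols for c in ref_cols) / len(ref_cols)

reference = ["AC-D", "ACED", "A-ED"]   # invented toy alignments
query     = ["ACD-", "ACED", "A-ED"]
print(sum_of_pairs_agreement(query, reference))
print(column_score(query, reference))
```

The column score is the stricter of the two: a single misplaced residue invalidates a whole column, while the pair-based measure still credits the pairs in that column that remain correctly aligned.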
![Page 35: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/35.jpg)
Scoring a multiple alignment
Query
Sum-of-Pairs score:
• For each alignment position: take the sum of all pairs (add a.a. exchange values)
• As an option, subtract gap penalties
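A minimal sketch of this scoring. The identity-based `toy_exchange` function stands in for a real amino acid exchange matrix, and charging the optional gap penalty once per residue-gap pair is an assumption, since the slide does not say how the penalty is applied.

```python
from itertools import combinations

def toy_exchange(x, y):
    """Hypothetical exchange values: 2 for identical residues, -1 otherwise."""
    return 2 if x == y else -1

def sum_of_pairs(alignment, exchange, gap_penalty=0):
    """Sum exchange values over all residue pairs in every column."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue                    # gap-gap pairs carry no score
            if x == "-" or y == "-":
                total -= gap_penalty        # optional penalty per residue-gap pair
            else:
                total += exchange(x, y)
    return total

alignment = ["AC-D", "ACED", "A-ED"]        # invented toy alignment
print(sum_of_pairs(alignment, toy_exchange))
print(sum_of_pairs(alignment, toy_exchange, gap_penalty=1))
```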
![Page 36: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/36.jpg)
Evaluating multiple alignments
[Plot: ∆SP against BAliBASE alignment size (nseq * len)]
![Page 37: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/37.jpg)
Summary
• Weighting schemes simulating simultaneous multiple alignment
– Profile pre-processing (global/local)
– Matrix extension (well balanced scheme)
• Smoothing alignment signals
– globalised local alignment
• Using additional information
– secondary structure driven alignment
• Schemes strike balance between speed and sensitivity
![Page 38: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/38.jpg)
References
• Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.
• Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.
• Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26(5), 459-477.
![Page 39: Medical Natural Sciences Year 2: Introduction to ... · EEEEE HHHHH EEE H EEEE HHHH EE HHH EEEE HHHHH EEE H EEEE HHH EEE HH. Optimal segmentation of predicted secondary structures](https://reader036.fdocuments.us/reader036/viewer/2022063006/5fb7e50e5a74e35f7f0fa159/html5/thumbnails/39.jpg)
Where to find this: http://www.ibivu.cs.vu.nl/teaching