/120© Burkhard Rost
�1
title: Computational Biology 1 - Protein structure: Sequence alignments 2short title: cb1_alignments2
lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2016
/120© Burkhard Rost
Videos: YouTube / www.rostlab.org/talks THANKS :. EXERCISES: Special lectures: • Mikal Boden UQ Brisbane No lecture: • 04/26 Security check Rostlab (exercise WILL be) • 05/01 May Day (also no exercise) • 05/08 Student representation (SVV) - exercise WILL happen • 05/10 Ascension Day (also no exercise) • 05/22 Whitsun holiday (also no exercise) • 05/31 Corpus Christi (also no exercise) • 06/21 no lecture (but exercise) LAST lecture: bef: Jul 12 Examen: Jul 12 18-20:00 (room TBA) • Makeup: no makeup (sorry due to overload)
�2
Dmitrij Nechaev
Your Name
Lothar Richter
Michael Heinzinger
next
AnnouncementsCONTACT: [email protected]
© Michael Leunig
© Burkhard Rost (TUM Munich)
Recap: 3D comparison
�3
/120© Burkhard Rost
�4
Notation: protein structure 1D, 2D, 3DPQITLWQRPLVTIKIGGQLKEALLDTGADDTVL
PP PQQQYFFQVISSIVRLLSTLWWQEDRKQAKRRRPQPPPPPVVTKFVVLIITTKEKAALIVHYKKFIILVIEENGGGGGTGQQKRRPPLWWVVFKVEESKKVVGLGLLILLLLLVVDDDDDTTTTTGGGGGAAAAADDDDDDDAKESSTTVIIVIVVVIVL
1281757077
120238169200247114740
904
466268
11831
1241
292449726217
102691
140
1109760691481976248590
690
730
415371597395000
5851300
79586900
EEEEE
EEEEEE
EEEEEEE
EE
EEEEE
EEEEEE
EE
kcal/mol0 -1 -2 -3 -4 -5
1 10 20 30 40 50 60 70 80 90
1
10
20
30
40
50
60
70
80
90
1D1D 2D2D 3D3D
/120© Burkhard Rost
�5
Dynamic programming: Global
GGQLAKEEALEGQPVEVL
GGQLAKEEAL EGQPVEVL
GGQLAKEEAL..EGQPVEVL
/120© Burkhard Rost
�6
Dynamic programming concept no gaps
G G Q L A K E E A LE 1 1G 1 1 1 1Q 1 2 1 1P 1 2 1V 1 2E 1 2V 1 2L 1 2P 1 2
© Burkhard Rost
allowing for gaps
�7
GGQLAKEEAL GQ..PVEVL
GGQLAKEEAL GQPVEVL
no gap with gap
/120© Burkhard Rost
�8
Dynamic programming concept gaps
G G Q L A K E E A LE 1 1G 1 1 1 1Q 2 1 1P 2 1V 2E 3V 3L 4PScore = ∑ 𝛿ij - Go * Ngap
GQPVE•V•LGQLAKEEAL
GQGQ
/120© Burkhard Rost
�9
Dynamic programming: optimal alignment
€
SW = MUkTkk=1
Lali
∑ - Go ⋅ Ngap - Ge ⋅ (Lgap - Ngap )
• Global/no gap: SB Needleman and CD Wunsch 1970 J Mol Biol 48, 443-53• Local/Gap: TF Smith and MS Waterman 1981 J Mol Biol 147, 195-197
/120© Burkhard Rost
�10
Alignments: scoring matrix
S Henikoff & J Henikoff (1992) PNAS 89:10915-9
Score=∑Mij - Go*Ngap-Ge*(Lgap-Ngap)
synonyms: [scoring | substitution | exchange | log-odds]
[matrix | metric | table]
one particular: Blossum65
/120© Burkhard Rost
�11
BLAST: fast matching of single ‘words’
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTIYKLILQGRTIKAELITEGVDGATGEKVYKQYGNQNAVDAEYTYDNATRTFTITQK
Default “word” size for “seeds” = 3
#1 seed=3
#2 extendTTYKLIL AATAEKVFKQYA WTYDDATKTFTIYKLIL GATGEKVYKQYG YTYDNATRTF
? ?
/120© Burkhard Rost
the major challenge for word search algorithms is to get the statistics right
�12
Word searches: challenge: statistics
/120© Burkhard Rost
�13
Significance of match (e.g. BLAST E-values)
Hits
Score=∑Mij - Go*Ngap-
Ge*(Lgap-Ngap)
© Burkhard Rost (TUM Munich)
Sequence comparisons:
multiple alignment
�14
© Burkhard Rost (TUM Munich)
How accurate are pairwise alignments?
�15
/120© Burkhard Rost
�16
Zones
Day
light
Zon
e
Twili
ght Z
one
Mid
nigh
t Zon
eprofile - profile
sequence - profilesequence - sequence
sequ
ence
sim
ilar
->
stru
ctur
e sim
ilar
B Rost (1997) Fold Des 2:S19-24B Rost (1999) Protein Eng 12:85-94
/120© Burkhard Rost
�17
Zones
Day
light
Zon
e
Twili
ght Z
one
Mid
nigh
t Zon
eprofile - profile
sequence - profilesequence - sequence
sequ
ence
sim
ilar
->
stru
ctur
e sim
ilar
B Rost (1997) Fold Des 2:S19-24B Rost (1999) Protein Eng 12:85-94
/120© Burkhard Rost
�18
Zones
Day
light
Zon
e
Twili
ght Z
one
Mid
nigh
t Zon
eprofile - profile
sequence - profilesequence - sequence
sequ
ence
sim
ilar
->
stru
ctur
e sim
ilar
B Rost (1997) Fold Des 2:S19-24B Rost (1999) Protein Eng 12:85-94
/120© Burkhard Rost
�19
40
50
60
70
80
90
100
15 20 25 30 35
Schematic: structural similarity in Twilight zone
not true data: just to illustrate the idea of the twilight
zone
True Positives=pairs of proteins with similar structure
Percentage:accuracy-or-specificity
/120© Burkhard Rost
�20
1
10
100
1,000
15 20 25 30 35
Schematic: true and false in Twilight zone
True Positives = pairs of proteins with similar structureFalse Positives=pairs of proteins with DIFFERENT structure
Number of pairs:
/120© Burkhard Rost
�21
All-vs-all: PDB
3D = structural alignment
1D = sequence alignment
<0.2nm rmsdin between>0.5nm rmsd
SAME 3DignoreDIFFER in 3D
/120© Burkhard Rost
�22
PDB all-against-all ok?
/120© Burkhard Rost
�23
Databases biased: MUST remove bias!
/120© Burkhard Rost
�24
Databases biased: MUST remove bias!
Day
light
Zo
ne
Twili
ght
Zone
Mid
nigh
t Zo
ne
redundancy-reduced vs.
redundancy-reduced?
/120© Burkhard Rost
�25
Databases biased: MUST remove bias!
/120© Burkhard Rost
�26
Sequence conservation of protein structure
B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69
/120© Burkhard Rost
�27
Raw data: density? lesson learned?
B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69
/120© Burkhard Rost
�28
How to estimate performance from the curves?
?gro
ups
groups
groupsgroupsgroups
/120© Burkhard Rost
�29
Distance from new HSSP-curve
B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69
/120© Burkhard Rost
�30
Twilight zone = true positives explode
101
102
103
104
105
106
-15 -10 -5 0 5 10
10 15 20 25 30 35
Num
ber o
f pro
tein
pai
rs
Distance from HSSP threshold
Percentage sequence identity
0
20
40
60
80
100
50 100 150 200 250 300 350 400
%
Seq
uenc
e id
entit
y
Number of residues aligned
+25+20
+15+10
+ 5 0
- 5- 10
- 15- 20
B Rost 1999 Prot Engin 12, 85-94
/120© Burkhard Rost
�31
Twilight zone = false positives bazoom!!
101
102
103
104
105
106
-15 -10 -5 0 5 10
10 15 20 25 30 35
Num
ber o
f pro
tein
pai
rs
Distance from HSSP threshold
Percentage sequence identity
0
20
40
60
80
100
50 100 150 200 250 300 350 400
%
Seq
uenc
e id
entit
y
Number of residues aligned
+25+20
+15+10
+ 5 0
- 5- 10
- 15- 20
B Rost 1999 Prot Engin 12, 85-94
© Burkhard Rost (TUM Munich)
Multiple alignment:simple hacks
�32
/120© Burkhard Rost
Dynamic programming?for 3 sequences: O(N1 x N2 x N3)NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48)
claim: computer: up to 6 ~60 TB main memoryno quote-> unsure
�33
Multiple alignments
G G Q L A K E E A LE 1 1G 1 1 1 1Q 2 1 1P 2 1V 2E 3VL 4P
/120© Burkhard Rost
Dynamic programming?for 3 sequences: O(N1 x N2 x N3)NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48) hack 1: dynamic programming: pairwise, only space in vicinity of intersection searched n-wise – H Carrillo & DJ Lipman (1988) SIAM J Applied Math. 48: 1073-82– DJ Lipman, SF Altschul, JC Kececioglu (1989) PNAS 86:4412-5
�34
Multiple alignments
/120© Burkhard Rost
Dynamic programming?for 3 sequences: O(N1 x N2 x N3)NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48) hack 1: dynamic programming: pairwise, only space in vicinity of intersection searched n-wise – H Carrillo & DJ Lipman (1988) SIAM J Applied Math. 48: 1073-82– DJ Lipman, SF Altschul, JC Kececioglu (1989) PNAS 86:4412-5
hack 2:map to tree / pairwiseRussell Doolittle, UCSD
�35
Multiple alignments
Russell Doolittle
Shapers and Shakers
/120© Burkhard Rost
�36
Multiple alignment: progressive 1
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
/120© Burkhard Rost
�37
Multiple alignment: progressive
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
GGQLAKEEALGGQLAKDEALggqlakeeal
Step 1
Step 1
/120© Burkhard Rost
�38
Multiple alignment: progressive
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
GGQLAKEEALGGQLAKDEALggqlakeeal
Step 1
Step 2
Step 1 Step 2GGQIAKDEALGGQIAKDEAIggqiakdeal
/120© Burkhard Rost
�39
Multiple alignment: progressive 1
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
GGQLAKEEALGGQLAKDEALggqlakeeal
Step 1
Step 2
Step 1 Step 2GGQIAKDEALGGQIAKDEAIggqiakdeal
ggqlakeealggqiakdeal
Step 3
Step 3
/120© Burkhard Rost
�40
Multiple alignment: progressive 2
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
GGQLAKEEALGGQLAKDEALggqlakeeal
Step 1
Step 1
/120© Burkhard Rost
�41
Multiple alignment: progressive 2
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
GGQLAKEEALGGQLAKDEALggqlakeeal
Step 1
Step 2
Step 1 Step 2ggqlakeealGGQIAKDEAL ggqlakeeal
/120© Burkhard Rost
�42
Multiple alignment: progressive 2
A B C DA 90 80 70B 90 80C 90D
GGQLAKEEALGGQLAKDEALGGQIAKDEALGGQIAKDEAI
GGQLAKEEALGGQLAKDEALggqlakeeal
Step 1
Step 2
Step 1 Step 2ggqlakeealGGQIAKDEAL ggqlakeeal
Step 3
Step 3ggqlakeealGGQIAKDEAI
© Burkhard Rost (TUM Munich)
Profiles: concept
�43
/120© Burkhard Rost
�44
Pairwise vs. multiple: built-up
GGQQLAKEEALGGQQLAKDEAL G.GQQLAKEEAL
GGAQGLAKDEAL
/120© Burkhard Rost
�45
Pairwise vs. multiple: built-up
GGQQLAKEEALGGQQLAKDEAL G.GQQLAKEEAL
GGAQGLAKDEAL
GGQQLAKEEALGGQQLAKDEAL
GGAQGLAKDEAL
/120© Burkhard Rost
�46
Pairwise vs. multiple: built-up
GGQQLAKEEALGGQQLAKDEAL G.GQQLAKEEAL
GGAQGLAKDEAL
GGQQLAKEEALGGQQLAKDEAL
GGAQGLAKDEAL
GG.QQLAKEEALGG.QQLAKDEALGGAQGLAKDEAL
/120© Burkhard Rost
�47
Profiles profit from relation of “families”
/120© Burkhard Rost
�48
Computationally: motifs
K Robison et al (1998) JMB 284, 241-54retrieved from http://weblogo.berkeley.edu/examples.html
Transcription factor binding sites
Following slides:thanks kudos to Theresa Wirth
/120© Burkhard Rost
�49Fig. 7: M Zvelebil & JO Baum (2008) Understanding Bioinformatics, Garland
Sequence motifs
Any AAOne-letter code
Two or more possible AA Disallowed AA Repetition range X(n,m) of X
Matching sequences e.g.GLLMSACVVVGILMSAYPPGLLMSAES
Representation of a sequence with more than one possible amino acid (AA) or nuclear acid (NA) at a single position
G-[LI]-L-M-S-A-{RK}-X(1,3)
slide: Theresa Wirth
/120© Burkhard Rost
�50
Scoring matrix: generic vs. specific
slide: Theresa Wirth
Generic(here Blossum62)
Position-specific scoring matrixPSSM
/120© Burkhard Rost
• Matrix of numbers with scores for each residue or nucleotide at each position
�51
PSSM - Position specific substitution matrix: concept
PSSM: Position-specific scoring matrix
Building PSSM:
◦ Absolute frequencies◦ Add pseudo-counts if necessary◦ Relative frequencies◦ Log likelihoods
Starting point: 123456 ATGCTA ATTGCT TCTGAG GTTGAG CCATCC
slide: Theresa Wirth
/120© Burkhard Rost
�52
PSSM - Position specific substitution matrix: one solution
Absolutefrequency
A:2T:1G:1C:1
Relativefrequency
A:0.4T:0.2G:0.2C:0.2
Σ=1
1 2 3 4 5 6
A 0.4 0 0.2 0 0.4 0.2
T 0.2 0.6 0.6 0.2 0.2 0.2
G 0.2 0 0.2 0.6 0 0.4
C 0.2 0.4 0 0.2 0.4 0.2
1 2 3 4 5 6
A 0.47 ∞ -0.22 ∞ 0.47 -0.22
T -0.22 0.86 0.86 -0.22 -0.22 -0.22
G -0.22 ∞ -0.22 0.86 ∞ 0.47
C -0.22 0.47 ∞ -0.22 0.47 -0.22
Logodds
slide: Theresa Wirth
/120© Burkhard Rost
�53Olga Zhaxybayeva http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class10.html
`
slide: Theresa Wirth
E
© Burkhard Rost (TUM Munich)
PSI-Blast
Position-Specific Iterative-Basic Local Alignment Tool
�54
/120© Burkhard Rost
1. fast hashing
�55
PSI-BLAST in steps
/120© Burkhard Rost
�56
Like BLAST match ‘words’
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA
TTYKLILTTYKLIL
WTYDDATKTFWTVEKAFKTF
AATAEKVFKQYAAWTVEKAFKTFA
? ?
Default “word” size for “seeds” = 3
#1 seed=3
#2 extend
/120© Burkhard Rost
1. fast hashing 2. dynamic programming extension between
matches
�57
PSI-BLAST in steps
/120© Burkhard Rost
�58
BLAST + Smith-Waterman
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA
TTYKLILTTYKLIL
WTYDDATKTFWTVEKAFKTF
AATAEKVFKQYAAWTVEKAFKTFA
#1 seed=3
#2 extend
dynamic programming to extend
/120© Burkhard Rost
�59
BLAST + Smith-Waterman
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA
TTYKLILTTYKLIL
WTYDDATKTFWTVEKAFKTF
AATAEKVFKQYAAWTVEKAFKTFA
#1 seed=3
#2 extend
dynamic programming to extend
Why is this fast?
/120© Burkhard Rost
1. fast hashing 2. dynamic programming extension between
matches 3. compile statistics
EVAL - Expectation values
�60
PSI-BLAST in steps
Hits
/120© Burkhard Rost
1. fast hashing 2. dynamic programming extension between
matches 3. compile statistics 4. collect all pairs and build profile
�61
PSI-BLAST in steps
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
/120© Burkhard Rost
1. fast hashing 2. dynamic programming extension between
matches 3. compile statistics 4. collect all pairs and build profile 5. ?
�62
PSI-BLAST in steps
??? 1 50
fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
YDFHGVGEDDISIKRG
/120© Burkhard Rost
1. fast hashing 2. dynamic programming extension between
matches 3. compile statistics 4. collect all pairs and build profile 5. iterate
�63
PSI-BLAST in steps
/120© Burkhard Rost
�64
Sequence-profile comparison
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
YDFHGVGEDDISIKRG
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
PSI- position specific iteration
/120© Burkhard Rost
�65
Steps involved for profile-based alignments
sequence-sequence query database using substitution metric
results -> profile
/120© Burkhard Rost
�66
Steps involved for profile-based alignments
sequence-sequence query database using substitution metric
results -> profile
profile-sequence query database using PSSM/profile
results -> update profile
/120© Burkhard Rost
�67
Steps involved for profile-based alignments
sequence-sequence query database using substitution metric
results -> profile
profile-sequence query database using PSSM/profile
results -> update profile itera
te
/120© Burkhard Rost
PSI-BLAST fast, partial dynamic programmingSF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX slow, dynamic programming, for expertsJD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
�68
Sequence-profile methods
/120© Burkhard Rost
all against all (pairs) by dynamic programming (varying substitution matrices) build phylogenetic tree
�69
Clustal (ClustalW, ClustalX)
A B C DA 90 80 70B 90 80C 90D A B C D
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
/120© Burkhard Rost
PSI-BLAST fast, partial dynamic programmingSF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX slow, dynamic programming, for expertsJD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
MaxHom relatively slow, dynamic programming, good first guessC Sander & R Schneider (1991) Proteins 9:56-69
�70
Sequence-profile methods
/120© Burkhard Rost
PSI-BLAST fast, partial dynamic programmingSF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX slow, dynamic programming, for expertsJD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
SAM/HMMer slow, need preprocess, HMM (statistics), very accurateR Hughey & A Krogh (1996) CABIOS 12:95-107 S Eddy (1998) Bioinformatics 14:755-63
�71
Sequence-profile methods
/120© Burkhard Rost
�72
Zones
Day
light
Zon
e
Twili
ght Z
one
Mid
nigh
t Zon
e
profile - profilesequence - profile
sequence - sequence
sequ
ence
sim
ilar
->
stru
ctur
e sim
ilar
B Rost (1997) Fold Des 2:S19-24B Rost (1999) Protein Eng 12:85-94
© Burkhard Rost (TUM Munich)
another way to do math: HMM - Hidden Markov
Models�73
/120© Burkhard Rost
�74SR Eddy (1998) Bioinformatics 755-63
HMM: Hidden Markov Models
Fig. 1. A toy HMM, modeling sequences of as and bs as two regions of potentially different residue composition. The model is drawn (top) with circles for states and arrows for state transitions. A possible state sequence generated from the model is shown, followed by a possible symbol sequence. The joint probability P(x,π|HMM) of the symbol sequence and the state sequence is a product of all the transition and emission probabilities. Notice that another state sequence (1-2-2) could have generated the same symbol sequence, though probably with a different total probability. This is the distinction between HMMs and a standard Markov model with nothing to hide: in an HMM, the state sequence (e.g. the biologically meaningful alignment) is not uniquely determined by the observed symbol sequence, but must be inferred probabilistically from it.
/120© Burkhard Rost
�75
ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures
Date Status Home Score Away Attendance Competition
Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga
Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga
Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga
Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga
Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga
Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga
Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga
Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga
Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga
Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga
Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga
Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga
slide: Marco Punta
/120© Burkhard Rost
�76
ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures
Date Status Home Score Away Attendance Competition
Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga
Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga
Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga
Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga
Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga
Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga
Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga
Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga
Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga
Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga
Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga
Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga
slide: Marco Punta
W
L
D
WW
W
L
LLL
L
L
/120© Burkhard Rost
�77
ProfileHMM: example football
slide: Marco Punta
W D LProbabilistic
W D L
/120© Burkhard Rost
�78
ProfileHMM: example football
slide: Marco Punta
Our model for Augsburg’s Bundesliga results:
3 states: W, D, LS(t)=F(S(t-1))States S connected by probabilities
pij≥0; j
pij=1Σ
/120© Burkhard Rost
�79
ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures
Date Status Home Score Away Attendance Competition
Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga
Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga
Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga
Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga
Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga
Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga
Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga
Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga
Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga
Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga
Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga
Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga
slide: Marco Punta
W
L
D
WW
W
L
LLL
L
L
/120© Burkhard Rost
�80
ProfileHMM: example footballFC Augsburg 2013/14 Bundesliga Fixtures
Date Status Home Score Away Attendance Competition
Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga
Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga
Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga
Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga
Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga
Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga
Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga
Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga
Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga
Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga
Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga
Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga
slide: Marco Punta
W
L
D
WW
W
L
LLL
L
L
H
A
A
H
H
H
A
H
A
H
A
A
/120© Burkhard Rost
�81
ProfileHMM: example football
slide: Marco Punta
W D L
H A
/120© Burkhard Rost
�82
ProfileHMM: formalism
slide: Marco Punta
HMMs are probabilistic models defined by:
! A finite set S of states! A discrete alphabet A of symbols (observed objects)! A probability transition matrix T=(tij) , i,j states! A probability emission matrix E=(eix), i state, x symbol
… …
states
symbols
tij tij tij tij
eix eix eix eix
/120© Burkhard Rost
�83
ProfileHMM - intro
Profile-HMMs are probabilistic models…
Model -> simulates a system Probabilistic -> produces outcomes based on probabilities
/120© Burkhard Rost
�84
ProfileHMM: example 2: traffic light
R Y G
/120© Burkhard Rost
�85
ProfileHMM: example 2: traffic light
R Y G
Deterministic
/120© Burkhard Rost
�86
ProfileHMM: example 3: weather
/120© Burkhard Rost
�87
ProfileHMM: example 3: weather
S C R
/120© Burkhard Rost
�88
ProfileHMM: example 3: weather
S C R
Probabilistic
*In fact, chaotic, deterministic
*
/120© Burkhard Rost
�89
ProfileHMM: example 3: weather
Probabilistic
pijTransition probability from symbol i to symbol j
*
*In fact, chaotic, deterministic
/120© Burkhard Rost
�90
Probabilistic, 1st order Markov model
Example 2: The weather
P(Xn+1=x|X1=x1,…,Xn=xn)=P(Xn+1=x|Xn=xn)
© Burkhard Rost (TUM Munich)
HMM for alignment
�91
/120© Burkhard Rost
�92M Zvelebil & JO Baum (2008) Understanding Bioinformatics, Garland
Generic Profile-HMM for alignment
M1 M2 M3 M4
I0 I1 I3I2 I4
D1D4D3D2
START END
◦ Captures matches, insertions and deletions◦ Transition and emission probabilites◦ Gap penalty handled by variation of transition probabilities◦ Calculation of probability by multiplication of path variables
slide: Theresa Wirth
/120© Burkhard Rost
consider residue position i BEFORE any amino acid is aligned,
�93
Entropy in alignment
0
1
2
bits
sav
ed
1 2CPQNDHWERKTMSGAVIYLF
3HCYMFPQRDEGNKILVAST
4CMYFPIVLHQARETKGSDN
5HQNPDCYREGKMSFTALIV
6MYCHFPI
QRVLDEKNGTAS
7WHQYPMERDNKFGTSIVLAC
8HCYMFPQRDEGNKILVAST
9HCYMFPQRDEGNKILVAST
10
MYCHFPI
QRVLDEKNGTAS
11
MFYHIPVTNDLGQSAERK
12
MFYIH
PVGLTNRSAQKDE
13
WHQYPMERDNKFGTSIVLAC
14
PCHNDMQERKTSVIA
GLFYW
15
MYCHFPI
QRVLDEKNGTAS
16
HQNPDCYREGKMSFTALIV
17
WHQYPMERDNKFGTSIVLAC
18
FYMPIHVGTNLDSARKEQ
19
CMYFPHIVDGTNSALQEKR
20
WHNPDCQEGYRSKTAMFVIL
21
CMPIVTFDGLQSAKRENYH
22
CMYFPIVLHQARETKGSDN
23
HCYMFPQRDEGNKILVAST
24
MYCHFPI
QRVLDEKNGTAS
25
CMYFPHIVDGTNSALQEKR
26
WMCYHFPQI
RVELTDKNSAG
27
MFYHIPVTNDLGQSAERK
28
WHQYPMERDNKFGTSIVLAC
29
CHPDNYEGQRSKTFAVIML
30
CMYFPIVLHQARETKGSDN
31
MFYHIPVTNDLGQSAERK
32
MFYHIPVTNDLGQSAERK
33
WHQYPMERDNKFGTSIVLAC
34
CMYFPHIVDGTNSALQEKR
35
WHQYPMERDNKFGTSIVLAC
36
CPMDQNGEWRTKSAIHVLFY
37MYCHFPI
QRVLDEKNGTAS
© Kevin Karplus UCSC
/120© Burkhard Rost
consider residue position iBEFORE any amino acid is aligned, we expect a particular acid according to some prior or background probability, P0, with entropy H0
�94
Entropy in alignment
© Kevin Karplus UCSC
0
1
2
bits
sav
ed
1 2CPQNDHWERKTMSGAVIYLF
3HCYMFPQRDEGNKILVAST
4CMYFPIVLHQARETKGSDN
5HQNPDCYREGKMSFTALIV
6MYCHFPI
QRVLDEKNGTAS
7WHQYPMERDNKFGTSIVLAC
8HCYMFPQRDEGNKILVAST
9HCYMFPQRDEGNKILVAST
10
MYCHFPI
QRVLDEKNGTAS
11
MFYHIPVTNDLGQSAERK
12
MFYIH
PVGLTNRSAQKDE
13
WHQYPMERDNKFGTSIVLAC
14
PCHNDMQERKTSVIA
GLFYW
15
MYCHFPI
QRVLDEKNGTAS
16
HQNPDCYREGKMSFTALIV
17
WHQYPMERDNKFGTSIVLAC
18
FYMPIHVGTNLDSARKEQ
19
CMYFPHIVDGTNSALQEKR
20
WHNPDCQEGYRSKTAMFVIL
21
CMPIVTFDGLQSAKRENYH
22
CMYFPIVLHQARETKGSDN
23
HCYMFPQRDEGNKILVAST
24
MYCHFPI
QRVLDEKNGTAS
25
CMYFPHIVDGTNSALQEKR
26
WMCYHFPQI
RVELTDKNSAG
27
MFYHIPVTNDLGQSAERK
28
WHQYPMERDNKFGTSIVLAC
29
CHPDNYEGQRSKTFAVIML
30
CMYFPIVLHQARETKGSDN
31
MFYHIPVTNDLGQSAERK
32
MFYHIPVTNDLGQSAERK
33
WHQYPMERDNKFGTSIVLAC
34
CMYFPHIVDGTNSALQEKR
35
WHQYPMERDNKFGTSIVLAC
36
CPMDQNGEWRTKSAIHVLFY
37MYCHFPI
QRVLDEKNGTAS
/120© Burkhard Rost
consider residue position iBEFORE any amino acid is aligned, we expect a particular acid according to some prior or background probability, P0, with entropy H0 now consider same column AFTER alignmentposterior probability Pi + priors -> Hi if conserved: Hi -> 0; if varied: Hi -> H0 Hi-H0 reflects the “bits saved” by the alignment
�95
Entropy in alignment
© Kevin Karplus UCSC
0
1
2
bits
sav
ed
1 2CPQNDHWERKTMSGAVIYLF
3HCYMFPQRDEGNKILVAST
4CMYFPIVLHQARETKGSDN
5HQNPDCYREGKMSFTALIV
6MYCHFPI
QRVLDEKNGTAS
7WHQYPMERDNKFGTSIVLAC
8HCYMFPQRDEGNKILVAST
9HCYMFPQRDEGNKILVAST
10
MYCHFPI
QRVLDEKNGTAS
11
MFYHIPVTNDLGQSAERK
12
MFYIH
PVGLTNRSAQKDE
13
WHQYPMERDNKFGTSIVLAC
14
PCHNDMQERKTSVIA
GLFYW
15
MYCHFPI
QRVLDEKNGTAS
16
HQNPDCYREGKMSFTALIV
17
WHQYPMERDNKFGTSIVLAC
18
FYMPIHVGTNLDSARKEQ
19
CMYFPHIVDGTNSALQEKR
20
WHNPDCQEGYRSKTAMFVIL
21
CMPIVTFDGLQSAKRENYH
22
CMYFPIVLHQARETKGSDN
23
HCYMFPQRDEGNKILVAST
24
MYCHFPI
QRVLDEKNGTAS
25
CMYFPHIVDGTNSALQEKR
26
WMCYHFPQI
RVELTDKNSAG
27
MFYHIPVTNDLGQSAERK
28
WHQYPMERDNKFGTSIVLAC
29
CHPDNYEGQRSKTFAVIML
30
CMYFPIVLHQARETKGSDN
31
MFYHIPVTNDLGQSAERK
32
MFYHIPVTNDLGQSAERK
33
WHQYPMERDNKFGTSIVLAC
34
CMYFPHIVDGTNSALQEKR
35
WHQYPMERDNKFGTSIVLAC
36
CPMDQNGEWRTKSAIHVLFY
37MYCHFPI
QRVLDEKNGTAS
0
1
2
bits
sav
ed
1 2CPQNDHWERKTMSGAVIYLF
3HCYMFPQRDEGNKILVAST
4CMYFPIVLHQARETKGSDN
5HQNPDCYREGKMSFTALIV
6MYCHFPI
QRVLDEKNGTAS
7WHQYPMERDNKFGTSIVLAC
8HCYMFPQRDEGNKILVAST
9HCYMFPQRDEGNKILVAST
10
MYCHFPI
QRVLDEKNGTAS
11
MFYHIPVTNDLGQSAERK
12
MFYIH
PVGLTNRSAQKDE
13
WHQYPMERDNKFGTSIVLAC
14
PCHNDMQERKTSVIA
GLFYW
15
MYCHFPI
QRVLDEKNGTAS
16
HQNPDCYREGKMSFTALIV
17
WHQYPMERDNKFGTSIVLAC
18
FYMPIHVGTNLDSARKEQ
19
CMYFPHIVDGTNSALQEKR
20
WHNPDCQEGYRSKTAMFVIL
21
CMPIVTFDGLQSAKRENYH
22
CMYFPIVLHQARETKGSDN
23
HCYMFPQRDEGNKILVAST
24
MYCHFPI
QRVLDEKNGTAS
25
CMYFPHIVDGTNSALQEKR
26
WMCYHFPQI
RVELTDKNSAG
27
MFYHIPVTNDLGQSAERK
28
WHQYPMERDNKFGTSIVLAC
29
CHPDNYEGQRSKTFAVIML
30CMYFPIVLHQARETKGSDN
31MFYHIPVTNDLGQSAERK
32MFYHIPVTNDLGQSAERK
33WHQYPMERDNKFGTSIVLAC
34
CMYFPHIVDGTNSALQEKR
35
WHQYPMERDNKFGTSIVLAC
36
CPMDQNGEWRTKSAIHVLFY
37
MYCHFPI
QRVLDEKNGTAS
/120© Burkhard Rost
• few members / little divergence entropy dominated by priors-> the background signal dominates
�96
0
1
2
bits
sav
ed
1 2CPQNDHWERKTMSGAVIYLF
3HCYMFPQRDEGNKILVAST
4CMYFPIVLHQARETKGSDN
5HQNPDCYREGKMSFTALIV
6MYCHFPI
QRVLDEKNGTAS
7WHQYPMERDNKFGTSIVLAC
8HCYMFPQRDEGNKILVAST
9HCYMFPQRDEGNKILVAST
10
MYCHFPI
QRVLDEKNGTAS
11
MFYHIPVTNDLGQSAERK
12
MFYIH
PVGLTNRSAQKDE
13
WHQYPMERDNKFGTSIVLAC
14
PCHNDMQERKTSVIA
GLFYW
15
MYCHFPI
QRVLDEKNGTAS
16
HQNPDCYREGKMSFTALIV
17
WHQYPMERDNKFGTSIVLAC
18
FYMPIHVGTNLDSARKEQ
19
CMYFPHIVDGTNSALQEKR
20
WHNPDCQEGYRSKTAMFVIL
21
CMPIVTFDGLQSAKRENYH
22
CMYFPIVLHQARETKGSDN
23
HCYMFPQRDEGNKILVAST
24
MYCHFPI
QRVLDEKNGTAS
25
CMYFPHIVDGTNSALQEKR
26
WMCYHFPQI
RVELTDKNSAG
27
MFYHIPVTNDLGQSAERK
28
WHQYPMERDNKFGTSIVLAC
29
CHPDNYEGQRSKTFAVIML
30
CMYFPIVLHQARETKGSDN
31
MFYHIPVTNDLGQSAERK
32
MFYHIPVTNDLGQSAERK
33
WHQYPMERDNKFGTSIVLAC
34CMYFPHIVDGTNSALQEKR
35WHQYPMERDNKFGTSIVLAC
36CPMDQNGEWRTKSAIHVLFY
37MYCHFPI
QRVLDEKNGTAS
Alignment entropy for small families
0
1
2
bits
sav
ed
1MHYFQPINRGDKELVAST
2MFYHIV
PLNGTQDSAERK
3CMYHFI
QNRVTLDGKESAP
4CMPQDNGHWERKTSIAVLFY
5MFYHI
PVNGDTLQSAEKR
6WMCFYIHVQLPRTKENDSAG
7MIPFVQTYGNLDARSEKH
8MFYHI
PVNGDTLQSAEKR
9CDQNPHGEKRMWSTAVIYLF
10
MHYFQPINRGDKELVAST
11
MFYHIV
PLNGTQDSAERK
12
MFYHIVLPGNTRQSAKDE
13
FIYHVPLQRATKESGDN
14
WHCNDQGPYRMKESFTALIV
15
MFYHI
PVNGDTLQSAEKR
16
WHCNDQGPREKYSMTFAVLI
17
CWHNDQGPERSKYTMAFVIL
18
MFYHIVLPGNTRQSAKDE
19
MFHYIQVLPRNKDEGTAS
20
CPMQHDNKEGRTSAIVLYFW
21
CDQNPHGEKRMWSTAVIYLF
22
HCMYFNQRPDIKTELSGVA
23
MFYHIV
PLNGTQDSAERK
24
FIYHVPLQRATKESGDN
25
WHCNDQGPREKYSMTFAVLI
26
MFYHIVLPGNTRQSAKDE
27
FIYHVPLQRATKESGDN
28
CMYHFI
QNRVTLDGKESAP
29
CMPQDNGHWERKTSIAVLFY
30CWHNDQGPERSKYTMAFVIL
31MFIYHVLPRQTAKGSNED
32MHYFQPINRGDKELVAST
33
MFYHIV
PLNGTQDSAERK
34
WMCFYIHVQLPRTKENDSAG
35
CWHNDQGPERSKYTMAFVIL
36
MFYHIVLPGNTRQSAKDE
37
FIYHVPLQRATKESGDN
38
CWHNDQGPERSKYTMAFVIL
39
CHNDPGQYREKSTFAIVLM
40
MFYHIV
PLNGTQDSAERK
41
FIYHVPLQRATKESGDN
42
MHYFQPINRGDKELVAST
0
1
2
3
4
bits
sav
ed
1HVLPTNGQASDREK
2LDPATNSEQGRK 3IYVHPL
TANGDSQEKR
4YIVPHDTLSANGEQKR
5PGK 6FMYIPVHDGTNLSAEQKR
7GYEFKRQSIAVLT
8SIVTAL 9PQGMNEKRS
HWTAVLIYF
10
LHQRAGPNKDEST
11
PGQNTADERSK
12
ASE
13
FPYGNIDRSTLVKEAQ
14
QEYFSKARIVTL
15
QEVAL
16
QFKSRTIVLAE
17
CTYAFVMIL
18
HIPGVLTSNADQRKE
19
HPLGTQDNSAERK
20
AVLE
21
RTSHAMVIWLYF
22
AL
23
REAVKFL
24
PRAGQKEHDTSN
25
AIVL
26
LDTRASEK
27 28
EANSQKRP
29
MPDQGHENSI
RATVFLKY
30
NGQEKRYSMTFAVIPL
31
HPQRKAEGNDTS
32
SEIAVLR
33
RSAE
34
PMIGNLVSTADQKRE
35
DPHGMNYFQESTAVILKR
36
KRTVAILE
37
GVNTDLRSKAQE
38
CKSYTAFMVIL
39
HYFNDQCPRMKEILTVGSA
40
TDNVLSQRAEK
41
VEKASL
42
NPGQEKCRYMSFAIVTL
© Kevin Karplus UCSC
/120© Burkhard Rost
�97
SAM-T98: Build alignment
© Kevin Karplus UCSC
Reestimate the alignment with the new homologs
Use the model to search for additional homologs
Build a model from the sequence or alignment
SAM-T98 Alignment Building
(Iterations 1 - 3)
Start: a single sequence
End: a SAM-T98 alignment(Iteration 4)
/120© Burkhard Rost
PSI-BLAST fast, partial dynamic programmingSF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX slow, dynamic programming, for expertsJD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
SAM/HMMer slow, need preprocess, HMM (statistics), very accurateR Hughey & A Krogh (1996) CABIOS 12:95-107/ S Eddy (1998) Bioinformatics 14:755-63
T-Coffee much slower, requires preprocessing, Genetic AlgorithmCedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17
�98
Sequence-profile methods
© Burkhard Rost (TUM Munich)
Genetic Algorithm for alignment
�99
/120© Burkhard Rost
�100
Independence assumption
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
dynamic programming(smith-waterman) PSI-BLAST sequence-sequence
orsequence-profile
SAMHMMer
/120© Burkhard Rost
�101
Independence assumption
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
dynamic programming(smith-waterman) PSI-BLAST sequence-sequence
orsequence-profile
SAMHMMer
ALL assume that alignment at position i independent of alignment at position j
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
E E R E E D
K K K E E R
© Burkhard Rost
Genetic algorithm does not make the
independence assumption
�102
© Burkhard Rost
Genetic algorithm operates on segments
�103
/120© Burkhard Rost
�104
Genetic algorithm - concept
The 2006 NASA ST5 spacecraft antenna. This complicated shape was found through a genetic algorithm optimizing the radiation
pattern. It is known as an Evolved antenna.
© Wikipedia
/120© Burkhard Rost
�105
Genetic algorithm - concept
The 2006 NASA ST5 spacecraft antenna.
This complicated shape was found through a genetic algorithm optimizing the
radiation pattern. It is known as an Evolved
antenna.
© W
ikipe
dia
Initialize population
Compute fitness
STOP? • roulette parents
• cross-over to children
• mutate children• compute fitness
START
END
Crossoverparents
child
mutation
/120© Burkhard Rost
Begin with “library” of local and global pairwise alignments
�106© Cedric Notredame CRG
T-Coffee Genetic algorithm (GA)
/120© Burkhard Rost
Local Alignment Global Alignment
Extension
Multiple Sequence Alignment
�107
T-Coffee: Mix local and global alignment
© Cedric Notredame, CRG Barcelona
/120© Burkhard Rost
Local Alignment Global Alignment
Multiple Sequence Alignment
Multiple Alignment
StructuralSpecialist
�108
T-Coffee: Use more information
© Cedric Notredame, CRG Barcelona
/120© Burkhard Rost
PSI-BLAST fast, partial dynamic programmingSF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX slow, dynamic programming, for expertsJD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
MaxHom relatively slow, dynamic programming, good first guessC Sander & R Schneider (1991) Proteins 9:56-69
T-Coffee much slower, requires preprocessing, Genetic AlgorithmCedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17
SSEARCH/PSI-Search SSEARCH - similar to MaxHom with SW onlyPSI-Search: iterated SSEARCHW Liu, H McWilliam, M Goujon, A Cowley, R Lopez, WR Pearson (2013) Bioinformatics 28:1650-1
�109
Sequence-profile methods
/120© Burkhard Rost
�110
Sequence-profile comparison
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
YDFHGVGEDDISIKRG
/120© Burkhard Rost
�111
Zones
Day
light
Zon
e
Twili
ght Z
one
Mid
nigh
t Zon
e
profile - profilesequence - profile
sequence - sequence
sequ
ence
sim
ilar
->
stru
ctur
e sim
ilar
B Rost (1997) Fold Des 2:S19-24B Rost (1999) Protein Eng 12:85-94
/120© Burkhard Rost
pairwise multiple sequence-profile ?
�112
evolution of alignment methods
© Burkhard Rost (TUM Munich)
Profile-profile alignments
�113
/120© Burkhard Rost
�114
Profile-profile comparison
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
/120© Burkhard Rost
complicated: too many free parameters
�115
Problems with profile-profile
© Burkhard Rost (TUM Munich)
Profile-profile modern: HHblits
�116
/120© Burkhard Rost
�117
HHblits: pipeline
M Remmert et al & J Söding (2011) Nat Methods 9:173-5
© Wikipedia
/120© Burkhard Rost
PSI-BLAST fast, partial dynamic programmingSF Altschul (1997) NAR 25:3389-3402
ClustalW/ClustalX slow, dynamic programming, for experts: all in family treated equalJD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
MaxHom/SSEARCH/PSI-Search relatively slow, dynamic programming, good first guess: query is specialC Sander & R Schneider (1991) Proteins 9:56-69, W Liu, H McWilliam, M Goujon, A Cowley, R Lopez, WR Pearson (2013) Bioinformatics 28:1650-1
SAM/HMMer slow, need preprocess, HMM (statistics), very accurateR Hughey & A Krogh (1996) CABIOS 12:95-107/ S Eddy (1998) Bioinformatics 14:755-63
T-Coffee much slower, requires preprocessing, Genetic AlgorithmCedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17
HHBlits SSEARCH - similar to MaxHom with SW onlyPSI-Search: iterated SSEARCHM Remmert et al & J Söding (2011) Nat Methods 9:173-5
�118
Sequence-profile methods
/120© Burkhard Rost
�119
Zones
Day
light
Zon
e
Twili
ght Z
one
Mid
nigh
t Zon
e
profile - profilesequence - profile
sequence - sequence
sequ
ence
sim
ilar
->
stru
ctur
e sim
ilar
B Rost (1997) Fold Des 2:S19-24B Rost (1999) Protein Eng 12:85-94
/120© Burkhard Rost
01: 04/10 Tue: No lecture 02: 04/12 Thu: No lecture 03: 04/17 Tue: No lecture 04: 04/19 Thu: Intro 1: organization of lecture: intro into cells & biology 05: 04/24 Tue: Intro 2: amino acids, protein structure (comparison), domains 06: 04/26 Thu: No lecture 07: 05/01 Tue: SKIP: May Day 08: 05/03 Thu: Alignment 1 09: 05/08 Tue: SKIP: Student Representation (SVV) 10: 05/10 Thu: SKIP: Ascension Day 11: 05/15 Tue: Alignment 2 12: 05/17 Thu: Comparative modeling & exp structure determination & secondary structure assignment 13: 05/22 Tue: SKIP: Whitsun holiday 14: 05/24 Thu: 1D: Secondary structure prediction 1 15: 05/29 Tue: 1D: Secondary structure prediction 3 16: 05/31 Thu: SKIP: Corpus Christi 17: 06/05 Tue: 1D: Transmembrane structure prediction 1 18: 06/07 Thu: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction 19: 06/12 Tue: 1D: Transmembrane structure prediction 3 / Solvent accessibility prediction 20: 06/14 Thu: 1D: Disorder prediction 21: 06/19 Tue: 2D prediction / 3D prediction 22: 06/21 Thu: No lecture 23: 06/26 Tue: recap 1 24: 06/28 Thu: recap 2 25: 07/03 Tue: TBA 26: 07/05 Thu: TBA 27: 07/10 Tue: TBA 28: 07/12 Thu: TBA
�120
Lecture plan (CB1 structure: INF)
today
Top Related