Gathering Sequences: BLAST - T-Coffee Selecting Diverse Sequences (Opus II) Selecting Diverse...
-
Gathering Sequences: BLAST
Common Mistake: Sequences Too Closely Related
PRVA_MACFU SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_HUMAN SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE
PRVA_GERSP SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE
PRVA_RAT SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_RABIT AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE
:**::*.*******:***:* :****************..::******:***********
PRVA_MACFU DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES
:*** ******.******.**** *:************.:******:**
-IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE MULTIPLE SEQUENCE ALIGNMENT
-MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY…
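One way to spot this mistake before aligning further is to compute pairwise percent identity on the existing alignment. The sketch below is only an illustration: the function names and the 90% cutoff are assumptions, not part of the original slides, and identity is counted over columns where neither sequence has a gap.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def too_similar(alignment, threshold=90.0):
    """List (name1, name2, identity) for pairs at or above the (assumed) cutoff."""
    hits = []
    for n1, n2 in combinations(alignment, 2):
        pid = percent_identity(alignment[n1], alignment[n2])
        if pid >= threshold:
            hits.append((n1, n2, pid))
    return hits

# First alignment block of two of the parvalbumins shown above.
alignment = {
    "PRVA_MACFU": "SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE",
    "PRVA_HUMAN": "SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE",
}
for n1, n2, pid in too_similar(alignment):
    print(f"{n1} vs {n2}: {pid:.1f}% identical")
```

Pairs reported here add little to the alignment; dropping one member of each pair leaves room for more distant sequences.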
-
Selecting Diverse Sequences (Opus I)
Respect Information!
-This Alignment Is Not Informative about the Relation Between TPCC_MOUSE and the Rest of the Sequences.
-A Better Spread of the Sequences Is Needed
PRVA_MACFU ------------------------------------------SMTDLLN----AEDIKKA
PRVA_HUMAN ------------------------------------------SMTDLLN----AEDIKKA
PRVA_GERSP ------------------------------------------SMTDLLS----AEDIKKA
PRVA_MOUSE ------------------------------------------SMTDVLS----AEDIKKA
PRVA_RAT ------------------------------------------SMTDLLS----AEDIKKA
PRVA_RABIT ------------------------------------------AMTELLN----AEDIKKA
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
: :*. .*::::
PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_HUMAN VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI
PRVA_GERSP IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI
PRVA_RAT IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI
PRVA_RABIT IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
:. . * .*..:*: *: * *. :::..:*:::**: .*:*: :** :
PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_HUMAN LKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_GERSP LKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES-
PRVA_MOUSE LKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES-
PRVA_RAT LKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES-
PRVA_RABIT LKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES-
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
*: . .. :: .: : *: ***:.**:*. :** ::
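A better spread can be approximated by filtering out near-duplicates. The following is a minimal greedy sketch, assuming the sequences are already aligned; `select_diverse` and the 80% ceiling are hypothetical choices for illustration, and the result depends on input order.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def select_diverse(alignment, max_identity=80.0):
    """Greedily keep a sequence only if it is not too close to any kept one."""
    kept = []
    for name, seq in alignment.items():
        if all(percent_identity(seq, alignment[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept

toy = {
    "a": "AAAAAAAAAA",
    "b": "AAAAAAAAAC",  # 90% identical to "a": dropped
    "c": "CCCCCAAAAA",  # 50% identical to "a": kept
}
print(select_diverse(toy))
```

Putting the sequence of interest first guarantees it is kept; the remaining names are then a non-redundant subset relative to it.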
-
Selecting Diverse Sequences (Opus II)
PRVB_CYPCA -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE
PRVB_BOACO -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE
PRV1_SALSA MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE
PRVB_LATCH -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE
PRVB_RANES -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE
PRVA_MACFU -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE
PRVA_ESOLU --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE
: *: .: . .* .:*. * ** *: * : * :* * **:**
PRVB_CYPCA EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-
PRVB_BOACO EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
PRV1_SALSA VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-
PRVB_LATCH DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-
PRVB_RANES QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_ESOLU EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA
:** .*:.* .* *: ** :: .* **** **::** **
-A REASONABLE Model Now Exists.
-Going Further: Remote Homologues.
-
Aligning Remote Homologues
PRVA_MACFU ------------------------------------------SMTDLLNA----EDIKKA
PRVA_ESOLU -------------------------------------------AKDLLKA----DDIKKA
PRVB_CYPCA ------------------------------------------AFAGVLND----ADIAAA
PRVB_BOACO ------------------------------------------AFAGILSD----ADIAAG
PRV1_SALSA -----------------------------------------MACAHLCKE----ADIKTA
PRVB_LATCH ------------------------------------------AVAKLLAA----ADVTAA
PRVB_RANES ------------------------------------------SITDIVSE----KDIDAA
TPCS_RABIT -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCS_PIG -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
: ::
PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_ESOLU LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV
PRVB_CYPCA LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
PRVB_LATCH LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF
PRVB_RANES LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
: . .: .. . *: * : * :* : .*:*: :** .
PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_ESOLU LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA-
PRVB_CYPCA LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--
PRVB_LATCH LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA--
PRVB_RANES LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA--
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCS_PIG FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
:: .. :: : :: .* :.** *. :** ::
Going Further…
PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
TPC_PATYE SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI
. : .. . :: . : * :* : .* *. : * .
PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG--
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ---
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCS_PIG FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE-
TPC_PATYE LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA
: . :: : :: * :..* :. :** ::
-
WHAT MAKES A GOOD ALIGNMENT…
-THE MORE DIVERGENT THE SEQUENCES, THE BETTER
-THE FEWER INDELS, THE BETTER
-NICE UNGAPPED BLOCKS SEPARATED WITH INDELS
-DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK:
•Completely Conserved
•Conserved For Size and Hydropathy
•Conserved For Size or Hydropathy
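The three residue classes correspond to the `*`, `:` and `.` symbols printed under the alignments in these slides. The sketch below regenerates such a line. The group lists are the "strong" and "weak" residue groups as recalled from the Clustal documentation, and mapping them to "size and hydropathy" versus "size or hydropathy" is an assumption made here; check both against your aligner's documentation.

```python
# Residue groups as recalled from Clustal documentation (treat as an assumption).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                      # any gap: no conservation symbol
    if len(residues) == 1:
        return "*"                      # completely conserved
    if any(residues <= set(group) for group in STRONG):
        return ":"                      # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."                      # all residues fall in one weak group
    return " "

def consensus_line(rows):
    """Build the symbol line for a list of equal-length aligned rows."""
    return "".join(column_symbol("".join(col)) for col in zip(*rows))

print(repr(consensus_line(["ASC-", "ASSW", "ATAW"])))
```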
-THE ULTIMATE EVALUATION IS A MATTER OF PERSONAL JUDGEMENT AND KNOWLEDGE.
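Judgement stays with the reader, but the "ungapped blocks separated with indels" criterion can at least be listed mechanically before inspecting an alignment by eye. This is a minimal sketch; `ungapped_blocks` and its `min_length` parameter are hypothetical names chosen for illustration.

```python
def ungapped_blocks(rows, min_length=5):
    """Return half-open (start, end) column ranges in which no row has a gap."""
    blocks = []
    start = None
    ncols = len(rows[0])
    for i in range(ncols + 1):
        # The extra iteration at i == ncols closes a block that runs to the end.
        gap_free = i < ncols and all(row[i] != "-" for row in rows)
        if gap_free and start is None:
            start = i
        elif not gap_free and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks

rows = ["ACDEF--GHIKL",
        "ACDEFMNGH-KL"]
print(ungapped_blocks(rows, min_length=2))
```

Many short blocks broken up by single-column gaps are a hint that the alignment deserves a closer look.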
DO NOT OVERTUNE!!!
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: .. . : . . * . *: *
chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :
DO NOT PLAY WITH PARAMETERS IF YOU KNOW THE ALIGNMENT YOU WANT: MAKE IT YOURSELF!
chite ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. :*: .: .. . : . . * . *: *
chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :
-
TUNING or NOT TUNING?
-MOST METHODS ARE TUNED FOR WORKING WELL ON AVERAGE
-PARAMETER BEHAVIOUR DOES NOT NECESSARILY FOLLOW THE THEORY (e.g. Substitution Matrices).
-A GOOD ALIGNMENT IS USUALLY ROBUST (i.e. Changes little).
-TUNE IF YOU WANT TO CONVINCE YOURSELF.
-PARAMETERS TO TUNE USUALLY INCLUDE:
•GOP / GEP
•MATRIX
•SENSITIVITY Vs SPEED
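GOP and GEP are the gap opening and gap extension penalties. A short sketch of how they combine under an affine scheme follows. The convention used here (cost = GOP + GEP x gap length) and the numeric defaults are assumptions for illustration; some programs charge GEP for only length - 1 positions or treat terminal gaps differently.

```python
import re

def affine_gap_cost(length, gop=10.0, gep=0.5):
    """Cost of one gap of `length` columns: one opening plus per-column extension."""
    if length <= 0:
        return 0.0
    return gop + gep * length

def total_gap_cost(aligned, gop=10.0, gep=0.5):
    """Sum the affine cost of every run of '-' in one aligned sequence."""
    return sum(affine_gap_cost(len(run), gop, gep)
               for run in re.findall(r"-+", aligned))

# One gap of four columns is far cheaper than four separate one-column gaps.
print(affine_gap_cost(4), 4 * affine_gap_cost(1))
print(total_gap_cost("AC---GT-A"))
```

Raising GOP relative to GEP favours fewer, longer indels; lowering it lets the aligner scatter short gaps, which is what the over-gapped example later in these slides looks like.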
[Figure: GOP and GEP]
Substitution Matrices (Etzold et al. 1993)
Gonnet 61.7
Blosum50 59.7
Pam250 59.2
KEEP A BIOLOGICAL PERSPECTIVE
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: .. . : . . * . *: *
chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :
chite AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL-
wheat -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS
* *** .:: ::... : * . . . : * . *: *
chite KSEWEAKAATAKQNY-I--RALQE-YERNG-G-
wheat KAPYVAKANKLKGEY-N--KAIAA-YNK-GESA
trybr RKVYEEMAEKDKERY----K--RE-M-------
mouse KQAYIQLAKDDRIRYDNEMKSWEEQMAE-----
: : * : .* :
DIFFERENT PARAMETERS
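How much an alignment changes between two parameter settings can be quantified by the fraction of aligned residue pairs that both versions share. The sketch below is one simple way to do this, assuming both alignments contain the same sequences in the same row order; the function names are hypothetical.

```python
def aligned_pairs(rows):
    """Set of residue pairs ((row, residue_index), (row, residue_index)) sharing a column."""
    counters = [0] * len(rows)          # residues seen so far in each row
    pairs = set()
    for column in zip(*rows):
        present = []
        for row_index, ch in enumerate(column):
            if ch != "-":
                present.append((row_index, counters[row_index]))
                counters[row_index] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs

def agreement(aln_a, aln_b):
    """Fraction of residue pairs aligned in aln_a that are also aligned in aln_b."""
    pairs_a = aligned_pairs(aln_a)
    pairs_b = aligned_pairs(aln_b)
    return len(pairs_a & pairs_b) / len(pairs_a) if pairs_a else 1.0

run_1 = ["AC-GT", "ACAGT"]
run_2 = ["ACG-T", "ACAGT"]
print(agreement(run_1, run_2))
```

A value close to 1 across reasonable parameter choices is what "robust" means in practice; a value that drops sharply points to regions that should not be over-interpreted.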
-
REPEATS
THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT CONTAIN THE SAME NUMBER OF REPEATS
IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS AND TO ALIGN THEM. INDIVIDUAL REPEATS CAN BE RECOGNIZED USING DOTTER
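The idea behind a dot plot can be illustrated with a few lines of code: compare a sequence with itself and look for off-diagonal runs of matches, whose offset gives the repeat period. This is a simplified sketch using exact word matches, not Dotter's scoring scheme, and the toy sequence and function names are made up for illustration.

```python
from collections import Counter

def self_dotplot(seq, word=3):
    """Return sorted (i, j) pairs, i < j, where identical words start at i and j."""
    positions = {}
    for i in range(len(seq) - word + 1):
        positions.setdefault(seq[i:i + word], []).append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)

def repeat_offsets(seq, word=3):
    """Count dots per diagonal offset; a dominant offset suggests a repeat period."""
    return Counter(j - i for i, j in self_dotplot(seq, word)).most_common()

toy = "MKTAYDELGK" * 2   # a made-up 10-residue unit repeated twice
print(repeat_offsets(toy))
```

Once the period is known, each repeat unit can be cut out and the units aligned to each other as separate sequences.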
Choosing The Right Method
[Table: PROBLEM / PROGRAM / METHOD; programs listed: ClustalW, ClustalW, MSA, DIALIGN II, DIALIGN II]
Source: BaliBase, Thompson et al., NAR, 1999
-
Examples of Mistakes
Playing With Blocks: Mbh1
-
Playing With Blocks: tRNA Synthases
Playing With Blocks: RTase
-
Conclusion
The Best Alignment Method:
•Your Brain
•The Right Data
The Best Evaluation:
•Your Eyes
•Experimental Information (SwissProt)
What Can I Conclude:
•Homology => Information Extrapolation
How Can I Go Further?:
•Prosite Patterns.
•Prosite Profiles.
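As a pointer for the pattern route, a conserved block from an alignment can be written in PROSITE pattern syntax and searched with a regular expression. The converter below is a minimal sketch covering only a subset of the syntax (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`), and the pattern used is a hypothetical one made up for illustration, not an actual PROSITE entry.

```python
import re

ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        repeat = ""
        if low:
            repeat = "{%s,%s}" % (low, high) if high else "{%s}" % low
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)

# Hypothetical pattern loosely modelled on the conserved DKDGDG... block above.
regex = prosite_to_regex("D-x-[DNS]-x-[DG]-x(6)-E.")
match = re.search(regex, "AAGDKDGDGKIGVDEFSTLV")
print(regex, match.group(0) if match else None)
```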