Wellcome Trust graduate course. - Computational Methods series. --- Sequence-based bioinformatics.

Dr. Hyunji Kim
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
Email: [email protected]

Basic Tools

1) BLAST/WU-BLAST

A search engine for finding sequences of interest. BLAST searches can be refined by varying the substitution matrix and the filtering options against a specified database. http://www.ncbi.nlm.nih.gov/BLAST/, http://www.ebi.ac.uk/blast2/

2) ClustalW/T-Coffee/Muscle

Helps us make sense of a set of unaligned sequences by generating pairwise or multiple sequence alignments. Uses a progressive-alignment method. http://www.ebi.ac.uk/clustalw/

3) HMMer/PSI-BLAST

Builds a profile hidden Markov model (pHMM) from a set of aligned sequences. Aligns sequences using a pHMM, searches a sequence database with it, and can assign functions to a given sequence. http://hmmer.wustl.edu/

4) Phylip/TreeDyn

Calculates a distance matrix from a set of sequences. Derives phylogenetic trees, taking such a matrix as input, based upon theories of minimum evolution, parsimony and more. http://evolution.genetics.washington.edu/phylip.html

5) Databases

• Nucleotide databases: EMBL, GenBank & DDBJ

• Protein databases: fully annotated, e.g. Swiss-Prot v52.3 (264,492 entries, as of 17th of Apr., 2007); computer-annotated, e.g. TrEMBL v35.3

• Genomics databases: Ensembl (v44; 20+14) and Eukaryota, Bacteria and Archaea genomes (51, 445 and 40, respectively), as of 20th of Apr., 2007.

http://www.ebi.ac.uk/uniprot/index.html, http://www.ensembl.org/, http://www.ebi.ac.uk/genomes/index.html

6) Major bioinformatics centres around the globe

http://www.ebi.ac.uk/, http://www.ncbi.nlm.nih.gov/, http://www.ddbj.nig.ac.jp/, http://us.expasy.org/, http://www.sanger.ac.uk/, http://geneontology.org/

Searching for sequences by homology

- BLAST

Reference: Gish, W. (1996-2006) http://blast.wustl.edu

Query=  KcsA
        (160 letters)

>Filtered+0
MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI
TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
RRGHFVRHSEKXXXXXXXXXXXXLHERFDRLERMLDDNRR

Database:  swissprot
           223,100 sequences; 81,965,973 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.          615  3.0e-60   1
SW:KCSA_STRLI P0A334 Voltage-gated potassium channel.          615  3.0e-60   1

>SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.
        Length = 160

 Score = 615 (221.5 bits), Expect = 3.0e-60, P = 3.0e-60, Group = 1
 Identities = 120/160 (75%), Positives = 120/160 (75%)

Query:     1 MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI 60
             MPPM             GRHGSALHWR               GSYLAVLAERGAPGAQLI
Sbjct:     1 MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI 60

Query:    61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
             TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
Sbjct:    61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
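
The 75% identity in this output is consistent with the query filter: residues masked as X cannot be counted as identities, even though the top hit is KcsA itself. The following is a minimal Python sketch of that arithmetic. The helper names are hypothetical, and the lengths of the X runs (13, 15 and 12) are an assumption reconstructed from the Query/Sbjct lines and the stated query length of 160.

```python
import re

# Filtered query rebuilt from the transcript. The X run lengths (13, 15, 12)
# are an assumption inferred from the alignment and the stated length of 160.
FILTERED_QUERY = (
    "MPPM" + "X" * 13 + "GRHGSALHWR" + "X" * 15 + "GSYLAVLAERGAPGAQLI"
    + "TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE"
    + "RRGHFVRHSEK" + "X" * 12 + "LHERFDRLERMLDDNRR"
)


def parse_identities(stats_line):
    """Return (matches, aligned_length) from an 'Identities = a/b (c%)' line."""
    m = re.search(r"Identities\s*=\s*(\d+)/(\d+)", stats_line)
    if m is None:
        raise ValueError("no Identities field found")
    return int(m.group(1)), int(m.group(2))


def unmasked_length(seq):
    """Count residues that were not replaced by X by the filter."""
    return sum(1 for residue in seq if residue != "X")


stats = "Identities = 120/160 (75%), Positives = 120/160 (75%)"
matches, aligned = parse_identities(stats)
print("identities:", matches, "of", aligned)
print("query length:", len(FILTERED_QUERY))
print("unmasked residues:", unmasked_length(FILTERED_QUERY))
```

Under these assumptions the number of unmasked query residues equals the number of reported identities, so every unfiltered position matches the subject.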

Multiple sequence alignment

– ClustalW

*****************************************************
CLUSTAL W (1.83) Multiple Sequence Alignments
*****************************************************

1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees

S. Execute a system command
H. HELP
X. EXIT (leave program)

Your choice: 2

****** MULTIPLE ALIGNMENT MENU ******

1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file

4. Toggle Slow/Fast pairwise alignments = SLOW

5. Pairwise alignment parameters
6. Multiple alignment parameters

7. Reset gaps before alignment? = OFF
8. Toggle screen display = ON
9. Output format options

S. Execute a system command
H. HELP

or press [RETURN] to go back to main menu

Your choice:

CLUSTAL W (1.82) multiple sequence alignment

KVAP_AERPE      FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF------ 79
MVP_METJA       FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS-------- 70
O28600          FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF----------------- 64
Q8TXQ4          LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL------------- 73
Q6L2S2          FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT-------- 79
Q979Z2          FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q 80
O26605          EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ 114
Q9HIA8          GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG 94
Q97CK5          GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG 84

Residue colour key:

Red      AVFPMILW   Small (small + hydrophobic (incl. aromatic -Y))
Blue     DE         Acidic
Magenta  RHK        Basic
Green    STYHCNGQ   Hydroxyl, Amine
Gray     Others
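
One way to read an alignment like the one above is to look for columns in which every sequence carries the same residue. The following is a minimal Python sketch, assuming only the first 23 alignment columns of the nine sequences (copied from the ClustalW output); the function name is hypothetical.

```python
# First 23 alignment columns of each sequence, copied from the ClustalW output.
ROWS = {
    "KVAP_AERPE": "FDALW-WAVVTATTVGYGDVVP-",
    "MVP_METJA":  "FDAFY-FTTISITTVGYGDITP-",
    "O28600":     "FDSLY-MTVITITTTGYGEVKP-",
    "Q8TXQ4":     "LTCLY-FTAATITTVGYGDVVP-",
    "Q6L2S2":     "FTSLW-WTMQTITTVGYGDTPV-",
    "Q979Z2":     "FTAIW-FTMETVTTVGYGDVVP-",
    "O26605":     "EDSLW-YVLQTITTVGYGDIVP-",
    "Q9HIA8":     "GNAFY-YTGEVITTLGFGDILP-",
    "Q97CK5":     "GTALY-YTGETVTTLGFGDILP-",
}


def conserved_columns(rows):
    """Return 0-based indices of columns with one identical, non-gap residue."""
    seqs = list(rows.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned rows must have equal length")
    conserved = []
    for i in range(length):
        column = {s[i] for s in seqs}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


cols = conserved_columns(ROWS)
print("conserved columns:", cols)
print("residues:", "".join(ROWS["KVAP_AERPE"][i] for i in cols))
```

In this excerpt the fully conserved columns all fall inside the TTVGYGD stretch of KVAP_AERPE; columns that consist only of gaps are deliberately not reported.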

Profile alignment & pattern recognition: HMMer

More sensitive homology search: PSI-BLAST & HMMer

DNA sequence

Amino acid sequence

PSI-BLAST

Phylogeny: Phylip & TreeDyn

Saitou N and Nei M, The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4):406-425, 1987
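
The selection step of the neighbour-joining method cited above can be sketched in a few lines of Python. At each step the method computes Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) from the distance matrix and joins the pair with the smallest Q. The distance values below are made up for illustration and are not taken from the course data; the function names are hypothetical.

```python
def nj_q_matrix(dist):
    """Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
    return q


def pair_to_join(dist):
    """Return the pair (i, j), i < j, with the smallest Q value."""
    q = nj_q_matrix(dist)
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: q[p[0]][p[1]])


# Illustrative symmetric distance matrix for five taxa (made-up values).
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print("pair to join first:", pair_to_join(DIST))
```

With these values the closest raw pair is taxa 3 and 4 (distance 3), yet the Q criterion selects taxa 0 and 1. This is how neighbour joining differs from simply clustering the nearest pair. A full implementation would then replace the joined pair by a new node, recompute the distances and repeat.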

TreeDyn

Protein secondary structure prediction: two consensus methods

http://sbcb.bioch.ox.ac.uk/TM_noj/TM_noj.html

Example Output

640 650 660 670 680 690 700
| | | | | | |
MFAKGYGKNNEPLRGYILTFLIALGFILIAELNVIAPIISNFFLASYALINFSVFHASLAKSPGWRPAFK
ALOM2 *****************
DAS ****************************************
HMMTOP2 ****************** *************************
MEMSAT1.5 *************************
PHD *************************
SPLIT4 **************** ***************************
TMAP *****************************
TMFINDER ****************************************
TMHMM2 *********************** ******************
TMPRED *************************
TOPPRED2 ********************* *********************
Consensus ------------???hhhhHHHHHHHHHHHHHHHHHhHHhhhhhhhhh???????????-----------

The Transmembrane Prediction Server was developed by Dr. Jonathan Cuthbertson.
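
A consensus line of this kind can be produced by counting, at each residue position, how many methods predict a transmembrane segment. The following is a minimal Python sketch of such a vote. The thresholds (0.8, 0.5, 0.2), the mapping onto the symbols H, h, ? and -, the method names and the toy predictions are all assumptions for illustration; the rule actually used by the server is not given in these slides.

```python
def consensus(predictions, strong=0.8, weak=0.5, unsure=0.2):
    """Per-position vote over methods; '*' marks a predicted TM residue.

    The thresholds and symbol mapping are hypothetical, not the server's rule.
    """
    tracks = list(predictions.values())
    length = len(tracks[0])
    if any(len(t) != length for t in tracks):
        raise ValueError("all prediction tracks must have equal length")
    symbols = []
    for i in range(length):
        fraction = sum(1 for t in tracks if t[i] == "*") / len(tracks)
        if fraction >= strong:
            symbols.append("H")   # strong agreement
        elif fraction >= weak:
            symbols.append("h")   # majority agreement
        elif fraction >= unsure:
            symbols.append("?")   # a few methods only
        else:
            symbols.append("-")   # (almost) no method predicts TM
    return "".join(symbols)


# Toy per-residue predictions from four hypothetical methods.
TOY = {
    "method_a": "..****..",
    "method_b": ".*****..",
    "method_c": "..***...",
    "method_d": "..****.*",
}

print(consensus(TOY))
```

The point of the design is that no single predictor decides the outcome: positions supported by most methods are reported with confidence, while positions supported by only one or two are flagged as uncertain.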

Pongo

http://pongo.biocomp.unibo.it/pongo

Example Output by Pongo

Background for practical sessions

Introduction to your input sequence

Ion channels; Potassium channels; Voltage-gated potassium channels

• Ion channels are a diverse class of transmembrane proteins responsible for the diffusion of ions across cell membranes.

• There are several major families of ion channels, for instance K+, Na+, Ca2+ and Cl- channels, as well as ligand-gated ion channels (LGICs).

• Many human neurological and muscular disorders have been traced to defects in voltage-gated and ligand-gated ion channels.

Fig 2. A. Long et al., Science, Vol. 309, p897, 2005 (figure labels: TM, T1)

Your expected blastp-output

K+ channels, blastp

Homologues are visualised in BLIXEM.

Alignment you are about to build, not necessarily as big.

[Figure: K+ channel families. Labels: Kv, BK, SK, Erg, Kir, CNG, AKT. Kv labels: Kv1.x, Shab, Kv2.x, Shal, Kv4.x, Kv5.6.8.9., Shaw, Kv3.x. Kir labels: Kir2.x, Kir6.2, Kir3.x, Kir4.x, Kir1.1, Kir6.1, Kir2.3.]

Fig 4. Shealy et al., Biophysical Journal, Vol 84, p2929, 2003

Example of pHMM-related output

hmmsearch - search a sequence database with a profile HMM
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                   Kv.hmm [Kv_homologues]
Sequence database:          infile_comb
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query HMM:   Kv_homologues
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence          Description   Score    E-value  N
--------          -----------   -----    ------- ---
CIKS_DROME                      241.2    3.2e-71   1
Q9VX00_DROME                    234.3    3.9e-69   1
CIKB_DROME                      159.3    1.5e-46   1
O62350_Celegans                 156.7    8.8e-46   1
Q9VLC6_DROME                    156.6    9.6e-46   1
CIKW_DROME                      156.5      1e-45   1
Q8SYL2_DROME                    156.5      1e-45   1
Q22012_Celegans                 155.3    2.4e-45   1
Filtered_5DROME                 140.5    6.6e-41   1
Filtered_6DROME                 140.5    6.6e-41   1
Q9XXD1_Celegans                 125.0    3.1e-36   1
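
For the practical it can be handy to read the per-sequence score table into a program and filter the hits by E-value. The following is a minimal Python sketch, assuming the table rows shown in this output; the `Hit` type, the function name and the 1e-40 cut-off are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    score: float
    evalue: float
    n_domains: int


# Rows of the "Scores for complete sequences" table from the output above.
HITS_TEXT = """\
CIKS_DROME 241.2 3.2e-71 1
Q9VX00_DROME 234.3 3.9e-69 1
CIKB_DROME 159.3 1.5e-46 1
O62350_Celegans 156.7 8.8e-46 1
Q9VLC6_DROME 156.6 9.6e-46 1
CIKW_DROME 156.5 1e-45 1
Q8SYL2_DROME 156.5 1e-45 1
Q22012_Celegans 155.3 2.4e-45 1
Filtered_5DROME 140.5 6.6e-41 1
Filtered_6DROME 140.5 6.6e-41 1
Q9XXD1_Celegans 125.0 3.1e-36 1
"""


def parse_hits(text):
    """Parse table rows; the last three fields are score, E-value and N."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Taking fields from the right tolerates an optional description.
        hits.append(Hit(fields[0], float(fields[-3]),
                        float(fields[-2]), int(fields[-1])))
    return hits


hits = parse_hits(HITS_TEXT)
significant = [h for h in hits if h.evalue < 1e-40]  # hypothetical cut-off
print(len(hits), "hits,", len(significant), "below the cut-off")
print("best hit:", hits[0].name, hits[0].score)
```

Reading the numeric columns from the right keeps the parser working when a row carries a free-text description between the name and the score.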

Raw tree-files produced by PHYLIP

[Figure: tree with labels Kir, Kv, BK, SK, AKT, CNG/HErg, KcsA, MthK, Kv1.2, KvAP]
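
PHYLIP writes its trees as plain text in Newick (nested-parenthesis) format, which is what tree viewers then read. The following is a minimal Python sketch that pulls the leaf names out of such a string. The example tree, its topology and its branch lengths are made up for illustration (only the four names come from the slide), and the function name is hypothetical.

```python
import re

# Hypothetical Newick string; topology and branch lengths are invented.
NEWICK = "((KcsA:0.1,MthK:0.2):0.05,(Kv1.2:0.1,KvAP:0.3):0.05);"


def leaf_names(newick):
    """Return leaf labels, i.e. names that directly follow '(' or ','.

    A simple sketch: it assumes unquoted labels without spaces or brackets.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


print(leaf_names(NEWICK))
```

Labels of internal nodes and their branch lengths follow a closing parenthesis, so the pattern skips them. For real work a dedicated tree library is the safer choice, since Newick also allows quoted labels and comments.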

Phylogenetic trees modified in TreeDyn