Fast, sensitive homology detection using HMMER
Rob Finn
Sequence Families Team Lead
@robdfinn, [email protected]
Nov 2018
Making sense of sequence data
Sequence Data Information
Experimental Literature
Sporadic Literature
Similarity
Uncharacterized
Reference Proteomes
Complete Proteomes
Other Sequences & Metagenomics
Model Organisms
MGnify Protein Database
• >1 billion sequences, mean length of 205
• <1% match UniProtKB, but 58% match Pfam
[Figure: Length distribution. Frequency (millions, 0 to 5) against length of sequence (0 to 2000), by protein type: partial, C-term truncated, N-term truncated, full length.]
[Figure: Growth of MGnify Compared with UniProt. Number of sequences (up to 1,200,000,000) against year (2002 to 2018); series: UniProt, MGnify.]
undergone convergent evolution to form a stable 3-on-3 α-helical sandwich fold. Interestingly, it was subsequently discovered that phycocyanins can aggregate forming clusters that then adhere to the membrane forming the so-called phycobilisomes. Such a functional relationship may indeed point to convergent evolution from a distant common ancestor.

The second example, which is extracted from the work of one of our groups (Tsigelny et al., 2000), illustrated how the combination and integration of different sources of information, including structural alignments, could help to functionally characterize a protein. In our work, two new EF-hand motifs were identified in acetylcholinesterase (AChE) and related proteins by combining the results from a hidden Markov model sequence search, Prosite pattern extraction, and protein structure alignments by CE. It was also found that the α–β hydrolase fold family, including acetylcholinesterases, contains putative Ca2+ binding sites, indicative of an EF-hand motif, and which in some family members may be critical for heterologous cell associations. This putative finding represented the second characterization of an EF-hand motif within an extracellular protein, which previously had only been found in osteonectins. Thus, structure alignment had contributed to our understanding of an important family of proteins.

Finally, the third example, also from a previous work of one of our groups (McMahon et al., 2005), combined information from structural alignments deposited in the DBAli database and experiments to analyze the sequence and fold diversity of a C-type lectin domain. We demonstrated that the C-type lectin fold adopted by a major tropism determinant sequence, a retroelement-encoded receptor binding protein, provides a highly static structural scaffold in support of a diverse array of sequences. Immunoglobulins are known to fulfill the same role of a scaffold supporting a large variety of sequences necessary for an antigenic response. C-type lectins were shown to represent a different evolutionary solution taken by retroelements to balance diversity against stability.

MULTIPLE STRUCTURE ALIGNMENT

Our discussions thus far have involved only pair-wise structure comparison and alignment, or at best, alignment of multiple structures to a single representative in a pair-wise fashion (i.e., progressive pair-wise structure alignment). Most of the available methods for multiple structure alignment start by computing all pair-wise alignments between a set of structures but then use them to generate the optimal consensus alignment between all the structures.

Figure 16.2. Structure alignment for c-phycocyanin (1CPC:A) (black) and colicin A (1COL:A) (gray) as computed by SALIGN. The alignment extended over 86 residues with a 0.97 Å RMSD. The sequence identity of the superposed residues with respect to the shorter of the two structures was 11.9%.
Sequence And Structure Alignments
Fig. Adapted from Chap.16, Structural Bioinformatics, 2nd Ed., Marti-Renom et al
• Statistical inference, accounting for uncertainty
• Use more information
Profile hidden Markov models
$$P(t \mid \text{model of homology to } q)$$

$$\frac{P(t \mid \text{model of homology to } q)}{P(t \mid \text{model of nonhomology})}$$

$$\frac{P(t \mid H)}{P(t \mid R)}$$

$$S = \log \frac{P(t \mid H)}{P(t \mid R)}$$

$$S = \log \frac{P(t, \pi_o \mid H)}{P(t \mid R)}$$

joint probability of t and the alignment

Optimal alignment scores are only an approximation, and the approximation breaks down on remote homologs.

...GHRL...
...| |...
...GI-M...
According to inference theory, the correct score is a log-odds ratio summed over all alignments.

Optimal alignment score (HMMs: "Viterbi" score, V); BLAST (almost):

$$V = \log \frac{P(t, \pi_o \mid H)}{P(t \mid R)} = \log \frac{\max_\pi P(t, \pi \mid H)}{P(t \mid R)}$$

Score summed over all alignments (HMMs: "Forward" score, F); HMMER:

$$F = \log \frac{P(t \mid H)}{P(t \mid R)} = \log \frac{\sum_\pi P(t, \pi \mid H)}{P(t \mid R)}$$

Depends on:
- a probability model of alignment, not just scores
- algorithms fast enough to use in practice
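The difference between the Viterbi score V (best single alignment) and the Forward score F (all alignments summed) can be made concrete with a toy example. The sketch below is only an illustration: the two-state model, its probabilities, the two-letter alphabet and all function names are hypothetical and are not the Plan7 profile model or HMMER's implementation. It enumerates every state path by brute force, which is feasible only for very short sequences; practical implementations use dynamic programming.

```python
import itertools
import math

# Hypothetical toy model H: two hidden states over a two-letter alphabet.
STATES = ("conserved", "variable")
START = {"conserved": 0.5, "variable": 0.5}
TRANS = {
    "conserved": {"conserved": 0.8, "variable": 0.2},
    "variable": {"conserved": 0.2, "variable": 0.8},
}
EMIT = {
    "conserved": {"G": 0.9, "A": 0.1},
    "variable": {"G": 0.5, "A": 0.5},
}
# Null model R: independent residues at background frequencies.
BACKGROUND = {"G": 0.5, "A": 0.5}


def joint_prob(seq, path):
    """P(t, pi | H): probability of the sequence together with one state path."""
    p = START[path[0]] * EMIT[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][seq[i]]
    return p


def null_prob(seq):
    """P(t | R): probability of the sequence under the null model."""
    return math.prod(BACKGROUND[c] for c in seq)


def scores(seq):
    """Return (V, F) in bits by enumerating every possible state path."""
    paths = itertools.product(STATES, repeat=len(seq))
    joints = [joint_prob(seq, path) for path in paths]
    null = null_prob(seq)
    v = math.log2(max(joints) / null)  # best single path only
    f = math.log2(sum(joints) / null)  # all paths summed
    return v, f
```

Calling `scores("GGGA")` returns a pair in which F is never smaller than V, because the sum over paths always includes the best path. The gap between the two grows when many paths have comparable probability, which is the situation described above for remote homologs.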
Profile Hidden Markov Models - Encapsulate diversity

Input multiple alignment:
Consensus columns assigned, defining inserts and deletes:

seq1 ACG-LD
seq2 SCG--E
seq3 NCGgFD
seq4 TCG-WQ
     123-45

[Figure: Plan7 core model. States: B, M1 M2 M3 M4 M5, E; I0 I1 I2 I3 I4 I5; D1 D2 D3 D4 D5. Residue labels: N T A S G; W F L Y; D E Q C.]
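How the alignment on this slide maps onto match, insert and delete states can be sketched in a few lines. This is a simplified illustration, not hmmbuild's actual procedure: the rule used here (a column is a consensus column when at most half of its rows are gaps or lowercase insert residues) and the function names are assumptions made for the example.

```python
from collections import Counter

# Toy alignment from the slide; lowercase marks an inserted residue.
ALIGNMENT = {
    "seq1": "ACG-LD",
    "seq2": "SCG--E",
    "seq3": "NCGgFD",
    "seq4": "TCG-WQ",
}


def consensus_columns(alignment, max_gap_fraction=0.5):
    """Return indices of columns treated as consensus (match) columns.

    Assumed rule: a column is consensus when at most max_gap_fraction
    of its rows hold a gap or a lowercase (insert) residue.
    """
    rows = list(alignment.values())
    columns = []
    for j in range(len(rows[0])):
        non_match = sum(1 for row in rows if row[j] == "-" or row[j].islower())
        if non_match / len(rows) <= max_gap_fraction:
            columns.append(j)
    return columns


def match_emission_counts(alignment):
    """Count residues per consensus column; a gap there is a delete, not an emission."""
    rows = list(alignment.values())
    return [
        Counter(row[j] for row in rows if row[j] != "-")
        for j in consensus_columns(alignment)
    ]
```

On the toy alignment, the fourth alignment column is the only one not assigned to a match state, in line with the `123-45` consensus line. The gap of seq2 in the next column is a deletion, so that match state receives three residue counts instead of four.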
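anecdotal search example: globin superfamily
query: alignment of three vertebrate hemoglobins and one myoglobin
target db: Uniprot 7.0 (207K seqs; contains about 1060 known globins)
at E <= 0.01:
PSI-BLAST sees: 915 globins (9 sec)
HMMER3 sees: 1002 globins (8 sec)

[Figure: Aplysia myoglobin (PDB 1mba); globin tree with approximate divergence times ~300 Mya, ~550 Mya, ~600-700 Mya?, ~1000 Mya?, ~2500 Mya?]

E-value (statistical significance):

| Family | Sequence | PSI-BLAST | HMMER |
|---|---|---|---|
| alpha hemoglobins | HBA_HUMAN | 4e-46 | 9e-62 |
| alpha hemoglobins | HBA_MOUSE | 3e-42 | 4e-55 |
| beta hemoglobins | HBB_HUMAN | 2e-57 | 4e-64 |
| beta hemoglobins | HBB1_MOUSE | 9e-50 | 2e-57 |
| myoglobins | MYG_HUMAN | 1e-45 | 2e-58 |
| myoglobins | MYG_MOUSE | 2e-41 | 6e-54 |
| neuroglobins | NGB_HUMAN | - | 1e-7 |
| neuroglobins | NGB_MOUSE | - | 2e-7 |
| plant leghaemoglobins | LGB1_PEA | - | 5e-5 |
| plant leghaemoglobins | LGB2_PEA | - | 5e-6 |
| bacterial nitric oxide dioxygenases | HMP_VIBCH | 1.1 | 0.004 |
| bacterial nitric oxide dioxygenases | HMP_ECOLI | 0.45 | - |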
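The per-family comparison can be checked against the slide's E <= 0.01 cutoff with a short script. This is a sketch under one stated assumption: the E-values run together in the extracted slide text, and they are read here as two values per family in the order the sequences are listed, with `None` standing for a dash (no hit reported). The names `EVALUES` and `detected` are hypothetical.

```python
# E-values transcribed from the slide, two sequences per family.
# Assumption: fused values are split in listing order; None means "-".
EVALUES = {
    "alpha hemoglobins": {"PSI-BLAST": (4e-46, 3e-42), "HMMER": (9e-62, 4e-55)},
    "beta hemoglobins": {"PSI-BLAST": (2e-57, 9e-50), "HMMER": (4e-64, 2e-57)},
    "myoglobins": {"PSI-BLAST": (1e-45, 2e-41), "HMMER": (2e-58, 6e-54)},
    "neuroglobins": {"PSI-BLAST": (None, None), "HMMER": (1e-7, 2e-7)},
    "plant leghaemoglobins": {"PSI-BLAST": (None, None), "HMMER": (5e-5, 5e-6)},
    "bacterial nitric oxide dioxygenases": {
        "PSI-BLAST": (1.1, 0.45),
        "HMMER": (0.004, None),
    },
}

THRESHOLD = 0.01  # the E-value cutoff used on the slide


def detected(method, threshold=THRESHOLD):
    """Count example sequences reported at or below the E-value threshold."""
    return sum(
        1
        for family in EVALUES.values()
        for evalue in family[method]
        if evalue is not None and evalue <= threshold
    )
```

Under that reading, `detected("PSI-BLAST")` counts 6 of the 12 example sequences and `detected("HMMER")` counts 11. The sequences gained are the neuroglobins, the plant leghaemoglobins and one bacterial sequence, that is, the most distant relatives of the query.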
Projecting profile HMMs back onto structures
generated using http://www.skylign.org/3DPatch/
Jakubec D et al, Bioinformatics, 2018
Different HMMER search methods
• phmmer: single protein sequence against protein sequence database.
• hmmscan: single protein sequence against profile HMM library (Pfam, CATH-Gene3D, PIRSF, Superfamily and TIGRFAMs).
• hmmsearch: either multiple sequence alignment or profile HMM against protein sequence database.
• jackhmmer: iterative searches. Initiated with a single sequence, a profile HMM or a multiple sequence alignment against a target sequence database.
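The four methods above are also available as command-line programs in the HMMER suite. The helper below is a minimal sketch that only assembles an argument list, so it can be inspected without HMMER installed. The function name and all file names are hypothetical; the positional order and the `-E` reporting threshold follow the usage lines of the HMMER 3 programs as I understand them. Note that hmmscan takes the profile library first and the query second, that hmmsearch on the command line expects a profile HMM (an alignment is first converted with hmmbuild), and that the jackhmmer command-line query is a sequence file.

```python
# Positional argument order for each program (query vs. target database).
ARG_ORDER = {
    "phmmer": ("query", "target"),     # sequence file, sequence database
    "hmmsearch": ("query", "target"),  # profile HMM file, sequence database
    "jackhmmer": ("query", "target"),  # sequence file, sequence database
    "hmmscan": ("target", "query"),    # profile HMM library, sequence file
}


def hmmer_command(program, query, target, evalue=0.01):
    """Build an argument list for one HMMER search program.

    The list can be passed to subprocess.run(); nothing is executed here.
    """
    if program not in ARG_ORDER:
        raise ValueError(f"unknown HMMER program: {program}")
    paths = {"query": query, "target": target}
    positional = [paths[role] for role in ARG_ORDER[program]]
    return [program, "-E", str(evalue)] + positional
```

For example, `hmmer_command("hmmscan", "query.fasta", "Pfam-A.hmm")` gives `["hmmscan", "-E", "0.01", "Pfam-A.hmm", "query.fasta"]`. In practice a profile library must be prepared with hmmpress before hmmscan can read it.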
Find out more?
UNIT 3.15
The HMMER Web Server for Protein Sequence Similarity Search
Ananth Prakash,1 Matt Jeffryes,1 Alex Bateman,1 and Robert D. Finn1
1European Molecular Biology Laboratory, The European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom

Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.

Keywords: bioinformatics • homology • profile hidden Markov model • protein sequence analysis

How to cite this article:
Prakash, A., Jeffryes, M., Bateman, A., & Finn, R. D. (2017). The HMMER web server for protein sequence similarity search. Current Protocols in Bioinformatics, 60, 3.15.1–3.15.23. doi: 10.1002/cpbi.40

INTRODUCTION

The HMMER Web server (http://www.ebi.ac.uk/Tools/hmmer/) is an open-access protein sequence similarity search tool that hosts a suite of HMMER algorithms to identify evolutionarily related proteins and/or domains by employing profile hidden Markov models (HMMs; APPENDIX 3A, Schuster-Bockler & Bateman, 2007) for fast and efficient detection of close and remote homologs. The HMMER Web server provides four search interfaces to the corresponding algorithms in the HMMER suite (http://hmmer.org): phmmer, hmmscan, hmmsearch, and jackhmmer. The functionality of these algorithms are outlined in Table 3.15.1. The HMMER Web server can work with various input formats and user-defined parameters to provide results that are presented to help infer protein sequence conservation, function, and evolution. This article provides detailed protocols for using the Web versions of PHMMER, HMMSCAN, and JACKHMMER algorithms, and ways to navigate and interpret the output.

Basic Protocol 1 and Alternate Protocol 1 describe in detail how to use the basic and advanced search features, respectively, in PHMMER, and interpret the results using a protein sequence as the starting point. The logical organization and interpretation of the output described in Basic Protocol 1 is common to all other protocols and is therefore described in detail; user is referred back to this section in subsequent protocols.

Current Protocols in Bioinformatics 3.15.1–3.15.23, December 2017. Published online December 2017 in Wiley Online Library (wileyonlinelibrary.com). doi: 10.1002/cpbi.40. Copyright © 2017 John Wiley & Sons, Inc.
Finding Similarities and Inferring Homologies, Supplement 60
Acknowledgements
EMBL-EBI, The Sequence Families team: Matthias Blum, Hsin-Yu Chang, Sara El-Gebali, Matthew Fraser, Jaina Mistry, Alex Mitchell, Gift Nuka, Typhaine Paysan-Lafosse, Sebastien Pesseat, Simon Potter, Matloob Qureshi, Lorna Richardson, Gustavo Salazar-Orejuela, Amaia Sangrador
Collaborators:
Harvard University: Sean Eddy
University of Montana: Travis Wheeler
InterPro
hmmscan: single protein sequence against profile HMM library
hmmscan - Search results
CFTR_RAT (P34158): ABC transporter, a chloride ion channel controlled by phosphorylation.
ABC transporter trans-membrane region
ABC transporter domain
jackhmmer: iterative searches. Initiated with a single sequence, a profile HMM or a multiple sequence alignment against a target sequence database
phmmer—single protein sequence against protein sequence database
phmmer - search TRPA1_HUMAN (O75762)