Developing and Using Special Purpose Hidden Markov Model Databases

Martin Gollery, Associate Director of Bioinformatics, University of Nevada, Reno, [email protected]


Page 1: Developing and Using Special Purpose Hidden Markov Model Databases

Developing and Using Special Purpose Hidden Markov Model Databases

Martin Gollery

Associate Director of Bioinformatics

University of Nevada, Reno

[email protected]

Page 2: Developing and Using Special Purpose Hidden Markov Model Databases

Today’s Tutorial

• Instructor: Martin Gollery

• Associate Director of Bioinformatics, University of Nevada, Reno

• Consultant to several organizations

• Formerly with TimeLogic

• Developed several HMM databases

Page 3: Developing and Using Special Purpose Hidden Markov Model Databases

Hidden Markov Models

• What HMMs are

• Which HMM programs are commonly used

• What HMM databases are available

• Why you would use one DB over another

• Integrated resources: InterPro and more

• How you can build your own HMM DB

• Problems with building your own

• Live demonstration

Page 4: Developing and Using Special Purpose Hidden Markov Model Databases

Hidden Markov Models: What are they, anyway?

• Statistical description of a protein family's consensus sequence

• Conserved regions receive highest scores

• Can be seen as a Finite State Machine

Page 5: Developing and Using Special Purpose Hidden Markov Model Databases

Representation of Family Members

• yciH     KDGII

• ZyciH    KDGVI

• VCA0570  KDGDI

• HI1225   KNGII

• sll0546  KEDCV

Position   C     D     E     G     I     K     N     V
1                                        1.0
2                0.6   0.2                     0.2
3                0.2         0.8
4          0.2   0.2               0.4               0.2
5                                  0.8               0.2
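The frequency table on this slide can be reproduced directly from the five aligned fragments. The following is a minimal sketch, not code from the tutorial; the function name `column_frequencies` is made up for illustration.

```python
from collections import Counter

# The five aligned family members shown on the slide.
ALIGNMENT = ["KDGII", "KDGVI", "KDGDI", "KNGII", "KEDCV"]


def column_frequencies(alignment):
    """Return one {residue: observed frequency} dict per alignment column."""
    n_seqs = len(alignment)
    profile = []
    # zip(*alignment) walks the alignment column by column.
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({res: count / n_seqs for res, count in counts.items()})
    return profile


if __name__ == "__main__":
    for position, freqs in enumerate(column_frequencies(ALIGNMENT), start=1):
        print(position, dict(sorted(freqs.items())))
```

Residues that never occur in a column are simply absent from that column's dictionary, which corresponds to the empty cells in the table.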

Page 6: Developing and Using Special Purpose Hidden Markov Model Databases

Representation of gaps in Family Members

• yciH     KDGII

• ZyciH    KDGVI

• VCA0570  KDGDI

• HI1225   KNGII

• sll0546  KED-V

Position   C     D     E     G     I     K     N     V     -
1                                        1.0
2                0.6   0.2                     0.2
3                0.2         0.8
4                0.2               0.4               0.2   0.2
5                                  0.8               0.2

Page 7: Developing and Using Special Purpose Hidden Markov Model Databases

For Maximum Sensitivity

Position   C     D     E     G     I     K     N     V     -
1                                        1.0
2                0.6   0.2                     0.2
3                0.2         0.8
4                0.2               0.4               0.2   0.2
5                                  0.8               0.2

No residue at any position should have a zero probability, even if it was not seen in the training data.
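One simple way to guarantee non-zero probabilities is to add a pseudocount to every residue before normalizing. The sketch below uses plain add-one (Laplace) smoothing as an assumption for illustration; real HMM builders generally use more elaborate priors, and the choice of prior is listed later in this tutorial as a design decision. The name `smoothed_column` is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def smoothed_column(column, pseudocount=1.0):
    """Return emission probabilities for one alignment column.

    Every amino acid receives `pseudocount` extra observations, so nothing
    ends up with probability zero. Gap characters are ignored here.
    """
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Column 3 of the example family: four G and one D.
    probs = smoothed_column("GGGGD")
    print(probs["G"], probs["D"], probs["W"])
```

With only five training sequences the pseudocounts dominate, which is one reason a uniform pseudocount is a crude choice for small families.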

Page 8: Developing and Using Special Purpose Hidden Markov Model Databases

Start with an MSA…

CLUSTAL W (1.7) multiple sequence alignment

yciH      KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG
ZyciH     KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG
VCA0570   KDGDIEIQGDVRDQLKTLLESKGHKVKLAGG
HI1225    KNGIIEIQGEKRDLLKQLLEQKGFKVKLSGG
sll0546   KEDCVEIQGDQREKILAYLLKQGYKAKISGG
PA4840    KDGVVEIQGEHVELLIDELLKRGFKAKKSGG
AF0914    KNGVIELQGNHVNRVKELLIKKGFNPERIKT
          *:. :*:**:  : :   *  :* : :

Page 9: Developing and Using Special Purpose Hidden Markov Model Databases

Hidden Markov Models

HMMER2.0
NAME  example2
DESC  Small example for demonstration purposes
LENG  31
ALPH  Amino
COM   hmmbuild example2 example2.aln
NSEQ  7
DATE  Wed Jan 08 13:33:06 2003
HMM      A      C      D      E      F      G      H      I      K …
  1  -3217  -3413  -3082  -2664  -4291  -3257  -2104  -4231   3883…
  2  -1938  -3859   2747   1592  -4024  -1857  -1206  -3953  -1455…
  3  -2160  -3144   1834   -953  -4284   3247  -2013  -4362  -2365…
  4  -1255   2750    436  -2789  -1273  -2972  -2049   1510  -2543…
  5  -2035  -1558  -4660  -4320  -2085  -4409  -4229   3081  -4224…
  6  -3264  -3765  -1447   3822  -4535  -2948  -2636  -4814  -2810…
  7  -2423  -1951  -4843  -4395  -1156  -4544  -3680   3291  -4151…
  8  -3220  -3396  -2530  -2667  -3851  -3171  -2735  -4442  -2277…
  9  -3196  -3194  -3915  -4259  -4867   3789  -4005  -5414  -4591…
 10  -1923  -3837   2743   2134  -4005  -1854  -1196  -3929  -1434…
 11   -999  -2164   -952   -353  -2483  -1909   3321  -2139   1730…
 12  -1629  -1909  -2827  -2102  -2279  -2588  -1442  -1012   -488…

Page 10: Developing and Using Special Purpose Hidden Markov Model Databases

Emission Probabilities

• What is the likelihood that sequence X was emitted by HMM Y?

• The likelihood is the product of the probability of each residue at each position and each of the transition probabilities; in practice, the corresponding log scores are added
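To make the scoring idea concrete, here is a toy sketch that sums log2 scores along an ungapped path through match states. It rests on several assumptions that are not from the slides: a uniform background distribution, made-up emission columns built by the hypothetical helper `favoured_column`, and made-up transition probabilities. Real profile HMM software also handles insert and delete states, which this sketch leaves out.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def favoured_column(residue, p):
    """Toy emission column: `residue` gets probability p, the other 19 share 1 - p."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return {aa: (p if aa == residue else rest) for aa in AMINO_ACIDS}


def log_odds_score(sequence, emissions, transitions):
    """Score an ungapped path through match states, in bits.

    Adding log2 terms is equivalent to multiplying the probabilities.
    """
    if len(sequence) != len(emissions):
        raise ValueError("sequence and model must have the same length")
    if len(transitions) != len(emissions) - 1:
        raise ValueError("need one transition between each pair of positions")
    score = 0.0
    for residue, column in zip(sequence, emissions):
        score += math.log2(column[residue] / BACKGROUND)
    for probability in transitions:
        score += math.log2(probability)
    return score


# Hypothetical three-position model favouring the motif K-D-G.
MODEL = [favoured_column("K", 0.9), favoured_column("D", 0.6), favoured_column("G", 0.8)]
MATCH_TO_MATCH = [0.95, 0.95]  # hypothetical transition probabilities

if __name__ == "__main__":
    for candidate in ("KDG", "KEG", "AAA"):
        print(candidate, round(log_odds_score(candidate, MODEL, MATCH_TO_MATCH), 2))
```

A sequence that matches the favoured residues scores well above zero, while an unrelated sequence scores below zero because each mismatch is less likely under the model than under the background.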

Page 11: Developing and Using Special Purpose Hidden Markov Model Databases

Plan7 from Outer Space (Well, from St. Louis, anyway!)

Page 12: Developing and Using Special Purpose Hidden Markov Model Databases

HMMs vs BLAST

• Position specific scoring vs. general matrix

• Example:

– dDGVIvIddDKRDLLKSLiEAKkMKVKLAGG

– KDGVIEIQGDKRDLLKSLLEAKGMKVKLAGG

– The pair has 80% BLAST similarity, but the match misses highly conserved regions

• Scoring emphasizes important locations

• Clearer score cutoffs

• However, it is MUCH slower!

Page 13: Developing and Using Special Purpose Hidden Markov Model Databases

HMM programs

• HMMer - Sean Eddy, Wash U

• SAM - Haussler, UCSC

• Wise tools - Birney, EBI

• SledgeHMMer - Subramaniam, SDSC

• Meta-MEME - Noble & Bailey

• PSI-BLAST - NCBI

• SPSpfam - Southwest Parallel Software

• Ldhmmer - Logical Depth

• DeCypherHMM - TimeLogic

Page 14: Developing and Using Special Purpose Hidden Markov Model Databases

What exactly do you want?

• Are you searching thousands of sequences with one or a few models?

• Use hmmsearch

• Searching a few sequences with thousands of models?

• Use hmmpfam

• Thousands of sequences vs. thousands of models?

• Use an accelerator, if you do it very often
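The rule of thumb above can be written down as a small helper. This is only a sketch: the cutoff `many=1000` for "thousands" and the fallback to hmmsearch when both numbers are small are assumptions, and the file names in the usage example are hypothetical. The command lines follow the HMMER 2 convention of giving the HMM file first and the sequence file second.

```python
def choose_search_program(n_sequences, n_models, many=1000):
    """Pick a search strategy following the slide's rule of thumb.

    `many` is a hypothetical cutoff for "thousands"; adjust it to your setup.
    """
    if n_sequences >= many and n_models >= many:
        return "accelerator"
    if n_models >= many:
        return "hmmpfam"
    # One or a few models, whatever the number of sequences (assumed default).
    return "hmmsearch"


def build_command(program, hmm_path, seq_path):
    """Return an argv list: HMM file first, sequence file second."""
    if program not in ("hmmsearch", "hmmpfam"):
        raise ValueError("no command line defined for %r" % program)
    return [program, hmm_path, seq_path]


if __name__ == "__main__":
    program = choose_search_program(n_sequences=5, n_models=7868)
    # "Pfam_ls" and "query.fasta" are placeholder file names.
    print(" ".join(build_command(program, "Pfam_ls", "query.fasta")))
```

The returned list can be handed to `subprocess.run` once the HMMER programs are installed.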

Page 15: Developing and Using Special Purpose Hidden Markov Model Databases

HMM databases

• PFAM

• TIGRFAM

• Superfamily

• SMART

• Panther

• PRED-GPCR

Page 16: Developing and Using Special Purpose Hidden Markov Model Databases

HMM databases at the CFB

• COGfam

• KinFam

• HydroHMMer

• NVfam-pro

• NVfam-arc

• NVfam-fun

• NVfam-pln

Page 17: Developing and Using Special Purpose Hidden Markov Model Databases

PFAM

• From Sanger, WashU, KI, INRA

• Version 17 has 7868 families

• Most widely used HMM database

• Good annotation team

Page 18: Developing and Using Special Purpose Hidden Markov Model Databases

PFAM

• PFAM-A is hand curated

• From high quality multiple alignments

• PFAM-B is built automatically from ProDom

• Generated using the Domainer algorithm

• ProDom is built from SP/TREMBL

Page 19: Developing and Using Special Purpose Hidden Markov Model Databases

PFAM

• Pfam-ls = global alignments, so that matches must cover the whole model

• Pfam-fs = local alignments, so that matches may include only part of the model

• Both the -ls and -fs versions are local with respect to the sequence

Page 20: Developing and Using Special Purpose Hidden Markov Model Databases

PFAM

• Note ‘type’ annotation

• Labeled TP

• Family

• Domain

• Repeat

• Motif

Page 21: Developing and Using Special Purpose Hidden Markov Model Databases

TIGRFAMs

• Available at (www.tigr.org/TIGRFAMs/)

• Organized by functional role

• Equivalogs: a set of homologous proteins that are conserved with respect to function since their last common ancestor

• Equivalog domains: domains of conserved function

Page 22: Developing and Using Special Purpose Hidden Markov Model Databases

TIGRFAMs

• 2453 models in release 4.1

• Complementary to PFAM, so run both

• Part of the Comprehensive Microbial Resource (CMR)

Page 23: Developing and Using Special Purpose Hidden Markov Model Databases

TIGRFAMs

TIGRfam and PFAM alignments for Pyruvate carboxylase. The thin line represents the sequence. The bars represent hit regions.

Page 24: Developing and Using Special Purpose Hidden Markov Model Databases

SuperFamily

• By Julian Gough, formerly MRC, now Riken GSC

• www.supfam.org

• Provides structural (and hence implied functional) assignments to protein sequences at the superfamily level

• Built from SCOP (Structural Classification of Proteins) database, which is built from PDB

• Available in HMMer, SAM, and PSI-BLAST formats

Page 25: Developing and Using Special Purpose Hidden Markov Model Databases

SuperFamily

• 1447 SCOP Superfamilies

• Each represented by a group of HMMs

• Over 8500 models total

• Table provides comparison to GO, Interpro, PFAM

Page 26: Developing and Using Special Purpose Hidden Markov Model Databases

SMART

• Simple Modular Architecture Research Tool

• Version 3.4 contains 654 HMMs

• Emphasis on mobile eukaryotic domains

• smart.embl-heidelberg.de

• Annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues

Page 27: Developing and Using Special Purpose Hidden Markov Model Databases

SMART

• Use for signaling domains or extracellular domains

• Normal and Genomic mode

Page 28: Developing and Using Special Purpose Hidden Markov Model Databases

PRED-GPCR

• Papasaikas et al., U of Athens

• 265 HMMs in 67 GPCR families

• Based on TiPs pharmacological classification

• Filters with CAST

• Signatures regularly updated

• Entire system redone each year

Page 29: Developing and Using Special Purpose Hidden Markov Model Databases

PRED-GPCR webserver

Page 30: Developing and Using Special Purpose Hidden Markov Model Databases

Panther

• Protein ANalysis THrough Evolutionary Relationships

• Family and subfamily: families are evolutionarily related proteins; subfamilies are related proteins with the same function

• Molecular function: the function of the protein by itself or with directly interacting proteins at a biochemical level, e.g. a protein kinase

• Biological process: the function of the protein in the context of a larger network of proteins that interact to accomplish a process at the level of the cell or organism, e.g. mitosis.

• Pathway: similar to biological process, but a pathway also explicitly specifies the relationships between the interacting molecules.

Page 31: Developing and Using Special Purpose Hidden Markov Model Databases

Panther

• (Thomas et al., Genome Research 2003; Mi et al. NAR 2005)

• 6683 protein families

• 31,705 functionally distinct protein subfamilies.

Page 32: Developing and Using Special Purpose Hidden Markov Model Databases

Panther

• Due to the size, searches could be slow

• First, BLAST against consensus seqs

• Then, search against models represented by those hits

• With an accelerator, you don’t have to do that…

Page 33: Developing and Using Special Purpose Hidden Markov Model Databases

Panther

• So, how does it perform?

• I took 3451 Arabidopsis proteins with no hit to PFAM, Superfamily, SMART or TIGRfam

• Ran it against Panther

• Found 160 significant hits!

Page 34: Developing and Using Special Purpose Hidden Markov Model Databases

COG-HMMs

• Clusters of Orthologous Groups of proteins

• www.ncbi.nlm.nih.gov/cog/

• Each COG is from at least 3 lineages

• Ancient conserved domain

• 4873 alignments available

• Alignments from NCBI, HMMs from me at [email protected]

Page 35: Developing and Using Special Purpose Hidden Markov Model Databases

CDD

• Conserved Domain Database (NCBI)

• Psi-BLAST profiles are similar to HMMs

• 10991 PSSMs: SMART + COG + KOG + Pfam + CD

• Runs with RPS-BLAST

• Much faster searches

Page 36: Developing and Using Special Purpose Hidden Markov Model Databases

KinFam

• KinFam models represent 53 different classes of protein kinases (PKs)

• Assigns kinase class and group

• Based on Hanks’ classification scheme

• Database is small, so searches are fast

Page 37: Developing and Using Special Purpose Hidden Markov Model Databases

KinFam

• Categorizes Kinase data

• Available for download from bioinformatics.unr.edu

RANK  SCORE   QF  TARGET|ACCESSION   E_VALUE   DESCRIPTION
1     852.93  1   KinFam||ptkgrp15   9.3e-256  Fibroblast GF recept
2     479.14  1   KinFam||ptkgrp14   3.1e-143  Platelet derived GF
3     423.33  1   KinFam||ptkother   1.9e-126  Other membrane-span

Page 38: Developing and Using Special Purpose Hidden Markov Model Databases

HydroHmmer

• Hydrohmmer finds LEAs, other hydrophilin classes

• Small target size makes for very fast searches

Page 39: Developing and Using Special Purpose Hidden Markov Model Databases

NVFAMs

• HMMs reflect the training data

• Specific training sets provide better results

• So… use Archaeal data to study Archaeons, Fungal data to study Fungi, etc.

• Designed for use with PFAM, not stand alone

• Recent redesign, name change

Page 40: Developing and Using Special Purpose Hidden Markov Model Databases

NVFAMs

• NVFAM-pro used to study E. faecalis

• Demonstrated higher scores, better alignments

• However, PFAM had more total hits

• P. falciparum used as negative control

• PFAM showed better scores and alignments, as predicted

• Automated design by Garrett Taylor; scripts are available!

• Contact me for input, collaboration, or help to build your own

Page 41: Developing and Using Special Purpose Hidden Markov Model Databases

Which database to use? One Comparison Test (Your results may vary…)

• Compare 563 I. pini sequences to COGhmm, PFAM, PFAMfrag, SMART, TIGRfam, TIGRfamfrag, Superfamily

• COGs: 9

• PFAM: 22

• PFAMfrag: 57

• SMART: 4

• Superfamily: 30

• TIGRfam: 6

• TIGRfamfrag: 12

Page 42: Developing and Using Special Purpose Hidden Markov Model Databases

Integrated Resources

• InterProscan

• MAGPIE

• PANAL

• Make your own!

Page 43: Developing and Using Special Purpose Hidden Markov Model Databases

InterPro

• Database built from PFAM, Prints, Prosite, SuperFamily, ProDom, SMART, TIGRFAMs, PANTHER, PIRsf, Gene3D & SP/TrEMBL

• Version 10.0

• Nearly 12,000 entries

• http://www.ebi.ac.uk/interpro/

• InterProScan can be installed locally

Page 44: Developing and Using Special Purpose Hidden Markov Model Databases

InterProScan

• Splits up big jobs & reassembles them

• Works with SGE, PBS, LSF

• A free analysis pipeline!

• Provides GO mappings

• Written in PERL, so it’s easy to modify

• Average 4 min. per NT sequence per CPU

Page 45: Developing and Using Special Purpose Hidden Markov Model Databases

InterPro

InterPro release 10.0 contains 11972 entries, representing 3079 domains, 8597 families, 228 repeats, 27 active sites, 21 binding sites and 20 post-translational modification sites. Overall, there are 7521179 InterPro hits from 1466570 UniProt protein sequences. A complete list is available from the ftp site.

DATABASE              VERSION   ENTRIES
SWISS-PROT            46.5      180652
PRINTS                37.0      1850
TrEMBL                29.5      1689375
Pfam                  17.0      7868
PROSITE patterns      18.45     1800
PROSITE preprofiles   N/A       120
ProDom                2004.1    1522
InterPro              10.0      11972
SMART                 4.0       663
TIGRFAMs              4.1       2454
PIRSF                 2.52      962
PANTHER               5.0       438
SUPERFAMILY           1.65      1160
Gene3D                3.0       117
GO Classification     N/A       18705

Page 46: Developing and Using Special Purpose Hidden Markov Model Databases
Page 47: Developing and Using Special Purpose Hidden Markov Model Databases
Page 48: Developing and Using Special Purpose Hidden Markov Model Databases
Page 49: Developing and Using Special Purpose Hidden Markov Model Databases
Page 50: Developing and Using Special Purpose Hidden Markov Model Databases

Modifying InterProScan

• Two ways to add your own HMM database to InterProScan:

• Modify PERL scripts

• Concatenate your models onto PFAM

• Similarly, if you are looking for a specific target, delete all the rest to speed up searches
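Both tricks on this slide, appending your own models to PFAM and deleting everything except the models you care about, come down to simple text handling if the library is a HMMER 2 ASCII file. The sketch below assumes that format, in which every model record ends with a line containing only `//` and carries a `NAME` line. The function names are hypothetical, and how InterProScan itself expects the resulting file to be installed is not covered here.

```python
def split_models(library_text):
    """Split an ASCII HMM library into records, each ending with a '//' line."""
    records, current = [], []
    for line in library_text.splitlines():
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def model_name(record):
    """Return the value of the NAME line of one record, or None if missing."""
    for line in record.splitlines():
        if line.startswith("NAME"):
            return line.split(None, 1)[1].strip()
    return None


def concatenate(*libraries):
    """Append the models of several libraries into one library text."""
    return "".join(rec for lib in libraries for rec in split_models(lib))


def keep_only(library_text, wanted_names):
    """Drop every model whose NAME is not in `wanted_names`."""
    return "".join(
        rec for rec in split_models(library_text) if model_name(rec) in wanted_names
    )


if __name__ == "__main__":
    # Tiny stand-in records; real records contain many more lines.
    pfam = "HMMER2.0\nNAME  pkinase\nLENG  3\n//\n"
    mine = "HMMER2.0\nNAME  myLEA\nLENG  5\n//\n"
    merged = concatenate(pfam, mine)
    print([model_name(rec) for rec in split_models(merged)])
    print(keep_only(merged, {"myLEA"}), end="")
```

Working on text keeps the sketch independent of any HMMER installation; for a full-size library it would be better to stream the file than to hold it in memory.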

Page 51: Developing and Using Special Purpose Hidden Markov Model Databases

PANAL

• Simultaneously searches several targets

• Produces a nice graphical overview

• Databases:

– PFAM

– SMART

– TIGRFAM

– Prosite

– PRINTS

– BLOCKS

Page 52: Developing and Using Special Purpose Hidden Markov Model Databases

PANAL

Page 53: Developing and Using Special Purpose Hidden Markov Model Databases

MAGPIE

• BLOCKS

• NCBI public non-redundant DNA and protein

• NCBI EST databases

• NCBI Conserved Domain Database (CDD)

• Protein Identification Resource SuperFamilies

• PFAM

• ProDom

• SCOP SuperFamilies

• SMART

• TIGRFam

• ProSite

Page 54: Developing and Using Special Purpose Hidden Markov Model Databases

MAGPIE

• Gives a putative description of the gene

• Database search result ranking based on user defined tool precedence and score thresholds

• A single graphical summary of the various search results

• Links to the database source entries

Page 55: Developing and Using Special Purpose Hidden Markov Model Databases

MAGPIE

• Gene taxonomic distribution information

• Reporting of similar sequences in the dataset based on hits to similar database entries

• Annotated metabolic pathway diagrams

• Gene Ontology (GO) term assignments

Page 56: Developing and Using Special Purpose Hidden Markov Model Databases

MAGPIE

Terry Gaasterland et al. Genome Res. 2000; 10: 502-510

Page 57: Developing and Using Special Purpose Hidden Markov Model Databases

Building Your Own HMM Database

• Why do it?

• Greater Specificity

• Represent your training set

• Faster searches

• Focus on the particular aspects that you want

Page 58: Developing and Using Special Purpose Hidden Markov Model Databases

[Workflow diagram. Boxes: PFAM, Your Data, Public DB, HMMsearch or BLAST, Cluster Sequences, Discard Singletons, Build Multiple Sequence Alignments, Check Alignments, HMMbuild, HMMcalibrate, Add Description Line, Annotate]

Page 59: Developing and Using Special Purpose Hidden Markov Model Databases

First, search against a target…

Page 60: Developing and Using Special Purpose Hidden Markov Model Databases

Select the hits for the model

Page 61: Developing and Using Special Purpose Hidden Markov Model Databases

Build the Multiple Sequence Alignment

Page 62: Developing and Using Special Purpose Hidden Markov Model Databases

Run HMMbuild to make the model

Page 63: Developing and Using Special Purpose Hidden Markov Model Databases

Iterate Search to Add More Distant Members
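The build, calibrate, and search steps of one iteration could be scripted along the following lines. This is a sketch under stated assumptions: the file naming scheme and the function `build_round_commands` are hypothetical, the alignment file is assumed to have been produced already by whatever aligner you prefer, and the command lines follow the HMMER 2 usage `hmmbuild <hmmfile> <alignment>`, `hmmcalibrate <hmmfile>`, and `hmmsearch <hmmfile> <sequence db>`. The function only assembles the commands; it does not run them.

```python
def build_round_commands(family, round_number, seq_db):
    """Return argv lists for one build-and-search round of a family model.

    File names are derived from the family name and round number, e.g.
    'lea.round1.aln' and 'lea.round1.hmm' (a made-up naming convention).
    """
    alignment = "%s.round%d.aln" % (family, round_number)
    model = "%s.round%d.hmm" % (family, round_number)
    return [
        ["hmmbuild", model, alignment],   # alignment -> model
        ["hmmcalibrate", model],          # fit score statistics for E-values
        ["hmmsearch", model, seq_db],     # look for more distant members
    ]


if __name__ == "__main__":
    # Two rounds: hits from round 1 would be added to the round 2 alignment.
    for round_number in (1, 2):
        for argv in build_round_commands("lea", round_number, "proteins.fasta"):
            print(" ".join(argv))
```

Between rounds, the new hits still have to be selected, added to the training set, and realigned by hand or by a separate script.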

Page 64: Developing and Using Special Purpose Hidden Markov Model Databases

Design Decisions:

• Local or global models?

• Which sequence weighting scheme?

• What type of Prior?

Page 65: Developing and Using Special Purpose Hidden Markov Model Databases

Calibration

• Hmmcalibrate

• Improves scoring

• Compares to random data

• Can be done on each model, or on the entire collection
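The idea behind "compares to random data" can be illustrated with a rough sketch: score a set of random sequences, fit an extreme value (Gumbel) distribution to those scores, and use the fit to turn a real score into a P-value and E-value. Everything here is an assumption made for illustration. In particular, the method-of-moments fit is a simplification chosen for brevity and is not claimed to be how hmmcalibrate fits its parameters, and the random scores are simulated rather than produced by a real model.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(random score >= score) under the fitted Gumbel distribution."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for large z.
    return -math.expm1(-math.exp(-z))


def e_value(score, mu, beta, n_sequences):
    """Expected number of random hits at least this good in a database."""
    return n_sequences * p_value(score, mu, beta)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for "scores of random sequences against the model".
    random_scores = [max(rng.gauss(-40, 5) for _ in range(50)) for _ in range(500)]
    mu, beta = fit_gumbel(random_scores)
    print("mu=%.2f beta=%.2f" % (mu, beta))
    print("E-value of a score of 0:", e_value(0.0, mu, beta, n_sequences=100000))
```

The calibrated parameters belong to one model, which is why calibration has to be repeated for every model in a collection.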

Page 66: Developing and Using Special Purpose Hidden Markov Model Databases

Calibration

• Very time-consuming for the CPU, but not for the researcher

• No acceleration available

• Not necessary with SAM

Page 67: Developing and Using Special Purpose Hidden Markov Model Databases

Meme and Meta-Meme

• Meme discovers motifs in a group of related DNA or protein sequences

• Motifs contain no gaps- split in two instead

Page 68: Developing and Using Special Purpose Hidden Markov Model Databases

Meta-meme

• Meta-meme takes meme motifs & related seqs as input

• Combines motifs into HMMs

• Regions between motifs are modeled imprecisely

• Reduction in parameter space

• Accurate models with fewer training seqs

Page 69: Developing and Using Special Purpose Hidden Markov Model Databases

Meta-meme

• mhmm: Build a motif-based HMM from Meme motifs.

• mhmms: Search a sequence database using a motif-based HMM

• mhmmscan: Like mhmms, but allows long seqs and multiple matches.

Page 70: Developing and Using Special Purpose Hidden Markov Model Databases

Using RPS-BLAST

• Start with PSI-BLAST using –C

• Prepare files with makemat and copymat

• Compile target

• Annotate

• Search with RPS-BLAST

Page 71: Developing and Using Special Purpose Hidden Markov Model Databases

IMPALA

• Also uses profiles database

• Alignments generated by Smith-Waterman instead of word hit initiated

• 10-100x Slower, might be better than RPS-BLAST

Page 72: Developing and Using Special Purpose Hidden Markov Model Databases

SPEED

• PVM version of HMMer is available, MPI is on the way (?)

• Other solutions: use PSSMs?

• SPSpfam can speed searches 3-60X

• SledgeHMMer claims 10X speedup

• Accelerators

• Target triage

Page 73: Developing and Using Special Purpose Hidden Markov Model Databases

SPSpfam

• From Southwest Parallel Software

• Optimized HMMer code

• Up to 60X faster

• Works well on cluster

• Uses binary Pfam, so you can’t drop it into InterProScan

• This may change soon

Page 74: Developing and Using Special Purpose Hidden Markov Model Databases

HMM Accelerators

• Can provide speedups of 100s to 1000s X

• TimeLogic is the only commercial one left

• HokieGene from Virginia Tech

• StarBridge - No HMMs yet

• Others coming soon

• An open-source project is in the works- BioFPGA

Page 75: Developing and Using Special Purpose Hidden Markov Model Databases

HMMs on the Web

• SAM http://www.cse.ucsc.edu/research/compbio/

• HMMer http://hmmer.wustl.edu/

• Several other HMMer servers…

• SledgeHMMer.sdsc.edu is the only unlimited webserver; most restrict you to one sequence at a time.

Page 76: Developing and Using Special Purpose Hidden Markov Model Databases

Resources

• Online Applications:

• HMMer http://hmmer.wustl.edu/

• SAM-T02 http://www.soe.ucsc.edu/research/compbio/HMM-apps/HMM-applications.html

• Pfam http://pfam.wustl.edu/

• SledgeHMMer sledgehmmer.sdsc.edu

• Meta-MEME http://metameme.sdsc.edu/

• PANAL http://web.ahc.umn.edu/panal/

Page 77: Developing and Using Special Purpose Hidden Markov Model Databases

Resources

• Commercial vendors of HMM systems

• SPSpfam (www.spsoft.com)

• Ldhmmer (www.logicaldepth.com)

• DeCypherHMM (www.timelogic.com)

Page 78: Developing and Using Special Purpose Hidden Markov Model Databases

References

• S. Altschul, et al. Basic Local Alignment Search Tool. JMB, 215:403-410, 1990.

• C. Barrett, et al. Scoring hidden Markov models. CABIOS, 13(2):191-199, 1997.

• S. R. Eddy. Profile hidden Markov models. Bioinformatics, 14(9):755-63, 1998.

• W. N. Grundy, et al. Meta-MEME: Motif-based hidden Markov models of protein families. CABIOS, 13(4):397-406, 1997.

• M. Gribskov, et al. Profile analysis: Detection of distantly related proteins. PNAS, 84:4355-4358, July 1987.

• S. Henikoff and J. G. Henikoff. Amino acid substitution matrices from protein blocks. PNAS, 89:10915-10919, November 1992.

• S. Henikoff and J. G. Henikoff. Position-based sequence weights. JMB, 243(4):574-578, November 1994.

Page 79: Developing and Using Special Purpose Hidden Markov Model Databases

• Jeffrey D. Hirschberg, et al. Kestrel: A programmable array for sequence analysis. In Application-Specific Array Processors, pages 25-34, Los Alamitos, CA, July 1996. IEEE Computer Society.

• R. Hughey and A. Krogh. Hidden Markov models for sequence analysis: Extension and analysis of the basic method. CABIOS, 12(2):95-107, 1996.

• T. Hubbard, et al. SCOP: a structural classification of proteins database. NAR, 25(1):236-9, January 1997.

• L. Holm and C. Sander. Dali/FSSP classification of three-dimensional protein folds. NAR, 25:231-234, 1 Jan 1997.

• K. Karplus, et al. Predicting protein structure using only sequence information. Proteins: Structure, Function, and Genetics.

• K. Karplus, et al. Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14(10):846-856, 1998.

Page 80: Developing and Using Special Purpose Hidden Markov Model Databases

• A. Krogh, et al. Hidden Markov models in computational biology: Applications to protein modeling. JMB, 235:1501-1531, February 1994.

• K. Karplus, et al. Predicting protein structure using hidden Markov models. Proteins: Structure, Function, and Genetics, Suppl. 1:134-139, 1997.

• C. A. Orengo, et al. CATH: a hierarchic classification of protein domain structures. Structure, 5(8):1093-108, August 1997.

• J. Park, et al. Sequence comparisons using multiple sequences detect twice as many remote homologues as pairwise methods. JMB, 284(4):1201-1210.

• E. L. L. Sonnhammer, et al. Pfam: A comprehensive database of protein families. Proteins, 28:405-420, 1997.

• K. Sjolander, et al. Dirichlet mixtures: A method for improving detection of weak but significant protein sequence homology. CABIOS, 12(4):327-345, August 1996.

• R. Schneider and C. Sander. The HSSP database of protein structure-sequence alignments. NAR, 24(1):201-205, 1 Jan 1996.

• G. Chukkapalli, C. Guda, and S. Subramaniam. SledgeHMMER: A web server for batch searching Pfam database. Nucleic Acids Res., 32:W542-544.

• A. A. Schaffer, Y. I. Wolf, C. P. Ponting, E. V. Koonin, L. Aravind, and S. F. Altschul. IMPALA: Matching a Protein Sequence Against a Collection of PSI-BLAST-Constructed Position-Specific Score Matrices. Bioinformatics.

• P. K. Papasaikas, P. G. Bagos, Z. I. Litou, V. J. Promponas, and S. J. Hamodrakas. PRED-GPCR: GPCR recognition and family classification server. Nucleic Acids Research, 2004, 32(Web Server issue):W380-W382; doi:10.1093/nar/gkh431

• K. A. T. Silverstein, A. Kilian, J. L. Freeman, and E. F. Retzel. PANAL: an integrated resource for Protein sequence ANALysis. Bioinformatics, 16:1157-1158, 2000.

Page 81: Developing and Using Special Purpose Hidden Markov Model Databases

Thanks!

• Garrett Taylor, Brian Beck, Taliah Mittler, Barrett Abel, John Cushman, Lee Weber

• Contact me at- [email protected]

• Bioinformatics.unr.edu