Readings for this week
Gogarten et al. Horizontal gene transfer…..
Francke et al. Reconstructing metabolic networks…..
Sign up for a meeting next week for proposal feedback and a progress check-up
Inferring protein function by genomic context
Inferring protein function by homology
COGs: Clusters of Orthologous Groups (eukaryotic versions are KOGs)
Identified using all-against-all sequence comparisons across a collection of complete genomes. A COG can include genes with both orthologous and paralogous relationships
COGs are grouped into large-scale functional categories
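The first building block of COG-style ortholog detection is finding reciprocal best hits in the all-against-all comparison. The minimal sketch below assumes a toy table of hypothetical bit scores between proteins in two genomes (IDs written as `genome|protein`). The original COG procedure goes further: it links genome-specific best hits into triangles spanning at least three lineages and merges triangles that share a side, which is not shown here.

```python
# Hypothetical all-against-all bit scores: (query, subject) -> score.
SCORES = {
    ("A|a1", "B|b1"): 250.0, ("A|a1", "B|b2"): 90.0,
    ("A|a2", "B|b1"): 60.0,  ("A|a2", "B|b2"): 180.0,
    ("B|b1", "A|a1"): 248.0, ("B|b1", "A|a2"): 62.0,
    ("B|b2", "A|a1"): 95.0,  ("B|b2", "A|a2"): 182.0,
}

def genome_of(protein_id: str) -> str:
    """Genome label is the part before '|' (a naming convention assumed here)."""
    return protein_id.split("|", 1)[0]

def best_hits(scores: dict[tuple[str, str], float]) -> dict[tuple[str, str], tuple[str, float]]:
    """For each query, keep its best-scoring hit in every *other* genome."""
    best: dict[tuple[str, str], tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if genome_of(query) == genome_of(subject):
            continue  # within-genome hits are paralog candidates, skip them here
        key = (query, genome_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best

def reciprocal_best_hits(scores: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Pairs (x, y) where y is x's best hit in y's genome and vice versa."""
    best = best_hits(scores)
    pairs = set()
    for (query, _target_genome), (subject, _score) in best.items():
        back = best.get((subject, genome_of(query)))
        if back is not None and back[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)

print(reciprocal_best_hits(SCORES))  # [('A|a1', 'B|b1'), ('A|a2', 'B|b2')]
```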
Domains: conserved structural units with distinctive secondary-structure content and a hydrophobic core
Example: Protein kinase domain
Motifs: a pattern of amino acids that is conserved across many proteins and confers a particular function on the protein
Example: zinc finger CX(2-4)C....HX(2-4)H
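Because a motif is a fixed pattern of residues with variable spacers, it can be searched for with an ordinary regular expression. The sketch below encodes the zinc-finger outline C-x(2,4)-C ... H-x(2,4)-H; the slide elides the middle of the motif, so the spacer bounds (`spacer_min`, `spacer_max`) and the toy sequence are hypothetical placeholders, not the real zinc-finger definition.

```python
import re

def zinc_finger_pattern(spacer_min: int = 8, spacer_max: int = 20) -> re.Pattern:
    """Regex for C-x(2,4)-C-<spacer>-H-x(2,4)-H.

    The spacer bounds are hypothetical: the slide elides that part of the motif.
    """
    return re.compile(rf"C.{{2,4}}C.{{{spacer_min},{spacer_max}}}H.{{2,4}}H")

def find_motifs(sequence: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched peptide) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]

toy = "MKCAACGKSFSQSSNLQKHQRTHTGEK"  # hypothetical protein fragment
print(find_motifs(toy, zinc_finger_pattern()))  # [(2, 'CAACGKSFSQSSNLQKHQRTH')]
```

Real motif databases express the same idea in their own pattern syntax; the regex form is just a convenient way to experiment with spacer lengths.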
Looking at Parts of Proteins
PFAM (Protein Families database): based on hidden Markov models (HMMs)
HMMs are statistical probability models of multiple sequence alignments
PFAM-A families start from manually curated seed alignments
From each seed alignment a profile HMM is built; like a Position Specific Scoring Matrix (PSSM), it scores every alignment position separately, but it also models insertions and deletions
How to identify domains?
Position Specific Scoring Matrix (PSSM)
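A PSSM assigns each alignment column its own score for each residue, usually as a log-odds ratio of the observed column frequency to a background frequency. The sketch below assumes an ungapped alignment, a uniform background of 1/20 per residue, and a flat pseudocount of 1; real tools such as PSI-BLAST use non-uniform background frequencies, sequence weighting, and more refined pseudocounts. The toy alignment is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PSSM from an ungapped alignment of equal-length sequences.

    Assumes a uniform background (1/20 per residue) and a flat pseudocount.
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("sequences must be the same length (no gaps)")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(width):
        counts = Counter(seq[i] for seq in alignment)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position scores: the PSSM analogue of an ungapped alignment score."""
    return sum(column[aa] for column, aa in zip(pssm, window))

def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Highest-scoring window of the PSSM's width in a longer sequence: (score, start)."""
    width = len(pssm)
    return max((score_window(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

alignment = ["ACDE", "ACDE", "ACDF", "GCDE"]  # hypothetical seed alignment
pssm = build_pssm(alignment)
print(best_window(pssm, "KKACDEKK"))  # best window starts at index 2
```

Conserved columns (here column 2, always C) give large positive scores to the conserved residue, while variable columns spread their score across several residues.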
PFAM (Protein Families database): searching a protein against PFAM gives an E-value whose meaning is similar to that of a BLAST E-value, namely the number of matches to that domain scoring at least this well that would be expected by chance in a search of this size
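On the BLAST side of this analogy, Karlin-Altschul statistics relate a bit score S' to the E-value by E = m · n · 2^(-S'), where m is the query length and n the total database length; the chance of at least one such hit is P = 1 - e^(-E). This sketch ignores BLAST's effective-length correction for edge effects, and PFAM searches (run through HMMER) use their own calibrated score distributions, so treat it only as an illustration of how E-values behave.

```python
import math

def blast_e_value(bit_score: float, query_len: float, db_len: float) -> float:
    """E = m * n * 2**(-S'): expected chance hits scoring >= S' (no edge correction)."""
    return query_len * db_len * 2.0 ** (-bit_score)

def p_value(e_value: float) -> float:
    """P = 1 - exp(-E): probability of at least one chance hit that good."""
    return -math.expm1(-e_value)  # expm1 keeps precision when E is tiny

print(blast_e_value(10, 1, 1024))     # 1.0
print(blast_e_value(50, 300, 1e6))    # ~2.7e-07
```

Two consequences are visible directly from the formula: every extra bit of score halves E, and searching a database twice as large doubles E for the same score. For small E, P and E are nearly equal.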
Other Protein Databases
SMART: uses HMMs; focuses on signalling and regulatory proteins (which tend to be more divergent than enzymes)
TIGRFAMs: curated TIGR alignments used to generate HMMs; one advantage is that family names should be functionally accurate for all proteins they represent
PRINTS: not HMM-based; uses “fingerprints” of conserved motifs
Ecumenical solution: InterPro, a collection of multiple databases under one umbrella
Still more kinds of BLAST
PSI-BLAST: Position-Specific Iterated BLAST. Use it to find members of a protein family or to build a custom position-specific scoring matrix
The most sensitive BLAST program, which makes it useful for finding very distantly related proteins or new members of a protein family
1st round: a standard BLASTP search; a PSSM is then built from all hits with E-values better than the inclusion threshold
2nd round: the PSSM is used to score alignments in a new search. Additional hits better than the inclusion threshold are incorporated into an updated PSSM
3rd and later rounds: as in the second round. The search converges when no new hits are found
The PSSM can be saved for use in later searches
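The iteration above can be sketched as a loop that rebuilds the PSSM from the current hits until the hit set stops changing. This is a toy model under several stated assumptions: all sequences are the same length and are scored ungapped, a raw score threshold stands in for the E-value inclusion threshold, round 1 uses a PSSM built from the query alone in place of BLASTP's substitution matrix, and the database entries are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(seqs: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PSSM from equal-length, ungapped sequences (uniform background assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def pssm_score(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of a sequence against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))

def psi_blast_like(query: str, database: dict[str, str], threshold: float, max_rounds: int = 5):
    """Iterate search -> rebuild PSSM until the hit set stops changing (convergence)."""
    pssm = build_pssm([query])  # round-1 stand-in for a standard substitution matrix
    history: list[set[str]] = []
    for round_no in range(1, max_rounds + 1):
        hits = {name for name, seq in database.items() if pssm_score(pssm, seq) >= threshold}
        history.append(hits)
        if round_no > 1 and hits == history[-2]:
            return hits, round_no, history  # converged: no new hits this round
        # Rebuild the PSSM from the query plus every hit passing the threshold.
        pssm = build_pssm([query] + [database[n] for n in sorted(hits)])
    return history[-1], max_rounds, history

database = {  # hypothetical sequences, same length as the query
    "close_homolog": "ACDEFKLM",
    "distant_homolog": "WYDEFKLM",
    "unrelated": "NQRSTVWY",
}
hits, rounds, history = psi_blast_like("ACDEFGHI", database, threshold=3.5)
print(history)  # distant_homolog first appears in round 2
```

The distant homolog shares only three residues with the query, so it misses the threshold in round 1; once the close homolog is folded into the PSSM, the residues those two share raise its score above the threshold, which is the effect that makes PSI-BLAST more sensitive than a single BLASTP search.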
Still more kinds of BLAST
PHI-BLAST: Pattern Hit Initiated BLAST. Finds proteins that contain a given pattern and are similar to the query in the region around it
You must enter both a query sequence containing the pattern AND a pattern to search on
Example pattern (easy): FGELA
Example pattern (harder): [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]
Matching peptide: FGELALMYNTPRAATIVA
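The harder pattern is written in PROSITE-style syntax: `-` separates positions, `x` is any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, `(n)` or `(n,m)` gives a repeat count, and `<`/`>` anchor the ends. The sketch below translates that syntax into a Python regular expression to check the matching peptide; it covers only the pattern-matching step, not the BLAST alignment PHI-BLAST builds around the hit, and the `prosite_to_regex` helper is hypothetical (it does not handle the bracketed `[G>]` end-anchor form).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        start_anchor = element.startswith("<")
        end_anchor = element.endswith(">")
        element = element.strip("<>")
        # Split e.g. "x(5,11)" into body "x" and repeat bounds "5", "11".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        body, lo, hi = m.groups()
        if body == "x":
            piece = "."                       # any residue
        elif body.startswith("{"):
            piece = "[^" + body[1:-1] + "]"   # {P} means "anything but P"
        else:
            piece = body                      # [LIVM] or literal residue(s)
        if lo is not None:
            piece += "{" + lo + ("," + hi if hi else "") + "}"
        regex.append(("^" if start_anchor else "") + piece + ("$" if end_anchor else ""))
    return "".join(regex)

HARD = "[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]"
print(prosite_to_regex(HARD))  # [LIVMF]GE.[GAS][LIVM].{5,11}R[STAQ]A.[LIVMA].[STACV]
print(re.search(prosite_to_regex(HARD), "FGELALMYNTPRAATIVA").group())  # FGELALMYNTPRAATIVA
```

In the matching peptide, the variable gap x(5,11) is filled by MYNTP (five residues), which puts R at the position the pattern requires.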
Enzyme Nomenclature
1. Oxidoreductases
2. Transferases
3. Hydrolases
4. Lyases
5. Isomerases
6. Ligases
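EC numbers: a hierarchical classification scheme for enzymes
Enzymes are named and classified according to the reactions they catalyze; each EC number has four levels (class.subclass.sub-subclass.serial number), e.g., EC 1.1.1.1 (alcohol dehydrogenase)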
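Because EC numbers are hierarchical, each prefix of the number is itself a meaningful group of enzymes. The sketch below splits an EC number into its levels and looks up the top-level class. The class table follows the list above plus class 7 (translocases), which was added in 2018; partial numbers such as `3.4.-.-` are handled, but preliminary serial numbers like `n1` are not (an assumption of this sketch).

```python
# Top-level EC classes; class 7 (translocases) was added to the scheme in 2018.
EC_CLASSES = {1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
              4: "Lyases", 5: "Isomerases", 6: "Ligases", 7: "Translocases"}

def ec_levels(ec: str) -> list[str]:
    """Hierarchical prefixes of an EC number, stopping at the first '-'."""
    fields = ec.removeprefix("EC").strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields in {ec!r}")
    levels = []
    for depth, field in enumerate(fields, start=1):
        if field == "-":
            break  # unassigned level, e.g. "3.4.-.-"
        if not field.isdigit():
            raise ValueError(f"unsupported field {field!r} in {ec!r}")
        levels.append(".".join(fields[:depth]))
    return levels

def ec_class(ec: str) -> str:
    """Name of the top-level class (first digit)."""
    return EC_CLASSES[int(ec_levels(ec)[0])]

print(ec_levels("EC 1.1.1.1"))   # ['1', '1.1', '1.1.1', '1.1.1.1']
print(ec_class("EC 1.1.1.27"))   # Oxidoreductases
```

Alcohol dehydrogenase (1.1.1.1) and L-lactate dehydrogenase (1.1.1.27) share the first three levels, so comparing `ec_levels(...)[2]` groups them in the same sub-subclass.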
KEGG: Kyoto Encyclopedia of Genes and Genomes
A collection of manually drawn metabolic and cellular pathway maps based on up-to-date biochemical information
The metabolic maps are its strongest feature: they use EC-numbered enzymes as the key players, so pathways from different genomes can be mapped easily from each genome's predetermined EC content
Also has a growing collection of signalling and cellular-process maps
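Mapping a genome onto a pathway by EC content amounts to checking which of the pathway's EC numbers appear among the genome's annotated enzymes. The minimal sketch below uses the first five steps of glycolysis as an illustrative pathway and a hypothetical genome annotation; in practice KEGG also links its maps to KEGG Orthology (KO) identifiers, which this sketch does not model.

```python
import math

# Illustrative pathway: the first five steps of glycolysis, keyed by EC number.
UPPER_GLYCOLYSIS = {
    "2.7.1.1": "hexokinase",
    "5.3.1.9": "glucose-6-phosphate isomerase",
    "2.7.1.11": "6-phosphofructokinase",
    "4.1.2.13": "fructose-bisphosphate aldolase",
    "5.3.1.1": "triose-phosphate isomerase",
}

def map_pathway(pathway: dict[str, str], genome_ecs: set[str]) -> dict:
    """Split a pathway's enzymes into present/missing for one genome's EC set."""
    present = [ec for ec in pathway if ec in genome_ecs]
    missing = [ec for ec in pathway if ec not in genome_ecs]
    return {"present": present, "missing": missing,
            "coverage": len(present) / len(pathway)}

# Hypothetical genome annotation: EC numbers assigned to its genes.
genome = {"2.7.1.1", "5.3.1.9", "4.1.2.13", "5.3.1.1", "1.1.1.1"}
result = map_pathway(UPPER_GLYCOLYSIS, genome)
print(result["missing"], result["coverage"])  # ['2.7.1.11'] 0.8
```

A missing EC number does not prove the step is absent: the gene may not have been annotated, or the organism may use a different enzyme for that reaction, so gaps like this are where homology and genomic-context evidence get applied next.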
Putting it all together….