Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are...
![Page 1: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/1.jpg)
Protein and RNA Families
Function Prediction
![Page 2: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/2.jpg)
Tell me what you do
and I will tell you who you are …
![Page 3: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/3.jpg)
From multiple alignments we can derive:
• A motif
• A profile (PSSM)
• A Hidden Markov Model
![Page 4: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/4.jpg)
MOTIF
Rxx(F,Y,W)(R,K)SAQ
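A minimal sketch of how such a motif can be searched for, assuming that `x` stands for any residue and that a parenthesised group such as `(F,Y,W)` lists the residues allowed at one position. The helper names `motif_to_regex` and `find_motif` are made up for this illustration.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Translate the slide's motif notation into a Python regex.

    Assumed reading of the notation: 'x' is any residue and
    '(F,Y,W)' means one of F, Y or W at that position.
    """
    # '(F,Y,W)' -> '[FYW]'
    pattern = re.sub(
        r"\(([A-Z,]+)\)",
        lambda m: "[" + m.group(1).replace(",", "") + "]",
        motif,
    )
    # 'x' -> '.', i.e. any single residue
    return pattern.replace("x", ".")

MOTIF_REGEX = motif_to_regex("Rxx(F,Y,W)(R,K)SAQ")

def find_motif(sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping motif hits."""
    return [m.start() for m in re.finditer(MOTIF_REGEX, sequence)]
```

A motif either matches or it does not; it gives no score for near misses, which is one reason profiles and HMMs are also derived from the alignment.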
![Page 5: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/5.jpg)
Profile Scoring
![Page 6: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/6.jpg)
Profile Hidden Markov Model (profile HMM)
• An MSA can be described by an HMM
• An HMM is a probabilistic model of the MSA consisting of a number of interconnected states
• The different states are match, delete or insert.
• Each position is modeled independently
• The concatenation of the probabilistic models of the positions is the protein model.
![Page 7: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/7.jpg)
Profile HMM
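[Figure: profile HMM for alignment columns 16 to 19]
Delete states: D16 D17 D18 D19
Match states: M16 M17 M18 M19
Insert states: I16 I17 I18 I19
Match-state emission probabilities:
M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6
Transition labels in the figure: 50% (twice), 100% (four times)
Other figure labels: X (four times)
Alignment (columns 16 17 18 19):
D R T R
D R T S
S - - S
S P T R
D R T R
D P T S
D - - S
D - - S
D - - S
D - - R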
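The emission probabilities on this slide can be reproduced by counting residues in each alignment column. The sketch below assumes plain relative frequencies with no pseudocounts (real profile HMM tools typically smooth these estimates); the function name `column_stats` is hypothetical.

```python
from collections import Counter

# Alignment columns 16-19 as shown on the slide ('-' marks a gap).
ALIGNMENT = [
    "DRTR", "DRTS", "S--S", "SPTR", "DRTR",
    "DPTS", "D--S", "D--S", "D--S", "D--R",
]

def column_stats(alignment, col):
    """Return (emission probabilities, fraction of gaps) for one column.

    Emissions are relative residue frequencies among the non-gap
    symbols; this ignores pseudocounts, an assumption of this sketch.
    """
    symbols = [seq[col] for seq in alignment]
    residues = [s for s in symbols if s != "-"]
    if not residues:  # an all-gap column has no match emissions
        return {}, 1.0
    counts = Counter(residues)
    emissions = {aa: n / len(residues) for aa, n in counts.items()}
    gap_fraction = (len(symbols) - len(residues)) / len(symbols)
    return emissions, gap_fraction

for offset, position in enumerate(range(16, 20)):
    emissions, gaps = column_stats(ALIGNMENT, offset)
    print(position, emissions, f"gaps={gaps:.0%}")
```

Half of the sequences have a gap in columns 17 and 18, which is consistent with the two 50% transition labels in the figure (stay in the match path or enter the delete path).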
![Page 8: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/8.jpg)
Protein Domains
• Domains can be considered as building blocks of proteins.
• Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function.
• The presence of a particular domain can be indicative of the function of the protein.
![Page 9: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/9.jpg)
C2H2 Zinc-Finger
![Page 10: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/10.jpg)
DNA Binding domain
Zinc-Finger
![Page 11: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/11.jpg)
PROSITE
• PROSITE is a database of protein domains that can be searched by either regular expression patterns or sequence profiles.
Zinc_Finger_C2H2 Cx{2,4}Cx3(L,I,V,M,F,Y,W,C)x8Hx{3,5}H
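As an illustration of searching with such a pattern, the sketch below translates the Zinc_Finger_C2H2 pattern by hand into a Python regular expression, assuming `x3` means exactly three arbitrary residues, `x{2,4}` means two to four, and the parenthesised group lists the allowed residues. The test peptide is a made-up sequence, not a database entry.

```python
import re

# Hand translation of the slide's pattern; an assumption of this sketch.
ZINC_FINGER_C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")

def scan(sequence):
    """Return (start, end, matched_text) for each non-overlapping hit."""
    return [
        (m.start(), m.end(), m.group())
        for m in ZINC_FINGER_C2H2.finditer(sequence)
    ]

# Made-up peptide with two Cys and two His at the expected spacing.
print(scan("YKCPECGKSFSQKSNLQKHQRTH"))
```

Changing either histidine or cysteine in the test peptide removes the hit, since the pattern requires all four zinc-coordinating residues.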
![Page 12: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/12.jpg)
Pfam
• The Pfam database is based on two distinct classes of alignments
– Seed alignments which are deemed to be accurate and used to produce Pfam A
– Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam B
• Database that contains a large collection of multiple sequence alignments and profile hidden Markov models (HMMs).
• High-quality seed alignments are used to build HMMs to which sequences are aligned
![Page 13: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/13.jpg)
[Figure: Pfam coverage. x-axis: number of families (0 to 8000); y-axis: percentage coverage of UniProt (0 to 100)]
● First 2000 families covered ~65% of UniProt
● Currently, 7503 families cover 74% of UniProt
![Page 14: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/14.jpg)
InterPro
Was built from protein classification databases, such as:
• PROSITE
• ProDom
• SMART
• Pfam
• PRINTS
A total of 10403 entries
Uses UniProt = SWISSPROT and TrEMBL
![Page 15: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/15.jpg)
Applications of InterPro
Diagnostic protein family signature database for:
• Classification of proteins through text and sequence search tools
• Large-scale classification
• Enhancing genome annotation: fly, human, rice, mouse
• Proteome Analysis
![Page 16: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/16.jpg)
GO (gene ontology)
http://www.geneontology.org/
• The GO project aims to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes (P), cellular components (C) and molecular functions (F) in a species-independent manner. There are three separate aspects to this effort: first, to write and maintain the ontologies themselves; second, to make associations between the ontologies and the genes and gene products in the collaborating databases; and third, to develop tools that facilitate the creation, maintenance and use of ontologies.
An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents.
![Page 17: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/17.jpg)
InterPro to GO
InterPro: IPR000003 Retinoic acid receptor > GO: DNA binding GO:0003677
InterPro: IPR000003 AraC type helix-turn-helix > GO: transcription factor GO:0003700
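Mapping lines like these can be split into their parts with a small parser. The sketch below assumes the layout exactly as printed on this slide (the real mapping file may differ in punctuation); the names `Mapping` and `parse_mapping` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Mapping(NamedTuple):
    interpro_id: str
    interpro_name: str
    go_term: str
    go_id: str

# Layout assumed from the slide:
# "InterPro: <accession> <name> > GO: <term> <GO id>"
LINE_RE = re.compile(
    r"^InterPro:\s*(IPR\d{6})\s+(.*?)\s*>\s*GO:\s*(.*?)\s+(GO:\d{7})$"
)

def parse_mapping(line: str) -> Optional[Mapping]:
    """Parse one mapping line; return None if the layout differs."""
    match = LINE_RE.match(line.strip())
    return Mapping(*match.groups()) if match else None

# The two lines exactly as they appear on the slide.
SLIDE_LINES = [
    "InterPro: IPR000003 Retinoic acid receptor > GO: DNA binding GO:0003677",
    "InterPro: IPR000003 AraC type helix-turn-helix > GO: transcription factor GO:0003700",
]

for line in SLIDE_LINES:
    print(parse_mapping(line))
```

Once parsed, every protein that matches an InterPro entry can be assigned the GO terms attached to that entry.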
![Page 18: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/18.jpg)
Database and Tools for protein families and domains
• InterPro - Integrated Resource of Protein Domains and Functional Sites
• Prosite - A database of protein families and domains
• BLOCKS - BLOCKS db
• Pfam - Protein families db (HMM derived)
• PRINTS - Protein Motif fingerprint db
• ProDom - Protein domain db (Automatically generated)
• PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins
• SBASE - SBASE domain db
• SMART - Simple Modular Architecture Research Tool
• TIGRFAMs - TIGR protein families db
![Page 19: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/19.jpg)
Clusters of Orthologous Groups of proteins (COGs)
Classification of conserved genes according to their homologous relationships. (Koonin et al., NAR)
Homologs - Proteins with a common evolutionary origin
Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events.
Orthologs - Proteins from different species that evolved by vertical descent (speciation).
![Page 20: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/20.jpg)
Clusters of Orthologous Groups of proteins (COGs)
Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages.
Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.
![Page 21: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/21.jpg)
COGS - Clusters of orthologous groups
* All-against-all sequence comparison of the proteins encoded in completed genomes (paralogs/orthologs)
* For a given protein “a” in genome A, if there are several similar proteins in genome B, the most similar one, “b”, is selected
* If, when using protein “b” as a query, protein “a” in genome A is selected as the best hit, then “a” and “b” can be included in a COG
* Proteins in a COG are more similar to other proteins in the COG than to any other protein in the compared genomes
* A COG is defined when it includes at least three homologous proteins from three distant genomes
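The rules above can be sketched as a search for bidirectional best hits that close a triangle across three genomes. This is a simplified illustration of the slide's rules only, not the full published COG procedure; the similarity scores, protein names and function names are all made up.

```python
from itertools import combinations

def symmetric(pair_scores):
    """Expand {(a, b): score} so that (b, a) maps to the same score."""
    full = {}
    for (a, b), score in pair_scores.items():
        full[(a, b)] = score
        full[(b, a)] = score
    return full

def best_hit(query, target_genome, scores):
    """Most similar protein to `query` in `target_genome`, or None."""
    hits = {
        p: s for (q, p), s in scores.items()
        if q == query and p[0] == target_genome
    }
    return max(hits, key=hits.get) if hits else None

def is_bidirectional_best_hit(a, b, scores):
    """True if a's best hit in b's genome is b, and vice versa."""
    return best_hit(a, b[0], scores) == b and best_hit(b, a[0], scores) == a

def find_triangles(proteins, scores):
    """Trios from three genomes in which every pair is a mutual best hit."""
    triangles = []
    for trio in combinations(sorted(proteins), 3):
        genomes = {genome for genome, _ in trio}
        if len(genomes) == 3 and all(
            is_bidirectional_best_hit(x, y, scores)
            for x, y in combinations(trio, 2)
        ):
            triangles.append(trio)
    return triangles

# Toy data: proteins are (genome, name) pairs; scores are invented.
SCORES = symmetric({
    (("A", "a1"), ("B", "b1")): 90,
    (("A", "a2"), ("B", "b1")): 60,
    (("A", "a1"), ("C", "c1")): 85,
    (("A", "a2"), ("C", "c1")): 50,
    (("B", "b1"), ("C", "c1")): 80,
})
PROTEINS = sorted({p for pair in SCORES for p in pair})
print(find_triangles(PROTEINS, SCORES))
```

In the toy data, "a2" also resembles "b1", but "b1" prefers "a1", so only "a1", "b1" and "c1" form a triangle.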
![Page 22: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/22.jpg)
Distribution of functional categories in the COGs database
Function unknown
General function, prediction only
![Page 23: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/23.jpg)
Information in COGS
* Annotation of proteins by members of known structure/function
* Phylogenetic patterns - presence or absence of proteins in a given organism --> Enables following metabolic pathways
* Multiple alignments
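A phylogenetic pattern can be represented as one presence/absence symbol per genome. The sketch below is only an illustration: the genome labels, the COG members and the function name `phylogenetic_pattern` are hypothetical.

```python
def phylogenetic_pattern(cog_members, genomes):
    """Return a string with '+' where a genome has a member, else '-'.

    `cog_members` is a list of (genome, protein) pairs and `genomes`
    fixes the order of the positions in the pattern.
    """
    present = {genome for genome, _ in cog_members}
    return "".join("+" if g in present else "-" for g in genomes)

# Toy example: a COG with members in genomes A, B and D but not C.
GENOMES = ["A", "B", "C", "D"]
MEMBERS = [("A", "a1"), ("B", "b1"), ("D", "d1")]
print(phylogenetic_pattern(MEMBERS, GENOMES))
```

Comparing such patterns across the COGs of one pathway shows which steps appear to be missing in a given organism.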
![Page 24: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/24.jpg)
Discovering common motifs in unaligned sequences
MEME: can be used for protein sequences as well as for DNA sequences
![Page 25: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/25.jpg)
RNA families
• Rfam: General non-coding RNA database (most of the data is taken from specific databases)
http://www.sanger.ac.uk/Software/Rfam/
Includes many families of non-coding RNAs and functional motifs, as well as their alignments and their secondary structures
![Page 26: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/26.jpg)
Rfam (currently version 6.1)
• 379 different RNA families or functional motifs from mRNA UTRs etc.
[Figure labels: gene, intron, cis elements]
![Page 27: Protein and RNA Families Function Prediction. Tell me what you do and I will tell you who you are …](https://reader035.fdocuments.us/reader035/viewer/2022062721/56649f205503460f94c38857/html5/thumbnails/27.jpg)
An example of an RNA family: miR-1 microRNAs