Tutorial 9
Protein and Function Databases
UniProt (Swiss-Prot/TrEMBL) - PROSITE - Pfam - Gene Ontology - DAVID
Glossary
Domain: A structural unit which can be found in multiple protein contexts.
Repeat: A short unit which is unstable in isolation but forms a stable structure when multiple copies are present.
Family: A collection of related proteins.
UniProt
The Universal Protein Resource (UniProt) is a central repository of protein sequences, functions, classifications and cross-references.
It was created by combining the information contained in Swiss-Prot and TrEMBL.
http://www.uniprot.org/
Protein search
Reviewed protein
UniProt input
UniProt output
Protein status
Accession number
Organism
Length
Sequence download
General information
annotations
Information for one protein
GO annotation (MF, BP, CC)
General keywords
Alternative splicing
isoforms
Features in the sequence
Sequences
References
Alignment for two or more proteins
MSA
Blast
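The status, accession number and organism shown for a UniProt entry also appear in the header line of a downloaded FASTA file, where the prefix `sp` marks a reviewed (Swiss-Prot) entry and `tr` an unreviewed (TrEMBL) one. The following is a minimal sketch of reading those fields in Python. The function name `parse_uniprot_header` is hypothetical, and the example header was typed in by hand to illustrate the layout, so it should be checked against the live entry.

```python
import re

# Layout of a UniProtKB FASTA header:
# >db|Accession|EntryName ProteinName OS=Organism OX=TaxonID [GN=Gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)\s+OX=(?P<taxon_id>\d+)"
)


def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header into named fields (hypothetical helper)."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("not a UniProtKB-style FASTA header: " + header)
    fields = match.groupdict()
    # 'sp' = Swiss-Prot (reviewed), 'tr' = TrEMBL (unreviewed)
    fields["reviewed"] = fields.pop("db") == "sp"
    return fields


# Hand-typed example header in the UniProtKB layout (an assumption, not downloaded).
example = ">sp|P37840|SYUA_HUMAN Alpha-synuclein OS=Homo sapiens OX=9606 GN=SNCA PE=1 SV=1"
info = parse_uniprot_header(example)
print(info["accession"], info["organism"], "reviewed" if info["reviewed"] else "unreviewed")
```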
Pfam
• http://pfam.sanger.ac.uk/
• Pfam is a database of multiple alignments of protein domains or conserved protein regions.
What kind of domains can we find in Pfam?
Trusted Domains
Repeats
Fragment Domains
Nested Domains
Disulfide bonds
Important residues (e.g. active sites)
Transmembrane domains
Low complexity regions
Coiled coils (two or three alpha helices that wind around each other)
Context domains: domains that, despite not scoring above the family threshold, are expected to be real based on the other domains found in the protein.
Signal peptides (indicate a protein that will be secreted)
Pfam input
Domains
Domain range and score
Description
Structure info
Gene Ontology
Links
PROSITE
• http://www.expasy.org/tools/scanprosite
• PROSITE is a database of protein domains and motifs that can be searched by either regular expression patterns or sequence profiles.
Search Results
Domains architecture
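PROSITE patterns are written in their own regular-expression-like syntax: elements are separated by `-`, `x` matches any residue, `[ST]` lists allowed residues, `{P}` lists forbidden residues, and `(n)` or `(n,m)` gives a repeat count. The sketch below translates a simplified subset of that syntax into a Python regular expression and scans a sequence with it. The function names and the toy sequence are assumptions for illustration; the pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT_RE = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):      # '<' anchors the match at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # '>' anchors the match at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT_RE.match(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) pairs, including overlapping hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Toy sequence made up for this example (not a real protein).
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motifs("AANGSAANPSA", "N-{P}-[ST]-{P}"))
```

The second candidate site in the toy sequence (`NPSA`) is rejected because proline is forbidden at the second position. Profile-based PROSITE entries cannot be expressed this way and need the ScanProsite tool itself.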
Gene Ontology (GO)
• It is a database of biological processes, molecular functions and cellular components.
• GO does not contain sequence information or gene and protein descriptions.
• GO is linked to gene and protein databases.
• GO terms are organized hierarchically: a term can have more than one parent, so the structure is a directed acyclic graph rather than a strict tree.
http://www.geneontology.org/
Search by AmiGO
Three principal branches
http://www.geneontology.org/amigo/
GO structure is a Directed Acyclic Graph
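To make the difference between a tree and a directed acyclic graph concrete, here is a small sketch with a toy graph in which one term has two parents. The term labels (`root`, `A`, `B`, `C`, `D`) are hypothetical placeholders, not real GO identifiers, and the helper names are assumptions for illustration.

```python
# Toy ontology: each term maps to the set of its direct parents.
# Labels are hypothetical placeholders, not real GO terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},   # two parents: impossible in a tree, allowed in a DAG
    "D": {"C"},
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def multi_parent_terms(parents=PARENTS):
    """Terms with more than one direct parent, i.e. where the graph is not a tree."""
    return sorted(term for term, direct in parents.items() if len(direct) > 1)


print(sorted(ancestors("D")))   # every term that an annotation to D implies
print(multi_parent_terms())
```

A gene annotated to a term is implicitly annotated to all of that term's ancestors, which is why the upward traversal matters when counting genes per term.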
GO sources
ISS: Inferred from Sequence/Structural Similarity
IDA: Inferred from Direct Assay
IPI: Inferred from Physical Interaction
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IEP: Inferred from Expression Pattern
IC: Inferred by Curator
ND: No Data available
IEA: Inferred from Electronic Annotation
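The source codes listed above are often used to filter annotations, for example to keep only those backed by an experiment and set aside purely electronic ones (IEA). The sketch below shows one way this could look. The annotation records and gene names are made up for illustration, and grouping IDA, IPI, IMP, IGI and IEP as "experimental" is the assumption used here.

```python
# Codes from the list above that come from an experiment (assumed grouping).
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical annotation records: (gene, GO term label, source code).
annotations = [
    ("geneA", "term_1", "IDA"),
    ("geneA", "term_2", "IEA"),
    ("geneB", "term_1", "ISS"),
    ("geneC", "term_3", "IMP"),
]


def experimental_only(records):
    """Keep only annotations whose source code is experimental."""
    return [record for record in records if record[2] in EXPERIMENTAL_CODES]


def count_by_code(records):
    """Count how many annotations carry each source code."""
    counts = {}
    for _gene, _term, code in records:
        counts[code] = counts.get(code, 0) + 1
    return counts


print(experimental_only(annotations))
print(count_by_code(annotations))
```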
Results for alpha-synuclein
DAVID Functional Annotation Bioinformatics Microarray Analysis
• Identify enriched biological themes, particularly GO terms
• Discover enriched functional-related gene/protein groups
• Cluster redundant annotation terms
• Explore gene names in batch
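"Enriched" here means that a GO term appears in the submitted gene list more often than expected by chance given a background set. A minimal sketch of that idea is the one-sided hypergeometric test below. It is not DAVID's own implementation (DAVID reports a modified, more conservative Fisher exact score), and the function name and the example counts are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """One-sided hypergeometric p-value: P(at least list_hits annotated genes).

    list_hits       -- genes in the submitted list annotated with the term
    list_size       -- total genes in the submitted list
    background_hits -- genes in the background annotated with the term
    background_size -- total genes in the background
    """
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, list_size - i)
        for i in range(list_hits, max_hits + 1)
    )
    return tail / total


# Made-up counts: 3 of 5 listed genes carry a term that 5 of 20 background genes carry.
p = enrichment_p_value(list_hits=3, list_size=5, background_hits=5, background_size=20)
print(round(p, 4))
```

A small p-value suggests the term is over-represented in the list. When many terms are tested at once, the p-values also need a multiple-testing correction.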
ID conversion
annotation
classification
Functional annotation
Upload
Annotation options