![Page 1: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/1.jpg)
EBI is an Outstation of the European Molecular Biology Laboratory.
UniProtKB
Sandra Orchard
![Page 2: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/2.jpg)
Importance of reference protein sequence databases
• Completeness and minimal redundancy
A non-redundant protein sequence database with maximal coverage, including splice isoforms, disease variants and PTMs.
A low degree of redundancy facilitates peptide assignments.
• Stability and consistency
Stable identifiers and consistent nomenclature.
Databases change constantly because of the substantial work done to improve their completeness and the quality of sequence annotation.
• High-quality protein annotation
Detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources.
![Page 3: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/3.jpg)
Summary of protein sequence databases
| Database | Description | Species |
|---|---|---|
| UniProtKB | Expertly curated section (UniProtKB/Swiss-Prot) and computer-annotated section (UniProtKB/TrEMBL); minimum level of redundancy; high level of integration with other databases; stable identifiers; diversity of sources including large-scale genomics, small-scale cloning and sequencing, protein sequencing, PDB, predicted sequences from Ensembl and RefSeq | Many |
| UniRef100 | Assembled from UniProtKB, Ensembl and RefSeq; merges 100% identical sequences; stable identifiers | Many |
| Ensembl | Predictions using automated genome annotation pipeline; explicitly linked to nucleotide and protein sequences; stable reference; merges its annotations with Vega annotations at transcript level; extensive quality checks to remove erroneous gene models; high level of integration with other databases | Over 50 eukaryotic genomes. Ensembl Genomes: Metazoa, Plants and Fungi, Protists, Bacteria and Archaea |
| RefSeq | NCBI creates from existing data; ongoing curation; non-redundant; explicitly linked nucleotide and protein sequences; stable reference; high level of integration with other databases | Limited to fully sequenced organisms |
| Entrez protein (NCBInr) | Assembled from GenBank and RefSeq coding sequence translations and UniProtKB; annotations extracted from source curated databases; high degree of sequence redundancy | Many |
Updated from Nesvizhskii, A. I., and Aebersold, R. (2005) Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell. Proteomics 4, 1419–1440.
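The table notes that UniRef100 merges 100% identical sequences. A minimal Python sketch of that idea follows; it is an illustration only, not the UniRef pipeline, and the function name `merge_identical`, the identifiers and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def merge_identical(records):
    """Group (identifier, sequence) pairs whose sequences are 100% identical."""
    clusters = defaultdict(list)
    for identifier, sequence in records:
        # Normalise case and whitespace so formatting differences
        # do not split an otherwise identical sequence into two clusters.
        key = "".join(sequence.split()).upper()
        clusters[key].append(identifier)
    return dict(clusters)


# Hypothetical identifiers and toy sequences, for illustration only.
records = [
    ("entry_A", "MKTAYIAKQR"),
    ("entry_B", "mktayiakqr"),   # same sequence, different case
    ("entry_C", "MKTAYIAKQL"),   # differs in the last residue
]
clusters = merge_identical(records)
for sequence, identifiers in clusters.items():
    print(sequence, identifiers)
```

Sequences that differ by even one residue stay in separate clusters, which is why a 100% identity merge reduces exact duplication but not the other sources of redundancy.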
![Page 4: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/4.jpg)
UniProtKB
UniProt Knowledgebase: 2 sections
1. UniProtKB/Swiss-Prot Non-redundant, high-quality manual annotation - reviewed
2. UniProtKB/TrEMBL Redundant, automatically annotated - unreviewed
www.uniprot.org
![Page 5: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/5.jpg)
Manual annotation of UniProtKB/Swiss-Prot
UniProtKB
• Sequence
• Sequence features
• Ontologies
• References
• Nomenclature
• Splice variants
• Annotations
![Page 6: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/6.jpg)
Sequence curation, stable identifiers, versioning and archiving
For example: erroneous gene model predictions, frameshifts, premature stop codons, read-throughs, erroneous initiator methionines…
![Page 7: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/7.jpg)
Splice variants
![Page 8: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/8.jpg)
Identification of amino acid variants
..and of PTMs
… and also
![Page 9: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/9.jpg)
Domain annotation
Binding sites
![Page 10: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/10.jpg)
Protein nomenclature
![Page 11: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/11.jpg)
![Page 12: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/12.jpg)
Controlled vocabularies used whenever possible…
Annotation - >30 defined fields
![Page 13: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/13.jpg)
..and also imported from external resources
Binary interactions taken from the IntAct database
Interactors of human p53
![Page 14: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/14.jpg)
Controlled vocabulary usage increasing – for example from the Gene Ontology
Annotation for human Rhodopsin
![Page 15: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/15.jpg)
Sequence evidence
Type of evidence that supports the existence of a protein
1. Evidence at protein level: there is experimental evidence of the existence of a protein (e.g. Edman sequencing, MS, X-ray/NMR structure, good-quality protein-protein interaction, detection by antibodies)
2. Evidence at transcript level: the existence of a protein has not been proven, but there is expression data (e.g. existence of cDNAs, RT-PCR or Northern blots) that indicates the existence of a transcript
3. Inferred from homology: the existence of a protein is likely because orthologs exist in closely related species
4. Predicted
5. Uncertain
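The five evidence levels above form an ordered scale, so they can be used to filter entries. The sketch below encodes them as a Python enumeration; the class name, the function name and the example accessions are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class ProteinExistence(IntEnum):
    """The five levels listed on the slide; a lower number means stronger evidence."""
    PROTEIN_LEVEL = 1
    TRANSCRIPT_LEVEL = 2
    HOMOLOGY = 3
    PREDICTED = 4
    UNCERTAIN = 5


def filter_by_evidence(entries, weakest_allowed):
    """Keep accessions whose evidence level is at least as strong as weakest_allowed."""
    return [acc for acc, level in entries if level <= weakest_allowed]


# Hypothetical accessions paired with an evidence level.
entries = [
    ("ACC_1", ProteinExistence.PROTEIN_LEVEL),
    ("ACC_2", ProteinExistence.HOMOLOGY),
    ("ACC_3", ProteinExistence.UNCERTAIN),
]
kept = filter_by_evidence(entries, ProteinExistence.TRANSCRIPT_LEVEL)
print(kept)
```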
![Page 16: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/16.jpg)
Manual annotation of the human proteome (UniProtKB/Swiss-Prot)
• A draft of the complete human proteome has been available in UniProtKB/Swiss-Prot since 2008
• Manually annotated representation of 20,231 protein-coding genes with 36,865 protein sequences; an additional 33,243 UniProtKB/TrEMBL entries form the complete proteome set
• Approximately 67,600 single amino acid polymorphisms (SAPs), mostly disease-linked
• ~75,500 post-translational modifications (PTMs)
• Close collaboration with NCBI, Ensembl, Sanger Institute and UCSC to provide the authoritative set to the user community
![Page 17: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/17.jpg)
Searching UniProt – Simple Search
• Text-based searching
• Logical operators ‘&’ (and), ‘|’ (or)
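To illustrate how the two logical operators combine search terms, here is a small Python sketch that evaluates such a query against a piece of text. It does not reproduce UniProt's own query parser: the function `matches`, the whole-word matching, and the assumption that ‘&’ binds more tightly than ‘|’ are all choices made for this example.

```python
def matches(query, text):
    """Return True if text satisfies a query built from '&' (and) and '|' (or)."""
    words = set(text.lower().split())
    # '|' separates alternatives; within one alternative every '&' term must be present.
    for alternative in query.split("|"):
        terms = [t.strip().lower() for t in alternative.split("&") if t.strip()]
        if terms and all(term in words for term in terms):
            return True
    return False


description = "tyrosine kinase human"
print(matches("kinase & human", description))
print(matches("kinase & mouse", description))
print(matches("kinase & mouse | human", description))
```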
![Page 18: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/18.jpg)
Searching UniProt – Advanced Search
![Page 19: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/19.jpg)
Searching UniProt – Search Results
Each linked to the UniProt entry
![Page 20: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/20.jpg)
Searching UniProt – Search Results
![Page 21: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/21.jpg)
Searching UniProt – Search Results
![Page 22: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/22.jpg)
Searching UniProt – Blast Search
![Page 23: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/23.jpg)
Searching UniProt – Blast Search
![Page 24: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/24.jpg)
Searching UniProt – Blast Results
Alignment with query sequence
![Page 25: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/25.jpg)
Searching UniProt – Blast Results
![Page 26: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/26.jpg)
UniProtKB/TrEMBL
Multiple entries for the same protein (redundancy) can arise in UniProtKB/TrEMBL due to:
o Erroneous gene model predictions
o Sequence errors (frameshifts)
o Polymorphisms
o Alternative start sites
o Isoforms
Apart from 100% identical sequences, all merged sequences are analysed by a curator so that they can be annotated accordingly.
![Page 27: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/27.jpg)
Why do we need predictive annotation tools?
[Chart: number of sequences (0 to 14,000,000) plotted against date, for UniProtKB and UniProtKB/Swiss-Prot]
![Page 28: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/28.jpg)
Automatic Annotation
• Automated clean-up of annotation from the original nucleotide sequence entry
• Additional value added by using automatic annotation
• Recognises common annotation belonging to a closely related family within UniProtKB/Swiss-Prot
• Identifies all members of this family using patterns/motifs/HMMs in InterPro
• Transfers common annotation to related family members in TrEMBL
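The last three steps on this slide (recognise the annotation a reviewed family has in common, identify family members by signature, transfer the annotation) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual UniProt annotation system: the function names, the signature IDs such as `SIG_0001`, the accessions and the annotation strings are all hypothetical.

```python
def common_annotation(reviewed_members):
    """Return the annotations shared by every reviewed member of a family."""
    annotation_sets = [set(annotations) for annotations in reviewed_members.values()]
    return set.intersection(*annotation_sets) if annotation_sets else set()


def transfer(unreviewed, family_signature, shared):
    """Add the shared annotation to unreviewed entries that carry the family signature."""
    result = {}
    for accession, entry in unreviewed.items():
        annotations = set(entry["annotations"])
        if family_signature in entry["signatures"]:
            annotations |= shared
        result[accession] = annotations
    return result


# Hypothetical reviewed family members and their annotations.
reviewed = {
    "REV_1": ["kinase activity", "ATP binding", "nucleus"],
    "REV_2": ["kinase activity", "ATP binding"],
}
shared = common_annotation(reviewed)

# Hypothetical unreviewed entries with the signatures found on each sequence.
unreviewed = {
    "UNREV_1": {"signatures": {"SIG_0001"}, "annotations": []},
    "UNREV_2": {"signatures": {"SIG_0002"}, "annotations": ["membrane"]},
}
annotated = transfer(unreviewed, "SIG_0001", shared)
print(annotated)
```

Only annotation common to all reviewed members is transferred, so member-specific details (here, "nucleus") are not propagated to sequences that merely share the family signature.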
![Page 29: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/29.jpg)
← Name (non-standard)
← Taxonomy
← Publication
← Sequence
![Page 30: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/30.jpg)
InterPro
![Page 31: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/31.jpg)
![Page 32: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/32.jpg)
Finding a complete proteome in UniProtKB
![Page 33: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/33.jpg)
Complete Proteomes
![Page 34: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/34.jpg)
MS Proteomics
• Requires each sequence (including isoforms) to be present in the dataset as a separate entity for search engines to access
• For higher organisms with isoforms, an expanded set is made available on the FTP site
• FASTA files by FTP
• One file per species containing canonical + isoform sequences
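A short Python sketch of reading such a file and telling canonical sequences from isoforms is given below. It assumes the UniProt FASTA header layout `db|ACCESSION|ENTRY_NAME description`, in which isoform accessions carry a hyphenated suffix (for example `P04637-2`); the helper names are hypothetical and the sequences are shortened placeholders, not real data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header):
    """Extract the accession from a 'db|ACCESSION|ENTRY_NAME ...' header."""
    return header.split("|")[1]


def is_isoform(header):
    """Isoform accessions carry a hyphenated suffix, e.g. 'P04637-2'."""
    return "-" in accession(header)


# Placeholder sequences for illustration only.
example = """>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53
MEEPQSDPSV
EPPLSQETFS
>sp|P04637-2|P53_HUMAN Isoform 2 of Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53
MEEPQSDPSV
"""
entries = list(read_fasta(example.splitlines()))
canonical = [accession(h) for h, _ in entries if not is_isoform(h)]
isoforms = [accession(h) for h, _ in entries if is_isoform(h)]
print(canonical, isoforms)
```

Because every isoform appears as its own record, a search engine can match peptides against each sequence independently.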
![Page 35: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/35.jpg)
![Page 36: EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697c0101a28abf838ccb093/html5/thumbnails/36.jpg)