The ProteomeXchange Consortium: 2017 update
Dr. Juan Antonio Vizcaíno
(on behalf of all ProteomeXchange partners)
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
Juan A. Vizcaíno
HUPO 2017 World Conference, Dublin, 20 September 2017
Overview
• Introduction
• Some usage statistics
• New prospective member: iProX
• Handling of reprocessed datasets
ProteomeXchange: A global, distributed proteomics database
• PASSEL (SRM data)
• PRIDE (MS/MS data)
• MassIVE (MS/MS data)
• jPOST (MS/MS data)
[Diagram labels: Raw, ID/Q, Meta]
• Mandatory raw data deposition since July 2015
• Goal: Development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
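As a minimal sketch of how a single dataset record might be addressed through this portal, the snippet below builds a GetDataset URL for one accession. The base URL is the one on the slide; the `ID` and `outputMode` query parameters and the example accession `PXD000001` are assumptions that the talk does not state, so they should be checked against the ProteomeCentral documentation before use.

```python
from urllib.parse import urlencode

# Base URL taken from the slide.
GET_DATASET_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession, output_mode=None):
    """Build a GetDataset URL for one accession.

    The parameter names "ID" and "outputMode" are assumptions,
    not something stated in the talk.
    """
    params = {"ID": accession}
    if output_mode is not None:
        params["outputMode"] = output_mode
    return GET_DATASET_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Example accession chosen for illustration only.
    print(dataset_url("PXD000001"))
    print(dataset_url("PXD000001", output_mode="XML"))
```

The function only constructs the URL and makes no network request, so it can be combined with whichever HTTP client is preferred.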
Public datasets from different omics: OmicsDI
http://www.omicsdi.org/
• Aims to integrate ‘omics’ datasets (proteomics,
transcriptomics, metabolomics and genomics at present).
PRIDE
MassIVE
jPOST
PASSEL
GPMDB
ArrayExpress
Expression Atlas
MetaboLights
Metabolomics Workbench
GNPS
EGA
…and others
Perez-Riverol et al., Nat Biotechnol, 2017
OmicsDI: Portal for omics datasets
ProteomeXchange: 7,475 datasets up until 1 September 2017
Origin:
1229 USA
902 Germany
618 China
583 United Kingdom
319 France
250 Netherlands
213 Canada
208 Switzerland
200 Australia
179 Spain
172 Austria
168 Denmark
138 Sweden
133 India
115 Japan
115 Belgium
98 Norway
75 Italy
69 Taiwan
57 Brazil
51 Israel
51 Singapore
43 Finland
44 Ireland…
Type:
4805 PRIDE partial
1552 PRIDE complete
649 MassIVE
117 PeptideAtlas/PASSEL complete
109 jPOST
243 reprocessed datasets
Publicly Accessible:
4051 datasets, 54% of all
89% PRIDE
6% MassIVE
3% PASSEL
2% jPOST
Top species studied by at least 50 datasets:
2,787 Homo sapiens
958 Mus musculus
236 Saccharomyces cerevisiae
229 Arabidopsis thaliana
190 Rattus norvegicus
157 Escherichia coli
68 Bos taurus
62 Drosophila melanogaster
~ 1,100 species in total
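The dataset counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the slide and confirms that the six type counts, including the reprocessed datasets, add up to the stated total of 7,475, and that 4,051 public datasets correspond to roughly 54% of that total. The variable and function names are illustrative only.

```python
# Dataset counts by type, copied from the slide.
COUNTS_BY_TYPE = {
    "PRIDE partial": 4805,
    "PRIDE complete": 1552,
    "MassIVE": 649,
    "PeptideAtlas/PASSEL complete": 117,
    "jPOST": 109,
    "reprocessed": 243,
}
TOTAL_DATASETS = 7475   # stated total on the slide
PUBLIC_DATASETS = 4051  # publicly accessible datasets


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def shares_by_type(counts):
    """Return each type's share of the summed counts, in percent."""
    total = sum(counts.values())
    return {name: percent(n, total) for name, n in counts.items()}


if __name__ == "__main__":
    print("sum of type counts:", sum(COUNTS_BY_TYPE.values()))
    print("public share (%):", percent(PUBLIC_DATASETS, TOTAL_DATASETS))
    for name, share in shares_by_type(COUNTS_BY_TYPE).items():
        print(f"{name}: {share}%")
```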
PRIDE: Submissions and downloads keep increasing
Data download volume for PRIDE Archive in 2016: 243 TB
[Chart: downloads in TB per year, 2013 to 2016]
Top months: 224 and 234 datasets submitted in July and August, respectively
> 400 TB of data
Juan A. Vizcaí[email protected]
HUPO 2017 World ConferenceDublin, 20 September 2017
Public proteomics datasets are being increasingly reused…
Martens & Vizcaíno, Trends Biochem Sci, 2017
PRIDE has become an ELIXIR core data resource
• ELIXIR coordinates, integrates and sustains bioinformatics
resources across Europe and enables users in academia
and industry to access services that are vital for their
research.
• First list of core resources announced in July 2017.
• PRIDE included in the initial list.
https://www.elixir-europe.org/platforms/data/core-data-resources
iProX: the integrated proteome resources in China
www.iprox.org
• User-friendly web-based system
• Standardized metadata collection
• Complete and partial data submission
• Different access levels for datasets
• Aspera-based data upload/download
• XML file for data sharing
• RESTful Web Service
• Cloud platform architecture and multiple-site deployment
Cloud platform architecture with High Availability
• VIP
• Load balance servers 1 and 2: nginx, keepalived, CentOS
• Application servers 1 and 2: SpringMVC, MyBatis, Tomcat, Java, CentOS
• Database servers (master and slave): MySQL, CentOS
• Data storage server 1: nginx, keepalived, Aspera, CentOS
• Data storage server 2: nginx, CentOS
• Data storage server 3: nginx, keepalived, Aspera, CentOS
iProX Team
• Team Leader: Prof. Yunping Zhu
• Curators: Chunyuan Yang (PhD, Medical Genetics), Xue Wang (MSc, Bioinformatics)
• Bioinformaticians: Jie Ma (PhD, Biochem & Molecular Biology), Cheng Chang (PhD, Biochem & Molecular Biology)
• Software Development: Tao Chen (PhD, Computer Science & Tech), Mansheng Li (PhD, Bioinformatics)
• System Admin.: Dongsheng Li (Bachelor, Computer Tech)
Infrastructure of iProX
[Map: Beijing, Shanghai, Hunan]
• BPRC & NCPSB (Beijing): main location of deployment and the
only submission site
• Three offsite data backups:
  • CNIC (Beijing, north China; deployment in progress)
  • SCBIT (Shanghai, east China)
  • NSCC (Hunan, south China)
• All four sites will provide download service at the same
time, coordinated by the load balancer.
• By the end of August 2017, 308 datasets had been submitted,
totalling 47.68 TB.
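To illustrate what "coordinated by the load balancer" could mean for downloads across the four sites, here is a small sketch of round-robin site selection that skips sites marked as unavailable. The slide does not say which balancing policy iProX uses, so round-robin is an assumption, and the class and method names are hypothetical. Only the site labels come from the slide.

```python
# Site labels come from the slide; everything else is illustrative.
SITES = [
    "BPRC/NCPSB (Beijing)",
    "CNIC (Beijing)",
    "SCBIT (Shanghai)",
    "NSCC (Hunan)",
]


class RoundRobinBalancer:
    """Hand out download sites in turn, skipping sites marked as down."""

    def __init__(self, sites):
        self._sites = list(sites)
        self._healthy = set(self._sites)
        self._next = 0  # index of the site to try first on the next pick

    def mark_down(self, site):
        self._healthy.discard(site)

    def mark_up(self, site):
        if site in self._sites:
            self._healthy.add(site)

    def pick(self):
        """Return the next healthy site, or raise if none is available."""
        for _ in range(len(self._sites)):
            site = self._sites[self._next]
            self._next = (self._next + 1) % len(self._sites)
            if site in self._healthy:
                return site
        raise RuntimeError("no download site available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(SITES)
    balancer.mark_down("CNIC (Beijing)")  # e.g. a site still being deployed
    for _ in range(4):
        print(balancer.pick())
```

In a real deployment this role is played by the nginx and keepalived layer listed in the architecture. The sketch only shows the selection logic.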
iProX: an observer of the ProteomeXchange Consortium
• Proteome data sharing platform in China
• Focus:
  • Collection and sharing of proteome experiment raw data
  • Standardized metadata for proteome experiments
  • Visualization of proteome datasets
• Provides:
  • A user-friendly data submission pipeline
  • Structured management of datasets
  • An effective user authority system
  • Standardized metadata collection
  • Powerful computing, storage, and network resources to support the pipeline
  • Remote data backup and synchronous update
www.iprox.org
Ongoing work
• Reuse of public proteomics data is increasing.
• We are currently working on guidelines for handling
reprocessed datasets (they get an RPXD identifier).
• Initial pilot implementation in MassIVE.
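Because reprocessed datasets receive their own RPXD identifiers, tools that consume ProteomeXchange accessions may want to tell them apart from original submissions. The sketch below does this with a regular expression. The accession shape it assumes ("PXD" or "RPXD" followed by exactly six digits) is not stated in the talk, and the function name is hypothetical.

```python
import re

# Assumed accession shape: "PXD" or "RPXD" followed by six digits.
_ACCESSION = re.compile(r"^(R?PXD)(\d{6})$")


def classify_accession(accession):
    """Return "original", "reprocessed", or None for an unrecognised string."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return "reprocessed" if match.group(1) == "RPXD" else "original"


if __name__ == "__main__":
    # Example accessions are made up for illustration.
    for acc in ["PXD000001", "RPXD000002", "not-an-accession"]:
        print(acc, "->", classify_accession(acc))
```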
Datasets evolve with reanalysis
http://massive.ucsd.edu
Reanalysis identifiers
• Own identifiers
• Own metadata
• Citable
• Online browsing
• Provenance records
http://massive.ucsd.edu
Searching large-scale reanalyses
http://massive.ucsd.edu
Available now
321M PSMs
6.1M peptides
10.7M variants
14k searches
>31TB human data
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Mathias Walzer
Former team members, especially:
Rui Wang
Florian Reisinger
Jose A. Dianes
Henning Hermjakob
Acknowledgements: All ProteomeXchange partners
All data submitters!
Eric Deutsch
Zhi Sun
David Campbell
Nuno Bandeira
Mingxun Wang
Jeremy Carver
Yasushi Ishihama
Shujiro Okuda
Shin Kawano
Follow new datasets @proteomexchange