
ProteomeXchange and proteomics data submission v2.0, 15 September 2016


Data Submission Guidelines for the ProteomeXchange Consortium

This document aims to provide detailed guidelines for the users to submit mass spectrometry (MS) derived proteomics data to the ProteomeXchange (PX) Consortium of proteomics resources (1) (http://www.proteomexchange.org).

Table of contents

1 Types of dataset submissions
2 Proteomics data resources in ProteomeXchange
    2.1 List of Universal Archival resources
    2.2 List of Focused Archival resources
    2.3 List of Secondary Data Resources
    2.4 ProteomeCentral: the common Portal for PX datasets
3 Data workflow for original datasets
    3.1 Submission workflow for Selected Reaction Monitoring (SRM) datasets
4 Workflow for reprocessed datasets
5 Data ownership
6 Data privacy
7 References
8 Appendix I: Data types
9 Appendix II: Metadata and the PX XML message
10 Appendix III: How to get notified about new PX datasets
11 Appendix IV: Membership in the ProteomeXchange Consortium


1 Types of dataset submissions

The PX resources support two types of dataset submissions, depending on the different proteomics data workflows and the data formats available.

a) Complete submission: A complete (also known as “supported”) submission ensures that the identification results and the corresponding mass spectra (see definitions of data types in Appendix I) can be parsed, integrated and visualised by the PX resource and/or in free-to-use stand-alone tools such as PRIDE Inspector (available at https://github.com/PRIDE-Toolsuite/pride-inspector/releases). To achieve that, processed identification results need to be provided in a standard format (e.g. mzIdentML (2), mzTab (3)), and optionally also in a different open data format (e.g. PRIDE XML). In addition, all the submitted files are made available to download.

b) Partial submission: In this case (also known as “unsupported”), processed identification results are provided in data formats other than those indicated above for complete submissions. The PX resource can then not parse, integrate and visualise the identifications, and/or connect the identification data to the corresponding mass spectra. However, all the submitted files are made available to download. This mechanism allows data generated by software that cannot yet export to standard formats, or by novel experimental approaches, to be deposited into the PX resources.
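The distinction between the two submission types can be sketched as a simple rule. The snippet below is only an illustration: the function name and the format labels are hypothetical, and which formats a given resource actually accepts for a complete submission depends on that resource (see Table 1).

```python
# Illustrative sketch only; not part of any PX submission tool.
# Assumption: a submission counts as "complete" when at least one
# identification file is in a standard format named in this section.
STANDARD_IDENTIFICATION_FORMATS = {"mzIdentML", "mzTab"}


def submission_type(identification_formats):
    """Return "complete" or "partial" for a collection of format labels."""
    formats = set(identification_formats)
    if formats & STANDARD_IDENTIFICATION_FORMATS:
        return "complete"
    return "partial"


print(submission_type(["mzIdentML", "PRIDE XML"]))
print(submission_type(["Mascot .dat"]))
```

In both cases all submitted files remain downloadable; the rule only affects whether the resource can parse and visualise the identifications.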

2 Proteomics data resources in ProteomeXchange

In the ProteomeXchange Consortium there are currently two types of proteomics data resources (Figure 1):

a) Archival resources: Their main mission is to store MS-based proteomics data. There are two types:

    a. Universal resources: They can store any type of proteomics dataset, coming from any data workflow. However, they normally focus on supporting “complete” submissions for particular data workflows (e.g. bottom-up proteomics data dependent acquisition (DDA) workflows). The current examples in the Consortium are PRIDE Archive, MassIVE and jPOST (see Section 2.1).

    b. Focused resources: They support one specific type of data workflow and will not store data from other proteomics approaches. An example is the PASSEL component of PeptideAtlas, which is the representative for Selected Reaction Monitoring (SRM) approaches (see Section 2.2).

b) Secondary data resources: These build upon the primary data provided by submitters, which are stored in the Archival resources. There are two representative resources: PeptideAtlas and MassIVE (see Section 2.3). MassIVE is therefore both an Archival and a Secondary data resource.

2.1 List of Universal Archival resources

Currently, there are three universal Archival resources available:

1- PRIDE Archive (http://www.ebi.ac.uk/pride/archive, EMBL-European Bioinformatics Institute, Cambridge, UK). Data submission documentation is available from the PRIDE Archive site or in this publication (4).
2- MassIVE (https://massive.ucsd.edu/, University of California San Diego (UCSD), San Diego, CA, USA). Data submission documentation is available at https://massive.ucsd.edu/ProteoSAFe/help.jsp.
3- jPOST (http://jpost.org/, jPOST Project Team, Japan). Data submission documentation is available at https://repository.jpostdb.org/help.

2.2 List of Focused Archival resources

1- PASSEL (Institute for Systems Biology, Seattle, WA, USA) is the only focused resource at present. Data submission documentation is available at http://www.peptideatlas.org/passel/.

2.3 List of Secondary Data Resources

There are two secondary data resources at present:

1- PeptideAtlas (http://www.peptideatlas.org/, Institute for Systems Biology, Seattle, WA, USA). Documentation is available at http://www.peptideatlas.org/.
2- MassIVE (https://massive.ucsd.edu/, University of California San Diego (UCSD), San Diego, CA, USA).

2.4 ProteomeCentral: the common Portal for PX datasets

ProteomeCentral (available at http://proteomecentral.proteomexchange.org) is the portal for all PX datasets, independently of the original resource where the datasets were stored. This queryable archive provides users with an efficient way to identify datasets of interest.


3 Data workflow for original datasets

The overall ProteomeXchange data workflow is summarized in Figure 1.

Figure 1: Overview of the ProteomeXchange data flow.

Original datasets coming from any proteomics data workflow can be submitted to any of the universal Archival resources (PRIDE Archive, MassIVE or jPOST). Examples of data workflows are shotgun (bottom-up) proteomics (Data Dependent Acquisition, DDA), top-down, or Data Independent Acquisition (DIA) approaches (e.g. SWATH-MS), among many others. All of the submitted datasets will get a unique PXD identifier (see details at http://www.ebi.ac.uk/miriam/main/collections/MIR:00000513).

However, it is highly RECOMMENDED that datasets from data workflows explicitly supported by existing focused archival resources, other than shotgun proteomics (the most universal and widely used approach), are submitted to that resource, and not to any of the universal archival resources. At present, SRM/MRM datasets should be submitted to PASSEL (the only PX resource of this type). The same recommendation will be implemented for additional proteomics approaches if other focused resources are included in the Consortium in the future.

Users can then freely choose the universal Archival resource for the submission of their datasets. User preferences can be based for instance on geographical proximity, availability of “complete” submissions for particular workflows, or technical specifications (e.g. speed for data uploads and downloads), among other considerations.

In any case, for each submitted PX dataset it is mandatory to include the following data types:


(i) Mass spectrometer output files (see Appendix I).
(ii) Protein/peptide identifications. Depending on the type of submission, ‘supported identification results’ (e.g. mzIdentML) will be needed for complete submissions. In the case of Partial submissions, any type of search engine output file is supported.
(iii) Processed peak list spectra formats. These files are needed to enable the connection between the identifications and the mass spectra. In the case of complete submissions performed with mzIdentML, these files are mandatory (since peak lists are not included in mzIdentML per se). These files are optional in the case of Partial submissions (since mass spectrometer output files are available anyway).
(iv) Metadata: Related biological and technological metadata provide the experimental context. Different resources have different metadata requirements (see individual documentation for each resource), but at least enough information needs to be provided to generate the PX XML format (used by the ProteomeCentral resource, see Appendix II).

Other optional data types can also be included in a submitted dataset, for instance:

(i) Quantification software output files: Quantification results.
(ii) Gel images.
(iii) Files used to perform the mass spectral searches, either sequence database files or spectral library files.
(iv) Any other data type (e.g. scripts, pdf files, etc).

In addition, a mechanism to submit mass spectrometry imaging data (as a Partial submission) has been described in this publication (5). See Table 1 below for more details.

Table 1. Summary of submission guidelines for each PX resource, depending on the data workflow involved.

Workflow / submission type      PRIDE          PASSEL                 MassIVE                jPOST
DDA MS/MS
  Partial                       Yes            No                     Yes                    Yes
  Complete: mzIdentML           Yes            No                     Yes                    Yes
  Complete: mzTab               No             No                     Yes                    Yes
  Complete: TSV                 No             No                     Yes                    No
  Complete: PRIDE XML           Yes            No                     No                     No
Other workflows
  Targeted SRM/MRM              Partial only   Partial and complete   Partial only           Partial only
  DIA MS/MS                     Partial only   No                     Partial and complete   Partial only
  Top-down                      Partial only   No                     Partial only           Partial only
  Mass spectrometry imaging     Partial only   No                     Partial only           Partial only
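As a rough aid for choosing a resource, the DDA MS/MS rows of Table 1 can be written down as a lookup. This is a sketch under stated assumptions: the values are transcribed from this version of the guidelines (v2.0), the names `DDA_SUPPORT` and `resources_for_dda` are hypothetical, and each resource's current documentation should be checked before submitting.

```python
# Transcription of the DDA MS/MS rows of Table 1; hypothetical helper,
# not an official PX API. Column order follows the table header.
RESOURCES = ("PRIDE", "PASSEL", "MassIVE", "jPOST")

DDA_SUPPORT = {
    "Partial":             (True,  False, True,  True),
    "Complete: mzIdentML": (True,  False, True,  True),
    "Complete: mzTab":     (False, False, True,  True),
    "Complete: TSV":       (False, False, True,  False),
    "Complete: PRIDE XML": (True,  False, False, False),
}


def resources_for_dda(submission_kind):
    """List the resources marked "Yes" for a DDA MS/MS submission kind."""
    flags = DDA_SUPPORT[submission_kind]
    return [name for name, supported in zip(RESOURCES, flags) if supported]


print(resources_for_dda("Complete: mzTab"))
print(resources_for_dda("Complete: PRIDE XML"))
```

An unknown submission kind raises `KeyError`, which is intentional here: the table has no row to answer the question.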


3.1 Submission workflow for Selected Reaction Monitoring (SRM) datasets

New datasets acquired via SRM should be submitted to PASSEL, as the only focused Archival resource currently supporting this type of approach. For such submitted datasets, three main items are required:

1. Mass spectrometer output files, preferably raw files (Appendix I).
2. Transition list describing the peptides that the instrument targeted.
3. Analysis results.

Once submissions are received, they are checked by a curator, run through the PASSEL pipeline, and then loaded into the PASSEL database.

Figure 2. Workflow for original SRM data submissions to PASSEL.

4 Workflow for reprocessed datasets

The workflow for reprocessed datasets starts when any secondary data resource of the PX Consortium (at present PeptideAtlas and MassIVE) makes a reinterpretation of existing data in any of the Archival resources. A new ProteomeXchange identifier will be obtained from ProteomeCentral (it is an RPXD identifier instead of the standard PXD identifier). As an example, see dataset RPXD000665 in ProteomeCentral (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=RPXD000665). However, the original PX accession number is retained in the PX XML message to allow coordinated search for different views of data from one given submission. This ensures that a simple one-time submission from a contributor is automatically distributed to all PX repositories with sufficient information.

When the reanalysis is done by a PX member, an XML broadcast will be produced, which will include the new identifier as well as the old one. All the relevant information about the connection between the datasets will be stored in ProteomeCentral.

Three main situations may arise when a PX dataset is reanalysed:

a) The data reinterpretation gets published in a separate publication as ‘independent’ findings:
- Data must go to a universal Archival resource (e.g. as any other new MS/MS dataset).
- It is not mandatory to re-upload the raw data (references to URLs are allowed in this case, if this is supported by the Archival resource).

b) The reinterpretation does not get published as ‘independent’ new findings. In this case, data can be kept in a Secondary data resource. For instance, this applies to all new PeptideAtlas builds that get published.

c) In the case of a mixture of new and reprocessed data in one given dataset, the result should be considered a new dataset, so it should be submitted to the corresponding universal Archival resource.
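The two identifier families can be told apart with a short pattern match. The sketch below assumes that accessions consist of "PXD" or "RPXD" followed by six digits, as in the RPXD000665 example; that pattern is an assumption drawn from the example, and the function name is hypothetical. The URL template is the ProteomeCentral link form quoted in this section.

```python
import re

# Assumption: "PXD" or "RPXD" plus six digits, inferred from RPXD000665.
_PX_ACCESSION = re.compile(r"^(R?)PXD(\d{6})$")

# Link form taken from the ProteomeCentral example in this section.
PROTEOMECENTRAL_URL = (
    "http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID={}"
)


def describe_px_accession(accession):
    """Classify an accession as original or reprocessed and build its URL."""
    match = _PX_ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not a recognised PX accession: {accession!r}")
    kind = "reprocessed" if match.group(1) else "original"
    return {
        "accession": accession,
        "kind": kind,
        "url": PROTEOMECENTRAL_URL.format(accession),
    }


print(describe_px_accession("RPXD000665"))
```

Keeping the classification separate from the URL makes it straightforward to group a reprocessed dataset with the original accession it refers to, once that link has been read from the PX XML message.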

5 Data ownership

ProteomeXchange resources do not assume editorial control or ownership over the submitted data; the original submitter remains the owner of these data. All ProteomeXchange resources require that a submitter is explicitly identified for each dataset. Upon public availability of the data, the original data ownership is maintained in the database, although dissemination and reuse of the released data are no longer restricted at that point.

PASSEL and jPOST also do not assume editorial control of the submitted data. Users specify at submission time the date on which the data become publicly accessible. On this date, the data are automatically released. The data owner has the option of adjusting this date in case of review delays, etc.

6 Data privacy

All ProteomeXchange resources allow data to be kept private for any duration of time, until the owner of the data (as identified by the associated user account) gives explicit permission to release the data. However, a variant occurs when privately submitted data are associated with a manuscript submitted to a journal. Once the paper is published, the public availability of the corresponding submitted data will be triggered without asking the submitters for permission. In the particular case of PASSEL and jPOST, data become automatically available on the date that the submitter specifies.

All PX resources can automatically provide reviewer accounts for each submitted experiment, which can be communicated to journal editors and referees in a submitted manuscript, thus allowing confidential reviewing of the privately submitted data.


7 References

1. Vizcaino, J.A., Deutsch, E.W., Wang, R., Csordas, A., Reisinger, F., Rios, D., Dianes, J.A., Sun, Z., Farrah, T., Bandeira, N. et al. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol, 32, 223-226.

2. Jones, A.R., Eisenacher, M., Mayer, G., Kohlbacher, O., Siepen, J., Hubbard, S.J., Selley, J.N., Searle, B.C., Shofstahl, J., Seymour, S.L. et al. (2012) The mzIdentML data standard for mass spectrometry-based proteomics results. Mol Cell Proteomics, 11, M111014381.

3. Griss, J., Jones, A.R., Sachsenberg, T., Walzer, M., Gatto, L., Hartler, J., Thallinger, G.G., Salek, R.M., Steinbeck, C., Neuhauser, N. et al. (2014) The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Mol Cell Proteomics, 13, 2765-2775.

4. Ternent, T., Csordas, A., Qi, D., Gomez-Baena, G., Beynon, R.J., Jones, A.R., Hermjakob, H. and Vizcaino, J.A. (2014) How to submit MS proteomics data to ProteomeXchange via the PRIDE database. Proteomics, 14, 2233-2241.

5. Rompp, A., Wang, R., Albar, J.P., Urbani, A., Hermjakob, H., Spengler, B. and Vizcaino, J.A. (2015) A public repository for mass spectrometry imaging data. Anal Bioanal Chem, 407, 2027-2033.

6. Pedrioli, P.G., Eng, J.K., Hubley, R., Vogelzang, M., Deutsch, E.W., Raught, B., Pratt, B., Nilsson, E., Angeletti, R.H., Apweiler, R. et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol, 22, 1459-1466.

7. Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., Tang, W.H., Rompp, A., Neumann, S., Pizarro, A.D. et al. (2011) mzML--a community standard for mass spectrometry data. Mol Cell Proteomics, 10, R110000133.

8. Walzer, M., Qi, D., Mayer, G., Uszkoreit, J., Eisenacher, M., Sachsenberg, T., Gonzalez-Galarza, F.F., Fan, J., Bessant, C., Deutsch, E.W. et al. (2013) The mzQuantML data standard for mass spectrometry-based quantitative studies in proteomics. Mol Cell Proteomics, 12, 2332-2340.


8 Appendix I: Data types

Proteomics data come in a variety of forms, which are defined here:

- Mass spectrometer output files: the data and metadata generated by mass spectrometers, usually one file per run (although some instruments put multiple runs per file). The data may be the original profile mode scans or may already have had some basic processing like centroiding applied. They may be:
    o i) raw data (as described below).
    o ii) peak list spectra in a standardized format such as mzML, mzXML or mzData (see below), but they cannot be ‘processed peak lists’ (see below). However, it is important that all of the scans that were generated are included with applicable metadata.

- Raw data: the binary, vendor-specific output files directly created by the instrument software. These files are typically large and require specialized software in order to be read.

- Standardized MS data formats: There are currently three widely known mass spectrometry data formats in proteomics: mzXML (6) (developed at the Institute for Systems Biology (ISB), Seattle, USA), mzData (now obsolete, originally developed by the HUPO Proteomics Standards Initiative (PSI)), and the successor to both of the above: mzML (7) (currently v1.1, jointly developed by the ISB and PSI, http://www.psidev.info/mzml). These data formats can be used to represent processed peak lists, as well as raw data. In addition to the mass spectra, they contain detailed metadata that provide context to the measurements.

- Processed peak lists: Heavily processed form of mass spectrometry data, usually derived from the raw data files through various (semi-)automatic steps, e.g. centroiding, deisotoping, and charge deconvolution. These files are formatted in plain text, with typical formats like dta, pkl, ms2 or mgf. They usually contain only a subset of the MS2 scans (MS1 scans are excluded), and are missing significant amounts of metadata that were present in the source format.

- Protein/peptide identifications: Proteomics mass spectra can be matched to peptides or proteins, resulting in identifications for those spectra. Typically a spectrum is considered identified if the score attributed to a peptide or protein match passes a threshold defined a priori or a posteriori. In the case of fragmentation spectra, the initial identification will consist of a peptide sequence; subsequent steps will derive a list of proteins from the identified peptides. The protein assembly step can be a discernible process with its own input and output files, or it can be implicit in the overall identification software. This information can be represented by a variety of data formats called ‘search engine output files’ (see below).

- Protein/peptide quantification: Protein/peptide expression values can also be obtained from an MS-based proteomics experiment. The high diversity of approaches has led to very heterogeneous software and data analysis pipelines. Some search engines are able to perform both identification and quantification, and produce ‘search engine output files’ containing both types of data. However, if the software only performs the quantification part of the analysis, the generated data are represented in ‘quantification software output files’ (see below).

- Search engine output files: They contain the data and metadata generated by the software (usually called search engines) used for performing the identification and often the quantification of peptides and proteins. Each search engine has its own specific output file. The files are typically formatted in either plain text or XML, with typical formats like Mascot .dat, OMSSA xml, etc. In addition to each specific format, a data standard format called mzIdentML (currently v1.1, http://www.psidev.info/mzidentml) (2) has been developed by the PSI to represent this kind of information. Some search engine output files can also represent quantification results, but this is not the case for mzIdentML. A second standard data format called mzTab (tab-delimited file, http://www.psidev.info/mztab) (3) can represent both identification and quantification results.

- Supported protein/peptide identification results: This definition includes all protein/peptide identification processed data that can be fully represented by the receiving repository in ProteomeXchange. PRIDE Archive and MassIVE fully support mzIdentML, which can now be exported from a variety of tools (see updated list at http://www.psidev.info/tools-implementing-mzidentml). The PRIDE XML format is also supported by PRIDE Archive (it was the original PRIDE data format), although it is not RECOMMENDED to use it if the same data can be represented in mzIdentML.

- Quantification software output files: the data and metadata generated by the software used for performing exclusively the quantification analysis of peptides and proteins. In addition to each specific format from each software tool, a data standard format called mzQuantML (currently v1.0, http://www.psidev.info/mzquantml) has been released by the PSI to represent this kind of information (8). As mentioned before, a second data format called mzTab (http://www.psidev.info/mztab) can also represent quantification results, although this support is not yet finished.

- Metadata: Whereas mass spectra are the core output of any mass spectrometer, a simple collection of spectra does not provide sufficient information for confident interpretation. The same holds for the peptide and protein identifications and their expression values. This lack of context can be solved by providing relevant metadata along with the spectra and/or the identification and quantification data. Mass spectrometer, search engine, and quantification software output files (see above) typically accommodate this information.
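When assembling a submission, it can help to sort files into the data types defined above. The following is a minimal sketch based on file extensions. The mapping is an assumption for illustration: the peak list and mzML/mzXML/mzData names come from this appendix, while `.mzid`, `.mztab` and `.mzq` are commonly used extensions that these guidelines do not prescribe. Extensions are also ambiguous in practice (for example, `.pkl` is used by unrelated software), so a real check should inspect file contents.

```python
from pathlib import Path

# Hypothetical convenience table; extensions are assumptions, not PX rules.
DATA_TYPE_BY_EXTENSION = {
    ".mzml": "standardized MS data format",
    ".mzxml": "standardized MS data format",
    ".mzdata": "standardized MS data format",
    ".dta": "processed peak list",
    ".pkl": "processed peak list",
    ".ms2": "processed peak list",
    ".mgf": "processed peak list",
    ".mzid": "standard identification result (mzIdentML)",
    ".mztab": "standard identification/quantification result (mzTab)",
    ".mzq": "standard quantification result (mzQuantML)",
}


def classify_file(filename):
    """Return the assumed data type for a file name, or "unclassified"."""
    suffix = Path(filename).suffix.lower()
    return DATA_TYPE_BY_EXTENSION.get(suffix, "unclassified")


for name in ["run01.mzML", "run01.mgf", "search.mzid", "run01.raw"]:
    print(name, "->", classify_file(name))
```

Vendor raw files and search-engine-specific outputs are deliberately left as "unclassified" here, because their extensions vary by vendor and tool.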


9 Appendix II: Metadata and the PX XML message

An XML XSD (XML Schema Definition) file has been drafted for use in the generation of the XML messages, which are used by ProteomeCentral. The PX XML schema contains the common metadata agreed by all the PX members. The philosophy behind the design of the proposed schema was to keep it as flexible as possible, with an overall structure based on the heavy use of controlled vocabulary (CV) terms. All elements in the schema are mandatory apart from the last ones (ChangeLog, DatasetFileList, RepositoryRecordList and AdditionalInformation). The corresponding .xsd file is available at https://raw.githubusercontent.com/proteomexchange/proteomecentral/master/lib/schemas/proteomeXchange-1.3.0.xsd.

This is the list of elements in the schema:

- ProteomeXchangeDataset: This is the root element with mandatory attributes. The formatVersion attribute could be used if an announcement has to be repeated with some (minor) changes, e.g. the addition of a publication reference.
- CvList: This element lists all CVs/Ontologies that were used to populate the file. This ensures that used CV terms can be traced to their origin and definition.
- DatasetSummary: This element contains some basic information about the submission, like ‘title’, ‘announcement date’ or ‘project description’. Moreover, some additional information about the type of submission (fully supported (‘complete’) or not (‘partial’) by the receiving repository), and whether a related manuscript has already been published, is also included in this element.
- DatasetIdentifierList: This element includes the identifiers that will unambiguously characterize the dataset: for instance, the PX accession number and the Digital Object Identifier (DOI), if relevant.
- DatasetOriginList: The aim of this element is to indicate whether the dataset constitutes a new submission, or whether the submission describes the reprocessing of a previously submitted dataset. Every reanalysis performed on a particular dataset gets a different PX accession number.
- SpeciesList: Contains information about the species included in the dataset.
- InstrumentList: Element holding the overall information about the instrumentation used in the generation of the data.
- ModificationList: All protein modifications (natural and artificial) are listed in this record (specified as CV terms). If a dataset does not contain any modifications, this is also explicitly announced here with a specific CV term.
- ContactList: Information about the researchers involved in the generation and submission of the dataset.
- PublicationList: The list of publications that the dataset has generated.
- KeywordList: One or more CV terms that define a list of keywords that may be attributed to the dataset.
- FullDatasetLinkList: List of links that will allow access to the data. Different links may be used for different ways of accessing the data (for example FTP download or repository web link) or for different repositories hosting the same data.
- DatasetFileList: Optional element to provide individual links to all the submitted files (mass spectrometer output files, search engine output files, etc) belonging to the dataset.
- RepositoryRecordList: This optional element allows a repository to report information with more granularity if available. For example, links and information could be provided for each part/result file of a larger dataset.
- AdditionalInformation: Optional element that includes any other CV terms that can be used to describe the dataset.
- ChangeLog: An element that records comments for all changes made to the file since its first release. This element is optional for the first release of the PX XML only; all successive releases must provide a change log entry.

Different versions of the PX XML announcement for the same PX dataset can be made available to ProteomeCentral. This happens if some information included there is updated (for instance, the final version of the reference of a publication). All the versions are tracked and kept in ProteomeCentral.

After reprocessing of a dataset, if the new results are submitted to PX, a new PX identifier will be generated, but the original PX accession number will also be retained, to allow coordinated search for different views of data from one submission. This ensures that a simple one-time submission from a contributor is automatically distributed to all PX repositories with sufficient information.
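The mandatory/optional split described above lends itself to a quick presence check before a message is announced. The sketch below uses Python's standard `xml.etree.ElementTree` and only tests whether each mandatory child element exists under the root. It is not a substitute for validating against the .xsd file: it assumes the element tags carry no namespace prefix, and the skeleton document is a hypothetical placeholder without the attributes and content a real PX XML message requires.

```python
import xml.etree.ElementTree as ET

# Mandatory child elements, as listed in this appendix.
MANDATORY_ELEMENTS = (
    "CvList", "DatasetSummary", "DatasetIdentifierList",
    "DatasetOriginList", "SpeciesList", "InstrumentList",
    "ModificationList", "ContactList", "PublicationList",
    "KeywordList", "FullDatasetLinkList",
)


def missing_mandatory_elements(xml_text):
    """Return the mandatory element names absent from a PX XML string."""
    root = ET.fromstring(xml_text)
    if root.tag != "ProteomeXchangeDataset":
        raise ValueError(f"unexpected root element: {root.tag}")
    present = {child.tag for child in root}
    return [name for name in MANDATORY_ELEMENTS if name not in present]


# Hypothetical skeleton: empty elements only, not schema-valid.
skeleton = (
    "<ProteomeXchangeDataset>"
    + "".join(f"<{name}/>" for name in MANDATORY_ELEMENTS)
    + "</ProteomeXchangeDataset>"
)
print(missing_mandatory_elements(skeleton))
print(missing_mandatory_elements(skeleton.replace("<SpeciesList/>", "")))
```

A check like this catches structural omissions early; the content rules (CV terms, attributes, ordering) still need full schema validation.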


10 Appendix III: How to get notified about new PX datasets

Each PX dataset becomes publicly available on acceptance or publication of the manuscript supported by the dataset.

When a submission becomes publicly available, a short summary is released through a public announcement system, via an RSS feed containing a link to a file with a defined XML schema (PX XML file). The PX XML file contains key experimental metadata such as: dataset identifiers, sample details (e.g. species and protein modifications are mandatory), mass spectrometer, publication, list of keywords, etc.

In addition, this file contains links to all the data, and allows PeptideAtlas, UniProt, and/or other resources to evaluate, reprocess and integrate the data. In fact, any member of the community can subscribe to this service. There are three ways to do it:

1) [email protected]
2) One can receive these updates by e-mail. If you would like to do that, you need to join the PX Google Group:
- Log in to Google with your preferred e-mail.
- Go to https://groups.google.com/group/proteomexchange/
- Click on the "Join the Group" button (the exact location depends on your preferences for how the groups are displayed in your web browser).
- Choose your preferred option for receiving the e-mails with the new datasets.
3) One can subscribe to the following RSS feed: http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
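For automated monitoring, the RSS route can be consumed with a few lines of standard-library code. The sketch below parses RSS 2.0 text that has already been downloaded and extracts each item's title and link. The sample feed is a made-up placeholder, the function name is hypothetical, and whether the feed URL above is still served in this form is not something these guidelines confirm.

```python
import xml.etree.ElementTree as ET


def rss_items(rss_text):
    """Extract title and link from every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


# Made-up sample feed for illustration; not real announcement data.
sample_feed = """<rss version="2.0"><channel>
<title>Example announcements</title>
<item>
  <title>New dataset announcement</title>
  <link>http://example.org/announcement.xml</link>
</item>
</channel></rss>"""

for entry in rss_items(sample_feed):
    print(entry["title"], entry["link"])
```

Each extracted link would then point to a PX XML file, which carries the dataset metadata and the links to the data themselves.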


11 Appendix IV: Membership in the ProteomeXchange Consortium

Applications for recognition as archival resources are welcome, and will be decided upon by the ProteomeXchange consortium based on the following key criteria:

1. Experience and funding level of resource.
2. Stability.
3. Availability of dedicated curation staff.
4. Ability to store and make accessible raw data, metadata, and interpretations.
5. Worldwide unrestrained availability of stored datasets for download.

The latest version of the ProteomeXchange collaborative agreement (which can be found at http://www.proteomexchange.org/documents/proteomexchange-collaborative-agreement) describes the steps needed to become a member of the consortium.