Vaudel et al.: PeptideShaker - Supplementary Note 1
PeptideShaker enables reanalysis of mass spectrometry-derived proteomics datasets
Marc Vaudel1,2, Julia M. Burkhart1, René P. Zahedi1, Eystein Oveland2,3,4, Frode S. Berven2,4,5,
Albert Sickmann1, Lennart Martens6,7,* and Harald Barsnes2,8
1 Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Dortmund, Germany
2 Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
3 Department of Clinical Medicine, University of Bergen, Bergen, Norway
4 The KG Jebsen Centre for MS-research, Department of Clinical Medicine, University of Bergen,
Bergen, Norway
5 The Norwegian Multiple Sclerosis Competence Centre, Department of Neurology, Haukeland
University Hospital, Bergen, Norway
6 Department of Medical Protein Research, VIB, Ghent, Belgium
7 Department of Biochemistry, Ghent University, Ghent, Belgium
8 Computational Biology Unit, University of Bergen, Norway
* Correspondence:
Prof. Dr. Lennart Martens
A. Baertsoenkaai 3
B-9000 Gent
Phone: +32 9 264 93 58
Fax: +32 9 264 94 84
Nature Biotechnology: doi:10.1038/nbt.3109
Table of Contents
Introduction
1.0 - Installation and Hardware Requirements
2.0 - Creating a New Project
3.0 - PRIDE Reanalysis
4.0 - File Import
5.0 - Results Navigation
6.0 - Validating Proteins, Peptides and PSMs
7.0 - PTM Scoring
8.0 - Reports, Follow Up Analyses and Submissions to PRIDE
9.0 - Command Line Use
10.0 - Documentation, Help, Support and Updates
References
Introduction
The interpretation of protein identification results from identification algorithms can be
conducted in several software environments1, freeware or commercial, some delivered with the
instrument by the vendor. MaxQuant2, the Trans-Proteomic Pipeline3-4, OpenMS5-7, and IDPicker8
are examples of the efforts of the scientific community to provide free solutions for protein
identification. Here, we present PeptideShaker, an interface to assemble and inspect results from
tandem mass spectra identification algorithms.
PeptideShaker allows intuitive interpretation of peptide and protein mass spectrometry based
identification results. In combination with SearchGUI9, a user-friendly graphical user interface to
conduct proteomics searches, it provides a full identification solution for both locally generated
datasets and publicly available data in PRIDE10 via ProteomeXchange11.
For every identified protein, peptide and spectrum, PeptideShaker delivers useful information
like identification confidence, modification site(s) and external information via resources like
Ensembl12 or PDB13. All results can be exported in various reports, either for further follow up
analyses or for submission to PRIDE.
Notably, the use of PeptideShaker does not require extensive knowledge in bioinformatics.
PeptideShaker is fully documented and comes with contextual help and extended tutorials.
Covering every step from data loading to results display, the present supplementary material
details how the peptides and proteins are inferred from search engine results and how these are
scored, displayed and connected with rich resources for protein identification.
Unless stated otherwise, the data and illustrations shown here were obtained on the PeptideShaker
example dataset, a standard measurement of HeLa cell lysate as detailed elsewhere14 and freely
available in the ProteomeXchange15 consortium via the PRIDE10 partner repository under the
accession PXD000674.
If you have any questions about PeptideShaker, please do not hesitate to contact the developers at
the PeptideShaker Google Group: http://groups.google.com/group/peptide-shaker.
1.0 - Installation and Hardware Requirements
PeptideShaker is an open source project developed in Java under the very permissive Apache2
open source license. The complete source code, cross-platform executables and additional
information are available at http://peptide-shaker.googlecode.com. The software does not require
installation beyond unzipping and then double clicking the downloaded file, and works on
Windows, Linux and Mac platforms. (For the first execution on newer Mac platforms, control-
click on the file icon and then select "Open." This will provide the option to run the file
regardless of its (unidentified) source. Help on usage on Mac can be found on the PeptideShaker
website.)
The only prerequisite to run PeptideShaker is that Java is installed. However, due to the large
size of modern proteomics datasets and databases, the software performance will depend on the
amount of available memory: the more memory, the better the performance. When creating a
new project, it is recommended to provide at least 4 GB of memory for smaller projects
(<100,000 spectra, <100,000 protein sequences), while bigger projects will be more memory
demanding. Working with less than 4 GB of memory is supported; import time will however be
substantially extended. Memory settings can be edited from the interface directly under “Edit” >
“Java Options” or via the Welcome Dialog under “Settings and Help”. Note that Java 32-bit does
not support high memory settings. It is therefore strongly recommended to work on 64-bit
machines – the standard for all recent computers. Java 32-bit is often installed by default on 64-
bit machines, and it is then preferable to instead install Java 64-bit.
Help with Java installation and usage can be found here:
http://code.google.com/p/compomics-utilities/wiki/JavaTroubleShooting. When not all information
can be loaded into memory, the tool will interact with locally stored data. This process is
substantially faster on SSD drives. In general, read/write operations are the main speed-limiting
steps; it is thus advised to operate on SSD drives.
Although the creation of a PeptideShaker project is computationally demanding, its opening and
viewing does not require equally powerful hardware capabilities. It is thus possible for mass
spectrometry labs to create the project on a high performance machine, save it and share it with
end users who can then inspect it on standard desktop computers. It is also possible to create
projects automatically on servers and clusters via the command line options of the tool (see 9.0 -
Command Line Use).
PeptideShaker is started by double clicking the jar file in the unzipped downloaded file. The
PeptideShaker Welcome Dialog is then displayed, see Supplementary Figure 1. From this
dialog it is possible to: (1) Create a New Project; (2) Open a Saved Project; (3) Start a Search
using SearchGUI; (4) Reshake a PRIDE Project, i.e., reprocess a dataset in PRIDE; (5) Open an
Example Project; and (6) See the Getting Started Mini Tutorial.
Supplementary Figure 1: PeptideShaker Welcome Dialog. This dialog is displayed when starting the tool and
can be used to start the processing of a dataset (see text for details).
2.0 - Creating a New Project
After selecting “New Project”, the user sets up the new project as displayed in Supplementary
Figure 2.
Supplementary Figure 2: New Project Dialog where the user defines the new PeptideShaker project.
The setup includes: (i) annotation of the project, which is very useful for later reuse and sharing; (ii)
selection of the input files; and (iii) editing of processing parameters. The processing parameters
include: (a) the settings used for the search; (b) the import filters; and (c) the import
preferences. These parameters are important as they impact the protein and peptide result set
and can thus not be modified after the project has been created. The search settings are those
used by the search engines; for SearchGUI results they are loaded automatically when
selecting the search result files.
Import filters allow the removal of Peptide to Spectrum Matches (PSMs). Given that
identification quality is known to depend on sequence length16, it is possible to filter out
short/long peptides. Filters on precursor mass deviation as suggested by Beausoleil et al.17 are
also supported. Finally, some modifications are not recognized by all search engines, e.g.,
PTMs located on protein termini or PTMs targeting motifs of several amino acids. In such cases
it is possible to use a comprehensive search, targeting all termini or a single amino acid, and
refine the results a posteriori by using only the modifications of interest. This is achieved by
filtering out the PTMs not matching the PeptideShaker PTM definition and is activated by clicking
“Exclude Unknown PTMs”.
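The import filters described above can be illustrated with a minimal Python sketch. The `Psm` record, its field names and the default thresholds are hypothetical and chosen for illustration only; they are not taken from PeptideShaker.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    """Hypothetical PSM record; field names are illustrative only."""
    sequence: str
    precursor_error_ppm: float
    modifications: tuple = ()


def passes_import_filters(psm, min_length=8, max_length=30,
                          max_error_ppm=10.0, known_ptms=None):
    """Return True if the PSM survives the length, mass deviation and PTM filters.

    All thresholds are assumed example values, not PeptideShaker defaults.
    """
    # Peptide length filter: drop too short or too long sequences.
    if not (min_length <= len(psm.sequence) <= max_length):
        return False
    # Precursor mass deviation filter.
    if abs(psm.precursor_error_ppm) > max_error_ppm:
        return False
    # "Exclude Unknown PTMs": drop matches carrying a PTM outside the known set.
    if known_ptms is not None and any(m not in known_ptms for m in psm.modifications):
        return False
    return True


psms = [
    Psm("PEPTIDEK", 2.1),
    Psm("AK", 0.5),                    # too short
    Psm("LONGERPEPTIDER", 25.0),       # precursor error too large
    Psm("SAMPLEPEPTIDEK", 1.0, ("unknown ptm",)),
]
kept = [p for p in psms if passes_import_filters(p, known_ptms={"oxidation of m"})]
```

Here only the first PSM is kept; the others fail the length, mass deviation and PTM filters, respectively.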
The processing parameters include initial False Discovery Rate (FDR) validation thresholds,
which can be altered after the project has been created (see 6.0 - Validating Proteins, Peptides
and PSMs), and PTM scoring options (see 7.0 - PTM Scoring).
Clicking “Load Data!” starts the processing of the files and the creation of the PeptideShaker
project.
3.0 - PRIDE Reanalysis
The “PRIDE Reshake” option allows any scientist to easily reprocess datasets in PRIDE without
requiring advanced bioinformatics skills. After clicking the “PRIDE Reshake” button in the
Welcome Dialog, the user can choose to reanalyze either public or private datasets. (Private
datasets require the input of username and password details.) In both cases the user will see the
list of available projects with the associated assays and files as shown in Supplementary Figure
3.
Supplementary Figure 3: PRIDE data selection. At the top the project to reanalyze is selected. The assays and
associated data files are shown in the tables below.
The user can search for specific projects using the advanced Find feature as shown in
Supplementary Figure 4.
When the user has located the project to reanalyze, the "Reshake PRIDE Data" button is clicked.
This will open the Reshake Settings Dialog shown in Supplementary Figure 5, where the user
can customize the properties of the reanalysis.
After clicking the "Start the Reshaking!" button, PeptideShaker starts downloading the data
file(s) and extracts the spectra and search settings. (Missing information can be manually added
later in SearchGUI.)
PeptideShaker then starts SearchGUI where the user can edit the search parameters as displayed
in Supplementary Figure 6.
Supplementary Figure 6: It is possible to edit the inferred parameters to accurately reproduce the original
search or completely change the context of the dataset.
The search can now be started and the import in PeptideShaker is directly triggered.
4.0 - File Import
Search engines assign peptide candidates derived from the protein database to every spectrum,
providing a score illustrating the quality of the match: generally an e-value. PeptideShaker
supports different types of import formats: X!Tandem18 t.xml files implemented in the BioML
format19, OMSSA20 omx files, MS Amanda21 csv files, Mascot22 dat files, and the PSI mzIdentML23
format. The latter, notably, allows importing search results from MS-GF+24, and from virtually
any identification algorithm if the minimal information required for import in PeptideShaker is
present in the file. Details on mzIdentML and Mascot file requirements can be found on the
PeptideShaker website.
First, the search engine results are parsed using open source Java parsers that are all published
and actively maintained25-28. The results are then loaded into an open source search engine
independent structure29, allowing PeptideShaker to manipulate, save and open this large
amount of information. PeptideShaker takes advantage of the target/decoy strategy16 to convert
the scores of the search engines into Posterior Error Probability (PEP) values as commonly done
in proteomics30. Note that peptides mapping to both target and decoy proteins are excluded
from the import.
For every peptide candidate, the product of the search engine PEPs is given as score and the
best, i.e., the lowest, scoring peptide is picked as the best candidate. Given that search engines
are known to encounter difficulties at localizing PTMs, a peptide is here defined by its amino
acid sequence and the number of PTMs without accounting for their location. Amino acids with a
mass difference lower than the fragment ion tolerance are considered as indistinguishable. If
two peptide candidates score equally, they are discriminated by: (i) the occurrence of their
parent protein in the dataset; (ii) the number of search engines supporting the peptide; (iii) the
number of fragment ions annotated in the spectrum; and (iv) the precursor mass error.
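The candidate selection described above can be sketched as follows. This is an illustration only: the `Candidate` record is hypothetical, and the direction of each tie-breaker (higher protein occurrence, more search engines, more annotated ions, and smaller mass error being preferred) is an assumption, since the text only lists the criteria.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical peptide candidate for one spectrum."""
    sequence: str
    engine_peps: list        # one PEP per search engine supporting this peptide
    protein_occurrence: int  # occurrence of the parent protein in the dataset
    annotated_ions: int      # fragment ions annotated in the spectrum
    precursor_error: float   # absolute precursor mass error


def combined_score(candidate):
    """Product of the search engine PEPs; lower is better."""
    return math.prod(candidate.engine_peps)


def best_candidate(candidates):
    """Pick the lowest-scoring candidate, breaking ties with criteria (i) to (iv)."""
    return min(
        candidates,
        key=lambda c: (
            combined_score(c),
            -c.protein_occurrence,  # (i) assumed: more frequent parent protein first
            -len(c.engine_peps),    # (ii) assumed: more supporting search engines first
            -c.annotated_ions,      # (iii) assumed: more annotated fragment ions first
            c.precursor_error,      # (iv) assumed: smaller precursor mass error first
        ),
    )
```

Sorting on a tuple keeps the tie-breakers strictly subordinate to the score: they only matter when two candidates have exactly the same product of PEPs.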
In order to improve the identification rate, the list of PSMs resulting from the search engine
combination is separated into groups according to the run (i.e. spectrum file) and identified
charge. Groups are created for every spectrum file and for charges from the lowest to the highest
only if the group size does not compromise statistical accuracy31: a group size is considered
sufficient if more than 100 target hits are present before the first decoy hit and if the estimated
PEP resolution is lower than 1%. A PEP is then estimated for every PSM based on the target and
decoy hit distributions in the charge specific group.
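The group size criterion can be sketched as below. The function names are hypothetical, hits are assumed to be given as (score, is_decoy) pairs where a lower score is better, and the PEP resolution is taken as an input because its estimation is not detailed in this section.

```python
def targets_before_first_decoy(hits):
    """Count the target hits ranked ahead of the first decoy hit.

    `hits` is a list of (score, is_decoy) tuples; lower score means better match.
    """
    count = 0
    for _, is_decoy in sorted(hits, key=lambda h: h[0]):
        if is_decoy:
            break
        count += 1
    return count


def group_is_large_enough(hits, pep_resolution,
                          min_targets=100, max_resolution=0.01):
    """Apply both criteria from the text to one group of PSMs."""
    return (targets_before_first_decoy(hits) > min_targets
            and pep_resolution < max_resolution)


# 150 target hits followed by a single decoy hit.
example_hits = [(i * 0.001, False) for i in range(150)] + [(0.5, True)]
large_enough = group_is_large_enough(example_hits, pep_resolution=0.005)
```

A group failing either criterion would not be treated separately.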
From these PSMs, a list of peptides is established. When two peptides differ only in the PTM
localization, they are considered as separate peptide identification entities only if the PTMs are
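confidently localized (see 7.0 - PTM Scoring). A peptide score, $S_{\mathrm{peptide}}$, the product of the
PSM PEPs, is attached to every peptide:
$$S_{\mathrm{peptide}} = \prod_{i} \mathrm{PEP}_{\mathrm{PSM},i} \quad (1)$$
where $\mathrm{PEP}_{\mathrm{PSM},i}$ is the estimated PEP of the ith PSM identifying the considered peptide. The peptides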
are then grouped according to their modification status, again, only when the size of the groups
allows it. The size criterion is the same as for the PSM groups: more than 100 target hits before
the first decoy hit and the estimated PEP resolution lower than 1%. A peptide level PEP is
estimated based on the target and decoy hit distributions in the modification specific group.
Using the FASTA file provided by the user, every peptide is mapped to the parent protein
sequences. Here again, indistinguishable amino acids are considered as such based on the
fragment ion accuracy. Moreover, when a protein sequence contains X's the mapping will be
ignored if X's make up more than 25% of the peptide sequence. Subsequently, protein ambiguity
groups are created based on peptide uniqueness as introduced by Nesvizhskii32, such that peptides
are unique to a group. PeptideShaker scores the protein groups using the product of the
estimated peptide PEPs:
$$S_{\mathrm{protein}} = \prod_{i} \mathrm{PEP}_{\mathrm{peptide},i} \quad (2)$$
where $\mathrm{PEP}_{\mathrm{peptide},i}$ is the estimated PEP of the ith peptide identifying the considered protein group.
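The grouping and scoring steps can be illustrated with a simplified Python sketch. It assumes that a protein ambiguity group is defined by the exact set of proteins a peptide maps to, which is a simplification of the approach cited above; the function names and the example data are hypothetical.

```python
import math
from collections import defaultdict


def build_ambiguity_groups(peptide_to_proteins):
    """Group peptides by the exact set of proteins they map to.

    Each distinct protein set becomes one ambiguity group, so every
    peptide belongs to exactly one group.
    """
    groups = defaultdict(list)
    for peptide, proteins in peptide_to_proteins.items():
        groups[frozenset(proteins)].append(peptide)
    return dict(groups)


def group_score(peptides, peptide_pep):
    """Product of the estimated peptide PEPs, as in equation (2); lower is better."""
    return math.prod(peptide_pep[p] for p in peptides)


mapping = {"PEPA": ["A"], "PEPB": ["A", "B"], "PEPC": ["B", "A"]}
peps = {"PEPA": 0.05, "PEPB": 0.1, "PEPC": 0.2}
groups = build_ambiguity_groups(mapping)
scores = {group: group_score(members, peps) for group, members in groups.items()}
```

In this example, PEPB and PEPC both map to proteins A and B and therefore form the group "A or B", while PEPA supports protein A alone.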
When an ambiguity group presents a subset (for example group “Proteins A or B or C” is
identified as well as group “Proteins A or B”), the complete group (“Proteins A or B or C” in the
example) is considered as unlikely and ignored if the additional proteins in the complete
group (Protein C in the example): (i) are only supported by non-enzymatic peptides (when searching
with an enzyme); or (ii) are uncharacterized proteins or proteins with lower evidence (UniProt
accessions only); or if (iii) the subset (“Proteins A or B” in the example) scores better and is hence
more likely to be found. In these cases, the peptides are assigned to the subset.
Finally, the PEP of every protein group is estimated by comparing the target and decoy
distributions; a representative protein is selected for every group based on the peptide
enzymaticity (when searching with an enzyme) and the protein evidence (UniProt accessions
only) and description; and the peptides of complete groups (“Proteins A or B or C” in example)
are linked to all subsets (“Proteins A or B” in example).
During the processing of the data, the progress, including tips and warnings, is shown to the user
as displayed in Supplementary Figure 7.
Supplementary Figure 7: The Waiting Dialog displays the progress of the project creation process
as well as tips and warnings.
5.0 - Results Navigation
When a project has been created, PeptideShaker’s main interface displays the identification
results in a clear and intuitive fashion, allowing the user to easily navigate even large datasets.
The interface is divided into nine interconnected tabs. By default, the results are displayed in the
Overview tab as shown in Supplementary Figure 8.
Supplementary Figure 8: The interface of PeptideShaker consists of nine interconnected tabs corresponding
to different use cases. The Overview tab displays extended information of the identification matches and
allows the user to intuitively navigate the identification results.
At the top of the Overview tab, a table displays detailed information about the protein ambiguity
groups identified. The Protein Inference (PI) informs the user about the protein inference status
of the protein ambiguity group using different colors: (i) green for single proteins; (ii) yellow for
groups of related proteins; (iii) orange for groups of related and unrelated proteins; and (iv) red
for groups of unrelated proteins only. Proteins are considered as unrelated if their associated
genes (UniProt accessions only) or descriptions differ. When clicking the colored rectangle, the
user can inspect the protein inference status of the protein group as displayed in Supplementary
Figure 9.
Supplementary Figure 9: The Protein Inference Dialog allows the user to inspect the protein ambiguity
groups, here consisting of 53 proteins considered as related by PeptideShaker (Histocompatibility antigens).
The first table displays the proteins matched with information about the gene, chromosome, protein evidence
and peptide enzymaticity. The dialog also displays possible unique hits which can be related to this group, as
here Q29960 which was found with a very low score, and other protein groups related to this group. Note
that the user can alter both the protein group label and the representative protein.
The next three columns in the Overview protein table provide information about the selected
protein representative: protein accession, description and chromosome number. Chromosome
number is available when the species is selected and is obtained from Ensembl12. The species
can be selected when creating the project or can be set later via the Edit menu. Clicking the
chromosome number provides additional information about the gene associated by UniProt to
the representative protein of the protein ambiguity group: Ensembl Gene ID, Gene Name,
Chromosome, and GO annotation as displayed in Supplementary Figure 10.
Supplementary Figure 10: When clicking the chromosome number in the protein table, gene information
about the selected protein is extracted from Ensembl and displayed to the user.
Next, the protein coverage is displayed. The detected coverage (colored) is compared to the
expected observable coverage (grey). The latter is estimated based on the size distribution of the
identified peptides and the maximal size allowed for enzymatic peptides. The two following
columns represent the number of identified and validated peptides and PSMs for the protein
group (see 6.0 - Validating Proteins, Peptides and PSMs).
Note how PeptideShaker takes full advantage of so-called sparklines
(http://en.wikipedia.org/wiki/Sparklines) to make the coverage and the results of the statistical analysis easier to
interpret. Sparklines are used throughout the PeptideShaker tables using our open source
JSparklines library (http://jsparklines.googlecode.com), greatly enhancing the results
inspection.
Subsequently, a spectrum counting index is displayed. Although PeptideShaker is dedicated to
identification, it does provide spectrum counting metrics that allow for a rough estimation of
protein abundances directly from identification results33. PeptideShaker comes with a version of
the emPAI index34 and an improved version of the NSAF index35, chosen as the default option for
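its accuracy36. NSAF is a simple and efficient method where the number of spectra $N_{\mathrm{spectra}}$ for a
given protein is normalized by the protein length $L_{\mathrm{protein}}$:
$$\mathrm{NSAF} = \frac{N_{\mathrm{spectra}}}{L_{\mathrm{protein}}} \quad (3)$$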
It should be noted that a major issue with most spectrum counting indexes is that they do not
take into account protein inference issues and cleavage efficiency. PeptideShaker thus
implements an improved version of the NSAF index, where the contribution of the ith validated
PSM is weighted by a protein inference coefficient $w_i = 1/n_i$, where $n_i$ is the number of protein
ambiguity groups where the matched peptide is included. Note that if a peptide is redundant in
the sequence of a representative protein, $n_i$ is increased accordingly. Moreover, the observable
length of the protein, as used for the observable coverage, $L_{\mathrm{observable}}$, is used, thus discarding all
domains of the sequence which cannot generate detectable peptides:
$$\mathrm{NSAF}_{\mathrm{improved}} = \frac{\sum_{i} w_i}{L_{\mathrm{observable}}} \quad (4)$$
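As a worked illustration of the two indexes, the sketch below computes the plain NSAF and the weighted variant as they can be read from the description above. The function names, the input layout and the example numbers are assumptions made for illustration; they are not the PeptideShaker implementation.

```python
def nsaf(n_spectra, protein_length):
    """Spectrum count normalized by the protein length, as in equation (3)."""
    return n_spectra / protein_length


def weighted_nsaf(groups_per_psm, observable_length):
    """Weighted index as read from the text around equation (4).

    `groups_per_psm` holds, for every validated PSM, the number of protein
    ambiguity groups containing the matched peptide; each PSM contributes
    1/n. The sum is normalized by the observable length of the protein.
    """
    return sum(1.0 / n for n in groups_per_psm) / observable_length


# Four validated PSMs: two from peptides unique to the group, one shared
# between two groups and one shared between four groups.
plain = nsaf(4, 500)
weighted = weighted_nsaf([1, 1, 2, 4], observable_length=350)
```

The shared peptides contribute only a fraction of a spectrum (here 0.5 and 0.25), so a protein mostly identified by shared peptides receives a lower index than one identified by unique peptides.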
Towards the right, the molecular weight (MW) of the representative protein is shown, and
finally the confidence attached to the protein group and its validation status is displayed.
When a protein group is selected, the peptides mapping to the selected group are displayed. As
for the proteins, the protein inference status of the peptide is color coded and detailed
information is accessible by clicking the colored rectangle. The peptide sequence, followed by
the peptide's location in the representative protein of the protein group is shown next. Note how
the graphical representation of the peptide localization in the protein sequence allows for
intuitive interpretation and that PTMs are color coded in the sequence. The use of white font on
a colored background indicates a confident PTM localization, while a colored font on a white
background indicates a non-confident PTM localization. The PTM color coding can be edited in
the search parameters. Finally, the results of the target/decoy statistical processing are
displayed as in the protein table.
Similarly, when a peptide is selected, the PSMs mapping to the peptide are displayed. The
color coded column (SE) shows the agreement of the search engines for the given spectrum,
and clicking the colored rectangles will show the details for each search engine for the given
spectrum. Subsequently, detailed information about the match is displayed: sequence with color
coded PTMs, charge, precursor mass error, confidence, and validation status.
The spectrum corresponding to the selected PSM is displayed at the bottom right with user
customizable fragment ion annotation. The three spectrum sub plots above the spectrum make it
easier to assess the quality of the match: (i) the intensity of every fragment ion is displayed
relative to the peptide sequence for forward (blue, down) and rewind (red, up) ions; (ii) a
histogram of the intensities of the annotated (green) and the non-annotated (grey) peaks; and
(iii) the fragment ion m/z error is plotted against the peak m/z for forward (blue) and rewind
(red) ions. The latter allows for straightforward detection of calibration issues. The user can
fully customize the spectrum annotation and display, such as the color and appearance of the peaks,
benefit from the advanced annotation as shown in Supplementary Figure 11, and export the
plots in publication-grade quality.
Supplementary Figure 11: A spectrum as displayed in the Overview tab. The intensity of every fragment ion is
displayed relative to the peptide sequence at the top left for forward (blue, down) and rewind (red, up) ions.
In the top middle, a histogram of the intensities of the annotated (green) and the non-annotated (grey) peaks
is displayed. And on the top right the fragment ion m/z error is plotted against the peak m/z for forward
(blue) and rewind (red) ions. The spectrum is displayed with fully customizable annotation as exemplified
here by the overlay of automated de novo sequencing using the selected PSM.
PeptideShaker also provides visualizations of multiple PSMs. When selecting different PSMs
simultaneously, spectra are displayed in a so-called planetary system view where the x-axis
represents the m/z of the spectrum, the y-axis the m/z error and the size of the data point
represents the intensity of the peak. This display makes it easy to spot outliers or mass
calibration issues as displayed in Supplementary Figure 12.
Supplementary Figure 12: The planetary system view allows the comparison of multiple measurements
(PSMs) of the same peptide within the same plot. For every fragment ion of every PSM matched in the
respective spectrum, the x-axis represents the m/z of the annotated peak, the y-axis the m/z error and the
size of the data point represents the intensity of the peak. This display makes it easy to inspect the spectrum
reproducibility and detect outliers or mass deviation issues. Here the second PSM in blue seems to deviate
from the others, showing a typical ppm error.
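The data points of such a view can be derived from the matched fragment ions as sketched below. The input layout, with one (observed m/z, theoretical m/z, intensity) tuple per annotated peak, and the function names are assumptions for illustration; the ppm formula is the usual relative mass error.

```python
def mz_error_ppm(observed_mz, theoretical_mz):
    """Relative m/z error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def planetary_points(matches):
    """Return one (x, y, size) point per matched fragment ion.

    x is the m/z of the annotated peak, y the m/z error in ppm and
    size the intensity of the peak.
    """
    return [(obs, mz_error_ppm(obs, theo), intensity)
            for obs, theo, intensity in matches]


# Two annotated peaks of one hypothetical PSM.
points = planetary_points([(500.005, 500.0, 1200.0), (250.0, 250.0, 300.0)])
```

Plotting the points of several PSMs with one color per PSM makes a systematic shift of one spectrum stand out from the others.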
The annotated peaks can also be visualized in a table as displayed in Supplementary Figure 13.
Supplementary Figure 13: For a given PSM, the fragment ion matches can be plotted in a table conveniently
showing the m/z values of the detected ions.
Selecting multiple PSMs also shows the reproducibility of the spectrum acquisition: the intensities
of the matched peaks, normalized by the summed spectrum intensity, are displayed as shown in
Supplementary Figure 14.
Supplementary Figure 14: When selecting multiple PSMs, the intensities of the matched peaks can be
displayed with error bars indicating the variability of spectrum acquisitions.
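A minimal sketch of this normalization is given below. It assumes that every spectrum is available as a mapping from fragment ion label to intensity covering all peaks to be summed, and that the error bars correspond to the standard deviation across spectra; both are illustrative assumptions, as are the function names.

```python
from statistics import mean, stdev


def normalize(spectrum):
    """Scale peak intensities by the summed intensity of the spectrum."""
    total = sum(spectrum.values())
    return {ion: intensity / total for ion, intensity in spectrum.items()}


def ion_variability(spectra):
    """Mean and standard deviation of the normalized intensity of each ion.

    Only ions present in every spectrum are reported; at least two
    spectra are required to estimate a standard deviation.
    """
    normalized = [normalize(s) for s in spectra]
    shared = set.intersection(*(set(s) for s in normalized))
    return {
        ion: (mean(s[ion] for s in normalized), stdev(s[ion] for s in normalized))
        for ion in sorted(shared)
    }


summary = ion_variability([
    {"b2": 10.0, "y3": 30.0},
    {"b2": 20.0, "y3": 20.0},
])
```

Normalizing each spectrum before averaging removes differences in overall signal between acquisitions, so the remaining spread reflects changes in the relative fragment intensities.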
At the bottom of the Overview panel, the sequence of the representative protein of the selected
protein group is displayed, where colors represent the areas of the sequence covered by the
experiment, and grey the coverable areas of the sequence according to the chosen enzyme and
the size distribution of the identified peptides. Green, yellow and red indicate the areas of the sequence
covered by confident, doubtful and not validated peptide matches, respectively (see 6.0 - Validating
Proteins, Peptides and PSMs). As displayed in Supplementary Figure 15, the PTMs are also
localized and color coded, and clicking on an area of the sequence can be used to select the given
peptide.
Supplementary Figure 15: The protein coverage panel displays the representative protein of the selected
protein group, as demonstrated here with protein of accession P49588 of the example dataset of
PeptideShaker (968 amino acids). Here 17.56% of an expected 91.84% coverage is observed as displayed in
color and grey, respectively. 12.6%, 3.2%, and 1.76% of the coverage was achieved using confident, doubtful,
and not validated peptide matches (green, yellow and red areas), respectively. The currently selected peptide
is displayed in blue and the modifications are color coded (here blue for oxidation of methionine). When
clicking in the sequence, as done here between amino acids 880 and 899, the identified peptides can be
selected. Here the peptide MHSPQTSAMLFTVDNEAGK was found with and without oxidation of methionine 9.
According to the PTM localization scores (see 7.0 - PTM Scoring), the localization of the oxidation is
confident as indicated by the colored background.
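The coverage percentages quoted above follow from counting the residues covered by at least one peptide. The sketch below shows this computation under the assumption that peptide positions are given as 1-based, inclusive (start, end) pairs; the function name and the example spans are hypothetical.

```python
def coverage_percent(protein_length, peptide_spans):
    """Percentage of residues covered by at least one peptide.

    `peptide_spans` holds 1-based, inclusive (start, end) positions.
    Overlapping peptides are only counted once.
    """
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Two overlapping peptides on a 100 residue protein cover residues 1 to 20.
overlap_example = coverage_percent(100, [(1, 10), (5, 20)])

# For a 968 residue protein, 170 covered residues correspond to about 17.56%.
caption_example = round(coverage_percent(968, [(1, 170)]), 2)
```

The same function applied to the confident, doubtful and not validated peptides separately would give the three partial percentages, provided each residue is attributed to a single category.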
The other PeptideShaker tabs cover other use cases with the same focus on intuitive navigability
of the identification results: (i) the Spectrum IDs tab (Supplementary Figure 16) allows the
user to compare the results of the different search engines; (ii) the Fractions tab
(Supplementary Figure 17) displays the contributions of different fractions to the final results;
(iii) the Modifications tab (Supplementary Figure 18) makes it possible to browse peptides
carrying certain variable modifications and inspect the results of the localization scoring
algorithms; (iv) the 3D Structures tab (Supplementary Figure 19) maps the identification
results onto 3D structures from PDB; (v) the Annotation tab (Supplementary Figure 20)
connects the identification results to various external resources like Reactome37 or STRING38;
(vi) the GO analysis tab (Supplementary Figure 21) displays Gene Ontology statistics of the
dataset; (vii) the Validation tab (Supplementary Figure 22) allows the inspection of the
target/decoy results and adapting the validation threshold; and (viii) the QC Plots tab
(Supplementary Figure 23) displays different quality control metrics on the identified
proteins, peptides and PSMs.
For further details on the tabs see the figure legends below the figures on the following pages.
Supplementary Figure 14: The Spectrum IDs tab allows the user to compare the results of the different search
engines. The table at the top shows the identification results retained by PeptideShaker for a given spectrum
file with extended information on the PSM retained for every spectrum. The center-left panel shows the
match retained by PeptideShaker followed by a table listing all matches from all identification algorithms
including secondary hits. The confidence a secondary hit would have obtained if retained as first hit is
displayed, allowing the comparison of hits across search engines. The center-right panel displays the spectrum
of the selected PSM; it is thus possible to inspect the spectrum annotation of a secondary hit. At the bottom,
several plots display the performance of the identification algorithms with respect to the results displayed in
PeptideShaker, allowing straightforward control of the performance of all algorithms. From left to right: (1) the
number of PSMs a given algorithm would provide if used alone, (2) the number of PSMs of the merged dataset
which can be ascribed to a single algorithm only, (3) the number of unassigned spectra a given algorithm
would lead to if used alone, and (4) the identification yield a given algorithm would lead to if used alone. Note
that these metrics are biased toward search engine agreement by the strategy used to select the best match
of every spectrum.
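The four metrics listed in this legend can be illustrated with plain set operations. The following is a minimal sketch and not PeptideShaker's own code: the input layout (a mapping from algorithm name to the set of spectrum identifiers that algorithm identifies) and all function and key names are assumptions made for illustration.

```python
def engine_metrics(assigned, all_spectra):
    """Per-algorithm metrics analogous to plots (1) to (4).

    assigned:    dict mapping algorithm name -> set of identified spectrum ids
    all_spectra: set of all spectrum ids in the spectrum file
    """
    metrics = {}
    for name, ids in assigned.items():
        # Spectra identified by at least one of the other algorithms
        others = set().union(*(s for n, s in assigned.items() if n != name))
        metrics[name] = {
            "psms_alone": len(ids),                      # (1) PSMs if used alone
            "unique_psms": len(ids - others),            # (2) PSMs from this algorithm only
            "unassigned_alone": len(all_spectra - ids),  # (3) unassigned spectra if used alone
            "yield_alone": len(ids) / len(all_spectra),  # (4) identification yield if used alone
        }
    return metrics
```

As the legend notes, numbers of this kind are biased toward search engine agreement, so they are better read as a relative comparison than as absolute performance figures.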
Supplementary Figure 15: When multiple spectrum files are loaded, the Fractions tab indicates in which spectrum
files the different proteins would be validated if not found in the other files. Various plots are available to
visualize the distribution of each protein across the spectrum files, e.g., the number of peptides per spectrum
file as shown here. It should be noted that the plots only give an indication of the protein distribution across
the spectrum files, mainly for inspecting fractionated data, and should not be used as a comparison of distinct
experiments, where more advanced processing is required39.
Supplementary Figure 16: The Modifications tab supports the browsing of the peptides carrying variable
modifications. The modification of interest is selected in the top left and the modified peptides of the selected
category are displayed in the top right table. When a modified peptide is selected, PeptideShaker looks for
related peptides, i.e., peptides that carry other modifications or present different cleavage sites.
Here, a phosphorylated peptide is selected, and versions of the same peptide with no modification and with an
oxidation of methionine are found. The confidence in the PTM localization is color coded and the results of the
localization scores are displayed on the sequence. More information on the localization is available when
clicking in the PTM column or by selecting the desired score tab under the spectrum.
Supplementary Figure 17: The 3D Structures tab maps the identified peptides onto the 3D structure of the
protein from the PDB13 using Jmol, an open-source Java viewer for chemical structures in 3D
(http://www.jmol.org). Identified peptides are shown in green, the selected peptide in blue, and the PTMs are
color coded and annotated on the structure.
Supplementary Figure 18: The Annotation tab allows querying various external resources with results from
PeptideShaker. Using the protein accession number, the external resources can be directly searched as
illustrated here with a Reactome pathway and a STRING interaction network.
Supplementary Figure 19: The GO Analysis tab provides Gene Ontology statistics on the validated proteins,
highlighting the terms that are significantly more or less frequent in the given dataset compared to the
annotation of the species in Ensembl using a hypergeometric test.
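A hypergeometric test of this kind can be sketched as follows. This is an illustrative calculation under stated assumptions, not the code used by PeptideShaker: the function names are hypothetical, the p-values are one-sided tail sums, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k proteins carry the GO term among n validated proteins,
    drawn from N annotated proteins of the species, K of which carry the term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def go_term_p_values(k, N, K, n):
    """Return (p_over, p_under): the probabilities of observing at least k,
    respectively at most k, proteins with the term by chance."""
    lowest = max(0, n - (N - K))   # smallest possible count
    highest = min(K, n)            # largest possible count
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under
```

A small `p_over` suggests that the term is more frequent in the dataset than in the species annotation, and a small `p_under` that it is less frequent.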
Supplementary Figure 20: The Validation tab allows the user to inspect the target/decoy results for proteins,
peptides and PSMs as selected in the top left table. False positive and negative rates are displayed and can be
optimized in an intuitive cost/benefit way.
Supplementary Figure 21: The QC Plots tab allows the user to quality control several parameters of the
protein, peptide and PSM identifications. Here, for example, the distribution of the precursor mass error is
displayed, which makes it easy to see the instrument resolution.
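For reference, a relative precursor mass error is commonly expressed in parts per million (ppm). The sketch below shows that conversion only; the function name is hypothetical, and whether the plot uses ppm or absolute units depends on the settings of the project.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6
```

For instance, a precursor measured at 500.005 m/z for a theoretical value of 500.000 m/z deviates by about 10 ppm.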
6.0 - Validating Proteins, Peptides and PSMs
PeptideShaker takes full advantage of the target/decoy strategy to allow the user to extensively
control the quality of the identifications. As already mentioned, the confidence in identification
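matches is displayed as the complement of the PEP:
$$\text{Confidence} = 1 - \text{PEP} \tag{5}$$
As an example, 100 matches with a confidence of 90% are expected to contain 10 false positives.
In proteomics, identification results are typically validated at a given False Discovery Rate
(FDR). The FDR indicates the share of false positive matches in the result set:
$$\text{FDR} = \frac{n_{FP}}{N} \tag{6}$$
Where $n_{FP}$ represents the count of target false positives and $N$ the number of retained target
hits. As an example, a result set of 100 matches with an FDR of 1% will contain only one false
positive.
The number of false positives can be equivalently estimated via the number of decoy hits or
using the expectation value of the PEP. PeptideShaker hence provides two estimations of the
FDR:
$$\text{FDR}_{\text{decoy}} = \frac{n_{\text{decoy}}}{N} \tag{7}$$
$$\text{FDR}_{\text{PEP}} = \frac{1}{N} \sum_{i=1}^{N} \text{PEP}_i \tag{8}$$
Where $n_{\text{decoy}}$ is the number of retained decoy hits and $\text{PEP}_i$ the PEP of the i-th retained
target hit.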
By default, the classical estimator of equation (7) is used; the user can switch to the probabilistic
FDR in the Validation tab to estimate the FDR using the posterior error probability (8). Similarly,
the number of false negatives, $n_{FN}$, can be estimated by integrating the confidence.
Complementarily to the FDR, the False Negative Rate (FNR) of the identification process is hence
also estimated:
$$\text{FNR} = \frac{n_{FN}}{n_{TP}} \tag{9}$$
Where $n_{TP}$ is the estimated number of true positives in the imported dataset. As an example,
when 100 hits are validated at an FNR of 20%, one can expect a total of 125 true positive hits
among which 25 were rejected by the threshold. Likewise, with an FNR threshold of 1%, one covers 99%
of the possible true positive identifications loaded in PeptideShaker.
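The estimators of equations (7) to (9) can be sketched numerically as follows. This is a minimal illustration written from the definitions in the text, not PeptideShaker's implementation; the function names and the representation of matches as plain lists of PEP values are assumptions.

```python
def fdr_decoy(n_decoy, n_target):
    """Classical estimator: retained decoy hits stand in for target false positives."""
    return n_decoy / n_target


def fdr_pep(retained_target_peps):
    """Probabilistic estimator: expected false positives = sum of the PEPs."""
    return sum(retained_target_peps) / len(retained_target_peps)


def fnr(rejected_target_peps, retained_target_peps):
    """False negatives and true positives are estimated by summing the
    confidence (1 - PEP) over the rejected and over all target hits."""
    n_fn = sum(1 - p for p in rejected_target_peps)
    n_tp = n_fn + sum(1 - p for p in retained_target_peps)
    return n_fn / n_tp
```

With 100 retained hits and one decoy, `fdr_decoy(1, 100)` gives 1%. With 100 retained and 25 rejected hits that are all true positives, `fnr` gives 25/125 = 20%, which matches the example in the text.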
PeptideShaker allows the user to intuitively edit the validation thresholds in the Validation tab,
and whether a match is validated or not is shown throughout the display. The validation process,
a balance between quality and quantity, between FDR and FNR, is crucial for all downstream
analyses.
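The trade-off between quality and quantity can be made concrete with a threshold search: lowering the score threshold retains more target hits but also more decoys. The sketch below picks the most permissive threshold whose decoy-based FDR stays under a limit. It is an illustration under simplifying assumptions (higher scores are better, ties between scores are ignored, names are hypothetical), not the procedure implemented in PeptideShaker.

```python
def threshold_at_fdr(matches, max_fdr=0.01):
    """matches: iterable of (score, is_decoy) pairs.

    Returns (score_threshold, n_target) for the largest set of hits whose
    decoy/target ratio does not exceed max_fdr, or (None, 0) if none exists.
    """
    n_target = n_decoy = 0
    best = (None, 0)
    # Walk through the matches from best to worst score
    for score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target and n_decoy / n_target <= max_fdr:
            best = (score, n_target)
    return best
```

Raising `max_fdr` can only keep or increase the number of retained target hits, which is the cost/benefit balance between FDR and FNR described above.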
The accuracy of the false positive rate estimations was benchmarked by searching a Pyrococcus
furiosus dataset against a database consisting of the concatenation of Pyrococcus furiosus
sequences with the eukaryota complement of the UniProt/SwissProt database40, downloaded on
the 21st of October 2013. The database comprised 181,026 (target) sequences, and the reversed
version of every sequence was included as a decoy protein. In this setup, eukaryote sequences
(excluding known contaminants) can be considered as false identifications while Pyrococcus furiosus
sequences can be considered as correct matches, hence allowing the verification of target/decoy
derived error rates41.
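The construction of such a concatenated target/decoy database can be sketched as follows. This is a minimal illustration, not the tool used by the authors: the function name, the in-memory dictionary layout and the decoy accession suffix are assumptions.

```python
def add_reversed_decoys(proteins, decoy_tag="_REVERSED"):
    """proteins: dict mapping accession -> amino acid sequence.

    Returns a new dict containing every target sequence plus one decoy
    per target, obtained by reversing the target sequence.
    """
    database = dict(proteins)
    for accession, sequence in proteins.items():
        # The suffix used to flag decoys is an assumption of this sketch
        database[accession + decoy_tag] = sequence[::-1]
    return database
```

Because every target contributes exactly one decoy, the concatenated database has twice as many entries as the target database, which is what allows decoy hits to serve as an estimate of target false positives.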
Peak lists obtained in ref. 41 were searched using OMSSA20 version 2.1.9, X!Tandem18 version
Sledgehammer (2013.09.01.1), MS Amanda21 version 1.0.0.3120 and MS-GF+24 version Beta
(v10024) (5/9/2014). The search was conducted using SearchGUI9 version 1.18.7. The
identification settings were as follows: Trypsin with a maximum of 2 missed cleavages; 10 ppm
as MS1 and 0.5 Da as MS2 tolerances; fixed modifications: Carbamidomethylation of Cys
(+57.021464 Da); variable modifications: Oxidation of Met (+15.994915 Da) and Phosphorylation of
Ser, Thr and Tyr (+79.966331 Da). All algorithm-specific settings were left at the SearchGUI
defaults. The mass spectrometry data along with the identification results have been
deposited to the ProteomeXchange Consortium15 via the PRIDE partner repository10 with the
dataset identifier PXD001077.
As demonstrated in Supplementary Figures 24 to 26, the error rate estimated by
PeptideShaker accurately tracks the actual error rate for PSMs, peptides and proteins. As
discussed in the literature41-43, the marginal underestimation of the error rate can be
ascribed to the second refinement procedure of X!Tandem.
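The comparison underlying these figures, decoy hits against known-false eukaryote hits, can be sketched as a pair of cumulative counts. The following is an illustration only; the labels "target", "entrapment" and "decoy" and the function name are assumptions, with "entrapment" standing for the eukaryote sequences.

```python
def decoy_vs_entrapment(hits):
    """hits: iterable of (score, kind), kind in {"target", "entrapment", "decoy"}.

    Returns the cumulative (n_entrapment, n_decoy) counts obtained when
    hits are accepted from the best score downwards.
    """
    n_entrapment = n_decoy = 0
    curve = []
    for score, kind in sorted(hits, key=lambda h: h[0], reverse=True):
        if kind == "entrapment":
            n_entrapment += 1
        elif kind == "decoy":
            n_decoy += 1
        curve.append((n_entrapment, n_decoy))
    return curve
```

If the target/decoy estimate is accurate, the points of this curve lie close to the line y=x.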
Supplementary Figure 22: A Pyrococcus furiosus dataset was searched against a concatenation of eukaryota
and Pyrococcus furiosus sequences. The number of retained decoy proteins is plotted against the number of
identified eukaryote proteins at increasing protein score. In this setup, the number of eukaryota proteins
indicates the number of false identifications.
Supplementary Figure 23: Similarly to Supplementary Figure 24, the number of retained decoy peptides is
plotted against the number of identified eukaryote peptides at increasing peptide score. In order to increase
the identification rate, peptides were separated into two groups: modified and unmodified peptides. If a
category of modified peptides is substantially enriched, a standalone group is automatically created by
PeptideShaker (see main text for details).
[Plot, Supplementary Figure 22: Number of Decoy Proteins (y axis, 0 to 2,000) versus Number of Eukaryota Proteins (x axis, 0 to 2,000); series: Proteins, y=x]
[Plot, Supplementary Figure 23: Number of Decoy Peptides (y axis, 0 to 2,000) versus Number of Eukaryota Peptides (x axis, 0 to 2,000); series: unmodified peptides, modified peptides, peptides, y=x]
Supplementary Figure 24: Similarly to Supplementary Figure 25, the number of retained decoy PSMs is
plotted against the number of identified eukaryote PSMs at increasing PSM score. In order to increase the
identification rate, PSMs were separated into three groups: PSMs identified with a charge of 2+, with a
charge of 3+, and with a charge >3+ (see main text for details).
In addition to the statistical validation, one generally expects proteins to be identified with at
least two confident peptides in large scale proteomic shotgun experiments (note that this does
not apply to datasets enriched for specific species such as modified peptides). Similarly to Peptizer
for PSMs44, PeptideShaker therefore inspects all the validated matches using quality filters, and
doubtful matches are flagged by a yellow warning icon throughout the display. Quality filters are
applied at the PSM, peptide and protein levels. Since confident peptides require confident PSMs,
and in turn confident proteins require confident peptides, the quality of the identification
propagates from PSM to peptide and protein levels providing the user with stringently quality
controlled results. When the database or the dataset does not allow for reliable statistical
estimation (e.g., when searching small databases or identifying a low number of proteins) the
validation status is similarly marked as doubtful.
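The propagation of quality from peptides to proteins can be sketched with the two-confident-peptides rule mentioned above. This is a simplified illustration, not PeptideShaker's filter implementation: the status names, the function name and the default of two peptides as the only filter are assumptions.

```python
from enum import Enum


class Status(Enum):
    NOT_VALIDATED = "not validated"
    DOUBTFUL = "doubtful"
    CONFIDENT = "confident"


def protein_status(passes_threshold, peptide_statuses, min_confident_peptides=2):
    """Combine the statistical validation with a peptide-count quality filter.

    passes_threshold: True if the protein passes the target/decoy threshold
    peptide_statuses: Status values of the peptides supporting the protein
    """
    if not passes_threshold:
        return Status.NOT_VALIDATED
    n_confident = sum(1 for s in peptide_statuses if s is Status.CONFIDENT)
    if n_confident >= min_confident_peptides:
        return Status.CONFIDENT
    return Status.DOUBTFUL
```

The same pattern could be applied one level down, with peptide statuses derived from PSM statuses, so that a doubtful PSM can make a peptide, and in turn a protein, doubtful.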
As displayed in Supplementary Figure 27, clicking on the yellow warning icon opens a dialog
with details on the validation status and allows the user to set the status of a validated match as
confident or doubtful.
[Plot, Supplementary Figure 24: Number of Decoy PSMs (y axis, 0 to 3,000) versus Number of Eukaryota PSMs (x axis, 0 to 3,000); series: 2+ PSMs, 3+ PSMs, 4+ and 5+ PSMs, PSMs, y=x]
Supplementary Figure 25: In addition to the statistical validation, PeptideShaker inspects all identification
matches using quality filters. Doubtful matches are marked with a yellow warning icon. Clicking on the icon
opens a dialog allowing the inspection of the validation procedure for this match, with details concerning the
database, the target/decoy scoring and results, and the inspection by quality filters. Here, the protein
identification was supported by only one confident peptide and only one confident spectrum and was thus
marked as doubtful.
7.0 - PTM Scoring
In proteomics, there are two main paradigms for PTM localization scoring45: (i) using the
identification algorithm score difference between assumptions carrying PTMs at different sites
as a proxy to infer the quality of the localization; or (ii) using probabilistic scores to estimate the
probability that a site is actually modified. The latter scores are inferred from the original spectra
and are independent of the identification algorithm. PeptideShaker implements the widely
used MD-score46 and its multiple search engine equivalent, the D-score47. Complementarily, two
probabilistic scores are implemented, the A-score17 and PhosphoRS48. Although originally
designed for phosphorylation only, these scores are estimated for every variable modification
which can take different sites in a peptide. Notably, the peak annotation method used for these
scores is the same as the one used to annotate the spectra in the interface, thus allowing visual
inspection of site determining ions in spectra. During the scoring, neutral losses of the same
mass as the PTM itself are not taken into account for spectrum annotation. In our experience,
disabling all neutral losses (e.g. H2O and NH3) increases the discrimination power of the
probabilistic scores.
As mentioned in the project creation section, the selection of the probabilistic score is done in
the processing parameters dialog. There, it is also possible to disable neutral losses annotation
for the scores and set a score threshold. Finally, one can enable an automated threshold which
will be calibrated to a 99% agreement with the D-score. Whenever a peptide presents more
modification sites than detected modifications and the probabilistic score passes the threshold,
the modification is marked as Confident. It is labeled as Doubtful if it does not pass the threshold
and as Random if different modification sites score equally. If no probabilistic score is calculated,
a D-score of 95% is used as the threshold.
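The three localization labels can be sketched as a small decision function. This is an illustration of the rule as described in the text for a single modification with several candidate sites, not PeptideShaker's code; the function name, the input layout and the use of "greater than or equal" for passing the threshold are assumptions.

```python
def localization_confidence(site_scores, threshold):
    """site_scores: dict mapping candidate site position -> probabilistic score
    for one modification that can occupy more than one site.
    """
    if len(site_scores) < 2:
        raise ValueError("expected more candidate sites than modifications")
    ranked = sorted(site_scores.values(), reverse=True)
    if ranked[0] == ranked[1]:
        return "Random"      # the best sites cannot be distinguished
    if ranked[0] >= threshold:
        return "Confident"   # the best site passes the score threshold
    return "Doubtful"
```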
Note that the A-score was only defined for singly modified peptides. The A-score was also
established for spectra searched with a tolerance of ±0.5 m/z, and thus subdivides the spectrum into
windows of 100 m/z: the window size equals 100 times the fragment mass resolution, i.e., 200 times
the tolerance. In our implementation, we kept the ratio between window size and tolerance at its
original value. As a result, for data searched with a tolerance of 0.02 m/z, the spectrum will be
subdivided into windows of 4 m/z and not 100 m/z as done in PhosphoRS48. PhosphoRS was implemented
according to its original publication48.
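The window scaling described above amounts to a fixed ratio between window size and fragment tolerance. A minimal sketch, with a hypothetical function name, derived only from the two numerical examples in the text:

```python
def ascore_window_size(fragment_tolerance_mz):
    """Window size in m/z that keeps the original ratio of a 100 m/z window
    for a tolerance of 0.5 m/z (i.e., 200 times the tolerance)."""
    original_window_mz = 100.0
    original_tolerance_mz = 0.5
    return original_window_mz / original_tolerance_mz * fragment_tolerance_mz
```

With a tolerance of 0.5 m/z this returns the original 100 m/z window, and with 0.02 m/z it returns a window of 4 m/z, in line with the text.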
8.0 - Reports, Follow Up Analyses and Submissions to PRIDE
All the generated data can be exported to text files which can subsequently be imported into
external tools like Excel or Perseus (http://www.perseus-framework.org). Contextual export
options are additionally available from most tables or displays. Identification details can also be
exported under the Export > Identification Features menu, where the user can select information
about protein, peptide, PSM or search engine specific results to be exported to a text file. A
phosphorylation oriented summary is also available. Finally, fully customizable reports can
export virtually any information from PeptideShaker.
Five reports are available by default: (i) Certificate of Analysis: all search and processing
parameters with statistics on the dataset – a crucial feature for service providers and
publications; (ii) Default PSM Report: all identification information at the PSM level; (iii) Default
Peptide Report: all identification information at the peptide level; (iv) Default Protein Report: all
identification information at the protein level; and (v) Default Hierarchical Report: all
identification results presented in a hierarchical manner: protein > peptide > PSM. Note that the
user can create and customize his/her own reports as displayed in Supplementary Figure 28.
Documentation with description of the report content can also be exported for every report.
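Since the reports are plain text files, they can be post-processed with standard tools. The sketch below groups the rows of a tab-separated PSM-level export into the protein > peptide > PSM hierarchy. The column names "Protein" and "Sequence" and the function name are hypothetical; the actual headers depend on the report and should be taken from the exported documentation.

```python
import csv
import io
from collections import defaultdict


def group_psms(report_text, protein_col="Protein", peptide_col="Sequence"):
    """Group the rows of a tab-separated report as protein -> peptide -> list of rows.

    The default column names are placeholders and must be adapted to the
    headers of the exported report.
    """
    tree = defaultdict(lambda: defaultdict(list))
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    for row in reader:
        tree[row[protein_col]][row[peptide_col]].append(row)
    return tree
```

For a report stored on disk, the text can be read with `open(path, newline="").read()` before being passed to the function.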
Supplementary Figure 26: In the Report Dialog the user can select the identification features to export. These
include Annotation Settings, Input Filters, Proteins, Peptides, PSMs, PTMs, Fragment Ion Information, Project
Details, Search Parameters, Spectrum Counting Settings and Target/Decoy Validation Summary. From each
of these categories, the user can select the desired elements as displayed here for proteins - ranging from high
level information (accession, chromosome, sequence coverage, PTM mapping, etc.) to detailed mass
spectrometry results, like detected fragment ion m/z error.
PeptideShaker also offers various exports for post processing of the identifications, as displayed
in Supplementary Figure 29. These are available under the Export > Follow Up Analysis menu.
Supplementary Figure 27: The Follow Up Analysis options in PeptideShaker include: (i) export of the spectra
of non-validated matches or recalibrated spectra (at the MS1 and/or MS2 level) of the validated matches; (ii)
export of accessions and sequences of validated or not validated proteins; (iii) export to the popular
Progenesis LC-MS label free quantification software; (iv) export to graph databases like Cytoscape49; (v)
export of inclusion/exclusion lists for various instruments; and (vi) export of libraries in the SWATH format
as specified by the manufacturer.
Notably, the recalibration feature can be used when calibration issues are detected (as
mentioned in the Results Navigation section) as for instance often encountered on Time Of Flight
(TOF) instruments.
Also note that the export to graph databases provides a unique intuitive view on complex
datasets with a graphical approach to the protein inference problem as displayed in
Supplementary Figure 30.
Supplementary Figure 28: By supporting export to graph database formats like Cytoscape, PeptideShaker
allows for an intuitive visualization of protein inference problems, here represented by two examples from the
example dataset - proteins in red, peptides in blue. On the left, an ideal case where different peptides map
uniquely to a single protein accession. On the right, individual proteins are supported both by unique and
shared peptides.
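The two situations shown in the figure can be expressed on a peptide-to-protein mapping, which is the bipartite graph underlying the protein inference problem. The following is a minimal sketch with hypothetical names and input layout, not the export code of PeptideShaker.

```python
from collections import defaultdict


def classify_peptides(peptide_to_proteins):
    """peptide_to_proteins: dict mapping peptide sequence -> set of protein accessions.

    Returns (unique, shared): two dicts mapping each protein accession to the
    peptides that map only to it, and to the peptides it shares with other proteins.
    """
    unique, shared = defaultdict(set), defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        bucket = unique if len(proteins) == 1 else shared
        for protein in proteins:
            bucket[protein].add(peptide)
    return unique, shared
```

A protein with only unique peptides corresponds to the ideal case on the left of the figure, while proteins with entries in both dictionaries correspond to the case on the right.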
PeptideShaker offers the possibility to export a summary of the methods used for identification
under the Export > Methods Section menu. A text is automatically generated and can serve as a
basis for inclusion in the methods section of manuscripts as illustrated in Supplementary
Figure 31 (see section 6.0 - Validating Proteins, Peptides and PSMs for an example of
application).
Supplementary Figure 29: PeptideShaker automatically generates a draft of the methods used for protein
identification. This text can serve as a starting point for manuscript writing.
The PeptideShaker project can be exported in various forms via the Export > PeptideShaker
Project As menu: (i) Zip File: a zip file containing the saved project and all related files, (ii)
mzIdentML: the identification results in the PSI50 mzIdentML23 format, and (iii) PRIDE XML: the
peak lists and identification results in the standard PRIDE XML format. When exporting the
entire project, the zip file can be shared and opened in the interface upon unzipping. Both
mzIdentML and PRIDE XML files allow direct submission to PRIDE via ProteomeXchange
(http://www.proteomexchange.org) within a few clicks.
When creating an mzIdentML file, a dialog allows the annotation of the project as displayed in
Supplementary Figure 32. Subsequently, a valid, fully annotated and very nearly MIAPE51
compliant mzIdentML file is created.
Supplementary Figure 30: The mzIdentML Export Dialog makes it easy to annotate and export mzIdentML
files that can readily be submitted to PRIDE via ProteomeXchange.
Similarly, when creating a PRIDE XML file, the user is guided through adding the required
metadata annotation by a user-friendly interface as displayed in Supplementary Figure 33.
The generated file is comprehensively annotated by information available in PeptideShaker and
by user input using controlled vocabulary, facilitated by the Ontology Lookup Service52.
Supplementary Figure 31: The PRIDE Export Dialog makes it easy to create well-annotated PRIDE XML files
that can readily be submitted to PRIDE via ProteomeXchange.
9.0 - Command Line Use
PeptideShaker can also be used via the command line and hence run in automated batch mode.
Different command line modes are available: (i) PeptideShakerCLI: processes identification files
and saves the project; (ii) ReportCLI: takes a saved project as input and exports the results as
default reports or as custom reports; (iii) FollowUpCLI: takes a saved project as input and exports
the previously described follow up features; (iv) MzidCLI: takes a saved project as input with the
relevant annotation and exports an mzIdentML file.
Note that all command line options for ReportCLI, FollowUpCLI and MzidCLI can also be used in
the PeptideShakerCLI mode. Detailed information about the parameters can be found on the
PeptideShaker website (http://code.google.com/p/peptide-shaker/wiki/PeptideShakerCLI).
Notably, the PeptideShaker command line version has already been included in Galaxy53 by an
independent third party (https://bitbucket.org/galaxyp/peptideshaker).
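A batch run can be scripted by assembling the command line and handing it to the operating system. The sketch below only builds the argument list. The Java class name and every option name shown here are assumptions and may differ between versions; they should be checked against the PeptideShakerCLI documentation page cited above before use, and further options (for example the search parameters file) may be required.

```python
def peptideshaker_cli_command(jar, experiment, sample, replicate,
                              identification_files, spectrum_files, out_file):
    """Assemble a PeptideShakerCLI call as an argument list.

    All option names below are assumptions of this sketch and must be
    verified against the documentation of the installed version.
    """
    return [
        "java", "-cp", jar, "eu.isas.peptideshaker.cmd.PeptideShakerCLI",
        "-experiment", experiment,
        "-sample", sample,
        "-replicate", str(replicate),
        "-identification_files", ",".join(identification_files),
        "-spectrum_files", ",".join(spectrum_files),
        "-out", out_file,
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which allows looping over many datasets without opening the graphical interface.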
10.0 - Documentation, Help, Support and Updates
Contextual help is available everywhere in the interface in the form of question marks as
displayed in Supplementary Figure 34.
Supplementary Figure 32: Question marks are present everywhere in the interface triggering contextual help.
The help links to external resources, publications and additional information on the
PeptideShaker website. Beginners can also have a look at our general protein identification
tutorials14: http://compomics.com/bioinformatics-for-proteomics/identification. All these
resources are kept up-to-date with the development process of the software.
For other questions there is also an active discussion group: http://groups.google.com/group/
peptide-shaker.
Creating bug reports is easy via the Bug Report Dialog, Help > Bug Report, as displayed in
Supplementary Figure 35. Please use the issue tracker at the PeptideShaker web page to
report issues.
Supplementary Figure 33: When encountering a problem, the user is directed to online help directly from the
interface, or a bug report with details can be sent to the developers for faster bug fixing.
New versions are regularly released including bug fixes and new features. If an internet
connection is available, the user is notified and an auto-update is proposed. Changes are
documented for every version on the PeptideShaker website in the Release Note wiki
(https://code.google.com/p/peptide-shaker/wiki/ReleaseNotes) and announced on the
mailing list.
References
1. Vaudel, M., Sickmann, A. & Martens, L. Current methods for global proteome identification. Expert review of proteomics 9, 519-532 (2012).
2. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature biotechnology 26, 1367-1372 (2008).
3. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical chemistry 74, 5383-5392 (2002).
4. Deutsch, E.W. et al. A guided tour of the Trans-Proteomic Pipeline. Proteomics 10, 1150-1159 (2010).
5. Kohlbacher, O. et al. TOPP--the OpenMS proteomics pipeline. Bioinformatics 23, e191-197 (2007).
6. Bertsch, A., Gropl, C., Reinert, K. & Kohlbacher, O. OpenMS and TOPP: open source software for LC-MS data analysis. Methods in molecular biology 696, 353-367 (2011).
7. Sturm, M. et al. OpenMS - an open-source software framework for mass spectrometry. BMC bioinformatics 9, 163 (2008).
8. Ma, Z.Q. et al. IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. Journal of proteome research 8, 3872-3881 (2009).
9. Vaudel, M., Barsnes, H., Berven, F.S., Sickmann, A. & Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11, 996-999 (2011).
10. Martens, L. et al. PRIDE: the proteomics identifications database. Proteomics 5, 3537-3545 (2005).
11. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol 32, 223-226 (2014).
12. Flicek, P. et al. Ensembl 2011. Nucleic acids research 39, D800-806 (2011).
13. Sussman, J.L. et al. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta crystallographica. Section D, Biological crystallography 54, 1078-1084 (1998).
14. Vaudel, M. et al. Shedding light on black boxes in protein identification. Proteomics 14, 1001-1005 (2014).
15. Vizcaino, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotech 32, 223-226 (2014).
16. Elias, J.E. & Gygi, S.P. Target-decoy search strategy for mass spectrometry-based proteomics. Methods in molecular biology 604, 55-71 (2010).
17. Beausoleil, S.A., Villen, J., Gerber, S.A., Rush, J. & Gygi, S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nature biotechnology 24, 1285-1292 (2006).
18. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466-1467 (2004).
19. Fenyo, D. The Biopolymer Markup Language. Bioinformatics 15, 339-340 (1999).
20. Geer, L.Y. et al. Open mass spectrometry search algorithm. Journal of proteome research 3, 958-964 (2004).
21. Dorfer, V. et al. MS Amanda, a Universal Identification Algorithm Optimized for High Accuracy Tandem Mass Spectra. J Proteome Res (2014).
22. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551-3567 (1999).
23. Jones, A.R. et al. The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & cellular proteomics : MCP 11, M111 014381 (2012).
24. Kim, S. et al. The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search. Molecular & cellular proteomics : MCP 9, 2840-2852 (2010).
25. Helsens, K., Martens, L., Vandekerckhove, J. & Gevaert, K. MascotDatfile: an open-source library to fully parse and analyse MASCOT MS/MS search results. Proteomics 7, 364-366 (2007).
26. Barsnes, H., Huber, S., Sickmann, A., Eidhammer, I. & Martens, L. OMSSA Parser: an open-source library to parse and extract data from OMSSA MS/MS search results. Proteomics 9, 3772-3774 (2009).
27. Muth, T., Vaudel, M., Barsnes, H., Martens, L. & Sickmann, A. XTandem Parser: an open-source library to parse and analyse X!Tandem MS/MS search results. Proteomics 10, 1522-1524 (2010).
28. Griss, J., Reisinger, F., Hermjakob, H. & Vizcaino, J.A. jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats. Proteomics 12, 795-798 (2012).
29. Barsnes, H. et al. compomics-utilities: an open-source Java library for computational proteomics. BMC bioinformatics 12, 70 (2011).
30. Nesvizhskii, A.I. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. Journal of proteomics 73, 2092-2123 (2010).
31. Vaudel, M., Burkhart, J.M., Sickmann, A., Martens, L. & Zahedi, R.P. Peptide identification quality control. Proteomics 11, 2105-2114 (2011).
32. Nesvizhskii, A.I. & Aebersold, R. Interpretation of shotgun proteomic data: the protein inference problem. Molecular & cellular proteomics : MCP 4, 1419-1440 (2005).
33. Vaudel, M., Sickmann, A. & Martens, L. Peptide and protein quantification: a map of the minefield. Proteomics 10, 650-670 (2010).
34. Ishihama, Y. et al. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 4, 1265-1272 (2005).
35. Paoletti, A.C. et al. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proc Natl Acad Sci U S A 103, 18928-18933 (2006).
36. Colaert, N., Gevaert, K. & Martens, L. RIBAR and xRIBAR: Methods for reproducible relative MS/MS-based label-free protein quantification. J Proteome Res 10, 3183-3189 (2011).
37. Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic acids research 39, D691-697 (2011).
38. von Mering, C. et al. STRING: a database of predicted functional associations between proteins. Nucleic acids research 31, 258-261 (2003).
39. Vaudel, M., Sickmann, A. & Martens, L. Introduction to opportunities and pitfalls in functional mass spectrometry based proteomics. Biochimica et biophysica acta 1844, 12-20 (2014).
40. Apweiler, R. et al. UniProt: the Universal Protein knowledgebase. Nucleic acids research 32, D115-119 (2004).
41. Vaudel, M. et al. A complex standard for protein identification, designed by evolution. Journal of proteome research 11, 5065-5071 (2012).
42. Everett, L.J., Bierl, C. & Master, S.R. Unbiased statistical analysis for multi-stage proteomic search strategies. Journal of proteome research 9, 700-707 (2010).
43. Bern, M. & Kil, Y.J. Comment on "Unbiased statistical analysis for multi-stage proteomic search strategies". Journal of proteome research 10, 2123-2127 (2011).
44. Helsens, K., Timmerman, E., Vandekerckhove, J., Gevaert, K. & Martens, L. Peptizer, a tool for assessing false positive peptide identifications and manually validating selected results. Molecular & cellular proteomics : MCP 7, 2364-2372 (2008).
45. Chalkley, R.J. & Clauser, K.R. Modification site localization scoring: strategies and performance. Molecular & cellular proteomics : MCP 11, 3-14 (2012).
46. Savitski, M.M. et al. Confident phosphorylation site localization using the Mascot Delta Score. Molecular & cellular proteomics : MCP 10, M110 003830 (2011).
47. Vaudel, M. et al. D-score: a search engine independent MD-score. Proteomics 13, 1036-1041 (2013).
48. Taus, T. et al. Universal and confident phosphorylation site localization using phosphoRS. Journal of proteome research 10, 5354-5362 (2011).
49. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498-2504 (2003).
50. Orchard, S., Hermjakob, H. & Apweiler, R. The proteomics standards initiative. Proteomics 3, 1374-1376 (2003).
51. Taylor, C.F. et al. The minimum information about a proteomics experiment (MIAPE). Nature biotechnology 25, 887-893 (2007).
52. Barsnes, H., Cote, R.G., Eidhammer, I. & Martens, L. OLS dialog: an open-source front end to the ontology lookup service. BMC bioinformatics 11, 34 (2010).
53. Goecks, J., Nekrutenko, A., Taylor, J. & The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome biology 11, R86 (2010).