Vaudel et al.: PeptideShaker - Supplementary Note 1

PeptideShaker enables reanalysis of mass spectrometry-

derived proteomics datasets

Marc Vaudel1,2, Julia M. Burkhart1, René P. Zahedi1, Eystein Oveland2,3,4, Frode S. Berven2,4,5,

Albert Sickmann1, Lennart Martens6,7,* and Harald Barsnes2,8

1 Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Dortmund, Germany

2 Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway

3 Department of Clinical Medicine, University of Bergen, Bergen, Norway

4 The KG Jebsen Centre for MS-research, Department of Clinical Medicine, University of Bergen,

Bergen, Norway

5 The Norwegian Multiple Sclerosis Competence Centre, Department of Neurology, Haukeland

University Hospital, Bergen, Norway

6 Department of Medical Protein Research, VIB, Ghent, Belgium

7 Department of Biochemistry, Ghent University, Ghent, Belgium

8 Computational Biology Unit, University of Bergen, Norway

* Correspondence:

Prof. Dr. Lennart Martens

A. Baertsoenkaai 3

B-9000 Gent

Phone: +32 9 264 93 58

Fax: +32 9 264 94 84

[email protected]

Nature Biotechnology: doi:10.1038/nbt.3109

Table of Contents

Introduction

1.0 - Installation and Hardware Requirements

2.0 - Creating a New Project

3.0 - PRIDE Reanalysis

4.0 - File Import

5.0 - Results Navigation

6.0 - Validating Proteins, Peptides and PSMs

7.0 - PTM Scoring

8.0 - Reports, Follow Up Analyses and Submissions to PRIDE

9.0 - Command Line Use

10.0 - Documentation, Help, Support and Updates

References

Introduction

The interpretation of protein identification results from identification algorithms can be

conducted in several software environments1, freeware or commercial, some delivered with the

instrument by the vendor. MaxQuant2, the Trans-Proteomic Pipeline3-4, OpenMS5-7, and IDPicker8

are examples of the efforts of the scientific community to provide free solutions for protein

identification. Here, we present PeptideShaker, an interface to assemble and inspect results from

tandem mass spectra identification algorithms.

PeptideShaker allows intuitive interpretation of peptide and protein mass spectrometry based

identification results. In combination with SearchGUI9, a user-friendly graphical user interface to

conduct proteomics searches, it provides a full identification solution for both locally generated

datasets and publicly available data in PRIDE10 via ProteomeXchange11.

For every identified protein, peptide and spectrum, PeptideShaker delivers useful information

like identification confidence, modification site(s) and external information via resources like

Ensembl12 or PDB13. All results can be exported in various reports, either for further follow up

analyses or for submission to PRIDE.

Notably, the use of PeptideShaker does not require extensive knowledge in bioinformatics. PeptideShaker is fully documented and comes with contextual help and extended tutorials. By going through every step from data loading to results display, the present supplementary material details how the peptides and proteins are inferred from search engine results and how these are scored, displayed and connected with rich resources for protein identification.

Unless stated otherwise, the data and illustrations here were obtained on the PeptideShaker

example dataset, a standard measurement of HeLa cell lysate as detailed elsewhere14 and freely

available in the ProteomeXchange15 consortium via the PRIDE10 partner repository under the

accession PXD000674.

If you have any questions about PeptideShaker, please do not hesitate to contact the developers at

the PeptideShaker Google Group: http://groups.google.com/group/peptide-shaker.

1.0 - Installation and Hardware Requirements

PeptideShaker is an open source project developed in Java under the very permissive Apache 2.0

open source license. The complete source code, cross-platform executables and additional

information are available at http://peptide-shaker.googlecode.com. The software does not require

installation beyond unzipping and then double clicking the downloaded file, and works on

Windows, Linux and Mac platforms. (For the first execution on newer Mac platforms, control-

click on the file icon and then select "Open." This will provide the option to run the file

regardless of its (unidentified) source. Help on usage on Mac can be found on the PeptideShaker

website.)

The only prerequisite to run PeptideShaker is that Java is installed. However, due to the large size of modern proteomics datasets and databases, the software performance depends on the amount of available memory: the more memory, the better the performance. When creating a new project, it is recommended to provide at least 4 GB of memory for smaller projects (<100,000 spectra, <100,000 protein sequences), while bigger projects will be more memory demanding. Working with less than 4 GB of memory is supported; import time will however be

substantially extended. Memory settings can be edited from the interface directly under “Edit” >

“Java Options” or via the Welcome Dialog under “Settings and Help”. Note that Java 32-bit does

not support high memory settings. It is therefore strongly recommended to work on 64-bit

machines – the standard for all recent computers. Java 32-bit is often installed by default on 64-

bit machines, and it is then preferable to instead install Java 64-bit.

Help with Java installation and usage can be found here: http://code.google.com/p/compomics-utilities/wiki/JavaTroubleShooting. When not all information can be loaded into memory, the tool will interact with locally stored data. Since read/write operations are in general the main speed-limiting steps, this process is substantially faster on solid-state drives (SSDs), and it is thus advised to operate on SSDs.

Although the creation of a PeptideShaker project is computationally demanding, its opening and

viewing does not require equally powerful hardware capabilities. It is thus possible for mass

spectrometry labs to create the project on a high performance machine, save it and share it with

end users who can then inspect it on standard desktop computers. It is also possible to create

projects automatically on servers and clusters via the command line options of the tool (see 9.0 -

Command Line Use).

PeptideShaker is started by double clicking the jar file in the unzipped download folder. The

PeptideShaker Welcome Dialog is then displayed, see Supplementary Figure 1. From this

dialog it is possible to: (1) Create a New Project; (2) Open a Saved Project; (3) Start a Search

using SearchGUI; (4) Reshake a PRIDE Project, i.e., reprocess a dataset in PRIDE; (5) Open an

Example Project; and (6) See the Getting Started Mini Tutorial.

Supplementary Figure 1: PeptideShaker Welcome Dialog. This dialog is displayed when starting the tool and

can be used to start the processing of a dataset (see text for details).

2.0 - Creating a New Project

After selecting “New Project”, the user sets up the new project as displayed in Supplementary

Figure 2.

Supplementary Figure 2: New Project Dialog where the user defines the new PeptideShaker project.

The setup includes: (i) annotation of the project, which is very useful for later reuse and sharing; (ii)

selection of the input files; and (iii) editing of processing parameters. The processing parameters

include: (a) the settings used for the search; (b) the import filters; and (c) the import

preferences. These parameters are important as they impact the protein and peptide result set and therefore cannot be modified after the project has been created. For SearchGUI results, the search settings are automatically loaded when selecting the search result files.

Import filters allow the removal of Peptide to Spectrum Matches (PSMs). Given that identification quality is known to depend on sequence length16, it is possible to filter out short/long peptides. Filtering on precursor mass deviation, as suggested by Beausoleil et al.17, is also supported. Finally, some modifications are not recognized by all search engines, e.g., PTMs located on protein termini or PTMs targeting motifs of several amino acids. For these, it is possible to use a comprehensive search (targeting all termini or a single amino acid) and refine the results a posteriori by keeping only the modifications of interest. This is achieved by filtering out the PTMs not matching the PeptideShaker PTM definition and is activated by clicking “Exclude Unknown PTMs”.
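As an illustration of the length and precursor mass deviation filters described above, the following minimal Python sketch keeps only the PSMs that pass both criteria. The `Psm` class, the function name and the threshold values (6 to 30 amino acids, 10 ppm) are hypothetical and chosen for the example only; they are not PeptideShaker defaults or part of its code.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    """Hypothetical minimal PSM record used for this illustration only."""
    sequence: str
    precursor_error_ppm: float


def passes_import_filters(psm, min_length=6, max_length=30, max_error_ppm=10.0):
    """Return True if the PSM passes the length and precursor error filters.

    The default thresholds are assumptions made for this example.
    """
    length_ok = min_length <= len(psm.sequence) <= max_length
    error_ok = abs(psm.precursor_error_ppm) <= max_error_ppm
    return length_ok and error_ok


psms = [
    Psm("PEPTIDEK", 2.1),          # kept
    Psm("AK", 0.5),                # too short
    Psm("LONGERPEPTIDER", -25.0),  # precursor mass deviation too large
]
kept = [psm for psm in psms if passes_import_filters(psm)]
print([psm.sequence for psm in kept])
```

Filtering before any statistical processing means that the discarded PSMs do not contribute to the target/decoy distributions, which is consistent with the note that these parameters cannot be changed once the project has been created.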

The processing parameters include initial False Discovery Rate (FDR) validation thresholds –

which can be altered after the project has been created (see 6.0 - Validating Proteins, Peptides

and PSMs) and PTM scoring options (see 7.0 - PTM Scoring).

Clicking “Load Data!” starts the processing of the files and the creation of the PeptideShaker

project.

3.0 - PRIDE Reanalysis

The “PRIDE Reshake” option allows any scientist to easily reprocess datasets in PRIDE without

requiring advanced bioinformatics skills. After clicking the “PRIDE Reshake” button in the

Welcome Dialog, the user can choose to reanalyze either public or private datasets. (Private

datasets require the input of username and password details.) In both cases the user will see the

list of available projects with the associated assays and files as shown in Supplementary Figure

3.

Supplementary Figure 3: PRIDE data selection. At the top the project to reanalyze is selected. The assays and

associated data files are shown in the tables below.

The user can search for specific projects using the advanced Find feature as shown in

Supplementary Figure 4.

When the user has located the project to reanalyze, the "Reshake PRIDE Data" button is clicked.

This will open the Reshake Settings Dialog shown in Supplementary Figure 5, where the user

can customize the properties of the reanalysis.

After clicking the "Start the Reshaking!" button, PeptideShaker starts downloading the data

file(s) and extracts the spectra and search settings. (Missing information can be manually added

later in SearchGUI.)

PeptideShaker then starts SearchGUI where the user can edit the search parameters as displayed

in Supplementary Figure 6.

Supplementary Figure 6: It is possible to edit the inferred parameters to accurately reproduce the original

search or completely change the context of the dataset.

The search can now be started and the import in PeptideShaker is directly triggered.

4.0 - File Import

Search engines assign peptide candidates derived from the protein database to every spectrum,

providing a score illustrating the quality of the match: generally an e-value. PeptideShaker

supports different types of import formats: X!Tandem18 t.xml files implemented in the BioML

format19, OMSSA20 omx files, MS Amanda21 csv files, Mascot22 dat files, and the PSI mzIdentML23

format. The latter, notably, allows importing search results from MS-GF+24, and from virtually

any identification algorithm if the minimal information required for import in PeptideShaker is

present in the file. Details on mzIdentML and Mascot file requirements can be found on the

PeptideShaker website.

First, the search engine results are parsed using open source Java parsers that are all published

and actively maintained25-28. The results are then loaded into an open source search engine

independent structure29, allowing PeptideShaker to manipulate, save and open this large

amount of information. PeptideShaker takes advantage of the target/decoy strategy16 to convert

the scores of the search engines into Posterior Error Probability (PEP) values as commonly done

in proteomics30. Note that peptides mapping to both target and decoy proteins are excluded

from the import.

For every peptide candidate, the product of the search engine PEPs is given as score and the

best, i.e., the lowest, scoring peptide is picked as the best candidate. Given that search engines

are known to encounter difficulties at localizing PTMs, a peptide is here defined by its amino

acid sequence and the number of PTMs without accounting for their location. Amino acids with a

mass difference lower than the fragment ion tolerance are considered as indistinguishable. If

two peptide candidates score equally, they are discriminated by: (i) the occurrence of their

parent protein in the dataset; (ii) the number of search engines supporting the peptide; (iii) the

number of fragment ions annotated in the spectrum; and (iv) the precursor mass error.
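The candidate selection described above can be sketched as follows. This is an illustrative Python sketch, not the PeptideShaker implementation: the `Candidate` class and its field names are hypothetical, and the direction of each tie-breaker (more parent protein occurrences, more supporting search engines, more annotated fragment ions and a smaller precursor mass error being preferred) is an assumption, since the text only lists the criteria.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Candidate:
    """Hypothetical peptide candidate for one spectrum (illustration only)."""
    sequence: str
    engine_peps: List[float]   # one PEP per search engine supporting the candidate
    protein_occurrence: int    # occurrences of the parent protein in the dataset
    annotated_ions: int        # fragment ions annotated in the spectrum
    precursor_error: float     # precursor mass error

    @property
    def score(self) -> float:
        # Score = product of the search engine PEPs; lower is better.
        return prod(self.engine_peps)


def best_candidate(candidates):
    """Pick the lowest-scoring candidate, breaking ties with criteria (i)-(iv)."""
    return min(
        candidates,
        key=lambda c: (
            c.score,
            -c.protein_occurrence,   # (i) assumed: more occurrences preferred
            -len(c.engine_peps),     # (ii) assumed: more search engines preferred
            -c.annotated_ions,       # (iii) assumed: more annotated ions preferred
            abs(c.precursor_error),  # (iv) assumed: smaller error preferred
        ),
    )


a = Candidate("PEPTIDEK", [0.01, 0.02], protein_occurrence=3, annotated_ions=8, precursor_error=1.5)
b = Candidate("PEPTLDEK", [0.05], protein_occurrence=1, annotated_ions=6, precursor_error=0.5)
c = Candidate("PEPTIDER", [0.01, 0.02], protein_occurrence=5, annotated_ions=7, precursor_error=2.0)
print(best_candidate([a, b, c]).sequence)
```

Here `a` and `c` have the same score, so the choice falls back to the first tie-breaker and `c` is selected because its parent protein occurs more often in this made-up example.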

In order to improve the identification rate, the list of PSMs resulting from the search engine

combination is separated into groups according to the run (i.e. spectrum file) and identified

charge. Groups are created for every spectrum file and for charges from the lowest to the highest

only if the group size does not compromise statistical accuracy31: a group size is considered

sufficient if more than 100 target hits are present before the first decoy hit and if the estimated

PEP resolution is lower than 1%. A PEP is then estimated for every PSM based on the target and

decoy hit distributions in the charge specific group.
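The first part of the group size criterion can be sketched in a few lines of Python. This is an assumption-laden illustration: hits are represented as hypothetical `(score, is_decoy)` pairs where a lower score is better, and the PEP resolution is taken as an input because its estimation is not detailed here.

```python
def targets_before_first_decoy(hits):
    """Count target hits ranked before the first decoy hit.

    `hits` is an iterable of (score, is_decoy) pairs; a lower score is
    assumed to be better (as for an e-value or a PEP).
    """
    count = 0
    for _, is_decoy in sorted(hits, key=lambda hit: hit[0]):
        if is_decoy:
            break
        count += 1
    return count


def group_is_large_enough(hits, pep_resolution, min_targets=100, max_resolution=0.01):
    """Apply both criteria: >100 targets before the first decoy, resolution <1%."""
    return (
        targets_before_first_decoy(hits) > min_targets
        and pep_resolution < max_resolution
    )


# Made-up example: 150 target hits scoring better than a single decoy hit.
example_hits = [(i * 0.001, False) for i in range(150)] + [(0.5, True)]
print(targets_before_first_decoy(example_hits))
print(group_is_large_enough(example_hits, pep_resolution=0.005))
```

When a group fails the test, the text implies that it is not processed separately; how such PSMs are then pooled is not specified in this section and is left out of the sketch.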

From these PSMs, a list of peptides is established. When two peptides differ only in the PTM

localization, they are considered as separate peptide identification entities only if the PTMs are

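confidently localized (see 7.0 - PTM Scoring). A peptide score, $S_{\mathrm{peptide}}$, the product of the PSM PEPs, is attached to every peptide:

$$S_{\mathrm{peptide}} = \prod_{i} \mathrm{PEP}_{\mathrm{PSM},i} \qquad (1)$$

where $\mathrm{PEP}_{\mathrm{PSM},i}$ is the estimated PEP of the ith PSM identifying the considered peptide. The peptides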

are then grouped according to their modification status, again, only when the size of the groups

allows it. The size criterion is the same as for the PSM groups: more than 100 target hits before

the first decoy hit and the estimated PEP resolution lower than 1%. A peptide level PEP is

estimated based on the target and decoy hit distributions in the modification specific group.

Using the FASTA file provided by the user, every peptide is mapped to the parent protein

sequences. Here again, indistinguishable amino acids are considered as such based on the

fragment ion accuracy. Moreover, when a protein sequence contains X's the mapping will be

ignored if X's make up more than 25% of the peptide sequence. Subsequently, protein ambiguity

groups are created based on peptide uniqueness as introduced by Nesvizhskii32, such that peptides are unique to a group. PeptideShaker scores the protein groups using the product of the estimated peptide PEPs:

$$S_{\mathrm{protein}} = \prod_{i} \mathrm{PEP}_{\mathrm{peptide},i} \qquad (2)$$

where $\mathrm{PEP}_{\mathrm{peptide},i}$ is the estimated PEP of the ith peptide identifying the considered protein group.
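Both the peptide score and the protein group score are products of PEPs from the level below. The short Python sketch below illustrates this with made-up values; the function name is hypothetical. Note that the PEPs used at each level are the ones re-estimated from the target and decoy distributions, which the sketch simply takes as inputs.

```python
from math import prod


def product_score(peps):
    """Score of a peptide (from PSM PEPs) or of a protein group (from peptide PEPs)."""
    return prod(peps)


# Two PSMs with moderate PEPs yield a better (lower) peptide score than either alone.
peptide_score = product_score([0.1, 0.2])

# Made-up peptide-level PEPs for three peptides mapped to one protein group.
protein_group_score = product_score([0.05, 0.5, 0.2])

print(peptide_score, protein_group_score)
```

Because every PEP lies between 0 and 1, each additional supporting match can only lower the product, so entities supported by several matches are ranked ahead of those supported by a single match of similar quality.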

When an ambiguity group presents a subset (for example, group “Proteins A or B or C” is identified as well as group “Proteins A or B”), the complete group (“Proteins A or B or C” in this example) is considered as unlikely and ignored if the additional proteins in the complete group (Protein C in this example): (i) are only supported by non-enzymatic peptides (when searching with an enzyme); or (ii) are uncharacterized proteins or proteins with lower evidence (UniProt accessions only); or if (iii) the subset (“Proteins A or B” in this example) scores better and is hence more likely to be found. In these cases, the peptides are assigned to the subset.

Finally, the PEP of every protein group is estimated by comparing the target and decoy

distributions; a representative protein is selected for every group based on the peptide

enzymaticity (when searching with an enzyme) and the protein evidence (UniProt accessions

only) and description; and the peptides of complete groups (“Proteins A or B or C” in example)

are linked to all subsets (“Proteins A or B” in example).

During the processing of the data, the progress, including tips and warnings, is shown to the user

as displayed in Supplementary Figure 7.

Supplementary Figure 7: The Waiting Dialog displays progress of the project creation process and also

displays tips and warnings.

5.0 - Results Navigation

When a project has been created, PeptideShaker’s main interface displays the identification

results in a clear and intuitive fashion, allowing the user to easily navigate even large datasets.

The interface is divided into nine interconnected tabs. By default, the results are displayed in the

Overview tab as shown in Supplementary Figure 8.

Supplementary Figure 8: The interface of PeptideShaker consists of nine interconnected tabs corresponding

to different use cases. The Overview tab displays extended information of the identification matches and

allows the user to intuitively navigate the identification results.

At the top of the Overview tab, a table displays detailed information about the protein ambiguity

groups identified. The Protein Inference (PI) informs the user about the protein inference status

of the protein ambiguity group using different colors: (i) green for single proteins; (ii) yellow for

groups of related proteins; (iii) orange for groups of related and unrelated proteins; and (iv) red

for groups of unrelated proteins only. Proteins are considered as unrelated if their associated

genes (UniProt accessions only) or descriptions differ. When clicking the colored rectangle, the user can inspect the protein inference status of the protein group as displayed in Supplementary Figure 9.
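The color coding of the Protein Inference (PI) column can be summarized with a small Python sketch. It is a simplified reading of the four cases listed above, not the PeptideShaker code: relatedness is evaluated pairwise through a caller-supplied `related` function, and the example uses gene names only, both of which are assumptions.

```python
from itertools import combinations


def inference_color(proteins, related):
    """Map a protein ambiguity group to its PI color.

    `proteins` is a list of accessions and `related(a, b)` a hypothetical
    predicate telling whether two proteins are considered related.
    """
    if len(proteins) == 1:
        return "green"    # single protein
    pair_flags = [related(a, b) for a, b in combinations(proteins, 2)]
    if all(pair_flags):
        return "yellow"   # related proteins only
    if not any(pair_flags):
        return "red"      # unrelated proteins only
    return "orange"       # mix of related and unrelated proteins


# Made-up accessions and gene assignments for illustration.
genes = {"P1": "HLA-A", "P2": "HLA-A", "P3": "ACTB"}


def same_gene(a, b):
    return genes[a] == genes[b]


for group in (["P1"], ["P1", "P2"], ["P1", "P2", "P3"], ["P1", "P3"]):
    print(group, inference_color(group, same_gene))
```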

Supplementary Figure 9: The Protein Inference Dialog allows the user to inspect the protein ambiguity groups, here consisting of 53 proteins considered as related by PeptideShaker (Histocompatibility antigens). The first table displays the proteins matched with information about the gene, chromosome, protein evidence and peptide enzymaticity. The dialog also displays possible unique hits which can be related to this group, such as Q29960 here, which was found with a very low score, and other protein groups related to this group. Note that the user can alter both the protein group label and the representative protein.

The next three columns in the Overview protein table provide information about the selected

protein representative: protein accession, description and chromosome number. Chromosome

number is available when the species is selected and is obtained from Ensembl12. The species

can be selected when creating the project or can be set later via the Edit menu. Clicking the

chromosome number provides additional information about the gene associated by UniProt to

the representative protein of the protein ambiguity group: Ensembl Gene ID, Gene Name,

Chromosome, and GO annotation as displayed in Supplementary Figure 10.

Supplementary Figure 10: When clicking the chromosome number in the protein table, gene information

about the selected protein is extracted from Ensembl and displayed to the user.

Next, the protein coverage is displayed. The detected coverage (colored) is compared to the

expected observable coverage (grey). The latter is estimated based on the size distribution of the

identified peptides and the maximal size allowed for enzymatic peptides. The two following

columns represent the number of identified and validated peptides and PSMs for the protein

group (see 6.0 - Validating Proteins, Peptides and PSMs).

Note how PeptideShaker takes full advantage of so-called sparklines (http://en.wikipedia.org/wiki/Sparklines) to make the coverage and the results of the statistical analysis easier to

interpret. Sparklines are used throughout the PeptideShaker tables using our open source

JSparklines library (http://jsparklines.googlecode.com), greatly enhancing the results

inspection.

Subsequently, a spectrum counting index is displayed. Although PeptideShaker is dedicated to

identification, it does provide spectrum counting metrics that allow for a rough estimation of

protein abundances directly from identification results33. PeptideShaker comes with a version of

the emPAI index34 and an improved version of the NSAF index35, chosen as the default option for

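its accuracy36. NSAF is a simple and efficient method where the number of spectra $N_{\mathrm{spectra}}$ for a given protein is normalized by the protein length $L_{\mathrm{protein}}$:

$$\mathrm{NSAF} = \frac{N_{\mathrm{spectra}}}{L_{\mathrm{protein}}} \qquad (3)$$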

It should be noted that a major issue with most spectrum counting indexes is that they do not take into account protein inference issues and cleavage efficiency. PeptideShaker thus implements an improved version of the NSAF index, where the contribution of the ith validated PSM is weighted by a protein inference coefficient $1/n_i$, where $n_i$ is the number of protein ambiguity groups where the matched peptide is included. Note that if a peptide is redundant in the sequence of a representative protein, $n_i$ is increased accordingly. Moreover, the observable length of the protein, $L_{\mathrm{observable}}$, as used for the observable coverage, is used, thus discarding all domains of the sequence which cannot generate detectable peptides:

$$\mathrm{NSAF}_{\mathrm{improved}} = \frac{\sum_{i} 1/n_i}{L_{\mathrm{observable}}} \qquad (4)$$
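The two spectrum counting variants can be compared with a short Python sketch. It assumes that the protein inference coefficient of a PSM is the inverse of the number of protein ambiguity groups containing its peptide, which is one reading of the description above; the function names and the numbers are made up for illustration.

```python
def nsaf(n_spectra, protein_length):
    """Plain index: number of spectra normalized by the protein length."""
    return n_spectra / protein_length


def weighted_nsaf(groups_per_psm, observable_length):
    """Improved index, assuming each validated PSM contributes 1/n.

    `groups_per_psm` holds, for every validated PSM, the number n of protein
    ambiguity groups that include the matched peptide.
    """
    return sum(1.0 / n for n in groups_per_psm) / observable_length


# Made-up protein: 300 residues, of which 250 can yield detectable peptides.
print(nsaf(4, 300))
# Four validated PSMs: two from unique peptides, one from a peptide shared
# by 2 groups and one from a peptide shared by 4 groups.
print(weighted_nsaf([1, 1, 2, 4], 250))
```

Under this assumption, a PSM from a peptide unique to one group counts fully, while PSMs from shared peptides are split between the groups, so shared evidence is not counted several times across the dataset.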

Towards the right, the molecular weight (MW) of the representative protein is shown, and

finally the confidence attached to the protein group and its validation status is displayed.

When a protein group is selected, the peptides mapping to the selected group are displayed. As

for the proteins, the protein inference status of the peptide is color coded and detailed

information is accessible by clicking the colored rectangle. The peptide sequence, followed by

the peptide's location in the representative protein of the protein group is shown next. Note how

the graphical representation of the peptide localization in the protein sequence allows for

intuitive interpretation and that PTMs are color coded in the sequence. The use of white font on

a colored background indicates a confident PTM localization, while a colored font on a white

background indicates a non-confident PTM localization. The PTM color coding can be edited in

the search parameters. Finally, the results of the target/decoy statistical processing are

displayed as in the protein table.

Similarly, when a peptide is selected, the PSMs mapping to the peptide are displayed. The

color coded column (SE) shows the agreement of the search engines for the given spectrum,

and clicking the colored rectangles will show the details for each search engine for the given

spectrum. Subsequently, detailed information about the match is displayed: sequence with color

coded PTMs, charge, precursor mass error, confidence, and validation status.

The spectrum corresponding to the selected PSM is displayed at the bottom right with user

customizable fragment ion annotation. The three spectrum sub plots above the spectrum make it

easier to assess the quality of the match: (i) the intensity of every fragment ion is displayed

relative to the peptide sequence for forward (blue, down) and rewind (red, up) ions; (ii) a

histogram of the intensities of the annotated (green) and the non-annotated (grey) peaks; and

(iii) the fragment ion m/z error is plotted against the peak m/z for forward (blue) and rewind

(red) ions. The latter allows for straightforward detection of calibration issues. The user can

fully customize the spectrum annotation and display, such as the color and appearance of the peaks, benefit from the advanced annotation as shown in Supplementary Figure 11, and export the plots in publication-grade quality.

Supplementary Figure 11: A spectrum as displayed in the Overview tab. The intensity of every fragment ion is

displayed relative to the peptide sequence at the top left for forward (blue, down) and rewind (red, up) ions.

In the top middle, a histogram of the intensities of the annotated (green) and the non-annotated (grey) peaks

is displayed. And on the top right the fragment ion m/z error is plotted against the peak m/z for forward

(blue) and rewind (red) ions. The spectrum is displayed with fully customizable annotation as exemplified

here by the overlay of automated de novo sequencing using the selected PSM.

PeptideShaker also provides visualizations of multiple PSMs. When selecting different PSMs

simultaneously, spectra are displayed in a so-called planetary system view where the x-axis

represents the m/z of the spectrum, the y-axis the m/z error and the size of the data point

represents the intensity of the peak. This display makes it easy to spot outliers or mass

calibration issues as displayed in Supplementary Figure 12.
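The mapping from annotated peaks to points in such a view can be sketched as follows. This is a minimal Python illustration and not PeptideShaker's own code: the `MatchedPeak` type, its field names and the `size_scale` parameter are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MatchedPeak:
    """Hypothetical container for one annotated fragment ion peak."""
    theoretical_mz: float
    observed_mz: float
    intensity: float


def planetary_points(peaks: List[MatchedPeak],
                     size_scale: float = 100.0) -> List[Tuple[float, float, float]]:
    """Return (x, y, size) triples for one PSM.

    x is the observed peak m/z, y the m/z error (observed - theoretical),
    and size is proportional to the peak intensity relative to the most
    intense annotated peak.
    """
    if not peaks:
        return []
    max_intensity = max(p.intensity for p in peaks)
    points = []
    for p in peaks:
        error = p.observed_mz - p.theoretical_mz
        size = size_scale * p.intensity / max_intensity if max_intensity > 0 else 0.0
        points.append((p.observed_mz, error, size))
    return points
```

Plotting the triples of several PSMs in one scatter plot, one color per PSM, makes a systematic offset of one PSM visible as a shifted band of points.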


Supplementary Figure 10: The planetary system view allows the comparison of multiple measurements

(PSMs) of the same peptide within the same plot. For every fragment ion of every PSM matched in the

respective spectrum, the x-axis represents the m/z of the annotated peak, the y-axis the m/z error and the

size of the data point represents the intensity of the peak. This display makes it easy to inspect the spectrum

reproducibility and detect outliers or mass deviation issues. Here the second PSM in blue seems to deviate

from the others, showing a typical ppm error.

The annotated peaks can also be visualized in a table as displayed in Supplementary Figure 13.

Supplementary Figure 11: For a given PSM, the fragment ion matches can be plotted in a table conveniently

showing the m/z values of the detected ions.


Selecting multiple PSMs shows the reproducibility of the spectrum acquisition in terms of peak intensities, normalized using the summed spectrum intensity, as displayed in Supplementary Figure 14.

Supplementary Figure 12: When selecting multiple PSMs, the intensities of the matched peaks can be

displayed with error bars indicating the variability of spectrum acquisitions.
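A minimal Python sketch of this normalization is given below. It assumes that each spectrum is summarized by a mapping from a fragment ion label to the intensity of its matched peak, together with the summed intensity of all peaks in that spectrum; the function name and the use of the standard deviation for the error bars are assumptions, since the text does not specify the variability measure.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def normalized_ion_intensities(spectra: List[Dict[str, float]],
                               totals: List[float]) -> Dict[str, Tuple[float, float]]:
    """Return {ion label: (mean, spread)} of normalized intensities.

    Each matched peak intensity is divided by the summed intensity of its
    spectrum; the spread is the sample standard deviation across spectra
    (0.0 when the ion was matched in a single spectrum only).
    """
    per_ion: Dict[str, List[float]] = {}
    for matched, total in zip(spectra, totals):
        for ion, intensity in matched.items():
            per_ion.setdefault(ion, []).append(intensity / total)
    return {
        ion: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for ion, values in per_ion.items()
    }
```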

At the bottom of the Overview panel, the sequence of the representative protein of the selected protein group is displayed. Colors mark the areas of the sequence covered by the experiment, and grey marks the coverable areas of the sequence according to the chosen enzyme and the size distribution of the identified peptides. Green, yellow and red indicate the areas of the sequence covered by confident, doubtful and not validated peptide matches, respectively (see 6.0 - Validating Proteins, Peptides and PSMs). As displayed in Supplementary Figure 15, the PTMs are also localized and color coded, and clicking on an area of the sequence selects the corresponding peptide.

Supplementary Figure 13: The protein coverage panel displays the representative protein of the selected

protein group, as demonstrated here with protein of accession P49588 of the example dataset of

PeptideShaker (968 amino acids). Here 17.56% of an expected 91.84% coverage is observed as displayed in

color and grey, respectively. 12.6%, 3.2%, and 1.76% of the coverage was achieved using confident, doubtful,

and not validated peptide matches (green, yellow and red areas), respectively. The currently selected peptide

is displayed in blue and the modifications are color coded (here blue for oxidation of methionine). When clicking in the sequence, as done here between amino acids 880 and 899, the identified peptides can be selected. Here the peptide MHSPQTSAMLFTVDNEAGK was found with and without oxidation of methionine 9.

According to the PTM localization scores (see 7.0 - PTM Scoring), the localization of the oxidation is

confident as indicated by the colored background.
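The coverage values quoted in this legend are percentages of the protein length: 17.56% of 968 amino acids corresponds to about 170 covered residues. A minimal Python sketch of such a computation follows; the function name and the representation of peptides as 1-based inclusive (start, end) positions are assumptions made for the example.

```python
from typing import Iterable, Tuple


def coverage_percent(protein_length: int,
                     peptide_spans: Iterable[Tuple[int, int]]) -> float:
    """Percentage of residues covered by at least one peptide.

    Spans are 1-based, inclusive (start, end) positions on the protein;
    residues covered by overlapping peptides are counted once.
    """
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length
```

Calling the function separately on the confident, doubtful and not validated peptides would give the per-category shares, provided each residue is assigned to one category only.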


The other PeptideShaker tabs cover other use cases with the same focus on intuitive navigability of the identification results: (i) the Spectrum IDs tab (Supplementary Figure 16) allows the user to compare the results of the different search engines; (ii) the Fractions tab (Supplementary Figure 17) displays the contributions of different fractions to the final results; (iii) the Modifications tab (Supplementary Figure 18) makes it possible to browse peptides carrying certain variable modifications and inspect the results of the localization scoring algorithms; (iv) the 3D Structures tab (Supplementary Figure 19) maps the identification results onto 3D structures from PDB; (v) the Annotation tab (Supplementary Figure 20) connects the identification results to various external resources like Reactome37 or STRING38; (vi) the GO Analysis tab (Supplementary Figure 21) displays Gene Ontology statistics of the dataset; (vii) the Validation tab (Supplementary Figure 22) allows the inspection of the target/decoy results and adapting the validation threshold; and (viii) the QC Plots tab (Supplementary Figure 23) displays different quality control metrics on the identified proteins, peptides and PSMs.

For further details on the tabs see the figure legends below the figures on the following pages.


Supplementary Figure 14: The Spectrum IDs tab allows the user to compare the results of the different search engines. The table at the top shows the identification results retained by PeptideShaker for a given spectrum file with extended information on the PSM retained for every spectrum. The center-left panel shows the match retained by PeptideShaker followed by a table listing all matches from all identification algorithms including secondary hits. The confidence a secondary hit would have obtained if retained as first hit is displayed, allowing the comparison of hits across search engines. The center-left panel displays the spectrum of the selected PSM; it is thus possible to inspect the spectrum annotation of a secondary hit. At the bottom, several plots display the performance of the identification algorithms with respect to the results displayed in PeptideShaker, allowing straightforward control of the performance of all algorithms. From left to right: (1) the number of PSMs a given algorithm would provide if used alone, (2) the number of PSMs of the merged dataset which can be ascribed to a single algorithm only, (3) the number of unassigned spectra a given algorithm would lead to if used alone, and (4) the identification yield a given algorithm would lead to if used alone. Note that these metrics are biased toward search engine agreement by the strategy used to select the best match of every spectrum.


Supplementary Figure 15: If loading multiple spectrum files, the Fractions tab indicates in which spectrum files the different proteins would be validated if not found in the other files. Various plots are available to visualize the distribution of each protein across the spectrum files, e.g., the number of peptides per spectrum file as shown here. It should be noted that the plots only give an indication of the protein distribution across the spectrum files, mainly for inspecting fractionated data, and should not be used as a comparison of distinct experiments where a more advanced processing is required39.


Supplementary Figure 16: The Modifications tab supports the browsing of the peptides carrying variable

modifications. The modification of interest is selected in the top left and the modified peptides of the selected

category are displayed in the top right table. When a modified peptide is selected, PeptideShaker looks for related peptides, i.e., peptides that carry other modifications or present different cleavage sites.

Here, a phosphorylated peptide is selected, and the same version with no modification and with an oxidation

of methionine is found. The confidence in the PTM localization is color coded and the results of the

localization scores are displayed on the sequence. More information on the localization is available when

clicking in the PTM column or by selecting the desired score tab under the spectrum.


Supplementary Figure 17: The 3D Structures tab maps the identified peptides onto the 3D structure of the

protein from the PDB13 using Jmol, an open-source Java viewer for chemical structures in 3D

(http://www.jmol.org). Identified peptides are shown in green, the selected peptide in blue, and the PTMs are

color coded and annotated on the structure.

Supplementary Figure 18: The Annotation tab allows querying various external resources with results from

PeptideShaker. Using the protein accession number, the external resources can be directly searched as

illustrated here with a Reactome pathway and a STRING interaction network.


Supplementary Figure 19: The GO Analysis tab provides Gene Ontology statistics on the validated proteins,

highlighting the terms that are significantly more or less frequent in the given dataset compared to the

annotation of the species in Ensembl using a hypergeometric test.
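As an illustration of the kind of hypergeometric test mentioned in this legend, the Python sketch below computes the probabilities of observing at least, or at most, a given number of proteins annotated with a GO term. It is not PeptideShaker's implementation: the function names are hypothetical, and any multiple testing correction is left out because the text does not describe one.

```python
from math import comb
from typing import Tuple


def hypergeom_pmf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X = k) when drawing `draws` proteins without replacement from a
    reference of `population` proteins, `successes` of which carry the term."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def enrichment_p_values(k: int, population: int, successes: int,
                        draws: int) -> Tuple[float, float]:
    """Return (P(X >= k), P(X <= k)): over- and under-representation."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    p_over = sum(hypergeom_pmf(i, population, successes, draws)
                 for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, population, successes, draws)
                  for i in range(lowest, k + 1))
    return p_over, p_under
```

Here `population` would be the number of annotated proteins of the species, `successes` the number of those carrying the term, `draws` the number of validated proteins, and `k` the number of validated proteins carrying the term.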


Supplementary Figure 20: The Validation tab allows the user to inspect the target/decoy results for proteins,

peptides and PSMs as selected in the top left table. False positive and negative rates are displayed and can be

optimized in an intuitive cost/benefit way.

Supplementary Figure 21: The QC Plots tab allows the user to control the quality of several parameters of the protein, peptide and PSM identifications. Here, for example, the distribution of the precursor mass error is displayed, which makes it easy to see the instrument resolution.


6.0 - Validating Proteins, Peptides and PSMs

PeptideShaker takes full advantage of the target/decoy strategy to allow the user to extensively

control the quality of the identifications. As already mentioned, the confidence in identification

matches is displayed as the complement of the PEP:

$$\mathrm{Confidence} = 1 - \mathrm{PEP} \tag{5}$$

As an example, 100 matches with a confidence of 90% are expected to contain 10 false positives.

In proteomics, identification results are typically validated at a given False Discovery Rate

(FDR). The FDR indicates the share of false positive matches in the result set:

$$\mathrm{FDR} = \frac{n_{FP}}{N} \tag{6}$$

where $n_{FP}$ represents the count of target false positives and $N$ the number of retained target hits. As an example, a result set of 100 matches with an FDR of 1% is expected to contain only one false positive.

The number of false positives can be equivalently estimated via the number of decoy hits, $n_{decoy}$, or using the expectation value of the PEP. PeptideShaker hence provides two estimations of the FDR:

$$\mathrm{FDR}_{classical} = \frac{n_{decoy}}{N} \tag{7}$$

$$\mathrm{FDR}_{probabilistic} = \frac{\sum_{i=1}^{N} \mathrm{PEP}_i}{N} \tag{8}$$

where the sum runs over the N retained target hits. By default the classical estimator of equation (7) is used; the user can switch to the probabilistic FDR in the Validation tab to estimate the FDR using the posterior error probability (8). Similarly, the number of false negatives, $n_{FN}$, can be estimated by integrating the confidence. Complementarily to the FDR, the False Negative Rate (FNR) of the identification process is hence also estimated:

$$\mathrm{FNR} = \frac{n_{FN}}{n_{TP}} \tag{9}$$

where $n_{TP}$ is the estimated number of true positives in the imported dataset. As an example, when 100 hits are validated at an FNR of 20%, one can expect a total of 125 true positive hits, among which 25 were rejected by the threshold. With an FNR threshold of 1%, one covers 99% of the possible true positive identifications loaded in PeptideShaker.
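The estimators of equations (5) and (7) to (9) can be sketched in Python as follows. This is an illustrative reading of the definitions above and not PeptideShaker's code: the `Hit` representation and all function names are assumptions, and the false negatives are estimated here by summing the confidence (1 - PEP) of the rejected target hits.

```python
from typing import Sequence, Tuple

Hit = Tuple[float, bool]  # (posterior error probability, is_decoy)


def confidence(pep: float) -> float:
    """Confidence in percent, the complement of the PEP."""
    return 100.0 * (1.0 - pep)


def fdr_classical(retained: Sequence[Hit]) -> float:
    """Decoy-based estimate: retained decoy hits over retained target hits."""
    n_target = sum(1 for _, is_decoy in retained if not is_decoy)
    n_decoy = len(retained) - n_target
    return n_decoy / n_target if n_target else 0.0


def fdr_probabilistic(retained: Sequence[Hit]) -> float:
    """PEP-based estimate: expected false positives among retained targets."""
    target_peps = [pep for pep, is_decoy in retained if not is_decoy]
    return sum(target_peps) / len(target_peps) if target_peps else 0.0


def fnr(retained_target_peps: Sequence[float],
        rejected_target_peps: Sequence[float]) -> float:
    """Estimated share of true positives rejected by the threshold."""
    n_fn = sum(1.0 - pep for pep in rejected_target_peps)
    n_tp = n_fn + sum(1.0 - pep for pep in retained_target_peps)
    return n_fn / n_tp if n_tp else 0.0
```

With 100 retained target hits of negligible PEP and 50 rejected target hits of PEP 0.5, `fnr` reproduces the example above: 25 of an estimated 125 true positives are rejected, i.e. an FNR of 20%.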


PeptideShaker allows the user to intuitively edit the validation thresholds in the Validation tab, and whether a match is validated or not is shown throughout the display. The validation process,

a balance between quality and quantity, between FDR and FNR, is crucial for all downstream

analyses.

The accuracy of the false positive rate estimations was benchmarked by searching a Pyrococcus furiosus dataset against a database consisting of the concatenation of Pyrococcus furiosus sequences with the Eukaryota complement of the UniProt/SwissProt database40, downloaded on the 21st of October 2013 (181,026 target sequences), and including the reversed version of every sequence as decoy proteins. In this setup, eukaryote sequences (excluding known contaminants) can be considered as false identifications while Pyrococcus furiosus sequences can be considered as correct matches, hence allowing the verification of target/decoy derived error rates41.
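The comparison underlying this benchmark can be sketched in Python as follows: hits are ranked from best to worst score, and the running number of decoy hits is compared to the running number of eukaryote hits, which act as known false identifications. The label names and the function are hypothetical, and known contaminants are assumed to have been removed beforehand.

```python
from typing import List, Sequence, Tuple


def decoy_vs_entrapment(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Return the running (n_eukaryote, n_decoy) counts along a ranked list.

    `labels` holds one entry per hit, ordered from best to worst score:
    "decoy" for reversed sequences, "entrapment" for eukaryote sequences,
    and any other value for Pyrococcus furiosus (correct) matches.
    """
    n_entrapment = 0
    n_decoy = 0
    curve = []
    for label in labels:
        if label == "decoy":
            n_decoy += 1
        elif label == "entrapment":
            n_entrapment += 1
        curve.append((n_entrapment, n_decoy))
    return curve
```

If the decoy hits model the false positives well, the resulting points stay close to the y=x diagonal.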

Peak lists obtained in41 were searched using OMSSA20 version 2.1.9, X!Tandem18 version

Sledgehammer (2013.09.01.1), MS Amanda21 version 1.0.0.3120 and MS-GF+24 version Beta

(v10024) (5/9/2014). The search was conducted using SearchGUI9 version 1.18.7. The

identification settings were as follows: Trypsin with a maximum of 2 missed cleavages; 10 ppm

as MS1 and 0.5 Da as MS2 tolerances; fixed modifications: Carbamidomethylation of Cys

(+57.021464 Da), variable modifications: Oxidation of Met (+15.994915 Da), Phosphorylation of Ser, Thr and Tyr (+79.966331 Da). All algorithm-specific settings were left at the SearchGUI defaults. The mass spectrometry data along with the identification results have been

deposited to the ProteomeXchange Consortium15 via the PRIDE partner repository10 with the

dataset identifier PXD001077.

As demonstrated in Supplementary Figures 24 to 26, the error rate estimated by

PeptideShaker accurately tracks the actual error rate for PSMs, peptides and proteins. As

discussed in the literature41-43, the marginal underestimation of the error rate can be ascribed to the second refinement procedure of X!Tandem.


Supplementary Figure 22: A Pyrococcus furiosus dataset was searched against a concatenation of Eukaryota and Pyrococcus furiosus sequences. The number of retained decoy proteins is plotted against the number of identified eukaryote proteins at increasing protein score. In this setup, the number of Eukaryota proteins indicates the number of false identifications.

Supplementary Figure 23: Similarly to Supplementary Figure 24, the number of retained decoy peptides is

plotted against the number of identified eukaryote peptides at increasing peptide score. In order to increase

the identification rate, peptides were separated into two groups: modified and unmodified peptides. If a

category of modified peptides is substantially enriched, a standalone group is automatically created by

PeptideShaker (see main text for details).

[Plot: Number of Decoy Proteins (y-axis, 0 to 2,000) against Number of Eukaryota Proteins (x-axis, 0 to 2,000); series: Proteins, y=x]

[Plot: Number of Decoy Peptides (y-axis, 0 to 2,000) against Number of Eukaryota Peptides (x-axis, 0 to 2,000); series: unmodified peptides, modified peptides, peptides, y=x]


Supplementary Figure 24: Similarly to Supplementary Figure 25, the number of retained decoy PSMs is

plotted against the number of identified eukaryote PSMs at increasing PSM score. In order to increase the

identification rate, PSMs were separated into three groups: PSMs identified with a charge of 2+, 3+ and with

charge >3+ (see main text for details).

In addition to the statistical validation, one generally expects proteins to be identified with at least two confident peptides in large scale proteomic shotgun experiments (note that this does not apply to datasets enriched for specific species such as modified peptides). Similarly to Peptizer for PSMs44, PeptideShaker therefore inspects all the validated matches using quality filters, and doubtful matches are flagged by a yellow warning icon throughout the display. Quality filters are applied at the PSM, peptide and protein levels. Since confident peptides require confident PSMs, and in turn confident proteins require confident peptides, the quality of the identification propagates from the PSM to the peptide and protein levels, providing the user with stringently quality controlled results. When the database or the dataset does not allow for reliable statistical estimation (e.g., when searching small databases or identifying a low number of proteins) the validation status is similarly marked as doubtful.
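The propagation of the validation status from PSMs to peptides and proteins can be sketched in Python as follows. This is only an illustration of the principle: the function names, the status strings and the default minimum counts (one confident PSM per peptide, two confident peptides per protein) are assumptions, and the actual quality filters of PeptideShaker are not reproduced here.

```python
from typing import Sequence

CONFIDENT = "Confident"
DOUBTFUL = "Doubtful"
NOT_VALIDATED = "Not Validated"


def psm_status(validated: bool, passes_quality_filters: bool) -> str:
    """A statistically validated PSM is confident only if it passes the filters."""
    if not validated:
        return NOT_VALIDATED
    return CONFIDENT if passes_quality_filters else DOUBTFUL


def _status_from_children(validated: bool, child_statuses: Sequence[str],
                          min_confident: int) -> str:
    if not validated:
        return NOT_VALIDATED
    n_confident = sum(1 for status in child_statuses if status == CONFIDENT)
    return CONFIDENT if n_confident >= min_confident else DOUBTFUL


def peptide_status(validated: bool, psm_statuses: Sequence[str],
                   min_confident_psms: int = 1) -> str:
    """A confident peptide requires confident PSMs (assumed minimum: 1)."""
    return _status_from_children(validated, psm_statuses, min_confident_psms)


def protein_status(validated: bool, peptide_statuses: Sequence[str],
                   min_confident_peptides: int = 2) -> str:
    """A confident protein requires confident peptides (assumed minimum: 2)."""
    return _status_from_children(validated, peptide_statuses, min_confident_peptides)
```

Under these assumptions, a validated protein supported by a single confident peptide is reported as doubtful.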

As displayed in Supplementary Figure 27, clicking on the yellow warning icon opens a dialog

with details on the validation status and allows the user to set the status of a validated match as

confident or doubtful.

[Plot: Number of Decoy PSMs (y-axis, 0 to 3,000) against Number of Eukaryota PSMs (x-axis, 0 to 3,000); series: 2+ PSMs, 3+ PSMs, 4+ and 5+ PSMs, PSMs, y=x]


Supplementary Figure 25: In addition to the statistical validation, PeptideShaker inspects all identification

matches using quality filters. Doubtful matches are marked with a yellow warning icon. Clicking on the icon

opens a dialog allowing the inspection of the validation procedure for this match, with details concerning the database, the target/decoy scoring and results, and the inspection by quality filters. Here, the protein

identification was supported by only one confident peptide and only one confident spectrum and was thus

marked as doubtful.


7.0 - PTM Scoring

In proteomics, there are two main paradigms for PTM localization scoring45: (i) using the

identification algorithm score difference between assumptions carrying PTMs at different sites

as a proxy to infer the quality of the localization; or (ii) using probabilistic scores to estimate the probability that a site is actually modified. The latter scores are inferred from the original spectra and are independent from the identification algorithm. PeptideShaker implements the widely

used MD-score46 and its multiple search engine equivalent, the D-score47. Complementarily, two

probabilistic scores are implemented, the A-score17 and PhosphoRS48. Although originally

designed for phosphorylation only, these scores are estimated for every variable modification

which can take different sites in a peptide. Notably, the peak annotation method used for these

scores is the same as the one used to annotate the spectra in the interface, thus allowing visual

inspection of site determining ions in spectra. During the scoring, neutral losses of the same

mass as the PTM itself are not taken into account for spectrum annotation. In our experience,

disabling all neutral losses (e.g. H2O and NH3) increases the discrimination power of the

probabilistic scores.

As mentioned in the project creation section, the selection of the probabilistic score is done in

the processing parameters dialog. There, it is also possible to disable neutral losses annotation

for the scores and set a score threshold. Finally, one can enable an automated threshold which

will be calibrated to a 99% agreement with the D-score. Whenever a peptide presents more

modification sites than detected modifications and the probabilistic score passes the threshold,

the modification is marked as Confident. It is labeled as Doubtful if it does not pass the threshold

and as Random if different modification sites score equally. If no probabilistic score is calculated,

a D-score of 95% is used as the threshold.
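For a peptide carrying a single modification, the labeling rules above can be sketched in Python as follows. The function name, the representation of the scores as a mapping from site position to score, the use of "greater than or equal" for passing the threshold, and the handling of peptides with a single possible site are assumptions made for the example.

```python
from typing import Dict


def localization_label(site_scores: Dict[int, float], threshold: float) -> str:
    """Label the localization of one modification on one peptide.

    `site_scores` maps each candidate site to its localization score
    (probabilistic score, or D-score when no probabilistic score is
    calculated, in which case the text states a threshold of 95).
    """
    if not site_scores:
        raise ValueError("at least one candidate site is required")
    ranked = sorted(site_scores.values(), reverse=True)
    if len(ranked) == 1:
        # Assumption: with a single possible site there is no ambiguity.
        return "Confident"
    if ranked[0] == ranked[1]:
        # Different sites score equally: the assignment is arbitrary.
        return "Random"
    return "Confident" if ranked[0] >= threshold else "Doubtful"
```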

Note that the A-score was only defined for singly modified peptides. The A-score was also established for spectra searched with a tolerance of ±0.5 m/z, subdividing the spectrum into windows of 100 m/z: the window size equals 100 times the fragment mass resolution. In our implementation, we kept this original ratio between window size and tolerance. As a result, for data searched with a tolerance of 0.02 m/z the spectrum will be subdivided into windows of 4 m/z and not 100 m/z as done in PhosphoRS48. PhosphoRS was implemented according to its original publication48.
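The window sizes quoted above follow from a fixed ratio, as the short Python sketch below shows. It reads the "fragment mass resolution" as the full width of the tolerance window (twice the tolerance), an interpretation inferred from the two numerical examples in the text; the function names are hypothetical.

```python
def ascore_window_width(fragment_tolerance_mz: float, ratio: float = 100.0) -> float:
    """Window width in m/z for a symmetric fragment tolerance of +/- the given value."""
    return ratio * 2.0 * fragment_tolerance_mz


def window_index(mz: float, window_width: float) -> int:
    """Zero-based index of the window containing a given m/z."""
    return int(mz // window_width)
```

A tolerance of 0.5 m/z gives windows of 100 m/z and a tolerance of 0.02 m/z gives windows of 4 m/z, matching the values stated above.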


8.0 - Reports, Follow Up Analyses and Submissions to PRIDE

All the generated data can be exported to text files which can subsequently be imported into

external tools like Excel or Perseus (http://www.perseus-framework.org). Contextual export

options are additionally available from most tables or displays. Identification details can also be

exported under the Export > Identification Features menu, where the user can select information about proteins, peptides, PSMs or search engine specific results to be exported to a text file. A phosphorylation-oriented summary is also available. Finally, fully customizable reports can

export virtually any information from PeptideShaker.

Five reports are available by default: (i) Certificate Analysis: all search and processing

parameters with statistics on the dataset – a crucial feature for service providers and

publications; (ii) Default PSM Report: all identification information at the PSM level; (iii) Default

Peptide Report: all identification information at the peptide level; (iv) Default Protein Report: all

identification information at the protein level; and (v) Default Hierarchical Report: all

identification results presented in a hierarchical manner: protein > peptide > PSM. Note that the

user can create and customize their own reports as displayed in Supplementary Figure 28.

Documentation with description of the report content can also be exported for every report.


Supplementary Figure 26: In the Report Dialog the user can select the identification features to export. These

include Annotation Settings, Input Filters, Proteins, Peptides, PSMs, PTMs, Fragment Ion Information, Project

Details, Search Parameters, Spectrum Counting Settings and Target/Decoy Validation Summary. From each

of these categories, the user can select the wanted elements as displayed here for proteins - ranging from high

level information (accession, chromosome, sequence coverage, PTM mapping, etc.) to detailed mass

spectrometry results, like detected fragment ion m/z error.

PeptideShaker also offers various exports for post processing of the identifications, as displayed

in Supplementary Figure 29. These are available under the Export > Follow Up Analysis menu.


Supplementary Figure 27: The Follow Up Analysis options in PeptideShaker include: (i) export of the spectra

of non-validated matches or recalibrated spectra (at the MS1 and/or MS2 level) of the validated matches; (ii)

export of accessions and sequences of validated or not validated proteins; (iii) export to the popular

Progenesis LC-MS label free quantification software; (iv) export to graph databases like Cytoscape49; (v)

export of inclusion/exclusion lists for various instruments; and (vi) export of libraries in the SWATH format

as specified by the manufacturer.

Notably, the recalibration feature can be used when calibration issues are detected (as

mentioned in the Results Navigation section) as for instance often encountered on Time Of Flight

(TOF) instruments.

Also note that the export to graph databases provides a unique intuitive view on complex

datasets with a graphical approach to the protein inference problem as displayed in

Supplementary Figure 30.
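The graph view of protein inference rests on a bipartite mapping between peptides and proteins. The Python sketch below separates, for each protein, the peptides that map to it uniquely from those shared with other proteins; the function name and the input representation are assumptions, and the sketch is independent of the export format.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def split_unique_and_shared(
    peptide_to_proteins: Dict[str, Set[str]]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return {protein accession: (unique peptides, shared peptides)}."""
    result = defaultdict(lambda: (set(), set()))
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            unique, shared = result[protein]
            if len(proteins) == 1:
                unique.add(peptide)
            else:
                shared.add(peptide)
    return dict(result)
```

A protein whose peptides are all unique corresponds to the ideal case, whereas proteins connected through shared peptides form the more complex subgraphs.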


Supplementary Figure 28: By supporting export to graph database formats like Cytoscape, PeptideShaker

allows for an intuitive visualization of protein inference problems, here represented by two examples from the

example dataset - proteins in red, peptides in blue. On the left, an ideal case where different peptides map

uniquely to a single protein accession. On the right, individual proteins are supported both by unique and

shared peptides.


PeptideShaker offers the possibility to export a summary of the methods used for identification

under the Export > Methods Section menu. A text is automatically generated and can serve as a

basis for inclusion in the methods section of manuscripts as illustrated in Supplementary

Figure 31 (see section 6.0 - Validating Proteins, Peptides and PSMs for an example of

application).

Supplementary Figure 29: PeptideShaker automatically generates a draft of the methods used for protein identification. This text can serve as a starting point for manuscript writing.


The PeptideShaker project can be exported in various forms via the Export > PeptideShaker

Project As menu: (i) Zip File: a zip file containing the saved project and all related files, (ii)

mzIdentML: the identification results in the PSI50 mzIdentML23 format, and (iii) PRIDE XML: the peak lists and identification results in the standard PRIDE XML format. When exporting the entire project, the zip file can be shared and opened in the interface upon unzipping. Both mzIdentML and PRIDE XML files allow direct submission to PRIDE via ProteomeXchange (http://www.proteomexchange.org) within a few clicks.

When creating an mzIdentML file, a dialog allows the annotation of the project as displayed in Supplementary Figure 32. Subsequently, a valid, fully annotated mzIdentML file that is very close to MIAPE51 compliant is created.

Supplementary Figure 30: The mzIdentML Export Dialog makes it easy to annotate and export mzIdentML

files that can readily be submitted to PRIDE via ProteomeXchange.


Similarly, when creating a PRIDE XML file, a user-friendly interface guides the user through adding the required metadata annotation, as displayed in Supplementary Figure 33.

The generated file is comprehensively annotated by information available in PeptideShaker and

by user input using controlled vocabulary, facilitated by the Ontology Lookup Service52.

Supplementary Figure 31: The PRIDE Export Dialog makes it easy to create well-annotated PRIDE XML files

that can readily be submitted to PRIDE via ProteomeXchange.


9.0 - Command Line Use

PeptideShaker can also be used via the command line and hence run in automated batch mode.

Different command line modes are available: (i) PeptideShakerCLI: processes identification files and saves the project; (ii) ReportCLI: takes a saved project as input and exports the results as default reports or as custom reports; (iii) FollowUpCLI: takes a saved project as input and exports the previously described follow up features; (iv) MzidCLI: takes a saved project as input with the relevant annotation and exports an mzIdentML file.

Note that all command line options for ReportCLI, FollowUpCLI and MzidCLI can also be used in the PeptideShakerCLI mode. Detailed information about the parameters can be found on the PeptideShaker website (http://code.google.com/p/peptide-shaker/wiki/PeptideShakerCLI).

Notably, the PeptideShaker command line version has already been included in Galaxy53 by an independent third party (https://bitbucket.org/galaxyp/peptideshaker).
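
As a rough illustration of batch use, the sketch below assembles (but does not run) the Java calls for the command line modes listed in this section. It is written in Python purely for illustration and is not part of PeptideShaker. The jar file name, the project file names, the Java package name eu.isas.peptideshaker.cmd and the option names "in" and "out_reports" are assumptions that should be checked against the PeptideShakerCLI wiki page for the version in use.

```python
import shlex
from typing import Dict, List

# Assumption: the command line classes live in this Java package.
# Verify against the PeptideShakerCLI wiki page for your release.
CLI_PACKAGE = "eu.isas.peptideshaker.cmd"

# The four command line modes named in the text.
CLI_MODES = ("PeptideShakerCLI", "ReportCLI", "FollowUpCLI", "MzidCLI")


def build_command(jar: str, mode: str, options: Dict[str, str]) -> List[str]:
    """Return the argument list for one PeptideShaker command line call.

    Option names are passed through unchanged with a leading dash, so the
    caller decides which options to use; none are hard-coded here.
    """
    if mode not in CLI_MODES:
        raise ValueError(f"unknown command line mode: {mode}")
    argv = ["java", "-cp", jar, f"{CLI_PACKAGE}.{mode}"]
    for name, value in options.items():
        argv += [f"-{name}", str(value)]
    return argv


# Hypothetical jar and project file names, used only for illustration.
jar_file = "PeptideShaker.jar"
saved_projects = ["run1.cps", "run2.cps"]

for project in saved_projects:
    # "in" and "out_reports" are illustrative option names (assumptions).
    command = build_command(
        jar_file, "ReportCLI", {"in": project, "out_reports": "reports"}
    )
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Keeping the command as a list of arguments rather than a single string avoids quoting problems when file paths contain spaces, and the same helper can be reused for each of the four modes.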

10.0 - Documentation, Help, Support and Updates

Contextual help is available throughout the interface via question mark icons, as displayed in Supplementary Figure 32.

Supplementary Figure 32: Question marks are present throughout the interface and trigger contextual help.

The help links to external resources, publications and additional information on the PeptideShaker website. Beginners can also consult our general protein identification tutorials14: http://compomics.com/bioinformatics-for-proteomics/identification. All these resources are kept up to date as the software is developed.

For other questions there is also an active discussion group: http://groups.google.com/group/peptide-shaker.

Bug reports can easily be created via the Bug Report Dialog (Help > Bug Report), as displayed in Supplementary Figure 33. Please use the issue tracker on the PeptideShaker web page to report issues.

Supplementary Figure 33: When encountering a problem, the user is directed to online help directly from the interface, or can send a detailed bug report to the developers for faster bug fixing.

New versions including bug fixes and new features are released regularly. If an internet connection is available, the user is notified of a new version and an automatic update is offered. Changes are documented for every version on the PeptideShaker website in the Release Notes wiki (https://code.google.com/p/peptide-shaker/wiki/ReleaseNotes) and announced on the mailing list.

References

1. Vaudel, M., Sickmann, A. & Martens, L. Current methods for global proteome identification. Expert review of proteomics 9, 519-532 (2012).

2. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature biotechnology 26, 1367-1372 (2008).

3. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical chemistry 74, 5383-5392 (2002).

4. Deutsch, E.W. et al. A guided tour of the Trans-Proteomic Pipeline. Proteomics 10, 1150-1159 (2010).

5. Kohlbacher, O. et al. TOPP--the OpenMS proteomics pipeline. Bioinformatics 23, e191-197 (2007).

6. Bertsch, A., Gropl, C., Reinert, K. & Kohlbacher, O. OpenMS and TOPP: open source software for LC-MS data analysis. Methods in molecular biology 696, 353-367 (2011).

7. Sturm, M. et al. OpenMS - an open-source software framework for mass spectrometry. BMC bioinformatics 9, 163 (2008).

8. Ma, Z.Q. et al. IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. Journal of proteome research 8, 3872-3881 (2009).

9. Vaudel, M., Barsnes, H., Berven, F.S., Sickmann, A. & Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11, 996-999 (2011).

10. Martens, L. et al. PRIDE: the proteomics identifications database. Proteomics 5, 3537-3545 (2005).

11. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nature biotechnology 32, 223-226 (2014).

12. Flicek, P. et al. Ensembl 2011. Nucleic acids research 39, D800-806 (2011).

13. Sussman, J.L. et al. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta crystallographica. Section D, Biological crystallography 54, 1078-1084 (1998).

14. Vaudel, M. et al. Shedding light on black boxes in protein identification. Proteomics 14, 1001-1005 (2014).

15. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nature biotechnology 32, 223-226 (2014).

16. Elias, J.E. & Gygi, S.P. Target-decoy search strategy for mass spectrometry-based proteomics. Methods in molecular biology 604, 55-71 (2010).

17. Beausoleil, S.A., Villen, J., Gerber, S.A., Rush, J. & Gygi, S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nature biotechnology 24, 1285-1292 (2006).

18. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466-1467 (2004).

19. Fenyo, D. The Biopolymer Markup Language. Bioinformatics 15, 339-340 (1999).

20. Geer, L.Y. et al. Open mass spectrometry search algorithm. Journal of proteome research 3, 958-964 (2004).

21. Dorfer, V. et al. MS Amanda, a Universal Identification Algorithm Optimized for High Accuracy Tandem Mass Spectra. Journal of proteome research (2014).

22. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551-3567 (1999).

23. Jones, A.R. et al. The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & cellular proteomics : MCP 11, M111 014381 (2012).

24. Kim, S. et al. The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search. Molecular & cellular proteomics : MCP 9, 2840-2852 (2010).

25. Helsens, K., Martens, L., Vandekerckhove, J. & Gevaert, K. MascotDatfile: an open-source library to fully parse and analyse MASCOT MS/MS search results. Proteomics 7, 364-366 (2007).

26. Barsnes, H., Huber, S., Sickmann, A., Eidhammer, I. & Martens, L. OMSSA Parser: an open-source library to parse and extract data from OMSSA MS/MS search results. Proteomics 9, 3772-3774 (2009).

27. Muth, T., Vaudel, M., Barsnes, H., Martens, L. & Sickmann, A. XTandem Parser: an open-source library to parse and analyse X!Tandem MS/MS search results. Proteomics 10, 1522-1524 (2010).

28. Griss, J., Reisinger, F., Hermjakob, H. & Vizcaino, J.A. jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats. Proteomics 12, 795-798 (2012).

29. Barsnes, H. et al. compomics-utilities: an open-source Java library for computational proteomics. BMC bioinformatics 12, 70 (2011).

30. Nesvizhskii, A.I. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. Journal of proteomics 73, 2092-2123 (2010).

31. Vaudel, M., Burkhart, J.M., Sickmann, A., Martens, L. & Zahedi, R.P. Peptide identification quality control. Proteomics 11, 2105-2114 (2011).

32. Nesvizhskii, A.I. & Aebersold, R. Interpretation of shotgun proteomic data: the protein inference problem. Molecular & cellular proteomics : MCP 4, 1419-1440 (2005).

33. Vaudel, M., Sickmann, A. & Martens, L. Peptide and protein quantification: a map of the minefield. Proteomics 10, 650-670 (2010).

34. Ishihama, Y. et al. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Molecular & cellular proteomics : MCP 4, 1265-1272 (2005).

35. Paoletti, A.C. et al. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proceedings of the National Academy of Sciences of the United States of America 103, 18928-18933 (2006).

36. Colaert, N., Gevaert, K. & Martens, L. RIBAR and xRIBAR: Methods for reproducible relative MS/MS-based label-free protein quantification. Journal of proteome research 10, 3183-3189 (2011).

37. Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic acids research 39, D691-697 (2011).

38. von Mering, C. et al. STRING: a database of predicted functional associations between proteins. Nucleic acids research 31, 258-261 (2003).

39. Vaudel, M., Sickmann, A. & Martens, L. Introduction to opportunities and pitfalls in functional mass spectrometry based proteomics. Biochimica et biophysica acta 1844, 12-20 (2014).

40. Apweiler, R. et al. UniProt: the Universal Protein knowledgebase. Nucleic acids research 32, D115-119 (2004).

41. Vaudel, M. et al. A complex standard for protein identification, designed by evolution. Journal of proteome research 11, 5065-5071 (2012).

42. Everett, L.J., Bierl, C. & Master, S.R. Unbiased statistical analysis for multi-stage proteomic search strategies. Journal of proteome research 9, 700-707 (2010).

43. Bern, M. & Kil, Y.J. Comment on "Unbiased statistical analysis for multi-stage proteomic search strategies". Journal of proteome research 10, 2123-2127 (2011).

44. Helsens, K., Timmerman, E., Vandekerckhove, J., Gevaert, K. & Martens, L. Peptizer, a tool for assessing false positive peptide identifications and manually validating selected results. Molecular & cellular proteomics : MCP 7, 2364-2372 (2008).

45. Chalkley, R.J. & Clauser, K.R. Modification site localization scoring: strategies and performance. Molecular & cellular proteomics : MCP 11, 3-14 (2012).

46. Savitski, M.M. et al. Confident phosphorylation site localization using the Mascot Delta Score. Molecular & cellular proteomics : MCP 10, M110 003830 (2011).

47. Vaudel, M. et al. D-score: a search engine independent MD-score. Proteomics 13, 1036-1041 (2013).

48. Taus, T. et al. Universal and confident phosphorylation site localization using phosphoRS. Journal of proteome research 10, 5354-5362 (2011).

49. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498-2504 (2003).

50. Orchard, S., Hermjakob, H. & Apweiler, R. The proteomics standards initiative. Proteomics 3, 1374-1376 (2003).

51. Taylor, C.F. et al. The minimum information about a proteomics experiment (MIAPE). Nature biotechnology 25, 887-893 (2007).

52. Barsnes, H., Cote, R.G., Eidhammer, I. & Martens, L. OLS dialog: an open-source front end to the ontology lookup service. BMC bioinformatics 11, 34 (2010).

53. Goecks, J., Nekrutenko, A., Taylor, J. & The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome biology 11, R86 (2010).
