Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn...

8
Supplementary Figures Surpassing 10,000 identified and quantified proteins in a single run by optimizing current LC-MS instrumentation and data analysis strategy Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias Treiber, Roland Bruderer and Lukas Reiter‡ ‡contact: [email protected] Electronic Supplementary Material (ESI) for Molecular Omics. This journal is © The Royal Society of Chemistry 2019

Transcript of Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn...

Page 1: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figures

Surpassing 10,000 identified and quantified proteins in a single run by optimizing current LC-MS instrumentation and

data analysis strategy

Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias

Treiber, Roland Bruderer and Lukas Reiter‡

‡contact: [email protected]

Electronic Supplementary Material (ESI) for Molecular Omics.This journal is © The Royal Society of Chemistry 2019

Page 2: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figure 1: Optimization of Chromatography. (A) Comparison of 3solid phases using 2h gradients and a HeLa sample. The DIA method was optimisedto the chromatographic peak width. DIA data were analysed using a project-specificHeLa library. (B) Comparison of gradient lengths using a HeLa sample. The DIAmethod was optimised to the chromatographic peak width. DIA data were analysedusing a project-specific HeLa library. Three run replicates were acquired to calculateaverage identifications per run including standard deviation and identification with aCV < 20%.

B

IDs with a quantitative CV < 20% average IDs

0

50,000

100,000

150,000

200,000

Gradient Length

2h 4h 6h 8h

IDs

Precursors

0

50,000

100,000

150,000

Peptides

Gradient Length

2h 4h 6h 8h

0

2,000

4,000

6,000

8,000

Gradient Length

2h 4h 6h 8h

+7% +2%

+11% +3%-3%

p=3e-8 p=4e-6 p=3e-3+26%

+8%+6%

+21%+7%

-5%

p=2e-6p=3e-3

p=0.09+32%+8%

+6%

+21%+7%

-12%

p=9e-7

p=6e-3p=0.11

Proteins+1%

119,3

91

117,2

26

117,2

37

117,3

91

117,8

27

117,8

59

124,5

50

125,7

05

124,4

28

87,3

78

86,8

00

85,4

09

87,5

45

87,6

42

86,7

89

92,1

12

92,9

47

91,6

17

Precursors Peptides Proteins

0

40,000

80,000

120,000

IDs

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

200nl

250nl

300nl

0

2,000

4,000

6,000

6,4

49

6,4

37

6,4

24

6,4

77

6,4

78

6,4

67

6,5

18

6,5

45

6,5

34

A

BEH CSH ReproSil BEH CSH ReproSil BEH CSH ReproSil

Page 3: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figure 2: Library Generation Times using Search Archives. TheDirectDIA analysis comprised 6 runs, the project library 40 runs and the resourcelibrary 2,206 runs. The hybrid libraries were combination of the DirectDIA analysiswith the respective library. The processing times can vary based on computerconfiguration and other processes running at the same time.

0

50

100

150

200

250

DirectD

IA

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e-L

ibra

ry

Resourc

e-H

ybrid

Lib

rary

Gen

era

tio

n T

ime [

h]

13.5 h0.8 h

15.5 h1.2 h

30 h

1.8 h

235 h

14 h

250 h

17 h

raw file

search archive

Page 4: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figure 3: Overview of Hela Libraries. (A) Library size of theproject-specific and the hybrid HeLa library. DDA and DIA data were searched usingPulsar (SpectroMine). FDR on precursor, peptide and protein level was set to 1%.Libraries were generated in SpectroMine using search archives. (B) Overlap inquantified precursors, peptides and proteins from DirectDIA, project-library andproject-hybrid library-based analysis of a 6h HeLa triplicate run. For protein groupsonly the first entry was considered.

227,3

33

231,7

91

0

1e5

2e5

3e5

4e5

IDs

Lib

rary

Hyb

rid

0

3,000

6,000

9,000

10,5

42

10,6

24

ProteinsPeptidesPrecursorsLib

rary

Hyb

rid

Lib

rary

Hyb

rid

400,8

49

415,0

02

+2%

+1%+4%

A

B Precursors Peptides Proteins

6,823

241

3%

52

<1%

167

2%

1,019

17

138

96,666

1,785

1%

12,673

8%

1,785

2%

55,941

1,014

7,630

138,64714,344

5%

20,124

8%

4,888

3%

97,575

1,495

16,286

Page 5: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figure 4: Application of Hybrid Approach to Testis Tissue. (A)Library overview. DDA and DIA data were searched using Pulsar (SpectroMine).FDR on precursor, peptide and protein level was set to 1%. Libraries were generatedin SpectroMine using search archives. (B) Application of libraries to injection triplicateof one testis tissue sample. DIA data were analysed in Spectronaut and precursorand protein FDR was set to 1%. Two-sample t-test was applied to test for statisticalsignificance of identification differences between the DIA data analysis using thedifferent libraries. Average run identifications and CVs < 20% were calculated basedon the triplicate injection of the sample. (C) Overlap in quantified precursors,peptides and proteins from the analysis using the four different libraries (6h Testistriplicate). For protein groups only the first entry was considered.

436,8

83

449,4

69

410,6

70

437,1

55

255,4

32

255,8

81

235,5

09

244,1

39

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e

Resourc

e-H

ybrid

0

1e5

2e5

3e5

4e5

5e5

IDs

PeptidesPrecursors

13,4

36

13,4

03

13,2

74

13,3

85

0

5,000

10,000

15,000

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e

Resourc

e-H

ybrid

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e

Resourc

e-H

ybrid

+3% +6%

+0% +4%

-0% +1%

ProteinsA

B

0

1e5

2e5

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e

Resourc

e-H

ybrid

IDs

Precursors

0

50,000

100,000

150,000

Peptides

0

2,500

5,000

7,500

10,000

Proteins

+27%

+16%

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e

Resourc

e-H

ybrid

Pro

ject-

Lib

rary

Pro

ject-

Hyb

rid

Resourc

e

Resourc

e-H

ybrid

p=7e-7

+10%

+4%

p=7e-5

+12%

+14%

p=3e-5

+1%

+2%

p=3e-3 +10%

+10%

p=9e-8

+6%

+4%

p=6e-8

IDs with a quantitative CV < 20% average IDs

Page 6: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Precursors Peptides ProteinsC

ProjectHybrid

157,997

256

7,7647,489

1,285 14,71330,0861,204

6,924

9,185

5,354

318

14,670

61,781

14,382

Project

Resource ResourceHybrid

ProjectHybrid

109,419

113

3,2585,291

770 3,09513,399927

1,360

5,435

3,371

318

4,454

39,713

9,221

Project

Resource ResourceHybrid

ProjectHybrid

7,900

17

10134

4 15857220

201

154

36

11

410

692

68

Project

Resource ResourceHybrid

Page 7: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figure 5: Analysis of Testis Cancer Set with 2h and 4h DIAMethod. The hybrid libraries for DIA data analysis were generated by combining thetestis project-specific library with the search results of the respective 2h, 4h or 6h DIAruns. (A) Overview of identifications. Average identifications and standard deviationsare depicted for the NAT (green) and cancer (red) cohorts based on three biologicalreplicates. (B) Overview of statistical analysis based on the Spectronaut pipeline.Proteins were considered significantly differentially abundant with an absolute fold-change > 1 and a Q value <0.01. (C) Overlap of significantly changed proteins. Forprotein groups only the first entry was considered.

239,9

74

254,6

16 3

24,1

65

110,7

35

114,1

80 157,1

81 190,1

62

206,0

04

268,4

00

152,1

33

161,4

65

196,9

15

82,7

31

85,5

09

111,4

85

122,2

60

132,2

69

166,8

31

0

1e5

2e5

3e5

IDs

10,5

64

10,5

43

11,1

97

7,6

13

7,6

08

8,4

88 9,6

31

9,6

84

10,4

04

0

4,000

8,000

12,000

Precursors Peptides Proteins

NA

T

Cancer

Tota

l

2h

NA

T

Cancer

Tota

l

4h

NA

T

Cancer

Tota

l

6h

NA

T

Cancer

Tota

l

2h

NA

T

Cancer

Tota

l

4h

NA

T

Cancer

Tota

l

6h

NA

T

Cancer

Tota

l

2h

NA

T

Cancer

Tota

l

4h

NA

T

Cancer

Tota

l6h

A

B

0

3,000

6,000

9,000

2h 4h 6h

Gradient Length

Can

did

ate

s

proteins upregulated in cancer

proteins without change in abundance

proteins downregulated in cancer

C

4h 6h

2h

1,657

502

11%657

14%

567

12%

772

174230

Page 8: Surpassing 10,000 identified and quantified proteins in a ... · Jan Muntel, Tejas Gandhi, Lynn Verbeke, Oliver M. Bernhardt, Tobias ... IDs with a quantitative CV < 20% average IDs

Supplementary Figure 6: Additional Analysis. (A) IPA (Qiagen) analysis ofmismatch repair pathway. Pathway was exported from IPA and quantitative data ofthe proteins from this pathway are depicted as bar chart. Proteins quantified with a Qvalue < 0.01 are marked with a star. (B) Abundance changes between cancer andNAT cohort of the DNA-directed RNA polymerase proteins. Stars indicate proteinsquantified with a Q value < 0.01.

B

RPA1

RPA2

RPA12

RPA43

RPA34

RPA49

RPB1

RPB2

RPB3

RPB4

RPB7

RPB9

RPB11

GRL1A

GL1AD

RPC1

RPC2

RPC3

RPC4

RPC5

RPC6

RPC7

RPC7L

RPC8

RPC9

RPC10

RPAC1

RPAC2

RPAB1

RPAB2

RPAB3

RPAB4

RPAB5

Log2ratio

-1 -0.5 0 0.5 1 1.5 2

III

III

I and

III

I, II

and

III

DN

A-d

irecte

d R

NA

Po

lym

era

ses

*

**

**

**

**

**

**

**

*

**

*

*

**

**

**

*

A

PCNAMSH2MSH3MSH6MLH1PMS2EXO1FEN1RFC1RFC2RFC3RFC4RFC5

POLD1RPA1

-2 -1 0 1 2

Log2ratio

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

0%

25%

50%

75%

100%

0

5

10

Mismatch Repair

Perc

en

tag

e

-log

10 p

valu

e

14

not identified

downregulated

downregulated (Q<0.01)

upregulated

upregulated (Q<0.01)