J. Proteome Res., 2011, 10 (1), pp 153–160. DOI: 10.1021/pr100677g.

Page 1:

Paulo Costa Carvalho
Laboratory for Proteomics and Protein Engineering
Fiocruz - PR

Analyzing shotgun proteomic data

pcarvalho.com

Page 2:

Outline

• Shotgun proteomics
  • Motivation for studying proteomics
  • What is shotgun proteomics
• Data analysis
  • Protein identification
  • Label-free quantitation
  • PatternLab for proteomics
• Final Considerations

Page 3:

J. Proteome Res., 2011, 10 (1), pp 153–160. DOI: 10.1021/pr100677g

Motivations

Page 4:

Page 5:

Editorial

“There has been an unprecedented improvement in the quality and quantity of commercial proteomics data generation technologies, making data generation more accessible to many researchers. However, more and more discoveries will be led by researchers in command of the skills necessary to mine and extensively interpret the volumes of data. Already the ability to generate data vastly outpaces our ability to interpret it, and the lack of expertise in interpreting data is the current gating factor in the advancement of proteomics sciences. Proteomics scientists with training solely in data generation techniques will be shut out of more and more research opportunities.”

Nuno Bandeira, July 2011

Computational Proteomics

Page 6:

Too many roads not taken

Edwards AM, Nature, Feb 2011

Page 7:

Outline (as on Page 2)

Page 8:

Proteomics has revolutionized biochemical research

Page 9:

Page 10:

LC / MS shotgun proteomic data

[Figure: LC/MS data map; axes: Mass / Charge vs. Time]

Page 11:

[Figure: the doubly charged precursor peptide A-F-Y-L-K, drawn from its NH2 terminus to its COOH terminus above an m/z axis. Fragments containing the NH2 terminus are labeled (B); fragments containing the COOH terminus are labeled (Y).]

Page 12:

[Figure, continued: the m/z axis now also shows the fragment pair A and FYLK.]

Page 13:

[Figure, continued: the fragment pairs AF and YLK, and AFY and LK, are added.]

Page 14:

[Figure, completed: the fragment pair AFYL and K is added, so every backbone cleavage of AFYLK is shown as a (B)/(Y) pair.]
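The fragment pairs drawn on Pages 11 to 14 can be turned into numbers. The following is a minimal sketch, assuming singly charged fragments and standard monoisotopic residue masses; the function name `by_ions` and the constants are illustrative and do not come from the slides.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide AFYLK.
RESIDUE_MASS = {"A": 71.03711, "F": 147.06841, "Y": 163.06333,
                "L": 113.08406, "K": 128.09496}
WATER = 18.01056   # H2O carried by the C-terminal (Y) fragment
PROTON = 1.00728   # charge carrier of a singly charged ion

def by_ions(peptide):
    """Return (b_fragment, b_mz, y_fragment, y_mz) for every backbone cleavage."""
    ions = []
    for i in range(1, len(peptide)):
        b_seq, y_seq = peptide[:i], peptide[i:]
        b_mz = sum(RESIDUE_MASS[r] for r in b_seq) + PROTON
        y_mz = sum(RESIDUE_MASS[r] for r in y_seq) + WATER + PROTON
        ions.append((b_seq, round(b_mz, 4), y_seq, round(y_mz, 4)))
    return ions

for b_seq, b_mz, y_seq, y_mz in by_ions("AFYLK"):
    print(f"b: {b_seq:<5}{b_mz:>10.4f}   y: {y_seq:<5}{y_mz:>10.4f}")
```

Each row corresponds to one cleavage site in the figure: the B fragment grows from the NH2 terminus while the complementary Y fragment shrinks toward the COOH terminus.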

Page 15:

Outline (as on Page 2)

Page 16:

Strategies for protein identification by mass spectrometry

• Peptide sequence match (PSM)
  • Advantage: most sensitive (when the protein is in the DB)
  • Disadvantage: sequence must be in the DB; PTMs must be specified a priori
• De novo sequencing
  • Advantage: does not require a database
  • Disadvantage: most error prone
• Sequence Tag Search
  • Advantages: no need to specify PTMs a priori; tolerant to small changes in the sequence
  • Disadvantage: not as sensitive as PSM when the protein is in the DB

Page 17:

• De novo sequencing
  • Advantage: does not require a database
  • Disadvantage: most error prone

[Figure: MS/MS spectrum, intensity vs. m/z, with the gaps between peaks annotated with amino acid letters]
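The idea behind de novo sequencing, reading residues off the mass differences between neighbouring peaks, can be sketched as follows. This is a toy illustration under strong assumptions: a clean ladder of one ion type, a small residue table, and a fixed tolerance. The function name `read_ladder` is hypothetical, and real spectra are far messier, which is why the slide calls the approach error prone.

```python
# Monoisotopic residue masses (Da) for a few amino acids; L and I are isobaric.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
                "L": 113.08406, "I": 113.08406, "K": 128.09496,
                "E": 129.04259, "F": 147.06841, "Y": 163.06333}

def read_ladder(peaks, tol=0.02):
    """Map each gap between consecutive peaks to the residue(s) of matching mass."""
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls

# Singly charged B-ion ladder of the peptide AFYLK (b1 to b4).
print(read_ladder([72.0444, 219.1128, 382.1761, 495.2602]))
```

The last gap is reported as two candidates because leucine and isoleucine have the same mass, one of several ambiguities that a database search sidesteps.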

Page 18:

• Sequence Tag Search
  • Advantages: no need to specify PTMs a priori; tolerant to small sequence changes
  • Disadvantage: not as sensitive as PSM when the protein is in the DB

Na S et al., MCP, 2008

Page 19:

• Peptide sequence match
  • Advantage: most sensitive (when the protein is in the DB)
  • Disadvantage: sequence must be in the DB; PTMs must be specified a priori

Page 20:

Protein Identification using a database

ProLuCID, X!Tandem, OMSSA, Andromeda, SEQUEST, Mascot

Page 21:

Interpreting MS/MS Proteomics Results

Brian C. Searle
Proteome Software Inc., Portland, Oregon, USA

NPC Progress Meeting (February 2nd, 2006)

Illustrated by Toni Boudreault

Page 22:

B-type, A-type, Y-type Ions

[Figure: MS/MS spectrum, intensity vs. m/z; peak labels R, I, T, P, E, A and H2O]

All these peaks are seen together simultaneously, and we don’t even know…

Page 23:

[Figure: MS/MS spectrum, intensity vs. m/z]

… what type of ion they are, making the mass differences approach even more difficult.

Finally, as with all analytical techniques,

Page 24:

[Figure: MS/MS spectrum, intensity vs. m/z]

there’s noise, producing a final spectrum that looks like…

Page 25:

[Figure: MS/MS spectrum, intensity vs. m/z]

… this, on a good day. And so it’s actually fairly difficult to…

Page 26:

XCalibur :: Show experimental data

Page 27:

Known Ion Types

• B-type ions
• A-type ions
• Y-type ions

We knew a couple of things about peptide fragmentation. Not only do we know to expect B, A, and Y ions, but…

Page 28:

Known Ion Types

• B-type ions: 100%
• A-type ions: 20%
• Y-type ions: 100%
• B- or Y-type +2H ions: 50%
• B- or Y-type -NH3 ions: 20%
• B- or Y-type -H2O ions: 20%

… likelihood of seeing each type of ion, where generally B and Y ions are most prominent.

Page 29:

If we know the amino acid sequence of a peptide, we can guess what the spectra should look like!

So it’s actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.

Page 30:

Model Spectrum

ELVISLIVESK

*Courtesy of Dr. Richard Johnson, http://www.hairyfatguy.com/

So as an example, consider the peptide ELVIS LIVES K that was synthesized by Rich Johnson in Seattle.

Page 31:

Model Spectrum

We can create a hypothetical spectrum based on our rules

Page 32:

• B/Y type ions (100%)
• B/Y +2H type ions (50%)
• A type ions, B/Y -NH3/-H2O (20%)

Where B and Y ions are estimated at 100%, plus 2 ions are estimated at 50%, and other stragglers are at 20%.

Page 33:

Model Spectrum

So if we consider the spectrum that was derived from the ELVIS LIVES K peptide…

Page 34:

Model Spectrum

We can find where the overlap is between the hypothetical and the actual spectra…

Page 35:

Model Spectrum

And say conclusively based on the evidence that the spectrum does belong to the ELVIS LIVES K peptide.
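The rules on Pages 28 and 32 are enough to sketch a hypothetical spectrum in code. The block below is an illustration, not the method of any particular search engine: it reads "+2H" as the doubly charged form of a B or Y ion, uses standard monoisotopic masses, and the names `model_spectrum` and `shared_intensity` are made up for this example.

```python
# Monoisotopic residue masses (Da) for the residues of ELVISLIVESK.
RESIDUE_MASS = {"E": 129.04259, "L": 113.08406, "V": 99.06841,
                "I": 113.08406, "S": 87.03203, "K": 128.09496}
WATER, AMMONIA, CO, PROTON = 18.01056, 17.02655, 27.99491, 1.00728

def model_spectrum(peptide):
    """Return {m/z: relative intensity} using the 100% / 50% / 20% rules."""
    spectrum = {}

    def add(mz, intensity):
        key = round(mz, 2)
        spectrum[key] = max(spectrum.get(key, 0), intensity)

    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[r] for r in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[r] for r in peptide[i:]) + WATER + PROTON
        for ion in (b, y):
            add(ion, 100)                 # B/Y type ions
            add((ion + PROTON) / 2, 50)   # same fragment carrying two protons
            add(ion - AMMONIA, 20)        # loss of NH3
            add(ion - WATER, 20)          # loss of H2O
        add(b - CO, 20)                   # A type ion (B ion minus CO)
    return spectrum

def shared_intensity(model, observed_mz, tol=0.5):
    """Sum the model intensities of peaks that have an observed peak within tol."""
    return sum(intensity for mz, intensity in model.items()
               if any(abs(mz - obs) <= tol for obs in observed_mz))

model = model_spectrum("ELVISLIVESK")
print(len(model), "model peaks; b1 and y1 intensities:", model[130.05], model[147.11])
```

Calling `shared_intensity(model, observed_mz)` with the m/z values of a measured spectrum gives a crude overlap score of the kind the next slides build on.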

Page 36:

Sequencing Explosion

• 1977 Shotgun sequencing invented, bacteriophage φX174 sequenced
• 1989 Yeast Genome project announced
• 1990 Human Genome project announced
• 1992 First chromosome (Yeast) sequenced
• 1995 H. influenzae sequenced
• 1996 Yeast Genome sequenced
• 2000 Human Genome draft

Eng, J. K.; McCormack, A. L.; Yates, J. R. III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing for use in tandem mass spectrometry.

And the idea was …

Page 37:

SEQUEST

… instead of searching all possible peptide sequences, search only those in genome databases.

Now, in the post-genomic world this seems like a pretty trivial idea, but back then there was a lot of assumption placed on the idea that we’d actually have a complete Human genome in a reasonable amount of time.

Page 38:

SEQUEST Model Spectrum

For a scoring function they decided to use Cross-Correlation, which basically sums the peaks that overlap between the hypothetical and the actual spectra, like so.

Page 39:

SEQUEST Model Spectrum

And then they shifted the spectra back and …

Page 40:

SEQUEST Model Spectrum

… forth so that the peaks shouldn’t align.

They used this number, also called the Auto-Correlation, as their background.

Page 41:

SEQUEST XCorr

[Figure: correlation score vs. offset (AMU), with the Cross Correlation (direct comparison) and the Auto Correlation (background) marked. Source: Gentzel M. et al., Proteomics 3 (2003) 1597-1610]

This is another representation of the Cross Correlation and the Auto Correlation.

Page 42:

SEQUEST XCorr

[Figure: correlation score vs. offset (AMU), with the Cross Correlation (direct comparison) and the Auto Correlation (background) marked. Source: Gentzel M. et al., Proteomics 3 (2003) 1597-1610]

\[ \mathrm{XCorr} = \frac{\mathrm{CrossCorr}}{\operatorname{avg}\ \mathrm{AutoCorr}\ (\text{offset} = -75 \text{ to } 75)} \]

The XCorr score is the Cross Correlation divided by the average of the Auto Correlation over a 150 AMU range.

The XCorr is high if the direct comparison is significantly greater than the background, which is obviously good for peptide identification.
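The score described here can be sketched on two binned spectra. This follows the slide's wording (direct comparison divided by the average of the shifted comparisons); other descriptions of the score subtract the background instead. The 1 AMU binning, the exclusion of the zero offset from the background, and the function names are assumptions made for illustration.

```python
def correlation(model, observed, offset):
    """Dot product of the model with the observed spectrum shifted by `offset` bins."""
    total = 0.0
    for i, value in enumerate(model):
        j = i + offset
        if 0 <= j < len(observed):
            total += value * observed[j]
    return total

def xcorr(model, observed, max_offset=75):
    """Direct comparison divided by the mean of the shifted (background) comparisons."""
    direct = correlation(model, observed, 0)
    background = [correlation(model, observed, k)
                  for k in range(-max_offset, max_offset + 1) if k != 0]
    mean_background = sum(background) / len(background)
    return direct / mean_background if mean_background > 0 else float("inf")

def binned(peak_bins, size=200, intensity=100.0):
    """Build a toy spectrum with equal-height peaks at the given 1 AMU bins."""
    spectrum = [0.0] * size
    for b in peak_bins:
        spectrum[b] = intensity
    return spectrum

model = binned([10, 50, 120])
print(xcorr(model, binned([10, 50, 120])))   # observed peaks line up with the model
print(xcorr(model, binned([15, 60, 130])))   # observed peaks do not line up
```

A spectrum whose peaks line up with the model only at zero offset scores high; one whose peaks never line up at zero offset scores zero.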

Page 43:

SEQUEST DeltaCn

\[ \mathrm{DeltaCn} = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1} \]

And this XCorr is actually a pretty robust method for estimating how accurate the match is, and so far, there really haven’t been any significant improvements on it.

The DeltaCn is another score that scientists often use. It measures how good the XCorr is relative to the next best match.

As you can see, this is actually a pretty crude calculation.
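As a small worked example of how crude the calculation is, DeltaCn only needs the two best XCorr values for a spectrum. The function name and the candidate scores below are made up for illustration; the sketch assumes at least two candidates and a positive best score.

```python
def delta_cn(xcorrs):
    """Relative gap between the best and the second-best XCorr for one spectrum."""
    best, second = sorted(xcorrs, reverse=True)[:2]
    return (best - second) / best

# Illustrative XCorr values of three candidate peptides for the same spectrum.
print(round(delta_cn([3.0, 2.4, 1.0]), 3))
```

A value near 0 means the runner-up matched almost as well as the top hit; a value near 1 means the top hit stands alone.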

Page 44:

Raw Xtractor / Pause for search

* Show an MS2 file

Page 45:

ProLuCID

ProLuCID is a fast and sensitive tandem mass spectra-based protein identification program recently developed in the Yates laboratory at The Scripps Research Institute.

Page 46:

ProLuCID runner

Show ProLuCID Runner

Carvalho PC et al; unpublished

Page 47:

Workflow

[Figure: workflow diagram with the labels MS, Database, Search Engine (e.g. ProLuCID, SEQUEST, etc.), PSM, Protein Identification]

Page 48:

The Challenge: How to pinpoint trustworthy identifications

1 spectrum = 1 identification!

Page 49:

Filtering data

Page 50:

In the beginning…

[Figure: list of spectra with their scores, protein and peptide, sorted by match score]

• SEQUEST: XCorr > 2.5, dCn > 0.1
• Mascot: Score > 45
• X!Tandem: Score < 0.01

Spectra were sorted according to some score and then a threshold value was set. Different programs have different scoring schemes, so SEQUEST, Mascot, and X!Tandem use different thresholds. Different thresholds may also be needed for different charge states, sample complexity, and database size.
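The threshold model amounts to a few lines of code. The sketch below applies the SEQUEST cutoffs quoted on the slide; the `PSM` record, the function name, and the three example matches are hypothetical and only serve to show the filter in action.

```python
from dataclasses import dataclass

@dataclass
class PSM:
    scan: int
    peptide: str
    xcorr: float
    delta_cn: float

def passes_sequest_thresholds(psm, min_xcorr=2.5, min_delta_cn=0.1):
    """Keep a match only if both scores clear their fixed cutoffs."""
    return psm.xcorr > min_xcorr and psm.delta_cn > min_delta_cn

# Made-up matches, sorted by score as on the slide.
psms = [PSM(1, "ELVISLIVESK", 3.2, 0.25),
        PSM(2, "AFYLK", 2.7, 0.05),
        PSM(3, "AFYLK", 1.9, 0.30)]

for psm in sorted(psms, key=lambda p: p.xcorr, reverse=True):
    print(psm.scan, psm.peptide, "kept" if passes_sequest_thresholds(psm) else "rejected")
```

Nothing in the filter says how many of the kept matches are wrong, which is the weakness the following slides address.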

Page 51:

There has to be a better way

The threshold model has these problems, which PeptideProphet, DTASelect and others try to solve:

• Poor sensitivity/specificity trade-off, unless you consider multiple scores simultaneously.
• No way to choose an error rate (p=0.05).
• Need to have different thresholds for:
  – different instruments (QTOF, TOF-TOF, IonTrap)
  – ionization sources (electrospray vs MALDI)
  – sample complexities (2D gel spot vs MudPIT)
  – different databases (SwissProt vs NR)
• Impossible to compare results from different search algorithms, multiple instruments, and so on.

Page 52:

Creating a discriminant score

[Figure: list of spectra with their scores, protein and peptide, sorted by match score]

PeptideProphet starts with a discriminant score. If an application uses several scores (SEQUEST uses XCorr, dCn, and Sp scores; Mascot uses ion scores plus identity and homology thresholds), these are first converted to a single discriminant score.
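One simple way to fold several scores into one number is a weighted sum. The sketch below shows only that general idea; the weights, the intercept, and the function name are hypothetical and are not the coefficients PeptideProphet uses.

```python
def discriminant_score(scores, weights, intercept=0.0):
    """Combine several search-engine scores into a single number (linear form)."""
    return intercept + sum(weights[name] * value for name, value in scores.items())

# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = {"xcorr": 1.0, "delta_cn": 5.0}
print(discriminant_score({"xcorr": 3.0, "delta_cn": 0.2}, weights, intercept=-2.0))
```

Once every spectrum has a single score, the matches can be sorted and modelled along one axis, which is what the histogram on the later slide does.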

Page 53:

Scaffold:: Proteome Software

Page 54:

[Figure: error vs. sensitivity plot from Keller et al, Anal Chem 2002; one corner is annotated "correctly identifies everything, with no error"]

This graph shows the trade-offs between the errors (false identifications) and the sensitivity (the percentage of possible peptides identified).

The ideal is zero error and everything identified (sensitivity = 100%).

PeptideProphet corresponds to the curved line. Squares 1–5 are thresholds chosen by other authors.

Page 55:

Mixture of distributions

[Figure: histogram of the number of spectra in each bin (0 to 200) vs. discriminant score (D, from -3.9 to 7.3), with the “correct” and “incorrect” distributions labeled]

This histogram shows the distributions of correct and incorrect matches.

PeptideProphet assumes that these distributions are standard statistical distributions.

Using curve-fitting, PeptideProphet draws the correct and incorrect distributions.
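Once the two distributions are fitted, Bayes' rule turns a discriminant score into a probability of being correct. The sketch below assumes both distributions are Gaussian purely for simplicity (the slide only says "standard statistical distributions"), and the means, standard deviations, and prior are invented for illustration rather than fitted to any data.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def prob_correct(d, prior_correct=0.5, correct=(3.0, 1.5), incorrect=(-1.5, 1.0)):
    """Posterior probability that a match with discriminant score d is correct."""
    weight_correct = prior_correct * normal_pdf(d, *correct)
    weight_incorrect = (1.0 - prior_correct) * normal_pdf(d, *incorrect)
    return weight_correct / (weight_correct + weight_incorrect)

for d in (-2.0, 0.5, 2.5, 6.0):
    print(f"D = {d:5.1f}  ->  P(correct) = {prob_correct(d):.3f}")
```

Scores deep inside the "incorrect" hump get probabilities near 0, scores deep inside the "correct" hump get probabilities near 1, and the overlap region in between is where the model does its real work.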

Page 56:

Labeled decoy – False Discovery Rate

[Figure: decoy strategy for FDR. Labels: Target sequences; Labeled decoys; Search result. The search result is drawn as a histogram with a count axis from 0 to 200 and a score axis from -3.9 to 7.3]

Elias and Gygi, Nature Methods, 2007
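The labeled-decoy strategy can be sketched in a few lines: count how many decoy hits survive a score cutoff and use that count to estimate the error among the target hits. The estimator below (decoys divided by targets) is one common choice among several, and the function names and toy scores are assumptions made for illustration.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy hits / target hits among matches scoring >= threshold.

    psms is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def lowest_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the most permissive cutoff whose estimated FDR stays within max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best

# Toy search result: (score, is_decoy).
psms = [(5.0, False), (4.5, False), (4.0, False), (3.5, True), (3.0, False), (2.0, True)]
print(lowest_threshold_for_fdr(psms, max_fdr=0.01))
print(lowest_threshold_for_fdr(psms, max_fdr=0.25))
```

Unlike a fixed XCorr cutoff, the threshold here is whatever the data allow at the chosen error rate.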

Page 57:

Search Engine Processor

SVM - example

Page 58:

Summary: “The use of iProphet in the TPP increases the number of correctly identified peptides at a constant false discovery rate (FDR) as compared to both PeptideProphet and another state-of-the art tool Percolator.”

Page 59:

Maximizing proteins under a given FDR

Page 60:

Page 61:

Unlabeled Decoys – False Discovery Rate

[Figure: new FDR strategy. Labels: Target sequences; Labeled decoys; Unlabeled decoys (U-Decoy); Search result. The search result is drawn as a histogram with a count axis from 0 to 200 and a score axis from -3.9 to 7.3]

Page 62:

Unlabeled Decoys – False Discovery Rate

        Total Identified Spectra   LD (spectra)   UD (spectra)
WNN     115248                     1152           4656
Bayes   108376                     1083           1064
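The counts in this table are easier to compare as percentages. The snippet below simply divides each decoy count by the total number of identified spectra; reading the ratio this way is an assumption about how the slide's numbers are meant to be interpreted, and the function name is illustrative.

```python
def decoy_rates(total, labeled, unlabeled):
    """Return labeled- and unlabeled-decoy hits as a percentage of identified spectra."""
    return 100.0 * labeled / total, 100.0 * unlabeled / total

# Counts copied from the table: (total identified spectra, LD, UD).
results = {"WNN": (115248, 1152, 4656), "Bayes": (108376, 1083, 1064)}

for name, counts in results.items():
    ld_pct, ud_pct = decoy_rates(*counts)
    print(f"{name}: LD {ld_pct:.2f}%   UD {ud_pct:.2f}%")
```

Both classifiers sit at about 1% on the labeled decoys, but only Bayes stays near 1% on the unlabeled ones, while WNN rises to about 4%.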

Page 63:

           Spectra    Peptides   Proteins (FDR)   UL FDR
SEPro      104,654    17,840     1,283 (0.9%)     1%
Scaffold   88,970     15,406     1,160 (2.3%)     2%

Table I. Scaffold A refers to a 99% confidence level for proteins, 95% for peptides. Scaffold B refers to 95 and 80%, respectively for proteins and peptides.

Pages 64–70:

Generating the SEPro Report (same title on each of these pages)

Page 71:

Outline (as on Page 2)

Page 72:

Relative quantitation

Thermo

Page 73:

Label free quantitation

Picture from Strassberger et al, JOP, 2010

* Search for examples in xcalibur

Scan 12048

How to deal with different charge states?

Subject to random sampling; what are its implications?
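One possible answer to the charge-state question is to compute the expected m/z of the peptide at each charge and sum the signal found at any of them when building an extracted ion chromatogram. The sketch below is only that: an illustration with a hypothetical scan format, tolerance, and function names, not the procedure used by any specific tool.

```python
PROTON = 1.00728  # mass of the charge carrier (Da)

def mz_for_charge(neutral_mass, z):
    """Expected m/z of a peptide of the given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON) / z

def xic(scans, neutral_mass, charges=(1, 2, 3), ppm=20.0):
    """Sum, per scan, the intensity found at the expected m/z of any charge state.

    scans is a list of (retention_time, [(mz, intensity), ...]).
    """
    targets = [mz_for_charge(neutral_mass, z) for z in charges]
    trace = []
    for retention_time, peaks in scans:
        total = sum(intensity for mz, intensity in peaks
                    if any(abs(mz - t) <= t * ppm * 1e-6 for t in targets))
        trace.append((retention_time, total))
    return trace

# Synthetic scans for a peptide with neutral mass 1000.0 Da.
scans = [(10.0, [(501.0073, 200.0), (600.0, 50.0)]),
         (10.5, [(334.3406, 100.0)])]
print(xic(scans, 1000.0))
```

Here the 2+ signal in the first scan and the 3+ signal in the second scan both contribute to the same trace, while the unrelated peak at m/z 600 is ignored.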

Page 74:

Differential Analysis is performed in two steps

Differential Analysis

Marginal Cases (found in only 1 condition)

Differential (found in both)

Page 75:

Venn Diagrams of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 and B2. Panels A, B, and C consider only proteins that appeared in one or more, two or more, or in all three replicates, respectively.

Page 76:

Venn Diagrams of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 (A) and B2 (B). R1, R2, and R3 refer to the replicates from each state.
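The replicate filter behind these Venn diagrams is easy to sketch with sets. The protein identifiers below are hypothetical, as are the function names; the code only illustrates how the "one or more / two or more / all three replicates" requirement changes which proteins look exclusive to one state.

```python
from collections import Counter

def proteins_in_at_least(replicates, k):
    """Return the proteins identified in at least k of the replicate sets."""
    counts = Counter(protein for rep in replicates for protein in set(rep))
    return {protein for protein, n in counts.items() if n >= k}

def venn_regions(b1, b2):
    """Split two protein sets into the three regions of a two-circle Venn diagram."""
    return {"only_B1": b1 - b2, "both": b1 & b2, "only_B2": b2 - b1}

# Hypothetical identifications for replicates R1, R2, R3 of each state.
b1_reps = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P4"}]
b2_reps = [{"P1", "P5"}, {"P1", "P5", "P2"}, {"P1", "P5"}]

for k in (1, 2, 3):
    regions = venn_regions(proteins_in_at_least(b1_reps, k),
                           proteins_in_at_least(b2_reps, k))
    print(k, {name: sorted(members) for name, members in regions.items()})
```

Raising k shrinks every region, so a protein's apparent exclusivity to one state depends heavily on how strict the replicate requirement is.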

Page 77:

What proteins can be considered as statistically different for marginal cases?

Page 78:

               Num. Rep. (t)   Num. Proteins   Fraction ()   p-value
Low ()         1               613             0.637         0.180
               2               283             0.294         0.056
               3               66              0.069         0.019
Medium ()      1               297             0.310         0.141
               2               417             0.435         0.042
               3               245             0.255         0.015
High ()        1               168             0.176         0.112
               2               185             0.193         0.033
               3               604             0.631         0.011
Very High ()   1               59              0.070         0.083
               2               62              0.073         0.024
               3               725             0.857         0.008

Venn Diagram of the proteins identified by shotgun proteomics from a cell lysate in biological states B1 and B2. Proteins that could not be statistically claimed to be differentially expressed in one of the two states according to the proposed Bayesian approach (those for which p-value > 0.05) were automatically filtered out during the generation of the Venn Diagram.

Carvalho PC et al; Bioinformatics 2011

Page 79:

Differential Analysis is performed in two steps (as on Page 74)

Page 80:

Traditional strategy – Data Dependent Analysis (DDA)

New strategy – Extended Data Independent Analysis (XDIA)

Page 81:

Results

• Number of identified spectra increased by 250% (improves label-free quantitation).
• Number of unique peptides increased by 35%.

Page 82:

Page 83:

Page 84:

Multiplexed spectrum identification

Page 85:

Confidence when integrating extracted ion chromatograms

DDA XDIA

Page 86:

Co-eluting peptide ions of similar m/z

[Figure: co-eluting peptides A and B over time under Data Dependent Analysis and Extended Data Independent Analysis; labels: Time, Peptide Mass]

Page 87:

Spectral deconvolution and monoisotopic peak reassignment to aid in identification and XIC quantitation

Page 88:

Outline (as on Page 2)

Page 89:

Show SEProQ here

Page 90:

PatternLab for proteomics: a one stop shop for data analysis

• Pinpoint differentially expressed proteins
• Venn Diagrams
• Gene Ontology Analysis
• Find trends in time-course experiments

Carvalho PC et al., Current Protocols in Bioinformatics, 2010

Page 91:

Page 92:

Computational workflow

• Finding Statistically Differentially Expressed Proteins / Data Analysis: PatternLab for proteomics (Trends, Venn Diagrams, Differential Statistics, Gene Ontology Analysis, etc.)
• Protein Quantitation: Search Engine Processor / SEProQ
• Protein Identification / Quality control: ProLuCID => Search Engine Processor
• Search Engine Preprocessing: YADA, XDIA Processor, CPM
• Experimental: Data acquisition using the mass spectrometer (DDA, XDIA)