Uncertainty in the Measurements


description

Novel Algorithms for the Quantification Confidence in Quantitative Proteomics with Stable Isotope Labeling*. Chongle Pan 1,2; David L. Tabb 1; Dale Pelletier 1; W. Hayes McDonald 1; Greg Hurst 1; Nagiza F. Samatova 1; Robert L. Hettich 1; 1 Oak Ridge National Laboratory, Oak Ridge, TN

Transcript of Uncertainty in the Measurements

Page 1: Uncertainty in the Measurements

Novel Algorithms for the Quantification Confidence in Quantitative Proteomics with Stable Isotope Labeling*

Chongle Pan1,2; David L. Tabb1; Dale Pelletier1; W. Hayes McDonald1; Greg Hurst1; Nagiza F. Samatova1; Robert L. Hettich1

1 Oak Ridge National Laboratory, Oak Ridge, TN
2 Genome Science and Technology, UT-ORNL

* Research support provided by the U.S. Department of Energy, Office of Biological and Environmental Research.

Page 2: Uncertainty in the Measurements

Uncertainty in the Measurements

Mass spectrometric measurement of a protein
  Mr = 23,564 Da ± 10 Da (95% confidence)

Relative quantification of a protein in quantitative proteomics
  Abundance ratio = 1:1
  95% confidence interval = [2:1, 1:2]

The principal aim

RelEx1, ASAPratio2, XPRESS3, MSQuan4

1 Anal Chem, 2003. 75: p. 6912-21
2 Anal Chem, 2003. 75: p. 6648-57
3 Nat Biotechnol, 2001. 19: p. 946-51
4 Nat Biotechnol, 2004. 22: p. 1139-45.
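The interval [2:1, 1:2] quoted on this slide is symmetric about the 1:1 point estimate once the ratios are put on a log2 scale, which may be why the later slides plot log2(ratio). A short worked check, using only the numbers on the slide:

```latex
\log_2\!\left(\tfrac{1}{1}\right) = 0, \qquad
\log_2\!\left(\tfrac{2}{1}\right) = +1, \qquad
\log_2\!\left(\tfrac{1}{2}\right) = -1
\quad\Longrightarrow\quad
\log_2(\text{ratio}) \in [-1,\, +1]
```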

Page 3: Uncertainty in the Measurements

Experimental

1. Metabolic labeling of Rhodopseudomonas palustris with the stable isotope 15N
2. Standard mixtures of natural and 15N-labeled proteomes at the pre-determined mixing ratios
3. Shotgun proteomics analysis
– MS instrument: linear ion trap (LTQ, Finnigan)
– 2D-LC method: 24-hour MudPIT technique5
4. Protein identification
– Database searching: DBDigger6
– Identification filtering: DTASelect7

5 Int. J. of Mass Spec. 2002. 219: p. 245-251.
6 Anal Chem, 2003. 75: p. 6912-21
7 J. Proteome Res. 2002 1: p. 21-26.

Page 4: Uncertainty in the Measurements

Benchmark Data

Sample dynamic range    Mixing ratio (14N:15N)
no difference           1:1 (in duplicate)
5-fold difference       1:5, 5:1
10-fold difference      1:10, 10:1

Peptide I.D. filtering: 95% of true positive rate
Protein I.D. filtering: minimum of 2 peptides

14N:15N    Peptide hits    Protein hits
1:1        16,100          1,962
1:1        15,876          1,898
5:1        16,095          1,872
1:5        16,800          2,082
10:1       16,725          1,917
1:10       17,738          2,163

Data quality
Reproducibility

Page 5: Uncertainty in the Measurements

Block Diagram

mass spectral data (MS1 or mzXML format)
→ SIC reconstruction → selected ion chromatogram
→ peak detection (parallel paired covariance) → chromatographic peak
→ peptide quantification (principal component analysis) → peptide abundance ratio, confidence score
→ protein quantification (maximum likelihood estimation) → protein abundance ratio, confidence interval

Page 6: Uncertainty in the Measurements

Peak Detection

[Figure: light isotopologue SIC and heavy isotopologue SIC (S/N=3; S/N=13), ion intensity vs. scan number; parallel paired covariance chromatogram (PPC) (S/N=42), covariance vs. scan number, with the peak boundaries marked]

Peak boundaries are defined as the local minima in the PPC, which include all MS/MS matching the peptide
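The peak boundary rule on this slide can be sketched in a few lines. The slide names the PPC but does not give its formula, so the sketch assumes that the PPC value at each scan is the product of the two mean-subtracted SIC intensities (the per-scan term of their sample covariance). The function names, the synthetic Gaussian peaks, and the MS/MS scan positions are all hypothetical, and no smoothing is applied.

```python
import math

def ppc_chromatogram(light, heavy):
    """Per-scan product of the mean-subtracted SICs (assumed PPC definition)."""
    n = len(light)
    mean_l, mean_h = sum(light) / n, sum(heavy) / n
    return [(l - mean_l) * (h - mean_h) for l, h in zip(light, heavy)]

def local_minima(values):
    """Indices of local minima; the two end points always count as candidates."""
    inner = [i for i in range(1, len(values) - 1)
             if values[i] <= values[i - 1] and values[i] <= values[i + 1]]
    return [0] + inner + [len(values) - 1]

def peak_boundaries(ppc, msms_scans):
    """Nearest local minima that enclose every MS/MS scan matched to the peptide."""
    minima = local_minima(ppc)
    left = max(i for i in minima if i <= min(msms_scans))
    right = min(i for i in minima if i >= max(msms_scans))
    return left, right

# Hypothetical noise-free co-eluting peaks centred on scan 30.
scans = range(60)
heavy = [1000.0 * math.exp(-0.5 * ((s - 30) / 4.0) ** 2) for s in scans]
light = [300.0 * math.exp(-0.5 * ((s - 30) / 4.0) ** 2) for s in scans]

ppc = ppc_chromatogram(light, heavy)
left, right = peak_boundaries(ppc, msms_scans=[28, 31])
```

With real, noisy chromatograms the PPC would likely need smoothing before the minima search, otherwise noise creates spurious boundaries.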

Page 7: Uncertainty in the Measurements

Peptide Quantification

Peptide abundance ratios can be estimated by
– Peak height ratio
– Peak area ratio (ASAPratio, MSQuan, XPRESS)

[Figure: ion intensity vs. scan number, illustrating the peak height and the peak area]

Page 8: Uncertainty in the Measurements

Peptide Quantification

– Linear regression (RelEx)
– Principal component analysis (PCA)
  ratio = tan(θ)
  signal-to-noise ratio = PCA-SNR

[Figure: ion intensity vs. scan number; scatter plot of light isotopologue ion intensity vs. heavy isotopologue ion intensity with the principal components PC1 and PC2 and the angle θ]
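The three estimators named on the last two slides can be compared on synthetic data. This is a minimal sketch, not the authors' implementation: it assumes the PCA is run on mean-centred (heavy, light) intensity pairs, that θ is the angle of PC1 from the heavy axis, and that PCA-SNR is the ratio of the first to the second eigenvalue (the slide only names the quantity). The function names and the simulated 5:1 peak pair are hypothetical.

```python
import math
import random

def peak_height_ratio(light, heavy):
    """Ratio of the apex intensities of the two chromatographic peaks."""
    return max(light) / max(heavy)

def peak_area_ratio(light, heavy):
    """Ratio of summed intensities (unit scan spacing assumed)."""
    return sum(light) / sum(heavy)

def pca_ratio(light, heavy):
    """Return (ratio, snr) from a 2x2 PCA of (heavy, light) intensity pairs."""
    n = len(light)
    mean_l, mean_h = sum(light) / n, sum(heavy) / n
    var_l = sum((l - mean_l) ** 2 for l in light) / (n - 1)
    var_h = sum((h - mean_h) ** 2 for h in heavy) / (n - 1)
    cov = sum((l - mean_l) * (h - mean_h)
              for l, h in zip(light, heavy)) / (n - 1)
    # Angle of the first principal component, measured from the heavy axis.
    theta = 0.5 * math.atan2(2.0 * cov, var_h - var_l)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    centre = 0.5 * (var_h + var_l)
    radius = math.hypot(0.5 * (var_h - var_l), cov)
    lambda1, lambda2 = centre + radius, centre - radius
    snr = lambda1 / lambda2 if lambda2 > 0 else math.inf  # assumed definition
    return math.tan(theta), snr

# Hypothetical 5:1 (light:heavy) peak pair with additive Gaussian noise.
rng = random.Random(0)
scans = range(60)
heavy = [1000.0 * math.exp(-0.5 * ((s - 30) / 5.0) ** 2) + rng.gauss(0, 20)
         for s in scans]
light = [5000.0 * math.exp(-0.5 * ((s - 30) / 5.0) ** 2) + rng.gauss(0, 20)
         for s in scans]

ratio, snr = pca_ratio(light, heavy)
```

Unlike the height and area ratios, the PCA estimate comes with a second eigenvalue that measures the scatter around PC1, which is what makes a per-peptide signal-to-noise score possible.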

Page 9: Uncertainty in the Measurements

Quantification Accuracy

[Figure: histograms of peptide counts vs. log2(ratio) for the 1:5 mixture, with the expected log2(ratio) marked, comparing peak height ratio, peak area ratio, and PCA/linear regression]

Page 10: Uncertainty in the Measurements

Quantification Accuracy

[Figure: histograms of peptide counts vs. log2(ratio) for the 1:1, 5:1, 1:5, 10:1, and 1:10 mixtures]

Page 11: Uncertainty in the Measurements

Quantification Confidence

[Figure: 5:1 mixture; 2D histogram of peptide log2(ratio) & log2(PCA-SNR), showing peptide counts]

Page 12: Uncertainty in the Measurements

Quantification Confidence

[Figure: 5:1 mixture; log2(PCA-SNR) vs. log2(ratio)]

Bin the peptides by their log2(PCA-SNR) value

Bias: the deviation of the average estimated log2(ratio) from the expected log2(ratio)

Bias increases as PCA-SNR decreases below a threshold

Page 13: Uncertainty in the Measurements

Quantification Confidence

[Figure: 5:1 mixture; log2(PCA-SNR) vs. log2(ratio)]

Bin the peptides by their log2(PCA-SNR) value

Variance: the variability of the estimated log2(ratio)

Variance increases as PCA-SNR decreases
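The binning step behind the bias and variance definitions can be sketched as follows. The bin width of one log2 unit, the function name, and the four example peptides are assumptions for illustration; they are not values from the benchmark data.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance

def bias_and_variance_by_snr(log2_ratios, log2_snrs, expected_log2_ratio,
                             bin_width=1.0):
    """Group peptides by log2(PCA-SNR) and summarise each bin.

    Returns {bin lower edge: (bias, variance, peptide count)}, where bias is
    the mean estimated log2(ratio) minus the expected log2(ratio).
    """
    bins = defaultdict(list)
    for ratio, snr in zip(log2_ratios, log2_snrs):
        bins[math.floor(snr / bin_width)].append(ratio)
    summary = {}
    for key in sorted(bins):
        values = bins[key]
        bias = fmean(values) - expected_log2_ratio
        summary[key * bin_width] = (bias, pvariance(values), len(values))
    return summary

# Hypothetical peptides from a 5:1 mixture (expected log2 ratio = log2(5)).
log2_ratios = [2.0, 2.4, 1.0, 3.0]
log2_snrs = [3.2, 3.8, 1.1, 1.9]
summary = bias_and_variance_by_snr(log2_ratios, log2_snrs, math.log2(5))
```

In this toy input the low-SNR bin has both the larger bias and the larger variance, which is the pattern the two slides describe.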

Page 14: Uncertainty in the Measurements

Quantification Confidence

Comet-like two-dimensional distribution

As log2(SNR) decreases,
– the spread of log2(ratio) estimates increases
– the average of log2(ratio) estimates regresses to zero

[Figure: log2(PCA-SNR) vs. log2(ratio) for the 1:1, 5:1, 1:5, 10:1, and 1:10 mixtures]

Page 15: Uncertainty in the Measurements

Quantification Confidence

[Figure: | mean { log2(ratio) } | vs. log2(PCA-SNR) and standard deviation { log2(ratio) } vs. log2(PCA-SNR), each for the 1:1, 5:1 & 1:5, and 10:1 & 1:10 mixtures]

The quantification bias and variance for peptides are linear functions of PCA-SNR
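A linear relationship like the one stated here could be captured with an ordinary least-squares line through the per-bin summaries. The sketch below fits against log2(PCA-SNR), the axis used in the plots; that choice, the function name, and the example numbers are assumptions rather than values read from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical per-bin standard deviations of log2(ratio).
log2_snr_bins = [1.0, 2.0, 3.0, 4.0]
sd_per_bin = [1.25, 1.0, 0.75, 0.5]
slope, intercept = fit_line(log2_snr_bins, sd_per_bin)
```

The fitted slope and intercept then give an expected spread for a peptide at any PCA-SNR level, which is the kind of input the protein-level estimate on the next slide relies on.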

Page 16: Uncertainty in the Measurements

Protein Quantification

The maximum likelihood point estimate of a protein's abundance ratio is the ratio that best explains its measured peptides' estimated log2(ratio) at the calculated log2(PCA-SNR)

[Figure: log2(PCA-SNR) vs. log2(ratio), showing the mean and 2 sd curves, the measured peptides, and a series of theoretical probability distributions of peptide abundance ratio estimates at each PCA-SNR level]
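The idea can be sketched as a grid search over candidate protein ratios. Everything specific here is an assumption: the slides do not give the distribution model, so the sketch uses a normal distribution at each PCA-SNR level whose mean shrinks toward zero and whose standard deviation grows as PCA-SNR falls, with made-up coefficients. The function names and example peptides are hypothetical.

```python
import math

def peptide_mean(protein_log2_ratio, log2_snr):
    """Hypothetical bias model: estimates regress toward 0 at low PCA-SNR."""
    shrink = min(1.0, max(0.1, log2_snr / 4.0))
    return shrink * protein_log2_ratio

def peptide_sd(log2_snr):
    """Hypothetical spread model: sd falls linearly with log2(PCA-SNR)."""
    return max(0.2, 1.5 - 0.25 * log2_snr)

def log_likelihood(protein_log2_ratio, peptides):
    """Sum of normal log-densities over (log2_ratio, log2_snr) peptide pairs."""
    total = 0.0
    for log2_ratio, log2_snr in peptides:
        sd = peptide_sd(log2_snr)
        z = (log2_ratio - peptide_mean(protein_log2_ratio, log2_snr)) / sd
        total += -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)
    return total

def mle_protein_log2_ratio(peptides, lo=-7.0, hi=7.0, step=0.01):
    """Candidate log2(ratio) on a grid that maximises the log-likelihood."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda r: log_likelihood(r, peptides))

# Hypothetical peptides: two high-SNR measurements and one noisy one.
peptides = [(2.3, 5.0), (2.4, 6.0), (1.2, 2.0)]
estimate = mle_protein_log2_ratio(peptides)
plain_mean = sum(r for r, _ in peptides) / len(peptides)
```

Under this model the noisy peptide contributes little to the estimate, so the result stays close to the two high-SNR peptides instead of being pulled down the way a plain average is.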

Page 17: Uncertainty in the Measurements

Quantification Accuracy

[Figure: histograms of protein counts vs. log2(ratio) for the 1:5 mixture]

RelEx filtering:
  > 0.7 correlation at 1
  > 0.4 correlation at 10
  > 3 signal-to-noise
  > 2 peptides

PRATIO filtering:
  > 2 PCA-SNR
  > 2 peptides
  < 4 95% confidence interval width for log2(ratio)

1:5       protein    MSE
RelEx     1090       2.0
PRATIO    1091       0.9

MSE: Mean Square Error
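The MSE column can be computed as below, assuming the error is taken on the log2 scale against the known mixing ratio (the slide plots log2(ratio) but does not state the scale of the MSE). The function name and the two example estimates are hypothetical.

```python
import math

def mean_square_error(estimated_log2_ratios, expected_ratio):
    """Mean squared deviation of log2(ratio) estimates from log2(expected)."""
    expected = math.log2(expected_ratio)
    errors = [(est - expected) ** 2 for est in estimated_log2_ratios]
    return sum(errors) / len(errors)

# Hypothetical protein estimates from a 1:5 mixture (expected ratio 0.2).
mse = mean_square_error([-2.0, -3.0], expected_ratio=1 / 5)
```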

Page 18: Uncertainty in the Measurements

Quantification Accuracy

[Figure: histograms of protein counts vs. log2(ratio) for the 1:1 (in duplicate), 5:1, 1:5, 10:1, and 1:10 mixtures, each panel with a two-row protein/MSE table]

protein    MSE
1115       0.4
1262       0.6

1210       0.5
1242       0.5

1090       2.0
1091       0.9

1046       2.2
980        1.8

1092       4.6
1061       1.6

1070       4.0
1072       1.9

Page 19: Uncertainty in the Measurements

Confidence Interval Estimation

[Figure: 1:5 mixture; display of the point estimates (+) and the 95% confidence interval estimates ( ----------- ) for protein abundance ratios; protein vs. log2(ratio)]

Page 20: Uncertainty in the Measurements

Confidence Interval Estimation

Point estimates and confidence interval estimates of protein abundance ratios

[Figure: panels for the 1:1 (in duplicate), 1:5, 5:1, 10:1, and 1:10 mixtures; x-axis log2(ratio)]
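The slides do not say how the 95% intervals are derived from the likelihood. One common construction, shown here as an assumption rather than as the authors' method, is the likelihood-ratio interval: keep every candidate log2(ratio) whose log-likelihood lies within half the 95th percentile of a chi-square distribution with one degree of freedom (3.841 / 2) of the maximum. The function name and the quadratic example profile are hypothetical.

```python
CHI2_95_1DF = 3.841  # 95th percentile of chi-square with 1 degree of freedom

def likelihood_interval(grid, log_likelihoods, drop=CHI2_95_1DF / 2.0):
    """Range of grid values whose log-likelihood is within `drop` of the maximum."""
    best = max(log_likelihoods)
    inside = [g for g, ll in zip(grid, log_likelihoods) if ll >= best - drop]
    return min(inside), max(inside)

# Hypothetical profile: a normal log-likelihood centred on 1.0 with sd 0.5.
grid = [-5.0 + 0.005 * i for i in range(2001)]
profile = [-0.5 * ((g - 1.0) / 0.5) ** 2 for g in grid]
low, high = likelihood_interval(grid, profile)
width = high - low
```

For a normal-shaped profile this reproduces the familiar point estimate ± 1.96 sd, and the resulting width is the kind of quantity a filter on confidence interval width for log2(ratio) would act on.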

Page 21: Uncertainty in the Measurements

Conclusions

Three novel algorithms
– Parallel paired covariance for peak detection
– Principal component analysis for peptide quantification
– Maximum likelihood estimation for protein quantification

Improved Protein Quantification Accuracy

Rigorous Confidence Interval Estimation

The fully automated program with graphic user interface is freely available for testing by contacting C. Pan (email: [email protected])