![Page 1: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/1.jpg)
Analysis of tandem mass spectra - II
Prof. William Stafford Noble
GENOME 541
Intro to Computational Molecular Biology
![Page 2: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/2.jpg)
Re-ranking identified spectra
(Keller Analytical Chemistry 2002)
(Anderson J Proteome Research 2003)
(Käll Nature Methods 2007)
![Page 3: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/3.jpg)
[Figure: repeated copies of the peptide label EAMPK, the last one marked "EAMPK?"]
This is the problem we set out to solve
![Page 4: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/4.jpg)
Modified problem: Is this peptide assignment correct?
[Figure: spectrum (intensity vs. m/z) matched to VVVTGLGMLSPVGNTVESTWK +2, with labeled peaks 1304.4 +1 and 888.14 +1]
![Page 5: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/5.jpg)
Peptide-spectrum match features
• Total peptide mass
• Charge (+1, +2 or +3)
• Total ion current
• Peak count
• Preliminary SEQUEST score (Sp)
• Sp rank
• Cross-correlation score (XCorr)
• Change in XCorr (delta Cn)
• Mass difference
• Percent of theoretical peaks matched
• Percent of observed peaks matched
• Percent of peptide fragment ion current matched
• Percent sequence identity between top and second-ranked peptides
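Two of the features listed above, the percent of theoretical peaks matched and the percent of observed peaks matched, can be sketched as follows. This is a minimal illustration only: the function name `fraction_matched`, the 0.5 m/z tolerance, and the peak lists are assumptions made for this example and are not taken from the slides or from SEQUEST.

```python
from bisect import bisect_left

def fraction_matched(query_mz, reference_mz, tol=0.5):
    """Fraction of peaks in query_mz that lie within tol of some peak in reference_mz."""
    ref = sorted(reference_mz)
    if not query_mz:
        return 0.0
    hits = 0
    for mz in query_mz:
        # First reference peak that is >= mz - tol; it matches if it is also <= mz + tol.
        i = bisect_left(ref, mz - tol)
        if i < len(ref) and ref[i] <= mz + tol:
            hits += 1
    return hits / len(query_mz)

# Made-up m/z values, for illustration only.
theoretical = [175.1, 272.2, 403.2, 474.3]
observed = [175.3, 300.0, 403.0, 474.1, 600.5]

pct_theoretical_matched = 100 * fraction_matched(theoretical, observed)
pct_observed_matched = 100 * fraction_matched(observed, theoretical)
```

With these toy lists, three of the four theoretical peaks and three of the five observed peaks find a partner within the tolerance.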
![Page 6: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/6.jpg)
![Page 7: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/7.jpg)
• Uses linear discriminant analysis rather than SVM.
• Uses a four-dimensional feature space (XCorr, delta Cn, ln SpRank, delta mass).
• Uses EM to fit distributions to the discriminants of the two classes, yielding a probability.
• Learns a simple, independent probability model of the number of tryptic termini.
• Publicly available software, PeptideProphet, is widely used.
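The EM step in the list above can be sketched on one-dimensional discriminant scores. The sketch below fits a two-component mixture and converts a score into a probability of being correct. Using two Gaussians is a simplifying assumption made here, not necessarily the distribution families that PeptideProphet fits, and the function names and synthetic scores are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def fit_mixture(scores, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (assumed model).
    Returns (pi, (mu_incorrect, sd_incorrect), (mu_correct, sd_correct))."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    mu0 = sum(ordered[:half]) / half                    # init: lower half = incorrect
    mu1 = sum(ordered[half:]) / (len(ordered) - half)   # init: upper half = correct
    sd0 = sd1 = max(1e-3, (ordered[-1] - ordered[0]) / 4.0)
    pi = 0.5                                            # prior probability of "correct"
    for _ in range(n_iter):
        # E-step: responsibility of the "correct" component for each score.
        resp = []
        for x in scores:
            a = pi * normal_pdf(x, mu1, sd1)
            b = (1.0 - pi) * normal_pdf(x, mu0, sd0)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: re-estimate means, standard deviations, and the mixing weight.
        n1 = sum(resp)
        n0 = len(scores) - n1
        mu1 = sum(r * x for r, x in zip(resp, scores)) / n1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, scores)) / n0
        sd1 = max(1e-3, math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / n1))
        sd0 = max(1e-3, math.sqrt(sum((1.0 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / n0))
        pi = n1 / len(scores)
    return pi, (mu0, sd0), (mu1, sd1)

def prob_correct(x, pi, incorrect, correct):
    """Posterior probability that a PSM with discriminant score x is correct."""
    a = pi * normal_pdf(x, *correct)
    b = (1.0 - pi) * normal_pdf(x, *incorrect)
    return a / (a + b)

# Synthetic discriminant scores: 700 "incorrect" and 300 "correct" PSMs (made up).
rng = random.Random(0)
scores = [rng.gauss(-1.0, 1.0) for _ in range(700)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
pi, incorrect, correct = fit_mixture(scores)
```

Calling `prob_correct(score, pi, incorrect, correct)` then gives the kind of per-PSM probability the slide refers to. No class labels are used during fitting, which is the point of using EM here.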
![Page 8: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/8.jpg)
![Page 9: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/9.jpg)
Peptide-spectrum matches against the real database
Peptide-spectrum matches against the shuffled database
q=0.01
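A threshold such as q = 0.01 can be derived from the two score lists (real and shuffled database) roughly as follows. This is a sketch under stated assumptions: the FDR at a threshold is estimated as the number of decoy matches divided by the number of target matches at or above it, the two databases are the same size, and the function name `q_values` and the toy scores are hypothetical.

```python
def q_values(target_scores, decoy_scores):
    """Return [(score, q)] for the target PSMs, sorted by decreasing score.

    The q-value of a PSM is the smallest estimated FDR over all score
    thresholds that would accept that PSM.
    """
    labeled = [(s, True) for s in target_scores] + [(s, False) for s in decoy_scores]
    # Sort by decreasing score; on ties, count decoys first (conservative).
    labeled.sort(key=lambda t: (-t[0], t[1]))
    kept_scores, fdrs = [], []
    n_target = n_decoy = 0
    for score, is_target in labeled:
        if is_target:
            n_target += 1
            kept_scores.append(score)
            fdrs.append(min(1.0, n_decoy / n_target))
        else:
            n_decoy += 1
    # Enforce monotonicity: take the running minimum from the low-score end.
    q = fdrs[:]
    for i in range(len(q) - 2, -1, -1):
        q[i] = min(q[i], q[i + 1])
    return list(zip(kept_scores, q))

# Toy scores, for illustration only.
targets = [9.0, 8.0, 7.0, 5.0, 3.0]
decoys = [6.0, 4.0, 2.0, 1.0, 0.5]
result = q_values(targets, decoys)
accepted = [s for s, q in result if q <= 0.01]
```

In this toy example the three highest-scoring target PSMs outrank every decoy, so they are the ones accepted at q ≤ 0.01.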
![Page 10: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/10.jpg)
![Page 11: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/11.jpg)
Features
![Page 12: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/12.jpg)
![Page 13: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/13.jpg)
[Figure: plot annotated with PSM counts (2780, 13706, 8050, 12691, and 10863 PSMs) and a 1% FDR marker]
![Page 14: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/14.jpg)
Cleaving with elastase
![Page 15: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/15.jpg)
Variation by data set
Black lines are q = 0.01
Yellow line is y = x
Red line = equal q-value thresholds
Panels: Elastase data set; Chymotrypsin data set
![Page 16: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/16.jpg)
Percolator best match
SEQUEST best match
![Page 17: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/17.jpg)
![Page 18: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/18.jpg)
Protein identification
![Page 19: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/19.jpg)
The protein ID problem
[Figure: three-layer graph of proteins, peptides, and spectra. Peptides shown: EEAMPFK, CYCYGGLGK, CYCLLIGK, FTEILYCDLNR, VNILLGLPK. Scores shown: 1.0, 0.95, 0.98, 0.87, 0.74]
![Page 20: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/20.jpg)
The peptide-to-protein mapping is many-to-many
![Page 21: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/21.jpg)
[Figure: graph of proteins (X), peptides (Y), and spectra (D) with probabilities 0.03, 0.10, 0.91, 0.99, 0.97 and the threshold ≥ 0.90]
One- and two-peptide rules use a simple threshold
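A one- or two-peptide rule can be sketched as a count of peptides at or above the threshold. The probabilities below are the ones shown on the slide, but the protein-to-peptide graph, the peptide names `a` to `e`, and the function name are invented for illustration.

```python
def accept_proteins(protein_to_peptides, peptide_prob, threshold=0.90, min_peptides=2):
    """Accept a protein if it has at least min_peptides peptides with probability >= threshold."""
    accepted = []
    for protein, peptides in protein_to_peptides.items():
        n_confident = sum(1 for pep in peptides if peptide_prob[pep] >= threshold)
        if n_confident >= min_peptides:
            accepted.append(protein)
    return accepted

# Probabilities from the slide; the topology is hypothetical.
peptide_prob = {"a": 0.03, "b": 0.10, "c": 0.91, "d": 0.99, "e": 0.97}
protein_to_peptides = {
    "X1": ["a", "c"],
    "X2": ["a", "b", "c"],
    "X3": ["d", "e"],
    "X4": ["e"],
}

one_peptide = accept_proteins(protein_to_peptides, peptide_prob, min_peptides=1)
two_peptide = accept_proteins(protein_to_peptides, peptide_prob, min_peptides=2)
```

On this toy graph the one-peptide rule accepts all four proteins, while the two-peptide rule accepts only X3. Neither rule accounts for peptides shared between proteins.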
![Page 22: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/22.jpg)
[Figure: graph of proteins (X), peptides (Y), and spectra (D) with probabilities 0.03, 0.10, 0.91, 0.99, 0.97 and the threshold ≥ 0.90]
Select the minimum number of proteins to explain the peptides
IDPicker
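Selecting a minimum set of proteins that explains the confident peptides is a set-cover problem. The sketch below uses the standard greedy approximation, which is a swapped-in technique chosen for brevity and not necessarily the procedure IDPicker implements. The graph and names are hypothetical.

```python
def greedy_parsimony(protein_to_peptides, confident_peptides):
    """Greedy set cover: repeatedly pick the protein explaining the most uncovered peptides."""
    uncovered = set(confident_peptides)
    chosen = []
    while uncovered:
        # Iterating over sorted names makes tie-breaking deterministic (first name wins).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(uncovered & set(protein_to_peptides[p])))
        gain = uncovered & set(protein_to_peptides[best])
        if not gain:
            break  # remaining peptides map to no protein in the graph
        chosen.append(best)
        uncovered -= gain
    return chosen

# Hypothetical graph; peptides c, d, e are taken to be the ones with probability >= 0.90.
protein_to_peptides = {
    "X1": ["a", "c"],
    "X2": ["a", "b", "c"],
    "X3": ["d", "e"],
    "X4": ["e"],
}
chosen = greedy_parsimony(protein_to_peptides, ["c", "d", "e"])
```

Here two proteins suffice to explain the three confident peptides. The choice between X1 and X2 is arbitrary, which illustrates the ambiguity that shared peptides create.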
![Page 23: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/23.jpg)
ProteinProphet
[Figure: graph of proteins (X), peptides (Y), and spectra (D) with probabilities 0.03, 0.10, 0.91, 0.99, 0.97 and link weights 0.3, 0.7, 0.8, 0.2, 1, 1, 0.55, 0.45]
Use an EM-like procedure…
![Page 24: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/24.jpg)
[Figure: graph of proteins (X), peptides (Y), and spectra (D) annotated with probabilities (0.03, 0.10, 0.91, 0.99, 0.97) and link weights (0.3, 0.7, 0.8, 0.2, 1, 1, 0.45, 0.55)]
ProteinProphet
![Page 25: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/25.jpg)
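[Figure: graph of proteins (X), peptides (Y), and spectra (D) with weighted probabilities on the links: 0.8 × 0.03, 0.10, 0.3 × 0.91, 0.7 × 0.91, 0.2 × 0.03, 0.45 × 0.97, 0.99, 0.55 × 0.97]
$\Pr(X_2 \mid D) = 1 - (1 - 0.7 \times 0.91)(1 - 0.8 \times 0.03)(1 - 0.10)$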
ProteinProphet
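The expression for Pr(X2 | D) is one minus the probability that none of the protein's weighted peptide identifications is correct. A minimal sketch of that calculation, using the three weight and probability pairs from the slide (the function name is an assumption, and the unweighted 0.10 term is treated as having weight 1):

```python
def protein_probability(weighted_peptides):
    """1 - prod(1 - w * p) over the (weight, probability) pairs linked to one protein."""
    prob_none = 1.0
    for weight, prob in weighted_peptides:
        prob_none *= 1.0 - weight * prob
    return 1.0 - prob_none

pr_x2 = protein_probability([(0.7, 0.91), (0.8, 0.03), (1.0, 0.10)])
print(round(pr_x2, 4))  # 0.6811
```

Step by step: (1 - 0.637)(1 - 0.024)(1 - 0.10) = 0.363 × 0.976 × 0.9 ≈ 0.3189, so Pr(X2 | D) ≈ 0.681.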
![Page 26: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/26.jpg)
![Page 27: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/27.jpg)
EM-like algorithm
E-step
M-step
All proteins containing peptide i
Probability of protein n
Weight of link from peptide i to protein n
Maximum probability assigned to peptide i
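The slide's equations did not survive extraction, so the sketch below is a reconstruction from the annotations above and from the Pr(X2 | D) expression on the earlier slide, and should be read as an assumption. It alternates two updates: each protein's probability is one minus the product of (1 - weight × peptide probability) over its peptides, and each link weight is the protein's probability divided by the summed probabilities of all proteins containing that peptide. The graph, names, and iteration count are hypothetical.

```python
def protein_prophet_like(protein_to_peptides, peptide_prob, n_iter=50):
    """Alternate between protein probabilities and peptide-to-protein link weights."""
    # Invert the graph: peptide -> all proteins containing it.
    peptide_to_proteins = {}
    for prot, peps in protein_to_peptides.items():
        for pep in peps:
            peptide_to_proteins.setdefault(pep, []).append(prot)
    # Start by splitting each peptide evenly among its proteins.
    weights = {(pep, prot): 1.0 / len(prots)
               for pep, prots in peptide_to_proteins.items() for prot in prots}
    protein_prob = {}
    for _ in range(n_iter):
        # Update protein probabilities from the current weights.
        for prot, peps in protein_to_peptides.items():
            prob_none = 1.0
            for pep in peps:
                prob_none *= 1.0 - weights[(pep, prot)] * peptide_prob[pep]
            protein_prob[prot] = 1.0 - prob_none
        # Update weights: apportion each peptide by current protein probabilities.
        for pep, prots in peptide_to_proteins.items():
            total = sum(protein_prob[p] for p in prots)
            for prot in prots:
                weights[(pep, prot)] = protein_prob[prot] / total if total > 0 else 1.0 / len(prots)
    return protein_prob, weights

# Probabilities from the slides; the topology is invented for illustration.
peptide_prob = {"a": 0.03, "b": 0.10, "c": 0.91, "d": 0.99, "e": 0.97}
protein_to_peptides = {
    "X1": ["a", "c"],
    "X2": ["a", "b", "c"],
    "X3": ["d", "e"],
    "X4": ["e"],
}
protein_prob, weights = protein_prophet_like(protein_to_peptides, peptide_prob)
```

In this toy graph the shared peptide c shifts toward X2, which has an extra supporting peptide, and e shifts toward X3. This shows how the weights apportion shared evidence toward the better-supported protein.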
![Page 28: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/28.jpg)
Nested Mixture Model
[Figure: graph of proteins (X), peptides (Y), and spectra (D) with probabilities 0.03, 0.10, 0.91, 0.91, 0.03, 0.97, 0.99, 0.97]
Modeled as mixture of present and absent
Model number of matches conditional on protein states
Model distribution of scores conditional on peptide states
(Li Ann Applied Statistics 2010)
![Page 29: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/29.jpg)
Shen et al. 2008
• Model the MS/MS process generatively (forward) using free parameters.
• Sum over all possible protein and peptide states to get posterior probabilities.
• Use expectation-maximization to get parameter estimates.
Li et al. 2008
• Model the MS/MS process generatively using an existing static peptide detectability model.
• Use Markov chain Monte Carlo to estimate posterior probabilities.
Generatively model: Y | X and D | Y
Perform inference to get Pr(X | D)
The emergence of graphical Bayesian methods
![Page 30: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/30.jpg)
Fido
Fido performs exact calculations on a Bayesian network model
![Page 31: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/31.jpg)
Barista uses a neural network to score PSMs
Input units: 17 PSM features
Hidden units
Output unit
PSM feature vector
![Page 32: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/32.jpg)
The Barista model includes spectra, peptides and proteins
Proteins: R1 R2 R3
Peptides: E1 E2 E3 E4
Spectra: S1 S2 S3 S4 S5 S6 S7
$F(R) = \frac{1}{|N(R)|} \sum_{E \in N(R)} g(E)$, where $|N(R)|$ is the number of peptides in protein R
$g(E) = \max_{s \,:\, (E, s)} f(E, s)$
$f(E, s)$: neural network score function
![Page 33: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/33.jpg)
Model Training
• Search against a database containing real (target) and shuffled (decoy) proteins.
• For each protein, the label y ∈ {+1, -1} indicates whether it is a target or decoy.
• Hinge loss function: L(F(R), y) = max(0, 1 - yF(R))
• Goal: Choose parameters W such that F(R) > 0 if y = +1 and F(R) < 0 if y = -1.
repeat
    Pick a random protein (Ri, yi)
    Compute F(Ri)
    if (1 - yi F(Ri)) > 0 then
        Make a gradient step to optimize L(F(Ri), yi)
    end if
until convergence
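The training loop can be sketched in Python under simplifying assumptions. A linear function stands in for the neural network f(E, s), the protein score F(R) is taken as the mean over the protein's peptides of the best PSM score, and a fixed number of steps replaces the convergence test. All names, the learning rate, and the toy feature vectors are hypothetical.

```python
import random

def f(w, x):
    """PSM score f(E, s); a linear stand-in (assumption) for the neural network."""
    return sum(wi * xi for wi, xi in zip(w, x))

def protein_score(w, protein):
    """F(R) and its (sub)gradient with respect to w.

    protein is a list of peptides; each peptide is a list of PSM feature vectors.
    """
    total, grad = 0.0, [0.0] * len(w)
    for psms in protein:
        best = max(psms, key=lambda x: f(w, x))   # g(E): best-scoring PSM for the peptide
        total += f(w, best)
        grad = [g + b for g, b in zip(grad, best)]
    n = len(protein)
    return total / n, [g / n for g in grad]

def train(proteins, labels, dim, lr=0.1, n_steps=2000, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(n_steps):
        i = rng.randrange(len(proteins))          # pick a random protein (Ri, yi)
        score, grad = protein_score(w, proteins[i])
        if 1 - labels[i] * score > 0:             # hinge loss is active
            # Gradient of 1 - y*F(R) is -y*grad, so descend by adding y*grad.
            w = [wi + lr * labels[i] * gi for wi, gi in zip(w, grad)]
    return w

# Toy data: each feature vector is [score-like feature, bias term].
proteins = [
    [[[2.0, 1.0], [0.5, 1.0]], [[1.5, 1.0]]],   # target
    [[[1.8, 1.0]]],                               # target
    [[[-1.0, 1.0], [-0.2, 1.0]]],                 # decoy
    [[[-0.8, 1.0]], [[-1.5, 1.0]]],               # decoy
]
labels = [1, 1, -1, -1]
w = train(proteins, labels, dim=2)
final_scores = [protein_score(w, p)[0] for p in proteins]
```

Because the label is attached to the protein and not to individual PSMs, the update flows back only through the PSM that currently scores best for each peptide.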
![Page 34: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/34.jpg)
Barista performs well in target/decoy evaluation
![Page 35: Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.](https://reader036.fdocuments.us/reader036/viewer/2022062804/56649d9c5503460f94a84e97/html5/thumbnails/35.jpg)
Why does Barista work well?
Sources of information loss during two-stage analysis:
• Spectra that are not confidently assigned to a peptide during the initial search are lost.
• Also lost are lower-ranked peptides that match a given spectrum, corresponding to
  – the correct peptide when the top-ranked peptide is incorrect, or
  – a second correct peptide when the spectrum is chimeric.
• A single score is less informative than a rich feature vector describing the PSM.