Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO...
Date posted: 19-Dec-2015
![Page 1: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/1.jpg)
Practical Guide to Significantly Improve Peptide Identification
Sensitivity and Accuracy
Bin Ma, CTO, Bioinformatics Solutions Inc.
June 5, 2011.
![Page 2: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/2.jpg)
The Sensitivity and Accuracy Dilemma
[Figure: score distributions of false and true hits.]
$$\mathrm{FDR} = \frac{\#\ \text{reported false hits}}{\#\ \text{reported hits}}$$
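The trade-off between reporting more hits and keeping the FDR low can be sketched in a few lines. The scores and true/false labels below are invented for illustration; in a real experiment the labels are unknown, which is why the FDR has to be estimated.

```python
def fdr_and_hits(hits, threshold):
    """hits: list of (score, is_true). Returns (# reported hits, FDR) at a score threshold."""
    reported = [is_true for score, is_true in hits if score >= threshold]
    if not reported:
        return 0, 0.0
    false_hits = sum(1 for is_true in reported if not is_true)
    return len(reported), false_hits / len(reported)

# Invented example data: (score, is the hit actually correct?)
hits = [(9.0, True), (8.0, True), (7.0, True), (6.0, False),
        (5.0, True), (4.0, False), (3.0, False), (2.0, True)]

strict = fdr_and_hits(hits, 7.0)  # high threshold: few hits, low FDR
loose = fdr_and_hits(hits, 3.0)   # low threshold: more hits, higher FDR
```

Lowering the threshold reports more hits (sensitivity) but lets in more false ones (accuracy).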
![Page 3: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/3.jpg)
Publication Guideline
• Earlier experiments paid too much attention to sensitivity and not enough to accuracy.
• MCP started the guideline in 2004 to ensure accuracy.
![Page 4: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/4.jpg)
“People are generally over-optimistic about how reliable their results are.” – ABRF iPRG 2011
iPRG/ABRF 2011 Study
30 out of 45 submissions have an FDR much higher than the required 1%.
[Figure: estimated FDR lower bound and estimated FDR upper bound, with the 1% level marked.]
![Page 5: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/5.jpg)
PEAKS Achieved both Sensitivity and Accuracy
[Figure labels: 1% level; PEAKS (two labels); more peptides in submission.]
![Page 6: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/6.jpg)
Outline
1. FDR – pitfalls and solutions
2. De novo sequencing assisted database search
3. Three essential examinations to ensure result quality
![Page 7: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/7.jpg)
1. FDR – pitfalls and solutions
![Page 8: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/8.jpg)
FDR Estimation
[Diagram: Protein DB (target + decoy) → Search Engine → Identified Peptides]
$$\#\ \text{false target hits} \approx \#\ \text{decoy hits}$$
$$\mathrm{FDR} = \frac{\#\ \text{decoy}}{\#\ \text{target}}$$
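A minimal sketch of the target-decoy estimate, assuming each identified peptide-spectrum match is available as a (score, is_decoy) pair. The function names and the example data are hypothetical, not part of any search engine's API.

```python
def estimate_fdr(psms, threshold):
    """psms: list of (score, is_decoy). Estimated FDR = #decoy / #target at or above threshold."""
    n_target = sum(1 for s, d in psms if s >= threshold and not d)
    n_decoy = sum(1 for s, d in psms if s >= threshold and d)
    return n_decoy / n_target if n_target else 0.0

def score_threshold_for_fdr(psms, max_fdr):
    """Lowest observed score whose estimated FDR stays within max_fdr (None if there is none)."""
    best = None
    for t in sorted({s for s, _ in psms}, reverse=True):
        if estimate_fdr(psms, t) <= max_fdr:
            best = t
    return best

# Invented example: True marks a decoy hit.
psms = [(10, False), (9, False), (8, False), (7, False),
        (6, True), (5, False), (4, True), (3, False)]

threshold = score_threshold_for_fdr(psms, 0.2)
```

The estimate is only as good as the assumption that false target hits and decoy hits are equally likely, which is what the pitfalls that follow violate.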
![Page 9: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/9.jpg)
Pitfall 1 – Multiple Round Search
Round 1. Fast Search
Round 2. More Sensitive Search
More targets than decoys
$$\#\ \text{false target hits} > \#\ \text{decoy hits}$$
FDR underestimation.
Craig and Beavis 2004. Bioinformatics 20, 1466–67.
Bern and Kil 2011. J Proteome Res. 10, 2123–27.
Everett et al. 2010. J Proteome Res. 9, 700–707.
![Page 10: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/10.jpg)
Our Solution: Decoy Fusion
Fast Search
More Sensitive Search
A decoy sequence is appended to each target protein.
PEAKS DB paper. Submitted.
Equal targets and decoys
$$\#\ \text{false target hits} \approx \#\ \text{decoy hits}$$
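Appending a decoy to each target protein could look roughly like this. The slide does not say how the decoy sequence is generated; reversing the target sequence is an assumption made here for illustration, and the helper names are hypothetical. This is a sketch of the idea, not the PEAKS implementation.

```python
def fuse_decoy(target_seq):
    """Return (fused_sequence, boundary): the target followed by its decoy."""
    decoy = target_seq[::-1]  # assumption: reversed target used as the decoy
    return target_seq + decoy, len(target_seq)

def is_decoy_hit(match_start, boundary):
    """A match starting at or after the boundary lies in the decoy half.
    Matches spanning the boundary are ignored in this sketch."""
    return match_start >= boundary

# Invented two-protein database.
db = {"P1": "MKTAYIAK", "P2": "GSHMLEDPR"}
fused = {name: fuse_decoy(seq) for name, seq in db.items()}
```

Because every entry carries its own decoy, any protein kept for the second search round brings equal amounts of target and decoy sequence with it.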
![Page 11: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/11.jpg)
Pitfall 2 – Mix Protein and Peptide ID
Idea: Peptides on a multi-hit protein get a bonus on their scores to increase sensitivity.
Pitfall: More multi-hit proteins come from the target DB → more false hits are “saved” from the target DB → FDR underestimation.
A weak hit is “saved” due to the bonus.
So is this weak false hit.
[Figure legend: decoy hit; target false hit]
![Page 12: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/12.jpg)
Our Solution: Decoy Fusion
Weak false hits are “saved” with approx. equal probabilities in target and decoy.
Get the sensitivity, but still estimate the FDR correctly.
![Page 13: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/13.jpg)
Pitfall 3 – Machine Learning with Decoy
Idea: Re-train the coefficients of the scoring function for every search after knowing the decoy hits.
Pitfall: Risk of over-fitting. Machine learning experts only.
Adjust the scoring function to remove decoy hits after the search.
Fewer target false hits are removed → FDR underestimation.
[Figure labels: Search; target false hits; decoy hits]
![Page 14: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/14.jpg)
Solutions
1. Don’t use it. Judges cannot be players.
or
2. Only use it for very large datasets.
or
3. Train the coefficients and reuse them; don’t re-train for every search.
![Page 15: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/15.jpg)
PEAKS 5.3
• PEAKS DB used all these techniques (and many more) to ensure the accuracy while maximizing sensitivity.
• Reliable FDR estimation is the top priority in PEAKS DB design.
![Page 16: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/16.jpg)
2. De novo sequencing assisted database search
![Page 17: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/17.jpg)
An Idea to Improve Score Function
[Figure: score distributions of false and true hits.]
Idea: If de novo matches a DB peptide, it is likely to be correct.
![Page 18: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/18.jpg)
De Novo Assisted DB Search
[Figure: scatter plot. Horizontal axis: DB Search Score. Vertical axis: # matched amino acids between de novo & DB search. Best separation line: x + 4y.]
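The separation line x + 4y suggests a combined score. The sketch below assumes x is the DB search score and y is the number of amino acids on which the de novo sequence and the DB peptide agree; counting agreement position by position is a simplification chosen here, and the function names are hypothetical.

```python
def matched_residues(de_novo, db_peptide):
    """Count positions where the two sequences carry the same amino acid (simplified)."""
    return sum(1 for a, b in zip(de_novo, db_peptide) if a == b)

def combined_score(db_score, de_novo, db_peptide, weight=4.0):
    """x + 4y, with x = DB search score and y = # matched amino acids (assumed mapping)."""
    return db_score + weight * matched_residues(de_novo, db_peptide)

# Invented example: the de novo sequence differs from the DB peptide at one position.
y = matched_residues("LSEGNVK", "LSEGDVK")
score = combined_score(20.0, "LSEGNVK", "LSEGDVK")
```

A DB hit that the de novo sequence largely confirms moves further from the false-hit region than its DB score alone would place it.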
![Page 19: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/19.jpg)
Including de novo matching as a feature gives the score function better discriminative power.
[Figure: score distributions of false and true hits, before and after.]
This is just one of many new features in PEAKS 5.3 for improving the score function.
![Page 20: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/20.jpg)
“… far better than what I could ever squeeze out of my data” – Stefano Gotta, Siena Biotech
[Figure: FDR (0.0% to 2.5%) versus # of PSM (0 to 4000) for product M and PEAKS DB.]
![Page 21: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/21.jpg)
PEAKS DB Workflow
[Diagram: All Spectra → De Novo → DB search → Found? Yes → DB peptides; No → De novo only]
De novo both helps to improve DB search and reports novel peptides.
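The workflow can be sketched as a simple split of spectra into two result sets. The order of the steps (de novo first, then DB search) is read off the diagram labels and is an assumption; `de_novo` and `db_search` are hypothetical stand-in callables, not PEAKS functions.

```python
def split_results(spectra, de_novo, db_search):
    """Run de novo on every spectrum, then DB search; route by whether a DB peptide is found."""
    db_peptides, de_novo_only = {}, {}
    for spec_id in spectra:
        tag = de_novo(spec_id)           # de novo sequence for this spectrum
        hit = db_search(spec_id, tag)    # DB peptide, or None if nothing is found
        if hit is not None:
            db_peptides[spec_id] = hit
        else:
            de_novo_only[spec_id] = tag  # candidate novel peptide
    return db_peptides, de_novo_only

# Invented stand-ins: a lookup table for de novo and a tiny peptide database.
tags = {"s1": "LSEGNVK", "s2": "AAPWTR"}
database = {"LSEGNVK"}
db_peptides, de_novo_only = split_results(
    tags,
    lambda spec_id: tags[spec_id],
    lambda spec_id, tag: tag if tag in database else None,
)
```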
![Page 22: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/22.jpg)
3. Three essential examinations to ensure result quality.
![Page 23: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/23.jpg)
Don’t Trust Software Blindly!
• Googling “Don’t trust software blindly” returned 5,140,000 results.
• As you quality control your experiments, quality control the software’s results too.
![Page 24: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/24.jpg)
Essential Examination 1
#decoy ≈ #target in the low score region
Low #decoy in the high score region
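This check can be done numerically as well as by eye. A minimal sketch, assuming unfiltered search results are available as (score, is_decoy) pairs; the score ranges and data below are invented.

```python
def decoy_target_ratio(psms, low, high):
    """#decoy / #target among hits with low <= score < high (None if no targets there)."""
    n_decoy = sum(1 for s, d in psms if low <= s < high and d)
    n_target = sum(1 for s, d in psms if low <= s < high and not d)
    return n_decoy / n_target if n_target else None

# Invented example: True marks a decoy hit.
psms = [(1, True), (1.5, False), (2, False), (2.5, True), (3, True), (4, False),
        (6, False), (7, False), (8, False), (9, False)]

low_region = decoy_target_ratio(psms, 0, 5)    # expected to be close to 1
high_region = decoy_target_ratio(psms, 5, 10)  # expected to be close to 0
```

A low-score ratio well below 1 is a warning sign that the decoys are not behaving like false target hits.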
![Page 25: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/25.jpg)
Essential Examination 2
High-scoring peptides should have low precursor error.
Precursor error starts to scatter below the threshold.
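A small sketch of this check, assuming each hit carries its score plus the observed and theoretical precursor m/z. The ppm formula is the standard relative error; the data and the choice of standard deviation as the measure of scatter are assumptions for illustration.

```python
from statistics import pstdev

def ppm_error(observed_mz, theoretical_mz):
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def error_spread(psms, threshold):
    """psms: (score, observed_mz, theoretical_mz).
    Returns the ppm-error spread (above threshold, below threshold).
    Both groups must be non-empty."""
    above = [ppm_error(o, t) for s, o, t in psms if s >= threshold]
    below = [ppm_error(o, t) for s, o, t in psms if s < threshold]
    return pstdev(above), pstdev(below)

# Invented example: tight errors for high scores, scattered errors for low scores.
psms = [(30, 500.0005, 500.0), (28, 500.0010, 500.0), (25, 499.9995, 500.0),
        (12, 500.0100, 500.0), (10, 499.9900, 500.0), (8, 500.0050, 500.0)]

spread_above, spread_below = error_spread(psms, 20)
```

If the spread above the threshold is not clearly smaller than the spread below it, the threshold or the scoring deserves a closer look.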
![Page 26: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/26.jpg)
Essential Examination 3
• Spectrum annotation around score threshold.
![Page 27: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/27.jpg)
Take Home Message
• Another year of dedicated work on PEAKS.
• Ensured accuracy; maximized sensitivity.
• Do the three essential examinations.
  – They are simple … at least in PEAKS.
![Page 28: Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d2e5503460f94a061d6/html5/thumbnails/28.jpg)
“a big step forward” – Christian Schmelzer, Martin Luther University
Enjoy!
http://www.bioinfor.com/peaks-download-a-pricing