Algorithms for Peptide Mass Spectrometry

Ole Schulz-Trieglaff

Max Planck Research School for Computational Biologyand

Free University Berlin, Germany

joint work with Rene Hussong, Clemens Gröpl,Andreas Hildebrandt and Knut Reinert

Outline

• Computational quantification of peptides

How to obtain quantitative information about peptides in a biological sample?

• Quality control How good is my result ?

Liquid Chromatography-Mass Spectrometry (LC-MS)

Separation 1Different peptides have different retention time (rt).

IonizationPeptide receivesz charge units.

Separation 2Detector measures mass/charge (m/z).

LC ESI TOF Spectrum (scan)

tyLC-MS map

LC-MS data acquisition

Isotopic pattern

• Natural isotopes occur with well-known abundances.

• Can be modelled by a binomial distribution.

• Depend on molecular formula of peptide.

12C 98.90

14N 99.63

16O99.76

18O 0.20%

1H99.98

Collect peptides from protein database and compute average amino acid (“averagines”).

Tabulate average isotope pattern for a range ofpeptide masses.

Modeling isotopic pattern

small peptide

large peptide

Why bother ?

• Clinical studies but also basic research rely on an accurate quantification of peptides or proteins.

• All subsequent steps depend on its quality.

• Modern mass spectrometer generate thousands of spectra per day.

• Need for fast and accurate algorithms !

Previous work

Other approaches based on image processing methodsincluding a (global) segmentation of LC-MS map.

Our approach

Idea: We know what we are looking for and how it

looks like. So why not use this knowledge?

1) Pre-process scans by local pseudo-alignment.2) Sweep across the LC-MS map and scan for

isotopic pattern using wavelets.3) Combine isotopic pattern in subsequent

scans.4) Filter for false-positives by fitting a peptide

template.

Determine most likely charge state and m/z by voting across all scans.

Sweep and combine

Hash m/z, charge estimate and extend match.

Sweep and combine

Pseudo-alignment

For each scan, look ahead in time (i.e. at the next scan) .

Add intensities of data points lying at similar positions in thenext scan.

Aim: improve s/n ratio by raising isotopic pattern over noise level.

Sweep and combine

For each scan in LC-MS map:do

detect isotopic pattern using waveletshash m/z and charge estimate for each pattern

if (isotopic pattern in previous scan(s) at similar position)

then continue box else

open new box surrounding the isotopic peaks. fi

Wavelet-based pattern detection

mass 500, charge 1 mass 500, charge 2 mass 2000, charge 1

,)(')(1

),(W dxa

transformed signal signal mother wavelet

Wavelet-based pattern detection

Scoring of intervals in wavelet transform based on mean intensity and (local) variance (F-statistic).

non-peptidic compound

peptide with charge 3

noise peaks

Filtering candidatesFilter for false positives using a peptide template.

Peptide template = isotope distribution + elution profile

m/z RT

• Discard points with bad fit to temple.• Discard regions with bad correlation to template.

Results

A test case: mix of standard peptides.

Stability analysis: can we detect distorted isotopic pattern, too ?

Add uniformly distributed noise with amplitude of 10%, 25%, 50% and 75% of the intensity of the monoisotopic peak.

Check mass, charge and bounding box the peptide sets extracted by our algorithm.

Results

Data set: mix of standard peptides.

Oxytocine, 1007.5 Th, Charge 1

Amplitude

0% 10% 25% 50% 75%

scans 11/11 11/11 10/11 10/11 0/11

charge Yes Yes Yes Yes No

Substance P, 674.5 Th, charge 2

Amplitude

0% 10% 25% 50% 75%

scans 16/20 16/20 13/20 12/20 13/20

charge Yes Yes Yes Yes No

What’s next ?

• How “good” is my result (e.g. the set of peptides) ?

Sounds trivial, but it isn’t !

Example: Using an addition experimental step(MS/MS ion fragmentation) we can get sequence information for several hundreds of peptide ions in a LC-MS map.

Feature extraction algorithms extract thousands ofpeptide signals from a typical map.

How good is my set ?

Many signals without sequence information. Too many to inspect them manually. Do they make sense?

Two criteria: meaningful signals should

• have equal intensities between replicate samples (within some experimental error).

• their masses should be close to the masses of the peptides in this particular organism.

MA plot of replicate samples

MA plot = average intensity of matching signals (x)

vs. ratio of signal intensity (y), both on log-scale

MA plot of replicate samples

Quantification results of algorithm MsInspect show higher variation but MsInspect also

extracts far more signals (10000 vs. 2000 for our approach).

Mass deviance

Mass deviance = min. distanceof a peptide signal mass to themass of a theoretically obtainedpeptide feature.

Do these additional signals make sense ?

MsInspect detects many features with high mass deviance. Either peptides not in sequence database (unlikely) or noise “picked up” by the algorithm.

How good is my set?

Conclusions:

• We can say a bit about the signals we detected (but not much).

• Different algorithms can give very different results.

• Lack of standard data sets impedes advancement of computational research.

Summary and future work• Algorithm for an automated and accurate quantification

of peptides from LC-MS data based on wavelet-based filtering.

• Available under the LGPL at www.openms.de.

Future work:• Better quality control and more complex data.

• So far only quantification of peptides. How to infer the abundance of the corresponding protein ?

Thanks for your attention.

Any questions ?

trieglaf@inf.fu-berlin.dewww.openms.de

Algorithms for Peptide Mass Spectrometry

Documents

Transcript of Algorithms for Peptide Mass Spectrometry

Statistical Methods for Peptide and Protein Identiﬁcation using Mass ...stephenslab.uchicago.edu/assets/papers/qli-uwthesis.pdf · Protein identiﬁcation using mass spectrometry

Genedata Expressionist · Peptide Mapping Genedata Expressionist automates peptide mapping, dramatically reducing analysis time. Its scalable architecture and refined algorithms enable

Introduction to mass spectrometry: protein/peptide vaporization and … 01-14-03.pdf · 2011-03-04 · Introduction to mass spectrometry: protein/peptide vaporization and ... 1886

Algorithms in Bioinformatics: A Practical …ksung/algo_in_bioinfo/...Algorithms in Bioinformatics: A Practical IntroductionPractical Introduction Peptide SequencingPeptide Sequencing

Peptide Identification by Tandem Mass Spectrometry Behshad Behzadi April 2005.

Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Protein Sequencing and Identification by Mass Spectrometry 1.

Application of a Mass Spectrometry Based Multi-Attribute ... · ≤ 10 CFU/10 mL. 0: MAM is a Peptide Map based on Mass Spectrometry ... mAbA. 10. 77. 13. Residue. Peptide. Structura

Peptide identiﬁcation using tandem mass spectrometry data · mass spectrometry (MS/MS) is commonly used for peptide identiﬁcation. More precisely, an unknown peptide undergoes

Peptide ion fragmentation in mass spectrometry - UAB 1-15-08.pdfPeptide ion fragmentation in mass spectrometry ... Glutamic acid 129.043 Serine 87.032 ... More haste, less speed?

The identification proteins using mass spectrometry “mass-mapping / peptide mass-fingerprinting” Dr Kevin Mills Institute of Child Health, UCL, London.

Environmental Applications of ESI FT-ICR Mass Spectrometry Oxidized Peptide and Metal Sulfide Clusters Dissertation Defense January 7, 2010 Jeffrey M.

Algorithms in Mass Spectrometry

Tandem Mass Spectrometry - Computer Science · 2012. 2. 7. · Tandem Mass Spectrometry: Peptide identification via database search, de novo sequencing Some slides adapted by Sangtae

Peptide and protein analysis with mass spectrometrydownloads.hindawi.com/journals/jspec/2002/320152.pdf · S.A. Trauger et al. / Peptide and protein analysis with mass spectrometry

Comprehensive peptide quantification for data independent ... · Comprehensive peptide quantification for data independent acquisition mass spectrometry using chromatogram libraries

Improving the Detection Limits for Highly Basic ... notes/technology...neuropeptides by mass spectrometry. Problem: Important neuropeptides include Vasoactive Intestinal Peptide (VIP),

Schedule 1.9.00Introduction 2.9.10Basic concepts of mass spectrometry (MS) and MALDI-TOF MS 3.9.30Protein identification by Peptide Mass Fingerprinting.

Naming Algorithms for Derviatives of Peptide-like Natural ... · Naming algorithms for derivatives of peptide-like natural products Roger Sayle& Noel O’Boyle Nextmove software,

CSE182 CSE182-L12 Mass Spectrometry Peptide identification.

CSE182-L12 Mass Spectrometry Peptide identification CSE182.