Predicting Protein Sequences From Mass Spectral Data


Gary Van Domselaar, University of Alberta

Canadian Proteomics Initiative

May 18, 2004

Introduction

• Review:

– Protein Separation

– Cleavage

– Mass Spectra

– MS and MS/MS

• The Objective: Matching Mass Spectra to Protein Sequences

Introduction

• Strategies:

– MALDI & Peptide Mass Fingerprinting

– MS/MS & Fragment Ion Searching

– MS/MS & Sequence Tag Searches

– MS/MS & De Novo Peptide Sequencing

Review: Protein Separation

[Figure panels: High Performance Liquid Chromatography; 2D Gel Electrophoresis]

Protein Separation: 2D Gel Electrophoresis

[Figure: 2D gel, with the second dimension run by SDS-PAGE]

Protein Separation: High Performance Liquid Chromatography (HPLC)

[Figure: HPLC system with two solvent reservoirs, mixer, pump, sample injector, column and mass spectrometer]

Protein Separation: 1D PAGE LC/MS

[Figure: complex protein mixture → SDS-PAGE → in-gel digestion → HPLC (solvents, mixer, pump, sample injector, column) → ESI-MS]

Protein Separation: 2D LC/MS

[Figure: complex protein mixture → in-solution digestion → HPLC (solvents, mixer, pump, sample injector) with SCX and RPC columns → ESI-MS]

Review: Cleavage

http://ca.expasy.org/tools/peptidecutter/peptidecutter_enzymes.html

Protease Cleavage Rules (-- marks the cleavage site)

Trypsin        XXX[KR]--[!P]XXX
Chymotrypsin   XX[FYW]--[!P]XXX
Lys-C          XXXXXK--XXXXX
Asp-N endo     XXXXX--DXXXXX
CNBr           XXXXXM--XXXXX
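A minimal Python sketch of the trypsin rule in the table (cut after K or R unless the next residue is P). It ignores missed and nonspecific cleavages; the function name `tryptic_digest` is hypothetical.

```python
import re

def tryptic_digest(sequence: str) -> list[str]:
    """Fully cleaved tryptic peptides: cut after K/R unless the next residue is P."""
    # (?<=[KR]) = position preceded by K or R; (?!P) = not followed by P.
    # re.split accepts zero-width patterns in Python 3.7+.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A C-terminal K/R leaves an empty string at the end; drop it.
    return [p for p in pieces if p]

print(tryptic_digest("acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe"))
# ['ACEDFHSAK', 'DFQEASDFPK', 'IVTMEEEWENDADNFEK', 'QWFE']
```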

Missed Cleavages

• Proteases are not perfect enzymes

• Protease products are not confined to the predicted products:

– Contaminating proteases

– PTMs at the recognition site block access

– Unexpected recognition sites

• Example: trypsin produces 'ragged termini' when two or more consecutive basic residues are present in the sequence

Missed Cleavages

>Protein 1
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe

Sequence tryptic fragments (no missed cleavages), M+H:
acedfhsak (1007.4251), dfqeasdfpk (1183.5266), ivtmeeewendadnfek (2098.8909), qwfe (609.2667)

Tryptic fragments (up to 1 missed cleavage), M+H:
acedfhsak (1007.4251), dfqeasdfpk (1183.5266), ivtmeeewendadnfek (2098.8909), qwfe (609.2667)
acedfhsakdfqeasdfpk (2171.9338), ivtmeeewendadnfekqwfe (2689.1398), dfqeasdfpkivtmeeewendadnfek (3263.3997)
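The missed-cleavage masses can be checked from the fully cleaved ones: each fragment's M+H already includes one H2O and one proton, so a peptide made of n joined fragments has (n-1) fewer of each. A small Python check, assuming 1.00728 Da for H+ (the value that reproduces the M+H figures above); `joined_mh` is a hypothetical helper name.

```python
WATER = 18.01056   # monoisotopic H2O, Da
PROTON = 1.00728   # H+ (assumed value), Da

def joined_mh(mh_values: list[float]) -> float:
    """M+H of a peptide made by joining consecutive fragments (missed cleavages)."""
    n = len(mh_values)
    # Keep one H2O and one proton in total, not one per fragment.
    return sum(mh_values) - (n - 1) * (WATER + PROTON)

fragments = [("acedfhsak", 1007.4251), ("dfqeasdfpk", 1183.5266),
             ("ivtmeeewendadnfek", 2098.8909), ("qwfe", 609.2667)]

# One missed cleavage: join each pair of adjacent fragments.
for (seq1, mh1), (seq2, mh2) in zip(fragments, fragments[1:]):
    print(seq1 + seq2, round(joined_mh([mh1, mh2]), 4))
```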

Autolysis Peaks

[Figure: example spectrum (m/z axis) with peaks at 450, 609, 698, 1007, 1199 and 2098, and trypsin autolysis peaks (trp) at 1940 and 2211]

Review: The Mass Spectrum

[Figure: mass spectrum; y-axis: Relative Intensity; x-axis: Mass / Charge (m/z)]

Average Mass and Monoisotopic Mass

• Monoisotopic mass is the mass calculated using the mass of the most abundant isotope of each element

• Average mass is the abundance-weighted mass of all isotopic components

http://www.matrixscience.com/help/mass_accuracy_help.html

Average Mass and Monoisotopic Mass

http://65.219.84.5/moverz/tutorials/pages/peak.html

Calculating Peptide Masses

• Sum the monoisotopic residue masses

• Add the mass of H2O (18.01056)

• Add the mass of H+ (1.00728) to get M+H

• If Met is oxidized, add 15.99491

• If Cys has an acrylamide adduct, add 71.0371

• If Cys is iodoacetylated (carboxymethylated), add 58.0055

• Other modifications are listed at http://prowl.rockefeller.edu/aainfo/deltamassv2.html

• Only consider peptides with masses > 400 Da
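The recipe above, sketched in Python with the monoisotopic residue masses tabulated in this section. The helper names are hypothetical, H+ is assumed to be 1.00728 Da, and only Met oxidation is handled as a modification.

```python
# Monoisotopic residue masses (Da), as tabulated in this section.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.01056        # H2O
PROTON = 1.00728        # H+ (assumed value)
MET_OXIDATION = 15.99491

def peptide_mh(peptide: str, oxidized_met: int = 0) -> float:
    """Monoisotopic M+H: sum of residues + H2O + H+ (+ any Met oxidations)."""
    residues = sum(RESIDUE_MASS[aa] for aa in peptide.upper())
    return residues + WATER + PROTON + oxidized_met * MET_OXIDATION

def keep_for_pmf(mh: float, minimum: float = 400.0) -> bool:
    """Only peptides heavier than the cutoff are considered."""
    return mh > minimum

print(round(peptide_mh("acedfhsak"), 3))  # 1007.425
```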

Amino Acid Residue Masses

Residue         Monoisotopic Mass   Average Mass
Glycine         57.02147            57.0520
Alanine         71.03712            71.0788
Serine          87.03203            87.0782
Proline         97.05277            97.1167
Valine          99.06842            99.1326
Threonine       101.04768           101.1051
Cysteine        103.00919           103.1448
Isoleucine      113.08407           113.1595
Leucine         113.08407           113.1595
Asparagine      114.04293           114.1039
Aspartic acid   115.02695           115.0886
Glutamine       128.05858           128.1308
Lysine          128.09497           128.1742
Glutamic acid   129.04264           129.1155
Methionine      131.04049           131.1986
Histidine       137.05891           137.1412
Phenylalanine   147.06842           147.1766
Arginine        156.10112           156.1876
Tyrosine        163.06333           163.1760
Tryptophan      186.07932           186.2133

Review: ESI-MS Spectrum

http://www.astbury.leeds.ac.uk/Facil/MStut/mstutorial.htm

$$m/z = \frac{MW + n\,H^+}{n}$$

where n is the number of charges (protons) on the ion.
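A minimal Python sketch of the ESI relation, assuming H+ = 1.00728 Da. The second function goes one step beyond the slide: two adjacent peaks of one charge series determine both the charge n and the neutral mass, from p_n = (MW + n H+)/n and p_(n+1) = (MW + (n+1) H+)/(n+1). Function names are hypothetical.

```python
PROTON = 1.00728  # mass of H+, Da (assumed value)

def esi_mz(mw: float, n: int) -> float:
    """m/z of the [M + nH]n+ ion of a molecule with neutral mass mw."""
    return (mw + n * PROTON) / n

def charge_and_mass(mz_n: float, mz_n_plus_1: float) -> tuple[int, float]:
    """Infer charge n and neutral mass from two adjacent peaks of one series.

    mz_n is the peak with charge n (the higher m/z); mz_n_plus_1 is the
    neighbouring peak, which carries one more proton.
    """
    # Subtracting n*mz_n = M + n*H from (n+1)*mz_(n+1) = M + (n+1)*H and solving for n:
    n = round((mz_n_plus_1 - PROTON) / (mz_n - mz_n_plus_1))
    return n, n * (mz_n - PROTON)

peaks = [esi_mz(10_000.0, n) for n in (10, 11)]
print(charge_and_mass(*peaks))
```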

Review: ESI-MS/MS

[Figure: tandem mass spectrometer: MS1 → collision cell → MS2]

Review: ESI-MS/MS

AEGKLRFK(biotin)

b1   A | EGKLRFK(biotin)   y7
b2   AE | GKLRFK(biotin)   y6
b3   AEG | KLRFK(biotin)   y5
b4   AEGK | LRFK(biotin)   y4
b5   AEGKL | RFK(biotin)   y3
b6   AEGKLR | FK(biotin)   y2
b7   AEGKLRF | K(biotin)   y1

http://www.abrf.org/JBT/2000/December00/dec00bibbs.html
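A sketch of the b/y ladder in Python, assuming singly charged fragments, b_i = (first i residues) + H+ and y_j = (last j residues) + H2O + H+. The biotin on the C-terminal Lys is left out for simplicity, so the masses are for unmodified AEGKLRFK; residue masses come from the monoisotopic table above and H+ = 1.00728 Da is assumed.

```python
RESIDUE_MASS = {"A": 71.03712, "E": 129.04264, "G": 57.02147, "K": 128.09497,
                "L": 113.08407, "R": 156.10112, "F": 147.06842}
WATER = 18.01056
PROTON = 1.00728  # assumed H+ mass

def b_and_y_ions(peptide: str) -> list[tuple[str, float, str, float]]:
    """Rows of (b label, b m/z, complementary y label, y m/z) for singly charged ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    rows = []
    for i in range(1, n):
        b = sum(masses[:i]) + PROTON          # N-terminal fragment
        y = sum(masses[i:]) + WATER + PROTON  # complementary C-terminal fragment
        rows.append((f"b{i}", b, f"y{n - i}", y))
    return rows

for b_label, b, y_label, y in b_and_y_ions("AEGKLRFK"):
    print(f"{b_label:>3} {b:9.4f}   {y_label:>3} {y:9.4f}")
```

Each b_i / y_(n-i) pair sums to the same value (all residues + H2O + 2 H+), which is a quick sanity check on a ladder.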

Review: MALDI Spectra

http://biop.ox.ac.uk/www/lj2000/endicott/endicott_7.html

• Generates Singly Charged Ions

• High Upper Detection Limit

Matching Spectra and Protein Sequences

[Figure: a protein digest undergoes mass analysis to give an experimental MS spectrum; sequences from a protein database undergo in silico digestion to give theoretical MS spectra; the experimental and theoretical spectra (%TIC vs. m/z) are compared]

Strategies for Matching Mass Spectra with Protein Sequences

• MALDI & Peptide Mass Fingerprinting

• MS/MS & Fragment Ion Searching

• MS/MS & Sequence Tag Searches

• MS/MS & De Novo Peptide Sequencing

Peptide Mass Fingerprinting

• Used to identify protein spots on gels or protein peaks from an HPLC run

• Depends on the fact that if a protein is cut up or fragmented in a known way, the resulting fragments (and their masses) are unique enough to identify the protein

• Requires a database of known sequences

• Uses software to compare observed masses with masses calculated from database

Principles of Fingerprinting


>Protein 2
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe

>Protein 3
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe

Sequence    Mass (M+H)   Tryptic Fragments
Protein 1   4842.05      acedfhsak | dfqeasdfpk | ivtmeeewendadnfek | qwfe
Protein 2   4842.05      acek | dfhsadfqeasdfpk | ivtmeeewenk | dadnfeqwfe
Protein 3   4842.05      acedfhsadfqek | asdfpk | ivtmeeewendak | dnfeqwfe

Principles of Fingerprinting

[Figure: the same three sequences (each M+H 4842.05) shown with their tryptic-peptide mass spectra, which differ from protein to protein]

Preparing a Peptide Mass Fingerprint Database

• Take a protein sequence database (Swiss-Prot or nr-GenBank)

• Determine cleavage sites and identify resulting peptides for each protein entry

• Calculate the mass (M+H) for each peptide

• Sort the masses from lowest to highest

• Store, with each calculated mass, a pointer to the accession number of its protein in the databank
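A minimal in-memory sketch of these steps in Python, using the three example proteins that follow. It assumes full tryptic cleavage, monoisotopic masses, H+ = 1.00728 Da and the > 400 Da cutoff; a real tool would use a far larger database and index. All names are hypothetical.

```python
import bisect
import re

RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER, PROTON = 18.01056, 1.00728  # PROTON: assumed H+ mass

def tryptic_digest(sequence: str) -> list[str]:
    """Cut after K/R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]

def build_pmf_index(proteins: dict[str, str], min_mass: float = 400.0) -> list[tuple[float, str]]:
    """Sorted (M+H, accession) pairs: each calculated mass points back to its protein."""
    index = []
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            mh = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
            if mh > min_mass:
                index.append((mh, accession))
    index.sort()  # lowest to highest mass
    return index

def lookup(index: list[tuple[float, str]], mass: float, tolerance: float) -> list[tuple[float, str]]:
    """Binary search for all entries within +/- tolerance Da of a query mass."""
    lo = bisect.bisect_left(index, (mass - tolerance, ""))
    hi = bisect.bisect_right(index, (mass + tolerance, "\uffff"))
    return index[lo:hi]

proteins = {
    "P12345": "acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe",
    "P21234": "acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe",
    "P89212": "acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe",
}
index = build_pmf_index(proteins)
print(lookup(index, 1007.5391, 0.15))
```

Sorting the mass list is what makes the lookup a binary search rather than a scan of every peptide.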

Building A PMF Database

Sequence DB:

>P12345
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
>P21234
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
>P89212
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe

Calculated tryptic fragments:

P12345: acedfhsak | dfqeasdfpk | ivtmeeewendadnfek | qwfe
P21234: acek | dfhsadfqeasdfpk | ivtmeeewenk | dadnfeqwfe
P89212: acedfhsadfqek | asdfpk | ivtmeeewendak | dnfeqwfe

Mass list (M+H, sorted):

450.2017 (P21234)
609.2667 (P12345)
664.3300 (P89212)
1007.4251 (P12345)
1114.4416 (P89212)
1183.5266 (P12345)
1300.5116 (P21234)
1407.6462 (P21234)
1526.6211 (P89212)
1593.7101 (P89212)
1740.7501 (P21234)
2098.8909 (P12345)

The Simplest Scoring Scheme: Peptide Counting

• Take a mass spectrum of a trypsin-cleaved protein (from a gel spot or an HPLC peak)

• Identify as many masses as possible in the spectrum (avoiding autolysis peaks)

• Compare the query masses with the database masses and calculate the number of matches or a matching score (based on length and mass difference)

• Rank proteins by number of hits and return the top-scoring entry: this is the protein of interest

Query vs. Database

Query masses: 450.2201, 609.3667, 698.3100, 1007.5391, 1199.4916, 2098.9909

Database mass list: 450.2017 (P21234), 609.2667 (P12345), 664.3300 (P89212), 1007.4251 (P12345), 1114.4416 (P89212), 1183.5266 (P12345), 1300.5116 (P21234), 1407.6462 (P21234), 1526.6211 (P89212), 1593.7101 (P89212), 1740.7501 (P21234), 2098.8909 (P12345)

Results: 2 unknown masses; 1 hit on P21234; 3 hits on P12345

Conclude that the query protein is P12345
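The counting scheme can be sketched in Python with the numbers above. The 0.15 Da matching tolerance is an assumption chosen so that the example reproduces the slide's result; each query mass adds at most one hit per protein.

```python
from collections import Counter

# (M+H, accession) pairs from the database mass list above
DATABASE = [
    (450.2017, "P21234"), (609.2667, "P12345"), (664.3300, "P89212"),
    (1007.4251, "P12345"), (1114.4416, "P89212"), (1183.5266, "P12345"),
    (1300.5116, "P21234"), (1407.6462, "P21234"), (1526.6211, "P89212"),
    (1593.7101, "P89212"), (1740.7501, "P21234"), (2098.8909, "P12345"),
]
QUERY = [450.2201, 609.3667, 698.3100, 1007.5391, 1199.4916, 2098.9909]

def count_hits(query, database, tolerance=0.15):
    """Count, per accession, how many query masses fall within tolerance (Da)."""
    hits = Counter()
    unknown = 0
    for mass in query:
        matched = {acc for db_mass, acc in database if abs(db_mass - mass) <= tolerance}
        if matched:
            hits.update(matched)  # one hit per protein for this query mass
        else:
            unknown += 1          # e.g. autolysis peaks or unexplained masses
    return hits, unknown

hits, unknown = count_hits(QUERY, DATABASE)
print(hits.most_common(), unknown)  # [('P12345', 3), ('P21234', 1)] 2
```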

Peptide Counting

• Works well for high quality data

• Gives higher scores to larger proteins

• PeptIdent: http://us.expasy.org/tools/peptident.html

• PepSea: http://pepsea.protana.com/PA_PepSeaForm.html

• MS-Fit: http://prospector.ucsf.edu/ucsfhtml3.2/msfit.htm

MOWSE

• MOlecular Weight SEarch

• Scoring based on the peptide frequency distribution in the OWL non-redundant database

Pappin DJC, Hojrup P, and Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327-332.



>Protein 3
MASMGTLAFD EYGRPFLIIK DQDRKSRLMG LEALKSHIMA AKAVANTMRT SLGPNGLDKM MVDKDGDVTV TNDGATILSM MDVDHQIAKL MVELSKSQDD EIGDGTTGVV VLAGALLEEA EQLLDRGIHP IRIAD

Sequence    Mass (M+H)   Tryptic Fragments
Protein 1   4842.05      acedfhsak | dfqeasdfpk | ivtmeeewendadnfek | qwfe
Protein 2   4842.05
Protein 3   14563.36     SQDDEIGDGTTGVVVLAGALLEEAEQLLDR, DGDVTVTNDGATILSMMDVDHQIAK, MASMGTLAFDEYGRPFLIIK, TSLGPNGLDK, LMGLEALK, LMVELSK, AVANTMR, SHIMAAK, GIHPIR, MMVDK, DQDR

MOWSE

>Protein 1
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfel
>Protein 2
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfekqwfei

MOWSE 2. For each protein, place fragments into 100 Da bins.

Mol. Wt.    Fragment
2098.8909   IVTMEEEWENDADNFEK
1183.5266   DFQEASDFPK
1007.4251   ACEDFHSAK
722.3508    QWFEL

1740.7500   DFHSADFQEASDFPK
1407.6460   IVTMEEEWENK
1456.6127   DADNFEQWFEK
722.3508    QWFEI

[Table: number of fragments in each 100 Da fragment-mass bin, for each 10 kDa protein-mass interval]

MOWSE 3. Divide the number of fragments in each bin by the total number of fragments in its 10 kDa protein interval.

[Table: bin frequencies]

MOWSE 4. For each 10 kDa interval, normalize to the largest bin value.

[Table: normalized frequency scores]
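A Python sketch of steps 2 to 4, using only the two example proteins above as the 'database' (a real MOWSE table is built from a whole sequence database). Protein 1's molecular weight is not given on this slide, so 4842.05 is borrowed from the earlier example; any value below 10 kDa gives the same result. Looking up the three masses matched in step 5 (bins 17, 14 and 7) gives 0.5, 1 and 1. Names are hypothetical.

```python
from collections import defaultdict

def mowse_frequency_matrix(proteins):
    """proteins: list of (protein_mw, [fragment masses]).

    Returns {(protein_interval, fragment_bin): normalized frequency}, with
    10 kDa protein-mass intervals and 100 Da fragment-mass bins.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for protein_mw, fragments in proteins:
        interval = int(protein_mw // 10_000)
        for frag in fragments:
            counts[(interval, int(frag // 100))] += 1  # step 2: bin the fragments
            totals[interval] += 1
    # Step 3: fraction of all fragments in the same 10 kDa interval.
    freqs = {key: n / totals[key[0]] for key, n in counts.items()}
    # Step 4: normalize each interval to its largest bin value.
    largest = defaultdict(float)
    for (interval, _), f in freqs.items():
        largest[interval] = max(largest[interval], f)
    return {key: f / largest[key[0]] for key, f in freqs.items()}

example = [
    (4842.05, [2098.8909, 1183.5266, 1007.4251, 722.3508]),  # Protein 1 (MW assumed)
    (5672.48, [1740.7500, 1407.6460, 1456.6127, 722.3508]),  # Protein 2
]
matrix = mowse_frequency_matrix(example)
print(matrix[(0, 17)], matrix[(0, 14)], matrix[(0, 7)])
```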

MOWSE 5. Compare the spectrum masses against the fragment mass list for each protein in the database. Retrieve the frequency score for each match and multiply.

[Table: normalized frequency scores looked up for each matched mass]

Matched masses: 1740.7500, 1456.6127, 722.3508

Product of frequency scores: 0.5 x 1 x 1 = 0.5

MOWSE 6. Invert the product and normalize it to an 'average' protein of 50 000 Da:

$$\text{Score} = \frac{50\,000}{P_N \times H}$$

P_N = product of the distribution frequency scores = 0.5 x 1 x 1 = 0.5

H = 'hit' protein MW = 5672.48

Score = 50 000 / (0.5 x 5672.48) ≈ 17.63
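Step 6 as a short Python function, reproducing the worked example; the function name is hypothetical.

```python
import math

def mowse_score(matched_frequencies, hit_protein_mw, average_mw=50_000.0):
    """MOWSE step 6: 50 000 / (product of frequency scores x hit protein MW)."""
    pn = math.prod(matched_frequencies)
    return average_mw / (pn * hit_protein_mw)

print(round(mowse_score([0.5, 1.0, 1.0], 5672.48), 2))  # 17.63
```

Rarer matches (smaller frequency scores) shrink the denominator and so raise the score, while the division by H is what compensates for protein size.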


MOWSE

• Takes into account the relative abundance of peptides in the database when calculating scores.

• Compensates for protein size.

• The model consists of numerous bins spaced 100 Da apart (roughly the average amino acid residue mass).

• Does not provide a measure of confidence for the prediction.

• MOWSE: http://www.hgmp.mrc.ac.uk/Bioinformatics/Webapp/mowse/

• MS-Fit: http://prospector.ucsf.edu/ucsfhtml3.2/msfit.htm

MASCOT

• Probability-based MOWSE

• For each protein in the sequence database, the probability that the observed match between the experimental data and the protein sequence is a random event is approximately calculated. Details of the probability model are not published.

Perkins DN, Pappin DJC, Creasy DM, and Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567.

Extreme Value Distribution

[Figure: extreme value distribution; frequency axis 0 to 8000, score bins from <20 to >120]

$$P(x) = 1 - e^{-e^{-x}}$$
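A Python sketch of this tail probability. On the slide x is already a standardized score; the location `mu` and scale `beta` parameters are assumptions added to show how a raw score would be standardized, for example after fitting random-match scores.

```python
import math

def evd_tail_probability(x: float, mu: float = 0.0, beta: float = 1.0) -> float:
    """P(score >= x) under an extreme value (Gumbel) distribution: 1 - exp(-exp(-z))."""
    z = (x - mu) / beta  # standardized score; mu and beta are assumed fit parameters
    # -expm1(-t) equals 1 - exp(-t) but stays accurate when t is tiny (large z).
    return -math.expm1(-math.exp(-z))

for x in (0.0, 2.0, 5.0):
    print(x, evd_tail_probability(x))
```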

Mascot/Mowse Scoring

• The Mascot score is given as S = -10 log10(P), where P is the probability that the observed match is a random event.

• Aim for probabilities where P < 0.05 (less than a 5% chance that the peptide mass match is random).
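Converting between the Mascot score and the random-match probability, as a Python sketch of the formula above (function names hypothetical). Under this formula, P < 0.05 corresponds to S > 13.

```python
import math

def mascot_score(p: float) -> float:
    """S = -10 * log10(P)."""
    return -10.0 * math.log10(p)

def random_match_probability(score: float) -> float:
    """Inverse of the score formula: P = 10 ** (-S / 10)."""
    return 10.0 ** (-score / 10.0)

print(round(mascot_score(0.05), 1))  # 13.0
```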

ProFound

• Uses a Bayesian probability model

• Takes into account the individual properties of each protein in the database

Bayes Theorem

• Describes the probability of some event given that some other event has already occurred (conditional probability).

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

• “The probability of some event A occurring given that event B has occurred is equal to the probability of event B occurring given that event A has occurred, multiplied by the probability of event A occurring and divided by the probability of event B occurring”.

Here P(B|A) is the likelihood, P(A) the prior probability, and P(A|B) the posterior probability.

Bayes Theorem

• Example:

• 0.1% of women aged 40 have breast cancer.

• For 40 y/o women with cancer, a mammography will show positive 95% of the time.

• For 40 y/o women without cancer, a mammography will show positive 10% of the time.

• A 40 y/o woman tests positive by mammography for breast cancer.

• What is the probability she really does have breast cancer?

Bayes Theorem

P(Disease) = 0.001; P(No disease) = 0.999

P(Positive test | Disease) = 0.95; P(Negative test | Disease) = 0.05

P(Negative test | No disease) = 0.90; P(Positive test | No disease) = 0.10

$$P(\text{Disease} \mid \text{Positive test}) = \frac{P(\text{Positive test} \mid \text{Disease})\, P(\text{Disease})}{P(\text{Positive test})}$$

P(Positive test) = P(Disease) P(Positive test | Disease) + P(No disease) P(Positive test | No disease) = 0.001 x 0.95 + 0.999 x 0.10 = 0.10085 ≈ 0.101

P(Disease | Positive test) = (0.95 x 0.001) / 0.101 ≈ 0.0094 (less than 1%).
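The same calculation in Python, as a sketch; the function name is hypothetical.

```python
def posterior(prior: float, p_pos_given_disease: float, p_pos_given_healthy: float) -> float:
    """Bayes' theorem for a positive test result."""
    # P(Positive test): sum over the two ways of testing positive
    evidence = prior * p_pos_given_disease + (1 - prior) * p_pos_given_healthy
    return p_pos_given_disease * prior / evidence

print(round(posterior(0.001, 0.95, 0.10), 4))  # 0.0094
```

The low posterior comes from the small prior: almost all positive tests come from the 99.9% of women without cancer.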

Bayes Theorem

• The Product Rule: P(AB|C) = P(B|AC) P(A|C)

Example: Draw a card from a deck of playing cards. What is the probability that it is the king of clubs?

A = King, B = Clubs, C = Deck

P(King, Clubs | Deck) = P(Clubs | King, Deck) P(King | Deck) = 1/4 x 1/13 = 1/52

Bayes Theorem and PMF

$$P(k \mid DI) = \frac{P(D \mid kI)\, P(k \mid I)}{P(D \mid I)}$$

[Figure: measured peptide mass spectrum (%TIC vs. m/z) with peaks m1 to m15]

In this equation:

k: the hypothesis “protein k is the protein being analyzed”

D: the experimental data, m1 ... mn

I: the available background information (species, approximate mass of the parent protein, cleavage enzyme, mass accuracy, etc.)

P(k|DI): the posterior probability that the hypothesis is true given the data D and the background information I

P(k|I): the prior probability of the hypothesis given the background information I

P(D|kI): the likelihood probability that the data D would be observed if the hypothesis were true

P(D|I): a normalization constant, independent of k
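A sketch, in Python, of how the normalization constant P(D|I) works: it is the sum of P(D|kI) P(k|I) over all candidate proteins, so dividing by it makes the posteriors sum to 1. The priors and likelihoods below are made-up illustrative numbers, with a zero prior for a protein that fails the background constraints; the function name is hypothetical.

```python
def protein_posteriors(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """P(k|DI) = P(D|kI) P(k|I) / P(D|I), with P(D|I) summed over all candidates k."""
    numerators = {k: likelihoods[k] * priors[k] for k in priors}
    evidence = sum(numerators.values())  # P(D|I): the same for every k
    return {k: value / evidence for k, value in numerators.items()}

# Hypothetical example: uniform prior over proteins that satisfy the constraints,
# zero prior for one that does not (e.g. wrong molecular weight).
priors = {"P12345": 0.5, "P21234": 0.5, "P89212": 0.0}
likelihoods = {"P12345": 3e-4, "P21234": 1e-4, "P89212": 5e-4}
print(protein_posteriors(priors, likelihoods))
```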

Bayes Theorem and PMF

The prior probability P(k|I) is:

• Zero for every hypothesis that doesn't satisfy the background information (protein molecular weight, cleavage enzyme, species, etc.)

• Otherwise, one of 2 possibilities:

1. A uniform probability for all hypotheses that satisfy the constraints (all proteins that have the correct MW and are cleaved with the correct enzyme), and therefore a constant.

2. The prior probability from a previous experiment (i.e., multiple digestions).

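Bayes Theorem and PMF

$$P(k \mid I) = \begin{cases} 0 & \text{if } k \text{ does not satisfy the constraints} \\ \text{constant} & \text{if no previous data are available} \\ P(k \mid D_{prev}\, I) & \text{if previous data } D_{prev} \text{ are available} \end{cases}$$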


P(D|kI): the likelihood probability that the data D would be observed if the hypothesis were true.

[Figure: theoretical spectrum of protein k (%TIC vs. m/z) compared with the measured spectrum, peaks m1 to m15]

The measured masses fall into 2 subsets: hits (H) and misses (M).

The product rule, P(AB|C) = P(B|AC) P(A|C), can be used to factor the data into the probability for the hits and the probability for the misses given the hits:

$$P(D \mid kI) = P(D_H D_M \mid kI) = P(D_M \mid D_H\, kI)\, P(D_H \mid kI)$$

Likelihood probability for hits

• Factor it as a product of probabilities for individual hits by applying the product rule:

$$P(D_H \mid kI) = P(m_1 \ldots m_r \mid kI) = \prod_{i=1}^{r} P(m_i \mid kI\, m_1 \ldots m_{i-1})$$

Example: 3 hits, r = 3

Set m1 to 'A', m2m3 to 'B', and kI to 'C':

$$P(m_1 m_2 m_3 \mid kI) = P(m_2 m_3 \mid kI\, m_1)\, P(m_1 \mid kI)$$

Now set m2 to 'A', m3 to 'B', and kIm1 to 'C':

$$P(m_2 m_3 \mid kI\, m_1) = P(m_3 \mid kI\, m_1 m_2)\, P(m_2 \mid kI\, m_1)$$

Therefore:

$$P(m_1 m_2 m_3 \mid kI) = P(m_3 \mid kI\, m_1 m_2)\, P(m_2 \mid kI\, m_1)\, P(m_1 \mid kI)$$

For i = 1 there are no previous hits, so the empty set of previous hits is defined as '1' (always true).

Each factor is the logical product of 2 hypotheses: 1) the i-th hit (H_i) originates from a particular peptide in protein k, and 2) its measured mass is m_i.

Therefore:

$$P(m_i \mid kI\, m_1 \ldots m_{i-1}) = P(H_i\, m_i \mid kI\, H_1 \ldots H_{i-1})$$

The product rule can be applied to separate H_i and m_i:

[Figure: a measured peak m_i (the hit H_i) and its theoretical mass m_i0]

$$P(H_i\, m_i \mid kI\, H_1 \ldots H_{i-1}) = P(m_i \mid H_i\, kI\, H_1 \ldots H_{i-1})\, P(H_i \mid kI\, H_1 \ldots H_{i-1})$$

[Figure: the spectrum with peaks m1 to m15, after i-1 of the N peptides of protein k have been matched as hits]

P(H_i | kI H_1 ... H_{i-1}): the probability for the i-th measured peptide to be a hit, given protein k and i-1 previous hits.

[Figure: a hit's measured mass m_i next to its theoretical mass m_i0]

P(m_i | H_i kI H_1 ... H_{i-1}): the probability for the measured mass value to be m_i, given its theoretical mass m_i0.



Measured masses are normally distributed around the theoretical mass, with standard deviation σ:

$$P(m_i \mid H_i\, kI\, H_1 \ldots H_{i-1}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(m_i - m_{i0})^2}{2\sigma^2}\right)$$

Bayes Theorem and PMF

If more than one potential theoretical match lies within the tolerance of the measured mass, the mass term for the i-th hit becomes a weighted sum over the j potential matches m_ij0:

[Figure: a measured mass m_i with several potential theoretical matches m_ij0 within tolerance]

$$P(m_i \mid H_i\, kI\, H_1 \ldots H_{i-1}) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{j} g_{ij} \exp\left(-\frac{(m_i - m_{ij0})^2}{2\sigma^2}\right)$$

Probability for hits for all masses: the product of the individual hit probabilities (the probability that the i-th measured peptide is a hit, times the probability that the masses match), modified for multiple possible matches:

$$P(D_H \mid kI) = \prod_{i=1}^{r} \left[ P(H_i \mid kI\, H_1 \ldots H_{i-1}) \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{j} g_{ij} \exp\left(-\frac{(m_i - m_{ij0})^2}{2\sigma^2}\right) \right]$$

r: the number of observed hits

N: the total number of peptides

m_i: the measured mass of the i-th peptide

m_ij0: the theoretical mass for the i-th hit (j-th potential match)

σ: the measured mass standard deviation
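A Python sketch of the 'probability that the masses match' part of this product: a normal density around each theoretical mass, summed over the potential matches within tolerance. The hit-count factor P(H_i | ...) is left out, and equal weights g_ij (1 / number of candidates) are an assumption used when no weights are supplied. Names are hypothetical.

```python
import math

def mass_match_probability(measured, candidates, sigma, weights=None):
    """Product over hits of (1/(sqrt(2*pi)*sigma)) * sum_j g_ij * exp(-(m_i - m_ij0)^2 / (2*sigma^2)).

    measured   -- measured masses m_i of the r hits
    candidates -- for each hit, the theoretical masses m_ij0 within tolerance
    weights    -- g_ij for each hit; equal weights are assumed if omitted
    """
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 1.0
    for i, (m_i, theo) in enumerate(zip(measured, candidates)):
        g = weights[i] if weights is not None else [1.0 / len(theo)] * len(theo)
        total *= norm * sum(g_ij * math.exp(-(m_i - m0) ** 2 / (2.0 * sigma ** 2))
                            for g_ij, m0 in zip(g, theo))
    return total

print(mass_match_probability([1007.43, 609.27], [[1007.4251], [609.2667, 609.30]], sigma=0.05))
```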

Likelihood probability for misses given the hits, P(D_M | D_H kI)

What are misses?

• The remaining w measured masses that cannot be accounted for by the protein sequence.

• They arise from errors in the protein sequence, unknown modifications, or unexpected cleavage.

• They are treated as “modified peptides”.

[Figure: the spectrum with peaks m1 to m15, divided into hits and misses]

• The total number of peptides in protein k is N

• The number of misses is w

• All misses are 'modified peptides'

• The number of modified peptides is J, which is between w and N-r (i.e., J includes unobserved modified peptides).

The probability for all misses can be factored like this:

$$P(D_M \mid D_H\, kI) = \sum_{J=w}^{N-r} P(J \mid kI\, D_H) \prod_{j=1}^{w} P(M_j \mid kI\, J\, D_H\, M_1 \ldots M_{j-1})\, P(m_{r+j} \mid M_j\, kI)$$

P(J | kI D_H): the probability for there being J modified peptides, given protein k and r observed hits.

P(M_j | kI J D_H M_1 ... M_{j-1}): the probability for observing a modified peptide, given protein k, J modified peptides, and r hits plus j-1 misses already observed.

[Figure: after r hits and j-1 misses, N-r-(j-1) peptides remain available, of which J-(j-1) are remaining unobserved modified peptides]

$$P(M_j \mid kI\, J\, D_H\, M_1 \ldots M_{j-1}) = \frac{J-(j-1)}{N-r-(j-1)}$$

The likelihood probability for the modified peptide to have a measured mass m_{r+j}, anywhere between m_min and m_max:

$$P(m_{r+j} \mid M_j\, kI) = \frac{1}{m_{max} - m_{min}}, \quad j = 1 \ldots w$$

Probability for all misses:

$$P(D_M \mid D_H\, kI) = \sum_{J=w}^{N-r} P(J \mid kI\, D_H) \prod_{j=1}^{w} \frac{J-(j-1)}{N-r-(j-1)} \cdot \frac{1}{m_{max} - m_{min}}$$
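A Python sketch of the misses likelihood following this factorization: a sum over the possible numbers J of modified peptides (w to N-r), a product of (J-(j-1))/(N-r-(j-1)) factors, and one uniform mass factor 1/(m_max - m_min) per miss. The uniform default for P(J | kI D_H) is an assumption made only for illustration; names are hypothetical.

```python
def misses_probability(n_peptides, r_hits, w_misses, m_min, m_max, p_j=None):
    """Likelihood of the w misses given the r hits.

    p_j -- optional function J -> P(J | kI D_H); assumed uniform over
           J = w ... N - r when omitted (illustrative assumption).
    """
    j_values = range(w_misses, n_peptides - r_hits + 1)  # allowed numbers of modified peptides
    if len(j_values) == 0:
        return 0.0  # more misses than peptides left to explain them
    uniform = 1.0 / len(j_values)
    mass_term = (1.0 / (m_max - m_min)) ** w_misses  # one uniform mass factor per miss
    total = 0.0
    for J in j_values:
        prod = 1.0
        for j in range(1, w_misses + 1):
            # J-(j-1) unobserved modified peptides among N-r-(j-1) available ones
            prod *= (J - (j - 1)) / (n_peptides - r_hits - (j - 1))
        total += (p_j(J) if p_j is not None else uniform) * prod
    return total * mass_term

print(misses_probability(n_peptides=10, r_hits=4, w_misses=1, m_min=500.0, m_max=3500.0))
```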

ProFound (PMF)

• Bayesian approach, considered to be the most coherent, consistent and efficient of the statistical methods.

• Scores reflect the confidence level of the hypothesis that protein k is the sample protein based on the given information

• Scores improve with additional information (tag information)

• Can identify simple mixtures of proteins by fusing single proteins pairwise, in groups of three and so on.

ProFound Results

Advantages of PMF

• Uses a “robust” & inexpensive form of MS (MALDI)

• Doesn’t require too much sample optimization

• Can be done by a moderately skilled operator (don’t need to be an MS expert)

• Widely supported by web servers

• Improves as DB’s get larger & instrumentation gets better

• Very amenable to high throughput robotics (up to 500 samples a day)

Limitations With PMF

• Requires that the protein of interest already be in a sequence database

• Not suitable for searching EST databases

• Typically not all predicted peptides are detected

– Poor solubility

– Selective ionization

– Short peptide length

– Post-translational modification

– Unexpected cleavage

– Contamination

• Spurious peaks, or missing critical mass peaks, always lead to problems.

Limitations With PMF

• Not suitable for identification of proteins in complex mixtures if unseparated mixtures are proteolyzed

• Mass accuracy is critical; it is best to have better than 20 ppm mass accuracy.

• Generally found to be only about 40% effective at positively identifying gel spots

MS-MS and Fragment Ion Searching

• Provides precise sequence-specific data

• More informative than PMF

• Can be used for de novo sequencing

• Can be used to identify post-translational modifications.

SEQUEST

• Compares predicted MS-MS spectra against observed daughter ion spectra to identify and rank matches

Yates JR III, Eng JK, McCormack AL, and Schieltz D (1995) Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem. 67:1426-1436.

SEQUEST

[Figure: the observed MS/MS spectrum (%TIC vs. m/z) of the peptide ITTTYEEPTK is compared with theoretical MS/MS spectra produced by in silico digestion and in silico fragmentation of sequences from a protein database]

SEQUEST

Rank/Sp   (M+H)+      deltCn   XCorr    Sp       Ions    Reference   Peptide
1 / 1     2313.4863   0.0000   5.7752   2729.8   30/46   YOL086C     K.ATDGGAHGVINVSVSEAAIEASTR.Y
2 / 42    2313.3834   0.5288   2.7211   401.1    14/38   YDR409W     N.LMNDNDDDDDDRLMAEITSN.H
3 / 5     2311.5780   0.5544   2.5736   693.0    16/36   YLR058C     M.TTRGM*GEEDFHRIVQYINK.A
4 / 343   2313.8718   0.5605   2.5385   261.3    12/38   YMR173W-A   L.PTRRRVLMVPATTIRMVLTT.M
5 / 127   2314.7051   0.5681   2.4942   323.4    13/40   YPL168W     T.KFSAMEINLITSLVRGYKGEG.K

• Identify database peptides that match the parent mass

• Keep the 200 most intense peaks from the MS/MS spectrum

• Compare these fragment ions against the theoretical MS/MS spectrum from the database peptide and generate a preliminary score (Sp) based on the number of matching ions (Ions).

• Perform a cross-correlation analysis (XCorr) on the 500 peptides with the top preliminary scores.

• Sort the candidate peptides by XCorr.

Interpreting SEQUEST Output

Sp: the preliminary score, based on the number of matching ions. The higher the better. Larger peptides have bigger Sp values: a 20-residue peptide should have an Sp > 1000, and a 6-residue peptide an Sp > 500.


Rank/Sp: the ranked Sp. The first number is the current (final) rank (1, 2, 3, 4, 5); the second number is the preliminary (Sp) ranking.

Be wary of Rank/Sp values that move up dramatically (e.g. 4/343).

Ideally, look for 1/1 for a good hit.


DeltCn: the delta correlation value. It tells you how different the first hit is from the subsequent hits; DeltCn > 0.1 indicates a good top hit.
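The deltCn column in the example output is consistent with DeltCn = (XCorr_1 - XCorr_n) / XCorr_1, where XCorr_1 is the top hit's XCorr. A short Python sketch that reproduces the column (to within rounding) from the XCorr values; the function name is hypothetical.

```python
def delta_cn(xcorr_top: float, xcorr_other: float) -> float:
    """DeltCn of a candidate relative to the top-ranked hit: (XCorr1 - XCorr) / XCorr1."""
    return (xcorr_top - xcorr_other) / xcorr_top

xcorrs = [5.7752, 2.7211, 2.5736, 2.5385, 2.4942]  # XCorr column of the example output
print([round(delta_cn(xcorrs[0], x), 4) for x in xcorrs])
```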


XCorr: the cross-correlation value from the search, used to produce the final ranking. XCorr > 2.0 usually indicates a good hit. XCorr increases with peptide size: for a 20-residue peptide, look for XCorr > 5; for a 6-residue peptide, look for XCorr > 1.5.


Ions: how many of the (top 200 most intense) experimental ions matched theoretical ions.

70% or 80% coverage is good.

SEQUEST Summary

• Gives a concise overview of a batch of search results without having to look at each individual SEQUEST output file.

• Performs protein identification by noting which proteins are most prevalent in a set of SEQUEST output results.

SEQUEST Summary

[Figure: SEQUEST summary table. Annotated columns: MS spectrum number; total ion current (> 5E+5); result file number; charge state; delta mass (Exp. - Theory)]

SEQUEST Summary

[Figure: SEQUEST summary table. Annotated columns: experimental mass; XCorr (> 2.0); DeltCn (> 0.2); Sp (preliminary score); RSp (< 10)]

SEQUEST Summary

[Figure: SEQUEST summary table. Annotated columns: matching ions (> 70%); accession number; database occurrences; peptide]

SEQUEST Summary

A “Prevalent” Protein

How many times the protein appeared in the SEQUEST output files in the 1st (top scoring) position, 2nd position, ..., down to the 5th position.

Consensus Score = 10x8 + 8x1 + 6x0 + 4x0 + 2x0 + 1x0 = 88
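The consensus score as a weighted count, in Python. The weights (10, 8, 6, 4, 2, 1) are taken from the worked example; the function name is hypothetical.

```python
POSITION_WEIGHTS = [10, 8, 6, 4, 2, 1]  # weights as used in the worked example

def consensus_score(position_counts):
    """Weighted sum of how often a protein appears at each rank position."""
    return sum(w * c for w, c in zip(POSITION_WEIGHTS, position_counts))

print(consensus_score([8, 1, 0, 0, 0, 0]))  # 88
```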

SEQUEST

• Popular

• Uses heuristics to score results

• Output is complicated and requires user judgment to assess the validity of a result.

• Confidence cannot be assessed numerically.

PeptideProphet and Protein Prophet

PeptideProphet

• Reads in SEQUEST summary HTML files.

• http://peptideprophet.sourceforge.net/

PeptideProphet

• Validates peptide assignments to MS/MS spectra from SEQUEST (and other search engines).

• Looks at the distributions of search scores and peptide properties among correct and incorrect peptides:

– Number of termini compatible with enzymatic cleavage (for unconstrained searches)

– Mass differences with respect to the precursor ion

• Uses those distributions to compute the probability that each assignment is correct

PeptideProphet

• Performed an experiment to identify SEQUEST hits and misses:

• Prepared a sample of 18 control proteins from various organisms (bovine, chicken, rabbit, E. coli, S. cerevisiae, and B. licheniformis).

– Appended the database sequences for the control proteins to a database of Drosophila proteins.

– Searched the modified database with the control-protein MS/MS spectra using SEQUEST.

– All identifications from Drosophila are 'misses'.

PeptideProphet

• Performed discriminant function analysis to weight the various SEQUEST scores according to their ability to discriminate hits from misses:

$$F(XCorr, RankSP, Ions, \Delta Cn, \Delta Mass) = c_0 + c_1\,XCorr + c_2\,RankSP + c_3\,Ions + c_4\,\Delta Cn + c_5\,\Delta Mass$$

• Actually used a transformation of the XCorr score (XCorr') to achieve better discrimination and to reduce the dependence of XCorr on peptide length:

$$F(XCorr', RankSP, Ions, \Delta Cn, \Delta Mass) = c_0 + c_1\,XCorr' + c_2\,RankSP + c_3\,Ions + c_4\,\Delta Cn + c_5\,\Delta Mass$$

$$XCorr' = \begin{cases} \dfrac{\ln(XCorr)}{\ln(N_L)}, & \text{if } L < L_C \\[4pt] \dfrac{\ln(XCorr)}{\ln(N_C)}, & \text{if } L \ge L_C \end{cases}$$

L = number of amino acids; N_L = number of expected fragment ions

L_C = length threshold beyond which XCorr is treated as length-independent

N_C = corresponding number of expected fragment ions (at L = L_C)
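Both formulas can be written out directly. This is only a sketch: the coefficients c0 to c5 are placeholders that would have to come from the discriminant analysis, and `n_expected_ions` is a hypothetical caller-supplied function for N_L. The usage example assumes N_L = 2(L - 1) singly charged b and y ions, which is an illustrative assumption, not the published definition.

```python
import math


def xcorr_prime(xcorr, length, length_cutoff, n_expected_ions):
    """Length-corrected XCorr.

    XCorr' = ln(XCorr) / ln(N_L) if L < L_C, else ln(XCorr) / ln(N_C),
    where N_C = n_expected_ions(L_C). n_expected_ions is supplied by the caller.
    """
    effective_length = length if length < length_cutoff else length_cutoff
    return math.log(xcorr) / math.log(n_expected_ions(effective_length))


def discriminant_score(xcorr_p, rank_sp, ions, delta_cn, delta_mass, coeffs):
    """F = c0 + c1*XCorr' + c2*RankSP + c3*Ions + c4*dCn + c5*dMass."""
    c0, c1, c2, c3, c4, c5 = coeffs
    return c0 + c1 * xcorr_p + c2 * rank_sp + c3 * ions + c4 * delta_cn + c5 * delta_mass


def b_y_ion_count(length):
    """Hypothetical N_L: singly charged b and y ions, 2 * (L - 1)."""
    return 2 * (length - 1)


# Placeholder coefficients, not fitted values.
xp = xcorr_prime(2.5, length=8, length_cutoff=10, n_expected_ions=b_y_ion_count)
print(discriminant_score(xp, rank_sp=1, ions=0.6, delta_cn=0.3, delta_mass=0.01,
                         coeffs=(0.0, 8.0, -0.2, 1.0, 5.0, -0.5)))
```

Passing N_L in as a function keeps the length correction separate from any particular fragment-ion model.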

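PeptideProphet

• Plotted the distributions of correct (+) and incorrect (-) hits as a function of the discriminant score F. Correct hits are modeled with a normal distribution and incorrect hits with a gamma distribution:

$$P(F \mid +) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(F-\mu)^2}{2\sigma^2}}$$

$$P(F \mid -) = \frac{(F-\gamma)^{\alpha-1}\, e^{-(F-\gamma)/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$$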

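PeptideProphet

• The probability of a correct result, given the discriminant score, is calculated using our old friend, Bayes' law:

$$P(+ \mid F) = \frac{P(F \mid +)\,P(+)}{P(F \mid +)\,P(+) + P(F \mid -)\,P(-)}$$

• With the priors P(+) and P(-) taken in proportion to the total numbers of correct and incorrect assignments:

$$P(+ \mid F) = \frac{\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(F-\mu)^2}{2\sigma^2}} \cdot \text{Total Correct}}{\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(F-\mu)^2}{2\sigma^2}} \cdot \text{Total Correct} + \frac{(F-\gamma)^{\alpha-1} e^{-(F-\gamma)/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \cdot \text{Total Incorrect}}$$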
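The normal/gamma mixture above translates into a few lines of Python. A sketch, assuming the normal parameters (mu, sigma), the gamma parameters (alpha, beta, gamma) and the totals of correct and incorrect assignments have already been estimated; the function names are illustrative, not PeptideProphet's own code.

```python
import math


def normal_pdf(f, mu, sigma):
    """p(F|+): discriminant scores of correct assignments, modeled as normal."""
    return math.exp(-((f - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def gamma_pdf(f, alpha, beta, gamma):
    """p(F|-): scores of incorrect assignments, modeled as a gamma shifted by gamma."""
    x = f - gamma
    if x <= 0:
        return 0.0  # outside the support of the shifted gamma distribution
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def prob_correct(f, mu, sigma, alpha, beta, gamma, total_correct, total_incorrect):
    """P(+|F) by Bayes' law, with the class totals standing in for P(+) and P(-)."""
    pos = normal_pdf(f, mu, sigma) * total_correct
    neg = gamma_pdf(f, alpha, beta, gamma) * total_incorrect
    if pos + neg == 0:
        return 0.0  # both densities underflowed; treat as no evidence of correctness
    return pos / (pos + neg)


# Hypothetical parameter values, for illustration only.
print(prob_correct(3.0, mu=4.0, sigma=1.0, alpha=2.0, beta=0.8, gamma=-2.0,
                   total_correct=300, total_incorrect=700))
```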

PeptideProphet

• Adding extra information to improve the score: number of tryptic termini (NTT)

– The majority of correctly assigned peptides have 2 tryptic termini:

• A.KMCDPTYR.F

– The majority of incorrectly assigned peptides have 0 tryptic termini:

• AGMCDPTYHF

• This information can be used to improve the score

PeptideProphet

• Examine the training set data for the relationship between predictions and NTT:

– Correct: NTT0 = .03, NTT1 = .28, NTT2 = .69

– Incorrect: NTT0 = .80, NTT1 = .19, NTT2 = .01

• Modify the scoring scheme, e.g. for NTT = 2:

$$P(+ \mid F, NTT{=}2) = \frac{\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(F-\mu)^2}{2\sigma^2}} \cdot \text{Total Correct} \cdot 0.69}{\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(F-\mu)^2}{2\sigma^2}} \cdot \text{Total Correct} \cdot 0.69 + \frac{(F-\gamma)^{\alpha-1} e^{-(F-\gamma)/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \cdot \text{Total Incorrect} \cdot 0.01}$$
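Adding NTT multiplies each class by its NTT frequency, which treats NTT as independent of F within each class (the product form of the equation above implies this). A minimal sketch using the training-set frequencies listed above; p(F|+) and p(F|-) are passed in already evaluated, and the function name is hypothetical:

```python
# Fraction of training-set assignments with 0, 1 or 2 tryptic termini (from the slide).
NTT_GIVEN_CORRECT = {0: 0.03, 1: 0.28, 2: 0.69}
NTT_GIVEN_INCORRECT = {0: 0.80, 1: 0.19, 2: 0.01}


def prob_correct_with_ntt(p_f_pos, p_f_neg, total_correct, total_incorrect, ntt):
    """P(+|F, NTT), multiplying each class by its NTT frequency.

    p_f_pos, p_f_neg: p(F|+) and p(F|-) already evaluated at this spectrum's score F.
    """
    pos = p_f_pos * total_correct * NTT_GIVEN_CORRECT[ntt]
    neg = p_f_neg * total_incorrect * NTT_GIVEN_INCORRECT[ntt]
    return pos / (pos + neg) if pos + neg > 0 else 0.0


# Equal densities and totals: the NTT term alone moves the probability away from 0.5.
for ntt in (0, 1, 2):
    print(ntt, round(prob_correct_with_ntt(0.2, 0.2, 100, 100, ntt), 3))
```

With equal densities and totals this prints roughly 0.036, 0.596 and 0.986 for NTT = 0, 1 and 2, which shows how strongly the NTT evidence alone shifts the result.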

PeptideProphet

• PeptideProphet uses an expectation-maximization (EM) algorithm to adapt the distributions of correct and incorrect assignments learned from the training set to each real dataset.

PeptideProphet


http://www.proteomecenter.org/course/20040113-Day2.pdf

ProteinProphet

• Reads in PeptideProphet results

• Calculates the probability that the peptides identified from PeptideProphet correspond to identified proteins from a protein database.

• http://proteinprophet.sourceforge.net/

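ProteinProphet

Assuming each peptide assignment to a spectrum is independent evidence for its corresponding protein, the protein probability can be calculated as:

$$P = 1 - \prod_i \left(1 - \max_j p(+ \mid D_i^j)\right)$$

where i runs over the distinct peptides assigned to the protein and j over the spectra assigned to peptide i.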
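A direct transcription of this formula, assuming the peptide probabilities are grouped by peptide (one list of spectrum-level probabilities per distinct peptide); the input layout and function name are hypothetical:

```python
def protein_probability(spectrum_probs_by_peptide):
    """P = 1 - prod_i (1 - max_j p(+|D_i^j)).

    spectrum_probs_by_peptide: peptide i -> [p(+|D_i^j) for each spectrum j].
    """
    prob_all_incorrect = 1.0
    for probs in spectrum_probs_by_peptide.values():
        # Only the best spectrum for each peptide counts as evidence.
        prob_all_incorrect *= 1.0 - max(probs)
    return 1.0 - prob_all_incorrect


print(protein_probability({"PEPTIDEA": [0.9, 0.5], "PEPTIDEB": [0.5]}))
```

Here the protein is wrong only if every peptide assignment is wrong: 1 - (0.1 × 0.5), or about 0.95.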

ProteinProphet

Adjusting for observed peptide grouping:

• Correct peptide assignments tend to correspond to “multihit” proteins.

• Incorrect peptide assignments tend to correspond to proteins with no other hits.

MRNSYRFLASSLSVVVSLLLIPEDVCEKIIGGNEVTPHSRPYMVLLSLDRKTICAGALIKDWVLTAAHCNLNRSQVILGAHSITTTYEEPTKQIMLVKKEFPYPCYDPATREGDLKLL

MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKMMVDKDGDVTVTNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEE

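$$NSP_i = \sum_{m \ne i} P(+ \mid D_m)$$

NSP_i (the estimated number of sibling peptides of peptide i) sums the probabilities of the other peptides m assigned to the same protein:

IIGGNEVTPHSR = .91

TICAGALIK = .65

ITTTYEEPTK = .85

NSP(EGDLK) = .91 + .65 + .85 = 2.41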
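Since NSP_i is the protein's total peptide probability minus peptide i's own, every NSP value for one protein can be computed in a single pass. A sketch; the function name is hypothetical, and the probability 0.30 for EGDLK is made up for the example because the slide does not give it (it does not affect EGDLK's own NSP):

```python
def sibling_peptide_scores(peptide_probs):
    """NSP_i = sum of P(+|D_m) over the other peptides m of the same protein.

    peptide_probs: peptide sequence -> P(+|D) for the peptides of one protein.
    """
    total = sum(peptide_probs.values())
    return {peptide: total - p for peptide, p in peptide_probs.items()}


probs = {"IIGGNEVTPHSR": 0.91, "TICAGALIK": 0.65, "ITTTYEEPTK": 0.85,
         "EGDLK": 0.30}  # EGDLK value is hypothetical
print(round(sibling_peptide_scores(probs)["EGDLK"], 2))  # 2.41
```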

ProteinProphet

Adjusting for observed peptide grouping:

$$NSP_i = \sum_{m \ne i} P(+ \mid D_m)$$

$$p(+ \mid D, NSP) = \frac{p(+ \mid D)\, p(NSP \mid +)}{p(+ \mid D)\, p(NSP \mid +) + p(- \mid D)\, p(NSP \mid -)}$$

D: the data for a peptide assignment (number of tryptic termini, database search scores, number of missed cleavages, etc.)

p(+ | D): the peptide probability scores from PeptideProphet

ProteinProphet


P(+ | D, NSP): the probability that the peptide assignment is correct, given the data and the number of sibling peptides

P(NSP | +): the probability of having a particular NSP value, according to the distribution of correct peptide assignments

P(NSP | -): the probability of having a particular NSP value, according to the distribution of incorrect peptide assignments
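The NSP adjustment is a single application of Bayes' law. A minimal sketch, taking p(- | D) = 1 - p(+ | D) and assuming p(NSP | +) and p(NSP | -) have already been looked up for the bin containing the peptide's NSP value (function name hypothetical):

```python
def nsp_adjusted_probability(p_pos, p_nsp_pos, p_nsp_neg):
    """p(+|D,NSP) via Bayes' law, with p(-|D) = 1 - p(+|D).

    p_pos: PeptideProphet probability p(+|D)
    p_nsp_pos, p_nsp_neg: p(NSP|+) and p(NSP|-) for this peptide's NSP bin
    """
    numerator = p_pos * p_nsp_pos
    denominator = numerator + (1.0 - p_pos) * p_nsp_neg
    return numerator / denominator if denominator > 0 else 0.0


# A peptide with p(+|D) = 0.5 whose NSP is four times likelier among correct assignments.
print(nsp_adjusted_probability(0.5, 0.8, 0.2))
```

A peptide with many high-probability siblings lands in a bin where p(NSP | +) exceeds p(NSP | -), so its probability rises; in the example it moves from 0.5 to 0.8.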

ProteinProphet


To calculate the various NSP-related distributions, the NSP values are made discrete by placing them into bins (e.g. 0-0.5, 0.5-1, 1-1.5, 1.5-2, 2-2.5). The probability that a correctly assigned peptide has an NSP value in bin k is computed by summing the probabilities of the peptide assignments whose NSP values fall in bin k:

$$p(NSP_k \mid +) = \frac{\sum_{i:\,NSP_i \in k} p(+ \mid D_i)}{N\,P(+)}$$

N: the total number of peptide assignments

ProteinProphet


$$p(NSP_k \mid +) = \frac{\sum_{i:\,NSP_i \in k} p(+ \mid D_i)}{N\,P(+)}$$

N: the total number of peptide assignments

P(+): the prior probability of a correct peptide assignment, computed by summing over all peptides i:

ProteinProphet


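$$P(+) = \frac{1}{N} \sum_i p(+ \mid D_i)$$

The NSP distributions for incorrect assignments are computed analogously:

$$p(NSP_k \mid -) = \frac{\sum_{i:\,NSP_i \in k} \left(1 - p(+ \mid D_i)\right)}{N\,P(-)}, \qquad P(-) = 1 - P(+)$$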
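The binned estimates can be sketched as follows, assuming 0.5-wide bins as in the example and a list of (p(+ | D_i), NSP_i) pairs as input (the function name and input layout are hypothetical). This covers only the single binned estimate given by the formulas above, not the EM refinement that ProteinProphet applies.

```python
from collections import defaultdict


def nsp_distributions(assignments, bin_width=0.5):
    """Binned p(NSP_k|+) and p(NSP_k|-) from (p(+|D_i), NSP_i) pairs.

    Bin k covers [k * bin_width, (k + 1) * bin_width).
    Assumes 0 < P(+) < 1. Returns (P(+), {k: p(NSP_k|+)}, {k: p(NSP_k|-)}).
    """
    n = len(assignments)
    prior_pos = sum(p for p, _ in assignments) / n  # P(+) = (1/N) sum_i p(+|D_i)
    prior_neg = 1.0 - prior_pos
    pos_mass = defaultdict(float)
    neg_mass = defaultdict(float)
    for p, nsp in assignments:
        k = int(nsp // bin_width)
        pos_mass[k] += p          # contribution to the "correct" distribution
        neg_mass[k] += 1.0 - p    # contribution to the "incorrect" distribution
    p_nsp_pos = {k: m / (n * prior_pos) for k, m in pos_mass.items()}
    p_nsp_neg = {k: m / (n * prior_neg) for k, m in neg_mass.items()}
    return prior_pos, p_nsp_pos, p_nsp_neg


prior, pos, neg = nsp_distributions([(0.9, 2.41), (0.2, 0.1), (0.6, 0.7)])
print(prior, pos, neg)
```

Dividing by N·P(+) is the same as dividing by the sum of all peptide probabilities, so each distribution sums to 1 over the bins.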

ProteinProphet

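$$p(+ \mid D, NSP) = \frac{p(+ \mid D)\, p(NSP \mid +)}{p(+ \mid D)\, p(NSP \mid +) + p(- \mid D)\, p(NSP \mid -)}$$

$$p(NSP_k \mid +) = \frac{\sum_{i:\,NSP_i \in k} p(+ \mid D_i)}{N\,P(+)}$$

$$p(NSP_k \mid -) = \frac{\sum_{i:\,NSP_i \in k} \left(1 - p(+ \mid D_i)\right)}{N\,P(-)}$$

$$P(+) = \frac{1}{N} \sum_i p(+ \mid D_i)$$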

ProteinProphet

• NSP distributions will change from sample to sample due to data set size, protein sequence database, proteins in the sample set, data quality etc.

• The EM algorithm is used to find p(NSP | +) and p(NSP | -)

ProteinProphet

(Figure: distributions of NSP values for incorrect and correct peptide assignments)

ProteinProphet

• Degenerate peptides

MRNSYRFLASSLSVVVSLLLIPEDVCEKIIGGNEVTPHSRPYMVLLSLDRKTICAGALIKDWVLTAAHCNLNRSQVILGAHSITTTYEEPTKQIMLVKKEFPYPCYDPATREGDLKLLEE

MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKPYMVLLSLDRKAVANTMRTSLGPNGLDKMMVDKDGDVTVTNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGV

• Some peptides assigned from MS/MS spectra can be found in more than one protein; such peptides are 'degenerate'.

• How does one determine which protein truly corresponds to the peptide?

ProteinProphet

Weight the peptides according to the probability of each protein being in the sample.

If peptide i corresponds to N_s different proteins, the relative weight w_n^i that this peptide actually corresponds to protein n (n = 1, ..., N_s) is determined by the probability P_n of protein n relative to those of all N_s proteins:

$$w_n^i = \frac{P_n}{\sum_{m=1}^{N_s} P_m}$$
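The weight is a normalization over the proteins that share the peptide. A minimal sketch with hypothetical protein names; the even split when every protein probability is zero is an assumption added here to avoid dividing by zero, not something stated on the slide:

```python
def shared_peptide_weights(protein_probs):
    """w_n^i = P_n / sum_m P_m over the N_s proteins containing peptide i.

    protein_probs: protein -> current probability P_n, for the proteins sharing one peptide.
    """
    total = sum(protein_probs.values())
    if total == 0:
        # Assumption: with no evidence for any protein, split the peptide evenly.
        return {n: 1.0 / len(protein_probs) for n in protein_probs}
    return {n: p / total for n, p in protein_probs.items()}


print(shared_peptide_weights({"protein_A": 0.9, "protein_B": 0.3}))
```

Here protein_A receives about three quarters of the shared peptide and protein_B one quarter.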

ProteinProphet

The protein probability function is then modified to account for degeneracy:

$$P_n = 1 - \prod_i \left(1 - w_n^i \max_j p(+ \mid D_i^j, NSP_i^n)\right)$$
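A sketch of the degeneracy-adjusted formula for a single protein n, assuming the weights w_n^i and the NSP-adjusted spectrum probabilities have already been computed (the input layout and function name are hypothetical). Because the weights themselves depend on the protein probabilities, the two quantities have to be computed iteratively; this sketch performs only one evaluation.

```python
def degenerate_protein_probability(evidence):
    """P_n = 1 - prod_i (1 - w_n^i * max_j p(+|D_i^j, NSP_i^n)).

    evidence: one (w_n^i, [p(+|D_i^j, NSP_i^n) for each spectrum j]) pair
    per peptide i of protein n.
    """
    prob_no_correct = 1.0
    for weight, spectrum_probs in evidence:
        # A shared peptide only counts toward protein n in proportion to its weight.
        prob_no_correct *= 1.0 - weight * max(spectrum_probs)
    return 1.0 - prob_no_correct


# Peptide 1 is unique to the protein (w = 1); peptide 2 is shared (w = 0.5).
print(degenerate_protein_probability([(1.0, [0.9, 0.7]), (0.5, [0.8])]))
```

With all weights equal to 1 this reduces to the earlier, non-degenerate protein probability.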

ProteinProphet

(Figure: annotated ProteinProphet output. Labeled fields: protein probability; NSP-adjusted peptide probability; original probability; number of tryptic termini; NSP; number of peptides in the NSP bin; shared-peptide weight; protein coverage.)
