Protein sequencing and Mass Spectrometry
Sample Preparation
Enzymatic Digestion (Trypsin)
+Fractionation
Single Stage MS
Mass Spectrometry
LC-MS: 1 MS spectrum / second
Tandem MS
Secondary Fragmentation
Ionized parent peptide
The peptide backbone
H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH
Side chains: R(i-1), R(i), R(i+1), on AA residues i-1, i, i+1
N-terminus on the left, C-terminus on the right
The peptide backbone breaks to form fragments with characteristic masses.
Ionization
Ionized parent peptide
H+
Fragment ion generation
H...-HN-CH-CO   NH-CH-CO-NH-CH-CO-…OH   (backbone broken at one CO-NH bond)
Side chains: R(i-1), R(i), R(i+1), on AA residues i-1, i, i+1
N-terminus on the left, C-terminus on the right
The peptide backbone breaks to form fragments with characteristic masses.
Ionized peptide fragment
H+
Tandem MS for Peptide ID
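Fragment masses per residue (sequence read top to bottom: SGFLEEDELK):
Residue   b ions   y ions
S         88       1166
G         145      1080
F         292      1022
L         405      875
E         534      762
E         663      633
D         778      504
E         907      389
L         1020     260
K         1166     147
[Figure: MS/MS spectrum; x-axis m/z (0 to 1000), y-axis % Intensity (0 to 100); the [M+2H]2+ peak is labeled.]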
Peak Assignment
[Figure: the same fragment-mass table and MS/MS spectrum (x-axis m/z, y-axis % Intensity), now with peaks assigned: y2, y3, y4, y5, y6, y7, y8, y9 and b3, b4, b5, b6, b7, b8, b9, plus [M+2H]2+.]
Peak assignment implies Sequence (Residue tag) Reconstruction!
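The step from assigned peaks to a residue tag can be sketched as follows. This is a minimal illustration, assuming nominal (integer) residue masses and a 1 Da tolerance; `read_tag` and `RESIDUE_BY_MASS` are hypothetical names, and the ladders are the b and y values listed on the slide.

```python
# Nominal (integer) residue masses in Da. I shares 113 with L and Q shares 128
# with K, so only one letter of each pair is listed (an assumption of this sketch).
RESIDUE_BY_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}

def read_tag(ladder, tolerance=1):
    """Map each gap between consecutive peaks of one ion series to a residue."""
    tag = []
    for lower, upper in zip(ladder, ladder[1:]):
        gap = upper - lower
        # Pick the residue mass closest to the gap; '?' if nothing is close enough.
        nearest = min(RESIDUE_BY_MASS, key=lambda mass: abs(mass - gap))
        tag.append(RESIDUE_BY_MASS[nearest] if abs(nearest - gap) <= tolerance else "?")
    return "".join(tag)

b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
y_ladder = [147, 260, 389, 504, 633, 762, 875, 1022, 1080, 1166]
print(read_tag(b_ladder))  # read in the N- to C-terminal direction
print(read_tag(y_ladder))  # read in the C- to N-terminal direction
```

The b ladder yields the tag in sequence order, while the y ladder yields it reversed, because y ions grow from the C-terminus.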
Database Searching for peptide ID
• For every peptide from a database
– Generate a hypothetical spectrum
– Compute a correlation between the hypothetical and experimental (observed) spectra
– Choose the best
• Database searching is very powerful and is the de facto standard for MS.
– Sequest, Mascot, and many others
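The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the "correlation" is replaced by a simple shared-peak count (not the scoring used by Sequest or Mascot), masses are nominal integers, only singly charged b and y ions are generated, and the three-peptide `database` is made up for the example.

```python
# Nominal residue masses in Da (assumption: integer masses are good enough here).
MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
        "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
        "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}

def hypothetical_spectrum(peptide):
    """Singly charged b and y ion positions for every backbone cut."""
    total = sum(MASS[aa] for aa in peptide)
    peaks, prefix = set(), 0
    for aa in peptide[:-1]:
        prefix += MASS[aa]
        peaks.add(prefix + 1)            # b-ion: prefix mass + 1
        peaks.add(total - prefix + 19)   # y-ion: suffix mass + 19
    return peaks

def shared_peaks(observed, peptide, tolerance=1):
    """Count observed peaks that fall within tolerance of a hypothetical peak."""
    theory = hypothetical_spectrum(peptide)
    return sum(any(abs(peak - t) <= tolerance for t in theory) for peak in observed)

def search(observed, database):
    """Return the database peptide with the highest shared-peak count."""
    return max(database, key=lambda peptide: shared_peaks(observed, peptide))

observed = [147, 260, 389, 504, 633, 762, 875, 292, 405, 534]  # peaks from the slide
database = ["SGEK", "KLEDEELFGS", "SGFLEEDELK"]                # hypothetical database
print(search(observed, database))
```

A real search engine would also filter candidates by parent mass and use a more discriminating score, but the generate, compare, and pick-the-best loop is the same.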
Spectra: the real story
• Noise Peaks
• Ions, not prefixes & suffixes
• Mass to charge ratio, and not mass
– Multiply charged ions
• Isotope patterns, not single peaks
Peptide fragmentation possibilities(ion types)
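[Figure: backbone segment -HN-CH-CO-NH-CH-CO-NH- with side chains R_i, CH-R' and R'', and cleavage sites labeled with ion types: a_i, b_i, c_i, x_{n-i}, y_{n-i}, z_{n-i}, y_{n-i-1}, b_{i+1}, d_{i+1}, v_{n-i}, w_{n-i}. The two panels are captioned "low energy fragments" and "high energy fragments".]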
Ion types, and offsets
• P = prefix residue mass
• S = Suffix residue mass
• b-ions = P+1
• y-ions = S+19
• a-ions = P-27
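As a worked example of these offsets, the sketch below lists the b, a, and y ion positions for each backbone cut of the short peptide SGEK. Nominal integer residue masses are assumed, and `fragment_ions` is a hypothetical helper name.

```python
# Nominal residue masses (Da) for the residues of the example peptide only.
MASS = {"S": 87, "G": 57, "E": 129, "K": 128}

def fragment_ions(peptide):
    """For each cut position, apply the offsets b = P+1, a = P-27, y = S+19."""
    total = sum(MASS[aa] for aa in peptide)
    rows, prefix = [], 0
    for cut, aa in enumerate(peptide[:-1], start=1):
        prefix += MASS[aa]          # P: prefix residue mass
        suffix = total - prefix     # S: suffix residue mass
        rows.append({"cut": cut, "b": prefix + 1, "a": prefix - 27, "y": suffix + 19})
    return rows

for row in fragment_ions("SGEK"):
    print(row)
```

Each b/y pair from the same cut sums to the total residue mass plus 20, since P + S is the total residue mass.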
Mass-Charge ratio
• The X-axis is (M+Z)/Z
– Z=1 implies that peak is at M+1
– Z=2 implies that peak is at (M+2)/2
• M=1000, Z=2, peak position is at 501
– Suppose you see a peak at 501. Is the mass 500, or is it 1000?
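The ambiguity can be made concrete with a small sketch. Both function names are hypothetical; the formula is the slide's (M+Z)/Z, inverted to list the mass implied by each assumed charge.

```python
def peak_position(mass, charge):
    """Position on the m/z axis of an ion with the given mass and charge."""
    return (mass + charge) / charge

def candidate_masses(peak, max_charge=2):
    """Mass implied by a peak for each assumed charge: M = peak * Z - Z."""
    return {z: peak * z - z for z in range(1, max_charge + 1)}

print(peak_position(1000, 2))   # 501.0
print(candidate_masses(501))    # {1: 500, 2: 1000}
```

A single peak at 501 is therefore consistent with both masses; the charge has to be inferred from other evidence, such as the isotope pattern.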
Spectral Graph
• Each prefix residue mass (PRM) corresponds to a node.
• Two nodes are connected by an edge if the mass difference is a residue mass.
• A path in the graph is a de novo interpretation of the spectrum
[Figure: nodes 87 and 144 joined by an edge labeled G.]
Spectral Graph
• Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
• Spectral graph:
– Each node u defines a putative prefix residue mass M(u).
– (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0.
– Paths in the spectral graph correspond to an interpretation.
[Figure: spectral graph on a mass axis (0, 100, 200, 300, 401) with nodes 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K.]
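A minimal sketch of the construction, assuming each peak is read either as a b-ion (P = peak - 1) or as a y-ion (S = peak - 19, P = M - S), with nominal residue masses. The three peaks 88, 145, 274 and the parent residue mass 401 are chosen to reproduce the node masses shown in the figure; the zero-mass edges mentioned on the slide are omitted, and the function names are hypothetical.

```python
# Nominal residue masses (Da); isobaric residues share one label.
RESIDUES = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
            113: "I/L", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
            137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}

def prefix_residue_masses(peaks, parent_mass):
    """Nodes of the spectral graph: 0, M, and two readings of every peak."""
    nodes = {0, parent_mass}
    for peak in peaks:
        nodes.add(peak - 1)                    # peak read as a b-ion
        nodes.add(parent_mass - (peak - 19))   # peak read as a y-ion
    return sorted(nodes)

def spectral_graph(nodes):
    """Edges (u, v) whose mass difference is a residue mass, labeled by residue."""
    return {(u, v): RESIDUES[v - u] for u in nodes for v in nodes if v - u in RESIDUES}

nodes = prefix_residue_masses([88, 145, 274], parent_mass=401)
edges = spectral_graph(nodes)
print(nodes)
for (u, v), residue in sorted(edges.items()):
    print(u, "->", v, residue)
```

The path 0, 87, 144, 273, 401 is present in the resulting graph, alongside spurious edges created by the wrong reading of each peak.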
Re-defining de novo interpretation
• Find a subset of nodes in the spectral graph s.t.
– 0, M are included
– Each peak contributes at most one node (interpretation) (*)
– Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass)
– An appropriate objective function (e.g., the number of peaks interpreted) is maximized
[Figure: spectral graph on a mass axis (0, 100, 200, 300, 401) with nodes 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K; inset: nodes 87 and 144 joined by an edge labeled G.]
Two problems
• Too many nodes.
– Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem)
– Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
– In general, the forbidden pairs problem is NP-hard
[Figure: spectral graph on a mass axis (0, 100, 200, 300, 401) with nodes 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K.]
However...
• The b,y ions have a special non-interleaving property
• Consider pairs (b1,y1), (b2,y2)
– If (b1 < b2), then y1 > y2
Non-Intersecting Forbidden pairs
• If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting.
• The de novo problem can be solved efficiently using a dynamic programming technique.
[Figure: mass axis from 0 to 400 with edges labeled S, G, E, K; the nodes 87 and 332 are marked as a forbidden pair.]
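The non-intersecting structure can be checked directly. In this sketch, which assumes the b = P+1 and y = S+19 offsets and nominal masses, the two readings of one peak always sum to M + 18, so sorting the pairs by their lower node automatically sorts the upper nodes in decreasing order. `forbidden_pairs` and `is_nested` are hypothetical names.

```python
def forbidden_pairs(peaks, parent_mass):
    """For each peak, the pair of PRMs from its b-reading and its y-reading."""
    pairs = set()
    for peak in peaks:
        as_b = peak - 1                      # prefix mass if the peak is a b-ion
        as_y = parent_mass - (peak - 19)     # prefix mass if the peak is a y-ion
        pairs.add((min(as_b, as_y), max(as_b, as_y)))
    return sorted(pairs)

def is_nested(pairs):
    """True if each pair lies strictly inside the previous one (no crossing)."""
    return all(lo1 < lo2 and hi1 > hi2
               for (lo1, hi1), (lo2, hi2) in zip(pairs, pairs[1:]))

pairs = forbidden_pairs([88, 145, 274], parent_mass=401)
print(pairs)             # [(87, 332), (144, 275), (146, 273)]
print(is_nested(pairs))  # True
```

A complementary b and y peak from the same cut (for example 88 and 333 here) produces the same pair, which is why the pairs are collected in a set.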
The forbidden pairs method
• There may be many paths that avoid forbidden pairs.
• We choose a path that maximizes an objective function,
– e.g., the number of peaks interpreted
The forbidden pairs method
• Sort the PRMs according to increasing mass values.
• For each node u, f(u) denotes its forbidden partner (the other node generated by the same peak).
• Let m(u) denote the mass value of the PRM.
[Figure: mass axis from 0 to 400 with the nodes 87 and 332 marked u and f(u).]
D.P. for forbidden pairs
• Consider all pairs u,v
– m[u] <= M/2, m[v] > M/2
• Define S(u,v) as the best score of a forbidden pair path from 0->u, v->M
• Is it sufficient to compute S(u,v) for all u,v?
[Figure: mass axis from 0 to 400 with nodes 87 and 332; u and v are marked.]
D.P. for forbidden pairs
• Note that the best interpretation is given by
$$\max_{(u,v) \in E} S(u,v)$$
[Figure: mass axis from 0 to 400 with nodes 87 and 332; u and v are marked.]
D.P. for forbidden pairs
• Note that we have one of two cases.
1. Either u < f(v) (and f(u) > v)
2. Or, u > f(v) (and f(u) < v)
• Case 2 (u > f(v)).
– Extend u, do not touch f(v)
$$S(u,v) = \max_{u' : (u',u) \in E,\ u' \neq f(v)} S(u',v) + 1$$
[Figure: mass axis from 0 to 400 with u, f(u) and v marked.]
The complete algorithm
for all u /* increasing mass values from 0 to M/2 */
  for all v /* decreasing mass values from M to M/2 */
    if (u > f[v])
      $S[u,v] = \max_{(w,u) \in E,\ w \neq f(v)} S[w,v] + 1$
    else if (u < f[v])
      $S[u,v] = \max_{(v,w) \in E,\ w \neq f(u)} S[u,w] + 1$
    if (u,v) ∈ E /* maxI is the score of the best interpretation */
      maxI = max {maxI, S[u,v]}
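The recurrence can be turned into a small runnable sketch. Several choices here are assumptions rather than part of the slides: peaks are read only as b or y ions with nominal masses, nodes are indexed by nesting depth of their forbidden pair (index 0 is the node 0 on the left and M on the right) instead of being split at M/2, and the score is the number of peaks interpreted. The test "u > f[v]" then becomes a comparison of depths.

```python
import math

RESIDUE_MASSES = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                  128, 129, 131, 137, 147, 156, 163, 186}

def best_interpretation(peaks, parent_mass):
    """Max number of peaks on a 0 -> M path that uses each peak at most once."""
    # The two readings of a peak sum to M + 18, so sorted pairs are nested.
    pairs = sorted({(min(p - 1, parent_mass - p + 19), max(p - 1, parent_mass - p + 19))
                    for p in peaks})
    left = [0] + [lo for lo, _ in pairs]             # increasing mass
    right = [parent_mass] + [hi for _, hi in pairs]  # decreasing mass
    n = len(left)

    def edge(a, b):
        return (b - a) in RESIDUE_MASSES

    NEG = -math.inf
    # S[i][j]: best score of paths 0 -> left[i] and right[j] -> M.
    S = [[NEG] * n for _ in range(n)]
    S[0][0] = 0
    best = NEG
    for i in range(n):
        for j in range(n):
            if i == j:
                if i != 0:
                    continue  # left[i] and right[i] come from the same peak
            elif i > j:
                # u > f[v]: u was added last; its predecessor w must not be f[v].
                S[i][j] = max((S[w][j] + 1 for w in range(i)
                               if (w != j or j == 0) and edge(left[w], left[i])),
                              default=NEG)
            else:
                # u < f[v]: v was added last; its successor w must not be f[u].
                S[i][j] = max((S[i][w] + 1 for w in range(j)
                               if (w != i or i == 0) and edge(right[j], right[w])),
                              default=NEG)
            if edge(left[i], right[j]):   # the two half-paths join into one path
                best = max(best, S[i][j])
    return best

print(best_interpretation([88, 145, 274], parent_mass=401))  # 3
```

For the three example peaks, the best path is 0, 87, 144, 273, 401, which interprets all three peaks; recovering the path itself would need back-pointers, omitted here for brevity.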