Protein sequencing and Mass Spectrometry


Transcript of Protein sequencing and Mass Spectrometry

Page 1: Protein sequencing and Mass Spectrometry

Protein sequencing and Mass Spectrometry

Page 2: Protein sequencing and Mass Spectrometry

Sample Preparation

Enzymatic Digestion (Trypsin) + Fractionation

Page 3: Protein sequencing and Mass Spectrometry

Single Stage MS

Mass Spectrometry

LC-MS: 1 MS spectrum / second

Page 4: Protein sequencing and Mass Spectrometry

Tandem MS

Secondary Fragmentation

Ionized parent peptide

Page 5: Protein sequencing and Mass Spectrometry

The peptide backbone

H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-...OH

Side chains on the three alpha carbons: R_{i-1}, R_i, R_{i+1}

Amino acid residues, from N-terminus to C-terminus: residue i-1, residue i, residue i+1

The peptide backbone breaks to form fragments with characteristic masses.

Page 6: Protein sequencing and Mass Spectrometry

Ionization

[Figure: the backbone H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-...OH, with side chains R_{i-1}, R_i, R_{i+1} and the N-terminus on the left, carrying an added proton (H+). Caption: ionized parent peptide.]

The peptide backbone breaks to form fragments with characteristic masses.

Page 7: Protein sequencing and Mass Spectrometry

Fragment ion generation

[Figure: the backbone broken at a peptide bond, H...-HN-CH-CO | NH-CH-CO-NH-CH-CO-...OH, with side chains R_{i-1}, R_i, R_{i+1} and a proton (H+) on a fragment. Caption: ionized peptide fragment.]

The peptide backbone breaks to form fragments with characteristic masses.

Page 8: Protein sequencing and Mass Spectrometry

Tandem MS for Peptide ID

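[Figure: tandem mass spectrum of the peptide SGFLEEDELK, % intensity (0 to 100) vs. m/z (0 to 1000), with the [M+2H]2+ peak marked. The b-ion and y-ion masses shown next to the sequence are listed below.]

b ions   residue   y ions
  88        S       1166
 145        G       1080
 292        F       1022
 405        L        875
 534        E        762
 663        E        633
 778        D        504
 907        E        389
1020        L        260
1166        K        147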

Page 9: Protein sequencing and Mass Spectrometry

Peak Assignment

[Figure: the same spectrum of SGFLEEDELK with its peaks assigned. The labeled peaks are y2 through y9, b3 through b9, and [M+2H]2+. Axes: % intensity (0 to 100) vs. m/z (0 to 1000).]

Peak assignment implies sequence (residue tag) reconstruction!

Page 10: Protein sequencing and Mass Spectrometry

Database Searching for peptide ID

• For every peptide from a database
  – Generate a hypothetical spectrum
  – Compute a correlation between the hypothetical and experimental spectra
  – Choose the best

• Database searching is very powerful and is the de facto standard for MS.
  – Sequest, Mascot, and many others
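The search loop above can be sketched in a few lines. This is only an illustration under stated assumptions: nominal integer residue masses, b- and y-ions only, and a simple shared-peak count standing in for the correlation score that real tools such as Sequest or Mascot compute. The function names (`theoretical_peaks`, `shared_peak_score`, `search`) and the toy database are hypothetical.

```python
# Nominal (integer) residue masses; an assumption made for illustration.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def theoretical_peaks(peptide):
    """Hypothetical spectrum: b-ions (prefix + 1) and y-ions (suffix + 19)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks = set()
    prefix = 0
    for m in masses[:-1]:           # every internal cleavage point
        prefix += m
        peaks.add(prefix + 1)               # b-ion
        peaks.add(total - prefix + 19)      # complementary y-ion
    return peaks


def shared_peak_score(observed, peptide, tol=1):
    """Count theoretical peaks that have an observed peak within tol."""
    return sum(
        1 for t in theoretical_peaks(peptide)
        if any(abs(t - o) <= tol for o in observed)
    )


def search(observed, database):
    """Return the database peptide whose hypothetical spectrum scores best."""
    return max(database, key=lambda pep: shared_peak_score(observed, pep))


# Peaks read off the example spectrum (y-ions, then b-ions).
observed = [147, 260, 389, 504, 633, 762, 875, 1022,
            292, 405, 534, 663, 778, 907, 1020]
database = ["KLEDEELFGS", "SGEKSGEKAA", "SGFLEEDELK"]
print(search(observed, database))
```

A shared-peak count ignores intensities and noise, which is why production search engines use more elaborate scoring.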

Page 11: Protein sequencing and Mass Spectrometry

Spectra: the real story

• Noise peaks
• Ions, not prefixes & suffixes
• Mass to charge ratio, and not mass
  – Multiply charged ions
• Isotope patterns, not single peaks

Page 12: Protein sequencing and Mass Spectrometry

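Peptide fragmentation possibilities (ion types)

[Figure: backbone segment -HN-CH-CO-NH-CH-CO-NH- with side chains (R_i, R', R'') and the possible cleavage points marked. Labels for fragments on the N-terminal side: a_i, b_i, c_i, b_{i+1}, d_{i+1}. Labels for fragments on the C-terminal side: x_{n-i}, y_{n-i}, z_{n-i}, y_{n-i-1}, v_{n-i}, w_{n-i}. The labels are grouped under two captions: "low energy fragments" and "high energy fragments".]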

Page 13: Protein sequencing and Mass Spectrometry

Ion types, and offsets

• P = prefix residue mass
• S = suffix residue mass
• b-ions = P+1
• y-ions = S+19
• a-ions = P-27
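As a worked example of these offsets, the sketch below computes the a-, b- and y-ion masses at every cleavage point of a peptide. It assumes nominal integer residue masses, so values can differ by about 1 from masses rounded from exact values (for example 1079 here vs. 1080 in the example spectrum). The name `ion_ladder` is hypothetical.

```python
# Nominal (integer) residue masses; an assumption made for illustration.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def ion_ladder(peptide):
    """One row per internal cleavage point, applying the slide's offsets."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    rows = []
    prefix = 0
    for m in masses[:-1]:
        prefix += m                  # P: prefix residue mass
        suffix = total - prefix      # S: suffix residue mass
        rows.append({"a": prefix - 27, "b": prefix + 1, "y": suffix + 19})
    return rows


for row in ion_ladder("SGFLEEDELK"):
    print(row)
```

The b column reproduces the b-ion masses 88 through 1020 of the example peptide SGFLEEDELK, and the y column reads the same ladder from the C-terminal side.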

Page 14: Protein sequencing and Mass Spectrometry

Mass-Charge ratio

• The X-axis is (M+Z)/Z
  – Z=1 implies that the peak is at M+1
  – Z=2 implies that the peak is at (M+2)/2

• M=1000, Z=2: the peak position is at 501
  – Suppose you see a peak at 501. Is the mass 500, or is it 1000?
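The ambiguity can be made concrete with a small sketch. It assumes the slide's simplified model, in which each charge adds one mass unit (a proton); the function names are hypothetical.

```python
def peak_position(mass, charge):
    """Position on the m/z axis: (M + Z) / Z."""
    return (mass + charge) / charge


def candidate_masses(mz, max_charge=3):
    """Invert (M + Z) / Z for each charge state: M = mz * Z - Z."""
    return {z: mz * z - z for z in range(1, max_charge + 1)}


print(peak_position(1000, 2))
print(candidate_masses(501, max_charge=2))
```

A single peak at 501 is therefore consistent with a mass of 500 at charge 1 and with a mass of 1000 at charge 2. Without further evidence such as isotope spacing, the peak alone does not decide between them.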

Page 15: Protein sequencing and Mass Spectrometry

Spectral Graph

• Each prefix residue mass (PRM) corresponds to a node.

• Two nodes are connected by an edge if the mass difference is a residue mass.

• A path in the graph is a de novo interpretation of the spectrum

[Figure: two nodes at PRM 87 and PRM 144 joined by an edge labeled G (mass difference 57).]
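A minimal sketch of this construction, assuming nominal integer residue masses and exact (zero-tolerance) matching, is given below. The names `spectral_graph` and `interpretations` are hypothetical, and the PRM list is the S G E K example with total residue mass 401.

```python
from itertools import combinations

# Nominal residue masses mapped to residue labels (illustrative assumption).
RESIDUE_BY_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "L/I", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def spectral_graph(prms):
    """Edges (u, v) -> residue label, where v - u is a residue mass."""
    nodes = sorted(set(prms))
    edges = {}
    for u, v in combinations(nodes, 2):     # u < v because nodes are sorted
        label = RESIDUE_BY_MASS.get(v - u)
        if label is not None:
            edges[(u, v)] = label
    return edges


def interpretations(edges, start, end):
    """All label sequences along paths from start to end."""
    if start == end:
        return [[]]
    found = []
    for (u, v), label in edges.items():
        if u == start:
            for rest in interpretations(edges, v, end):
                found.append([label] + rest)
    return found


edges = spectral_graph([0, 87, 144, 273, 401])
for path in interpretations(edges, 0, 401):
    print(" ".join(path))
```

Even this tiny graph contains more than one path from 0 to 401, because 87 -> 273 is also a valid residue mass difference (186, W). That is why a scoring function is needed to pick among paths.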

Page 16: Protein sequencing and Mass Spectrometry

Spectral Graph

• Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
• Spectral graph:
  – Each node u defines a putative prefix residue mass M(u).
  – (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0.
  – Paths in the spectral graph correspond to an interpretation.

[Figure: spectral graph on a mass axis from 0 to 401, with nodes at 87, 144, 146, 273, 275 and 332. The path 0 -> 87 -> 144 -> 273 -> 401 spells S G E K.]
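To see where the six nodes in the figure come from, the sketch below maps each peak to its two candidate PRMs using the offsets b = P+1 and y = S+19. It assumes nominal integer masses and a known total residue mass M = 401 for S G E K; the function name `prm_candidates` is hypothetical.

```python
def prm_candidates(peak, total_residue_mass):
    """Two readings of one peak: as a b-ion and as a y-ion."""
    as_b = peak - 1                                  # b = P + 1  ->  P = peak - 1
    as_y = total_residue_mass - (peak - 19)          # y = S + 19 ->  P = M - S
    return as_b, as_y


M = 401                       # 87 (S) + 57 (G) + 129 (E) + 128 (K)
b_peaks = [88, 145, 274]      # b-ions of S G E K under the same offsets

nodes = set()
for peak in b_peaks:
    nodes.update(prm_candidates(peak, M))

print(sorted(nodes))
```

Each peak contributes one correct node and one spurious node, and the two nodes of a peak always sum to M + 18. That pairing is what later makes the two readings of a peak a forbidden pair.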

Page 17: Protein sequencing and Mass Spectrometry

Re-defining de novo interpretation

• Find a subset of nodes in the spectral graph s.t.
  – 0, M are included
  – Each peak contributes at most one node (interpretation) (*)
  – Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass)
  – An appropriate objective function (ex: the number of peaks interpreted) is maximized

[Figure: the spectral graph for S G E K on a mass axis from 0 to 401 (nodes at 87, 144, 146, 273, 275 and 332), with the edge 87 -> 144 labeled G.]

Page 18: Protein sequencing and Mass Spectrometry

Two problems

• Too many nodes.
  – Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem)
  – Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
  – In general, the forbidden pairs problem is NP-hard

[Figure: the spectral graph for S G E K on a mass axis from 0 to 401 (nodes at 87, 144, 146, 273, 275 and 332).]

Page 19: Protein sequencing and Mass Spectrometry

However,..

• The b,y ions have a special non-interleaving property

• Consider pairs (b1,y1), (b2,y2)
  – If (b1 < b2), then y1 > y2

Page 20: Protein sequencing and Mass Spectrometry

Non-Intersecting Forbidden pairs

• If we consider only b,y ions, 'forbidden' node pairs are non-intersecting.
• The de novo problem can be solved efficiently using a dynamic programming technique.

[Figure: mass axis from 0 to 400 with the S G E K path; the nodes 87 and 332 are marked.]

Page 21: Protein sequencing and Mass Spectrometry

The forbidden pairs method

• There may be many paths that avoid forbidden pairs.

• We choose a path that maximizes an objective function.
  – EX: the number of peaks interpreted

Page 22: Protein sequencing and Mass Spectrometry

The forbidden pairs method

• Sort the PRMs according to increasing mass values.
• For each node u, f(u) denotes the node that forms a forbidden pair with u.
• Let m(u) denote the mass value of the PRM.

[Figure: mass axis from 0 to 400 with the nodes 87 and 332 marked and labeled u and f(u).]

Page 23: Protein sequencing and Mass Spectrometry

D.P. for forbidden pairs

• Consider all pairs u,v
  – m[u] <= M/2, m[v] > M/2

• Define S(u,v) as the best score of a forbidden pair path from 0->u, v->M

• Is it sufficient to compute S(u,v) for all u,v?

[Figure: mass axis from 0 to 400 with the nodes 87 and 332 marked; labels u and v.]

Page 24: Protein sequencing and Mass Spectrometry

D.P. for forbidden pairs

• Note that the best interpretation is given by

$\max_{(u,v) \in E} S(u,v)$

[Figure: mass axis from 0 to 400 with the nodes 87 and 332 marked; labels u and v.]

Page 25: Protein sequencing and Mass Spectrometry

D.P. for forbidden pairs

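• Note that we have one of two cases.
  1. Either u < f(v) (and f(u) > v)
  2. Or, u > f(v) (and f(u) < v)

• When u > f(v):
  – Extend u, do not touch f(v)

  $S(u,v) = \max_{u' : (u',u) \in E,\ u' \neq f(v)} S(u',v) + 1$

• When u < f(v), the symmetric step extends v and does not touch f(u).

[Figure: mass axis from 0 to 400 with the nodes u, f(u) and v marked.]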

Page 26: Protein sequencing and Mass Spectrometry

The complete algorithm

for all u   /* increasing mass values from 0 to M/2 */
  for all v   /* decreasing mass values from M to M/2 */

    if (u > f[v])
      S[u,v] = max over w with (w,u) in E, w != f(v) of S[w,v] + 1

    else if (u < f[v])
      S[u,v] = max over w with (v,w) in E, w != f(u) of S[u,w] + 1

    if (u,v) in E   /* maxI is the score of the best interpretation */
      maxI = max {maxI, S[u,v]}
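The pseudocode can be turned into a small runnable sketch. This is one possible reading, not the presenter's implementation. It assumes nominal integer residue masses with exact matching, b/y ions only (offsets P+1 and S+19), and a score equal to the number of interpreted nodes. Forbidden pairs are indexed from the outermost inwards, so the test `u > f(v)` becomes an index comparison, and index 0 is a sentinel for the pair (0, M), which is always included. The name `best_interpretation` is hypothetical.

```python
# Nominal (integer) residue masses; an assumption made for illustration.
RESIDUE_MASSES = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                  128, 129, 131, 137, 147, 156, 163, 186}


def is_edge(lo, hi):
    """(lo, hi) is an edge if the mass difference is one residue mass."""
    return (hi - lo) in RESIDUE_MASSES


def best_interpretation(peaks, total_mass):
    """Most PRM nodes on a 0 -> M path that uses no forbidden pair."""
    # Each peak p gives two candidate PRMs: p - 1 (b-ion) and M + 19 - p (y-ion).
    pairs = set()
    for p in peaks:
        as_b, as_y = p - 1, total_mass + 19 - p
        lo, hi = min(as_b, as_y), max(as_b, as_y)
        if 0 < lo < hi < total_mass:
            pairs.add((lo, hi))
    pairs = sorted(pairs)   # lo ascending, so hi descending: nested pairs

    x = [0] + [lo for lo, _ in pairs]            # left nodes (u side)
    y = [total_mass] + [hi for _, hi in pairs]   # right nodes (v side)
    n = len(x)
    neg = float("-inf")
    S = [[neg] * n for _ in range(n)]
    S[0][0] = 0             # only the end points 0 and M are in use
    best = neg

    for i in range(n):          # u = x[i], increasing mass
        for j in range(n):      # v = y[j], decreasing mass
            if i == j and i != 0:
                continue        # u and v are the two readings of one peak
            if i > j:           # u > f(v): u was the last node added
                prev = [S[k][j] for k in range(i)
                        if (k == 0 or k != j) and is_edge(x[k], x[i])]
                S[i][j] = max(prev, default=neg) + 1
            elif i < j:         # u < f(v): v was the last node added
                prev = [S[i][k] for k in range(j)
                        if (k == 0 or k != i) and is_edge(y[j], y[k])]
                S[i][j] = max(prev, default=neg) + 1
            if S[i][j] > neg and is_edge(x[i], y[j]):
                best = max(best, S[i][j])    # (u, v) closes the path
    return best


# b- and y-ion peaks of S G E K under the slide's offsets, M = 401.
print(best_interpretation([88, 145, 274, 147, 276, 333], 401))
```

The inner list comprehensions make this cubic in the number of pairs, which is fine for a sketch. Recovering the path itself, rather than only its score, would need back-pointers alongside `S`.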