De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu...
![Page 1: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/1.jpg)
De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides
18.12.2008
Hannu Peltoniemi
![Page 2: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/2.jpg)
De novo vs database matching
[Diagram: MS2 spectrum of an unknown glycan -> matching against a glycan database -> best scoring glycan(s) in the DB]
• Only those structures that are in the DB can be found
• OK if the DB is comprehensive
• If the glycan is not in the DB, the result may be the closest matching (wrong) structure or no result at all
![Page 3: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/3.jpg)
De novo
[Diagram: MS2 spectrum of an unknown glycan -> on-the-fly structure generation and matching -> best scoring glycans]
• No database -> new structures can also be found!
• Computationally intensive, requires high quality spectra
• Typically no definite answer, but a set of high scoring structures
![Page 4: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/4.jpg)
De novo structure search
Part of the N-glycopeptide workflow: Joenväärä et al., N-Glycoproteomics - An automated workflow approach, Glycobiology 2008, 18(4):339-349.
Input: Protonated, deconvoluted MS2 spectra
Steps:
1) identification of peptides
2) identification of N-glycan compositions
3) de novo identification of N-glycan structures (branching, no linkage)
![Page 5: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/5.jpg)
Input data
Spectrum with annotated glycopeptide and glycan composition fragments.
![Page 6: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/6.jpg)
Example data
Peptide: QDQCIYNTTYLNVQR
Glycan composition: 6 Hex 5 HexNAc 3 NeuAc
![Page 7: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/7.jpg)
Same data, different view:
[Figure: measured fragments (O) on a grid of Hex count (0 to 6) against HexNAc count (0 to 5), one panel each for NeuAc=0, NeuAc=1, NeuAc=2 and NeuAc=3, drawn separately for glycan fragments attached to peptide and for free glycans. Composition: 6 Hex 5 HexNAc 3 NeuAc.]
![Page 8: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/8.jpg)
The puzzle
• All the measured fragment compositions of an unknown structure with the given total composition are known
• Some theoretical fragments may be missing
• Some measured fragments may be false
[Figure: grid of measured fragments (O) next to a question mark]
Which structure best explains the data?
![Page 9: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/9.jpg)
Solution
The problem is split into two phases:
1) Generation of possible structures: structures are grown starting from the N-glycan core. The population size is limited by removing the structures with the lowest fit to the peptide+glycan fragments.
2) Scoring: the set of structures is scored with the full data. The final glycopeptide score is the sum of the peptide and glycan structure scores.
![Page 10: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/10.jpg)
Initialization
The misfit (cost) between a theoretical structure and the measured data is defined as the number of theoretical and measured fragments that do not match.
[Figure: measured and theoretical fragments compared. Example data: peptide + 5 Hex 4 HexNAc]
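One way to read this cost definition is as the size of the symmetric difference between the two fragment sets: theoretical fragments that were not measured plus measured fragments that the structure does not explain. The sketch below assumes that reading. The encoding of a fragment as a `(Hex, HexNAc)` count pair and the example values are hypothetical, not taken from the slides.

```python
def misfit(theoretical, measured):
    """Number of fragments that appear in exactly one of the two sets."""
    return len(set(theoretical) ^ set(measured))


# Hypothetical fragment compositions, encoded as (Hex, HexNAc) counts.
theoretical = {(0, 1), (0, 2), (1, 2), (2, 2), (3, 2)}
measured = {(0, 1), (1, 2), (3, 2), (4, 2)}

# (0, 2) and (2, 2) are missing from the data, (4, 2) is unexplained.
print(misfit(theoretical, measured))  # 3
```

A cost of zero would mean that every theoretical fragment was measured and every measured fragment is explained.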
![Page 11: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/11.jpg)
Growing structures
[Diagram: Start (core) -> add unit -> add unit -> add unit -> add unit -> End (final composition)]
If the population grows too large, the structures with the highest cost are removed.
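The growing phase resembles a beam search: every structure in the population is extended by one unit in all allowed ways, and only the lowest-cost candidates are kept. The following is a minimal sketch under stated assumptions, not the presented implementation. The tree encoding, the `MAX_CHILDREN` and `ALLOWED` rules, the use of reducing end fragments only, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed rules (hypothetical values, not taken from the slides).
MAX_CHILDREN = {"Hex": 2, "HexNAc": 2}               # connections per unit
ALLOWED = {("Hex", "Hex"), ("Hex", "HexNAc"), ("HexNAc", "Hex")}  # parent -> child


def canon(node):
    """Sort children recursively so that equivalent trees compare equal."""
    name, children = node
    return (name, tuple(sorted(canon(c) for c in children)))


def composition(node):
    """Count the monosaccharides in a tree."""
    name, children = node
    total = Counter({name: 1})
    for child in children:
        total += composition(child)
    return total


def fragments(node):
    """Compositions of all connected subtrees that contain `node`."""
    name, children = node
    options = [[Counter()] + [Counter(dict(f)) for f in fragments(c)]
               for c in children]
    found = set()
    for combo in product(*options):
        total = Counter({name: 1})
        for part in combo:
            total += part
        found.add(tuple(sorted(total.items())))
    return found


def cost(node, measured):
    """Number of non-matching theoretical and measured fragments."""
    return len(fragments(node) ^ measured)


def grow(node, unit):
    """Yield every structure obtained by attaching one `unit` to `node`."""
    name, children = node
    if len(children) < MAX_CHILDREN[name] and (name, unit) in ALLOWED:
        yield canon((name, children + ((unit, ()),)))
    for i, child in enumerate(children):
        for grown in grow(child, unit):
            yield canon((name, children[:i] + (grown,) + children[i + 1:]))


def search(core, target, measured, max_pop=50):
    """Grow `core` to the `target` composition, keeping `max_pop` structures."""
    population = [canon(core)]
    steps = sum(target.values()) - sum(composition(core).values())
    for _ in range(steps):
        candidates = set()
        for struct in population:
            for unit in target - composition(struct):  # units still to place
                candidates.update(grow(struct, unit))
        # Remove the structures with the highest cost.
        ranked = sorted(candidates, key=lambda s: (cost(s, measured), s))
        population = ranked[:max_pop]
    return population


# N-glycan core (2 HexNAc + 3 Hex) and a biantennary test structure.
core = ("HexNAc", (("HexNAc", (("Hex", (("Hex", ()), ("Hex", ()))),)),))
arm = ("Hex", (("HexNAc", (("Hex", ()),)),))
true_structure = ("HexNAc", (("HexNAc", (("Hex", (arm, arm)),)),))
target = Counter(Hex=5, HexNAc=4)
measured = fragments(true_structure)   # noise-free reducing end fragments

best = search(core, target, measured, max_pop=50)
print(len(best), cost(best[0], measured))
```

Even with noise-free data, several different branchings can produce the same set of fragment compositions, so more than one structure may reach the lowest cost.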
![Page 12: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/12.jpg)
Scoring
[Figure: candidate structures ranked from highest scoring to lowest scoring]
The score is calculated as -log10(P), where P is the (binomial) probability that a random set of fragments would match as well as or better than the ranked structure. The final glycopeptide score is the sum of the peptide and structure scores.
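A score of this form can be sketched with a binomial tail probability. The slides do not say how the number of trials or the per-fragment random match probability are chosen, so the parameters `n_fragments` and `p_random` below are assumptions made for illustration only.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


def structure_score(matched, n_fragments, p_random):
    """-log10 of the chance that random fragments match at least as well.

    matched     -- number of matching fragments (assumed meaning)
    n_fragments -- number of fragments compared (assumed meaning)
    p_random    -- chance that one random fragment matches (assumed value)
    """
    return -math.log10(binomial_tail(matched, n_fragments, p_random))


# Hypothetical numbers: 8 of 10 fragments matched, 10 % random match chance.
print(round(structure_score(8, 10, 0.1), 2))
```

With this form, more matched fragments give a smaller tail probability and therefore a higher score, and the glycopeptide score would be obtained by adding the peptide score to it.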
![Page 13: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/13.jpg)
Options
• All glycosidic bonds can be broken
• Unlimited number of cuts
Assumptions
• Monosaccharide names
• Number of possible connections with each monosaccharide
• Accepted connections between monosaccharides
• Start structures (N-glycan cores)
• Max population size when growing structures
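To illustrate what "all glycosidic bonds can be broken" and "unlimited number of cuts" imply for the theoretical fragments, the sketch below enumerates every set of cut bonds and collects the compositions of the resulting pieces. The flat `names`/`parent` encoding and the `max_cuts` parameter are hypothetical and only serve the illustration. The intact structure (zero cuts) is not listed.

```python
from collections import Counter
from itertools import combinations


def fragment_compositions(names, parent, max_cuts=None):
    """Return (reducing_end, free) fragment compositions.

    names[i]  -- monosaccharide of node i; node 0 is the reducing end
    parent[i] -- index of the parent of node i, -1 for node 0
    max_cuts  -- largest number of broken bonds, None for unlimited
    """
    bonds = [i for i in range(len(names)) if parent[i] >= 0]
    limit = len(bonds) if max_cuts is None else min(max_cuts, len(bonds))
    reducing, free = set(), set()
    for k in range(1, limit + 1):
        for cut in map(set, combinations(bonds, k)):
            pieces = {}
            for i, name in enumerate(names):
                # Walk towards the root until a cut bond is reached.
                top = i
                while parent[top] >= 0 and top not in cut:
                    top = parent[top]
                pieces.setdefault(top, Counter())[name] += 1
            for top, comp in pieces.items():
                key = tuple(sorted(comp.items()))
                (reducing if top == 0 else free).add(key)
    return reducing, free


# N-glycan core: HexNAc-HexNAc-Hex with two Hex branches.
names = ["HexNAc", "HexNAc", "Hex", "Hex", "Hex"]
parent = [-1, 0, 1, 2, 2]

reducing, free = fragment_compositions(names, parent)
print(len(reducing), len(free))  # 4 7
single_red, single_free = fragment_compositions(names, parent, max_cuts=1)
print(len(single_red), len(single_free))  # 3 3
```

The enumeration visits every subset of bonds, so it is only practical for small structures. It is meant to show the effect of the cut limit, not to be efficient.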
![Page 14: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/14.jpg)
Testing with in silico generated data
[Figure: structure -> fragmentation -> theoretical spectrum -> randomly removing fragments and adding noise fragments -> randomized spectrum, which is the input to the de novo algorithm. Fragments (x) are drawn on grids of Hex against HexNAc count in panels NeuAc=0 to NeuAc=3, separately for peptide+glycan and glycan fragments.]
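The randomization step can be sketched as follows. This is a simplified illustration that treats all fragments as one pool, whereas the tests on the slides vary the removal of reducing and non-reducing end fragments separately. The function name, the parameters and the `(Hex, HexNAc)` fragment encoding are assumptions.

```python
import random


def randomize_spectrum(theoretical, remove_fraction, n_noise, noise_pool, rng):
    """Drop a fraction of the theoretical fragments and add noise fragments."""
    fragments = sorted(theoretical)
    n_remove = round(remove_fraction * len(fragments))
    kept = set(rng.sample(fragments, len(fragments) - n_remove))
    # Noise is drawn only from compositions the structure cannot produce.
    candidates = sorted(set(noise_pool) - set(theoretical))
    noise = set(rng.sample(candidates, min(n_noise, len(candidates))))
    return kept | noise


# Hypothetical data: fragments encoded as (Hex, HexNAc) counts.
theoretical = {(h, n) for h in range(5) for n in range(2)}   # 10 fragments
noise_pool = {(h, n) for h in range(7) for n in range(6)}

rng = random.Random(0)  # fixed seed so that a run can be repeated
spectrum = randomize_spectrum(theoretical, 0.5, 2, noise_pool, rng)
print(len(spectrum & theoretical), len(spectrum - theoretical))  # 5 2
```

Repeating this for many random seeds and feeding each result to the search would give the kind of rank statistics that the in silico tests report.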
![Page 15: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/15.jpg)
Results of the in silico tests
[Figure: two panels, "Correct structure with rank 1" and "Correct structure with rank 3". x axis: removed reducing, non-reducing end fragments (%), in the ranges (20,40), (40,60), (60,80), (80,100); y axis: percentage of runs (results matching the criteria), 0 to 100 %. Series: no noise, 2 noise fragments, 4 noise fragments.]
If about ½ of the theoretical fragments are present => the correct structure is among the few highest scoring ones.
Each mark is the result of 100 runs.
![Page 16: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/16.jpg)
Testing with serum sample
• Very complex wet lab data set, i.e. a human serum specimen
• Removal of the high abundance proteins prior to LC-MS/MS
• 80 spectra with identified peptide and glycan compositions
• 62 spectra with putative structures
• Mostly typical structures
• Mostly small structures, large ones seem to be hard to catch
![Page 17: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/17.jpg)
Example serum spectrum
Glycan is attached to peptide QDQCIYNTTYLNVQR (Alpha-1-acid glycoprotein 1).
Serum, m/z=1194.93, z=4
Three best scoring structures. Score: 73.2, 72.8, 72.6
[Figure: measured (O) and theoretical (x) fragments for the best scoring structure, on grids of Hex count (0 to 6) against HexNAc count (0 to 5) in panels NeuAc=0 to NeuAc=3. Reducing end fragments (attached to peptide) and non-reducing end fragments (free glycans) are drawn separately.]
![Page 18: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/18.jpg)
Structures found from the serum sample
[Figure: drawings of the glycan structures found, each accompanied by labels of the form PROTEIN(position); the drawings are not reproduced here, the labels follow.]
ANT3(224,187), FIBG(78), THRB(121), A1AG1(56), FETUA(156), HPT(241), HRG(344), FIBB(394), TRFE(630), IGHA1(144), A1AT(70,107,271), { VINEX(102), HPTR(126) }
FIBG(78), HRG(344), IGHA1(144) VTNC(169)
IGHG1(180), IGHG2(176) IGHA1(144) A1AG1(93)
IGHG2(176) IGHA1(144) CO2(621), CO3(85)
IGHG2(176) IGHA1(144) CO3(85)
![Page 19: De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi.](https://reader030.fdocuments.us/reader030/viewer/2022032516/56649c7d5503460f9493183d/html5/thumbnails/19.jpg)
Conclusions
• De novo glycan structure identification of intact glycopeptides is possible
• High quality spectra are necessary
• Typically no definite answer, but a few structures matching equally well => biological insight is still needed if a single structure has to be picked