Peptide Mass Fingerprinting
Manimalha Balasubramani
Genomics and Proteomics Core Laboratories
Genomics and Proteomics Core Lab website: www.genetics.pitt.edu
GPCL Inventory
ABI Voyager DE PRO, user operated
ABI 4700 Proteomics Analyzer
Thermoelectron LCQ Deca with Surveyor HPLC
ABI Qstar Elite with Ultimate 3000 HPLC
Bruker micrOTOF with Ultimate 3000 HPLC
Bruker 12 Tesla FTMS with Ultimate 3000 HPLC
Instruments pictured: 4700 Proteomics Analyzer, ABI; Voyager DE PRO, ABI; micrOTOF, Bruker; LCQ Deca XP, Thermo Fisher; 12T FT MS, Bruker; Qstar Elite, ABI
Peptide mass fingerprinting (PMF) is a technique for protein and peptide identification.
Outline
• PMF workflow:
  – Sample preparation
  – Mass spectra: MS and MS/MS
  – Database searches
• Examples, hands-on exercises
• Contaminants, post-translational modifications, enzyme digestions
• Evaluating PMF analysis
PMF: Sample preparation
Gel separation – 1D or 2D
Excise spot
Trypsin digest
Protein → peptides
Peptide fingerprint
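The trypsin digest step can be mimicked in silico. The sketch below is a minimal illustration, assuming the common rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) except when the next residue is proline (P); the function names are hypothetical, and search engines may implement slightly different enzyme rules. The `missed_cleavages` argument corresponds to the missed-cleavage search parameter.

```python
# Hypothetical helper names; a sketch of the K/R-not-before-P trypsin rule, not a library API.


def cleavage_bounds(sequence: str) -> list[int]:
    """Return peptide boundaries: 0, every tryptic cleavage site, and len(sequence)."""
    sites = [
        i + 1
        for i in range(len(sequence) - 1)
        # Cleave after K or R, but not when the next residue is P.
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    ]
    return [0] + sites + [len(sequence)]


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return tryptic peptides, allowing up to `missed_cleavages` skipped sites per peptide."""
    bounds = cleavage_bounds(sequence)
    peptides = []
    for start in range(len(bounds) - 1):
        # A peptide spans 1 + (number of missed sites) consecutive fragments.
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


# First 43 residues of the creatine kinase B sequence reported by Mascot in this document.
fragment = "MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELR"
print(tryptic_digest(fragment))
# ['MPFSNSHNALK', 'LR', 'FPAEDEFPDLSAHNNHMAK', 'VLTPELYAELR']
```

With `missed_cleavages=1` the list also contains joined peptides such as LRFPAEDEFPDLSAHNNHMAK, which appears in the Mascot match table with one missed cleavage.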
Mass spectra are acquired with:
MALDI TOF MS (Voyager DE PRO, ABI)
MALDI TOF/TOF MS (4700 Proteomics Analyzer, ABI)
MALDI – Matrix-Assisted Laser Desorption/Ionization
TOF – Time Of Flight
MS – Mass Spectrometry
Mass Spectrum: MS
[Figure: MS spectrum, 4700 Reflector Spec #1 (base peak m/z 1479.9, intensity 15779); x-axis: Mass (m/z), 699.0–3000.0; y-axis: % Intensity, full scale 1.6E+4]
[Figure: a single peak illustrating FWHM; x-axis: mass-to-charge ratio (m/z); y-axis: intensity. FWHM: full width at half maximum of a peak. Source: wiki]
Resolution and mass accuracy
$R = \dfrac{M}{\Delta m}$
where R = resolution, M = mass of the peak of interest, and Δm = width of the peak in daltons.
Δm measured at 50% of peak height is the full width at half maximum (FWHM).
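To make the definition concrete, here is a minimal sketch that estimates the FWHM from a sampled peak profile by linear interpolation at half height and then computes R = M/Δm. It assumes the arrays hold a single peak that falls below half height on both sides; the function names are hypothetical, and instrument software uses its own peak models.

```python
# Hypothetical helpers: FWHM by linear interpolation, then R = M / FWHM.


def fwhm(mz: list[float], intensity: list[float]) -> float:
    """Full width at half maximum of one sampled peak, via linear interpolation."""
    apex = max(range(len(intensity)), key=intensity.__getitem__)
    half = intensity[apex] / 2.0

    # Walk outwards from the apex until the signal drops below half height.
    i = apex
    while i > 0 and intensity[i] >= half:
        i -= 1
    j = apex
    while j < len(intensity) - 1 and intensity[j] >= half:
        j += 1
    if intensity[i] >= half or intensity[j] >= half:
        raise ValueError("peak does not fall below half height inside the window")

    # Interpolate the m/z at which each flank crosses half height.
    left = mz[i] + (half - intensity[i]) * (mz[i + 1] - mz[i]) / (intensity[i + 1] - intensity[i])
    right = mz[j - 1] + (intensity[j - 1] - half) * (mz[j] - mz[j - 1]) / (intensity[j - 1] - intensity[j])
    return right - left


def resolution(mz: list[float], intensity: list[float]) -> float:
    """R = M / delta_m, taking M as the apex m/z and delta_m as the FWHM."""
    apex = max(range(len(intensity)), key=intensity.__getitem__)
    return mz[apex] / fwhm(mz, intensity)


# Hypothetical triangular peak centred at m/z 1000.0 with an FWHM of 0.2.
mz = [999.8, 999.9, 1000.0, 1000.1, 1000.2]
intensity = [0.0, 5.0, 10.0, 5.0, 0.0]
print(round(resolution(mz, intensity)))  # 5000
```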
[Figures: ubiquitin ESI spectra on the 12T FT-ICR; panels labeled "Mass Error > 0.56 ppm", "Mass Error < 0.56 ppm" and "Resolution > 175,000"]
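Mass accuracy is expressed in parts per million (ppm):
$\text{ppm} = 10^{6}\,\dfrac{\Delta m}{M} = \dfrac{10^{6}}{R}$
When Δm is the difference between measured and theoretical mass, the first expression gives the mass error; the second form holds when Δm is the FWHM peak width, since $R = M/\Delta m$.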
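A small sketch of the ppm calculation, assuming Δm is the measured minus the theoretical mass and the theoretical mass is the denominator (conventions that divide by the measured mass differ negligibly). The helper names are hypothetical. The second helper converts a ppm tolerance, such as the 100 ppm and 25 ppm used in the hands-on exercise, into an absolute window in Da.

```python
# Hypothetical helper names for ppm arithmetic.


def ppm_error(measured: float, theoretical: float) -> float:
    """Mass error in parts per million: 1e6 * (measured - theoretical) / theoretical."""
    return 1e6 * (measured - theoretical) / theoretical


def ppm_window(mass: float, tolerance_ppm: float) -> tuple[float, float]:
    """Absolute (low, high) mass window for +/- tolerance_ppm around `mass`."""
    delta = mass * tolerance_ppm / 1e6
    return mass - delta, mass + delta


# Mr(expt) vs Mr(calc) for VLTPELYAELR in the Mascot match table.
print(round(ppm_error(1302.70, 1302.72), 1))  # -15.4
# At m/z 1500, 100 ppm is +/- 0.15 Da, while 25 ppm is +/- 0.0375 Da.
print(ppm_window(1500.0, 100), ppm_window(1500.0, 25))
```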
Peptide Mass Fingerprint
[Figure: the same 4700 Reflector Spec #1 MS spectrum (base peak m/z 1479.9, intensity 15779); x-axis: Mass (m/z), 699.0–3000.0; y-axis: % Intensity. Labeled peaks (m/z), in label order: 1479.8824, 1439.8967, 1567.8276, 1163.7000, 2045.1273, 927.5582, 1881.0223, 1724.9272, 1305.7888, 1730.7723, 1399.7751, 1249.6954, 1895.0386, 1283.7881, 1433.8074, 1554.7437, 1640.0277, 841.5205, 2555.2903, 1763.7820, 1687.8691, 2262.0557, 1516.7135, 1014.6827, 1590.8619, 1081.5479, 1121.5520, 2458.3052, 1195.6243, 789.5378, 898.5428, 2493.3501]
Mass spectrum processing, calibration
• External calibration
• Internal calibration
  – Trypsin autodigestion peaks
  – Keratin peaks
  – Spiking with an internal standard
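Internal calibration can be sketched as fitting a correction that maps the observed m/z values of known calibrant peaks onto their reference values, then applying that correction to every peak. This is a simplified, assumed model: a straight line in m/z fitted by least squares. Instrument software typically calibrates flight time, often with higher-order terms, and the calibrant values in the example are hypothetical.

```python
# Hypothetical helpers: linear least-squares recalibration in m/z (a simplification).


def fit_linear_calibration(observed: list[float], reference: list[float]) -> tuple[float, float]:
    """Least-squares fit of reference = slope * observed + intercept."""
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two calibrant peaks")
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recalibrate(peaks: list[float], slope: float, intercept: float) -> list[float]:
    """Apply the fitted correction to every peak in a list."""
    return [slope * m + intercept for m in peaks]


# Hypothetical calibrants: observed vs. reference m/z of two internal standard peaks.
observed = [1000.1, 2000.2]
reference = [1000.0, 2000.0]
slope, intercept = fit_linear_calibration(observed, reference)
print([round(m, 4) for m in recalibrate([1500.15], slope, intercept)])  # [1500.0]
```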
Peak List
• Spectrum viewer
• Compiled from the mass spectra
  – Mass list
  – Mass list and intensity
• Peak list is submitted for database searching
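Peak-list generation can be illustrated with a minimal local-maximum picker that keeps points above an intensity threshold and returns (mass, intensity) pairs, that is, the "mass list and intensity" form. This is an assumed, simplified approach with a hypothetical function name; real software also smooths, subtracts baseline, centroids and selects monoisotopic peaks.

```python
# Hypothetical helper: naive local-maximum peak picking with an intensity threshold.


def pick_peaks(mz: list[float], intensity: list[float], min_intensity: float) -> list[tuple[float, float]]:
    """Return (m/z, intensity) for local maxima at or above `min_intensity`."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        # Rising on the left, not rising on the right; a flat top keeps its first point.
        is_local_max = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_local_max and intensity[i] >= min_intensity:
            peaks.append((mz[i], intensity[i]))
    return peaks


spectrum_mz = [1000.0, 1000.1, 1000.2, 1000.3, 1000.4, 1000.5, 1000.6]
spectrum_int = [0.0, 5.0, 1.0, 0.0, 8.0, 8.0, 0.0]
print(pick_peaks(spectrum_mz, spectrum_int, min_intensity=2.0))
# [(1000.1, 5.0), (1000.4, 8.0)]
```

Dropping the intensities from the result gives the plain mass list.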
Database, e.g., protein databases: non-redundant NCBI, Swiss-Prot, IPI, etc.
In silico digest → peak lists: 820.7, 842.5, 1012.6, 1296.6, 1555.7, …
The algorithm compares the peak lists
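The comparison step can be sketched as follows: compute the monoisotopic mass of each in silico peptide, convert it to a singly protonated [M+H]+ value (the usual ion type in MALDI), and report observed peaks that fall within a ppm tolerance. This assumes unmodified peptides; `peptide_mr` and `match_peaks` are hypothetical names, and real search engines go on to score the matches rather than just list them. The peptide list would normally come from an in silico digest of each database entry.

```python
# Monoisotopic residue masses (Da); I and L are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mr(peptide: str) -> float:
    """Neutral monoisotopic mass (Mr) of an unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(peaks_mh: list[float], peptides: list[str], tol_ppm: float) -> dict[float, list[str]]:
    """Map each observed [M+H]+ value to the in silico peptides within tol_ppm (hypothetical helper)."""
    theoretical = {p: peptide_mr(p) + PROTON for p in peptides}
    matches = {}
    for peak in peaks_mh:
        hits = [p for p, mh in theoretical.items() if abs(peak - mh) / mh * 1e6 <= tol_ppm]
        if hits:
            matches[peak] = hits
    return matches


print(round(peptide_mr("DLFDPIIEDR"), 2))  # 1231.61, the Mr(calc) in the Mascot table
print(match_peaks([1232.62, 1303.70, 999.0], ["DLFDPIIEDR", "VLTPELYAELR"], tol_ppm=100))
```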
Gel separation – 1D or 2D
Excise spot
Trypsin digest
Protein → peptides
[Figure: MS spectrum, 4700 Reflector Spec #1 (base peak m/z 1479.9, intensity 15779); x-axis: Mass (m/z); y-axis: % Intensity]
Mass spectrum (MS)
Peak list: 820.7, 842.5, 1012.6, 1296.6, 1555.7, …
Reports protein identification
Database searching
Description of database searching using the Mascot program
- At GPCL, 4800 Proteomics Analyzer data are presented to the Mascot web server through ProteinPilot
- Mascot can be accessed through the web
- http://www.matrixscience.com
Mascot scoring
A frequency factor matrix, F, is created, in which each row represents an interval of 100 Da in peptide mass, and each column an interval of 10 kDa in intact protein mass. As each sequence entry is processed, the appropriate matrix elements $f_{i,j}$ are incremented so as to accumulate statistics on the size distribution of peptide masses as a function of protein mass. The elements of F are then normalised by dividing the elements of each 10 kDa column by the largest value in that column to give the Mowse factor matrix M:
$m_{i,j} = \dfrac{f_{i,j}}{\max_{k} f_{k,j}}$
After searching the experimental mass values against a calculated peptide mass database, the score for each entry is calculated according to:
$\text{Score} = \dfrac{50\,000}{M_{\text{Prot}} \times \prod m_{i,j}}$
where $M_{\text{Prot}}$ is the molecular weight of the entry and the product term is calculated from the Mowse factor elements for each match between the experimental data and peptide masses calculated from the entry.
Source: http://www.matrixscience.com/
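A toy version of the Mowse scheme described above, assuming each database entry is supplied as a (protein mass, peptide masses) pair. The bin widths follow the text (100 Da rows, 10 kDa columns) and the score uses the 50,000 / (M_Prot × ∏ m_{i,j}) form; the data structures and function names are hypothetical, and this is not Mascot's probability-based score.

```python
from collections import defaultdict

PEPTIDE_BIN_DA = 100.0     # row width: peptide mass interval
PROTEIN_BIN_DA = 10_000.0  # column width: intact protein mass interval


def cell(peptide_mass: float, protein_mass: float) -> tuple[int, int]:
    """Row i and column j of the frequency matrix for one peptide (hypothetical helper)."""
    return int(peptide_mass // PEPTIDE_BIN_DA), int(protein_mass // PROTEIN_BIN_DA)


def mowse_matrix(database):
    """database: iterable of (protein_mass, [peptide masses]) pairs; returns m_{i,j}."""
    freq = defaultdict(float)  # f_{i,j}
    for protein_mass, peptide_masses in database:
        for pm in peptide_masses:
            freq[cell(pm, protein_mass)] += 1
    column_max = defaultdict(float)
    for (_, j), f in freq.items():
        column_max[j] = max(column_max[j], f)
    # m_{i,j} = f_{i,j} / max_k f_{k,j}
    return {(i, j): f / column_max[j] for (i, j), f in freq.items()}


def mowse_score(matrix, protein_mass: float, matched_peptide_masses) -> float:
    """Score = 50 000 / (M_Prot * product of m_{i,j} over the matched peptides)."""
    product = 1.0
    for pm in matched_peptide_masses:
        product *= matrix[cell(pm, protein_mass)]
    return 50_000.0 / (protein_mass * product)


# Hypothetical two-entry database.
toy_db = [(42_000.0, [1200.0, 1250.0, 1300.0, 2500.0]), (45_000.0, [1210.0])]
m = mowse_matrix(toy_db)
print(round(mowse_score(m, 42_000.0, [1300.0, 2500.0]), 2))  # 10.71
```

Matches in heavily populated mass bins (m close to 1) barely change the product, while matches in sparse bins shrink it and raise the score, which is the intent of the normalisation.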
PMF search page
Parameters used in database searching
• Database searched
• Taxonomy
• Enzyme
• Missed cleavages
• Fixed versus variable modifications (PTMs)
• MW and pI
• Mass tolerance
Oxidation of methionine in proteins and peptides
+16 Da (sulfoxide), +32 Da (sulfone)
From Ionsource.com
S-carboxymethylation of cysteine with the alkylating agent iodoacetic acid (+58 Da)
or S-carbamidomethylation with iodoacetamide (+57 Da)
From Ionsource.com
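Fixed and variable modifications enter the search differently: a fixed modification shifts every occurrence of the residue, while a variable one may or may not be present, so each possibility has to be tried. The sketch below assumes only a fixed cysteine shift and variable methionine oxidation, with monoisotopic shifts of +15.99491 Da (oxidation), +57.02146 Da (carbamidomethylation) and +58.00548 Da (carboxymethylation); the function name is hypothetical.

```python
# Monoisotopic modification shifts (Da).
OXIDATION = 15.99491        # Met -> Met sulfoxide
CARBAMIDOMETHYL = 57.02146  # Cys alkylated with iodoacetamide
CARBOXYMETHYL = 58.00548    # Cys alkylated with iodoacetic acid


def modified_masses(base_mass: float, peptide: str, fixed_cys: float = 0.0,
                    variable_met_ox: bool = True) -> list[float]:
    """Candidate masses for one peptide (hypothetical helper).

    base_mass is the unmodified Mr. Every C receives the fixed shift `fixed_cys`;
    each M may or may not be oxidized, giving peptide.count("M") + 1 variants.
    """
    mass = base_mass + peptide.count("C") * fixed_cys
    n_met = peptide.count("M") if variable_met_ox else 0
    return [mass + k * OXIDATION for k in range(n_met + 1)]


# Unmodified Mr of LGFSEVELVQMVVDGVK, summed from monoisotopic residue masses.
print([round(x, 2) for x in modified_masses(1847.97024, "LGFSEVELVQMVVDGVK")])
# [1847.97, 1863.97]
```

The Mascot match table lists LGFSEVELVQMVVDGVK at both Mr(calc) 1847.97 and 1863.97, a 16 Da pair consistent with one variably oxidized methionine.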
Databases: NCBI
nr.*tar.gz
Non-redundant protein sequence database with entries from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq
Swiss-Prot, IPI, others
Submit a peak list to Mascot:
1075.513062, 1086.581177, 1090.547241, 1092.517822, 1100.630249, 1103.572754, 1106.553223, 1107.529663, 1118.498779, 1119.519531, 1121.509644, 1129.604492, 1141.572388, 1156.586792, 1166.537231, 1170.607422, 1172.612183, 1179.590332, 1194.604126, 1217.567749, 1232.610474, 1252.583740, 1308.654297, 1312.705811, 1314.744385, 1337.672485, 1401.651245, 1424.745728,
1427.830566, 1435.718872, 1475.762695, 1479.710327, 1493.734131, 1502.774780, 1530.834717, 1575.850952, 1607.807007, 1629.868408, 1639.935425, 1752.863892, 1753.904663, 1754.915161, 1791.744507, 1792.805054, 1794.820801, 1816.801392, 1875.976196, 1902.006104, 1940.941650, 1960.053345, 1962.928955, 2211.118652, 2225.130371, 2233.105225, 2249.076660
http://matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=PMF
Mascot PMF report
Hands-on exercise
• Go to the Desktop and open the .txt file
• Copy and paste the peak list into the Mascot search page
  – Specify search parameters
    » Allow 100 ppm error for PMFal_100.txt
    » Allow 25 ppm error for PMFgd_25.txt
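If you want to inspect an exercise file before pasting it into Mascot, a minimal parser might look like the sketch below. It assumes one peak per line, given as a mass optionally followed by an intensity; that layout is an assumption about the files, not a documented format, and the function name is hypothetical.

```python
from typing import Optional


def parse_peak_list(text: str) -> list[tuple[float, Optional[float]]]:
    """Parse 'mass' or 'mass intensity' lines (hypothetical helper).

    Fields may be separated by whitespace or commas; blank lines and '#' comments are skipped.
    """
    peaks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.replace(",", " ").split()
        mass = float(fields[0])
        intensity = float(fields[1]) if len(fields) > 1 else None
        peaks.append((mass, intensity))
    return peaks


# Hypothetical file contents; the real exercise files may be laid out differently.
sample = """1075.513062
1086.581177 1200
# comment lines are ignored
1090.547241, 350.5
"""
print(parse_peak_list(sample))
# [(1075.513062, None), (1086.581177, 1200.0), (1090.547241, 350.5)]
```

Reading a file would then be `parse_peak_list(open(path).read())` for whichever path holds the exercise data.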
Not all peaks are matched: why?
• Theoretical peptide list
  – Peptide lengths vs. MS range
  – Enzyme – missed/non-specific cleavage
  – Incorrect ORF
  – Amino acid substitutions
  – Ion suppression/efficiency
Matched Peptides
• Experimental peptide list
  – Contaminants
    • Trypsin autolysis peptides
    • Hair, skin keratins
    • Matrix molecules, clusters
    • Unknown contaminants
  – Modifications
    • PTMs – known and unknown, biological origin
    • Oxidized methionines – gel-induced artifacts
    • Chemical – cysteine carbamidomethylation, introduced during sample handling
    • Adducts
    • Amino acid substitutions
    • Splice variants
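Known contaminants can also be handled before the search by excluding peaks that match an exclusion list. A minimal sketch, assuming a ppm tolerance and a user-supplied list of contaminant [M+H]+ masses; the function name is hypothetical, and the 842.51 and 2211.10 values are the commonly cited porcine trypsin autolysis peaks, which should be checked against your own reagents. For instance, 2211.118652 in the peak list submitted to Mascot lies within about 8 ppm of 2211.10.

```python
# Commonly cited [M+H]+ values of two porcine trypsin autolysis peptides (an assumption
# to verify for your enzyme); extend the list with keratin or other known contaminants.
TRYPSIN_AUTOLYSIS_MH = [842.51, 2211.10]


def remove_contaminants(peaks: list[float], contaminants: list[float], tol_ppm: float) -> list[float]:
    """Return the peaks NOT within tol_ppm of any listed contaminant (hypothetical helper)."""
    def is_contaminant(mass: float) -> bool:
        return any(abs(mass - c) / c * 1e6 <= tol_ppm for c in contaminants)

    return [p for p in peaks if not is_contaminant(p)]


observed = [842.52, 1232.610474, 2211.118652]
print(remove_contaminants(observed, TRYPSIN_AUTOLYSIS_MH, tol_ppm=100))  # [1232.610474]
```

Filtering is only one option; most search engines can instead be told about contaminants, so that such peaks are explained rather than discarded.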
Not all peaks are matched: why?
The database search can take contaminants and modifications into account.
Evaluating PMF analysis
• Acceptable hit
  – High score
  – Major peaks accounted for
• No hit
  – Insufficient data – low-intensity MS
  – Single gel band contains more than 2–3 proteins
  – Protein not represented in database – ORF/genome
• Further analysis
  – MS/MS confirmation of a few major peaks and of unaccounted peaks – ideal
  – Low score, good spectrum – LC MS/MS
  – Low score, low-intensity spectrum – concentrate sample, reacquire
  – High score, some unaccounted peaks – MS/MS
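The "Further analysis" guidance reads as a small decision procedure. The function below is a hypothetical encoding of it for illustration only, not a GPCL tool, and it assumes the analyst has already judged whether the score is significant, whether the spectrum is intense, and whether major peaks remain unaccounted for.

```python
# Hypothetical encoding of the slide's follow-up rules; inputs are analyst judgements.


def next_step(significant_score: bool, intense_spectrum: bool, unaccounted_major_peaks: bool) -> str:
    """Suggest a follow-up action for a PMF result."""
    if not significant_score:
        # Low score: a good spectrum calls for LC MS/MS; a weak one for more sample.
        return "LC MS/MS" if intense_spectrum else "concentrate sample, reacquire"
    if unaccounted_major_peaks:
        # High score but unexplained peaks: confirm or explain them by MS/MS.
        return "MS/MS on unaccounted peaks"
    return "acceptable hit"


print(next_step(significant_score=False, intense_spectrum=True, unaccounted_major_peaks=False))  # LC MS/MS
```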
MS/MS
• Plot of m/z versus intensity
• At GPCL:
  – MALDI TOF/TOF MS
  – ESI QqTOF MS
  – ESI IT MS
  – MALDI/ESI FT ICR MS
Tandem MS
4700 Proteomics Analyzer, Applied Biosystems
[Figure: MS spectrum, 4700 Reflector Spec #1 (base peak m/z 1570.7, intensity 3840); x-axis: Mass (m/z), 800–2700; y-axis: % Intensity. Labeled peaks (m/z): 1570.6766, 904.4686, 1296.6848, 2465.1987, 2093.0872, 1552.6698, 1829.9774]
MS
MS, followed by precursor ion selection
Fragment ion spectrum
Tandem MS
[Figure: MS/MS fragment ion spectrum, 4700 MS/MS Precursor 1570.7 Spec #1 (base peak m/z 175.1, intensity 3106); x-axis: Mass (m/z), 69.0–1658.0; y-axis: % Intensity]
Tandem mass spectrum
http://qbab.aber.ac.uk
Database searching
• Peptide mass fingerprinting
• Sequence tag approach
De novo sequencing – inspect raw data
http://qbab.aber.ac.uk
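The idea behind reading a sequence tag or de novo sequence from raw data is that consecutive ions of one fragment series differ by exactly one residue mass. A minimal sketch, assuming the ions are already assigned to a single singly charged series (b or y) and using a fixed tolerance in Da; the function name is hypothetical, and real de novo tools must also cope with missing ions, mixed series and modifications. A y-ion ladder read from low to high mass yields residues in C-to-N order.

```python
# Monoisotopic residue masses (Da). I and L are isobaric, so they are reported together.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L/I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ion_masses: list[float], tol_da: float = 0.02) -> list[str]:
    """Call a residue for each gap between consecutive ions of ONE series (hypothetical helper)."""
    masses = sorted(ion_masses)
    calls = []
    for lighter, heavier in zip(masses, masses[1:]):
        gap = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol_da]
        calls.append("/".join(hits) if hits else "?")  # "?" = no single residue fits
    return calls


# Hypothetical b1-b4 ions of a peptide beginning VLTP (singly charged).
print(read_ladder([100.07569, 213.15975, 314.20743, 411.26019]))  # ['L/I', 'T', 'P']
```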
Tandem mass spectra (MS/MS) can be used for peptide sequencing
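Sequencing from MS/MS relies on predicting fragment ions. A minimal sketch for singly charged b and y ions of an unmodified peptide, assuming the usual definitions (b ions = N-terminal residues + proton; y ions = C-terminal residues + water + proton); the function name is hypothetical. As an aside, the base peak at m/z 175.1 in the fragment ion spectrum shown for precursor 1570.7 is consistent with the y1 ion of a C-terminal arginine (175.12), though the slides do not annotate it.

```python
# Monoisotopic residue masses (Da); I and L are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ions (hypothetical helper)."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, total = [], 0.0
    for m in residues[:-1]:           # b ions grow from the N-terminus
        total += m
        b_ions.append(total + PROTON)
    y_ions, total = [], 0.0
    for m in reversed(residues[1:]):  # y ions grow from the C-terminus and keep the water
        total += m
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("VLTPELYAELR")
print(round(b[1], 2), round(y[0], 2))  # 213.16 175.12
```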
Mascot Search Results
Search title: SampleSetID: 362, AnalysisID: 567, MaldiWellID: 15790, SpectrumID: 17225, Path=\Mani\102004\New Analysis 1
Database: NCBInr 20040606 (1846720 sequences; 611532004 residues)
Timestamp: 20 Oct 2004 at 14:52:50 GMT
Top Score: 681 for gi|180570, creatine kinase [Homo sapiens]
Probability Based Mowse Score
The score is $-10\log_{10}(P)$, where P is the probability that the observed match is a random event. Protein scores greater than 75 are significant (p < 0.05).
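The threshold of 75 is consistent with a 0.05 significance level divided across the 1846720 NCBInr sequences listed in the search header: −10·log10(0.05 / 1846720) ≈ 75.7. The sketch below assumes exactly that relationship; Mascot's own threshold calculation may differ in detail (for example in how entries are counted after taxonomy filtering), and the function names are hypothetical.

```python
import math


def score_from_p(p: float) -> float:
    """Probability-based Mowse score: -10 * log10(P)."""
    return -10.0 * math.log10(p)


def p_from_score(score: float) -> float:
    """Inverse of score_from_p: P = 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


def significance_threshold(n_entries: int, alpha: float = 0.05) -> float:
    """Score needed at `alpha`, ASSUMING P must fall below alpha / n_entries (hypothetical)."""
    return score_from_p(alpha / n_entries)


print(round(significance_threshold(1_846_720), 1))  # 75.7
```

Under this reading, the top score of 681 corresponds to a random-match probability far below any practical threshold.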
Accession Mass Score Description
1. gi|180570 42591 681 creatine kinase [Homo sapiens]
2. gi|21536286 42617 681 brain creatine kinase; creatine kinase-B [Homo sapiens]
3. gi|33304149 42730 681 creatine kinase, brain [synthetic construct]
4. gi|125292 42674 568 CREATINE KINASE, B CHAIN (B-CK) [Canis familiaris]
5. gi|180572 42658 538 creatine kinase-B
6. gi|125295 42636 514 CREATINE KINASE, B CHAIN (B-CK)
7. gi|180555 42460 507 creatine kinase-B
8. gi|203476 40598 473 creatine kinase-B
9. gi|31542401 42685 471 creatine kinase, brain [Rattus norvegicus]
10. gi|203474 42699 471 creatine kinase
11. gi|40807002 44540 469 Unknown (protein for IMAGE:5598839) [Rattus norvegicus]
12. gi|47477783 44782 469 Ckb protein [Rattus norvegicus]
13. gi|13096153 42551 441 Chain A, Crystal Structure Of Bovine Retinal Creatine Kinase
14. gi|12852054 42700 427 unnamed protein product [Mus musculus]
15. gi|10946574 42686 427 creatine kinase, brain [Mus musculus]
16. gi|47213348 42953 237 unnamed protein product [Tetraodon nigroviridis]
17. gi|627264 40353 236 creatine kinase (EC 2.7.3.2) isozyme IV - African clawed frog
18. gi|27503418 42214 235 Ckb-prov protein [Xenopus laevis]
19. gi|45384340 42844 209 B-creatine kinase [Gallus gallus]
20. gi|6573489 42713 201 Chain A, Crystal Structure Of Chicken Brain-Type Creatine Kinase
Top hits from the Mascot search: there are multiple accession numbers for the same protein
Search returns a cluster of proteins with the same matching peptides
1. gi|180570 Mass: 42591 Score: 681 creatine kinase [Homo sapiens]
Observed Mr(expt) Mr(calc) Delta Start End Miss Ions Peptide
1232.62 1231.61 1231.61 0.00 87 - 96 0 45 DLFDPIIEDR
1232.62 1231.61 1231.61 0.00 87 - 96 0 ---- DLFDPIIEDR
1254.57 1253.56 1253.58 -0.02 97 - 107 0 ---- HGGYKPSDEHK
1303.70 1302.70 1302.72 -0.02 33 - 43 0 ---- VLTPELYAELR
1303.70 1302.70 1302.72 -0.02 33 - 43 0 54 VLTPELYAELR
1458.70 1457.69 1457.67 0.02 139 - 151 1 ---- GFCLPPHCSRGER
1586.81 1585.80 1585.83 -0.03 157 - 172 0 81 LAVEALSSLDGDLAGR
1586.81 1585.80 1585.83 -0.03 157 - 172 0 ---- LAVEALSSLDGDLAGR
1656.79 1655.79 1655.82 -0.03 367 - 381 0 ---- LEQGQAIDDLMPAQK
1657.80 1656.79 1656.83 -0.04 224 - 236 0 47 TFLVWVNEEDHLR
1657.80 1656.79 1656.83 -0.04 224 - 236 0 ---- TFLVWVNEEDHLR
1848.94 1847.93 1847.97 -0.04 342 - 358 0 ---- LGFSEVELVQMVVDGVK
1864.93 1863.92 1863.97 -0.04 342 - 358 0 ---- LGFSEVELVQMVVDGVK
1964.88 1963.88 1963.92 -0.05 321 - 341 0 ---- GTGGVDTAAVGGVFDVSNADR
1964.88 1963.88 1963.92 -0.05 321 - 341 0 139 GTGGVDTAAVGGVFDVSNADR
2120.98 2119.97 2120.02 -0.05 320 - 341 1 ---- RGTGGVDTAAVGGVFDVSNADR
2120.98 2119.97 2120.02 -0.05 320 - 341 1 27 RGTGGVDTAAVGGVFDVSNADR
2169.91 2168.91 2168.96 -0.05 14 - 32 0 ---- FPAEDEFPDLSAHNNHMAK
2225.06 2224.05 2224.17 -0.12 157 - 177 1 ---- LAVEALSSLDGDLAGRYYALK
2439.08 2438.07 2438.14 -0.07 12 - 32 1 31 LRFPAEDEFPDLSAHNNHMAK
2439.08 2438.07 2438.14 -0.07 12 - 32 1 ---- LRFPAEDEFPDLSAHNNHMAK
2518.10 2517.09 2517.16 -0.07 108 - 130 0 92 TDLNPDNLQGGDDLDPNYVLSSR
2518.10 2517.09 2517.16 -0.07 108 - 130 0 ---- TDLNPDNLQGGDDLDPNYVLSSR
3753.61 3752.60 3752.73 -0.13 97 - 130 1 ---- HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
3753.61 3752.60 3752.73 -0.13 97 - 130 1 55 HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
4. gi|125292 Mass: 42674 Score: 568 CREATINE KINASE, B CHAIN (B-CK)
Observed Mr(expt) Mr(calc) Delta Start End Miss Ions Peptide
1254.57 1253.56 1253.58 -0.02 97 - 107 0 ---- HGGYKPSDEHK
1303.70 1302.70 1302.72 -0.02 33 - 43 0 ---- VLTPELYAELR
1303.70 1302.70 1302.72 -0.02 33 - 43 0 54 VLTPELYAELR
1458.70 1457.69 1457.67 0.02 139 - 151 1 ---- GFCLPPHCSRGER
1586.81 1585.80 1585.83 -0.03 157 - 172 0 81 LAVEALSSLDGDLAGR
1586.81 1585.80 1585.83 -0.03 157 - 172 0 ---- LAVEALSSLDGDLAGR
1624.76 1623.75 1623.85 -0.10 367 - 381 0 ---- LEQGQAIDDLVPAQK
1848.94 1847.93 1847.97 -0.04 342 - 358 0 ---- LGFSEVELVQMVVDGVK
1864.93 1863.92 1863.97 -0.04 342 - 358 0 ---- LGFSEVELVQMVVDGVK
1964.88 1963.88 1963.92 -0.05 321 - 341 0 ---- GTGGVDTAAVGGVFDVSNADR
1964.88 1963.88 1963.92 -0.05 321 - 341 0 139 GTGGVDTAAVGGVFDVSNADR
2120.98 2119.97 2120.02 -0.05 320 - 341 1 ---- RGTGGVDTAAVGGVFDVSNADR
2120.98 2119.97 2120.02 -0.05 320 - 341 1 27 RGTGGVDTAAVGGVFDVSNADR
2169.91 2168.91 2168.96 -0.05 14 - 32 0 ---- FPAEDEFPDLSAHNNHMAK
2225.06 2224.05 2224.17 -0.12 157 - 177 1 ---- LAVEALSSLDGDLAGRYYALK
2439.08 2438.07 2438.14 -0.07 12 - 32 1 31 LRFPAEDEFPDLSAHNNHMAK
2439.08 2438.07 2438.14 -0.07 12 - 32 1 ---- LRFPAEDEFPDLSAHNNHMAK
2518.10 2517.09 2517.16 -0.07 108 - 130 0 92 TDLNPDNLQGGDDLDPNYVLSSR
2518.10 2517.09 2517.16 -0.07 108 - 130 0 ---- TDLNPDNLQGGDDLDPNYVLSSR
3753.61 3752.60 3752.73 -0.13 97 - 130 1 ---- HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
3753.61 3752.60 3752.73 -0.13 97 - 130 1 55 HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
Nominal mass (Mr): 42591; calculated pI value: 5.34
Observed mass and pI: 43 kDa, 6.2–6.27
Creatine kinase-B [Homo sapiens]; match to: gi|21536286; score: 681
Sequence coverage: 46%
1 MPFSNSHNAL KLRFPAEDEF PDLSAHNNHM AKVLTPELYA ELRAKSTPSG
51 FTLDDVIQTG VDNPGHPYIM TVGCVAGDEE SYEVFKDLFD PIIEDRHGGY
101 KPSDEHKTDL NPDNLQGGDD LDPNYVLSSR VRTGRSIRGF CLPPHCSRGE
151 RRAIEKLAVE ALSSLDGDLA GRYYALKSMT EAEQQQLIDD HFLFDKPVSP
201 LLSASGMARD WPDARGIWHN DNKTFLVWVN EEDHLRVISM QKGGNMKEVF
251 TRFCTGLTQI ETLFKSKDYE FMWNPHLGYI LTCPSNLGTG LRAGVHIKLP
301 NLGKHEKFSE VLKRLRLQKR GTGGVDTAAV GGVFDVSNAD RLGFSEVELV
351 QMVVDGVKLL IEMEQRLEQG QAIDDLMPAQ K
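Sequence coverage is the share of residues in the matched entry that fall inside at least one matched peptide. A minimal sketch, assuming coverage is counted per residue with overlapping peptides counted once; Mascot's exact counting rules are not described here, and the function name is hypothetical.

```python
# Hypothetical helper: per-residue sequence coverage.


def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percent of residues in `protein` covered by at least one peptide (overlaps counted once)."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# First 20 residues of creatine kinase B with two tryptic peptides matched.
print(sequence_coverage("MPFSNSHNALKLRFPAEDEF", ["MPFSNSHNALK", "LR"]))  # 65.0
```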
Creatine kinase B is the highest scoring protein
GPCL resources for bioinformatic analysis
• Mascot version 2.1.0, Matrix Science Ltd
  – Mascot Daemon
• ProteinPilot software 2.0, Applied Biosystems/MDS Sciex
  – Paragon algorithm
  – Mascot algorithm
• Sequest, Thermoelectron
Resources (selected list)
http://www.hsls.pitt.edu/guides/genetics/obrc/proteomics
2nd Dimension – SDS PAGE
1st Dimension - Isoelectric focussing
Spot picking
Trypsin gel digest
…it's high-throughput…
Sample separation:
HPLC
1D or 2D LC MALDI; in-solution isoelectric focussing
GPCL services
• Fee-for-service model
• Support for investigators
  – Scientific expertise
  – Technical expertise
  – Grant submission
Genomics and Proteomics Core Laboratories
Paul Wood, Director; Billy W. Day, Scientific Director
Janette Lamb, Assistant Director
Proteomics Lab
Chris Bolcato, John Cardamone, Emanuel M. Schreiber, Guy Ueichi, James Porter
Robert Wolfe, Jason Sun
A mass spectrum
• Plot of m/z versus intensity
• MALDI TOF (/TOF) MS
• ESI TOF MS
• ESI QqTOF MS
• ESI IT MS
• MALDI/ESI FT ICR MS
Mass analyzers – several designs
Aebersold and Mann, Nature 422, p. 198, 2003 (review)
[Figure: Venn diagram of spectra identified by Mascot, SEQUEST and X!Tandem from QqTOF MS/MS data; region values 9%, 19%, 7%, 34%, 5%, 4% and 22%]
Each search engine identifies about the same number of spectra, but the overlap is surprisingly small: different search engines match different spectra.
Each search engine scores differently.
Courtesy: Proteome Software Inc.
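A diagram like this partitions every identified spectrum by the exact set of engines that identified it. A minimal sketch of that bookkeeping, assuming each engine's results are available as a set of spectrum IDs; the function name and the example IDs are hypothetical.

```python
# Hypothetical helper: exclusive Venn regions from per-engine sets of spectrum IDs.


def venn_regions(ids_by_engine: dict[str, set]) -> dict[tuple[str, ...], float]:
    """Fraction of all identified spectra falling in each exclusive Venn region."""
    names = sorted(ids_by_engine)
    union = set().union(*ids_by_engine.values())
    counts: dict[tuple[str, ...], int] = {}
    for spectrum in union:
        # The region is the exact set of engines that identified this spectrum.
        region = tuple(n for n in names if spectrum in ids_by_engine[n])
        counts[region] = counts.get(region, 0) + 1
    return {region: c / len(union) for region, c in counts.items()}


# Hypothetical spectrum IDs identified by each engine.
ids = {"Mascot": {1, 2, 3}, "SEQUEST": {2, 3, 4}, "X!Tandem": {3, 5}}
for region, fraction in sorted(venn_regions(ids).items()):
    print(" & ".join(region), f"{fraction:.0%}")
```

Because the regions are exclusive, their fractions sum to 100%, as the seven values in the diagram do.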
James Lyons-Weiler, Scientific Director, Bioinformatics Analysis Core
(412) 393-2087 (office), (412) 728-8743 (cell), fax: 412-648-1891