Post on 18-Dec-2015
Schedule
1. 9.00 Introduction
2. 9.10 Basic concepts of mass spectrometry (MS) and MALDI-TOF MS
3. 9.30 Protein identification by Peptide Mass Fingerprinting (PMF) and db search; MASCOT score!
4. 9.50 Workflow example 1 : 2D-PAGE – MALDI-TOF
5. 10.00 Coffee break
6. 10.15 Electrospray MS, tandem MS, peptide fragmentation
7. 10.45 De novo sequencing exercise
8. 11.20 Discussion of exercise results
9. 11.30 LC-MS/MS, protein identification with MS/MS data
10. 11.45 Database search with MS/MS data
11. 12.15 Lunch break
12. 13.00 Continuation of database search with MS/MS data
13. 13.45 Discussion of exercise results
14. 14.15 Shotgun proteomics (fractionation, complexity)
15. 14.45 Data interpretation and validation (homologous proteins, statistical evaluation)
16. 15.00 Example : typical experiment results : format and interpretation
17. 15.30 Advanced proteomics : PTMs and quantitation
18. 16.00 Test
19. 16.30 End
Red : exercises
Why do proteomics ?
• Proteins are the main effectors of all cellular functions
• Proteins have a complex life that cannot be studied by DNA- or RNA-based techniques
– Regulation of translation
– Maturation, processing
– Sorting, targeting, secretion
– PTM (Post Translational Modifications)
– Protein-protein interactions
– Degradation
– …
[ RNA ] [ protein ]?
Challenges of in vivo proteomics
• Complexity :
– 35’000 genes (?) in H. sapiens,
– 15’000 expressed in a single cell ?
– but 50’000-150’000 chemically different protein species ?
• Dynamic range :
– 10^5 x or 10^6 x between low and high-abundance proteins
• Plasticity :
– continuous variation in protein expression pattern, PTMs, degradation, …
• Proteins…
– have vastly different physico-chemical properties (acidic, basic, hydrophilic, hydrophobic, …)
– cannot be amplified
www.plasmaproteome.org
Tools needed
1) Separate
electrophoresis (1D, 2D), isoelectric focusing, liquid chromatography, affinity, …
2) Analyse
mass spectrometry
Mass Spectrometry in proteomics
Mass spectrometry : essential functions
SAMPLE -> ION GENERATION -> ION SEPARATION -> ION DETECTION
ION SOURCE :
ESI : Electrospray Ionisation
MALDI : Matrix Assisted Laser Desorption/Ionization
MASS ANALYZER :
Quadrupoles
Ion traps
Time-of-flight with reflectron
TOF/TOF
FT-ICR (Fourier transform – Ion Cyclotron Resonance)
DETECTOR :
Faraday cup
Scintillation counter
Electromultiplier
High-energy dynodes with electron multiplier
Array (detector)
FT-MS
Diagram of a mass spectrometer
inlet, ionisation source, mass analyzer, detector
vacuum system
instrument control system
data storage
Masses and mass measurements
•All mass spectrometers function by measuring molecules in their ionized state
•All values determined by MS are relative to the m/z assumed by the molecule after the ionization process
The relationship between the molecular mass (m) and the m/z value can be calculated as follows:
m/z = (m + (mA * z )) / z
mA is the mass of the adduct responsible for ionization (typically H+ for positive MS mode).
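The relationship above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `mz` and `neutral_mass` are hypothetical, and the adduct mass of 1.008 is the rounded H+ value used in the worked examples of this section.

```python
# Minimal sketch of m/z = (m + mA * z) / z, assuming protonation (H+) as the adduct.
PROTON = 1.008  # rounded adduct mass mA, as used in the worked examples


def mz(m, z, adduct=PROTON):
    """Return the m/z value of a molecule of mass m carrying z adducts."""
    return (m + adduct * z) / z


def neutral_mass(mz_value, z, adduct=PROTON):
    """Invert the relationship: recover the molecular mass from m/z and z."""
    return mz_value * z - adduct * z


print(f"{mz(841.502, 1):.3f}")   # VATVSLPR, 1+  -> 842.510
print(f"{mz(2210.097, 3):.3f}")  # LGE...AAK, 3+ -> 737.707
```

The inverse function is what is needed in practice: the instrument reports m/z, and the charge state must be known before the molecular mass can be recovered.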
Amino acid residue masses and related theoretical immonium ions
One-letter code   Three-letter code   Monoisotopic mass   Immonium
A Ala 71.03711 44.0495
C Cys 103.00919 76.0215
D Asp 115.02694 88.0393
E Glu 129.04259 102.0550
F Phe 147.06841 120.0808
G Gly 57.02146 30.0338
H His 137.05891 110.0713
I Ile 113.08406 86.0964
K Lys 128.09496 101.1073
L Leu 113.08406 86.0964
M Met 131.04049 104.0528
N Asn 114.04293 87.0553
P Pro 97.05276 70.0651
Q Gln 128.05858 101.0709
R Arg 156.10111 129.1135
S Ser 87.03203 60.0444
T Thr 101.04768 74.0600
V Val 99.06841 72.0808
W Trp 186.07931 159.0917
Y Tyr 163.06333 136.0757
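Peptide mass : the sum of the n residue masses plus the terminal groups, H (1) at the N-terminus and OH (17) at the C-terminus :
$m = \sum_{i=1}^{n} m_{\mathrm{residue},i} + 1 + 17$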
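As a hedged illustration of how the residue table is used, the sketch below sums monoisotopic residue masses and adds one water (H + OH, taken here as 18.01056, a standard value that is not stated on the slide). The names `RESIDUE_MONO` and `peptide_mass` are hypothetical.

```python
# Monoisotopic residue masses copied from the table above.
RESIDUE_MONO = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}
WATER = 18.01056  # assumed monoisotopic mass of H (N-terminus) + OH (C-terminus)


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide given in one-letter code."""
    return sum(RESIDUE_MONO[aa] for aa in sequence) + WATER


print(f"{peptide_mass('VATVSLPR'):.4f}")  # 841.5021, close to the tabulated MW(mono) of 841.5022
```

Note that I and L have identical residue masses, so a mass alone cannot distinguish them.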
-> m/z (mono) of VATVSLPR 1+ : ( 841.502 + 1.008 ) /1 = 842.510
-> m/z (mono) of LGE…AAK 3+ : ( 2210.097 + (3 x 1.008 )) /3 = 737.707
Sequence               Composition        MW(mono)   MW(ave)    m/z (mono)   m/z (ave)
----------------------------------------------------------------------------------------
VATVSLPR               C37.H68.N11.O11    841.5022   841.999    842.510      843.014
LGEHNIDVLEGNEQFINAAK   C96.H152.N27.O33   2210.097   2211.397   2211.105     2211.105
Isotope   Atomic mass   % Abundance   Element   Mr(mono.)   Mr(avg.)
1H        1.0078        99.985        H         1.0078      1.0080
2H        2.0141        0.015
----------------------------------------------------------------------
12C       12.0000       98.93         C         12.0000     12.0107
13C       13.0034       1.07
----------------------------------------------------------------------
35Cl      34.9689       75.78         Cl        34.9689     35.4525
37Cl      36.9659       24.22
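The average mass of an element is the abundance-weighted mean of its isotope masses. The sketch below reproduces the Mr(avg.) column from the isotope data; the names `ISOTOPES` and `average_mass` are hypothetical, and the 37Cl atomic mass of 36.9659 is taken from standard isotope tables (an assumption, since the value is not legible in the slide).

```python
# (atomic mass, % abundance) pairs; the 37Cl mass is an assumed standard value.
ISOTOPES = {
    "H": [(1.0078, 99.985), (2.0141, 0.015)],
    "C": [(12.0000, 98.93), (13.0034, 1.07)],
    "Cl": [(34.9689, 75.78), (36.9659, 24.22)],
}


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses."""
    total_abundance = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total_abundance


for element, isotopes in ISOTOPES.items():
    print(element, f"{average_mass(isotopes):.4f}")
```

The monoisotopic mass, by contrast, is simply the mass of the lightest (most abundant, for these elements) isotope.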
m/z = (m + (mA * z )) / z
Mass distribution and mass measurements
Theoretical MS : the effect of resolution
[Figure: theoretical isotope distribution, m/z 842 to 850 vs relative intensity (0 to 100), overlaid with spectra calculated with resolution 1000 and with resolution 10000]
The average mass corresponds to the “top of the peak” of a measurement done at low resolution
[Figure: mass (m/z) 839.0 to 847.0 vs % intensity, with the monoisotopic (Mono.) and average (Avg.) masses marked. Labelled peaks : 842.502 (all 12C), 843.485 (1 x 13C), 844.480 (2 x 13C)]
Experimental : 1+ ion (MALDI)
Example : Peptide VATVSLPR
1+ : ( 841.502 + 1.008 ) /1 = 842.510
MALDI (Matrix Assisted Laser Desorption Ionisation)
MALDI TOF (Time Of Flight)
MALDI IONISATION
• Great for Peptide Mass Fingerprinting
– Fast
– Easy to measure
– Sensitive
– Salt-tolerant (to some extent)
– Also good for larger MW (small proteins)
– Sample on a stable support (no time constraints)
– 1+ ions : simpler data analysis
disadvantages
• High accuracy needs careful calibration
• Difficult (but possible) to do MS/MS by MALDI
• Signal suppression in complex mixtures
• Crystallisation conditions influence results
MALDI TOF
• Protein identification by Peptide Mass Fingerprinting (PMF)
MS-based protein identification : concept
experimental :
Protein sample
-> specific protease, e.g. trypsin
Protein fragments (5-30 AA peptides)
-> MS
Exact masses of peptides
Fragmentation (MS/MS) spectrum of each peptide
in silico :
Protein sequence(s)
-> software
Protein fragment sequences (same protease specificity)
-> software
Calculated exact masses of peptides
-> software
Calculated fragmentation spectrum of each peptide
experimental vs in silico -> Best Match(es)
Experimental : protease digestion (trypsin) -> peptide extraction -> MS -> experimental MS data (m/z)
In silico : sequence database -> theoretical MS data (m/z)
Software (MASCOT, SEQUEST, …) compares experimental and theoretical MS data -> best matching sequence(s)
MALDI-TOF of a tryptic digest of a protein
[Figure: +TOF MS, 50 MCA scans of a BSA digest (100 fmol) ; m/z 800 to 1800 amu vs intensity (counts), max. 1305.0 counts. Most intense labelled peaks : 927.59, 847.59, 1440.00, 1479.98, 1567.94, 1640.16]
Extracted peak list
m/z
847.5896
869.0722
922.5712
923.5815
927.5904
1022.5551
1050.5533
1163.7695
1164.7531
1193.7393
1249.7705
1250.8103
1296.8556
1297.8499
1305.8668
1416.8929
1440.0008
1479.9773
1482.9583
1567.9417
1640.1635
1824.06…
Information contained in MS spectrum
Search form…
MALDI-TOF of a tryptic digest of BSA
[Figure: the same BSA digest spectrum (m/z 800 to 1800 amu vs intensity), with peaks annotated by the matching tryptic peptides : YLYEIAR, LSQKFPK, LVNELTEFAK, FKDLGEEHFK, HPEYAVSVLLR, HLVDEPQNLIK, LGEYGFQNALIVR, KVPQVSTPTLVEVSR, DAFLGSFLYEYSR. Several peaks remain unassigned (marked "?")]
PMF example
Db taxonomy error tol score #peps threshold 2nd hit
------------------------------------------------------------------------------------------------------------------------------------------
SP all 50 ppm 108 22 67 30
SP all 500 ppm 65 23 67 44
SP A.thaliana 500 ppm 65 23 50 28
SP A.thaliana 50 ppm 108 22 50 26
SP A.thaliana 30 ppm 110 22 50 20
1121.584 1121.584 1277.698 1277.698 1407.672 1415.738 1415.738 1419.779 1547.878 1547.878 1622.802 1622.802 1658.941 1658.941 1745.896 1745.896 1815.038 1815.038 1877.932 1893.92 1970.2 1970.2
854.487 856.497 864.463 870.515 1045.564 1106.061 1106.563 1126.571 1143.570 1203.671 1384.788 1437.709 1442.767 1454.853 1464.699 1542.655 1607.827 1660.954 1671.840 1802.949 1940.971 2000.059 2211.106 2224.111 2225.115 2231.210 2233.104 2239.136 2249.064 2283.180 2283.180 2299.177
Search Parameters :
Type of search : Peptide Mass Fingerprint ----- Database : SwissProt 53.1
Enzyme : Trypsin
Fixed modifications : Carbamidomethyl (C) ----- Variable modifications : Oxidation (M)
Mass values : Monoisotopic - - - Protein Mass : Unrestricted
Peptide Mass Tolerance : ± 30 ppm
Peptide Charge State : 1+
Max Missed Cleavages : 1
Threshold = f (database size) ; score = f ( #matches, « uniqueness »)
Peak list
Factors affecting performance of PMF
• Sample purity : PMF cannot deal with mixtures of more than 2 proteins. The method is therefore best coupled to high-resolution protein purification techniques such as 2D-PAGE.
• Specificity of the digestion : high-purity trypsin (cleaves after K / R) is the most commonly used enzyme.
• Mass accuracy : use the highest mass accuracy available when measuring peptide masses. Errors below 20 ppm are routinely achieved with modern instruments. This precision is essential for identifying proteins from species with a large genome.
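To make the role of the mass tolerance concrete, here is a minimal sketch of ppm-based peak matching. It is not how MASCOT scores a search; it only counts matches. The function names and the observed / theoretical values (other than the VATVSLPR m/z of 842.510) are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(observed_peaks, theoretical_peptides, tol_ppm):
    """Return (observed m/z, peptide name) pairs that agree within tol_ppm."""
    matches = []
    for obs in observed_peaks:
        for name, theo in theoretical_peptides.items():
            if abs(ppm_error(obs, theo)) <= tol_ppm:
                matches.append((obs, name))
    return matches


# Illustrative values only: one well-calibrated peak, one poorly calibrated, one unrelated.
theoretical = {"VATVSLPR": 842.510, "hypothetical_peptide": 1045.564}
observed = [842.525, 1045.700, 927.590]

print(match_peaks(observed, theoretical, tol_ppm=30))   # only the ~18 ppm match survives
print(match_peaks(observed, theoretical, tol_ppm=500))  # the ~130 ppm match is accepted too
```

A wide tolerance accepts more true peptides from a badly calibrated spectrum, but it also accepts more random matches, which is why the score drops at 500 ppm in the PMF example table.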
WORKFLOWS 1 : „classical“ 2D-PAGE + MALDI TOF
[Figure: numbered labels 1, 2, 3A, 3B, 4 to 12]
Workflow 1 : adaptation of bacteria to growth conditions
Normal medium Low Glucose
Wick LM, et al, Environ Microbiol 3: 588-599, 2001
E. coli adapts to a very low glucose medium by up- and downregulating a set of 15 proteins
Tryptic digestion -> peptide masses M1 ... M8
Peptide Mass Fingerprinting : search with mass list M1......M8
Nr mw* pI* Acc. N. Name Function
1 52.2 5.07 P25553 aldA central metabolism
2 60.3 6.21 P23847 dppA peptide transport
3 47.5 5.06 P05313 aceA Central metabolism
4 48.45 6.71 P10904 ugpB transport
5 55.2 6.02 P00822 atpA proton transport/energy
6 ? ? P76108 ydcS transport
7 41.3 7.03 P02917 livJ amino acid transport
8 40.7 5.22 P02928 malE sugar transport
9 33.36 5.25 P02927 mglB sugar transport
10 61.5 5.81 P37192 gatY catabolism
11 30.9 7.76 P02925 rbsB sugar transport
12 25.8 5.22 P09551 argT amino acid transport
Workflow 1 : adaptation of bacteria to growth conditions
Metabolic enzymes and transport proteins affected
Utilisation of alternative sources of energy
P. Kebarle, M. Peschke / Analytica Chimica Acta 406 (2000) 11–35
• molecules compete for ionisation, e.g. Na+ >> Peptide
ELECTROSPRAY IONISATION
• Great for MS/MS
– Can be directly coupled to reversed phase LC (separation !)
– Sensitive
– Excellent for MS/MS due to 2+/3+ ions
disadvantages
• Sample introduction more complex
• Data analysis more difficult (2+/3+ ions)
• One-shot sample analysis (time constraints)
• Very low tolerance to contaminants
1+ versus 2+ ions
[Figure: 1+ ion, m/z 1160 to 1174 amu vs intensity (counts). Isotope peaks at 1163.76, 1164.76, 1165.74, 1166.78, spaced by Δm = 1.0 Da]
[Figure: 2+ ion, m/z 580.0 to 586.0 amu vs intensity (counts). Isotope peaks at 582.28, 582.77, 583.27, spaced by Δm = 0.5 Da]
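The isotope spacing gives the charge state directly: neighbouring isotope peaks differ by about 1 Da in mass, so by about 1/z in m/z. A minimal sketch, using the peak positions read from the two spectra above (the function name `charge_from_isotopes` is hypothetical):

```python
def charge_from_isotopes(isotope_peaks):
    """Estimate the charge state z from the mean spacing of isotope peaks (spacing ~ 1/z)."""
    peaks = sorted(isotope_peaks)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


print(charge_from_isotopes([1163.76, 1164.76, 1165.74]))  # spacing ~1.0 Da -> 1
print(charge_from_isotopes([582.28, 582.77, 583.27]))     # spacing ~0.5 Da -> 2
```

This only works when the instrument resolves the isotope peaks; at low resolution the charge state has to be inferred in other ways.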
Modes of measurement : MS & MS/MS
MS :
•Ion production (ionisation)
•Ion separation
•Ion detection
Tandem MS (MS/MS) :
•Ion production (ionisation)
•Ion separation – isolation of “parent” ion
•Ion fragmentation (CID)
•Ion separation – separate fragment ions
•Ion detection - measure fragment ions
• CID= collision induced dissociation
• Low energy (<100 eV) vs high energy collisions (>> 100 eV)
• Precursor ion = parent ion : the one being fragmented
• Daughter ions = fragment ions produced by CID
• Tandem mass spectrometry = MS/MS ; here : the combination of ion selection / CID / fragment analysis
• ESI of tryptic peptides typically generates doubly charged ions due to the presence of Lys or Arg at the C terminal end of the peptides
• y- and b- ion series fragments are usually observed in MS/MS fragmentation spectra.
MS/MS Glossary and facts
Covalent bonds being broken ion series
Method for partial interpretation of MS/MS spectra
• Determine charge state of parent ion, calculate MH+ mass,
• Clean up spectrum if necessary
• Inspect high mass end for b and y ions and residue mass loss:
Last b ion is MH+ - 18 - residue mass;
first y ion is MH+ - residue mass,
• Start at high mass end above precursor and look for descending ion series by sequentially
subtracting residue masses
OR
• Start at y1 ion (147 or 175 if tryptic) and look for ascending ion series
• If ambiguities arise:
- check if any of the peaks of interest is a 2+ ion,
- has a complementary peak that has already been used (thus belongs to another series),
- is a -18 or -28 of another major peak.
• Check if any of the residues in the extracted tag is confirmed by immonium ions,
• Be careful about dipeptide masses that match residue masses, for example G+G=N
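The mass relationships used in the method above can be checked with a short sketch that computes singly charged b and y ions for a known sequence. The peptide HLVDEPQNLIK is one of the BSA peptides in this deck; the proton mass (1.00728) and water mass (18.01056) are standard values assumed here, and `fragment_ions` is a hypothetical helper name.

```python
# Monoisotopic residue masses for the residues of the example peptide only.
RESIDUE = {
    "H": 137.05891, "L": 113.08406, "V": 99.06841, "D": 115.02694,
    "E": 129.04259, "P": 97.05276, "Q": 128.05858, "N": 114.04293,
    "I": 113.08406, "K": 128.09496, "G": 57.02146,
}
PROTON = 1.00728  # assumed proton mass
WATER = 18.01056  # assumed monoisotopic mass of H2O


def fragment_ions(sequence):
    """Return (b ions, y ions) as lists of singly charged m/z values, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE[aa] for aa in sequence]
    b_ions, running = [], 0.0
    for m in masses[:-1]:             # N-terminal fragments
        running += m
        b_ions.append(running + PROTON)
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):    # C-terminal fragments
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("HLVDEPQNLIK")
mh = sum(RESIDUE[aa] for aa in "HLVDEPQNLIK") + WATER + PROTON
print(f"MH+ = {mh:.2f}, b1 = {b[0]:.2f}, y1 = {y[0]:.2f}")
print(f"last b = {b[-1]:.2f} (MH+ - 18 - K), first y below MH+ = {y[-1]:.2f} (MH+ - H)")
print(f"G+G = {2 * RESIDUE['G']:.5f} vs N = {RESIDUE['N']:.5f}")
```

The y1 value of about 147.11 is the C-terminal Lys signature mentioned in the method; a tryptic peptide ending in Arg would give 175.12 instead.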
Question :what are the main problems with this approach ?
ESI & Quadrupole-based instruments
Triple Quadrupole
Triple Quadrupole-Ion trap
Quadrupole-Quadrupole-TOF
Ion Trap (3D trap)
All these instruments can perform MS/MS fragmentation experiments
3D ion trap (Paul trap) functions :
- Trap all ions / scan -> MS (full scan)
- Trap all ions / isolate precursor / scan -> zoom scan
- Trap all ions / isolate precursor / fragment / scan fragments -> MS/MS scan
- Trap all ions / isolate precursor / fragment / … repeat n times … / scan fragments -> MSn scan
Example 1 : electrospray ion trap MS
[Figure labels : electrospray needle, sampling cone, skimmer, RF hexapole (x 2), hexapole gas collision cell, pusher, TOF, reflectron, detector]
Example 2 : Quadrupole Time of flight Mass Spectrometer (QTof)
electrospray source
Experimental set-up : nanoLC-MS/MS
HPLC pumps -> T-splitter (~5 % to the column, ~95 % to waste)
sample -> C18 column (L = 10 cm, ID = 50-100 µm) -> mass spectrometer -> database correlation -> further analysis
[Figure: LC-MS chromatogram, time 5 to 85 min vs intensity (cps), with two time points marked 1 and 2]
Data collection Cycle :
a) full scan (MS)
b) detect peaks , choose ions to fragment (charge states, exclusion list)
c) isolate ion, fragment, collect spectrum
d) go back to a)
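Step b) of the cycle can be sketched as a simple selection rule. This is an illustrative assumption of how such a rule could look, not the acquisition software's actual logic: the function name, the top-N value, the exclusion tolerance, and all intensities and charge states are hypothetical (only the m/z values come from the full scan shown in this section).

```python
def select_precursors(full_scan, exclusion_list, top_n=2, tol=0.5):
    """Pick the top_n most intense multiply charged ions not already on the exclusion list.

    full_scan: list of (mz, intensity, charge) tuples.
    exclusion_list: m/z values already fragmented; updated in place.
    """
    candidates = [
        peak for peak in full_scan
        if peak[2] >= 2 and all(abs(peak[0] - excluded) > tol for excluded in exclusion_list)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = candidates[:top_n]
    exclusion_list.extend(peak[0] for peak in chosen)
    return chosen


# m/z values from the full scan; intensities and charges are made up for illustration.
full_scan = [
    (476.2484, 120.0, 2), (514.2206, 40.0, 1), (554.2839, 300.0, 2),
    (634.6555, 90.0, 3), (751.8398, 200.0, 2),
]
excluded = []
print(select_precursors(full_scan, excluded))  # first cycle: the two most intense 2+ ions
print(select_precursors(full_scan, excluded))  # next cycle: the exclusion list forces new ions
```

The exclusion list is what lets weaker peptides be fragmented instead of the same abundant ion being selected in every cycle.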
[Figure: full scan, m/z 450 to 800 amu. Labelled peaks : 476.2484, 514.2206, 554.2839, 634.6555, 751.8398]
[Figure: MS/MS of (1), m/z 100 to 900 amu]
[Figure: MS/MS of (2), m/z 150 to 1200 amu]
An on-line LC-ESI-MS experiment with automatic data acquisition
For automatic identification with MS/MS data, raw data files are submitted for a database search. Export software parses the LC-MS file and identifies all MS/MS spectra. Each spectrum is then extracted together with its associated information (precursor ion mass, elution time, etc.). Individual spectra are processed to remove noise and, importantly, all fragment peaks are centroided and deisotoped. The result, for each spectrum, is a flat text file consisting of header information followed by a mass / intensity list. Below is one spectrum reduced to this format for the MASCOT software. An input file for a database search can contain hundreds or thousands of such spectra.
MASCOT input data – example (one spectrum) ; Mass / intensity list
SEARCH=MIS
REPTYPE=Peptide
BEGIN IONS
PEPMASS=488.475147958288
CHARGE=3+
TITLE=Elution from: 22.8 to 22.82 period: 0 experiment: 1 cycles: 2
74.0597 2.5
84.0392 1.5
119.1633 28.5
129.0928 3
143.8366 5.5
147.1138 4
171.1069 7
227.0473 2
233.6612 7
262.0674 11.5
…
END IONS
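A file in this format is easy to read back. The sketch below is a minimal, assumption-laden parser for the BEGIN IONS / END IONS blocks shown above; it is not the Mascot parser, the function name `parse_mgf` is hypothetical, and the sample is truncated to the first three peaks of the spectrum.

```python
SAMPLE = """BEGIN IONS
PEPMASS=488.475147958288
CHARGE=3+
TITLE=Elution from: 22.8 to 22.82 period: 0 experiment: 1 cycles: 2
74.0597 2.5
84.0392 1.5
119.1633 28.5
END IONS
"""


def parse_mgf(text):
    """Return a list of spectra, each a dict with 'params' (header) and 'peaks' (m/z, intensity)."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)    # header line, e.g. CHARGE=3+
                current["params"][key] = value
            else:
                mz, intensity = line.split()[:2]   # peak line: mass, intensity
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


spectrum = parse_mgf(SAMPLE)[0]
print(spectrum["params"]["PEPMASS"], spectrum["params"]["CHARGE"], len(spectrum["peaks"]))
```

Lines outside a BEGIN IONS / END IONS block (such as SEARCH=MIS) are search-level parameters and are ignored by this sketch.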
Data export
+TOF Product (653.3)
[Figure: product ion spectrum of the 653.3 precursor, m/z 100 to 1300 amu vs intensity (counts). Labelled peaks : 86.09, 110.06, 138.05, 166.05, 206.16, 223.13, 251.12, 332.21, 350.24, 373.30, 465.16, 566.32, 594.25, 653.36, 712.46, 841.53, 956.41, 1055.49, 1168.56. Assigned ions : a1, a2, a5, b1 to b5, b8, y1 to y3, y6 to y10, immonium ions (I)]
Residue   Immonium   a           b          y
---------------------------------------------------
H, His    110.07     110.07      138.06     1305.71
L, Leu    86.09      223.15      251.15     1168.65
V, Val    72.08      322.22      350.21     1055.57
D, Asp    88.03      437.25      465.24     956.50
E, Glu    102.05     566.29      594.28     841.47
P, Pro    70.06      663.34      691.34     712.43
Q, Gln    101.07     791.40      819.39     615.38
N, Asn    87.05      905.44      933.44     487.32
L, Leu    86.09      1018.53     1046.52    373.28
I, Ile    86.09      1131.6157   1159.61    260.19
K, Lys    101.10     1259.7106   1287.70    147.11
(MH2)2+ : precursor
Matching of MS / MS data
Black : predicted
Red : predicted and detected
Mascot search output
Mascot Search Results
Significant hits:
ALBU_BOVIN (P02769) Serum albumin precursor (Allergen Bos d 6).
ALBU_CANFA (P49822) Serum albumin precursor (Allergen Can f 3).
VWF_PIG (Q28833) Von Willebrand factor precursor (vWF) (Fragment).
CIQ3_BOVIN (P58126) Voltage-gated potassium channel protein KQT-like 3
RYR2_RABIT (P30957) Ryanodine receptor 2 (Cardiac muscle-type ryanodin
K2CA_BOVIN (P04263) Keratin, type II cytoskeletal 68 kDa, component IA
ALFB_RABIT (P79226) Fructose-bisphosphate aldolase B (EC 4.1.2.13) (Li
ALBU_BOVIN   Mass: 71244   Total score: 711   Peptides matched: 12
(P02769) Serum albumin precursor (Allergen Bos d 6).
Query  Observed  Mr(expt)  Mr(calc)  Delta  Miss  Score  Rank  Peptide
15     461.80    921.58    921.48    0.10   0     55     1     AEFVEVTK
19     501.80    1001.58   1001.58   0.01   0     31     1     LVVSTQTALA
31     569.80    1137.58   1137.49   0.09   0     72     1     CCTESLVNR
33     582.30    1162.58   1162.62   -0.04  0     76     1     LVNELTEFAK
47     653.40    1304.78   1304.71   0.08   0     90     1     HLVDEPQNLIK
58     722.40    1442.78   1442.63   0.15   0     87     1     YICDNQDTISSK
60     739.80    1477.58   1477.52   0.07   0     43     1     ETYGDMADCCEK
61     740.40    1478.78   1478.79   -0.00  0     61     1     LGEYGFQNALIVR
64     751.90    1501.78   1501.61   0.18   0     37     1     EYEATLEECCAK
74     547.30    1638.88   1638.93   -0.05  1     85     1     KVPQVSTPTLVEVSR
97     627.70    1880.08   1879.91   0.16   0     34     1     RPCFSALTPDETYVPK
98     636.70    1907.08   1906.91   0.16   0     42     1     LFTFHADICTLPDTEK
Orthogonal datasets and confidence levels
Database : 100’000 sequences ; 500 spectra
Probability of one (any) spectrum “accidentally” matching a sequence (wrong match) : 1/100’000 x 500 = 5 x 10^-3 (0.005)
Probability of 2 spectra “accidentally” matching the same sequence (wrong match) : 5 x 10^-3 x 5 x 10^-3 = 2.5 x 10^-5
Much higher confidence of identification with at least two peptides matching the same protein sequence
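The arithmetic above, written out as a short sketch. It follows the simplified model of the slide (each spectrum has a 1 in 100'000 chance of a wrong match to a given sequence, and the two wrong matches are treated as independent); the variable names are hypothetical.

```python
n_sequences = 100_000  # database size
n_spectra = 500        # MS/MS spectra submitted

# Chance that any one of the spectra wrongly matches a given sequence.
p_one_wrong = n_spectra / n_sequences
# Chance that two such wrong matches fall on the same sequence (independence assumed).
p_two_wrong = p_one_wrong ** 2

print(f"{p_one_wrong:.3f}")  # 0.005
print(f"{p_two_wrong:.1e}")  # 2.5e-05
```

Requiring a second peptide therefore lowers the chance of a false protein identification by a factor of 200 in this model.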
Every peptide is unambiguously assigned to its “parent” sequence, therefore many proteins can be identified in one sample during one run
PMF : one MS spectrum -> one dataset (peak list)
MS/MS : n MS/MS spectra -> n orthogonal datasets
m/z
time
Chromatographic Separation (reversed-phase)
Tandem mass spectra of 50-2000 peptides
Q7Z5Y2   Mass: 118789   Total score: 178   Peptides matched: 6
Rho-interacting protein 3.
Mr(calc)  Score  Peptide
930.48    42     EGLTVQER
1032.54   11     NWIQTIMK
1206.63   29     FSLCILTPEK
1369.75   24     LSTHELTSLLEK
1406.77   55     FFILYEHGLLR
1775.88   16     QVPIAPVHLSSEDGGDR
Protease digestion, peptide extraction
Nano-HPLC
MS/MS
Summary : Typical Analytical Workflow
Database matches
DHX9_HUMAN ATP-dependent RNA helicase A
NFM_HUMAN Neurofilament triplet M protein
Q9BQG0 Hypothetical protein
MYO6_HUMAN Myosin VI.
TP2A_PIG DNA topoisomerase II, alpha isozyme
Q7Z5Y2 Rho-interacting protein 3.
FLIH_HUMAN Flightless-I protein homolog.
TP2B_MOUSE DNA topoisomerase II, beta isozyme
S3B1_HUMAN Splicing factor 3B subunit
Q8VCW5 Similar to alpha internexin neuronal
Q8CHF9 MKIAA0376 protein (Fragment).
Protein sequence database
Database searching software (MASCOT)
Output :
•Protein identification in simple/complex mixtures
•Extensive sequence coverage and peptide mapping
•Analysis of modified peptides possible
Biological question
DB search with MS/MS exercise
http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS
! Searches can be bookmarked for later access !!
------------------------------------------------------------
Datasets : three different ones :
(1) BSA spectra used for the de novo exercise ; file : 461_740_only.mgf
(2) BSA LC run, 136 spectra ; file : BSA2.mgf
(3) Rafts complex sample, partial, 123 spectra ; file : rafts1_123spectra.mgf
------------------------------------------------------------
(1) Parameters
Search : Database: SwissProt, Taxonomy: Mammalia, Enzyme: Trypsin, Instrument: ESI-QUAD-TOF, Report top 10 hits;
i) search with Peptide tol. 0.3 Da, MS/MS tol. 0.6 Da
ii) search with Peptide tol. 2.0 Da, MS/MS tol. 0.6 Da
iii) search with Peptide tol. 2.0 Da, MS/MS tol. 2.0 Da
Output : Peptide summary, Ions score cutoff = 0, Require bold red = 0
Observe : peptide scores, score of first unrelated sequence, Mascot score threshold
(2) Parameters as in (1) with
Search : Report top AUTO hits, Peptide tol. 0.3 Da, MS/MS tol. 0.6 Da, Fixed modification: Carbamidomethyl (C), Variable modification: Oxidation (M)
i) search with Enzyme: Trypsin
ii) search with Enzyme: semi-Trypsin
Output : use Standard scoring and Ion cut-off of 14
Observe : number of matches for BSA, protein scores, species of matched proteins, mass error of peptides matched ; compare spectra of high-scoring (> 60) and low-scoring (< 35) peptides ;
for ii) look in particular at additional peptides
(3) Parameters as in (2) with
i) search with Database: human, Variable modification: Oxidation (M)
ii) search with Database: human, Variable modifications: Oxidation (M), Phospho (ST), Acetyl (N-term)
iii) search with Database: Escherichia coli, Variable modification: Oxidation (M)
In Mascot output, use Standard scoring, Ion cut-off of 14 and Require bold red
Observe : number of proteins identified, homologous sequences (see bold red vs red peptides), Mascot score threshold, unassigned peptides
for i) look in particular at the seven last hits : are they convincing ?
for ii) look in particular at additional peptides and proteins identified : are they convincing ?
The following discussion will highlight these aspects :
- result format and what everything means
- parameters for validation : mass error, sequence, fragment patterns, …
- strong hits vs weak hits : relationship with protein amount
- selectivity vs comprehensiveness : choosing the right error tolerance and other parameters
- homologous proteins problem
click
Results format
Default Input(change according to exercise)
WORKFLOWS 2 : General shotgun protein identification techniques
example : Affinity pull-down + 1D-PAGE + LC-MS/MS
Shotgun sequencing from complex mixtures
Multiprotein complex
-> Denaturation, proteolytic digestion
-> Complex peptide mixture (1000-20000 species)
-> Rp-LC-MSMS run
-> Db search
-> List of identified proteins : 1. P45218 2. P21543 3. Q12588 4. P32651 5. Q01245 6. ….
Alternative : MuDPIT (Multi Dimensional Protein Identification Technology)
Multiprotein mixture (complex)
-> Denaturation, proteolytic digestion
-> Complex peptide mixture (1000 - 2000000)
-> Strong Cation Exchange (SCX) separation
-> Rp-LC-MSMS runs
-> Db search
-> List of identified proteins : 1. P45218 2. P21543 3. Q12588 4. P32651 5. Q01245 6. ….
Post-digestion separation of peptides by two-dimensional liquid chromatography instead of separation of proteins
Peptide validation problem
There are good hits, bad hits, and everything in between…
?
True positives with weak spectra
False positives
Homologoussequences
Many solutions have been proposed for validation : manual, semi-manual, training sets, mass accuracy, … Statistical validation is now popular.
A good hit (Mascot score=69)
MS/MS Fragmentation of VMLAANIGTPK
Found in Zurbriggen_Ttengcongensis
Match to Query 525: 1113.616556 from(557.815554,2+)
Monoisotopic mass of neutral peptide Mr(calc): 1113.622
Fixed modifications: Carbamidomethyl (C)
Ions Score: 69 Expect: 3.8e-07
Matches (Bold Red): 17/90 fragment ions using 32 most intense peaks
F020857.dat
A bad hit (Mascot score=13)
MS/MS Fragmentation of VSIALSSHWINPR
Found in KLOT_MOUSE, Klotho precursor - Mus musculus (Mouse)
Match to Query 2: 1478.838768 from(740.426660,2+)
Monoisotopic Mr(calc): 1478.7994
Fixed modifications: Carbamidomethyl (C)
Ions Score: 13 Expect: 18
Matches (Bold Red): 10/110 fragment ions using 26 most intense peaks
* Strong unmatched peaks
Further processing of Mascot results ( shotgun experiment, one or more samples)
[Figure: two Mascot search result lists, one per sample, in the same "Significant hits" format as the Mascot search output shown earlier]
Samples (both "Uncategorized Sample") : 220507_QS_DEJ_TAP, 220507_QS_DEJ_RAP-
Protein name | Number of similar matches | Accession numbers | Protein molecular weight (AMU) | TAP | Igol-
Heat shock protein SSA1 (Heat shock protein YG100) - Saccharomyces cerevisiae (Baker's yeast) | 1 | HSP71_YEAST | 69641.3 | 10 | 2
60S ribosomal protein L4-A (L2A) (RP2) - Saccharomyces cerevisiae (Baker's yeast) | 2 | RL4A_YEAST,RL4B_YEAST | 39074 | 8 | 9
Elongation factor 1-alpha (EF-1-alpha) (Translation elongation factor 1A) (Eukaryotic elongation factor 1A) (eEF1A) - Saccharomyces cerevisiae (Baker's yeast) | 1 | EF1A_YEAST | 50014.6 | 10 | 8
Uncharacterized protein YNL157W - Saccharomyces cerevisiae (Baker's yeast) | 1 | YNP7_YEAST | 18034.1 | 0 | 5
Homocitrate synthase, mitochondrial precursor (EC 2.3.3.14) - Saccharomyces cerevisiae (Baker's yeast) | 2 | HOSM_YEAST,Q6B1Y9_YEAST | 48577.5 | 6 | 2
RNA-binding protein SRO9 (Suppressor of RHO3 protein 9) - Saccharomyces cerevisiae (Baker's yeast) | 1 | SRO9_YEAST | 48041.5 | 2 | 3
GTP-binding protein GTR1 - Saccharomyces cerevisiae (Baker's yeast) | 1 | GTR1_YEAST | 35831.2 | 8 | 0
60S ribosomal protein L10 (L9) (Ubiquinol-cytochrome C reductase complex subunit VI-requiring protein) - Saccharomyces cerevisiae (Baker's yeast) | 1 | RL10_YEAST | 25344.1 | 4 | 2
Glyceraldehyde-3-phosphate dehydrogenase 3 (EC 1.2.1.12) (GAPDH 3) - Saccharomyces cerevisiae (Baker's yeast) | 1 | G3P3_YEAST | 35728.6 | 2 | 3
40S ribosomal protein S15 (S21) (YS21) (RP52) (RIG protein) - Saccharomyces cerevisiae (Baker's yeast) | 1 | RS15_YEAST | 15984.2 | 4 | 1
Elongation factor 3A (EF-3A) (EF-3) (Translation elongation factor 3A) (Eukaryotic elongation factor 3) (eEF3) (Yeast elongation factor 3) - Saccharomyces cerevisiae (Baker's yeast) | 1 | EF3A_YEAST | 115978.2 | 5 | 1
Sample 1, Sample 2 -> Mascot -> Scaffold -> Final list (.xls)
Scaffold :
- Statistical validation
- Protein assignment (parsimony)
- Sample alignment
Protein validation problem
! We are not identifying proteins, but peptides !
1) Apply the principle of parsimony (Occam’s razor) : within a family, list the protein that explains the most identified peptides
2) To highlight the presence of a member of a protein family, at least one discriminating (unique) peptide must be present (what if it is a borderline hit ?)
??? Worst case : two distinct homologous members of the same family found in two samples, each with a weak discriminating peptide….what to say ?
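One common way to apply parsimony is a greedy set cover: repeatedly report the protein that explains the most still-unexplained peptides. The sketch below illustrates the idea only; it is not Scaffold's or Mascot's algorithm, and the function name, the accession "OTHER_PROTEIN" and the peptide labels p1 to p4 are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover: smallest-effort list of proteins explaining all identified peptides."""
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        # Sorted for a deterministic tie-break between equally good proteins.
        best = max(sorted(protein_to_peptides),
                   key=lambda protein: len(protein_to_peptides[protein] & remaining))
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen


# Hypothetical identifications: two homologous albumins sharing peptides p1 and p2.
hits = {
    "ALBU_BOVIN": {"p1", "p2", "p3"},
    "ALBU_CANFA": {"p1", "p2"},
    "OTHER_PROTEIN": {"p4"},
}
print(parsimonious_proteins(hits))  # ['ALBU_BOVIN', 'OTHER_PROTEIN']
```

ALBU_CANFA is not reported because it has no discriminating peptide: everything it explains is already explained by ALBU_BOVIN. The tie-break between truly indistinguishable homologues remains arbitrary, which is exactly the worst case described above.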
Free Scaffold viewer : www.proteomesoftware.com
Data analysis and distribution software : Scaffold
Protein ID probability
Nb unique peptides
Nb identified spectra
Nb unique spectra
Percentage of total spectra
“Advanced” proteomics
• Quantitative proteomics
– Isotope labelling : metabolic vs chemical : ICAT, iTRAQ, SILAC
– Label-free
• Proteome subsets
– Phosphoproteome
– Ubiquitinated proteins
– …
• Clinical proteomics (marker discovery)
– Too vast to summarise
• Proteome imaging
– MALDI of tissues
Relative quantification :
Comparison of proteins from samples A vs B ? Which proteins change in amount and how much ?
Applications :
-Healthy vs. diseased tissues
-Healthy vs. diseased body fluids
-Drug treated / untreated cells
-Stimulated / unstimulated cells
-Mutants / wt cells
-…
[Figure: +TOF MS, 44.071 to 46.012 min of an ICAT LC run ; m/z 930 to 1020 amu vs intensity (counts), max. 649.4 counts. Two isotope clusters separated by Δm = 9 Da : peptide from sample A and peptide from sample B ; labelled peak at 968.52]
Relative quantitation : stable isotope labelling is very fashionable!
Sample A : light isotope Sample B : heavy isotope
mix, digest
Quantitate and identify ( MS)
How to label ?
-chemically, post protein synthesis
“specific” chemical modification of AA side chain
(+) any sample can be done
(-) side reactions
-metabolically, during protein synthesis
Incorporation of one or more labelled amino acid
(+) “native” proteins
(-) need cultivable organism
d0- or d8-ICAT
[Figure: reagent structure with three parts : biotin tag, linker (heavy or light ; X = H or D), thiol-reactive group]
State 1 -> transition states -> State 2
Each state : [protein 1], [protein 2], …, [protein n]
Isotope Coded Affinity Tag (ICAT) reagents
Cell State 1 Cell State 2
Combine samples
Inte
nsi
tym/z
A1
A2A3B1
B2Identify proteins by MS/MS
Inte
nsi
ty
m/z
aa1aa2aa3
aa4
Quantitate protein levels by H8 / D8 peak heigth ratios
New Methods :New Methods : ICAT:quantitation ICAT:quantitation and identificationand identification
[Figure : light (H8) and heavy (d8) ICAT reagents, each with a biotin group, and a protein with free thiols (HS- … -SH).]
Modify with (H8)-ICAT
Modify with (d8)-ICAT
• Digest with trypsin
• Purify Cys-containing peptides on avidin column
Pairwise ICAT with Multidimensional Chromatography
[Figure : workflow in six numbered steps (1 to 6). Panel labels : control, treatment ; ICAT label (d0, d8) ; combine & proteolyze ; avidin affinity chromatography ; ion-exchange (OD and KCl [M] vs fraction # / time (min)) ; LC-MS/MS (% AcN and R.A. (%) vs time (min)).
A) Identify : MS/MS spectrum (R.A. (%) vs m/z, 400 to 2000) with annotated b- and y-ion series.
B) Quantify : d0 and d8 peak areas, Area = 1.21 x 10^9 and Area = 1.01 x 10^9 ; d8/d0 = 1:0.83.]
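The quantification step (B) comes down to a ratio of the two peak areas. A minimal sketch using the two areas printed on the slide ; which area belongs to the d0 form and which to the d8 form is an assumption here, chosen so that the result matches the 0.83 given on the slide, and the function name is hypothetical :

```python
def abundance_ratio(area_heavy, area_light):
    """Relative abundance of the heavy-labelled peptide versus the light one."""
    if area_light <= 0:
        raise ValueError("light peak area must be positive")
    return area_heavy / area_light


# Areas read from the slide; the d0/d8 assignment is assumed, not stated there
area_d0 = 1.21e9
area_d8 = 1.01e9

ratio = abundance_ratio(area_d8, area_d0)
print(f"d8/d0 = {ratio:.2f}")
```

In practice the ratios of several peptides from the same protein would be combined, and weak signals give unreliable ratios.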
ICAT (+) and (-)
(+)
- relative protein quantification by MS
- simplification of complex mixtures by selecting a subset of peptides after digestion
- eliminate analytical variability by mixing samples
(-)
- protein quantification unreliable for weak signals
- affinity purification (avidin) : losses for low amounts
- multiple side reactions possible
~15 different isotope labelling methods developed in the last 5 years !!
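The simplification obtained by keeping only Cys-containing peptides can be illustrated with an in silico digest. A minimal sketch, assuming the common simplified trypsin rule (cleave after K or R, but not before P, no missed cleavages) ; the protein sequence is a made-up toy example and the function names are hypothetical :

```python
import re


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    # Zero-width split points require Python 3.7 or later
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def cys_peptides(peptides):
    """Keep only the peptides that an ICAT reagent could label (contain Cys)."""
    return [p for p in peptides if "C" in p]


toy_protein = "MKCLLAGRTPSCDEKAVRPLK"  # invented sequence, for illustration only

all_peptides = tryptic_peptides(toy_protein)
selected = cys_peptides(all_peptides)

print("All tryptic peptides :", all_peptides)
print("Cys-containing subset :", selected)
print(f"{len(selected)} of {len(all_peptides)} peptides retained")
```

The same selection also explains a limit of the approach : a protein without any Cys-containing tryptic peptide is never seen.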
Recent ICAT studies (R. Aebersold’s group)
Wollscheid B, von Haller PD, Yi E, Donohoe S, Vaughn K, Keller A, Nesvizhskii AI, Eng J, Li XJ, Goodlett DR, Aebersold R, Watts JD. Lipid raft proteins and their identification in T lymphocytes. Subcell Biochem. 2004;37:121-52.
Yan W, Lee H, Yi EC, Reiss D, Shannon P, Kwieciszewski BK, Coito C, Li XJ, Keller A, Eng J, Galitski T, Goodlett DR, Aebersold R, Katze MG. System-based proteomic analysis of the interferon response in human liver cells. Genome Biol. 2004;5(8):R54.
Giglia-Mari G, Coin F, Ranish JA, Hoogstraten D, Theil A, Wijgers N, Jaspers NG, Raams A, Argentini M, van der Spek PJ, Botta E, Stefanini M, Egly JM, Aebersold R, Hoeijmakers JH, Vermeulen W. A new, tenth subunit of TFIIH is responsible for the DNA repair syndrome trichothiodystrophy group A. Nat Genet. 2004 Jul;36(7):714-9.
Ranish JA, Hahn S, Lu Y, Yi EC, Li XJ, Eng J, Aebersold R. Identification of TFB5, a new component of general transcription and DNA repair factor IIH. Nat Genet. 2004 Jul;36(7):707-13.
Hardwidge PR, Rodriguez-Escudero I, Goode D, Donohoe S, Eng J, Goodlett DR, Aebersold R, Finlay BB. Proteomic analysis of the intestinal epithelial cell response to enteropathogenic Escherichia coli. J Biol Chem. 2004 May 7;279(19):20127-36.
Zhang J, Goodlett DR, Peskind ER, Quinn JF, Zhou Y, Wang Q, Pan C, Yi E, Eng J, Aebersold RH, Montine TJ. Quantitative proteomic analysis of age-related changes in human cerebrospinal fluid. Neurobiol Aging. 2005 Feb;26(2):207-27.
Marelli M, Smith JJ, Jung S, Yi E, Nesvizhskii AI, Christmas RH, Saleem RA, Tam YY, Fagarasanu A, Goodlett DR, Aebersold R, Rachubinski RA, Aitchison JD. Quantitative mass spectrometry reveals a role for the GTPase Rho1p in actin organization on the peroxisome membrane. J Cell Biol. 2004 Dec 20;167(6):1099-112. Epub 2004 Dec 13.
SILAC
• Label light / heavy cultures (Leu d0 / d3)
• Stimulate heavy cells
• Mix cells or lysates
• Purify fraction of interest
• Analyse by LC-MS/MS (-> ID)
• Quantify signals of ion pairs
Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M.
Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
Mol Cell Proteomics. 2002 May;1(5):376-86.
SILAC (+) and (-)
(+)
• relative protein quantification by MS
• eliminate preparative variability by mixing samples immediately after culture
• eliminate analytical variability
• peptides in native state (no side reactions)
(-)
• protein quantification unreliable for very weak signals
• mass shift variable (dependent on number of residues)
• only feasible with organisms in culture
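The variable mass shift follows directly from the number of labelled residues in each peptide. A minimal sketch for the Leu d0 / d3 labelling mentioned above, assuming a nominal shift of 3 Da per leucine ; the peptide sequences are invented examples and the function name is hypothetical :

```python
LEU_D3_SHIFT_DA = 3.0  # nominal mass shift per Leu-d3 residue (assumed)


def silac_shift(peptide, labelled_residue="L", shift_per_residue=LEU_D3_SHIFT_DA):
    """Mass difference (Da) between the heavy and light forms of a peptide."""
    return peptide.count(labelled_residue) * shift_per_residue


# Invented peptides with 2, 0 and 3 leucines
for peptide in ("CLLAGR", "TPSCDEK", "LVLEALK"):
    shift = silac_shift(peptide)
    note = "no ion pair, cannot be quantified" if shift == 0 else "ion pair expected"
    print(f"{peptide} : shift = {shift:.0f} Da ({note})")
```

A peptide without the labelled amino acid gives a single peak and carries no quantitative information.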
Recent SILAC articles
Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics. 2002 May;1(5):376-86.
Blagoev B, Ong SE, Kratchmarova I, Mann M. Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nat Biotechnol. 2004 Sep;22(9):1139-45. Epub 2004 Aug 15.
Gruhler A, Olsen JV, Mohammed S, Mortensen P, Faergeman NJ, Mann M, Jensen ON. Quantitative Phosphoproteomics Applied to the Yeast Pheromone Signaling Pathway. Mol Cell Proteomics. 2005 Mar;4(3):310-327.
de Hoog CL, Foster LJ, Mann M. RNA and RNA binding proteins participate in early stages of cell spreading through spreading initiation centers. Cell. 2004 May 28;117(5):649-62.
Ong SE, Kratchmarova I, Mann M. Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC). J Proteome Res. 2003 Mar-Apr;2(2):173-81.
Blagoev B, Kratchmarova I, Ong SE, Nielsen M, Foster LJ, Mann M. A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling. Nat Biotechnol. 2003 Mar;21(3):315-8. Epub 2003 Feb 10.
Foster LJ, De Hoog CL, Mann M. Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors. Proc Natl Acad Sci U S A. 2003 May 13;100(10):5813-8. Epub 2003 Apr 30.
Proteome subset : Phosphoproteome : S. Ficarro
Brill LM, Salomon AR, Ficarro SB, Mukherji M, Stettler-Gill M, Peters EC. Robust phosphoproteomic profiling of tyrosine phosphorylation sites from human T cells using immobilized metal affinity chromatography and tandem mass spectrometry. Anal Chem. 2004 May 15;76(10):2763-72.
Salomon AR, Ficarro SB, Brill LM, Brinker A, Phung QT, Ericson C, Sauer K, Brock A, Horn DM, Schultz PG, Peters EC. Profiling of tyrosine phosphorylation pathways in human cells using mass spectrometry. Proc Natl Acad Sci U S A. 2003 Jan 21;100(2):443-8. Epub 2003 Jan 9.
[Figure : phosphopeptide enrichment workflow. Panel labels : phosphorylated proteins (P) ; anti-P-Tyr antibody ; trypsin digestion ; esterify Asp, Glu ; IMAC (Immobilised Metal Affinity Chromatography) ; LC-MS/MS.]
Proteome imaging : tissue imaging by MALDI : R.Caprioli
Caldwell RL, Caprioli RM. Tissue profiling by mass spectrometry: A review of methodology and applications. Mol Cell Proteomics. 2005 Jan 26; [Epub ahead of print]
Reyzer ML, Caldwell RL, Dugger TC, Forbes JT, Ritter CA, Guix M, Arteaga CL, Caprioli RM. Early changes in protein expression detected by mass spectrometry predict tumor response to molecular therapeutics. Cancer Res. 2004 Dec 15;64(24):9093-100.
Chaurand P, Sanders ME, Jensen RA, Caprioli RM. Proteomics in diagnostic pathology: profiling and imaging proteins directly in tissue sections. Am J Pathol. 2004 Oct;165(4):1057-68. Review.
Pierson J, Norris JL, Aerni HR, Svenningsson P, Caprioli RM, Andren PE. Molecular profiling of experimental Parkinson's disease: direct analysis of peptides and proteins on brain tissue sections by MALDI mass spectrometry. J Proteome Res. 2004 Mar-Apr;3(2):289-95.
….
References and web links
REVIEWS AND TUTORIALS :
1) Graves PR, Haystead TAJ. Molecular Biologist’s Guide to Proteomics. Microbiology and Molecular Biology Reviews. Mar. 2002, Vol. 66, No. 1, p. 39–63.
2) Steen H, Mann M. The ABC's (and XYZ's) of peptide sequencing. Nat Rev Mol Cell Biol. 2004 Sep;5(9):699-711.
Links :
Matrix science help site : www.matrixscience.com/help_index.html
Expasy knowledgebase : www.expasy.org/
Expasy proteomics portal : www.expasy.org/tools/
MS info resource : www.ionsource.com/
Our links : www.unil.ch/paf/links.html
Our tutorial file : www.unil.ch/paf/page19604.html
Human Protein Reference Database : www.hprd.org