Proteomics
George Tsaprailis, Director; Linda Breci, Associate Director
Arizona Proteomics Consortium, University of Arizona

What is proteomics, and how is mass spectrometry used in proteomics?
Proteomics: the study of the proteome
• Proteome: the collection of proteins that makes up a biological system
Why proteomics? Proteins are important because (1) they perform most cellular functions, (2) they are the major elements of most cellular structures, and (3) they are targets of drugs/toxicants
[Figure: mass spectrometry → protein ID]
Proteomics involves:
• Protein chemistry
• Mass spectrometry
• Computing (+ bioinformatics)

Protein chemistry
• Sample isolation/clean-up
• Sample purification
[Figure: protein fractions → digest → peptides]
Mass spectrometry
The key to proteomics is obtaining peptide masses and/or sequences.
[Figure: a protein mixture is separated and/or digested into a peptide mixture, the peptides are separated, and MS or MS/MS analysis produces MS data]
Mass spectrometry: all types of hardware used in proteomics
Computing (+ bioinformatics)

The proteomic approach
[Figure: sample → pre-prep steps (1D PAGE, 2D PAGE, protein(s) in solution, HPLC fractions, IP eluent) → protein → digest → peptides → mass spectrometer (ESI: LC-MS/MS; MALDI: MS) → protein ID + informatics; the stages span protein chemistry, mass spectrometry, and computing (+ bioinformatics)]
Mass spectrometry
• What is a mass spectrometer and what does it measure?
– An instrument that makes ions
– Measures the mass-to-charge ratio (m/z) of ions
• Mass spectrometry in proteomics
– For proteins and peptides
• whole protein mass measurements
• protein identification based on peptide mass measurement
• protein identification based on peptide structure analysis (fragmentation)
• Need to know some basic principles
Protein/peptide relationship
[Figure: an enzyme digests a protein into peptides]

Making ions
[Figure: chemical structure of the peptide Ala-Leu-Phe-Lys, with one or two added protons (H+)]
Ala-Leu-Phe-Lys: mass of neutral = 477.3
Ala-Leu-Phe-Lys + H+: m/z of singly charged = 478.3
Ala-Leu-Phe-Lys + 2H+: m/z of doubly charged = (477.3 + 2) / 2 = 239.65
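The numbers above can be reproduced from the amino acid sequence. A minimal sketch, assuming standard monoisotopic residue masses (only the four residues needed here are listed) and a proton mass of 1.00728 Da in place of the rounded "+1" used on the slide:

```python
# Monoisotopic residue masses (Da); only the residues used in this example
RESIDUE_MASS = {"A": 71.03711, "L": 113.08406, "F": 147.06841, "K": 128.09496}
WATER = 18.01056   # H2O for the free N- and C-termini
PROTON = 1.00728   # mass added per positive charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(neutral_mass: float, z: int) -> float:
    """m/z of the [M+zH]z+ ion: add z protons, then divide by the charge z."""
    return (neutral_mass + z * PROTON) / z


m = peptide_mass("ALFK")
print(f"neutral: {m:.2f}")              # neutral: 477.30
for z in (1, 2):
    print(f"z={z}: m/z = {mz(m, z):.2f}")  # 478.30, then 239.65
```

Dividing by z is why the doubly charged ion appears at roughly half the m/z of the singly charged one.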
Making ions
Ions are made in an ion source.
Important methods in proteomics:
1) MALDI (matrix-assisted laser desorption/ionization)
2) ESI (electrospray ionization)
[Figure: example spectra of lysozyme (axes: m/z vs. intensity). Electrospray ionization (ESI): a series of multiply charged peaks (+8 to +13) between m/z 1000 and 2000, with the calculated mass spectrum. Matrix-assisted laser desorption/ionization (MALDI): [M+H]+, [M+2H]2+ and [2M+H]+ peaks between m/z 10,000 and 40,000]
Analyzing ions
The ion source is coupled to the analyzer.
Important analyzers in proteomics:
1) TOF (time of flight)
2) Ion trap

Matrix-assisted laser desorption/ionization (MALDI) (ion source) with time of flight (analyzer)
[Figure: pulsed laser light strikes the sample and matrix on the tip of a solid probe; the resulting ion beam passes through the flight tube analyzer to the detector]
Time of flight (TOF)
http://www.abrf.org/ABRFNews/1997/June1997/jun97lennon.html
• Linear mode: better sensitivity, poor resolution
• Reflectron mode: less sensitivity, higher resolution
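In a linear TOF analyzer every ion is accelerated through the same potential, so all ions receive the same kinetic energy and heavier ions (per charge) fly more slowly. A minimal sketch of that relationship, assuming a hypothetical 1.0 m flight tube and 20 kV accelerating voltage (illustrative values, not taken from any instrument in this deck):

```python
import math

DALTON_KG = 1.66053906660e-27   # 1 Da (unified atomic mass unit) in kg
E_CHARGE = 1.602176634e-19      # elementary charge in coulombs


def flight_time(mz: float, tube_length_m: float, accel_voltage_v: float) -> float:
    """Flight time in seconds through a field-free linear TOF tube.

    Each ion gains kinetic energy z*e*V = m*v**2 / 2 in the source, so
    v = sqrt(2*z*e*V / m) and t = L / v = L * sqrt(m / (2*z*e*V)).
    mz is the mass-to-charge ratio in Da per elementary charge.
    """
    kg_per_coulomb = mz * DALTON_KG / E_CHARGE
    return tube_length_m * math.sqrt(kg_per_coulomb / (2 * accel_voltage_v))


# Hypothetical instrument: 1.0 m flight tube, 20 kV acceleration
for mz in (1000, 4000):
    print(f"m/z {mz}: {flight_time(mz, 1.0, 20_000) * 1e6:.1f} µs")
```

Flight time scales with the square root of m/z, so quadrupling m/z only doubles the flight time. Real instruments are calibrated against standards rather than computed from first principles, and the reflectron (not modeled here) compensates for small spreads in initial kinetic energy, which is why it gives higher resolution.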
MALDI-TOF spectrum (mix of peptides)
[Figure: MALDI-TOF spectrum of a peptide mixture, m/z 500 to 2500, with reference peaks (Ref) marked]
MALDI reflectron spectrum of ACTH
[Figure: MALDI reflectron spectrum of ACTH (M@LDI, 13-Nov-2003), m/z 2444 to 2482 vs. relative intensity (%), with a peak labeled at m/z 2451.525 and a resolved isotope cluster at m/z 2466.324, 2467.344, 2468.330, 2469.316, 2470.337 and 2471.290]
Electrospray ionization (ESI) (ion source) with ion trap (analyzer)
[Figure: liquid sample from the HPLC is sprayed from a needle or capillary at 4500 V; dry gas or heat assists desolvation, and the ion beam passes into the ion trap analyzer and on to the detector]
ESI-ion trap spectrum
[Figure: ESI-ion trap spectrum, m/z 200 to 2000 vs. relative intensity, with peaks at m/z 476.9 ([M+2H]2+), 952.4 ([M+H]+) and 1903.4 ([2M+H]+)]
For a neutral mass M = 951.4:
• [M+H]+ = 951.4 + 1 = 952.4
• [M+2H]2+ = (951.4 + 2) / 2 = 476.7
• [2M+H]+ = (951.4 × 2) + 1 = 1903.8
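All three peaks follow one rule: an ion made of n copies of M carrying z protons appears at (nM + z × H) / z. A minimal sketch of that rule and its inverse, assuming a proton mass of 1.00728 Da (the slide rounds it to 1); the function names are illustrative:

```python
PROTON = 1.00728  # Da added per proton


def ion_mz(neutral_mass: float, z: int = 1, n: int = 1) -> float:
    """m/z of an [nM + zH]z+ ion: n copies of M carrying z protons."""
    return (n * neutral_mass + z * PROTON) / z


def neutral_from_mz(observed_mz: float, z: int = 1, n: int = 1) -> float:
    """Invert ion_mz: recover M from an observed [nM + zH]z+ peak."""
    return (observed_mz * z - z * PROTON) / n


M = 951.4
for label, z, n in [("[M+H]+", 1, 1), ("[M+2H]2+", 2, 1), ("[2M+H]+", 1, 2)]:
    print(f"{label}: {ion_mz(M, z, n):.1f}")  # 952.4, 476.7, 1903.8
```

Going the other way, each observed peak (476.9, 952.4, 1903.4) can be converted back to an estimate of M once its z and n are assigned, which is a useful consistency check.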
• Exercise 1
Resolution and mass accuracy vary by instrument
Instrument: mass range (m/z); resolution (at m/z 1,000); accuracy/error (at m/z 1,000)
• LCQ (ion trap): to 2,000; 2,000 (full scan), 10,000 (zoom scan); 0.03% (300 ppm)
• MALDI/TOF: to 400,000; 15,000 (reflectron); 0.006% (60 ppm) with external calibration, 0.003% (30 ppm) with internal calibration
• FTICR: to 4,000; 500,000; 0.0001% (1 ppm)

Mass error: $\text{ppm} = \dfrac{MW_{\text{theoretical}} - MW_{\text{measured}}}{MW_{\text{theoretical}}} \times 10^{6}$

Resolution
[Figure: peak profiles at resolutions of 30,000, 10,000, 3,000 and 1,000 (source: http://www.matrixscience.com)]
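The ppm formula above is easy to apply in code. A minimal sketch, with illustrative helper names; the example values (theoretical 1274.5878, measured 1274.5183) come from the peptide mass mapping example later in this deck:

```python
def ppm_error(theoretical: float, measured: float) -> float:
    """Mass error in parts per million: (theoretical - measured) / theoretical * 1e6."""
    return (theoretical - measured) / theoretical * 1e6


def ppm_to_da(mass: float, ppm: float) -> float:
    """Absolute tolerance in Da that a ppm tolerance corresponds to at a given mass."""
    return mass * ppm / 1e6


def percent_to_ppm(percent: float) -> float:
    """Convert a relative error in percent (e.g. 0.03%) to ppm (300 ppm)."""
    return percent * 1e4


print(round(ppm_error(1274.5878, 1274.5183)))  # 55
print(ppm_to_da(1000.0, 300))                   # 0.3
```

The second line shows what the LCQ's 0.03% (300 ppm) accuracy means in practice: at m/z 1,000 a measurement may be off by about ±0.3.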
MALDI reflectron spectrum of ACTH
[Figure: MALDI reflectron spectrum of ACTH showing resolved isotope peaks, m/z 2466.324 to 2471.290]
You must know the resolution of your instrument to analyze the data!
– We need to know the possible error in the measurement
– Is the peak monoisotopic?
– Is the peak an average?
Analysis of whole proteins by MALDI-TOF and ESI-ion trap
• MALDI-TOF: the protein is measured carrying 1 or 2 protons
– large molecules like proteins require linear mode (much lower resolution)
• ESI-ion trap: the protein is measured carrying many protons (high charge states)
– the mass of the protein can be calculated from the multiply charged peaks
Mass spec measures isotopes
Excel-calculated example: carbon (12C) is 12.000, and about 1.1% of carbon atoms are 13C.
[Figure: calculated isotope peaks for 10 carbons, relative intensities 100, 10.8, 0.5]
Isotopes add up: for 10 carbons, the 13C peak is about 11% of the 12C peak.
For 100 carbons, the 13C peak is larger than the 12C peak.
[Figure: calculated isotope peaks for 100 carbons, relative intensities 92.5, 100, 53.5, 18.8, 5, 1, 0.2]
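The Excel example is a binomial calculation: each of the n carbons is independently 13C with a small probability. A minimal sketch, assuming a 13C abundance of 1.07% (the slide rounds this to 1.1%) and ignoring isotopes of H, N, O and S, which a full calculation such as the lysozyme distribution would include:

```python
from math import comb

P_13C = 0.0107  # assumed natural abundance of 13C (rounded to 1.1% on the slide)


def carbon_isotope_pattern(n_carbons: int, max_peaks: int = 7) -> list[float]:
    """Relative intensities (tallest peak = 100) of the M, M+1, M+2, ... peaks
    for a molecule with n_carbons, considering only 13C (binomial model)."""
    probs = [
        comb(n_carbons, k) * P_13C**k * (1 - P_13C) ** (n_carbons - k)
        for k in range(min(max_peaks, n_carbons + 1))
    ]
    tallest = max(probs)
    return [100 * p / tallest for p in probs]


for n in (10, 100):
    print(n, [round(x, 1) for x in carbon_isotope_pattern(n)])
```

With this abundance the first values match the slide (100, 10.8, 0.5 for 10 carbons; 92.5, 100, 53.5 for 100 carbons), and for 100 carbons the M+1 peak is the tallest, as stated above.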
Proteins have very large isotope widths
Theoretical isotope distribution of lysozyme
[Figure labels: 1st isotope at m/z 14304.885; 9th isotope (most abundant) at m/z 14313.906]
Isotope # m/z % Maximum
0 14304.885 0.2
1 14305.888 1.2
2 14306.891 4.6
3 14307.893 12.8
4 14308.896 26.9
5 14309.898 46.3
6 14310.900 67.6
7 14311.902 86.3
8 14312.904 97.7
9 14313.906 100.0
10 14314.908 93.5
11 14315.910 80.4
12 14316.912 64.2
13 14317.914 47.8
14 14318.916 33.4
15 14319.918 21.8
16 14320.920 13.2
17 14321.922 7.5
18 14322.924 3.9
19 14323.925 1.8
20 14324.927 0.7
21 14325.929 0.2
Lysozyme by MALDI/TOF
Average mass = 14,314
[Figure: MALDI-TOF spectrum of lysozyme, m/z 10,000 to 40,000 vs. intensity, with peaks labeled [M+2H]2+ 7157.18, [M+H]+ 14316.24 and [2M+H]+ 28638.68]
Lysozyme by ESI-Ion Trap
Average mass = 14,314
[Figure: ESI-ion trap spectrum of lysozyme, m/z 1000 to 2000 vs. relative intensity, with charge states +8 (m/z 1789.00), +9 (1590.33), +10 (1431.47), +11 (1301.53), +12 (1193.20) and +13 (1101.40); inset: calculated mass spectrum, m/z 5000 to 15000, labeled 14306.0 and 14318.2]
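The earlier point that a protein's mass can be calculated from its multiply charged peaks can be sketched with the lysozyme peaks listed above. For two adjacent peaks carrying z and z + 1 protons, the charge follows from their spacing. This is a simple averaging approach, not the instrument software's deconvolution algorithm, and it assumes all peaks come from one protein, the charge states are consecutive with none missing, and the proton mass is 1.00728 Da:

```python
PROTON = 1.00728  # Da


def charge_from_adjacent(mz_high: float, mz_low: float) -> int:
    """Charge z of the higher-m/z peak of two adjacent charge states.

    If mz_high = (M + z*H) / z and mz_low = (M + (z+1)*H) / (z+1), then
    z = (mz_low - H) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


def deconvolute(peaks: list[float]) -> tuple[list[int], float]:
    """Assign charges to a consecutive charge-state series and average the masses."""
    peaks = sorted(peaks, reverse=True)          # highest m/z = lowest charge
    charges = [charge_from_adjacent(a, b) for a, b in zip(peaks, peaks[1:])]
    charges.append(charges[-1] + 1)              # last peak carries one more proton
    masses = [z * (mz - PROTON) for mz, z in zip(peaks, charges)]
    return charges, sum(masses) / len(masses)


lysozyme = [1789.00, 1590.33, 1431.47, 1301.53, 1193.20, 1101.40]
charges, mass = deconvolute(lysozyme)
print(charges)          # [8, 9, 10, 11, 12, 13]
print(f"{mass:.1f}")    # 14304.9
```

The result lands within about 1 Da of the 14306.0 label on the calculated mass spectrum.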
So we've made measurements. Now what?
• A lot of information about proteins and/or their genes is available online
• We will explore protein information in general
• We will then use the available information to perform data analysis
MALDI-TOF analysis of alkaline phosphatase: Computer Exercise #2
[Figure: MALDI-TOF spectrum, m/z 0 to 100,000 vs. intensity, with peaks labeled [M+2H]2+ 23445.54, [M+H]+ 47155.22 and [2M+H]+ 94472.97]
Protein identification: two strategies
• Single-stage mass spectrometry (MS)
– called “peptide mass mapping”
– measures all peptides in one spectrum
– MALDI-TOF
– produces lower-confidence results
• Tandem mass spectrometry (MS/MS)
– measures peptides as they elute from an HPLC
– ESI-ion trap
– produces higher-confidence results
Single stage: peptide mass mapping using MALDI-TOF
[Figure: MALDI-TOF spectrum of a tryptic digest, m/z 500 to 2500, with reference peaks (Ref) marked]
Data analysis for peptide mass mapping
• Important data
– multiple peaks
– mass accuracy
– confirming information (pI, approx. mass, organism, etc.)
[Figure: unknown protein → peptides → MS → peptide MWs found in selected databases (NDALYFPT..., SWDLTAL..., PTDLDVSY...) → rank → identify]
For example: measured peptide = 1274.5183
Data Analysis for peptide mass mapping
>gi|27807105|ref|NP_777037.1| solute carrier family 6 (neurotransmitter transporter, glycine), member 9 [Bos taurus] gi|1279843|gb|AAB01159.1| glycine transporter
MAAAQGPVAPSKLEQNGAVPSEATKSDQNLGQGNWRNQIEFVLTSVGYAVGLGNV
WRFPYLCYRNGGGAFMFPYFIMLIFCGIPLFFMELSFGQFASQGCLGVWRISPMFK
GVGYGMMVVSTYIGIYYNVVICIAFYYFFSSMTPVLPWTYCNNPWNTPDCMSVLDN
PNITNGSQPPALPGNVSQALNQTLKRTSPSEEYWRLYVLKLSDDIGNFGEVRLPLLG
CLGVSWVVVFLCLIRGVKSSGKVVYFTATFPYVVLTILFIRGVTLEGAFTGIMYYLTPQ
WDKILEAKVWGDAASQIFYSLGCAWGGLVTMASYNKFHNNCYRDSVIISITNCATSV
YAGFVIFSILGFMANHLGVDVSRVADHGPGLAFVAYPEALTLLPISPLWSLLFFFMLILL
GLGTQFCLLETLVTAIVDEVGNEWILQKKTYVTLGVAVAGFLLGIPLTSQAGIYWLLLM
DNYAASFSLVIISCIMCVSIMYIYGHQNYFQDIQMMLGFPPPLFFQICWRFVSPAIIFFIL
IFSVIQYQPITYNQYQSSQTGLPLFTCQIAPAHVPQPLSGARTPSPKPWSVRVSVLRA
PLCSDSPGRAASNPL
First tryptic peptides of the sequence, as [M+H]+:
• MAAAQGPVAPSK = 1127.5883
• LEQNGAVPSEATK = 1343.6807
• SDQNLGQGNWR = 1274.5878
Measured peptide = 1274.5183
• 1274.5878 theoretical
• 1274.5183 measured
• 0.0695 difference
• error = 0.0695 / 1274.5878 × 10^6 = 55 ppm
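The matching step can be sketched as an in-silico digest followed by a tolerance search. A minimal sketch, assuming the simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), standard monoisotopic residue masses, unmodified peptides, and a hypothetical 100 ppm search tolerance; only the first line of the glycine transporter sequence is used for brevity:

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_mass(measured: float, peptides: list[str], tol_ppm: float) -> list[tuple[str, float]]:
    """Return (peptide, ppm error) for every theoretical peptide within tol_ppm."""
    hits = []
    for peptide in peptides:
        theoretical = mh_plus(peptide)
        error_ppm = (theoretical - measured) / theoretical * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((peptide, error_ppm))
    return hits


fragment = "MAAAQGPVAPSKLEQNGAVPSEATKSDQNLGQGNWRNQIEFVLTSVGYAVGLGNV"
peptides = tryptic_digest(fragment)
for peptide in peptides[:3]:
    print(f"{peptide}: {mh_plus(peptide):.4f}")
print(match_mass(1274.5183, peptides, tol_ppm=100))
```

The computed masses agree with the slide's values to within about 0.001 Da (the small offset is consistent with the slide adding a hydrogen-atom mass rather than a proton mass), and only SDQNLGQGNWR falls within the tolerance, at roughly 54 ppm. A real search repeats this over every measured peak and every protein in the database, then ranks proteins by how many peaks they explain.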
Computer Exercise #4
Analyze peptide mass mapping data
• 4 lists of peptide masses are provided on the worksheet
– (Alternate address of Excel data): http://www.chem.arizona.edu/facilities/msf/index.html
Problems with whole protein analysis
• Peaks are broad
– large groups of isotope peaks
– peaks further broadened by adducts (contaminants, salts)
• Proteins are often modified
– Instrument may not resolve the mass difference
– No information regarding which amino acid is modified
• Proteins are in a complex matrix
– background stuff
– other proteins (too complex!!!)
Therefore proteins are identified from peptides!
How are proteins separated?
• Proteins from biological organisms are a complex mixture
• Separating proteins
– 1D SDS-PAGE
• Cross-linking controls the MW range separated
• Low-resolution technique: a band can contain tens to hundreds of proteins
– 2D SDS-PAGE
• Best for complex protein mixtures (IEF + SDS-PAGE)
• Other methods
– Chromatography (reverse phase, size exclusion, ion exchange, affinity)
– Preparative isoelectric focusing (IEF)
1D electrophoresis
1D SDS-PAGE (protein mixture or IP eluant)
• Great clean-up tool (removes salts, detergents, etc.)
• Great concentration tool
• Biological analytes
• Various stains available, with various detection limits
• Use precast gels if possible (polymer issue)
• Various gel sizes (spatial resolution)
• Various MW ranges
[Figure: 1D electrophoresis (source: http://www.biorad.com)]
2D electrophoresis
(1) Separation on the basis of intrinsic charge (pI, set by the pKa values of the ionizable groups): isoelectric focusing
(2) Separation on the basis of size: PAGE (SDS gel electrophoresis)
2D SDS-PAGE (protein mixture, IP eluant, or cell/tissue)
• Great clean-up tool (removes salts, detergents, etc.)
• Various stains available, with various detection limits
• Protein profiling
• Various pH ranges
• 2D gels are very sample-dependent (the sample may require further clean-up prior to the 2D gel)
– Avoid excess salt in the sample (proteins do not focus and IPG strips burn; 30-40 mM salt maximum)
• Often automated with robotics for high throughput (MALDI-TOF)
• Often good for visualizing PTMs
2D electrophoresis
The 1st D: isoelectric focusing
[Figure: isoelectric focusing strip running from pH 3 (+) through pH 7.5 to pH 10 (–); proteins separate by charge]
The 2nd D: SDS-PAGE
Proteins migrate through the gel at a rate inversely related to their size: the smallest proteins travel the farthest.
[Figure: 2D gel with charge along one axis and size along the other]
• Do Computer Exercise #3
• Laboratory tour