Experimental & Bioinformatic Tools for Proteomics
Steve Oliver
Professor of Genomics, Faculty of Life Sciences
The University of Manchester http://www.cogeme.man.ac.uk
http://www.bioinf.man.ac.uk
Functional Genomics: levels of analysis (definition; status; method of analysis)
• Genome: complete set of genes of an organism or its organelles. Status: context-independent (modifications to the yeast genome may be made with exquisite precision). Method: systematic DNA sequencing.
• Transcriptome: complete set of mRNA molecules present in a cell, tissue or organ. Status: context-dependent (the complement of mRNAs varies with changes in physiology, development or pathology). Methods: hybridisation arrays; SAGE; high-throughput Northern analysis.
• Proteome: complete set of protein molecules present in a cell, tissue or organ. Status: context-dependent. Methods: 2-D gel electrophoresis; peptide mass fingerprinting; two-hybrid analysis.
• Metabolome: complete set of metabolites (low molecular weight intermediates) present in a cell, tissue or organ. Status: context-dependent. Methods: infra-red spectroscopy; mass spectrometry; nuclear magnetic resonance spectroscopy.
GENOME → TRANSCRIPTOME → PROTEOME → METABOLOME
Proteomics
Separation
Identification
Quantitation
Bioinformatics
knowledge + prediction
post-translational modification
[Figure: Bioinformatics identification. The real proteome is fractionated by separation methods (2D-gels, functional separations, n-dimensional chromatography) into simple mixtures & single proteins or complex mixtures & subsets; each is digested to give a simple or complex peptide map fingerprint. The genome gives a "virtual" proteome, which is digested to build a peptide mass database; identification (including complex mixture analysis) matches the experimental fingerprints against this database.]
Aberdeen PRF1: S. cerevisiae 2D map
[Figure: 2D gel map of S. cerevisiae proteins, pI 4.0 to 6.5 (horizontal) against a mass scale marked 10 to 150 (vertical), with identified spots labelled by gene name, including SSA1, SSB1, HSP60, HSP26, CDC48, VMA1, VMA2, ADE6, HIS4, ATP2, SAM1, SAM2, TPI1, SOD1, PDC1, ENO1, ENO2, FBA1, TDH3, ADH1, PGK1, HXK1, HXK2, CDC19, ACT1, BMH1, BMH2, MET6 and MET17.]
Peptide mass fingerprinting
Denature:
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCLPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV
Digest (trypsin), giving peptides with masses m1 to m12:
KETAAAK FER QHMDSSTSAASSSNYCNQMMK SR NLTK DR
CLPVNTFVHESLADVQAVCSQK NVACK NGQTNCYQSYSTMSITDCR
ETGSSK YPNCAYKTTQANK HIIVACEGNPYVPVHFDASV
Mass spectrometry:
[Figure: mass spectrum (abundance against mass) of the digest, with peaks for m1, m7, m9, m10, m11 and m12.]
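The digest-and-weigh step can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplified trypsin rule (cleave after K or R unless the next residue is P) and standard monoisotopic residue masses; the function names are hypothetical. Note that the slide's peptide list keeps KETAAAK and YPNCAYKTTQANK intact (missed cleavages, which are common in real digests), whereas this strict rule would split them.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O for the free N- and C-termini
PROTON = 1.00728   # added to give the singly protonated [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide not ending in K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    protein = ("KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCLPVNTFVHESLADVQAVCSQK"
               "NVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV")
    for pep in tryptic_digest(protein):
        print(f"{pep:25s} [M+H]+ = {peptide_mass(pep) + PROTON:.3f}")
```

The resulting list of [M+H]+ values is the "virtual" fingerprint that a search engine compares with the observed peaks.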
Proteomic applications
• Quantitative proteomics: "expression" proteomics
  – protein levels under different conditions/times
• Qualitative proteomics: identification proteomics
  – protein:protein interactions
  – post-translational modifications
“A MASS SPECTROMETER MEASURES THE MW…”
“…AN MS ANALYSIS GIVES THE MASS-TO-CHARGE RATIO (m/z) FOR IONS IN THE GAS PHASE.”
Brancia FL, Trieste, 12/02/2004
What is a “mass spectrometer”...?
Sample introduction: direct introduction (solid, liquid, gas) or separation techniques (HPLC, CE, GC)
→ Ion source (“ion generation”): EI, FAB, MALDI, electrospray
→ Analyzer (“mass analysis”): TOF, quadrupole, ion trap
→ Detector
→ Data processing
A pumping system maintains the vacuum.
Various ionisation methods
• Electron impact ionisation, EI (1919, A.J. Dempster)
• Chemical ionisation, CI
• Fast atom bombardment, FAB (1981, M. Barber)
• Matrix-assisted laser desorption ionisation, MALDI (1988, K. Tanaka; M. Karas and F. Hillenkamp)
• Electrospray, ES (1985, J. Fenn)
‘Soft’ ionisation techniques
‘Soft’ refers to the low amount of energy imparted to the analyte during ionisation. Too much internal energy results in fragmentation; soft ionisation techniques form intact molecular or pseudo-molecular ([M+H]+) ions.
• Matrix-assisted laser desorption ionisation (MALDI)
• Electrospray (ES)
Nobel Prize in Chemistry 2002
“...for their development of soft desorption ionisation methods for mass spectrometric analysis of biological macromolecules.”
• 1/2 of the prize went to Kurt Wüthrich (Switzerland) for the development of NMR analysis
• 1/4 to John B. Fenn (USA), Virginia Commonwealth University: electrospray ionisation
• 1/4 to Koichi Tanaka (Japan), Shimadzu Corp., Kyoto: laser ionisation
Electrospray (ES)
[Figure: electrospray source. The sample solution passes through the electrospray capillary, held at high voltage (+HV) at atmospheric pressure, towards a counter electrode (near ground). Ions move down pressure and potential gradients through skimmer electrodes into the mass analyzer under high vacuum. Droplets shrink due to solvent evaporation, then explode when they reach the charge density limit; gaseous [M+nH]n+ ions are formed via one of two proposed mechanisms.]
The principal outcome of the electrospray process is the transfer of analyte species, generally ionised in the condensed phase, into the gas phase as isolated entities.
[Figure: the +HV capillary tip emits an aerosol of charged droplets.]
Gaskell SJ, Journal of Mass Spectrometry 1997
ES spectrum of Rho protein
[Figure: ES spectrum of Rho protein (m/z 600 to 1300, relative abundance %), showing a series of multiply charged ions from about m/z 654 to 1001; labelled peaks include [M+56H]56+ (m/z 840.3) and [M+50H]50+. Rho protein: 47004.33 Da.]
Courtesy of Dr Matt Openshaw
Electrospray (ES)
[M+56H]56+ = 840.3 m/z
Therefore, M = (840.3 × 56) − 56 = 47000.8 Da (taking each of the 56 added protons as 1 Da).
Deconvolution takes all the multiply charged ions and converts them into a spectrum on a mass (Da) scale, i.e. it works out what the molecular weight is most likely to be.
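A single-peak estimate is sensitive to reading error: a 0.1 error in m/z is multiplied by the charge (here 56), i.e. about 5.6 Da. The sketch below shows the standard two-peak reasoning behind charge assignment. For adjacent charge states z+1 (lower m/z) and z (higher m/z), z = (m_low − p)/(m_high − m_low) and M = z(m_high − p), where p is the proton mass. It assumes the supplied peaks are consecutive charge states and simply averages the pairwise estimates; commercial deconvolution software uses more elaborate algorithms.

```python
PROTON = 1.00728  # proton mass (Da)


def charge_of_higher_peak(mz_low: float, mz_high: float) -> int:
    """Charge z of the higher-m/z peak of two adjacent charge states (z+1, z).

    From mz_high = (M + z*p)/z and mz_low = (M + (z+1)*p)/(z+1):
    z = (mz_low - p) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz: float, z: int) -> float:
    """M = z * (m/z - proton mass) for an [M+zH]z+ ion."""
    return z * (mz - PROTON)


def deconvolute(peaks: list[float]) -> float:
    """Average neutral-mass estimate from consecutive charge-state peaks."""
    peaks = sorted(peaks)
    estimates = [neutral_mass(high, charge_of_higher_peak(low, high))
                 for low, high in zip(peaks, peaks[1:])]
    return sum(estimates) / len(estimates)


# Three neighbouring peaks read from the Rho spectrum (assumed to be 57+, 56+, 55+).
print(round(deconvolute([825.6, 840.3, 855.6]), 1))
```

With these one-decimal peak readings the estimate comes out at roughly 47001.5 Da, a few Da below the 47004 Da deconvoluted value. This is the error scale expected from reading m/z to 0.1.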
ES spectrum after deconvolution
[Figure: deconvoluted spectrum on a mass scale of 44000 to 50000 Da, showing a single peak at 47004.9.]
Deconvoluted mass: 47004.0 Da
Advantages of ES
• Production of molecular ions from solution
• Ease of coupling with separation techniques (micro LC-MS/MS, nano LC-MS/MS)
• Production of multiply charged ions
Matrix-Assisted Laser Desorption Ionisation (MALDI)
Time-of-Flight
Matrix-assisted laser desorption ionisation (MALDI)
Common matrices:
• α-cyano-4-hydroxycinnamic acid (CHCA)
• 2,5-dihydroxybenzoic acid (DHB)
• trans-3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid; SA)
[Figure: chemical structures of the three matrices.]
Typically used with a nitrogen laser (337 nm).
MALDI is an efficient desorption ionisation technique for producing gaseous ions (e.g. [M+H]+) from a solid sample by laser pulses.
Matrix-Assisted Laser Desorption/Ionisation (MALDI)
Unlike ES, MALDI forms predominantly singly charged ions, e.g. [M+H]+, or adducts (sodium [M+Na]+ or potassium [M+K]+).
Sodium = 23 amu; potassium = 39 amu. Because the metal ion, rather than a proton (1 amu), carries the charge, [M+Na]+ appears 22 m/z above [M+H]+ and [M+K]+ appears 38 m/z above it.
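The adduct spacings can be checked with a short calculation. This minimal sketch uses monoisotopic ion masses (atomic mass minus one electron); the function name is illustrative.

```python
# Monoisotopic masses (Da) of the charge carriers (atom minus one electron).
PROTON = 1.00728
SODIUM_ION = 22.98922
POTASSIUM_ION = 38.96316


def singly_charged_ions(neutral_mass: float) -> dict[str, float]:
    """m/z of the common singly charged MALDI ions of a neutral molecule M."""
    return {
        "[M+H]+": neutral_mass + PROTON,
        "[M+Na]+": neutral_mass + SODIUM_ION,
        "[M+K]+": neutral_mass + POTASSIUM_ION,
    }


ions = singly_charged_ions(1500.0)
for name, mz in ions.items():
    print(f"{name:8s} {mz:.4f}  (+{mz - ions['[M+H]+']:.3f} vs [M+H]+)")
```

The offsets come out at about 21.98 and 37.96, which the slide rounds to 22 and 38 m/z.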
Why is the matrix so important?
• The matrix is necessary to dilute and disperse the analyte
• It functions as an energy mediator for ionising the analyte itself or other neutral molecules
• It forms an activated state produced by photo-ionisation
Advantages of MALDI
• MALDI primarily creates singly charged ions, [M+H]+
• Less sensitive to contaminants
• Sensitivity at the femtomole level
• High-throughput analysis
Time-of-flight (ToF) mass spectrometer
[Figure: ions leave the MALDI target at t = 0, pass the extraction grid and drift through the flight tube (field-free region) to the detector at t > 0.]
An ion of mass m and charge z accelerated through potential V gains kinetic energy $\tfrac{1}{2}mv^2 = zV$. It then crosses the flight tube of length d in time t = d/v, so
$$t^2 = \frac{m}{z}\left(\frac{d^2}{2V}\right)$$
i.e. flight time is proportional to the square root of m/z.
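The flight-time relation can be sketched numerically. In SI units it becomes t = d·sqrt(m / (2zeV)). The tube length d = 1.0 m and voltage V = 20 kV below are hypothetical values chosen for illustration, and the sketch ignores the time spent in the acceleration region and any spread in initial ion velocity.

```python
import math

DALTON = 1.66053906660e-27           # kg per Da
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def flight_time(mass_da: float, charge: int, length_m: float, voltage_v: float) -> float:
    """Field-free drift time t = d * sqrt(m / (2 z e V)), i.e. t^2 = (m/z)(d^2 / 2V)."""
    m = mass_da * DALTON
    q = charge * ELEMENTARY_CHARGE
    return length_m * math.sqrt(m / (2 * q * voltage_v))


# Hypothetical instrument: 1.0 m flight tube, 20 kV acceleration.
for mz in (1000, 2000, 4000):
    print(mz, f"{flight_time(mz, 1, 1.0, 20_000) * 1e6:.2f} us")
```

Because t scales with sqrt(m/z), quadrupling m/z only doubles the flight time. Under these assumptions an m/z 1000 ion arrives after roughly 16 µs.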
Reflectron time-of-flight mass analyser
[Figure: laser, target, accelerating voltage (V_ACCEL), electrostatic mirror, detector 1 and detector 2.]
MALDI vs ESI
• Sensitivity: femtomole, 10^-15 mol (...attomole, 10^-18 mol)
• Simplicity: very easy training required
• Cost ($$$): MALDI 70 to 650 k$; ESI 120 to 650 k$
• Speed (“high throughput”): MALDI ~10^4 per day; ESI dynamic system
• Structural information: MSn (both)
• Software: “...evaluation in progress.”
• Selectivity (“resolution”): >5000
Structural information can be achieved by tandem mass spectrometry
The tandem mass spectrometry experiment
Ion source (e.g. electrospray) → Analyser 1 (e.g. quadrupole) → Decomposition region (collisionally activated decomposition, CAD) → Analyser 2 (e.g. quadrupole, time-of-flight)
[Figure: tandem MS schematic. The ion beam from the ion source enters MS1, which transmits the precursor ion m+. In the collision cell, collisions with collision-gas molecules fragment m+ into f1+, f2+, f3+ and f4+, which MS2 passes to the ion detector. Panels (a) and (b) show spectra (TIC against m/z) of the precursor m and its fragments.]
PROBLEMS WITH ‘CLASSICAL’ PROTEOME ANALYSIS:
1. Not comprehensive
2. Not high-throughput
3. Destroys protein-protein interactions that provide important clues to function
[Figure: number of (protein) database matches (0 to 450) against peptide mass (1000 to 2000 Da) for C. elegans, S. cerevisiae, E. coli and H. influenzae.]
• Multidimensional protein identification technology (MudPIT)
• Washburn MP, et al. Nat Biotechnol 2001, 19:242-247.
[Figure: column packed with reverse-phase and strong cation exchange (SCX) material.]
Load complete digest of sample; develop with gradient and spray directly into the MS/MS.
• Identified 1500 proteins from yeast, including lower-abundance species and membrane proteins
• 2415 proteins (46% of the Plasmodium genome) identified across the 4 stages of the parasite life cycle
Just Enough Diagnostic Information
Sidhu KS, Sangavich P, Brancia FL, Sullivan AG, Gaskell SJ, Wolkenhauer O, Oliver SG, Hubbard SJ (2001) Bioinformatic assessment of mass spectrometric chemical derivatisation techniques for proteome database searching. Proteomics 1, 1368-1377.
Provide limited sequence information by:
1. Identification of the N-terminal amino acid by PTC derivatisation
2. Use of guanidination to identify the C-terminus, determine lysine content, and improve signal response
3. Specific fragmentation next to Asp residues using MALDI-QToF MS
PTC-derivatisation
• phenylthiocarbamoyl derivative
• Edman chemistry
• N-terminal amino acid
• b1 ion created via low energy collisions
• precursor ion scan gives parents
• increased sensitivity
Peptide ions → MS1 (scans for precursors) → collision cell → MS2 (fixed on b1)
Spectra are collected of all peptides which give rise to a given b1 ion (implying knowledge of the N-terminal amino acid).
Database peptide hits by N-terminal amino acid (error = ±0.5 Da)
N-terminal amino acid: mean number of peptides
ANY: 74.15
W: 1.70
C: 1.77
H: 2.30
M: 3.41
...
N: 5.61
I: 5.76
E: 6.04
S: 7.18
L: 8.39
...
I/L: 14.16
Average number of matching proteins in the yeast proteome when searching with a peptide mass in the 1000-2000 Da range.
Rare amino acids give a bigger search gain.
Guanidination of lysine
[Figure: reaction of lysine with O-methyl isourea to give homoarginine.]
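Converting lysine to homoarginine adds CH2N2 (about 42.02 Da) per lysine, so comparing a peptide's mass before and after guanidination counts its lysines. For a tryptic peptide this also reveals whether the C-terminus is K (shifted) or R (unshifted). The sketch below is a minimal illustration. It assumes complete guanidination and, for the C-terminus call, no missed cleavages (an internal lysine would also shift the mass). The function names are hypothetical.

```python
# Guanidination converts lysine to homoarginine, adding C1 H2 N2 (monoisotopic).
GUANIDINATION_SHIFT = 12.0 + 2 * 1.007825 + 2 * 14.003074  # ~42.0218 Da


def lysine_count(mass_before: float, mass_after: float) -> int:
    """Number of lysines inferred from the mass increase on guanidination."""
    return round((mass_after - mass_before) / GUANIDINATION_SHIFT)


def c_terminal_residue(mass_before: float, mass_after: float) -> str:
    """Tryptic peptide, no missed cleavages assumed: shifted -> K, unshifted -> R."""
    return "K" if lysine_count(mass_before, mass_after) >= 1 else "R"


print(f"shift per Lys = {GUANIDINATION_SHIFT:.4f} Da")
```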
MALDI spectrum of an enolase tryptic digest
[Figure: MALDI spectrum (m/z 1000 to 2500, intensity 0 to 2000) with arginine-terminated (R) and lysine-terminated (K) peptides labelled.]
MALDI spectrum of a tryptic digest of enolase after guanidination
[Figure: MALDI spectrum (m/z 800 to 2600, intensity 0 to 6000) with guanidinated lysine-terminated peptides (*K) and arginine-terminated peptides (R) labelled.]
1. Initial set of search peptides and associated information
2. Search database; compile protein “hit list” with matching peptides
3. Top-scoring protein is matched; remove corresponding peptides from search list
4. If all initial search peptide masses are matched, stop; else continue searching
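The four-step loop is a greedy "explain and remove" search, sketched below. The slide does not specify the scoring function. As an assumption, a protein's score here is simply the number of remaining search masses it matches within ±tol Da, and ties go to the protein listed first. The loop also stops, with the unexplained masses returned, if no remaining protein matches anything. Names are hypothetical.

```python
def iterative_identification(search_masses, proteins, tol=0.5):
    """Greedy protein hit-list search over observed peptide masses.

    search_masses: observed peptide masses (Da).
    proteins: {protein name: list of theoretical peptide masses (Da)}.
    Score (an assumption) = number of remaining search masses a protein matches.
    Returns (identified proteins in order, unexplained masses).
    """
    remaining = list(search_masses)
    identified = []
    while remaining:
        best_name, best_matched = None, []
        for name, peptide_masses in proteins.items():
            if name in identified:
                continue
            matched = [m for m in remaining
                       if any(abs(m - p) <= tol for p in peptide_masses)]
            if len(matched) > len(best_matched):  # ties keep the earlier protein
                best_name, best_matched = name, matched
        if best_name is None:  # no protein explains any remaining mass
            break
        identified.append(best_name)
        # Remove the peptides explained by the top-scoring protein.
        remaining = [m for m in remaining if m not in best_matched]
    return identified, remaining
```

A greedy loop like this is simple, but an early wrong assignment can remove peptides that belong to a later protein. This is one reason the later slides add contextual information and probabilistic scoring.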
Real yeast proteomics
• Alternatives to 2D-gels are needed:
  – 2D-gels are a denaturing technology
  – low abundance spots are difficult to identify
• Many orthogonal 1D separation steps:
  – size exclusion chromatography
  – ion exchange chromatography
  – 1D-gels
[Figure: MALDI spectra of a yeast proteome sample before and after guanidination (m/z 800 to 3600), with labelled peak masses between about m/z 795 and 3613 and R- and K-terminated peptides marked.]
Yeast proteome sample: database search gains
• Standard MALDI, 7 search peptides (before guanidination): 1656 proteins match at least 1 peptide
• Standard MALDI, 12 search peptides (after guanidination): 2549 proteins match at least 1 peptide
• Combined 19 (7 + 12) search peptides (both experiments): 3235 proteins match at least 1 peptide
• Search peptides in common (5 from expt 1, 4 from expt 2): only 289 proteins match at least 1 peptide in both experiments
• PTC derivatised, 3 peptides with N-term = Ile/Leu: only 204 proteins match at least 1 peptide
• All 3 sets of experimental data combined: only 18 proteins match at least 1 peptide in all 3 experiments
[Figure: % unambiguous identification (0 to 100) against total number of search peptides for S. cerevisiae (1 protein and 2 proteins) and C. elegans (1 protein and 2 proteins), comparing standard, guanidination, PTC (500), PTC (50), Asp-frag and Asp-frag (All) searches.]
Improved bioinformatics approaches for complex mixtures
[Figure: primary data (input masses) go to a search engine that queries the database (proteome, proteins, peptides) and returns a protein hit list (quantitative data); secondary data (experimental proteome data) feed a rule-based system that adds protein information (qualitative data); probability, possibility and combined evidence are merged.]
Final Scores
Contextual information
pI (theoretical & experimental)
Molecular weight (oligomerisation state)
Subcellular localisation (known, predicted - PSORT)
Molecular environment (soluble, membrane, DNA-, actin-associated)
Post-translational modifications (known, putative, predicted)
Sequence motifs
Homology relationships
Non-native state digestions
Scoring systems
• Bayesian approach
– k is the hypothesis that the sample protein is protein k
– D is the mass spec fingerprint data
– I is background information
– P(k|DI) is the posterior probability of k given D and I
– P(k|I) is the prior probability of k given I
– P(D|kI) is the likelihood of D given k and I
– P(D|I) is a normalisation constant
$$P(k \mid DI) = \frac{P(k \mid I)\, P(D \mid kI)}{P(D \mid I)}$$
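A minimal sketch of this Bayesian scoring over a set of candidate proteins follows. The normalisation constant is computed as P(D|I) = Σ_k P(k|I) P(D|k,I). The likelihood model is a toy assumption: each observed mass independently matches with probability p_hit or fails with p_miss, and these parameter values are illustrative, not from the source. Contextual information (pI, molecular weight, localisation) would naturally enter through the prior P(k|I).

```python
def fingerprint_likelihood(observed, theoretical, tol=0.5, p_hit=0.9, p_miss=0.1):
    """Toy P(D|k,I): each observed mass independently counts p_hit if it
    matches a theoretical peptide of protein k within ±tol Da, else p_miss."""
    likelihood = 1.0
    for mass in observed:
        matched = any(abs(mass - t) <= tol for t in theoretical)
        likelihood *= p_hit if matched else p_miss
    return likelihood


def posterior(priors, likelihoods):
    """P(k|D,I) = P(k|I) P(D|k,I) / P(D|I), with P(D|I) = sum_k P(k|I) P(D|k,I)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())  # normalisation constant P(D|I)
    return {k: value / evidence for k, value in joint.items()}
```

For example, with equal priors, a protein explaining both of two observed masses (likelihood 0.81) versus one explaining only one (0.09) receives a posterior of 0.9.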
QUANTITATIVE PROTEOMICS
DiGE: Difference Gel Electrophoresis
• Ünlü M. et al. (1997). Difference gel electrophoresis: a single gel method for detecting changes in cell extracts. Electrophoresis 18, 2071-2077
1. Label sample 1 with Cy2, sample 2 with Cy3 and sample 3 with Cy5 (in the dark, 30 min at 4 °C)
2. Quench unreacted dye by adding 1 mM lysine (in the dark, 10 min at 4 °C)
3. 2D gel electrophoresis
Overlaying the Cy3 and Cy5 images (Cy3 + Cy5) distinguishes spots showing no difference, presence/absence, or up/down-regulation.
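The three outcomes of a Cy3/Cy5 overlay can be expressed as a simple rule on spot volumes. The 1.5-fold threshold and the detection limit below are hypothetical values chosen for illustration, not taken from the source. Real DIGE analysis also normalises volumes (for example to the Cy2 channel) and applies statistics across replicate gels, which this sketch omits.

```python
import math


def classify_spot(cy3_volume: float, cy5_volume: float,
                  fold_threshold: float = 1.5, detection_limit: float = 1.0) -> str:
    """Classify a spot from its Cy3 and Cy5 volumes (thresholds are illustrative)."""
    cy3_present = cy3_volume >= detection_limit
    cy5_present = cy5_volume >= detection_limit
    if cy3_present != cy5_present:
        return "presence/absence"
    if not cy3_present:
        return "not detected"
    log_ratio = math.log2(cy5_volume / cy3_volume)
    if abs(log_ratio) < math.log2(fold_threshold):
        return "no difference"
    return "up-regulated (Cy5)" if log_ratio > 0 else "down-regulated (Cy5)"
```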
• In vivo labelling: isotopes introduced during cell culture
  Pros: cheap; information rich
  Cons: only works for microbes and cell culture (?); very complex samples; have to deduce sequence before assigning pairs
Stable Isotope Labelling
[Figure: mass spectrum (m/z) showing paired 14N and 15N forms: light mutant, light WT, heavy WT, heavy mutant.]
Growth of C. elegans on isotopically labelled E. coli
Krijgsveld et al. (2003) Nat. Biotech.
• E. coli grown on 14N nitrogen source
• E. coli grown on 15N nitrogen source
• Metabolic labelling of C. elegans
• Also grew Drosophila on metabolically labelled yeast
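With 15N metabolic labelling, the spacing between the light and heavy forms of a peptide depends on how many nitrogen atoms it contains. This is why the sequence must be deduced before pairs can be assigned. The sketch below assumes complete 15N incorporation and counts backbone plus side-chain nitrogens per residue; function names are hypothetical.

```python
# Nitrogen atoms per amino-acid residue (backbone N plus side-chain N).
NITROGENS = {aa: 1 for aa in "GASPVTCLIDEMFY"}
NITROGENS.update({"N": 2, "Q": 2, "K": 2, "W": 2, "H": 3, "R": 4})
DELTA_15N = 15.000109 - 14.003074  # ~0.997035 Da per nitrogen


def nitrogen_count(peptide: str) -> int:
    return sum(NITROGENS[aa] for aa in peptide)


def n15_shift(peptide: str) -> float:
    """Mass difference between fully 14N- and fully 15N-labelled forms (Da)."""
    return nitrogen_count(peptide) * DELTA_15N


for pep in ("FER", "KETAAAK"):
    print(pep, nitrogen_count(pep), f"{n15_shift(pep):.3f} Da")
```

Two peptides of similar mass can therefore show quite different light/heavy spacings. Arginine, with four nitrogens, contributes most.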
In vitro labelling (continued)
I. Isotopes introduced during proteolysis: 18O-labelled water (C-termini)
II. Guanidinylation of lysine using isotopes of O-methyl isourea (lysine residues)
III. Dimethyl labelling (lysine residues)
Pros: cheap; universal
Cons: complex peptide mixture; small mass difference on MS
ICAT: Isotope Coded Affinity Tags
• Biotin affinity tag
• Cleavable linker
• Isotope coded linker: 227 / 236 (9 × 13C) amu
• SH-reactive group (iodoacetamide)
Pros: universal; simplified sample
Cons: protein must contain cysteine
Gygi SP, et al. Nat Biotechnol 1999, 17:994-999.
ICAT method
Reagent: biotin, linker (heavy or light) and thiol-specific reactive group.
Gygi S, Rist B et al. (1999) Nature Biotech. 17: 994.
1. Control sample and test sample: denature (SDS) and reduce (TCEP)
2. Label one sample with the light reagent and the other with the heavy reagent
3. Pool samples
4. Digest overnight with trypsin
5. Purify labelled peptides using avidin column
6. Cleave biotin portion of the tag with concentrated TFA
7. LC-MS/MS
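Because the light and heavy linkers differ by nine 13C atoms (227 vs 236 amu), each labelled cysteine separates the two forms of a peptide by 9 × 1.00336 ≈ 9.03 Da, or 9.03/z on the m/z axis. Relative abundance is then the ratio of the paired peak intensities. In this sketch, the assignment of control to light and test to heavy is an assumption, and the function names are hypothetical.

```python
DELTA_13C = 13.003355 - 12.0       # Da per 13C atom
HEAVY_LIGHT_SHIFT = 9 * DELTA_13C  # nine 13C in the heavy cleavable linker (~9.03 Da)


def icat_pair_spacing(n_cys: int, charge: int) -> float:
    """m/z spacing between light- and heavy-labelled forms of a peptide."""
    return n_cys * HEAVY_LIGHT_SHIFT / charge


def icat_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Test/control abundance, assuming control = light and test = heavy."""
    return heavy_intensity / light_intensity


print(f"1 Cys, 2+: {icat_pair_spacing(1, 2):.3f} m/z; 2 Cys, 1+: {icat_pair_spacing(2, 1):.3f} m/z")
```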
iTRAQ
Ross P. et al. Mol Cell Proteomics. 2004 Sep 22
Workflow
• Reduce, alkylate (cysteine block) and digest protein sample with trypsin as usual
• Label each sample (max. of 4) with a different iTRAQ reagent; 100 µg of protein is optimal
• Combine all iTRAQ-labelled samples into one sample mixture
• Clean up sample by cation-exchange chromatography
• For complex sample mixtures, pre-fractionate using a high-resolution cation-exchange column
• Analyse the mixture by LC-MS/MS
• Analyse results with ProQuant software
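In the 4-plex reagent, each label releases a reporter ion at m/z 114, 115, 116 or 117 on MS/MS fragmentation, so relative abundance across the samples is read from reporter intensities. The sketch below is a simplified illustration under stated assumptions: 114 is the reference channel, no isotope-impurity correction is applied, and peptide ratios for a protein are summarised by their median. It does not reproduce how ProQuant itself computes ratios.

```python
from statistics import median


def reporter_ratios(intensities: dict[int, float], reference: int = 114) -> dict[int, float]:
    """Ratio of each reporter-ion intensity to the reference channel."""
    ref = intensities[reference]
    return {channel: value / ref for channel, value in intensities.items()}


def protein_ratio(peptide_ratios: list[float]) -> float:
    """Summarise peptide-level ratios for one channel by their median (an assumption)."""
    return median(peptide_ratios)


print(reporter_ratios({114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}))
```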
PROTEIN TURNOVER
The missing dimension of proteomics
JM Pratt, J Petty, I Riba-Garcia, DHL Robertson, SJ Gaskell, SG Oliver, RJ Beynon (2002)
Molec. Cell. Proteomics 1, 579-591.
[Figure: protein labelling curve. Fraction labelled (0 to 1) against time (0 to 80 h; doubling times at 0.1 h-1) during deuterated leucine labelling followed by an unlabelled chase. Loss of label from proteins at different rates = turnover.]
Flow rate 100 ml h-1; dilution rate D = 0.1 h-1
Half-time = ln 2 / D = 0.693 / 0.1 h-1 = 6.9 h
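Turnover rates can be estimated by fitting first-order loss of label, fraction(t) = a·e^(−k_loss·t), with a straight line through ln(fraction) against time. This is a minimal sketch under stated assumptions. Label loss is taken as first-order. In the chemostat, label is also washed out at the dilution rate D, so the degradation constant is taken here as k_deg = k_loss − D; the source does not state that relation, so treat it as an assumption. Sampling times follow the slide (0 to 51 h), and the test uses synthetic data rather than measured values.

```python
import math

DILUTION_RATE = 0.1  # h^-1, chemostat dilution rate from the slide


def fit_loss_rate(times_h: list[float], label_fraction: list[float]) -> float:
    """Least-squares slope of ln(label fraction) against time; returns k_loss (h^-1)."""
    ys = [math.log(f) for f in label_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return -sxy / sxx


def half_time(rate_h: float) -> float:
    """Half-time of a first-order process: ln 2 / k."""
    return math.log(2) / rate_h


print(f"dilution half-time = {half_time(DILUTION_RATE):.1f} h")
```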
Experimental Approach
[Figure: mass spectra of peptides at 100%, 50% and 0% d9 labelling, with peaks (about m/z 1118 to 2337) annotated by number of leucines, L = 0 to 3.]
Pratt et al., Figure 3
[Figure: spectra at 0, 4, 6, 8, 12, 25 and 51 h for two peptides (m/z 1520 to 1555 and 2099 to 2130), showing unlabelled and labelled forms separated by 9 Da (1 Leu) and 27 Da (3 Leu).]
[Figure: RIA against time (h) for NADP-glutamate dehydrogenase (GDH; 3 peptides), Hsp26 (2 peptides), Hsp71 (4 peptides) and pyruvate decarboxylase (PDC; 4 peptides), with a bar chart of k_loss (h-1) ± SEM (scale 0 to 0.16) for NADP-GDH, Hsp26, Hsp71 and PDC.]
Pratt et al., Figure 3
[Figure: degradation rate constant (h-1) ± SEM (scale 0 to 0.12) for each protein (spot ID 1 to 52).]
Pratt et al., Figure 5
[Figure: distribution (%) of degradation rate constants: < 0.01 h-1, 0.01-0.02 h-1, 0.02-0.03 h-1, 0.03-0.04 h-1, > 0.04 h-1.]
INTEGRATION
Evaluating protein-interaction data
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399-403.
Cornell M, Paton NW, Oliver SG (2004) A critical and integrated view of the yeast interactome. Comp. Funct. Genom. 5, 382-402.
Fig. 1 How the two-hybrid system detects protein associations in yeast.
(A) A DNA binding domain is fused to protein A (the “bait”). The fusion of the “bait” protein and the DNA binding domain of the transcriptional activator cannot turn on the reporter gene (UAS, promoter, LacZ).
(B) An activator region is fused to protein B (the “prey”). The fusion of the “prey” protein and the activating region of the transcriptional activator is also insufficient to switch on the reporter.
(C) The association of “bait” and “prey” brings the DNA binding domain and the activator region close enough to switch on the reporter gene and turn yeast blue.
[Figure: schematic representation of the two-hybrid system. When proteins A and B interact, the DNA-binding domain (bound at the UAS) and the activation domain are brought together and RNA Pol II expresses the reporter gene. In the absence of interaction between A and B, there is no transcript.]
Synthetic lethals
Definition: lethality caused by combining mutations in two or more genes, each of which is viable when mutated alone.
[Figure: a single essential pathway (gene1 to gene5) compared with functionally overlapping pathways (gene1 to gene5 alongside geneA to geneC).]
Asparagine-linked glycosylation
Dol-PP-GlcNAc2Man9Glc3 (substrate) + Asn-X-Ser/Thr (Asn = Asp-NH2) → GlcNAc2Man9Glc3-NH-Asn-X-Ser/Thr
alg mutations are synthetically lethal with conditional mutations affecting oligosaccharyltransferase activity: STT3, OST1, WBP1, OST3, OST6, SWP1, OST2, OST5, OST4.
(ALG genes are responsible for the core synthesis.)
Integrating complex data with yeast two-hybrid data
A complex consists of six proteins: A, B, C, D, E, F.
In a yeast two-hybrid experiment, A interacts with another protein. Is it B, C, D, E or F?
Large-scale interaction data and the distribution of interactions according to functional categories.
Quantitative comparison of interaction datasets.
Set of confirmed Y2H interactions
Confirmation of an interaction requires:
1. Identification in more than one Y2H screen, OR
2. Identification of the reverse interaction, OR
3. Identification of the two proteins in the same protein complex (from either classical or high-throughput affinity purification studies).
A total of 451 reliable interactions, involving 581 proteins, have been identified from a combined data set comprising 5214 interactions and 4025 proteins.
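The three confirmation rules translate directly into a filter over interaction data, sketched below. The data shapes are assumptions chosen for illustration: each screen is a set of (bait, prey) pairs, and each complex is a set of protein names. Self-interactions are not confirmed by the reverse or complex rules, because those rules would be trivially satisfied.

```python
from collections import Counter


def confirmed_interactions(screens, complexes):
    """Apply the three confirmation rules to yeast two-hybrid data.

    screens: list of Y2H screens, each an iterable of (bait, prey) pairs.
    complexes: list of sets of proteins co-purified in affinity studies.
    Returns confirmed interactions as unordered pairs (frozensets).
    """
    screen_counts = Counter(pair for screen in screens for pair in set(screen))
    observed = set(screen_counts)
    confirmed = set()
    for bait, prey in observed:
        in_multiple_screens = screen_counts[(bait, prey)] > 1                   # rule 1
        reverse_seen = bait != prey and (prey, bait) in observed                # rule 2
        same_complex = bait != prey and any(bait in c and prey in c
                                            for c in complexes)                 # rule 3
        if in_multiple_screens or reverse_seen or same_complex:
            confirmed.add(frozenset((bait, prey)))
    return confirmed
```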
Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L, Walker J, Riba-Garcia I, Mohammed S, Deery MJ, Howard JA, Dunkley T, Aebersold R, Kell DB, Lilley KS, Roepstorff P, Yates JR III, Brass A, Brown AJP, Cash P, Gaskell SJ, Hubbard SJ, Oliver SG (2003) PEDRo: A Systematic Approach to Modelling, Capturing and Disseminating Proteomics Data. Nature Biotechnol. 21, 247-254.
Garwood K, McLaughlin T, Garwood C, Joens S, Morrison N, Taylor CF, Carroll K, Evans C, Whetton AD, Hart S, Stead D, Yin Z, Brown AJP, Hesketh A, Chater K, Hansson L, Mewissen M, Ghazal P, Howard J, Lilley KS, Gaskell SJ, Brass A, Hubbard SJ, Oliver SG, Paton NW (2004) PEDRo: A database for storing, searching and disseminating experimental proteomics data. BMC Genomics 5, 68. doi:10.1186/1471-2164-5-68.
Proteomics: the state of play
• The volume of generated proteome data is rapidly increasing
  – Movement towards high-throughput approaches
  – Experimental techniques increasing in complexity
  – Analyses also increasing in complexity
• Current publicly available proteomics data is limited
  – 2D-gel image databases (e.g. SWISS-2DPAGE) contain little information about sample preparation or analysis of results
  – No widely used databases of mass spectrometry data or analyses
• A robust, future-proofed, standard representation of both methods and data from proteomics experiments is required
  – Analogous to the MIAME guidelines for transcriptomics
  – Users will know what to expect from datasets (formats etc.)
  – Will facilitate handling, exchange and dissemination of data
  – Will guide the development of effective search/analysis tools
PEDRo and PEML
• The PEDRo (Proteome Experiment Data Repository) model
  – Specifies the information required about a proteomics experiment: sufficient information to exactly replicate that experiment
  – Organised in a manner reflecting the procedures that generated it
  – Flexible enough to accommodate new technological developments
  – Described in UML (Unified Modeling Language), making it implementation-independent (effectively a generic blueprint)
  – Implemented in SQL (the relational database repository)
  – Also implemented in Java (later slide) and XML (next bullet)
• PEML (Proteomics Experiment Markup Language)
  – The XML implementation of PEDRo for data exchange and rapid dissemination (using XSLT to display PEML files as web pages)
• Two benefits arising from early implementation of the model
  – Implementation allows the underlying technologies to be tested
  – Making explicit what data might most usefully be captured about proteomics experiments will speed the model’s evolution
The nature of proteomics experiment data
• Sample generation
  – Origin of sample: hypothesis, organism, environment, preparation, paper citations
• Sample processing
  – Gels (1D/2D): images, gel type and ranges, band/spot coordinates
  – Columns: stationary and mobile phases, flow rate, temperature, fraction details
• Mass spectrometry: machine type, ion source, voltages
• In silico analysis: peak lists, database name + version, partial sequence, search parameters, search hits, accession numbers
The PEDRo UML schema in reduced form
MALDI
Electrospray
ToF
Spot Gel2D
TreatedAnalyte ChemicalTreatment
DiG EGelItemBoundaryPoint
GelItemRelatedGelItem
Quadrupole
CollisionCell
IonTrap
Hexapole
Organism TaggingProcess
Band Gel1D
OtherIonisation
OntologyEntry
OthermzAnalysis
OtherAnalyte
OntologyEntry
OtherAnalyte ProcessingStep
Fraction
AssayDataPoint
ColumnGradientStep
MobilePhaseComponentPercentX
Detection
mzAnalysis
AnalyteProcessingStep
IonSource
Analyte
MassSpecMachine
Peak-Specific
ChromatogramIntegration
Chromatogram
Point
ListProcessing
MSMSFraction
MassSpecExperiment
Peak
PeakList
TandemSequenceData
DBSearchParameters
RelatedGelItem
Protein
DBSearch
OntologyEntryProteinHit
PeptideHit
DiG EGel
Gel
Experiment
SampleOrigin
[Figure: PEDRo UML class diagram. Classes and attributes as labelled in the figure:]
• Experiment: hypothesis, method_citations, result_citations
• Sample: sample_id, sample_date, experimenter
• SampleOrigin: description, condition, condition_degree, environment, tissue_type, cell_type, cell_cycle_phase, cell_component, technique, metabolic_label
• Organism: species_name, strain_identifier, relevant_genotype
• TaggingProcess: lysis_buffer, tag_type, tag_purity, protein_concentration, tag_concentration, final_volume
• Gel: description, raw_image, annotated_image, software_version, warped_image, warping_map, equipment, percent_acrylamide, solubilization_buffer, stain_details, protein_assay, in-gel_digestion, background, pixel_size_x, pixel_size_y
• Gel1D: denaturing_agent, mass_start, mass_end, run_details
• Gel2D: pi_start, pi_end, mass_start, mass_end, first_dim_details, second_dim_details
• DiGEGel: dye_type, excitation_wavelength, exposure_time, tiff_image
• GelItem: id, area, intensity, local_background, annotation, annotation_source, volume, pixel_x_coord, pixel_y_coord, pixel_radius, normalisation, normalised_volume
• Band: lane_number, apparent_mass
• Spot: apparent_pi, apparent_mass
• DiGEGelItem: dye_type
• BoundaryPoint: pixel_x_coord, pixel_y_coord
• RelatedGelItem: description, gel_reference, item_reference
• Column: description, manufacturer, part_number, batch_number, internal_length, internal_diameter, stationary_phase, bead_size, pore_size, temperature, flow_rate, injection_volume, parameters_file
• GradientStep: step_time
• MobilePhaseComponent: description, concentration
• PercentX: percentage
• Fraction: start_point, end_point, protein_assay
• AssayDataPoint: time, protein_assay
• ChemicalTreatment: digestion, derivatisations
• OtherAnalyte: name (analyte_parameters as OntologyEntry)
• OtherAnalyteProcessingStep: name (analyte_processing_step_parameters as OntologyEntry)
• MassSpecMachine: manufacturer, model_name, software_version
• IonSource: type, collision_energy
• MALDI: laser_wavelength, laser_power, matrix_type, grid_voltage, acceleration_voltage, ion_mode
• Electrospray: spray_tip_voltage, spray_tip_diameter, solution_voltage, cone_voltage, loading_type, solvent, interface_manufacturer, spray_tip_manufacturer
• OtherIonisation: name (ionisation_parameters as OntologyEntry)
• mzAnalysis: type
• ToF: reflectron_state, internal_length
• Quadrupole: description
• Hexapole: description
• CollisionCell: gas_type, gas_pressure, collision_offset
• IonTrap: gas_type, gas_pressure, rf_frequency, excitation_amplitude, isolation_centre, isolation_width, final_ms_level
• OtherMzAnalysis: name (mz_analysis_parameters as OntologyEntry)
• Detection: type
• MassSpecExperiment: description, parameters_file
• ListProcessing: smoothing_process, background_threshold
• PeakSpecificChromatogramIntegration: resolution, software_version, background_threshold, area_under_curve, peak_description, sister_peak_reference
• ChromatogramPoint: time_point, ion_count
• Peak: m_to_z, abundance, multiplicity
• MSMSFraction: target_m_to_z, plus_or_minus
• TandemSequenceData: source_type, sequence
• DBSearch: username, id_date, n-terminal_aa, c-terminal_aa, count_of_specific_aa, name_of_counted_aa, regex_pattern
• DBSearchParameters: program, database, database_date, parameters_file, taxonomical_filter, fixed_modifications, variable_modifications, max_missed_cleavages, mass_value_type, fragment_ion_tolerance, peptide_mass_tolerance, accurate_mass_mode, mass_error_type, mass_error, protonated, icat_option (db_search_parameters as OntologyEntry)
• ProteinHit: all_peptides_matched
• PeptideHit: score, score_type, sequence (peptide_hit_parameters as OntologyEntry)
• Protein: accession_number, gene_name, synonyms, organism, orf_number, description, sequence, modifications, predicted_mass, predicted_pi
• OntologyEntry: category, value, description
• Analyte, AnalyteProcessingStep, TreatedAnalyte, PeakList: no attributes labelled
Key to colours: Sample Generation, Sample Processing, Mass Spectrometry, MS Results Analysis
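In the class diagram, the "Other" classes (OtherIonisation, OtherMzAnalysis, OtherAnalyte, OtherAnalyteProcessingStep) carry only a name and are linked to OntologyEntry (category, value, description). This appears to serve as an extension point for techniques without a dedicated class such as MALDI or Electrospray. A minimal sketch of that pattern follows. The `parameter` and `as_dict` helpers are hypothetical and are not part of PEDRo.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyEntry:
    """A category/value/description triple, as in the class diagram."""
    category: str
    value: str
    description: str = ""


@dataclass
class OtherIonisation:
    """An ionisation method with no dedicated class (unlike MALDI, Electrospray)."""
    name: str
    ionisation_parameters: list = field(default_factory=list)  # OntologyEntry items

    def parameter(self, category: str) -> Optional[str]:
        """Hypothetical helper: value recorded for a category, or None if absent."""
        for entry in self.ionisation_parameters:
            if entry.category == category:
                return entry.value
        return None

    def as_dict(self) -> dict:
        """Hypothetical helper: flatten to {category: value}; later duplicates win."""
        return {e.category: e.value for e in self.ionisation_parameters}


# Illustrative values only.
source = OtherIonisation(
    "SELDI",
    [OntologyEntry("laser_wavelength", "337 nm", "nitrogen laser"),
     OntologyEntry("ion_mode", "positive")],
)
print(source.parameter("laser_wavelength"))
print(source.as_dict())
```

With this design, a new technique needs no schema change. The trade-off is that category names are free text unless they are drawn from a controlled vocabulary.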
The Framework Around PEDRo
1. Lab-generated data is encoded using the PEDRo data entry tool, producing an XML (PEML) file for local storage or for submission to a repository
2. Locally stored PEML files may be viewed in a web browser (with XSLT), allowing web pages to be quickly generated from datasets
3. Upon receipt of a PEML file at the repository site, a validation tool checks the file before entering it into the database
4. The repository (a relational database) holds submitted data, allowing various analyses to be performed, or data to be extracted as a PEML file or another format
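Step 3 can be sketched as a function that parses an incoming file and reports problems before anything reaches the database. The rules below are hypothetical. Element names borrow PEDRo class names, but these are not the checks the real PEDRo validator performs.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for illustration only.
REQUIRED_CHILDREN = {
    "Experiment": ["Sample"],
    "Sample": ["SampleOrigin"],
}
ALLOWED_PARENTS = {
    "Gel": {"Sample"},
    "Spot": {"Gel"},
}


def validate_peml(text: str) -> list:
    """Return a list of problems found; an empty list means the file passes."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    if root.tag != "Experiment":
        errors.append(f"root element is <{root.tag}>, expected <Experiment>")
    # ElementTree keeps no parent pointers, so check each parent/child pair
    # while walking every element (root.iter() includes the root itself).
    for parent in root.iter():
        for child in parent:
            allowed = ALLOWED_PARENTS.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"<{child.tag}> found under <{parent.tag}>")
        for required in REQUIRED_CHILDREN.get(parent.tag, []):
            if parent.find(required) is None:
                errors.append(f"<{parent.tag}> is missing <{required}>")
    return errors


good = "<Experiment><Sample><SampleOrigin/><Gel/></Sample></Experiment>"
bad = "<Experiment><Gel/></Experiment>"
print(validate_peml(good))
print(validate_peml(bad))
```

In this sketch, only a file that returns an empty list would be passed on to be loaded into the relational repository.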
The PEDRo Data Collator
• The tool with which a user enters information about, and data from, proteomics experiments
–The tool collates these data into a single PEML file
–The hierarchical nature of the PEDRo schema (and PEML) is reflected in the structure of the data entry tool
• Successive stages of the experimental design are added as ‘children’ of the previous stage
• Enforces an audit trail for data; e.g. details of a gel cannot be entered without first describing the sample
• A simple, filterable list of all the sub-records present and a tree-style browser act as the 'index' and 'contents' for the PEML file being edited
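The audit-trail rule (a gel cannot be entered before its sample), together with the 'contents' tree and filterable 'index', can be sketched with a single record type. Records can only be created through a parent's `add` method, and a table of permitted parent/child types enforces the order. The `ALLOWED_CHILDREN` table and all method names are hypothetical and do not describe PEDRoDC's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical parent -> permitted child types. Because a record can only be
# created through its parent's add(), a Gel cannot exist before its Sample.
ALLOWED_CHILDREN = {
    "Experiment": {"Sample"},
    "Sample": {"Gel", "Column"},
    "Gel": {"Spot", "Band"},
    "Spot": {"MassSpecExperiment"},
    "MassSpecExperiment": {"PeakList"},
    "PeakList": {"DBSearch"},
}


@dataclass
class Record:
    kind: str
    label: str
    children: list[Record] = field(default_factory=list)

    def add(self, kind: str, label: str) -> Record:
        """Attach a child record, refusing types not permitted under this one."""
        if kind not in ALLOWED_CHILDREN.get(self.kind, set()):
            raise ValueError(f"{kind} cannot be added under {self.kind}")
        child = Record(kind, label)
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, record) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def tree(self) -> str:
        """Indented 'contents' view, one line per record."""
        return "\n".join("  " * d + f"{r.kind}: {r.label}" for d, r in self.walk())

    def index(self, kind_filter: str = "") -> list[str]:
        """Flat 'index' view, optionally filtered by record type."""
        return [f"{r.kind}: {r.label}" for _, r in self.walk()
                if kind_filter.lower() in r.kind.lower()]


# Illustrative labels only.
experiment = Record("Experiment", "heat-shock time course")
sample = experiment.add("Sample", "S001")
gel = sample.add("Gel", "2D gel, pH 4-7")
gel.add("Spot", "spot 17")
print(experiment.tree())
print(experiment.index("gel"))
try:
    experiment.add("Gel", "gel with no sample")
except ValueError as err:
    print(err)  # Gel cannot be added under Experiment
```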
Conclusions
• The PEDRo model does require a substantial amount of data
– Much of this information will be available in the lab of origin
– Some data will be common to many experiments, and therefore need only be entered once, then saved as a template in PEDRoDC
• But there are several advantages to adopting such a model
– All datasets will contain information sufficient to quickly establish the provenance and relevance (to the researcher) of a dataset
– Datasets will be detailed enough to allow non-standard searches, for example, by sample extraction technique
– Tools can be developed that allow easy access to large numbers of such datasets, from a wide range of proteomics sites
– Integration with other resources, such as the major sequence databases, will provide sophisticated search and analysis capability
– Information exchange between researchers will be facilitated through the use of a common language (PEML), and the ability to rapidly display PEML-encoded data as a web page
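Two of the points above, entering common data once as a template and running non-standard searches such as "by sample extraction technique", can be sketched with plain dictionaries. The template contents, `from_template` and `search_by_technique` are hypothetical. The search assumes the extraction technique is stored in SampleOrigin's `technique` attribute, which is a plausible reading of the class diagram rather than a confirmed mapping.

```python
from copy import deepcopy

# Hypothetical template of lab-wide details entered once and reused; the keys
# borrow PEDRo attribute names, the values are placeholders.
LAB_TEMPLATE = {
    "organism": {"species_name": "Saccharomyces cerevisiae", "strain_identifier": "lab-strain-1"},
    "mass_spec_machine": {"manufacturer": "example-vendor", "model_name": "example-model"},
}


def from_template(template: dict, **specific) -> dict:
    """Start a new dataset from a template, then add experiment-specific fields."""
    record = deepcopy(template)  # copy so later edits never alter the template
    record.update(specific)      # top-level keys in `specific` replace template keys
    return record


def search_by_technique(datasets: list, technique: str) -> list:
    """Non-standard search: sample IDs whose SampleOrigin used this technique."""
    return [d["sample_id"] for d in datasets
            if d.get("sample_origin", {}).get("technique") == technique]


# Illustrative datasets only.
datasets = [
    from_template(LAB_TEMPLATE, sample_id="S001", sample_origin={"technique": "TCA precipitation"}),
    from_template(LAB_TEMPLATE, sample_id="S002", sample_origin={"technique": "detergent lysis"}),
    from_template(LAB_TEMPLATE, sample_id="S003", sample_origin={"technique": "TCA precipitation"}),
]
print(search_by_technique(datasets, "TCA precipitation"))
```

The deep copy matters here. A shallow copy would let an edit to one dataset's organism details silently change the shared template and every dataset created from it later.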