
Generation of minimal protein identifiers of proteins from two-dimensional gels and recombinant proteins

Frank Schmidt (1), Angelika Lueking (1), Eckhard Nordhoff (1, 2), Johan Gobom (1), Joachim Klose (2, 3), Harald Seitz (1), Volker Egelhofer (1), Holger Eickhoff (1), Hans Lehrach (1), Dolores J. Cahill (1, 2)

(1) Max Planck Institute for Molecular Genetics, Berlin, Germany
(2) PROTAGEN, Bochum, Germany
(3) Charité, Department for Human Genetics, Humboldt-University, Berlin, Germany

We describe the technical feasibility and methodology to characterize a protein by a minimal set of structural information generated by matrix assisted laser desorption/ionization (MALDI)-mass spectrometry, termed a “minimal protein identifier” (MPI). MPIs can be determined for proteins from two-dimensional gels and recombinant proteins and can be used to compare and identify proteins from these sources.

Keywords: Matrix assisted laser desorption/ionization-time of flight-mass spectrometry / Minimal protein identifier / Protein arrays / Proteomics / Two-dimensional gel electrophoresis

EL 4773

1 Introduction

Proteomics can be defined as the systematic analysis and characterization of proteins in biological samples [1, 2]. Presently, the dominant approach to proteomics combines two-dimensional gel electrophoresis (2-DE) with mass spectrometry (MS) to identify both the amino acid sequence and the post-translational modifications of the fractionated proteins [3]. The contribution of proteomics to the study of biological processes is well recognized: although differential gene expression profiling gives valuable information on gene activities at specific time points [4–7], such approaches cannot predict protein abundances, secondary modifications and protein interactions. With the combination of 2-DE and MS, the systematic analysis of large protein populations has become possible. This approach is assisted by increasing computer processing power and the development of database search algorithms that use mass spectrometric data to retrieve matching protein sequences.

Correspondence: Dr. Dolores J. Cahill, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, D-14195 Berlin, Germany
E-mail: [email protected]
Fax: +49-30-8413-1128

Abbreviation: MPI, minimal protein identifier

Individual proteins are identified in protein or DNA sequence databases using mass spectrometrically determined peptide mass maps [8–12], sequence tags [13–15] or fragment-ion fingerprints of individual cleavage peptides [16–18]. When DNA databases are searched, the software translates each DNA sequence into its corresponding protein sequence in all reading frames, followed by in silico proteolysis, resulting in a set of predicted peptide sequences, which provide the basis for protein identification by the named methods. The approach is more successful when databases containing sequences of well-characterized proteins are searched. In many cases, however, the identification fails or, worse, false-positive sequence entries are retrieved from the searched databases. There are a number of reasons for this: (i) the currently available databases contain many sequence errors, which result in the prediction of incorrect proteolytic peptides and their fragment-ion fingerprints; (ii) splicing and editing of mRNA sequences, post-translational truncation of polypeptide chains, and a large pool of possible secondary modifications of specific residues produce proteins that differ from the database sequence; and (iii) methodological artifacts during protein extraction, separation and all subsequent sample preparation steps, as well as artifacts arising from mass spectrometric analyses, can alter the molecular mass of the detected products and thereby obscure the results.
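The database-side prediction described in this paragraph (translation in all reading frames, in silico proteolysis, peptide masses) can be sketched as follows. This is a minimal illustration rather than the procedure of any particular search engine. It assumes the standard genetic code, trypsin cleavage after K or R except before P, complete digestion, and unmodified residues. The function names and the toy DNA sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # singly protonated ion, as observed in MALDI


def translate(dna):
    """Translate one reading frame; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse) for offset in range(3)]


def tryptic_digest(protein):
    """Cleave C-terminal to K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def predicted_masses(dna, min_length=2):
    """Digest every stop-free stretch of all six frames; list (peptide, mass)."""
    masses = []
    for frame in six_frame_translate(dna):
        for stretch in frame.split("*"):
            for peptide in tryptic_digest(stretch):
                if len(peptide) >= min_length and "X" not in peptide:
                    masses.append((peptide, round(mh_mass(peptide), 4)))
    return masses


if __name__ == "__main__":
    # Hypothetical 12-nucleotide toy sequence, used only for illustration.
    for peptide, mass in predicted_masses("ATGGCCAAATAA"):
        print(peptide, mass)
```

Each of the three failure modes listed above enters this sketch at a visible point: a sequence error changes the input string, a modification or truncation changes a residue mass or the peptide boundaries, and a measurement artifact shifts the observed value that is later compared with these predictions.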

To overcome these limitations, we have developed a novel concept in which each protein is characterized by a minimal set of structural information generated by MS, previously termed a “minimal protein identifier” (MPI) [19]. MPIs contain experimentally determined proteolytic peptide molecular mass maps of recombinant or homologous proteins, recorded by high-throughput MALDI-TOF-MS.

Our approach is based on the availability of a large set of recombinant proteins, which we have previously generated from a cDNA expression library derived from human fetal brain [20] and which contains a significant portion of the proteins encoded by the human genome. This original library was redundant [20]; a nonredundant set, the so-called UNIprotein set, can be generated by analysis at the DNA level, such as oligonucleotide fingerprinting [21]. A UNIgene-UNIprotein set provides an immortal source of both recombinant DNA and the encoded proteins. The DNA can be amplified by PCR, arrayed to generate DNA chips, and then used to profile mRNA levels by differential expression profiling. The proteins of both the cDNA library and the UNIprotein set can be expressed and either used to generate high-density protein arrays or enzymatically digested and analyzed by MS, so that the MPI of each protein is determined and stored in a database. Because the recombinant clones can be sequenced, the UNIprotein set offers the distinct advantage that a direct connection or “bridge” can be made from the DNA sequence information of individual clones to their protein products and, conversely, from protein products, for example from 2-D gels, to the corresponding DNA sequence. This paper demonstrates the technical feasibility of identifying homologous proteins isolated by 2-DE by comparing their recorded MPIs with those determined for their recombinant counterparts.

Electrophoresis 2002, 23, 621–625

© WILEY-VCH Verlag GmbH, 69451 Weinheim, 2002 0173-0835/02/0402–621 $17.50+.50/0

2 Materials and methods

2.1 Strains, transformation and media

Escherichia coli strains XL-1Blue, BL21(DE3)pLysS (Invitrogen, Leek, The Netherlands) and SCS1 (Stratagene, La Jolla, CA, USA) were used for cloning and expression as described [20, 22]. Pichia pastoris strain GS115 (his4, Mut+; Invitrogen) was used for eukaryotic protein expression as described [22].

2.2 Generation of peptide molecular mass maps

Bacterial protein expression in strain SCS1 was performed as described [20], and expression in strain BL21(DE3)pLysS was performed as described [22]. The expressed proteins were metal-chelate affinity purified and digested with trypsin as previously reported [23, 24]. Sample desalting and enrichment were achieved using micro-scale reversed-phase purification (ZipTip-C18, Millipore, Bedford, MA, USA), following the protocol provided by the manufacturer. Mass spectra of positively charged ions were recorded on a Bruker Scout 384 Reflex III instrument (Bruker Daltonik, Bremen, Germany) operated in the reflector mode. One hundred single-shot spectra were accumulated from each sample. The total acceleration voltage was 25 kV [25].

Proteins contained in a human fetal brain total protein extract were isolated by the large-gel 2-DE technique according to a previously published protocol [26]. These proteins are henceforth referred to as homologous proteins. The fractionated proteins were stained with Coomassie Brilliant Blue G-250. A subset of the detected protein spots was excised and characterized by MALDI-TOF-MS peptide mapping as described [24, 25, 27], using default calibration constants, which yielded a maximum mass error of 100 ppm.
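To make the 100 ppm figure concrete, the sketch below converts between a relative mass error in ppm and an absolute deviation in m/z units. The function names are hypothetical. The example pair (1543.84 measured, 1543.79 theoretical) is taken from Table 1.

```python
def ppm_error(measured_mz, reference_mz):
    """Signed relative mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def tolerance_window(mz, tol_ppm=100.0):
    """Absolute half-width (in m/z units) of a ppm tolerance at a given m/z."""
    return mz * tol_ppm * 1e-6


def within_tolerance(measured_mz, reference_mz, tol_ppm=100.0):
    """True if the two values agree within the stated ppm tolerance."""
    return abs(ppm_error(measured_mz, reference_mz)) <= tol_ppm


if __name__ == "__main__":
    # Measured and theoretical values for one peptide listed in Table 1.
    print(round(ppm_error(1543.84, 1543.79), 1))
    # Half-width of a 100 ppm window at the low and high end of a peptide map.
    print(tolerance_window(600.0), tolerance_window(3000.0))
```

Because the tolerance is relative, the acceptable absolute deviation grows with mass: 100 ppm corresponds to 0.06 at m/z 600 and to 0.3 at m/z 3000.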

2.3 Software and database searching

For protein identification, the Mascot software package (Matrix Science Ltd., UK, http://www.matrixscience.com) and the in-house developed software, MSA, were used [24], free access to which is provided at http://www.molgen.mpg.de/~mass-spec/. For comparison of MALDI-TOF mass spectra, the freeware “M/Z” was used, which was obtained from Proteometrics (www.proteometrics.com). For detailed analysis of peptide molecular mass maps, the software package GPMAW, Version 4.21 (Lighthouse Data, Odense, Denmark) was used.

3 Results and discussion

3.1 Comparison of MPIs of homologous and recombinant proteins

From a 2-DE separation of a crude human fetal brain protein extract, the proteins aconitate hydratase (Q99798), pyruvate kinase (P14618), GTP binding protein (P17080), tubulin α-1 chain (P04687) and tubulin β-3 chain (Q13509) were identified in the SWISS-PROT database (release January 18th, 2001) by MALDI-TOF-MS peptide mapping. The corresponding recombinant proteins were selected from the UNIgene-UNIprotein set ([21]; Lueking et al., submitted) and expressed in E. coli. The expression products were metal-chelate affinity purified and digested, and the resulting cleavage peptides were analyzed by MALDI-TOF-MS. Trypsin was used as the cleavage enzyme for both homologous and recombinant proteins, and cysteine residues were not alkylated.

Figures 1A and B compare the peptide mass maps obtained from the purified homologous and recombinant human GTP binding protein (GTP). Masses that match within the maximum expected deviation of 100 ppm are labeled with triangles in the figure. These masses and the corresponding database hits are summarized in Table 1. Fourteen peptides from recombinant GTP matched signals from the corresponding homologous protein obtained from 2-D gels. Nine peptides from recombinant GTP matched cleavage products predicted for complete digestion of the GTP sequence (P17080) (Table 1). For the homologous GTP protein, 12 of the predicted peptides were detected (Table 2).
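The pairwise comparison described here can be sketched as a one-to-one matching of two sorted peak lists within a relative tolerance. This is an illustrative sketch, not the comparison software used in the study. The function name is hypothetical, and the two peak lists are the homologous and recombinant GTP values reported in Table 1.

```python
def match_peaks(list_a, list_b, tol_ppm=100.0):
    """Pair masses from two peak lists that agree within tol_ppm.

    Both lists are sorted and walked with two pointers, so each peak is
    used in at most one pair.
    """
    a, b = sorted(list_a), sorted(list_b)
    pairs, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        tolerance = a[i] * tol_ppm * 1e-6
        if abs(a[i] - b[j]) <= tolerance:
            pairs.append((a[i], b[j]))
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return pairs


# Peak lists as reported in Table 1 (monoisotopic m/z values).
HOMOLOGOUS_GTP = [
    758.43, 922.46, 938.45, 954.45, 1214.62, 1244.67, 1294.62, 1349.66,
    1543.84, 1689.89, 1784.91, 1800.90, 1816.89, 2052.11, 2066.17,
    2180.21, 2281.10,
]
RECOMBINANT_GTP = [
    758.41, 922.45, 938.44, 954.44, 1214.59, 1244.64, 1294.59, 1689.83,
    1784.89, 1800.88, 1816.86, 2052.08, 2066.14, 2180.18,
]

if __name__ == "__main__":
    shared = match_peaks(HOMOLOGOUS_GTP, RECOMBINANT_GTP)
    print(len(shared), "shared peptide masses")
```

With the Table 1 values and a 100 ppm tolerance, this yields the fourteen shared masses stated in the text; the three homologous peaks without a recombinant counterpart (1349.66, 1543.84, 2281.10) remain unpaired.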

Figure 1. Comparison of the tryptic peptide mass maps obtained from (A) homologous and (B) recombinant GTP binding protein. Signals that are identical within the experimental mass accuracy are labeled with triangles.

Table 1. Monoisotopic molecular masses of peptide ions detected in the peptide maps of both homologous and recombinant GTP binding protein (Fig. 1)

m/z homologous GTP   m/z recombinant GTP   m/z identical   m/z theoretical
 758.43               758.41               +                758.42
 922.46               922.45               +                922.46
 938.45               938.44               +               –
 954.45               954.44               +               –
1214.62              1214.59               +               1214.60
1244.67              1244.64               +               1244.67
1294.62              1294.59               +               1294.60
1349.66              –                     –               1349.64
1543.84              –                     –               1543.79
1689.89              1689.83               +               1689.85
1784.91              1784.89               +               1784.90
1800.90              1800.88               +               –
1816.89              1816.86               +               –
2052.11              2052.08               +               2052.09
2066.17              2066.14               +               –
2180.21              2180.18               +               2180.19
2281.10              –                     –               2281.05

Those peptides that match the theoretical peptide masses for this protein are also indicated.

Table 2. The number of matched peptide masses of homologous and recombinant proteins as compared to the theoretical digestion (complete digest) of the corresponding protein

Protein               Database hits,         Database hits,         Identical m/z
                      homologous proteins    recombinant proteins
Pyruvate kinase       12                     10                     11
GTP binding protein   12                      9                     14
Aconitate hydratase   13                     10                      5
Tubulin α-1 chain     11                      5                      4
Tubulin β-3 chain     10                     11                      2

Additionally, the numbers of identical peptide masses found in the corresponding homologous and recombinant proteins are shown.

3.2 Size distribution of the detected peptide masses

Figures 2A and B show a comparison of the mass distribution of the detected peptides from the homologous and recombinant protein digests. In all cases, the peptide maps of the recombinant proteins included larger peptides (up to 4500 Da) than those of the corresponding homologous proteins. This was expected, because larger cleavage products are known to be recovered less efficiently from the gel matrix used for protein separation. Our results suggest that, for identification of 2-DE separated proteins in an MPI database containing peptide mass maps of recombinant proteins, the most prominent peptide molecular mass range lies between 600 Da and 3000 Da.

Figure 2. Distribution of the peptide molecular masses found in the tryptic digests of (A) the homologous proteins versus (B) the recombinant proteins analyzed.
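One way to apply the 600–3000 Da range suggested in this section is to restrict a recombinant peak list to that window before comparing it with a gel-derived map. The sketch below is an assumption about how such a filter might look, not a step reported in the study. It treats the singly charged m/z values as masses in Da, and the two out-of-window peaks in the example are invented for illustration.

```python
def restrict_to_window(masses, low_da=600.0, high_da=3000.0):
    """Keep only peptide masses inside the closed interval [low_da, high_da]."""
    return [m for m in masses if low_da <= m <= high_da]


if __name__ == "__main__":
    # Three values from Table 1 plus two invented out-of-window peaks
    # (512.30 and 4480.20) that stand in for a small fragment and a large
    # peptide seen only in a recombinant digest.
    peaks = [512.30, 758.43, 1689.89, 2281.10, 4480.20]
    print(restrict_to_window(peaks))
```

The bounds are keyword arguments so that a different window can be tried without changing the function.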


3.3 Influence of expression by different hosts on the MPIs

The generation of a database containing MPIs may use proteins expressed in different expression systems. Therefore, it is important to analyze whether expression by different hosts influences the recorded peptide mass spectra. Since cDNA expression libraries are mainly generated in E. coli [20] and yeast expression libraries have been described only recently [22, 28], we tested E. coli and the yeast P. pastoris as reference expression hosts. Human GAPDH (NCBI Acc. No. CAA25833) was expressed in both hosts using the dual expression vector suitable for expression in P. pastoris and E. coli [22]. A total of 50 monoisotopic peptide masses were assigned for GAPDH expressed in E. coli and 56 for GAPDH expressed in P. pastoris; 22 of these matched within the experimental error (Fig. 3). Compared with the 33 peptide masses expected for complete digestion, 12 (E. coli) and 14 (P. pastoris) hits were obtained (Table 3). This indicates that MPIs can be determined regardless of the expression host, offering the possibility to use different expression systems and libraries.

Figure 3. Comparison of the tryptic mass spectra obtained from recombinant human GAPDH expressed in two different expression hosts. (A) Peptide mass map of GAPDH expressed in P. pastoris. (B) Peptide mass map of GAPDH expressed in E. coli.

Table 3. Monoisotopic molecular masses of peptide ions detected in the peptide maps of the recombinant GAPDH of P. pastoris and E. coli (Fig. 3)

m/z recombinant GAPDH   m/z recombinant GAPDH   m/z identical   m/z theoretical
P. pastoris             E. coli
 794.40                  794.41                 +                794.41
 804.41                  804.42                 +                804.42
 869.48                  869.50                 +                869.49
 879.62                  879.62                 +               –
 908.47                 –                       –                908.48
1014.67                 1014.69                 +               –
1031.57                 1031.59                 +               1031.59
1064.55                 1064.58                 +               1064.58
1117.60                 1117.63                 +               –
1200.57                 1200.61                 +               1200.60
1241.59                 1241.62                 +               –
1370.64                 1370.68                 +               –
1410.73                 1410.79                 +               1410.78
1448.60                 1448.65                 +               –
1472.72                 1472.78                 +               1472.77
1541.51                 1541.57                 +               –
1612.85                 1612.91                 +               1612.89
1645.85                 1645.91                 +               1645.89
1738.90                 1738.95                 +               1738.93
1754.46                 1754.52                 +               –
1762.76                 1762.82                 +               1762.80
2115.72                 2115.78                 +               –
2160.48                 2160.55                 +               –
2594.29                 –                       –               2594.35

Peptides that are identical from both sources, and those that match the theoretical peptides within experimental error, are also indicated.

In conclusion, we have demonstrated the feasibility of generating MPIs from proteins isolated by 2-DE and from purified recombinant proteins, and we have shown how these MPIs may be used to identify and compare proteins from multiple sources.
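As a rough illustration of how stored MPIs might be used for identification, the sketch below ranks the entries of a small MPI collection by the number of query peaks they share with a gel-derived peptide map. The dictionary layout, function names and scoring by simple peak count are assumptions made for this example and do not describe the software used in the study. The peak lists are the recombinant GTP (Table 1) and E. coli GAPDH (Table 3) values, and the query is the homologous GTP map from Table 1.

```python
def shared_peaks(query, entry, tol_ppm=100.0):
    """Count query masses that have at least one entry mass within tol_ppm."""
    return sum(
        any(abs(q - m) <= q * tol_ppm * 1e-6 for m in entry)
        for q in query
    )


def rank_entries(query, database, tol_ppm=100.0):
    """Return (name, shared peak count) pairs, best match first."""
    scores = {name: shared_peaks(query, masses, tol_ppm)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-entry MPI collection built from Tables 1 and 3.
MPI_DATABASE = {
    "GTP binding protein (recombinant)": [
        758.41, 922.45, 938.44, 954.44, 1214.59, 1244.64, 1294.59,
        1689.83, 1784.89, 1800.88, 1816.86, 2052.08, 2066.14, 2180.18,
    ],
    "GAPDH (recombinant, E. coli)": [
        794.41, 804.42, 869.50, 879.62, 1014.69, 1031.59, 1064.58,
        1117.63, 1200.61, 1241.62, 1370.68, 1410.79, 1448.65, 1472.78,
        1541.57, 1612.91, 1645.91, 1738.95, 1754.52, 1762.82, 2115.78,
        2160.55,
    ],
}

# Query: peptide map of the homologous GTP binding protein (Table 1).
QUERY = [
    758.43, 922.46, 938.45, 954.45, 1214.62, 1244.67, 1294.62, 1349.66,
    1543.84, 1689.89, 1784.91, 1800.90, 1816.89, 2052.11, 2066.17,
    2180.21, 2281.10,
]

if __name__ == "__main__":
    for name, score in rank_entries(QUERY, MPI_DATABASE):
        print(score, name)
```

A plain peak count is the simplest possible score. A larger collection would likely need a measure that accounts for the number of peaks per entry and the chance of random matches.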

We thank K. D. Kloeppel for assistance with the mass spectrometric instrumentation. This work was funded by the German Ministry for Education and Research (BMBF Grant 0311870, DHGP Grant 01KW9914/7) and the Max-Planck-Society. D. J. C. acknowledges support of the Health Education Authority, Dublin 2, Ireland. This work was done in partial fulfillment of the requirements for the PhD of V. Egelhofer, Free University, Berlin, Germany.

Received July 2, 2001

4 References

[1] Blackstock, W. P., Weir, M. P., Trends Biotechnol. 1999, 17, 121–127.

[2] Lee, K. H., Trends Biotechnol. 2001, 19, 217–222.

[3] Anderson, N. L., Matheson, A. D., Steiner, S., Curr. Opin. Biotechnol. 2000, 11, 408–412.

[4] Harrison, S. M., Dunwoodie, S. L., Arkell, R. M., Lehrach, H., Beddington, R. S. P., Development 1995, N8, 2479–2489.

[5] Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O., Davis, R. W., Proc. Natl. Acad. Sci. USA 1996, 93, 10614–10619.

[6] Perret, E., Ferran, E. A., Marinx, O., Liauzun, P., Dumont, X., Fournier, J., Kaghad, M., Ferrara, P., Caput, D., Gene 1998, 208, 103–115.

[7] Schena, M., Heller, R. A., Theriault, T. P., Konrad, K., Lachenmeier, E., Davis, R. W., Trends Biotechnol. 1998, 7, 301–306.

[8] Yates III, J. R., Speicher, S., Griffin, P., Hunkapiller, T., Anal. Biochem. 1993, 214, 397–408.

[9] Pappin, D. J. C., Hojrup, P., Bleasby, A. J., Curr. Biol. 1993, 3, 327–332.

[10] Mann, M., Hojrup, P., Roepstorff, P., Biol. Mass Spectrom. 1993, 22, 338–345.

[11] James, P., Quadroni, M., Carafoli, E., Gonnet, G., Biochem. Biophys. Res. Commun. 1993, 1, 58–64.

[12] Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., Watanabe, C., Proc. Natl. Acad. Sci. USA 1993, 90, 5011–5015.

[13] Mann, M., Wilm, M., Anal. Chem. 1994, 66, 4390–4399.

[14] Wilm, M., Shevchenko, A., Houthaeve, T., Breit, S., Schweigerer, L., Fotsis, T., Mann, M., Nature 1996, 379, 466–469.

[15] Shevchenko, A., Wilm, M., Vorm, O., Mann, M., Anal. Chem. 1996, 68, 850–858.

[16] Yates, J. R., Eng, J. K., McCormack, A. L., Anal. Chem. 1995, 67, 3202–3210.

[17] Yates, J. R., Morgan, S. F., Gatlin, C. L., Griffin, P. R., Eng, J. K., Anal. Chem. 1998, 70, 3557–3565.

[18] Fenyö, D., Curr. Opin. Biotechnol. 2000, 11, 391–395.

[19] Cahill, D. J., Nordhoff, E., O’Brian, J., Klose, J., Eickhoff, H., Lehrach, H., Proteomics: From Protein Sequence to Function, BIOS Scientific Publishers, Oxford 2000, pp. 1–20.

[20] Büssow, K., Cahill, D. J., Nietfeld, W., Bancroft, D., Scherzinger, E., Lehrach, H., Walter, G., Nucleic Acids Res. 1998, 26, 5007–5008.

[21] Cahill, D. J., J. Immunol. Methods 2001, 250, 81–91.

[22] Lueking, A., Holz, C., Gotthold, C., Lehrach, H., Cahill, D. J., Prot. Expr. Purif. 2000, 20, 372–378.

[23] Büssow, K., Nordhoff, E., Lübbert, C., Lehrach, H., Walter, G., Genomics 2000, 65, 1–8.

[24] Egelhofer, V., Büssow, K., Luebbert, C., Lehrach, H., Nordhoff, E., Anal. Chem. 2000, 72, 2741–2750.

[25] Gobom, J., Schuerenberg, M., Mueller, M., Theiss, D., Lehrach, H., Nordhoff, E., Anal. Chem. 2000, ASAP Article, Web Release Date: December 28.

[26] Klose, J., Kobalz, U., Electrophoresis 1995, 16, 1034–1059.

[27] Nordhoff, E., Egelhofer, V., Giavalisco, P., Eickhoff, H., Horn, M., Przewieslik, T., Theiss, D., Schneider, U., Lehrach, H., Gobom, J., Electrophoresis 2001, 22, 2844–2855.

[28] Holz, C., Lueking, A., Bovekamp, L., Gutjahr, C., Bolotina, N., Lehrach, H., Cahill, D. J., Genome Res. 2001, 11, 1730–1735.