

REGULAR ARTICLE

Quantitative analysis of protein expression using amine-specific isobaric tags in Escherichia coli cells expressing rhsA elements

Kunal Aggarwal, Leila H. Choe and Kelvin H. Lee

School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA

We describe the use of amine-specific isobaric tags for protein expression quantification to study the effect of rhsA element over-expression in Escherichia coli. The use of an isobaric tagging strategy facilitates a shotgun approach to proteomic analysis and enables quantitation of up to four samples in parallel, based on the reporter ion series using tandem mass spectrometry (MS/MS). Using a liquid chromatography matrix-assisted laser desorption/ionization approach, 23 139 MS/MS spectra were collected. A total of 5063 peptides derived from 780 proteins were quantified, including several lower abundance proteins such as transcription factors, DnaB, and DnaG. More than 65% of the proteins had at least two high confidence peptide matches per protein (p < 0.05). Further, a statistical test based on the Grubbs' and Rosner's tests was able to discriminate outlier data. The removal of outlier data had no significant effect on the functional categories of proteins that were represented in the study.

Received: October 30, 2004
Accepted: December 28, 2004

Keywords:

Isobaric tag / iTRAQ / Protein quantitation / rhsA / Shotgun proteomics / Tandem mass spectrometry

Proteomics 2005, 5, 2297–2308 2297

1 Introduction

RhsA is one of five homologous rhs elements that encode hydrophilic proteins with repetitive sequence elements and divergent C-termini [1, 2]. Although the function of rhsA is not known, it is suspected to have a role in transposon-related activities [3]. Over-production of an rhsA fragment, ORF-ex, in E. coli cells has been observed to cause decreased cellular survival in the stationary phase [4]. However, it has also been observed that this effect is suppressed by the over-expression of a second rhsA element, dsORF-a1, which lies directly next to ORF-ex on the E. coli genome [4]. In this study, the expression of these elements was differentially activated to three levels using an external inducer. The changes in protein expression profiles that resulted from the differential expression of these fragments were then quantified and compared.

There are several approaches available to quantify changes in protein expression. 2-DE separation of protein mixtures followed by fluorescent staining, image analysis, and identification of proteins using MS is one of the most commonly used approaches in quantitative proteomics [5, 6]. Although 2-DE has been used effectively in a variety of contexts, there are some ongoing concerns with this approach, including questions about quantitative reproducibility and limitations on the ability to study certain classes of proteins such as hydrophobic proteins. As an alternative to a gel-based strategy, one can pursue shotgun proteomics methods that rely on multidimensional LC to resolve complex protein mixtures. Such approaches are particularly useful when combined with stable isotope tagging techniques to quantify protein expression [7–10]. However, among the challenges associated with these techniques are the inability to multiplex, limitations in quantifying zero expression levels because of the need to observe mass shifts, and other method-specific issues. To study rhsA expression, we employ a newly developed isobaric tagging approach that permits simultaneous identification and quantification of peptides using MS/MS [11].

Correspondence: Professor Kelvin H. Lee, 120 Olin Hall, School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853-5201, USA
E-mail: [email protected]
Fax: +1-607-255-9166

© 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.de

DOI 10.1002/pmic.200401231

1.1 Protein quantification based on reporter ions using tandem mass spectrometry

There are two approaches that have been reported for the relative quantification of peptides using MS/MS. Tandem mass tags (TMTs) [12], reported by Proteome Sciences (Cobham, Surrey, UK), have a peptide reactive functionality and consist of two major groups: a “mass differentiated” group and a “mass normalization” group. The mass differentiated group differs in mass for different TMTs, and the mass normalization group ensures that each tag has the same overall mass. As with the isotopic dilution strategies mentioned earlier, the peptides from samples of interest are differentially labeled with TMTs. Because different TMTs are isobaric, pairs of labeled peptides from different samples appear as a single peak in an MS spectrum. Upon CID, the TMT labeled peptides give rise to TMT fragment ions containing only the “mass differentiated” group (e.g., 287 and 290 m/z) and fragmented peptide ions (i.e., b- and y-ion series), which are used for identification. The ratios of the intensity of the TMT fragment ions reflect the relative abundance of the peptides from which they are derived. The reaction specificity of these tags can be customized to link them to different functional groups on proteins, and more than two tags can be generated, permitting multiplexing. As in the method described below, this approach allows the detection of the absence of a protein in one of the samples, a feature not available in the present dual labeling strategies for quantitative proteomic analysis.

An alternate approach that can also rely on a shotgun strategy has been reported by Applied Biosystems (Framingham, MA, USA) and is termed iTRAQ. In this approach, four isobaric reagents are used to label peptides, thus enabling quantitation in four different biological samples [11]. Proteins from samples of interest are reduced, alkylated, and digested independently prior to labeling. The reagents consist of a reporter group, a balance group, and a peptide reactive group (PRG) that labels primary amines. The labeled peptides from each sample are then mixed, separated, and studied by MS and MS/MS. As in the TMT approach, the mass spectrum of the mixture resembles the spectrum of an individual sample because of the isobaric nature of the reagents. Upon CID, the peptide-linked tag can fragment, resulting in the neutral loss of the balance group, release of a reporter ion at 114.1, 115.1, 116.1, or 117.1 m/z, and b- and y-ion series (as well as higher energy and other peptide fragments). Because the 114–117 m/z window in MS/MS is relatively quiet, the reporter ions are clearly distinguished. The peak areas of these reporter ions can be used to calculate the relative abundance of a given peptide, and the peptide fragments can be used for protein identification. In this paper, we report on the use of this second MS/MS-based protein quantitation strategy for the study of rhsA expression in E. coli. In particular, we describe the application of this technique and observations on different approaches to analyze the resulting data. Figure 1 illustrates the experimental workflow used in this study.

2 Methods

2.1 Strains and vectors

Wild-type E. coli W3110 cells transformed with plasmids pklr1 and pKQV4 (designated as W3110 pklr1 cells) were used for this study. Plasmid pklr1 carries the rhsA elements ORF-ex and dsORF-a1 under the control of an isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also contains the gene for kanamycin (Km) resistance. This plasmid was derived from plasmid pkFis4 [13] by substituting the fis gene with rhsA elements. Plasmid pKQV4 encodes lacIq and resistance to ampicillin (Amp) [13].

2.2 Culture conditions and sample collection

Three parallel cultures of W3110 pklr1 cells were grown in 250 mL flasks containing 50 mL of yeast tryptone (YT) media (10 g bactotryptone, 5 g bactoyeast extract, 2.5 g NaCl in 1 L water) supplemented with 40 µg·mL⁻¹ Km and 80 µg·mL⁻¹ Amp. The cultures were grown to OD600 = 0.4 and were induced with three different concentrations of IPTG (0, 0.1, and 1 mM IPTG). The time at which IPTG was added to the cultures was designated as 0 h. Four 10 mL samples were collected for protein extraction from these cultures: one in exponential growth phase (0 mM IPTG, 0 h; control state) and three in early stationary phase (0 mM IPTG, 3.5 h; 0.1 mM IPTG, 3.5 h; and 1 mM IPTG, 3.5 h), as depicted in Fig. 2. Cells were harvested from the collected samples by centrifugation at 4 °C and stored at −80 °C until further use.

2.3 Sample preparation and labeling

Pelleted cells were washed four times with 800 µL of a solution containing 3 mM KCl, 1.5 mM KH2PO4, 68 mM NaCl, and 9 mM NaH2PO4. Cells were subsequently resuspended in 150 µL Dissolution Buffer (iTRAQ reagent kit, Applied Biosystems) with the addition of 0.1% w/v CHAPS and 0.05% w/v SDS. Cell suspensions were subjected to sonication in a 550 Sonic Dismembrator (Fisher Scientific, Pittsburgh, PA, USA) equipped with a cup horn for 2 min on ice. One hundred micrograms of protein from each sample, determined by a Lowry protein concentration assay, were reduced, alkylated, digested, and labeled with the isobaric reagents according to the protocol given in the kit, with the following modification: each sample was treated with two vials of the same label plus an additional 5 µL Dissolution Buffer. The four samples were labeled as shown in Fig. 2. 114: 0 mM IPTG, 0 h; 115: 0 mM IPTG, 3.5 h; 116: 0.1 mM IPTG, 3.5 h; 117: 1.0 mM IPTG, 3.5 h.

Figure 1. Experimental workflow. The proteins from control cells as well as cells treated with 0, 0.1, and 1 mM IPTG were reduced/alkylated, digested, and labeled with reagents 114, 115, 116, and 117 in parallel. All four reagents have a PRG to label primary amines as well as a balance arm (28, 29, 30, 31). Labeled peptides were mixed and separated into 12 different fractions by SCX. Each SCX fraction was further separated using RP-HPLC and spotted to MALDI target plates. MS analysis was performed using MALDI tandem TOF.
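The labeling design can be written down as a small lookup, which also makes the isobaric property easy to check. This is a minimal sketch: the pairing of reporter 114 with balance 31 (down to 117 with 28) is an assumption inferred from the requirement that all four tags share one nominal mass, since the Figure 1 caption lists the balance masses only as a set. The names `REPORTER_TO_BALANCE`, `LABEL_TO_SAMPLE`, and `nominal_tag_masses` are illustrative.

```python
# Nominal reporter masses paired with balance-arm masses. The pairing is an
# assumption: it is chosen so that reporter + balance is constant, which is
# what makes the four tags isobaric.
REPORTER_TO_BALANCE = {114: 31, 115: 30, 116: 29, 117: 28}

# Sample assignment as stated in Section 2.3: (IPTG concentration in mM, time in h).
LABEL_TO_SAMPLE = {
    114: (0.0, 0.0),   # uninduced control, exponential phase
    115: (0.0, 3.5),   # uninduced, early stationary phase
    116: (0.1, 3.5),
    117: (1.0, 3.5),
}


def nominal_tag_masses(pairs):
    """Return the set of nominal reporter + balance masses."""
    return {reporter + balance for reporter, balance in pairs.items()}


tag_masses = nominal_tag_masses(REPORTER_TO_BALANCE)
print(tag_masses)  # a single value means all four tags are isobaric
```

Under this pairing every tag has a nominal reporter-plus-balance mass of 145, which matches the 145 Da tag mass quoted in Section 3.1.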


Figure 2. Phenotypic response of W3110 pklr1 cells to the addition of IPTG to the growth medium. Different amounts of IPTG (at final concentrations of 0, 0.1, and 1 mM) were added to three parallel cultures of W3110 pklr1 cells at OD600 = 0.4. The points at which different samples were drawn for this study are indicated.

2.4 Strong cation-exchange fractionation

Labeled samples were pooled and subjected to strong cation-exchange (SCX) fractionation on an Agilent 1100 HPLC (Agilent Technologies, Inc., Palo Alto, CA, USA) using a PolyLC Polysulfoethyl A column (4.6 mm × 100 mm). Twelve fractions were collected at 1 mL·min⁻¹ with 10 mM KH2PO4, 25% ACN (A), and 500 mM KCl, 10 mM KH2PO4, 25% ACN (B) over the following gradient: 0–10% B in 2 min, 10–20% B in 4 min, 20–45% B in 2 min, and 45–100% B in 5 min. Collected fractions were dried using a vacuum centrifuge.
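The SCX gradient above can be expressed as a piecewise-linear function of time, which is convenient for estimating the salt concentration at which a given fraction elutes. The sketch below assumes that the four segments run back to back starting at 0 min and that each ramp is linear; neither detail is stated explicitly in the text. The function names are illustrative.

```python
# SCX gradient breakpoints from Section 2.4 as (time in min, % B).
# Assumption: segments are consecutive from t = 0 and each ramp is linear.
SCX_GRADIENT = [(0.0, 0.0), (2.0, 10.0), (6.0, 20.0), (8.0, 45.0), (13.0, 100.0)]


def percent_b(t_min, gradient=SCX_GRADIENT):
    """Linearly interpolate % B at time t_min."""
    if t_min <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]  # hold the final composition after the last breakpoint


def kcl_mm(t_min):
    """KCl concentration in mM, given that B contains 500 mM KCl and A contains none."""
    return 500.0 * percent_b(t_min) / 100.0


for t in (0, 2, 6, 8, 13):
    print(t, percent_b(t), kcl_mm(t))
```

With these breakpoints the whole gradient spans 13 min, so at 1 mL·min⁻¹ the twelve fractions would each cover roughly one minute if they were collected at even intervals, which is itself an assumption.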

2.5 Reversed-phase high performance liquid chromatography separation

SCX fractions were reconstituted in 100 µL 2% ACN, 0.1% TFA and were subjected to RP-HPLC separation on an UltiMate HPLC with Famos Micro Autosampler and Switchos Micro Column Switching Module (Dionex-LC Packings, Sunnyvale, CA, USA). Thirty-three microliters of each sample were auto-injected and eluted through a C18 PepMap100 trapping column (0.3 mm × 5 mm) and a C18 PepMap100 resolving column (0.1 mm × 150 mm) (LC Packings) at a flow rate of 600 nL·min⁻¹. Separation occurred with 2% ACN, 0.1% TFA (A) and 85% ACN, 5% IPA, and 0.1% TFA (B) using the following gradient: 12–41% B in 48 min, 41–95% B in 2 min, and 95% B constant for 2 min. Eluent was monitored at 214 nm and mixed with matrix (7.5 mg·mL⁻¹ α-cyano-4-hydroxycinnamic acid in 75% ACN/0.1% TFA, 0.15 mg·mL⁻¹ dibasic ammonium citrate, 0.25 fmol·µL⁻¹ internal calibrant ACTH) in a 1:2 ratio via a micro-tee fitting and spotted at 10 s intervals with a Probot Micro Fraction Collector (Dionex-LC Packings) onto blank Opti-TOF MALDI target plates (Applied Biosystems).

2.6 Mass spectrometry

MALDI-MS was performed on a 4700 Proteomics Analyzer with version 2.0 software (Applied Biosystems). PMF spectra, acquired in positive ion reflector mode with 1200 laser shots per spot and processed with internal calibration and without cluster area optimization, were submitted for automatic precursor selection. Using a 500 ppm spot-to-spot precursor exclusion, a maximum of 15 of the most intense precursors per spot, each with a minimum S/N ratio of 25, were selected for MS/MS analysis. MS/MS spectra were acquired with 2500 laser shots and 1 keV collision energy, with air at 1 × 10⁻⁶ torr as the collision gas.
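The precursor selection rule (at most 15 precursors per spot, minimum S/N of 25, most intense first, 500 ppm exclusion) can be sketched as follows. This is a simplified illustration and not the instrument software's algorithm, which the text does not describe: here a peak is simply skipped when it lies within 500 ppm of an m/z already chosen on another spot. The tuple layout and all names are assumptions.

```python
def ppm_difference(mz_a, mz_b):
    """Absolute difference between two m/z values, in parts per million of mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def select_precursors(peaks, already_selected=(), max_precursors=15,
                      min_sn=25.0, exclusion_ppm=500.0):
    """Pick precursors for MS/MS from one spot.

    peaks: list of (mz, intensity, sn) tuples for the spot.
    already_selected: m/z values chosen on other spots (simplified exclusion).
    """
    candidates = [
        p for p in peaks
        if p[2] >= min_sn
        and all(ppm_difference(p[0], mz) > exclusion_ppm for mz in already_selected)
    ]
    candidates.sort(key=lambda p: p[1], reverse=True)  # most intense first
    return candidates[:max_precursors]


# Made-up peak list for one spot: (m/z, intensity, S/N).
spot = [
    (1216.55, 5000.0, 80.0),
    (1343.74, 3000.0, 40.0),   # within 500 ppm of a precursor picked elsewhere
    (1690.98, 9000.0, 20.0),   # S/N below the cutoff
    (1216.60, 100.0, 30.0),
]
chosen = select_precursors(spot, already_selected=[1343.70])
print([mz for mz, _, _ in chosen])
```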

GPS Explorer version 2.0 (Applied Biosystems) was used to process MS/MS data, set search parameters, submit searches to MASCOT version 1.9 [14] (Matrix Science, London, UK), and organize results. MS/MS peaks were filtered to exclude reporter ions and peptides with S/N ratio < 10 and subsequently submitted for searches with semi-trypsin specificity and the following criteria: 50 ppm peptide tolerance, 0.3 Da MS/MS peptide tolerance, two maximum missed cleavages, variable methionine oxidation, and two fixed and three variable modifications associated with the labeling chemistry. Spectral data were searched against a database of the translations of all genome-coding sequences from the E. coli K-12 genome (available at: ftp.ncbi.nih.gov), which was modified to include trypsin and several human keratin sequences. Data from the 12 target plates were submitted for individual searches reporting the top 500 hits, as well as one combined search reporting the top 1500 hits. Qualification of matched peptides for further data analysis was determined by GPS Explorer’s Confidence Interval (CI), a calculation based on the MASCOT ion score with the significance threshold removed to normalize results across different databases. Only peptides that matched with a CI > 95% were used for protein identification and quantification.
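To give a feel for the search tolerances, the short sketch below converts the 50 ppm peptide tolerance into an absolute window at a precursor of 1216.55 m/z (the AldA peptide discussed in Section 3.1) and applies the CI > 95% qualification rule. The helper names are illustrative, and the strict inequality is taken from the wording "CI > 95%".

```python
def ppm_window(mz, ppm):
    """Half-width in Da of a +/- ppm tolerance window at the given m/z."""
    return mz * ppm / 1e6


def qualifies(ci_percent, threshold=95.0):
    """True when a peptide match exceeds the CI threshold (strictly greater)."""
    return ci_percent > threshold


# 50 ppm peptide tolerance evaluated at the precursor m/z quoted in Section 3.1.
half_width = ppm_window(1216.55, 50.0)
print(half_width)            # compare with the fixed 0.3 Da MS/MS tolerance
print(qualifies(99.85))      # DnaG best ion score CI from Table 4
```

At this mass the precursor window is several times narrower than the fixed 0.3 Da fragment tolerance.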

2.7 Data analysis

Data collected for the uninduced W3110 pklr1 cells harvested at 0 h (114 labeled sample) were used to normalize data from the other three samples, which were collected at 3.5 h (115, 116, and 117 labeled samples). The relative amount of a peptide in each sample was calculated by dividing the peak areas observed at 115.1, 116.1, and 117.1 m/z by that observed at 114.1 m/z. The calculated peak area ratios were corrected for overlapping isotopic contributions per manufacturer’s instructions and were used to estimate the relative abundances of a particular peptide. For proteins with two or more qualified peptide matches, three average peak area ratios (μ), designated as 115/114, 116/114, and 117/114, were calculated using the peak area ratios of the peptides originating from the same protein. The standard deviation (σ), the CV for each average peak area ratio, and the average CV (the mean of the CVs for the three average peak area ratios) for each protein were also calculated. Furthermore, for proteins with more than four qualified peptide matches, we tested three different approaches to detect outlier peak area ratios. In the first approach, a Grubbs' test (using 5% significance level critical values) was used to detect outlier ratios for proteins with fewer than 25 peptide matches [15] and a Rosner’s test (using 5% significance level critical values) was used to detect outlier ratios for proteins with 25 or more peptide matches [16, 17]. In the second approach, designated “3 sigma”, peak area ratios less than (μ − 3σ) or greater than (μ + 3σ) were termed “outliers”. In the third approach, designated “% deviation”, the percentage deviation of each peak area ratio from the average value was calculated as follows:

$$\%\ \text{deviation} = \frac{\omega - \mu}{\mu} \times 100$$

where ω is the peptide peak area ratio and μ is the average peak area ratio. In this case, a threshold of 50% was chosen to identify the outlier peak area ratios. For each of these approaches, the maximum number of outliers detected for a protein in each peak area ratio (115/114, 116/114, or 117/114) was restricted to 20% of the number of qualified peptides matched to a particular protein. The average peak area ratios and the corresponding standard deviations, CVs, and average CV were recalculated for each protein after exclusion of outlier ratios. Proteins with only one high confidence (p < 0.05) peptide match were quantified using the single relative abundance value of that peptide, without further calculations.
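The ratio and CV calculations described in this section can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it omits the manufacturer's isotopic correction, it uses the sample standard deviation (the text does not say which estimator was used), and the peak areas are made up. The function and key names are hypothetical.

```python
from statistics import mean, stdev

RATIO_KEYS = ("115/114", "116/114", "117/114")


def peptide_ratios(areas):
    """Divide the 115, 116, and 117 reporter peak areas by the 114 peak area."""
    reference = areas[114]
    return {f"{label}/114": areas[label] / reference for label in (115, 116, 117)}


def protein_summary(peptide_areas):
    """Mean, SD, and CV of each ratio across a protein's peptides, plus the average CV."""
    per_peptide = [peptide_ratios(a) for a in peptide_areas]
    summary = {}
    for key in RATIO_KEYS:
        values = [r[key] for r in per_peptide]
        mu = mean(values)
        sigma = stdev(values)          # sample SD; needs at least two peptides
        summary[key] = {"mean": mu, "sd": sigma, "cv": sigma / mu}
    summary["average_cv"] = mean(summary[key]["cv"] for key in RATIO_KEYS)
    return summary


# Made-up peak areas for two peptides of one hypothetical protein.
example = [
    {114: 100.0, 115: 200.0, 116: 50.0, 117: 100.0},
    {114: 100.0, 115: 220.0, 116: 50.0, 117: 100.0},
]
summary = protein_summary(example)
print(summary["115/114"], summary["average_cv"])
```

Because the CV divides the standard deviation by the mean ratio, it is unitless and can be averaged across the three ratios, which is how the per-protein average CV used in Table 1 is obtained.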

3 Results

3.1 Mass spectrometry and tandem mass spectrometry data analysis

Figure 3a shows an MS spectrum containing the precursor ion for the semi-tryptic peptide ENFEAMQGF (1216.55 Da) originating from the protein aldehyde dehydrogenase A, “AldA” (residues 440–448). A possible issue in this quantitation strategy is that mixing four differentially labeled samples may lead to additional complexity in MS and in MS/MS spectra. We observed no apparent additional complexity in any of the MS spectra examined. As expected, the labeling of peptides with one or more 145 Da tags causes a shift in the observed masses of the labeled peptides relative to unlabeled peptides. The theoretical mass for the ENFEAMQGF peptide is 1071.43 Da (calculated using monoisotopic masses of amino acid residues with no modifications). Figure 3b shows the MS/MS spectrum for precursor 1216.55 m/z. There are clearly resolved peaks corresponding to the reporter ions at 114.1, 115.1, 116.1, and 117.1 m/z, and we do not observe any apparent loss of resolution or additional complexity in the MS/MS spectra. Figures 3c and 3d depict the reporter ion region from two other peptides from AldA that demonstrate similar relative abundances.
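The theoretical mass quoted for ENFEAMQGF can be reproduced from standard monoisotopic residue masses, as sketched below. The residue and water masses are standard reference values rather than numbers taken from this paper. Adding the nominal 145 Da tag mass stated in the text lands within about 0.1 of the observed 1216.55 m/z, which is the agreement one would expect from a rounded tag mass; the peptide contains no lysine, so a single tag on the N-terminal amine is assumed here.

```python
# Standard monoisotopic residue masses (Da) for the amino acids in ENFEAMQGF.
RESIDUE_MASS = {
    "A": 71.03711, "E": 129.04259, "F": 147.06841, "G": 57.02146,
    "M": 131.04049, "N": 114.04293, "Q": 128.05858,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def monoisotopic_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


unlabeled = monoisotopic_mass("ENFEAMQGF")
labeled_nominal = unlabeled + 145  # nominal single-tag shift stated in the text

print(round(unlabeled, 2))  # 1071.43
print(abs(labeled_nominal - 1216.55))
```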

In total, 23 139 MS/MS spectra were collected. These data were queried against an E. coli database for identification. A total of 5063 peptides matched with high MASCOT ion scores (p < 0.05), corresponding to CI > 95% in GPS Explorer. These 5063 peptides corresponded to 780 unique proteins, based on the criterion that a protein was identified if the best MS/MS ion score had CI > 95%. We observed that a subset of 4553 peptides matched with higher MASCOT ion scores (p < 0.01), corresponding to CI > 99% in GPS Explorer, and these corresponded to 738 unique proteins.

Figure 3. (a) MS spectrum of a particular fraction containing semi-tryptic peptide ENFEAMQGF (1216.55 Da) originating from the protein aldehyde dehydrogenase A (AldA). The inset shows the well-resolved peak at 1216.55 m/z with no apparent increase in complexity due to the presence of differently labeled peptides from multiple samples. (b) MS/MS spectrum of the peptide 1216.55 m/z with an inset that shows the reporter ion region. (c) Reporter ion region of peptide GYYYPPTLLL (1343.74 m/z) from AldA. (d) Reporter ion region of peptide ASEISALIVEEGGK (1690.98 m/z) from AldA. The reporter ion regions of all three peptides show similar reporter ion intensity patterns.

Figure 4. Pie chart of the percentage of proteins identified with a given number of peptide matches. The data presented are for the proteins identified using peptides that matched with an individual ion score CI > 95%. The relative ratios are unchanged when a CI > 99% criterion is used.

More than 65% of the proteins identified in this study (527 out of 780) had at least two peptide matches (CI > 95%) per protein. Figure 4 shows the fraction of peptide matches per protein for proteins that had a CI > 95%. We observed that these relative ratios are unchanged if a CI > 99% criterion is used. As one may expect, there is a reasonably high degree of consistency in reporter ion profiles among the many MS/MS spectra derived from a single protein. For example, the relative reporter ion abundances in Figs. 3b–d are similar. However, there are cases in which the observed relative expression levels are not consistent for multiple peptides derived from a single protein. There are a number of technical, biological, and other reasons for disparate measurements from a single protein, and an important consideration in this overall method is the identification of outlier data that may alter or inappropriately skew the average observed expression ratio. Indeed, this issue applies to all approaches that use independent measures of relative abundance to quantify a change in protein expression, such as methods relying on isotopic dilution strategies.

To address this issue, one can calculate a CV for observed proteins and make note of those proteins that demonstrate large CV values. In an ideal situation, if there were no technical or biological reasons for the appearance of outliers, the CV would provide a measure for the reproducibility of this approach. However, in our non-ideal case, we used three different approaches to determine outlier expression ratios. The Grubbs' and Rosner’s tests and the 3 sigma approach are based on statistical criteria to test and remove individual data points. The % deviation method makes an assumption that the technology should be reproducible to a CV = 0.5. To eliminate outliers, the data point that was least consistent with the average observed ratio (furthest from the average) was tested against these criteria. If that data point was designated an outlier and removed, then the next least consistent data point was tested against these criteria. This process was iterated until no outliers remained or until the limit of 20% of the peptide quantifications for a given protein had been removed. We set 20% as a somewhat arbitrary upper bound to ensure that the majority of the data collected was used for protein quantification.
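The iterative removal procedure can be sketched with the two simpler criteria. This is an illustration under stated assumptions, not the authors' code: the Grubbs' and Rosner's tests are deliberately left out because they need tabulated critical values, the % deviation is taken as an absolute value, the mean and standard deviation are recomputed after each removal, and the 20% cap is rounded down. All names and the example ratios are made up.

```python
import math
from statistics import mean, stdev


def pct_deviation_outlier(value, values, threshold=50.0):
    """'% deviation' criterion: |value - mean| / mean * 100 exceeds the threshold."""
    mu = mean(values)
    return abs(value - mu) / mu * 100.0 > threshold


def three_sigma_outlier(value, values):
    """'3 sigma' criterion: value lies more than 3 sample SDs from the mean."""
    return abs(value - mean(values)) > 3.0 * stdev(values)


def remove_outliers(ratios, is_outlier, max_fraction=0.20):
    """Repeatedly test the ratio furthest from the mean; stop at the removal cap."""
    kept, removed = list(ratios), []
    max_removed = math.floor(max_fraction * len(ratios) + 1e-9)
    while len(removed) < max_removed and len(kept) > 2:
        mu = mean(kept)
        candidate = max(kept, key=lambda r: abs(r - mu))  # least consistent point
        if not is_outlier(candidate, kept):
            break
        kept.remove(candidate)
        removed.append(candidate)
    return kept, removed


ratios = [1.0, 1.1, 0.9, 1.05, 0.95, 3.0]  # made-up 115/114 ratios for one protein
kept_pct, removed_pct = remove_outliers(ratios, pct_deviation_outlier)
kept_3s, removed_3s = remove_outliers(ratios, three_sigma_outlier)
print(removed_pct, removed_3s)
```

In this example the % deviation rule removes the ratio of 3.0 while the 3 sigma rule removes nothing. With the sample standard deviation, a single point in a set of n values can be at most (n − 1)/√n standard deviations from the mean, so a 3 sigma rule of this form cannot flag anything until a protein has at least 11 peptide ratios.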

Table 1 lists the number of proteins quantified, where the CV is within a particular cutoff value, both before and after outlier peak area ratios were excluded from the average peak area ratio calculation. The data are presented for proteins that were quantified using average peak area ratios from three or more peptides that matched with either a CI > 95% or the more stringent CI > 99%. The exclusion of outlier peak area ratios results in a significant increase in the number of proteins quantified with a CV < 0.25. For 95% CI, we observe that 36% of the quantified proteins fall within CV < 0.25 when all data are included, but this improves to 46% when Grubbs' and Rosner’s tests are used. Among the three outlier identification strategies, the Grubbs' and Rosner’s tests are the most statistically appropriate and offer the greatest reduction in observed CV. Further, we note that the majority of the data falls within a CV of 0.5. When the dataset is reduced by removing peptides with 0.95 < CI < 0.99, the amount of improvement that the outlier exclusion tests offer is somewhat reduced.

Table 1. The number of proteins quantified within several CV values before and after exclusion of outliers. Values in parentheses are percentages of the total proteins.

| Average CV | No outlier exclusion, CI > 95% | No outlier exclusion, CI > 99% | Grubbs' + Rosner's, CI > 95% | Grubbs' + Rosner's, CI > 99% | 3 sigma, CI > 95% | 3 sigma, CI > 99% | % deviation, CI > 95% | % deviation, CI > 99% |
|---|---|---|---|---|---|---|---|---|
| 0.25 | 154 (36%) | 153 (38%) | 195 (46%) | 186 (47%) | 162 (38%) | 157 (39%) | 181 (43%) | 181 (45%) |
| 0.5 | 396 (94%) | 378 (95%) | 408 (96%) | 384 (96%) | 399 (94%) | 378 (95%) | 410 (97%) | 387 (97%) |
| 0.75 | 415 (98%) | 392 (98%) | 422 (100%) | 396 (100%) | 417 (99%) | 393 (99%) | 421 (100%) | 396 (100%) |
| 1 | 421 (100%) | 396 (100%) | 423 (100%) | 398 (100%) | 422 (100%) | 397 (100%) | 423 (100%) | 398 (100%) |
| Max CV a) | 1.49 | 1.35 | 0.82 | 0.86 | 1.49 | 1.35 | 0.82 | 0.83 |
| Total proteins b) | 423 | 398 | 423 | 398 | 423 | 398 | 423 | 398 |

a) Maximum CV observed under each category
b) Total number of proteins quantified using average peak area ratios of three or more peptides

Table 2. The number of outlier peak area ratios detected using different approaches

| Outlier exclusion using | Grubbs' + Rosner's, CI > 95% | Grubbs' + Rosner's, CI > 99% | 3 sigma, CI > 95% | 3 sigma, CI > 99% | % deviation, CI > 95% | % deviation, CI > 99% |
|---|---|---|---|---|---|---|
| Outlier peak area ratios | 307 | 256 | 82 | 64 | 402 | 311 |
| No. peptides with outliers | 232 | 196 | 55 | 40 | 289 | 220 |
| No. proteins affected by outliers | 150 | 129 | 38 | 28 | 139 | 116 |

Table 3. The percentage breakdown of the identified proteins into different functional categories [18]

| No. | Functional category | Percentage of total proteins |
|---|---|---|
| 1 | Amino acid biosynthesis and metabolism | 5.00 |
| 2 | Biosynthesis of cofactors, prosthetic groups and carriers | 3.85 |
| 3 | Carbon compound catabolism | 3.08 |
| 4 | Cell envelope | 5.51 |
| 5 | Cell processes (incl. adaptation, protection, and transport) | 13.72 |
| 6 | Central intermediary metabolism | 6.41 |
| 7 | Energy metabolism | 8.33 |
| 8 | Fatty acid and phospholipid metabolism | 3.08 |
| 9 | Hypothetical, unclassified, unknown | 13.46 |
| 10 | Nucleotide biosynthesis and metabolism | 4.10 |
| 11 | Other known genes (incl. phage, transposon, and plasmid) | 2.31 |
| 12 | Putative enzymes | 3.46 |
| 13 | Putative regulatory proteins | 0.90 |
| 14 | Putative transport proteins | 1.79 |
| 15 | Regulatory function | 2.05 |
| 16 | Structural proteins | 0.38 |
| 17 | Synthesis, modification, and degradation of macromolecules | 22.56 |

The numbers of outliers detected using the different approaches are listed in Table 2. Because of the presence of the four reporter ions and the ability to generate three ratios for each MS/MS spectrum (i.e., 115/114, 116/114, and 117/114), a total of 15 189 (CI > 95%) or 13 659 (CI > 99%) ratios were directly measured. These measurements correspond to a maximum of 5063 (CI > 95%) or 4553 (CI > 99%) peptides characterized by MS/MS, which are related to 780 (CI > 95%) or 738 (CI > 99%) proteins. The data in Table 2 demonstrate that the Grubbs' and Rosner’s tests were able to identify more outliers than the 3 sigma approach. The less statistically meaningful % deviation method was also able to remove a large number of outliers. A high degree of overlap among the outliers detected by the three different statistical approaches was observed and, in general, more outliers were detected in peptides that matched with CI > 95% than in those that matched with CI > 99%.
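The ratio totals quoted above follow directly from three ratios per quantified peptide, and the Table 2 counts can be put in proportion to them. The short check below uses only numbers from the text and Table 2; the percentages are computed against all measured ratios, including those from proteins that were never subjected to outlier testing, so they are a lower bound on the share of tested ratios that were flagged.

```python
peptides = {"CI > 95%": 5063, "CI > 99%": 4553}
RATIOS_PER_SPECTRUM = 3  # 115/114, 116/114, 117/114

total_ratios = {ci: n * RATIOS_PER_SPECTRUM for ci, n in peptides.items()}

# Outlier peak area ratios flagged by the Grubbs' + Rosner's approach (Table 2).
flagged = {"CI > 95%": 307, "CI > 99%": 256}
flagged_pct = {ci: 100.0 * flagged[ci] / total_ratios[ci] for ci in flagged}

print(total_ratios)
print({ci: round(pct, 2) for ci, pct in flagged_pct.items()})
```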

3.2 Proteome coverage

The proteins identified in this study were categorized into different groups based on their functions. Table 3 shows the relative representation of identified proteins from different functional categories (CI > 95%). As one might expect because of their relative abundance, the majority of proteins identified in this study are involved in synthesis, modification, and degradation of macromolecules, cell processes, and energy metabolism. However, we also identified and quantified lower abundance proteins using this approach. A total of 21 transcription factors (TFs) and some other lower abundance proteins, including DnaB, DnaG, and PolA, were identified. Table 4 lists some of the lower abundance proteins that were studied, their approximate abundance in E. coli cells under normal growth conditions, and sequence coverage. In particular, note that DnaB, which is normally expressed in as few as 20 copies per cell, was matched with three peptides with high MASCOT ion scores (p < 0.05) and 8.5% coverage. Although many of the proteins in this study were quantified with ten or more peptides, none of the lower abundance proteins reported in Table 4 were quantified with more than six independent peptides. We did not perform an absolute quantitation to measure the number of copies per cell for any of these lower abundance proteins, a feature that is available with this technology.
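The percentages in Table 3 can be converted back into approximate protein counts. The sketch below assumes that the percentages are taken over the 780 proteins identified at CI > 95%, as the text indicates, and rounds each product to the nearest integer; the variable names are illustrative.

```python
# Percentage of total proteins per functional category, keyed by the Table 3 number.
TABLE3_PERCENT = {
    1: 5.00, 2: 3.85, 3: 3.08, 4: 5.51, 5: 13.72, 6: 6.41, 7: 8.33, 8: 3.08,
    9: 13.46, 10: 4.10, 11: 2.31, 12: 3.46, 13: 0.90, 14: 1.79, 15: 2.05,
    16: 0.38, 17: 22.56,
}
TOTAL_PROTEINS = 780  # proteins identified at CI > 95%

counts = {cat: round(pct * TOTAL_PROTEINS / 100.0) for cat, pct in TABLE3_PERCENT.items()}

print(counts)
print(sum(counts.values()), round(sum(TABLE3_PERCENT.values()), 2))
```

The rounded counts add up to the full set of 780 proteins, which supports reading the table as a partition in which each protein is assigned to exactly one category.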

Figure 5 shows the functional categorization of the proteins quantified using more than two peptide matches and CV < 0.5, where the labels for functional categories are taken from Table 3. Although the number of proteins quantified with a CV < 0.5 changes with outlier exclusion (396 before and 408 after at CI > 95%; 378 before and 384 after at CI > 99%), there is no significant change in the percentage breakdown of quantified proteins into the various functional categories. That is, performing an outlier exclusion does not appear to skew the represented functional categories. We note that the quantitation of some putative regulatory proteins is lost when proteins with only one or two peptide matches are excluded from the analysis, probably because these are lower abundance proteins.
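The selection rule used for Figure 5 (more than two peptide matches and CV < 0.5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the protein names and ratios are hypothetical, and the use of the sample standard deviation in the CV is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(ratios):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    return stdev(ratios) / mean(ratios)

def passes_filter(ratios, min_peptides=3, max_cv=0.5):
    """True when a protein has more than two peptide ratios and CV below max_cv."""
    if len(ratios) < min_peptides:
        return False
    return coefficient_of_variation(ratios) < max_cv

# Hypothetical peptide-level expression ratios, keyed by protein name.
peptide_ratios = {
    "protA": [1.0, 1.1, 0.9, 1.2],  # consistent ratios, low CV
    "protB": [0.5, 2.5, 1.0],       # widely scattered ratios, high CV
    "protC": [1.3, 1.4],            # only two peptides, excluded
}

kept = [name for name, r in peptide_ratios.items() if passes_filter(r)]
print(kept)
```

Removing an outlying ratio lowers a protein's CV, which is one way the number of proteins passing the filter can rise after outlier exclusion.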

3.3 Phenotypic and proteomic response in RhsA cells

W3110 pklr1 cells show a reduction in growth rate upon induction with IPTG, and this reduction is observed to increase when higher amounts of IPTG are added to the growth medium (Fig. 2). To determine the effect of IPTG on cells, control E. coli W3110 cells containing plasmid pKQV4


Table 4. Protein identification information on various lower abundance proteins

Protein name             Protein abundance        Number of    Best ion score    Sequence
                         (molecules/cell) [19]    peptides     (CI%)             coverage (%)

Transcription factors:
Crp                      3000                     3            75 (100)          18.1
Hns                      1000                     6            177 (100)         46.7
IHF (HimA/HimD)          3000                     6            88/73 (100/100)   29.0
Lrp                      3000                     2            69 (99.99)        14.6

Other proteins:
DnaB                     20                       3            107 (100)         8.5
DnaG                     50–100                   1            48 (99.85)        2.6
Ssb                      800                      4            86 (100)          43.8
PolA                     300                      3            95 (100)          5.1
Lig                      300                      2            75 (100)          3.0

Figure 5. Proteins by functional categories for identifications based on more than two peptides at a best ion score CI > 95% or CI > 99% and with a CV < 0.5, without outlier exclusion and with outlier exclusion using Grubb’s and Rosner’s tests. The functional categories are as listed in Table 3 – 1, amino acid biosynthesis and metabolism; 2, biosynthesis of cofactors, prosthetic groups and carriers; 3, carbon compound catabolism; 4, cell envelope; 5, cell processes (including adaptation, protection, and transport); 6, central intermediary metabolism; 7, energy metabolism; 8, fatty acid and phospholipid metabolism; 9, hypothetical, unclassified, unknown; 10, nucleotide biosynthesis and metabolism; 11, other known genes (incl. phage, transposon, and plasmid); 12, putative enzymes; 13, putative regulatory proteins; 14, putative transport proteins; 15, regulatory function; 16, structural proteins; 17, synthesis, modification, and degradation of macromolecules.

and a variant of plasmid pklr1 that lacked the rhsA elements and tac promoter were engineered and cultured under similar growth conditions as described in Section 2.2. The control cells did not show any reduction in growth rate upon addition of IPTG to the growth medium (data not shown). These results suggest that W3110 pklr1 cells show a reduction in growth rate primarily due to the IPTG induced transcription of rhsA elements present in plasmid pklr1.

We confirmed IPTG-inducible expression of LacZ in W3110 pklr1 cells induced with 0.1 mM and 1 mM IPTG; LacZ expression was observed to increase five-fold with respect to the cells harvested at 0 h. Apart from LacZ, many other proteins showed a change in expression upon IPTG induced expression of rhsA. Table 5 lists the percentage of proteins with a particular change in protein expression for the case of CI > 95% and after outlier exclusion. In the first three


Table 5. Percent observed proteins with a particular fold change in protein expression

                Percentage of proteins showing a change in expression in cells induced with
Fold change     0 mM IPTG a)    0.1 mM IPTG a)    1 mM IPTG a)    0.1 mM IPTG b)    1 mM IPTG b)

> ±50%          32.6            28.92             8.58            14.95             24.26
> ±100%         12.25           5.15              0.49            4.17              10.05
> ±300%         0.98            0.0               0.0             0.25              1.72

a) Change in protein expression was calculated relative to uninduced cells harvested at 0 h
b) Change in protein expression was calculated relative to uninduced cells harvested at 3.5 h

columns, protein expression in the uninduced (0 mM IPTG) and induced (0.1 mM, 1 mM IPTG) cells harvested at 3.5 h has been calculated relative to the uninduced cells harvested at 0 h (data relative to 114 in Fig. 2). The percentage of proteins demonstrating a change in expression for a given cutoff in fold change in the cells harvested at 3.5 h is decreased at the higher levels of IPTG. For example, 12.25% of the proteins demonstrate at least a two-fold change in protein expression as measured by this technique at 0 mM IPTG. This value is reduced to 5.15% and further reduced to 0.49% at higher levels of IPTG. In the case where we consider changes in protein expression in IPTG induced cells at 3.5 h with respect to the uninduced cells at 3.5 h (data relative to 115 in Fig. 2) there are an increasing number of changes in protein expression at the higher IPTG concentrations. For example, 4.17% of proteins exhibit at least a two-fold change at 0.1 mM IPTG, whereas 10.05% of proteins exhibit a similar change at 1.0 mM IPTG. Interestingly, a majority of proteins that show a decrease in expression in all three early stationary phase samples with respect to the exponential phase sample are involved in translation and in PTM. However, the observed decrease in the expression of these proteins is higher in the uninduced cells harvested at 3.5 h (115 in Fig. 2) than in the IPTG induced cells harvested at 3.5 h (117 in Fig. 2). Moreover, a greater number of proteins in general show a reduction in protein expression in IPTG induced cells compared to the uninduced cells in early stationary phase, suggesting a suppression of overall protein expression in IPTG induced cells.
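As an illustration of how percentages like those in Table 5 can be tabulated from a list of protein expression ratios, consider the sketch below. The ratios are hypothetical. It assumes that a cutoff of x% means a ratio above 1 + x/100 or below 1/(1 + x/100), so that 100% corresponds to a two-fold change in either direction; the text does not state how decreases were scored, so that symmetric treatment is an assumption.

```python
def percent_changed(ratios, percent_cutoff):
    """Percentage of ratios changed by more than percent_cutoff in either direction.

    Assumption: a cutoff of x% means ratio > 1 + x/100 (increase) or
    ratio < 1 / (1 + x/100) (decrease), so 100% corresponds to two-fold.
    """
    fold = 1.0 + percent_cutoff / 100.0
    changed = sum(1 for r in ratios if r > fold or r < 1.0 / fold)
    return 100.0 * changed / len(ratios)

# Hypothetical protein ratios (sample channel / reference channel).
ratios = [1.0, 1.2, 0.4, 2.5, 0.9, 1.6, 0.2, 5.0]

for cutoff in (50, 100, 300):
    print(cutoff, percent_changed(ratios, cutoff))
```

Changing the reference channel (for example, 114 versus 115) changes every ratio, which is why the two halves of Table 5 can show opposite trends with IPTG concentration.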

4 Discussion

In the system studied in this work, IPTG induced transcription and hence expression of rhsA elements, ORF-ex and dsORF-a1, was observed to cause an overall reduction in the expression of proteins in an early stationary phase. Some of the lower abundance proteins, such as TFs, DnaB, and DnaG, were quantified in this study using a newly available isobaric tagging strategy.

In general, the ability to accurately quantify changes in protein expression in a high-throughput manner is of great utility. Many of the current shotgun proteomics methods that rely on LC and MS are based on isotopic dilution strategies that permit the simultaneous analysis of only two samples. In this study, a commercially available isobaric tagging approach was used to simultaneously quantify changes in protein expression in genetically perturbed E. coli under four different biological conditions.

The technique relies on four amine-specific isobaric tags to label peptides and enables simultaneous quantitation and identification of proteins in all four samples in the same MS/MS experiment. Because this approach uses amine-specific tags, all peptides in the samples are targeted for labeling and can be analyzed and quantified. This differs from some other isotopic dilution strategies, such as ICAT, that require an affinity step to reduce sample complexity. Moreover, because multiple peptides from the same protein are targeted for labeling, there is a greater likelihood of increased sequence coverage during identification. In this study, 780 proteins corresponding to 5063 peptides were identified using a best ion score CI > 95%, and more than 65% of the proteins identified had at least two peptide matches (CI > 95%) per protein.
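To make the quantitation step concrete, the sketch below pulls the four reporter ion peak areas out of a single MS/MS peak list and expresses them relative to one channel. It is a simplified illustration, not the software used in this study: the peak list is hypothetical, the reporter m/z values are nominal (114.1 to 117.1), and the matching tolerance is an arbitrary assumption.

```python
# Nominal reporter ion m/z values; the tolerance below is an illustrative assumption.
REPORTER_MZ = (114.1, 115.1, 116.1, 117.1)

def reporter_areas(peaks, tolerance=0.2):
    """Sum peak areas within +/- tolerance of each reporter m/z.

    peaks is a list of (mz, area) tuples from one MS/MS spectrum.
    """
    areas = {}
    for target in REPORTER_MZ:
        areas[target] = sum(a for mz, a in peaks if abs(mz - target) <= tolerance)
    return areas

def ratios_to_reference(areas, reference=114.1):
    """Express each reporter area relative to the reference channel."""
    ref = areas[reference]
    if ref == 0:
        return None  # no reference signal, so the ratios are undefined
    return {mz: area / ref for mz, area in areas.items()}

# Hypothetical peak list: four reporter ions plus two sequence ions.
spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
            (117.12, 1000.0), (175.12, 300.0), (432.25, 800.0)]

print(ratios_to_reference(reporter_areas(spectrum)))
```

Each labeled peptide yields one such set of ratios, and a protein's relative expression is then estimated from the ratios of all peptides assigned to it.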

The 114–117 m/z region of MS/MS space is relatively quiet, which facilitates the estimation of the relative abundance of the reporter ions. The identified proteins were quantified using the average of relative expression ratios of corresponding peptides after outlier peak areas were excluded from the analysis. The exclusion of outliers enables a more statistically robust estimation of relative protein abundance. Grubb’s and Rosner’s tests detected outliers across a greater number of proteins than the other methods tested in this work. We note an analogy in data analysis between the quantitation of protein expression by the independent measurement of multiple peptides and the quantitation of mRNA expression using probe pairs on an Affymetrix genechip probe array. Some other possible methods to exclude outliers that have been considered in the microarray community and that may be applicable in quantitative shotgun proteomics studies are elimination of the maximum and


minimum values or the use of a Tukey biweight median to determine a more robust average relative abundance. As with microarrays, it may be some time before the community agrees on a statistically appropriate method for outlier exclusion. When outliers were removed in this study, there was no significant change either in the relative representation of different functional categories of proteins (Table 3) or in the relative number of peptide matches per protein presented in Fig. 4. Further, the use of more or less stringent ion scores (95 or 99%) did not have any significant effect on the categories represented nor on the relative number of peptide matches per protein.
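The two alternative averaging schemes mentioned above can be sketched as follows. Neither was applied in this study; the code is only an illustration of the idea. The peptide ratios are hypothetical, and the tuning constant c = 5 and the small offset eps are commonly quoted defaults from microarray analysis that should be treated as assumptions here.

```python
from statistics import mean, median

def trimmed_mean(values):
    """Mean after dropping one minimum and one maximum value."""
    if len(values) < 3:
        return mean(values)  # too few values to trim
    return mean(sorted(values)[1:-1])

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of the center of values.

    Values far from the median (in units of the median absolute
    deviation) receive smaller weights; beyond c MADs the weight is zero.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical peptide ratios for one protein, with one extreme value.
ratios = [1.0, 1.1, 0.9, 1.05, 4.0]

print(mean(ratios), trimmed_mean(ratios), tukey_biweight(ratios))
```

Unlike a formal outlier test, both schemes can be applied to every protein without choosing a significance level, at the cost of always discarding or down-weighting some data.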

Some outlier ratios detected have very large percentage deviations from the average ratio. One technical reason that ratios may not be consistent arises if the base peak is significantly greater in intensity than any of the reporter ions or if a contamination or noise peak arises with high S/N in the reporter ion region. There are also biological reasons for large deviations, such as the presence of peptide sequences that are not unique to one protein. In such a case, the observed peak area ratio may reflect the combined relative abundance of multiple proteins. In this work, for proteins with only one or two peptide matches, the matched sequences were searched against all other E. coli sequences to ensure the uniqueness of the sequence. For proteins identified with more than two peptide matches against an E. coli sequence database, a search of each high confidence peptide sequence, corresponding to a particular tandem MS spectrum, is required to identify possible replicate sequences. One approach that we are employing is to parse the genome coordinates from the result files and display the coordinates in a graphical genome viewer (for example, Artemis [20]). In most cases, only one gene will be displayed for a given MS/MS spectrum. However, in cases of gene duplication or genes with a high degree of homology, confounding results can be obtained. In E. coli, TufA and TufB have nearly identical amino acid sequences and differ only at the carboxy terminal residue. If multiple proteins have homologous sequences (duplicate genes as above) then the relative abundance information of individual peptides originating from these proteins cannot be differentiated to estimate abundances of each of the individual proteins. However, if two proteins sharing a particular sequence are non-homologous then a straightforward algebraic calculation can be used to deconvolute the relative abundance information of peptides originating from both proteins. This deconvolution would use the peak area information for peptides originating specifically from one of the proteins to estimate the abundance of the individual proteins.
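The text does not spell out the algebra, so the following is only one possible reading. Assume reporter peak areas from two proteins A and B add linearly in a shared peptide. The ratio observed for the shared peptide is then a weighted average, shared_ratio = frac_a * ratio_a + (1 - frac_a) * ratio_b, where ratio_a comes from peptides unique to protein A and frac_a is the fraction of the shared peptide's reference-channel area contributed by A. The function name and the premise that frac_a is known or can be estimated separately are assumptions of this sketch.

```python
def deconvolve_ratio(shared_ratio, ratio_a, frac_a):
    """Solve shared_ratio = frac_a * ratio_a + (1 - frac_a) * ratio_b for ratio_b.

    shared_ratio: ratio observed for the peptide shared by proteins A and B
    ratio_a:      ratio from peptides unique to protein A
    frac_a:       assumed fraction of the shared peptide's reference-channel
                  area that comes from protein A (hypothetical input)
    """
    if not 0.0 <= frac_a < 1.0:
        raise ValueError("frac_a must be in [0, 1)")
    return (shared_ratio - frac_a * ratio_a) / (1.0 - frac_a)

# Hypothetical example: protein A doubles, protein B halves, and A contributes
# 40% of the shared peptide's reference signal, so the shared peptide reads 1.1.
ratio_b = deconvolve_ratio(shared_ratio=1.1, ratio_a=2.0, frac_a=0.4)
print(ratio_b)
```

With ratios alone, frac_a is not directly measurable, so in practice it would have to come from additional information, for example unique peptides from both proteins measured in more than one reporter channel.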

Outlier data can be considered only for proteins with more than three MS/MS measurements of protein expression. In cases where only one or two MS/MS measurements are available, outlier exclusion is not meaningful. As a result, one might consider the possibility of removing such data from the analysis. However, in this study, this would represent 45.6% of all identified proteins. This difficulty highlights the need for continued development of technologies that can aid in the quantification of protein expression. Isobaric tagging strategies, such as this one, are possible only with mass spectrometers that offer high resolution and mass accuracy in MS/MS. But the ability to probe further into the proteome in terms of sensitivity and coverage will depend on improvements in a number of areas including hardware development, upstream separations, and chemistry for detection. Among the most important needs are developments in search algorithms and databases. In this study, only 22% of the MS/MS spectra collected were used to identify proteins with high confidence. A manual inspection of many of the remaining spectra suggests that the majority of the unassigned spectra contain high quality MS/MS (as determined by reasonable sequencing ion series) and are derived from labeled peptides (as determined by the presence of reporter ions). The assignment of these MS/MS data to sequences would enhance the quality and quantity of data collected in these experiments. Although we are able to perform searches against translation products from our genome of interest, algorithms for improved ORF assignments that address misannotations, incorrect ORF assignments, frameshifts, and pseudogenes remain key issues to be addressed to achieve greater proteome coverage.

We thank Jason Marchese, Marjorie Minkoff, Stephen J. Hattan, Subhasish Purkayastha, and Darryl Pappin at Applied Biosystems Framingham for access to reagents and useful discussions. The authors thank David Schneider for important discussions. This work was funded in part by the USDA Agricultural Research Service Specific Cooperative Agreement 58-1907-1-146 and the National Science Foundation BES-0120315.

5 References

[1] Feulner, G., Gray, J. A., Kirschman, J. A., Lehner, A. F. et al., J. Bacteriol. 1990, 172, 446–456.

[2] Sadosky, A. B., Davidson, A., Lin, R. J., Hill, C. W., J. Bacteriol. 1989, 171, 636–642.

[3] Lin, R. J., Capage, M., Hill, C. W., J. Mol. Biol. 1984, 177, 1–18.

[4] Vlazny, D. A., Hill, C. W., J. Bacteriol. 1995, 177, 2209–2213.

[5] Lee, K. H., Trends Biotechnol. 2001, 19, 217–222.

[6] Lee, P. S., Lee, K. H., Biotechnol. Bioeng. 2003, 84, 801–814.

[7] Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F. et al., Nat. Biotechnol. 1999, 17, 994–999.

[8] Yao, X. D., Freas, A., Ramirez, J., Demirev, P. A., Fenselau, C., Anal. Chem. 2001, 73, 2836–2842.

[9] Conrads, T. P., Alving, K., Veenstra, T. D., Belov, M. E. et al., Anal. Chem. 2001, 73, 2132–2139.

[10] Cagney, G., Emili, A., Nat. Biotechnol. 2002, 20, 163–170.

[11] Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B. et al., Mol. Cell. Proteomics 2004, 3, 1154–1169.


[12] Thompson, A., Schafer, J., Kuhn, K., Kienle, S. et al., Anal. Chem. 2003, 75, 1895–1904.

[13] Richins, R., Chen, W., Biotechnol. Prog. 2001, 17, 252–257.

[14] Perkins, D. N., Pappin, D. J., Creasy, D. M., Cottrell, J. S., Electrophoresis 1999, 20, 3551–3567.

[15] Taylor, J. K., Statistical Techniques For Data Analysis, Lewis Publishers, Chelsea, Michigan 1990.

[16] Gilbert, R. O., Statistical Methods for Environmental Pollution Monitoring, Van Nostrand Reinhold Co., New York 1987.

[17] Gibbons, R. D., Statistical Methods for Groundwater Monitoring, John Wiley & Sons, New York 1994.

[18] Neidhardt, F. C., Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd edn., ASM Press, Washington DC 1996, Chapter 116.

[19] Neidhardt, F. C., Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd edn., ASM Press, Washington DC 1996, Chapter 99.

[20] http://www.sanger.ac.uk/Software/Artemis/, last accessed on October 23, 2004.
