Rdt

Under the regulation of strong promoters, as in numerous commercial plasmid-based vectors, heterologous proteins are typically expressed at high levels in Escherichia coli. The overexpression of plasmid-encoded genes can trigger transcription of heat-shock genes and other stress responses, and often results in aggregation of the encoded proteins as inclusion bodies [1]. The formation of inclusion bodies offers distinct advantages for separating the overexpressed protein, because the aggregates, which consist mostly of the product at high concentration, can be easily isolated. However, the recombinant proteins found in inclusion bodies are often misfolded, so methods that avoid aggregation and yield a soluble, active product are sometimes very desirable. One of the most commonly used methods to increase the solubility of a recombinant protein is to fuse it to a partner (tag) such as N-utilization substance A (NusA), maltose-binding protein (MBP), thioredoxin (TRX), or glutathione S-transferase (GST).

Recombinant protein overexpression is known to induce significant physiological changes in E. coli, such as the heat-shock stress response [6]. The inducer isopropyl-β-D-1-thiogalactopyranoside (IPTG) alone can substantially influence E. coli metabolism, altering the synthesis of certain proteins [7]. When a recombinant protein is expressed at high rates, the bacterial system of cytosolic chaperones and proteases is presumably expressed in an altered pattern compared with host cells that do not overexpress recombinant proteins. In addition to facilitating the folding of nascent proteins, several molecular chaperones and heat-shock proteins are induced to inhibit the formation of inclusion bodies by reducing aggregation and promoting proteolysis of misfolded proteins. Simultaneous overexpression of chaperone/heat-shock protein genes and recombinant target proteins has proved effective in several instances [8]. To increase the solubility of recombinant proteins, researchers have attempted co-overproduction of individual chaperones as well as combined overproduction of the functionally cooperating chaperone network of the E. coli cytosol.

Definition

The concept of recombinant overexpression describes the biomolecular/biochemical procedures in which coding nucleic acid segments (normally native genes) are placed under the genetic control of a strong promoter/activator (recombination), AND this expression unit is functionally activated in a cellular system (overexpression).

RIBOSOME BINDING SITES

Protein synthesis is regulated by the sequence and structure of the 5' untranslated region (UTR) of the mRNA transcript. In prokaryotes, the ribosome binding site (RBS), which promotes efficient and accurate translation of mRNA, is called the Shine-Dalgarno sequence after the scientists who first described it. This purine-rich sequence in the 5' UTR is complementary to the UCCU core sequence at the 3' end of 16S rRNA (located within the 30S small ribosomal subunit). Various Shine-Dalgarno sequences have been found in prokaryotic mRNAs (see Figure 1 for the consensus sequence). These sequences lie about 10 nucleotides upstream of the AUG start codon. The activity of an RBS can be influenced by the length and nucleotide composition of the spacer separating the RBS from the initiator AUG.
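To make the spacer idea concrete, here is a minimal Python sketch that scans the last 20 nt of a 5' UTR for the hexamer closest to AGGAGG and reports how many nucleotides separate it from the AUG. AGGAGG is assumed here as a commonly cited E. coli Shine-Dalgarno core; the function name, the search window, and the simple match-count score are illustrative assumptions, and real RBS strength also depends on mRNA secondary structure.

```python
# Commonly cited E. coli Shine-Dalgarno core (an assumption for this sketch).
SD_CONSENSUS = "AGGAGG"


def find_shine_dalgarno(utr, window=20):
    """Return (matches, position, spacer) for the best SD-like hexamer.

    `utr` is the 5' UTR ending immediately before the AUG start codon.
    `matches` counts positions identical to SD_CONSENSUS, `position` is the
    0-based start of the hexamer in `utr`, and `spacer` is the number of
    nucleotides between the end of the hexamer and the A of the AUG.
    """
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA input
    k = len(SD_CONSENSUS)
    if len(utr) < k:
        raise ValueError("UTR is shorter than the Shine-Dalgarno consensus")
    best = None
    # Only consider hexamers within `window` nt upstream of the start codon.
    for pos in range(max(0, len(utr) - window), len(utr) - k + 1):
        matches = sum(a == b for a, b in zip(utr[pos:pos + k], SD_CONSENSUS))
        spacer = len(utr) - (pos + k)
        if best is None or matches > best[0]:  # ties keep the upstream hit
            best = (matches, pos, spacer)
    return best


if __name__ == "__main__":
    # Hypothetical UTR: AGGAGG followed by a 7-nt spacer before the AUG.
    print(find_shine_dalgarno("CACAAGGAGGUAAAUAA"))  # (6, 4, 7)
```

A perfect match scores 6; the reported spacer is the quantity the paragraph above says can modulate RBS activity.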

In eukaryotes, the Kozak sequence (A/G)CCACCAUGG, which lies within a short 5' untranslated region, directs translation of mRNA. An mRNA lacking the Kozak consensus sequence may be translated efficiently in Ambion's in vitro systems if it has a moderately long 5' UTR that lacks stable secondary structure. Ambion's data indicate that, in contrast to the E. coli ribosome, which preferentially recognizes the Shine-Dalgarno sequence, eukaryotic ribosomes (such as those in reticulocyte lysate) can efficiently use either the Shine-Dalgarno or the Kozak ribosome binding site.
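The Kozak positions usually treated as most important are the purine (A or G) at -3 and the G at +4, counting the A of AUG as +1. The sketch below checks only those two positions. The strong/adequate/weak labels are a common heuristic assumed here for illustration, and the function name is hypothetical.

```python
def kozak_context(mrna, start):
    """Classify the context of the AUG at 0-based index `start`.

    Convention: the A of AUG is +1, so -3 lies three nucleotides upstream
    of it and +4 is the nucleotide immediately after the G of AUG.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[start:start + 3] != "AUG":
        raise ValueError("no AUG at the given position")
    purine_minus3 = start >= 3 and seq[start - 3] in "AG"
    g_plus4 = start + 3 < len(seq) and seq[start + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


if __name__ == "__main__":
    print(kozak_context("GCCACCAUGG", 6))  # full consensus from the text
```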

TRANSCRIPTION TERMINATOR

In genetics, a terminator, or transcription terminator, is a section of nucleic acid sequence that marks the end of a gene or operon in genomic DNA during transcription.

In prokaryotes, two classes of transcription terminators are known:

Intrinsic transcription terminators, in which a hairpin structure forms within the nascent transcript and disrupts the mRNA-DNA-RNA polymerase ternary complex.

Rho-dependent transcription terminators that require Rho factor, an RNA helicase protein complex, to disrupt the nascent mRNA-DNA-RNA polymerase ternary complex.

In eukaryotes, terminators are recognized by protein factors.

Rho-independent Termination Signals

The intrinsic terminator consists of a GC-rich inverted repeat followed by a run of 4 or more adenines in the template strand. The transcribed RNA forms a stem-loop structure at the inverted repeat via internal base pairing.

The formation of this stem-loop structure (Fig. 11.1) disrupts the hydrogen bonding between the RNA uracils and the DNA adenines at the site of transcription. This pairing is weak because an A-U pair forms only 2 hydrogen bonds, compared with 3 for a G-C pair.

As a result, the RNA is released from the DNA template.

What is an inverted repeat? A sequence of several bases in double-stranded DNA that appears again, in inverted orientation (as its reverse complement), further along the same strand.

Example:

5'.....GCCGCCAG........CTGGCGGC....3'

3'.....CGGCGGTC........GACCGCCG....5' (template strand)

transcribed RNA: 5'.......GCCGCCAG........CUGGCGGC.....3'

Consequently there are internal sequences in the transcribed RNA that are complementary and can therefore base pair to form a stem-loop structure.
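The inverted-repeat logic of the example above can be turned into a naive scan for intrinsic-terminator-like motifs. This sketch assumes a fixed 8-nt perfectly paired stem, a 3-10 nt loop, at least 60% GC in the stem, and a downstream run of at least four T's on the coding strand (U's in the RNA). All thresholds and function names are illustrative assumptions; real terminator predictors tolerate mismatches and score folding energy. It also counts stem hydrogen bonds using the 2-per-A:U versus 3-per-G:C rule given in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def stem_hydrogen_bonds(stem):
    """3 hydrogen bonds per G-C pair, 2 per A-T (A-U in RNA) pair."""
    return sum(3 if base in "GC" else 2 for base in stem)


def find_intrinsic_terminators(dna, stem_len=8, min_loop=3, max_loop=10,
                               min_t_run=4, min_gc=0.6):
    """Return (stem_start, loop_length) for each GC-rich inverted repeat
    immediately followed by a run of T's (U's in the transcript).

    `dna` is the coding (non-template) strand read 5'->3', so it matches
    the transcribed RNA with T in place of U.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * stem_len - min_loop + 1):
        left = dna[i:i + stem_len]
        if (left.count("G") + left.count("C")) / stem_len < min_gc:
            continue  # stem not GC-rich enough
        right_arm = reverse_complement(left)  # sequence that pairs with `left`
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if dna[j:j + stem_len] != right_arm:
                continue
            tail = dna[j + stem_len:j + stem_len + min_t_run]
            if tail == "T" * min_t_run:
                hits.append((i, loop))
    return hits


if __name__ == "__main__":
    # Stem arms taken from the example above, with a hypothetical 8-nt loop.
    seq = "ATGGCCGCCAGAAGAAAGACTGGCGGCTTTTTTAC"
    print(find_intrinsic_terminators(seq))
    print(stem_hydrogen_bonds("GCCGCCAG"))
```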

Rho-dependent Termination Signals

Some termination sequences lack the run of adenines that is transcribed into uracils in the RNA. In such cases the RNA needs assistance from a specific protein (termed Rho), which is necessary for termination.

Rho binds to the nascent RNA toward its 5' end and moves along the RNA in the 5'-to-3' direction until it catches up with an RNA polymerase that has paused at a stem-loop structure.

In Rho dependent termination, the Rho protein forces the RNA to separate from the DNA template.

Eukaryotic Transcriptional Termination

RNA polymerase I terminates when it comes to a polymerase-specific DNA binding protein attached at the termination site.

RNA polymerase III terminates at a series of U residues but does not require an upstream stem-loop to be present in the transcript.

RNA polymerase II transcripts are essentially terminated by cleavage near the polyadenylation site, followed by addition of the poly(A) tail. The cleaved 3' transcriptional product is rapidly degraded, as are non-polyadenylated transcripts.

Attenuation

Attenuation provides a secondary mechanism for controlling expression of the prokaryotic trp operon. Attenuation requires simultaneous transcription and translation (Fig. 11.3) and therefore occurs only in prokaryotes. When trp-charged tRNA is plentiful, the ribosome translates the leader sequence closely behind RNA polymerase, and the terminator stem-loop forms (Fig. 11.3, left figure). When trp-charged tRNA is low, the ribosome pauses before the end of the leader, allowing an alternative stem-loop to form that prevents termination (Fig. 11.3, right figure).

Antitermination

Binding of antitermination proteins, such as the N protein of phage lambda, to sites between the promoter and the terminator allows a protein complex to form with Nus proteins that prevents termination.

Antiterminators work at both Rho-dependent and Rho-independent terminators. In eukaryotic cells transcribing the HIV genome, the Tat protein functions as an antiterminator (Fig. 11.6) by binding at the TAR site near the 5' end of the transcript. Without Tat, only short RNAs from the 5' end of the genome are transcribed.

FUSION PROTEIN TAGS

Definition:

A protein or peptide located at either the C- or N-terminus of the target protein that confers one or more of the following characteristics:

1. Improved solubility (S) - Fusion of the N-terminus of the target protein to the C-terminus of a soluble fusion partner often improves the solubility of the target protein.

2. Improved detection (D) - Fusion of a short peptide (epitope tag) or protein to either terminus of the target protein facilitates detection of the resulting protein during expression or purification, because the tag is recognized by an antibody (Western blot analysis) or by biophysical methods (e.g., GFP by fluorescence).

3. Improved purification (P) - Simple purification schemes have been developed for tags, placed at either terminus, that bind specifically to affinity resins.

4. Localization (L) - A tag, usually located at the N-terminus of the target protein, that acts as an address for sending the protein to a specific cellular compartment.

5. Improved Expression (E)- Fusion of the N-terminus of the target protein to the C-terminus of a highly expressed fusion partner results in high level expression of the target protein.

Some frequently used fusion tags:

His Tag

Purification - Affinity column: the His tag binds very tightly (Kd ~ 10⁻¹³ M) to immobilized divalent cations (e.g., Ni²⁺, Cu²⁺, Zn²⁺)

Detection - Western Blot

GST (Glutathione S-Transferase)

Solubility - improves solubility, though less effectively than NusA or MBP [N-terminal fusions only]

Purification - glutathione affinity or GST antibody purification

Detection - Western blot, quantitative assay (based on enzymatic activity)

A recombinant fusion protein is a protein created through genetic engineering of a fusion gene. This typically involves removing the stop codon from a cDNA sequence coding for the first protein, then appending the cDNA sequence of the second protein in frame through ligation or overlap extension PCR. That DNA sequence is then expressed by a cell as a single protein. The protein can be engineered to include the full sequence of both original proteins, or only a portion of either.
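The construction step just described (drop the first stop codon, append the second coding sequence in frame) can be checked in silico before cloning. The sketch below is a hypothetical helper, not a primer-design tool: it validates reading frames, removes the first stop codon, optionally inserts a linker, and rejects any in-frame stop that would truncate the fusion. The Gly-Gly-Ser linker in the example is an illustrative assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def build_fusion_orf(first_cds, second_cds, linker=""):
    """Join two coding sequences in frame, optionally through a linker.

    The stop codon of `first_cds` is removed; the stop codon of
    `second_cds` is kept and ends the fused open reading frame.
    """
    first, second, linker = [s.upper() for s in (first_cds, second_cds, linker)]
    for name, seq in (("first_cds", first), ("second_cds", second),
                      ("linker", linker)):
        if len(seq) % 3:
            raise ValueError(f"{name} length is not a multiple of 3")
    if first[-3:] in STOP_CODONS:
        first = first[:-3]  # remove the stop so translation reads through
    fused = first + linker + second
    # An in-frame stop anywhere except the last codon would truncate the fusion.
    if any(c in STOP_CODONS for c in codons(fused)[:-1]):
        raise ValueError("in-frame stop codon inside the fusion")
    return fused


if __name__ == "__main__":
    # Hypothetical toy CDSs joined through a Gly-Gly-Ser linker (GGT GGC TCT).
    print(build_fusion_orf("ATGAAATAA", "ATGCCCTGA", "GGTGGCTCT"))
```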

If the two entities are proteins, linker (or "spacer") peptides are often added, which make it more likely that the proteins fold independently and behave as expected. Especially where the linkers enable protein purification, linkers in protein or peptide fusions are sometimes engineered with cleavage sites for proteases or chemical agents that allow the two separate proteins to be liberated. This technique is often used for identification and purification of proteins by fusing a GST protein, a FLAG peptide, or a hexa-histidine peptide (6xHis tag); His-tagged fusions can be isolated by affinity chromatography on nickel or cobalt resins. Fusion proteins can also be produced with toxins or antibodies attached in order to study disease development.

Functions

Some fusion proteins combine whole proteins and therefore contain all functional domains of the original proteins. However, other fusion proteins, especially naturally occurring ones, combine only portions of coding sequences and therefore do not maintain the original functions of the parental genes that formed them.

Many whole-gene fusions are fully functional and can still act to replace the original proteins. Some, however, experience interactions between the two proteins that can modify their functions. Beyond these effects, some gene fusions may cause regulatory changes that alter when and where these genes act. For partial gene fusions, the shuffling of different active sites and binding domains can potentially result in new proteins with novel functions.

PURIFICATION TAGS

Recombinant proteins show large variability in terms of their expression, solubility, stability, and functionality, making them difficult targets for large-scale analyses and production. Advances in recombinant protein expression include the development of better expression systems and host strains, improving mRNA stability, host-specific codon optimization, the use of secretory pathways, post-translational modification, co-expression with chaperones, and decreasing the amount of proteolytic degradation. However, no other technology has been as effective in improving the expression, solubility, and production of biologically active proteins as the addition of fusion tags, especially for difficult-to-express proteins.

Genetically engineered fusion tags allow the purification of virtually any protein without any prior knowledge of its biochemical properties [1–2]. They can improve the variable yield and poor solubility of many recombinant proteins. Proper design and judicious use of the right fusion tag can enhance the solubility and promote proper folding of the protein of interest, leading to recovery of more functional protein. On the other hand, adding fusion tags has been reported to result in changes in protein conformation, poor yields, loss or alteration of biological activity, and toxicity of the target protein. For this reason, it is desirable to remove the tag from the target protein after expression. When designing a fusion tag, therefore, careful consideration must be given to how the tag will be removed to produce native proteins without any extraneous sequences.

Many fusion tags are available for the expression and purification of proteins (Table 1). These tags can be broadly classified into two categories: affinity tags that aid in purification but do not enhance the solubility of the proteins substantially, and solubility-enhancing tags that specifically enhance the solubility and recovery of functional proteins.

Affinity Tags

Affinity tags are the most commonly used tags for aiding protein purification. They can be defined as exogenous amino acid (aa) sequences that bind with high affinity to a chemical ligand or an antibody. Most affinity tags are short peptide sequences that either bind to a ligand linked to a solid support (like the His tag) or contain an epitope recognized by immobilized antibodies (like the FLAG or Myc tags). The high affinity of these tags for their ligands and the availability of well-developed immobilized supports for capturing the fusion proteins allow the protein of interest to be purified to a very high degree. Because of their small size, these affinity tags can be added at either end of the protein or in a surface-exposed region. However, these tags generally do not increase the expression of the fusion proteins or enhance their solubility, and are therefore of little use in purifying hard-to-express proteins.

His tags are the most widely used affinity tags. The purification of His-tagged proteins is based on the use of a chelated metal ion as an affinity ligand; a commonly used example is immobilized nickel-nitrilotriacetic acid chelate (Ni–NTA), which is bound by the imidazole side chains of histidine. Similarly, proteins carrying Strep-tag II, a streptavidin-recognizing octapeptide (WSHPQFEK), can be purified on a matrix of modified streptavidin and eluted with a biotin analog. Other commonly used affinity tags, such as FLAG, Myc, and HA, can be purified by binding to the respective antibodies immobilized on chromatographic supports.

Because it is desirable to remove most tags at the end of the purification process, considerable advances have been made in the design of affinity tags so that they can be cleaved without leaving any residues behind, and in simplifying the entire process of purification and cleavage. One such system is the "Profinity eXact" fusion-tag system (Bio-Rad, Hercules, CA), which uses an immobilized subtilisin protease to carry out both affinity binding and tag cleavage. The protease not only binds and recognizes the tag but, upon application of the elution buffer, also cleaves the tag from the fusion protein precisely, directly after the cleavage recognition sequence. This delivers a native, tag-free protein in a single step. Another system for simple purification of proteins is based on elastin-like polypeptides (ELPs) and inteins. ELPs consist of several repeats of a peptide motif and undergo a reversible transition from soluble to insoluble upon a temperature upshift. The fusion protein is purified by temperature-induced aggregation and separation by centrifugation, and an intein is used for tag removal [3]. No affinity columns are needed for the initial purification.

Solubility-Enhancing Tags

Solubility-enhancing tags are generally large peptides or proteins that increase the expression and solubility of fusion proteins. Fusion tags like GST and MBP also act as affinity tags and as a result, they are very popular for protein purification. Other fusion tags like NusA, thioredoxin (TRX), small ubiquitin-like modifier (SUMO), and ubiquitin (Ub), on the other hand, require additional affinity tags for use in protein purification.

No single fusion tag can increase the expression and solubility of all target proteins. However, some fusion tags have been more successful than others in increasing the solubility of many proteins. A comparison of some popular fusion tags showed that large proteins like NusA and MBP are more effective in solubilizing proteins than the smaller affinity tags or GST [4–7]. Novel tags like Skp and T7 protein kinase, in turn, have been shown to be successful in expressing hard-to-express proteins in E. coli [8]. Similarly, ubiquitin-based tags have been used to increase the solubility and expression level of proteins. SUMO tags are emerging as a viable alternative for increasing both the expression and solubility of otherwise hard-to-express proteins [9]. The SUMO tag can be cleanly excised using SUMO protease, which recognizes the conformation of the SUMO protein rather than only a specific sequence within SUMO. Initially, the SUMO system was confined to E. coli, because eukaryotic hosts contain highly conserved SUMO proteases that would cleave the SUMO tag. However, the recently developed SUMOstar tag, a modified version of SUMO, is not recognized by native eukaryotic SUMO proteases and is cleaved specifically by the genetically engineered SUMOstar protease. Thus, the SUMOstar system can be used effectively in both prokaryotic and eukaryotic systems. The usefulness of the SUMO system was substantiated by the study of Marblestone et al., who examined the effects of various fusion partners on total and soluble expression yield [9–10]. They evaluated the expression and solubility of three model proteins fused to the C terminus of MBP, GST, TRX, NusA, Ub, and SUMO tags. The tags were ranked in terms of increased total expression as

TRX > SUMO ~ NusA > Ub ~ MBP ~ GST

and increased soluble expression as

SUMO ~ NusA > Ub ~ GST ~ MBP ~ TRX.

Overall, SUMO and NusA were equally good at increasing the expression and the solubility of fusion proteins. However, SUMO offers certain advantages over NusA: it is smaller, and it can be cleaved off precisely from the target protein without leaving behind any residues.

The Rainbow tag is yet another new development in tag fusion technology. It allows continuous monitoring of correctly folded proteins throughout expression and purification. The flavin mononucleotide (FMN)-binding domain of cytochrome P450 reductase (which is blue-green or yellow, depending on the oxidation state of the FMN cofactor) and the red, heme-binding cytochrome b5 are used as tags, and these rainbow tags are visible to the naked eye [11]. The use of rainbow tags, however, requires an additional affinity tag for purification.

PROTEASE CLEAVAGE SITES AND ENZYMES

Proteases

An integral part of the choice of a fusion tag is the choice of the method for removing the tag after purification. This step almost always involves using a protease to cleave a specific peptide bond between the tag and the protein of interest. A small number of highly specific proteases are routinely used for this purpose and are listed in Table 2. These include the tobacco etch virus (TEV) protease; thrombin (factor IIa, fIIa) and factor Xa (fXa) from the blood coagulation cascade; enterokinase (EK), the enzyme that activates trypsinogen to trypsin in the mammalian intestinal tract; the SUMO proteases (Ulp1, Senp2, and SUMOstar), which are involved in the maturation and deconjugation of SUMO; and a relative newcomer to the field, a mutated form of the Bacillus subtilis protease subtilisin BPN' (Bio-Rad's Profinity eXact system). Many of these enzymes have been genetically engineered to enhance their stability (e.g., AcTEV, ProTEV) or their specificity (e.g., SUMOstar, Profinity). With the exception of the SUMO proteases, all of these enzymes have the potential to cleave within the protein of interest [12–13]. The SUMO proteases recognize not only their specific cleavage site, xaa-Gly-Gly/yaa, but also the tertiary structure of SUMO itself, giving them a very high degree of specificity. Bryan et al. have attempted to introduce the same level of specificity into the Profinity system by mutating both the subtilisin prodomain and the active site of subtilisin, to increase the affinity of the enzyme for the prodomain and to decrease the likelihood of digestion within the protein of interest [14]. One interesting consequence is that the affinity for the prodomain is so high that these researchers observed product inhibition of the enzyme: the enzyme carries out one catalytic cycle and is then inhibited by the prodomain, which is retained in the active site, preventing further cleavage by this otherwise promiscuous enzyme.
Because capture on the immobilized, mutant subtilisin matrix is an integral part of the system, the column must have a capacity (in moles of subtilisin) equimolar with the fusion protein. Although this is not problematic on the research scale, it could become prohibitively expensive at the multigram scale.
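A quick calculation shows why equimolar capture becomes costly at scale. Because 1 mg of a 1 kDa protein is 1 µmol, the enzyme needed is simply mass divided by molecular mass. The 50 kDa fusion protein and the roughly 27.5 kDa enzyme (about the size of mature subtilisin BPN') are assumptions for illustration, as is full activity of every immobilized enzyme molecule; the function name is hypothetical.

```python
def equimolar_enzyme_requirement(fusion_mg, fusion_kda, enzyme_kda):
    """Return (umol, mg) of immobilized enzyme needed for 1:1 capture.

    1 mg of a 1 kDa (1,000 g/mol) protein is 1 umol, so umol = mg / kDa,
    and the enzyme mass in mg is umol * enzyme kDa.
    Assumes every immobilized enzyme molecule is active and accessible.
    """
    umol = fusion_mg / fusion_kda
    return umol, umol * enzyme_kda


if __name__ == "__main__":
    # Hypothetical: 1 g of a 50 kDa fusion protein, ~27.5 kDa enzyme.
    print(equimolar_enzyme_requirement(1000, 50, 27.5))  # (20.0, 550.0)
```

On these assumptions, purifying 1 g of fusion protein ties up about half a gram of immobilized enzyme, which illustrates the cost concern raised above for multigram work.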

The principal concerns with using a protease to remove a tag are removing the protease after digestion, and non-specific digestion of the target protein by the protease. Resolving the first concern is relatively straightforward, although in most cases it involves an additional chromatography step.

Recombinant forms of TEV and its variants and of the SUMO proteases are all produced with a hexahistidine (His6) tag, allowing easy removal of the enzyme by metal chelate chromatography. Alternatively, some of these enzymes have been immobilized on solid supports, allowing their removal by simple filtration or centrifugation steps. Thrombin, fXa, and EK, which generally are produced from natural sources, can be removed by affinity chromatography, for instance, on benzamidine-agarose. With the Profinity system, cleavage and separation from the enzyme are combined in a single step.

The second concern is more difficult to resolve. Non-specific cleavage is influenced by a number of parameters, such as the enzyme-to-substrate ratio (lower is better), temperature, pH, salt concentration, and length of exposure. TEV protease, thrombin, fXa, and EK all have well defined recognition sequences, but all of them have been found to cause "nicking" of the target protein in some instances. TEV protease has been re-engineered to try to increase its specificity (and stability), resulting in AcTEV (Invitrogen, Carlsbad, CA) and ProTEV (Promega, Madison, WI). Whether or not such engineering has reduced non-specific proteolysis remains to be seen. In addition, other tricks must be used with the native enzymes. For instance, one supplier recommends using fXa at pH 6.5, well below its pH optimum, to minimize non-specific cleavage. Of course, this requires the use of higher enzyme-to-substrate ratios and longer digestion times to achieve complete cleavage. Two of the enzymes listed (SUMO proteases and the Profinity enzyme) seem to be immune to this problem. SUMO proteases have evolved to recognize both the tertiary structure of SUMO as well as the cleavage sequence, xaa-Gly-Gly/yaa. The Profinity enzyme has been extensively mutated to derive a version that has very high affinity for the prodomain of the original enzyme. Thus, it also recognizes the tertiary structure of the prodomain as well as the cleavage sequence Phe-Met-Ala-Lys/yaa. On the other hand, SUMO proteases act catalytically (i.e., with a low enzyme-to-substrate ratio) whereas the Profinity enzyme requires equimolar concentrations of enzyme and substrate.
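One practical precaution against nicking is to scan the target protein for sequences resembling a protease's recognition site before choosing the enzyme. The motifs below are the commonly cited canonical sites (TEV ENLYFQ/G or /S, thrombin LVPR/GS, factor Xa I(E/D)GR/, enterokinase DDDDK/). They are simplified assumptions, since real specificity is broader, so the absence of a match does not rule out non-specific cleavage. The SUMO proteases and the Profinity enzyme are left out because, as described above, they also recognize tertiary structure. The function name is hypothetical.

```python
import re

# Simplified canonical recognition motifs (one-letter amino acid code) and
# the offset of the scissile bond from the start of each motif.
PROTEASE_SITES = {
    "TEV": (r"ENLYFQ[GS]", 6),      # cleaves between Q and G/S
    "thrombin": (r"LVPRGS", 4),     # cleaves between R and G
    "factor Xa": (r"I[ED]GR", 4),   # cleaves after R
    "enterokinase": (r"DDDDK", 5),  # cleaves after K
}


def find_cleavage_sites(protein):
    """Return sorted (protease, cut_index) pairs; the cut falls before protein[cut_index]."""
    protein = protein.upper()
    hits = []
    for name, (pattern, offset) in PROTEASE_SITES.items():
        # Zero-width lookahead so overlapping motifs are all reported.
        for match in re.finditer(f"(?={pattern})", protein):
            hits.append((name, match.start() + offset))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical His6-TEV construct whose target carries an internal EK site.
    print(find_cleavage_sites("MHHHHHHSSGENLYFQGAMDDDDKLL"))
```

A hit inside the linker is the intended site; any additional hit inside the protein of interest flags a protease worth avoiding or testing carefully.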

One final consideration should be mentioned. Although one would ideally have a protein that is fully soluble in phosphate buffered saline at neutral pH, the reality is that for many proteins to be soluble at useful concentrations, they require more acidic or more basic pH levels, high or low salt levels, or the presence of chaotropes or detergents. It is therefore essential that the protease of choice retain substantial activity under adverse conditions. The most robust of the enzymes cited appear to be the SUMO proteases, the Profinity enzyme, and the TEV protease. Thrombin, fXa, and EK are much more sensitive to high salt concentrations or to the presence of chaotropes or reducing agents.

PLASMID COPY NUMBER

Many expression systems in research and industry use plasmids as vectors for the production of recombinant proteins or non-protein recombinant products. Plasmids have an essential impact on productivity; the relevant factors are plasmid copy number, structural plasmid stability, and segregational plasmid stability. Plasmid copy number determines the gene dosage available for expression, and a high copy number generally leads to high productivity. Quantifying plasmid copy number is therefore very helpful when analyzing an expression system, and several methods have been described for determining it.
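One widely used approach is relative quantitative PCR (qPCR), which compares a plasmid-borne amplicon with a chromosomal reference gene in the same total-DNA sample. The text does not say which methods it has in mind, so treat this as one example under stated assumptions: the reference gene is present once per chromosome, and each amplicon's per-cycle amplification factor comes from its own standard curve. Function names and Ct values are hypothetical.

```python
def amplification_factor(slope):
    """Per-cycle amplification factor from a standard curve (Ct vs log10 copies).

    A perfectly efficient reaction (slope of about -3.32) doubles the
    product each cycle, giving a factor of about 2.
    """
    return 10 ** (-1 / slope)


def plasmid_copy_number(ct_plasmid, ct_chromosome,
                        amp_plasmid=2.0, amp_chromosome=2.0):
    """Plasmid copies per chromosome from qPCR threshold cycles.

    Starting template is proportional to amp ** -Ct, so the plasmid-to-
    chromosome ratio is amp_chromosome ** Ct_chrom / amp_plasmid ** Ct_plasmid.
    Assumes a single-copy chromosomal reference gene.
    """
    return amp_chromosome ** ct_chromosome / amp_plasmid ** ct_plasmid


if __name__ == "__main__":
    # Hypothetical Ct values: the plasmid target crosses threshold 5 cycles earlier.
    print(plasmid_copy_number(18.0, 23.0))
```

With ideal efficiencies, a 5-cycle difference corresponds to 2^5, or 32 plasmid copies per chromosome; unequal efficiencies are handled by passing the measured factors.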

The copy-number of a plasmid in the cell is determined by regulating the initiation of plasmid replication. The initiation of plasmid replication may be controlled by regulating the amount of available primer for the initiation of DNA replication, regulating the amount of essential replication proteins, or regulating the function of essential replication proteins. Two major mechanisms are used to control the initiation of plasmid replication:

1. Regulation by antisense RNA;

2. Regulation by binding of replication proteins to repeated 18-22 bp sites called iterons.

A few examples of each type of regulation are shown in the figures below. Note that there are examples of high copy-number plasmids and low copy-number plasmids that use each mechanism.

Regulation of plasmid colE1 copy number by antisense RNA:

Regulation of plasmid R1 copy number by antisense RNA:

Regulation of plasmid copy number by iterons:

OVEREXPRESSION CONDITIONS

Sometimes the levels of protein expression are low despite the use of strong transcriptional and translational signals.

The following approaches can be used to optimize expression levels:

Varying induction conditions. The levels of expression of the target protein can be optimized by varying the time and/or temperature of induction and the concentration of the inducer.
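A small full-factorial screen is a simple way to vary the three parameters named above together rather than one at a time. IPTG as the inducer follows the earlier discussion; the specific temperatures, concentrations, and times below are hypothetical starting values, not recommendations from the text.

```python
from itertools import product

# Hypothetical screening ranges; adjust to the protein and host strain.
TEMPERATURES_C = (18, 25, 37)
IPTG_MM = (0.1, 0.5, 1.0)
INDUCTION_HOURS = (4, 16)


def induction_grid():
    """Every combination of induction temperature, inducer concentration, and time."""
    return [
        {"temp_c": t, "iptg_mm": c, "hours": h}
        for t, c, h in product(TEMPERATURES_C, IPTG_MM, INDUCTION_HOURS)
    ]


if __name__ == "__main__":
    for condition in induction_grid():
        print(condition)
```

Eighteen small-scale cultures is usually manageable, and the best few conditions can then be refined more finely.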

Examining the codon usage of the heterologous protein. Not all 61 sense codons are used equally. The so-called major codons are those that occur in highly expressed genes, whereas the minor or rare codons tend to occur in genes expressed at low levels. Which codons are rare depends strongly on the organism.

Usually, the frequency of codon usage reflects the abundance of the cognate tRNAs. Therefore, when the codon usage of your target protein differs significantly from the average codon usage of the expression host, this can cause problems during expression. The following problems are often encountered:

Decreased mRNA stability (by slowing down translation)

Premature termination of transcription and/or translation, which leads to a variety of truncated protein products

Frameshifts, deletions and misincorporations (e.g. lysine for arginine).

Inhibition of protein synthesis and cell growth.

As a consequence, the observed expression levels are often low, or there is no expression at all. Especially when rare codons are present at the 5' end of the mRNA or occur in clusters, expression levels are low and truncated protein products are found.
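The risk factors just described (rare codons, their position near the 5' end, and clustering) can be screened for with a few lines of Python. The rare-codon set is the E. coli problem-codon list given later in this section, written as DNA. The 25-codon "5' end" window and the two-codon cluster gap are arbitrary assumptions, not values from the text, and the function name is hypothetical.

```python
# Problem codons for E. coli listed in this section (DNA form: AUA -> ATA, CUA -> CTA).
RARE_CODONS = {"AGG", "AGA", "CGG", "CGA", "GGA", "ATA", "CTA", "CCC"}


def scan_rare_codons(cds, head=25, max_gap=2):
    """Locate rare codons, those within the first `head` codons, and clusters.

    A cluster is two or more rare codons where consecutive ones are at most
    `max_gap` codons apart. Positions are 0-based codon indices.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA
    codon_list = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    rare = [i for i, codon in enumerate(codon_list) if codon in RARE_CODONS]

    clusters, current = [], []
    for i in rare:
        if current and i - current[-1] > max_gap:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(i)
    if len(current) >= 2:
        clusters.append(current)

    return {
        "rare_positions": rare,
        "near_5_prime": [i for i in rare if i < head],
        "clusters": clusters,
        "fraction": len(rare) / len(codon_list) if codon_list else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical ORF with tandem AGG-AGA near the start and a later CCC.
    print(scan_rare_codons("ATGAGGAGAGCTGAAAAACTGCCCTAA"))
```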

The expressed levels can be improved by:

replacing codons that are rarely found in highly expressed E. coli genes with more favourable codons throughout the whole gene. Codons that have been associated with translation problems in E. coli are:

AGG arginine

AGA arginine

CGG arginine

CGA arginine

GGA glycine

AUA isoleucine

CUA leucine

CCC proline

co-expressing the genes encoding a number of the rare-codon tRNAs. Several commercial E. coli strains are available that carry genes for a number of rare-codon tRNAs:

BL21 (DE3) CodonPlus-RIL: arginine (AGG, AGA), isoleucine (AUA) and leucine (CUA)

BL21 (DE3) CodonPlus-RP: arginine (AGG, AGA) and proline (CCC)

Rosetta or Rosetta (DE3): AGG/AGA (arginine), CGG (arginine), AUA (isoleucine), CUA (leucine), CCC (proline), and GGA (glycine)

making changes in the coding sequence that reduce secondary structure in the translation initiation region. This is mainly done by increasing the number of A residues.

Examining the second codon. In endogenous E. coli proteins, not all codons are used to the same extent in the second triplet (the one following the N-terminal methionine). The most frequently used is AAA (lysine, 13.9%), while a number of other codons are not used at all. Looman et al. showed that the expression efficiency of a modified lacZ gene varies at least 15-fold depending on this codon. Thus, choosing the right codon at this position, or changing it to one that is more often used in E. coli, could improve expression levels.

Reference: Looman et al. (1987) EMBO J. 6, 2489-2492.


Minimizing the GC content at the 5'-end. A high GC content at the 5'-end of the gene of interest usually leads to the formation of secondary structure in the mRNA. This can interrupt translation and lower expression levels. Thus, higher expression levels could be obtained by changing G and C residues at the 5'-end of the coding sequence to A and T residues without changing the encoded amino acids.

Addition of a transcription terminator (or an additional one if one is already present).

Addition of a fusion partner. Fusion of the N-terminus of a heterologous protein to the C-terminus of a highly expressed fusion partner often results in high-level expression of the fusion protein.

Using protease-deficient host strains. Host strains carrying mutations that eliminate the production of certain proteases can sometimes accumulate more of the target protein, because proteolytic degradation is reduced. BL21, the workhorse of E. coli expression, is deficient in two proteases, encoded by the lon (cytoplasmic) and ompT (outer membrane) genes.
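Two of the strategies above, replacing problematic codons with synonymous ones and checking the GC content at the 5'-end, can be sketched in a few lines of Python. This is a minimal illustration, not a codon-optimization tool: the substitution table below is an assumption (each codon from this section's list is mapped to a synonymous codon that is common in E. coli) and should be checked against a proper codon usage table; the function names are hypothetical.

```python
# Hypothetical substitution table: problematic codon -> synonymous codon common in E. coli.
# These choices are an assumption; every replacement encodes the same amino acid.
PREFERRED = {
    "AGG": "CGT", "AGA": "CGT", "CGG": "CGT", "CGA": "CGT",  # arginine
    "GGA": "GGT",  # glycine
    "ATA": "ATT",  # isoleucine
    "CTA": "CTG",  # leucine
    "CCC": "CCG",  # proline
}

def replace_rare_codons(cds):
    """Return the CDS with problematic codons swapped for synonymous ones (same protein)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(PREFERRED.get(c, c) for c in codons)

def gc_content(seq):
    """Fraction of G + C in a nucleotide sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

cds = "ATGAGGAGAGCTGCTGCTCCCTAA"  # toy example, not a real gene
new = replace_rare_codons(cds)
print(new)  # ATGCGTCGTGCTGCTGCTCCGTAA
print(round(gc_content(cds[:9]), 2), round(gc_content(new[:9]), 2))  # 0.44 0.56
```

The toy example shows that the two goals can pull in opposite directions: swapping AGG and AGA for CGT removes the rare codons but raises the GC content of the first nine nucleotides, so the 5'-end should be re-checked after any codon replacement.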

SOLUBILIZATION OF INSOLUBLE PROTEINS

In many cases the expressed protein is insoluble and accumulates in so-called inclusion bodies, especially under conditions of high-level expression. Several strategies are available to improve the solubility of the expressed protein.

Reducing the rate of protein synthesis.

This can be done by:

lowering the growth temperature. This decreases the rate of protein synthesis, and usually more soluble protein is obtained.

using a weaker promoter (e.g. trc instead of T7).

using a lower copy number plasmid.

lowering the inducer concentration.

Changing the growth medium:

addition of prosthetic groups or co-factors that are essential for proper folding or for protein stability.

addition of buffer to control pH fluctuations in the medium during growth.

addition of 1% glucose to repress induction of the lac promoter by lactose, which is present in most rich media (such as LB, 2xYT).

addition of polyols (e.g. sorbitol) and sucrose. The increase in osmotic pressure caused by these additions leads to the accumulation of osmoprotectants in the cell, which stabilize the native protein structure.

addition of ethanol, low molecular weight thiols and disulfides, and NaCl.

Co-expression of chaperones and/or foldases.

Two classes of proteins play an important role in protein folding in vivo.

Molecular chaperones promote the proper isomerization and cellular targeting by transiently interacting with folding intermediates. The best characterized E. coli systems are:

GroES-GroEL

DnaK-DnaJ-GrpE

ClpB

Foldases accelerate rate-limiting steps along the folding pathway. Three types of foldases play an important role:

peptidyl prolyl cis/trans isomerases (PPIs)

disulfide oxidoreductase (DsbA) and disulfide isomerase (DsbC)

protein disulfide isomerase (PDI), a eukaryotic protein that catalyzes both protein cysteine oxidation and disulfide bond isomerization. It also exhibits chaperone activity.

Co-expression of one or more of these proteins with the target protein could lead to higher levels of soluble protein. The co-expression levels of the different chaperones/foldases have to be optimized for each individual case. DsbA and DsbC have also shown positive effects on expression levels when used as fusion partners.

Periplasmic expression:

Secretion of the target protein to the periplasm has a number of distinct advantages:

the oxidizing environment of the periplasm allows for the formation of disulfide bonds, which does not occur in the reducing environment of the cytoplasm.

the periplasm contains two foldases, disulfide oxidoreductase (DsbA) and disulfide isomerase (DsbC), that catalyze the formation and isomerization of disulfide bonds.

reduced proteolysis (since fewer proteins are present).

it allows the accumulation of proteins that are toxic in the cytoplasm.

an authentic N-terminus can be obtained, since the signal peptide is removed during secretion.

Secretion is achieved by adding a leader sequence (signal peptide) to the N-terminus of the target protein. The most commonly used leader sequences are pelB and ompT. Unfortunately, expression yields are usually much lower, and not all of the expressed protein is secreted into the periplasm; some is also found in the medium, the cytoplasm and the cytoplasmic membrane.

Using specific host strains:

The solubility of proteins containing disulfide bonds can be increased by using a host strain with a more oxidizing cytoplasmic environment. Two strains are commercially available (Novagen):

AD494, which has a mutation in thioredoxin reductase (trxB).

Origami, a double mutant in thioredoxin reductase (trxB) and glutathione reductase (gor).

Addition of a fusion partner:

Fusion of the N-terminus of a heterologous protein to the C-terminus of a soluble fusion partner often improves the solubility of the fusion protein.

Expression of a fragment of the protein:

E. coli does not express very large proteins (> 70 kDa) well. Choosing a smaller fragment of the target protein can improve expression levels and solubility.

The solubility of a poorly soluble (or insoluble) protein can also be improved by selecting only a soluble domain for expression.
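When deciding whether a protein or a candidate fragment falls above the > 70 kDa guideline, a quick estimate from the number of residues is often enough for a first pass. The Python sketch below assumes an average residue mass of about 110 Da, a common rule of thumb; the true mass depends on amino acid composition, so an exact calculation from the sequence is preferable for anything beyond a rough check. The function names and example ranges are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue (assumption).
AVG_RESIDUE_DA = 110.0

def fragment_mass_kda(start, end):
    """Approximate mass in kDa of residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    return (end - start + 1) * AVG_RESIDUE_DA / 1000.0

def max_residues(limit_kda=70.0):
    """Largest residue count whose estimated mass stays at or below limit_kda."""
    return int(limit_kda * 1000.0 // AVG_RESIDUE_DA)

print(fragment_mass_kda(1, 900))    # 99.0  (hypothetical 900-residue full-length protein)
print(fragment_mass_kda(101, 400))  # 33.0  (hypothetical 300-residue domain)
print(max_residues())               # 636
```

By this estimate, a construct of roughly 640 residues or more is already in the size range the text flags as problematic.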

In vitro denaturation and refolding of the protein:

If, despite all efforts, the target protein is still expressed in inclusion bodies, the last resort is to denature and refold the protein in vitro. This procedure is carried out in three phases:

isolation of the inclusion bodies.

solubilization and denaturation of the target protein. This is done by adding a denaturing agent (usually guanidine hydrochloride or urea) under reducing conditions (e.g. 20 mM DTT).

refolding of the protein by removing the denaturing agent using dialysis, dilution or chromatography. For proteins containing disulfide bonds, this has to be carried out in the presence of a redox shuttling system, e.g. reduced and oxidized glutathione.
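Removing the denaturant by dilution or dialysis is a simple mass balance. The Python sketch below applies C1 x V1 = C2 x V2 for dilution and, for dialysis, assumes that each buffer change runs to equilibrium, that the buffer contains no denaturant, and that the sample volume does not change. The concentrations, volumes and function names are illustrative assumptions, not recommended refolding conditions.

```python
def dilution_volume_ml(sample_ml, c_start, c_target):
    """Final volume (mL) that brings the denaturant from c_start to c_target (C1*V1 = C2*V2).

    Assumes the refolding buffer itself contains no denaturant.
    """
    if not 0 < c_target <= c_start:
        raise ValueError("need 0 < c_target <= c_start")
    return sample_ml * c_start / c_target

def after_dialysis(c_start, sample_ml, buffer_ml, changes):
    """Denaturant left after `changes` buffer exchanges, each assumed to reach equilibrium.

    Assumes denaturant-free buffer and a constant sample volume.
    """
    fraction_kept = sample_ml / (sample_ml + buffer_ml)
    return c_start * fraction_kept ** changes

# Example: 1 mL of solubilized inclusion bodies in 6 M guanidine hydrochloride.
print(dilution_volume_ml(1.0, 6.0, 0.5))  # 12.0 -> 1 mL sample + 11 mL refolding buffer
print(after_dialysis(6.0, 1.0, 99.0, 2))  # roughly 0.0006 M after two 99 mL changes
```

Dilution also lowers the protein concentration by the same factor, which tends to favour refolding over aggregation but leaves a larger volume to concentrate afterwards; dialysis keeps the volume small but needs several buffer changes.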