Recombinant proteins play a major role in all aspects of biological science. Selection of a suitable expression system is one of the most important decisions in the production of a recombinant protein. Although many nonbacterial expression systems, such as yeast, baculovirus, mammalian cell and cell-free systems, have been successfully applied for protein production,
Escherichia coli remains the organism of choice (
Chen, 2012). The advantages of
E. coli are its fast growth, relatively high protein yields, low cost, easy handling and versatile strains for the production of demanding target proteins
(Yin et al., 2007). However,
E. coli also has several disadvantages, including the lack of eukaryotic post-translational modifications, production of recombinant products in a nonfunctional or insoluble state, tightly coupled transcription and translation, and the lack of required cellular machinery. In addition, overexpression of heterologous proteins in the cytoplasm can result in the formation of unfolded or misfolded protein and inclusion bodies
(Zhang et al., 2004). Some targets fail to express in
E. coli or express insolubly as inclusion bodies. Heterologous proteins need a longer time and require molecular chaperones to fold correctly. In recent years, considerable efforts have been made to enhance solubilization in
E. coli. To overcome these disadvantages, strategies at five levels can be followed: (1) optimization of the target DNA, (2) changing the vector, (3) changing the host, (4) changing the culture parameters of the recombinant host, and (5) co-expression with other genes that may help to increase expression and proper folding of the desired protein. This review mainly focuses on the strategies used for the successful production of recombinant proteins in
E. coli.
Strategies for the production of recombinant protein
(I) Optimization of target DNA
Properties of the gene that affect the production of soluble protein include the presence of rare codons in the target mRNA, the size of the protein and the protein sequence. Rare or low-usage codons have been found in many organisms, including
E. coli. The rarest codons of
E. coli, such as AGG and AGA (arginine), CUA (leucine), AUA (isoleucine) and CCC (proline), play an important role in regulating the expression of different proteins (
Grunberg-Manago, 1999;
Harrison, 2000). The relative positions of these rare codons in the target lead to suppression of protein expression
(Lee et al., 1987b; Dumon-Seignovert et al., 2004; Hu et al., 2011). This suppression mostly occurs at the translational level, due to unavailability of the cognate tRNAs for those rare codons.
Moreover, even when rare-codon-rich targets are expressed, they are translated incorrectly and a high level of misincorporation occurs, mostly of lysine in place of arginine
(Calderone et al., 1996). Matching the target gene to the codon bias of
E. coli can overcome this problem. Nowadays, multiple web tools are available to locate and quantify the rare codons in the target gene, e.g. the Rare Codon Calculator (RaCC). Two approaches can be taken to overcome the rare codon issue. The first is codon optimization of the target gene. Codon optimization is a key step for the successful expression of a protein in
E. coli even from a distant host
(Snajder et al., 2015). Indeed, over the last decade, the use of codon-optimized genes in industrial biotechnology has reduced the cost of protein production, through improved protein expression (
Elena et al., 2014). Codon optimization is carried out by site-directed mutagenesis or gene synthesis. The latter is offered by multiple companies and is often faster and cheaper than site-directed mutagenesis. However, gene synthesis changes not only the rare codons but also the secondary structure of the mRNA, which affects translation efficiency (
Hatfield and Roth, 2007;
Burgess-Brown et al., 2008;
Welch et al., 2009; Menzella, 2011). In the second approach, the expression host is altered in such a way that it can express a target containing rare codons (
Francis and Page, 2010).
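The rare codons named above can be located in a coding sequence with a few lines of code. The following is a minimal sketch and not the RaCC algorithm: the function names and the toy sequence are illustrative assumptions, and the rare-codon set is limited to the five codons listed in the text (written as DNA, so CUA becomes CTA and AUA becomes ATA).

```python
# Rare E. coli codons named in the text, written as DNA (T instead of U).
RARE_CODONS = {"AGG", "AGA", "CTA", "ATA", "CCC"}


def find_rare_codons(cds):
    """Return (codon_index, codon) for every rare codon in an in-frame sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA input
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return [(i, codon) for i, codon in enumerate(codons) if codon in RARE_CODONS]


def rare_codon_fraction(cds):
    """Fraction of codons in the sequence that belong to the rare set."""
    n_codons = len(cds) // 3
    return len(find_rare_codons(cds)) / n_codons if n_codons else 0.0


# Hypothetical toy sequence: ATG AGG AGA CTG CCC TAA
toy = "ATGAGGAGACTGCCCTAA"
print(find_rare_codons(toy))
print(rare_codon_fraction(toy))
```

Reporting codon positions as well as counts is deliberate, since the text notes that the relative positions of rare codons influence expression.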
Signal peptides are usually involved in the export of a protein from its site of synthesis. The mature endogenous protein does not carry the signal peptide. A signal peptide retained in the recombinant protein may alter its function and biochemical characteristics. Removing the signal peptide coding sequence from the target sequence therefore increases the stability and expression of the recombinant protein (
Gopal and Kumar, 2013).
(II) Changing the properties of a vector
Once the target gene is ready, it should be subcloned into a vector which contains all the elements essential for transcription and translation of the target (
Studier and Moffatt, 1986).
An E. coli expression vector should contain fusion tags and other DNA sequence elements, including the origin of replication, promoters, regulatory elements, transcription terminators, an antibiotic resistance gene, etc.
Origin of replication
The origin of replication is the particular sequence at which replication is initiated. It becomes important in co-expression experiments, in which two different plasmids encoding different proteins are maintained in the same expression host (
Johnston and Marmorstein, 2003). In such a situation, the two plasmids should carry different origins of replication so that both are maintained and both proteins are expressed.
Selecting the suitable promoters
An effective promoter for protein expression should possess some key characteristics: it should be strong enough to allow the recombinant protein to accumulate to 10-30% or more of the total cellular protein, should exhibit minimal basal transcriptional activity, and should enable simple and inexpensive induction (
Jia and Jeon, 2016). Stringent regulation of the promoter is essential for the synthesis of proteins that are detrimental to the host cell. Selection of the appropriate promoter depends on the nature of the target protein and its downstream use. If the target is a toxic protein, one should use a promoter system with an extremely low basal expression (
araBAD promoter). On the other hand, for higher yields, a strong promoter such as T7 or
tac promoter should be selected
(Lee et al., 1987b). Cold-shock promoters are used for aggregation-prone proteins so that expression can also occur at low temperature.
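The selection rule described in this paragraph can be written as a small decision function. This is only a sketch of the reasoning in the text: the function name, the boolean flags and the order in which the conditions are checked are assumptions made for illustration, and real promoter choice involves more factors than these two.

```python
def suggest_promoter(toxic=False, aggregation_prone=False):
    """Map target-protein properties to the promoter class named in the text."""
    if toxic:
        # Toxic targets need extremely low basal expression.
        return "araBAD"
    if aggregation_prone:
        # Cold-shock promoters allow expression at low temperature.
        return "cspA"
    # Otherwise favour yield with a strong promoter.
    return "T7 or tac"


print(suggest_promoter(toxic=True))
print(suggest_promoter(aggregation_prone=True))
print(suggest_promoter())
```

Checking toxicity first is a design choice in this sketch: a toxic target may never reach useful cell densities, so tight regulation is given priority over yield or folding.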
Nowadays, promoters are genetically engineered to improve expression of recombinant protein in the host cell. These engineered promoters show a three- to four-fold increase in activity over the natural promoters. A mutant promoter library was constructed by randomization of
E. coli consensus promoter sequences. The mutant promoter library exhibited 27.5-fold higher activity than the lac promoter
(DeMey et al., 2007). Recent advances in genetic engineering have paved the way for systems that use two promoters to produce two different recombinant proteins simultaneously in the same expression system (
Joseph et al., 2015). Some of the most commonly used promoters are
T7 promoter,
araBAD promoter,
cspA promoter and the hybrid promoters.
a. T7 promoter
In the T7 promoter system, gene expression is driven by the RNA polymerase of bacteriophage T7. It is the most widely used promoter system for heterologous expression in
E. coli (
Gräslund et al., 2008). T7 RNA polymerase transcribes DNA five times faster than bacterial RNA polymerase (
Studier, 1991). This enzyme is absent in
E. coli, so it has to be supplied from a gene placed under the control of an inducible promoter (
Studier and Moffatt, 1986). Addition of IPTG induces transcription of the T7 RNA polymerase gene and synthesis of the enzyme.
T7 RNA polymerase initiates transcription of the target gene by binding to the
T7 promoter. The
lacUV5 promoter controls the
T7 RNA polymerase gene, so no transcription occurs in the absence of the inducer (
Studier, 1991). Once the system is activated, it can accumulate up to 50% of total cell protein (
Studier and Moffatt, 1986;
Studier, 1991). Leaky expression is one of the major drawbacks of this promoter: even minimal production of T7 RNA polymerase leads to leaky expression of the target protein, which is a problem when the protein is toxic to the host. To overcome this problem, one can use bacterial strains specially developed for the expression of toxic proteins (
Moffatt and Studier, 1987).
b. araBAD promoter
The
araBAD promoter is also known as the arabinose promoter.
araBAD is a strong, tightly regulated and titratable promoter system
(Lee et al., 1987b). This promoter is mainly used for expressing highly toxic proteins. It exhibits the lowest basal transcriptional activity. L-arabinose acts as the inducer for this promoter
(Lee et al., 1987b). The absence of L-arabinose or addition of glucose will suppress the expression of the protein.
c. cspA promoter
The cspA promoter is known as the cold-shock protein promoter. It drives efficient expression at low temperature, with optimal expression between 10°C and 25°C. It can reduce the formation of inclusion bodies and improve folding, so it is suitable for expressing aggregation-prone proteins (
Francis and Page, 2010). The major drawback of this promoter is that it is not completely repressed at higher temperatures, which leads to basal expression of the target protein
(Qing et al., 2004).
d. Hybrid promoters
The trc and tac promoters are hybrids of the naturally occurring trp and lacUV5 promoters. These promoters consist of the -10 region of the lacUV5 promoter and the -35 region of the trp promoter (Khlebnikov and Keasling, 2002). The spacing between the -35 and -10 sequences is 16 bp in the tac promoter, whereas the trc promoter has a 17 bp spacing. These promoters allow the target to accumulate to 15%-30% of the total cell protein. Their main disadvantage is very leaky expression of the target protein; therefore, they cannot be used for toxic protein expression
(Brosius et al., 1985).
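The 16 bp versus 17 bp spacing that distinguishes tac from trc can be checked on a sequence by measuring the distance between the two hexamers. The sketch below makes several assumptions that are not stated in the text: it uses the E. coli consensus hexamers TTGACA (-35) and TATAAT (-10), the function name is illustrative, and the example sequences use placeholder "N" spacers rather than the real tac or trc sequences.

```python
def spacer_length(promoter, box35="TTGACA", box10="TATAAT"):
    """Return the number of bases between the -35 and -10 hexamers, or None."""
    promoter = promoter.upper()
    i = promoter.find(box35)
    if i < 0:
        return None  # no -35 hexamer found
    spacer_start = i + len(box35)
    j = promoter.find(box10, spacer_start)
    if j < 0:
        return None  # no -10 hexamer downstream of the -35 hexamer
    return j - spacer_start


# Placeholder sequences: only the spacer lengths mirror the text.
tac_like = "TTGACA" + "N" * 16 + "TATAAT"
trc_like = "TTGACA" + "N" * 17 + "TATAAT"
print(spacer_length(tac_like))
print(spacer_length(trc_like))
```

Exact string matching is used for simplicity; a real promoter scan would need to tolerate mismatches in the hexamers.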
Selection of appropriate terminator
Termination of transcription plays an important role in minimizing the energy expenditure of the host cell. Transcription termination in prokaryotes is based on two different mechanisms: (1) Rho-dependent and (2) Rho-independent. In the Rho-dependent mechanism, the hexameric protein Rho helps to release the RNA transcript from the template, while in the Rho-independent mechanism, termination depends entirely on signals present in the template (
Richardson and Roberts, 1993;
Yang et al., 1995). Transcription termination reduces the metabolic burden on the host, and the terminator forms a secondary structure at the 3' end of the mRNA that increases mRNA stability
(Newbury et al., 1987). Promoter occlusion, in which transcription reading through from another promoter interferes with promoter function, is one factor that inhibits expression. Inserting a transcription terminator downstream or upstream of the coding sequence prevents continuous transcription from another promoter and also minimizes background transcription
(Tohru et al., 1994). Stop codon usage plays a vital role in the regulation of gene expression. The universal stop codons are TAA, TGA and TAG; sequence analysis of several genes in
E. coli reveals that TAA is the most frequently used. TAA is read by both release factors, so as a stop codon it not only secures termination but also ensures termination at high speed and with accuracy
(Saida et al., 2006).
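Stop codon usage of the kind described above can be tallied directly from a set of coding sequences. The following is a minimal sketch under stated assumptions: the function name is illustrative, each input is assumed to be an in-frame DNA coding sequence ending in its stop codon, and the three example sequences are invented for demonstration rather than taken from E. coli.

```python
STOP_CODONS = ("TAA", "TGA", "TAG")


def stop_codon_usage(cds_list):
    """Count which stop codon terminates each coding sequence."""
    counts = {stop: 0 for stop in STOP_CODONS}
    for cds in cds_list:
        last_codon = cds.upper()[-3:]
        if last_codon in counts:
            counts[last_codon] += 1
        # Sequences that do not end in a stop codon are skipped.
    return counts


# Hypothetical toy sequences, not real genes.
genes = ["ATGAAATAA", "ATGCCCTGA", "ATGGGGTAA"]
print(stop_codon_usage(genes))
```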
Fusion tags
Fusion tags are generally divided into affinity (purification) tags and solubility tags. Affinity tags allow rapid and efficient purification of proteins, while solubility tags enhance the proper folding and solubility of a protein. Solubility tags are therefore frequently used in tandem with an affinity tag to aid purification
(Zhao et al., 2013). Some of the commonly used fusion tags are glutathione S-transferase (GST), maltose-binding protein (MBP), N-utilization substance (NusA) and small ubiquitin-like modifier (SUMO). They have been widely reported to increase protein expression and solubility (
Esposito and Chatterjee, 2006).
a. Glutathione S-transferase (GST)
GST is 211 amino acids (roughly 28 kDa) in size. Glutathione S-transferase tags are generally used for expression and purification rather than as solubility-enhancing tags (
Esposito and Chatterjee, 2006;
Brown et al., 2008). GST tags are widely used because of their specific and robust binding to glutathione agarose, which allows single-step purification of the expressed protein (
Smith and Johnson, 1988).
b. Maltose-binding protein (MBP)
MBP is 396 amino acids (approximately 42 kDa) in size. MBP is naturally present in
E. coli, where it is encoded by the
malE gene. In
E. coli it is responsible for the uptake, transport, and breakdown of maltodextrin (
Routzahn and Waugh, 2002;
Nallamsetty et al., 2005). It significantly enhances the solubility of a recombinant protein and can even solubilize misfolded or unfolded protein (
Francis and Page, 2010;
Hewitt et al., 2011). MBP also functions as an affinity tag, since its efficient binding to sugars such as amylose enables protein purification. MBP can be fused to either end of a protein, so it can enhance solubility as an N- or C-terminal tag
(Dyson et al., 2004; Francis and Page, 2010).
c. N-utilization substance (NusA)
N-utilization substance (NusA) is a recently developed tag that enhances the solubility of a diverse set of proteins. NusA is 535 amino acids (59 kDa) in size. It is a transcription termination/antitermination factor in
E. coli. NusA is not an affinity tag, so it has to be coupled with a His6-tag to facilitate protein purification (
DeMarco, 2006).
d. Small ubiquitin-like modifier (SUMO)
The SUMO fusion system is used for prokaryotic expression. It was developed based on the observation that the addition of ubiquitin to the recombinant protein facilitates its solubility (
Peroutka et al., 2011). Several studies have demonstrated that the SUMO tag improves expression and solubility. The SUMO tag is 11.2 kDa, which is small compared with the other tags. One advantage of SUMO is that it has its own specific protease, Ulp, which recognizes the tertiary structure of the SUMO protein rather than a specific amino acid sequence and cleaves immediately after the C-terminal residue of SUMO
(Butt et al., 2005). The only disadvantage of SUMO protease is that cleavage is hindered when proline is the N-terminal amino acid of the target protein, because it restricts access to the protease active site.
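Because the tags differ widely in size, it can help to estimate how large the fusion product will be and how much of it is tag. The sketch below uses the tag sizes quoted in this section; the function names are illustrative assumptions, the 30 kDa target is hypothetical, and the simple sum ignores any linker or protease-site residues.

```python
# Tag sizes in kDa as quoted in the text.
TAG_KDA = {"GST": 28.0, "MBP": 42.0, "NusA": 59.0, "SUMO": 11.2}


def fusion_mass_kda(tag, target_kda):
    """Approximate mass of the tag-target fusion (linker residues ignored)."""
    return TAG_KDA[tag] + target_kda


def tag_fraction(tag, target_kda):
    """Fraction of the fusion mass contributed by the tag."""
    return TAG_KDA[tag] / fusion_mass_kda(tag, target_kda)


# Hypothetical 30 kDa target protein.
for tag in TAG_KDA:
    mass = fusion_mass_kda(tag, 30.0)
    print(f"{tag}: {mass:.1f} kDa fusion, {tag_fraction(tag, 30.0):.0%} tag")
```

A large tag fraction means that a high fusion yield corresponds to a smaller yield of the target itself after the tag is cleaved off.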
(III) Changing the host
The efficiency of protein expression depends on the appropriate selection of expression host
(Joseph et al., 2015). Although a number of expression hosts are available for protein production, the standard in the field remains
E. coli. Nowadays, many companies provide different types of genetically altered
E. coli strains suited to the expression of foreign genes. Many bacterial hosts have been selected and tested for efficient protein expression, and some of those strains were modified to improve protein expression (
Bass and Yansura, 2000). Commercially available
E. coli strains are specifically designed to express proteins that contain rare codons, are susceptible to proteolysis or require disulfide bonds. BL21 is one of the most widely used strains for checking basic protein expression in
E. coli.
Protease-deficient strains
1. BL21(DE3)
This
E. coli strain has the gene encoding T7 RNA polymerase introduced into its genome and is deficient in the Lon and OmpT proteases. OmpT is a bacterial endoprotease that readily cleaves T7 RNA polymerase. Lon protease is an ATP-dependent enzyme that rapidly degrades misfolded and recombinant proteins. Deletion of these two genes correlates with increased protein expression (
Gottesman, 1990).
2. BL21Star (DE3)
The protein yield also depends on the stability of the corresponding mRNA.
rne encodes RNase E, an enzyme that functions as an essential part of the “degradosome” to actively degrade mRNA within the cell. BL21Star (DE3) (Invitrogen) is a derivative of the BL21 (DE3) strain that carries the additional
rne131 mutation. The use of the BL21Star (DE3) strain increases mRNA stability, which in turn increases protein expression (
Carpousis, 2007).
Codon-supplemented strains
The codon frequency difference between the target gene and the expression host can lead to premature translation termination, translational stalling and amino acid misincorporation. The rare codons are those for arginine (AGA, AGG), isoleucine (AUA), leucine (CUA) and proline (CCC). Two approaches are followed to overcome rare-codon-associated problems: (1) changes are made to the gene, or (2) changes are made to the expression host, i.e.
E. coli expression strains are supplemented with the rare tRNAs. These rare tRNAs are co-expressed with the wild-type (non-optimized) target gene.
1. CodonPlus-RIL (BL21-RIL)
CodonPlus-RIL strains supply the tRNAs whose scarcity restricts translation of heterologous proteins from organisms that have AT-rich genomes
(Joseph et al., 2015). These BL21 strains are engineered to contain extra copies of the genes that encode rare tRNAs for Arg, Ile and Leu (
Rosano and Ceccarelli, 2009).
2. CodonPlus-RP (BL21-RP)
This strain is used to overcome the codon bias of GC-rich genomes. These bacterial strains contain extra copies of the
ileY, argU and
leuW tRNA genes. These genes encode tRNAs that recognize codons for isoleucine, arginine and leucine
(Joseph et al., 2015).
3. Rosetta
Rosetta host strains are derivatives of the BL21(DE3) strain, designed to enhance the expression of proteins that contain codons rarely used in
E. coli (Joseph et al., 2015); these strains contain the pRARE plasmid, which supplies tRNAs for all the above-mentioned codons plus GGA (Gly). Use of this strain increases heterologous protein expression but decreases protein solubility
(Milisavljević et al., 2009).
Strains to express disulfide-bonded proteins
Mutations in the glutathione reductase (gor) and thioredoxin reductase (
trxB) genes of the host strain aid the formation of cytosolic disulfide bonds and enhance the solubility of folded, disulfide-containing proteins
(Prinz et al., 1997).
1. E. coli Origami
Origami is a strain from Novagen. The Origami strain is a trxB (thioredoxin reductase) mutant, so disulfide bond formation in the cytoplasm is enhanced, allowing proper folding of the recombinant protein; the Origami strain also lacks the glutathione reductase gene.
2. ‘SHuffle’ E. coli strain
SHuffle strains from NEB are better than the Origami strain for the expression of putative disulfide-bonded proteins. SHuffle strains express DsbC within the cytoplasm in addition to carrying the trxB and gor mutations. DsbC directs correct disulfide bond formation and also acts as a general chaperone for protein folding
(Lobstein et al., 2012).
Strains to express toxic proteins
Leaky expression of T7 polymerase is observed in BL21(DE3). To minimize the leaky expression of a toxic gene, the BL21 host strain was improved.
1. BL21-AI
The araBAD promoter controls the T7 RNA polymerase gene in the BL21-AI strain. The
araBAD promoter system is optimal for expressing toxic proteins. In BL21-AI cells, basal expression of T7 RNA polymerase, and hence of the target gene, is strongly reduced in the presence of glucose and the absence of arabinose (
Chen and Leong, 2009;
Yao et al., 2009).
2. BL21(DE3) pLysS
BL21(DE3) pLysS strains express T7 phage lysozyme, an enzyme that effectively inhibits T7 RNA polymerase activity, so the basal expression of the target protein is decreased. Culturing BL21(DE3) pLysS requires chloramphenicol
(Lefebvre et al., 2008).
Strains to express globular or membrane protein
Lemo21 (DE3) is a derivative of the BL21(DE3) strain. In this strain, the activity of T7 RNA polymerase is modulated by its inhibitor, T7 lysozyme, which is expressed from the well-titratable rhamnose promoter (Prha). The Lemo21 (DE3) strain is compatible with plasmids containing a ColE1 or pMB1 origin. Chloramphenicol is required for the maintenance of this strain
(Schlegel et al., 2012).
Methylation deficient strains
Bacteria possess restriction and modification systems that allow them to identify and destroy foreign DNA. This is a significant problem in cloning, resulting in substantially reduced recovery of the desired sequences. Most
E. coli strains contain several methylation-dependent restriction systems, namely McrA, McrBC and Mrr. These problems can be avoided by using strains in which the restriction and modification systems are disabled.
1. McrA, McrBC
The methylcytosine restricting endonucleases (McrA, McrBC) cleave methylcytosines in the sequences CG and (A/C) G, respectively. Inactivation of the pathway that cleaves methylated cytosine DNA allows uptake of foreign DNA.
2. Mrr
Mrr (Methyl adenine recognition and restriction) will attack DNA with methyladenine in specific sequences. Inactivation of the pathway that cleaves methylated adenine DNA allows uptake of foreign DNA.
Strain for expression at low temperature
When
E. coli is transformed to manufacture large amounts of recombinant protein, the protein sometimes forms dense aggregates of insoluble misfolded proteins, known as inclusion bodies. Reduction in cultivation temperature (15-25°C) avoids or decreases the inclusion body formation.
San-Miguel et al. (2013) reported successful protein expression at 4°C for 72 h. At low temperature, however, the expression of chaperones, which fold newly synthesized or misfolded protein, is also reduced drastically.
Arctic Express
The ArcticExpress strain (Agilent Technologies) is derived from the high-performance Stratagene BL21-Gold strain. This strain co-expresses the cold-adapted chaperonins Cpn10 and Cpn60 from a psychrophilic bacterium,
Oleispira antarctica (Ferrer et al., 2004). Cpn10 and Cpn60 are effective folding modulators at low temperatures (4°C to 12°C) and confer an enhanced ability for
E. coli to grow at lower temperatures
(Ferrer et al., 2003).
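The host choices described in this section can be summarised as a lookup from a property of the target to the strains discussed for it. This is an illustrative sketch only: the requirement keys and the function name are assumptions introduced here, the strain lists simply restate the headings above, and the fallback to BL21(DE3) reflects the text's remark that BL21 is widely used to check basic expression.

```python
# Requirement keys are hypothetical labels; strain names come from the text.
STRAINS_BY_NEED = {
    "protease_sensitive": ["BL21(DE3)"],
    "unstable_mrna": ["BL21Star(DE3)"],
    "rare_codons": ["CodonPlus-RIL", "CodonPlus-RP", "Rosetta"],
    "disulfide_bonds": ["Origami", "SHuffle"],
    "toxic": ["BL21-AI", "BL21(DE3) pLysS"],
    "membrane": ["Lemo21(DE3)"],
    "low_temperature": ["ArcticExpress"],
}


def candidate_strains(needs):
    """Collect the strains discussed for each requirement, without duplicates."""
    strains = []
    for need in needs:
        if need not in STRAINS_BY_NEED:
            raise KeyError(f"unknown requirement: {need}")
        for strain in STRAINS_BY_NEED[need]:
            if strain not in strains:
                strains.append(strain)
    # With no special requirement, start from the basic expression strain.
    return strains or ["BL21(DE3)"]


print(candidate_strains(["toxic", "rare_codons"]))
```

The function returns candidates rather than a single answer, because the section presents these strains as options to test and not as a ranked recommendation.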
(IV) Changing the culture condition of the recombinant host
The culture conditions of
E. coli also affect recombinant protein expression and solubility.
E. coli is a prokaryotic organism, so transcription and translation are coupled. Strong promoters can therefore cause the protein to aggregate before it folds. This problem can be solved by reducing the rate of transcription or translation so that the protein folds properly (
Francis and Page, 2010). The culture condition of the particular recombinant strain should be changed to enhance the solubility of the protein.
1. Temperature and inducer concentration
Prolonged induction at low temperature increases the solubility of recombinant protein
(Kataeva et al., 2005; Volontè et al., 2008;
Piserchio et al., 2009). At lower expression temperatures, most proteases are less active, so the rate of protein degradation is also reduced. Lower temperature reduces the metabolic rate of the bacteria, which leads to reduced protein aggregation
(Sahdev et al., 2008). However, this comes with prolonged cultivation times. In addition to lowering the expression temperature, reducing the concentration of the inducer (e.g. IPTG) reduces the transcription rate and enhances recombinant protein solubility
(Turner et al., 2005; Francis and Page, 2010;
Gopal and Kumar, 2013).
At lower temperatures, cell processes slow down, leading to reduced rates of transcription, translation and cell division and to reduced protein aggregation. Lowering the expression temperature also reduces the degradation of proteolytically sensitive proteins.
2. Media
Batch culture is the most common method used to cultivate bacterial cells for recombinant protein expression. To optimize the level of expression, it is worth fine-tuning the culture medium, because manipulating the media composition is comparatively cheap and easy
(Vincentelli et al., 2003). Although changing the media composition has only a limited impact on recombinant protein expression, all the essential nutrients required for growth must be provided from the beginning
(Sahdev et al., 2008). Various media such as LB, TB and 2YT can be used to optimize protein expression, and addition of prosthetic groups or cofactors to the culture medium helps prevent the formation of inclusion bodies
(Joseph et al., 2015).
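The culture parameters discussed in this section (temperature, inducer concentration and medium) are often varied together in a small-scale screen. The sketch below enumerates such a grid. The temperatures fall within the 15-25°C range mentioned earlier and the media are those named above, but the IPTG concentrations, the function name and the dictionary keys are hypothetical choices made for illustration.

```python
from itertools import product

TEMPERATURES_C = (15, 20, 25)   # within the range discussed in the text
IPTG_MM = (0.1, 0.5, 1.0)       # hypothetical inducer concentrations (mM)
MEDIA = ("LB", "TB", "2YT")     # media named in the text


def screening_grid(temps=TEMPERATURES_C, iptg=IPTG_MM, media=MEDIA):
    """Enumerate every combination of temperature, inducer level and medium."""
    return [
        {"temp_c": t, "iptg_mm": c, "medium": m}
        for t, c, m in product(temps, iptg, media)
    ]


grid = screening_grid()
print(len(grid), "conditions")
print(grid[0])
```

A full factorial grid grows quickly, so in practice one would probably fix one parameter at a time or screen only a subset of these combinations.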
(V) Co-Expression with other genes
Simultaneous expression of additional genes is sometimes required, either to stabilize the recombinant protein or because the proteins they encode interact with it. The genes coding for those proteins should be co-expressed with the target protein. Molecular chaperones are one such class of proteins; when chaperones are co-expressed with the gene of interest, they increase the solubility and expression of the protein (
Francis and Page, 2010;
DeMarco et al., 2005; Gopal and Kumar, 2013). Co-expression of a single-chain variable fragment (scFv) antibody with a molecular chaperone effectively improved correct folding and enhanced the solubility of the scFv
(Sonoda et al., 2011).