Recombinant Proteins: Production, Expression Systems, and Therapeutic Uses
What Are Recombinant Proteins?
A recombinant protein is a protein made by genetically engineered cells rather than extracted directly from its natural source. Scientists insert the gene for the desired protein into a production host such as Escherichia coli, yeast, insect cells, or mammalian cells, and the host cell then produces the protein during growth.
More precisely, a recombinant protein is a protein whose gene has been deliberately introduced into a host cell using recombinant DNA technology, so that the host’s transcription and translation machinery synthesizes the protein from scratch. The host – the cell or organism doing the synthesis – can be a bacterium, a yeast, a cultured insect or mammalian cell, or even a transgenic plant or animal. This approach replaces the older practice of extracting proteins directly from their natural source, such as insulin from animal pancreases or clotting factors from pooled human plasma, and the substitution is what makes modern biotechnology possible.
Key points: Recombinant proteins are made by inserting a protein-coding gene into a production host such as bacteria, yeast, insect cells, or mammalian cells. Simple proteins can often be made quickly in E. coli, while complex therapeutic proteins such as antibodies usually require mammalian cells because they need human-like folding and glycosylation. Recombinant proteins include insulin, growth hormone, monoclonal antibodies, vaccine antigens, clotting factors, and many industrial enzymes.
The technology works because the genetic code is essentially universal: a bacterial ribosome can translate an mRNA transcribed from a human coding sequence, provided that the expression construct includes bacterial regulatory elements such as an appropriate promoter and ribosome-binding site. Recombinant protein production grew directly out of the 1973 Cohen–Boyer experiment, which showed that DNA fragments cut with restriction enzymes could be ligated into a bacterial plasmid and replicated inside Escherichia coli.
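To make the universality point concrete, here is a minimal Python sketch that translates a coding sequence with the standard genetic code, the same table a bacterial or a human ribosome effectively follows. The function name `translate` and the example sequence are illustrative assumptions; the sequence is one possible DNA encoding of a start methionine followed by the first ten residues of the human insulin B chain.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a DNA or RNA coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative sequence (an assumption, not taken from a real construct).
example_cds = "ATGTTTGTGAACCAGCACCTGTGCGGCAGCCACTAA"
print(translate(example_cds))  # MFVNQHLCGSH
```

The sketch covers only the decoding step. As the paragraph above notes, whether the message is transcribed and translated at all in a given host depends on regulatory elements such as the promoter and ribosome-binding site.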
Within five years, scientists at Genentech, City of Hope National Medical Center, and the University of California had used the same approach to express the human insulin A and B chains in E. coli, and in October 1982 the U.S. Food and Drug Administration approved Eli Lilly’s Humulin as the first recombinant protein drug for human use. By the late 1980s, recombinant human growth hormone, interferon-α, erythropoietin, and tissue plasminogen activator had followed Humulin to market, establishing the template – gene cloned into a bacterial or mammalian host, expressed in a stirred-tank bioreactor, purified to clinical grade – that still defines the industry.
Today recombinant proteins are one of the largest and most commercially important classes of biological drugs and a foundational tool in research. They include therapeutic insulins, growth hormone, erythropoietin, clotting factors, interferons, granulocyte colony-stimulating factor, virtually all therapeutic monoclonal antibodies, many vaccine antigens, and a long list of industrial enzymes used in laundry detergents, food processing, and biofuel production. Between January 2018 and June 2022 alone, regulators in the United States and European Union approved 197 new biopharmaceutical products, the great majority of which were recombinant proteins.
How Recombinant Proteins Are Made
Production of a recombinant protein begins with the gene encoding it. The coding sequence is either amplified from a natural source by polymerase chain reaction or, increasingly, synthesized chemically from scratch with codon usage optimized for the chosen host. The gene is cloned into an expression vector – typically a plasmid – that carries a strong promoter to drive transcription, a ribosome-binding site or Kozak sequence appropriate to the host, a transcription terminator or polyadenylation signal, and a selectable marker that allows researchers to identify cells carrying the construct. The vector is then introduced into the host by transformation (for bacteria), transfection (for mammalian cells), electroporation, or viral transduction, and individual high-producing clones are isolated and expanded.
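The codon-optimization step can be sketched as a simple back-translation: each amino acid is replaced by a codon the host is assumed to prefer. The preference table below is an illustrative placeholder for E. coli, and `back_translate` is a hypothetical helper. A real project would use a measured codon-usage table for the chosen host, and gene-design tools typically also weigh factors such as GC content, mRNA structure, and unwanted restriction sites.

```python
# Illustrative placeholder: one assumed "preferred" codon per amino acid.
# Replace with a measured codon-usage table for the actual host.
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str, preferred=PREFERRED_CODONS, stop: str = "TAA") -> str:
    """Return a coding sequence that uses one preferred codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in preferred:
            raise ValueError(f"no codon defined for residue {residue!r}")
        codons.append(preferred[residue])
    codons.append(stop)  # terminate the open reading frame
    return "".join(codons)


print(back_translate("MFVNQ"))  # ATGTTTGTGAACCAGTAA
```

Always picking the single most common codon is the crudest possible strategy. It is shown here only to make the idea concrete.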
Inside each cell, the host’s ribosomes translate the foreign messenger RNA into the recombinant protein, which then folds and undergoes whatever post-translational modifications the host is capable of performing.
Industrial production scales this workflow up by orders of magnitude. A research cell line is adapted to a chemically defined, animal-component-free medium, expanded through progressively larger seed cultures, and finally inoculated into a stainless-steel or single-use bioreactor. Chinese hamster ovary (CHO) cell production runs typically last 10 to 14 days in fed-batch mode, with periodic or continuous feeding of glucose, amino acids, and other nutrients to sustain high cell density and product titer. Bacterial processes use fermentation rather than mammalian cell culture and operate on much shorter cycles, often producing harvestable biomass within 24 to 48 hours.
Once expression is complete, the protein must be recovered and purified. Soluble proteins secreted into the culture supernatant are captured directly from the clarified medium; proteins retained inside the cell are released by mechanical or chemical lysis; and proteins that misfold and aggregate into dense intracellular deposits called inclusion bodies are solubilized in strong denaturants such as urea or guanidinium chloride and refolded in vitro.
Downstream bioprocessing typically combines a capture step, one or more polishing chromatography steps such as ion exchange and size exclusion, and viral inactivation and removal steps for therapeutic products. Affinity tags such as polyhistidine sequences are often appended to the protein to simplify laboratory-scale purification, although they are usually removed or omitted in clinical-grade material. Each step is characterized and the overall process validated against regulatory specifications for purity, potency, and the absence of host cell proteins, residual DNA, and endotoxin.
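Because each purification step loses some product, the overall recovery of a downstream train is the product of the individual step recoveries. The short sketch below shows that arithmetic. All numbers (step recoveries, titer, and bioreactor volume) are hypothetical values chosen for illustration, not figures from any real process.

```python
from math import prod


def overall_recovery(step_recoveries: dict) -> float:
    """Multiply fractional step recoveries (each between 0 and 1)."""
    for name, fraction in step_recoveries.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"recovery for {name!r} must be between 0 and 1")
    return prod(step_recoveries.values())


def purified_mass_g(titer_g_per_l: float, volume_l: float, step_recoveries: dict) -> float:
    """Estimate purified product mass from harvest titer, volume, and step losses."""
    return titer_g_per_l * volume_l * overall_recovery(step_recoveries)


# Hypothetical three-step train: capture, then two polishing steps.
steps = {"capture": 0.90, "ion_exchange": 0.85, "size_exclusion": 0.90}

print(f"overall recovery: {overall_recovery(steps):.1%}")
print(f"purified mass: {purified_mass_g(3.0, 2000.0, steps):.0f} g")
```

Even with each step recovering 85 to 90 percent in this made-up example, roughly a third of the product is lost across the train, which is one reason every added step has to be justified.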
Expression Systems: Bacteria, Yeast, Insect, and Mammalian Cells
Choosing an expression host is the single most consequential decision in a recombinant protein project, because each system imposes its own ceiling on protein complexity, post-translational modification, scale, and cost. The four mainstream platforms (bacteria, yeast, insect cells, and mammalian cells) each occupy a distinct niche, and approved recombinant protein drugs are manufactured in all four. Among mammalian platforms, Chinese hamster ovary (CHO) cells are by far the most important, accounting for the majority of approved monoclonal antibodies and complex therapeutic proteins. A fifth category, transgenic plants and animals, contributes a small but growing number of approved products.
| Expression system | Typical host | Strengths | Limitations | Representative products |
|---|---|---|---|---|
| Bacterial | E. coli, Bacillus subtilis | Fast growth (hours), low media cost, high yield, simple genetics | No glycosylation, frequent misfolding into inclusion bodies, endotoxin contamination | Human insulin, growth hormone, interferons, granulocyte colony-stimulating factor |
| Yeast | Saccharomyces cerevisiae, Pichia pastoris | Eukaryotic folding, secretion, basic glycosylation, scalable fermentation | Hyper-mannosylated glycans not native to humans, slower than bacteria | Hepatitis B vaccine antigen, insulin analogs, virus-like particles |
| Insect cell / baculovirus | Sf9, Sf21, High Five | Complex folding, disulfide bonds, suitable for virus-like particles and structural studies | Glycosylation differs from human, transient expression, slower scale-up | Cervarix HPV vaccine, Flublok influenza vaccine antigens |
| Mammalian cell | Chinese hamster ovary (CHO), HEK293, NS0, BHK, PER.C6 | Human-like glycosylation, complex folding, secretion, regulatory familiarity | Slow growth (days), expensive media, lower volumetric yield | Monoclonal antibodies, erythropoietin, clotting factors, tissue plasminogen activator |
| Transgenic plant / animal | Tobacco, carrot cells, transgenic goats and rabbits | Potentially low cost at large scale; complex glycosylation possible | Long development times, regulatory complexity, batch-to-batch variability; commercial track record remains thin | Elelyso (taliglucerase alfa), ATryn (recombinant antithrombin) |
In practice, the choice is dominated by whether the target protein needs human-like glycosylation. Proteins that do are generally made in mammalian cells, and in recent approval datasets CHO cells account for close to 90 percent of mammalian-cell-derived recombinant therapeutics. Bacterial systems remain dominant for small, non-glycosylated hormones and growth factors, where their speed and cost advantages are decisive. Yeast occupies a middle ground for proteins that need eukaryotic folding but tolerate yeast-pattern glycans, including most virus-like-particle vaccines.
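The rule of thumb in this section can be written down as a small decision function. This is a deliberately simplified sketch: the function name `suggest_host`, its three yes/no inputs, and the returned labels are assumptions made for illustration, and a real host decision also weighs yield, timelines, cost, and regulatory history.

```python
def suggest_host(
    needs_human_like_glycosylation: bool,
    needs_eukaryotic_folding: bool,
    tolerates_yeast_glycans: bool,
) -> str:
    """Return a first-pass host suggestion based on the heuristics in the text."""
    if needs_human_like_glycosylation:
        # Glycosylation requirements dominate the choice.
        return "mammalian (CHO)"
    if needs_eukaryotic_folding:
        if tolerates_yeast_glycans:
            return "yeast"
        # Assumption drawn from the comparison table, not a firm rule.
        return "insect or mammalian"
    # Small, non-glycosylated proteins: speed and cost favor bacteria.
    return "bacterial (E. coli)"


print(suggest_host(True, True, False))    # mammalian (CHO)
print(suggest_host(False, True, True))    # yeast
print(suggest_host(False, False, True))   # bacterial (E. coli)
```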
Applications in Medicine and Industry
The largest medical application of recombinant proteins is replacement therapy for missing or defective endogenous proteins. Recombinant human insulin treats diabetes, recombinant human growth hormone treats growth hormone deficiency, recombinant erythropoietin treats anemia in chronic kidney disease and chemotherapy patients, and recombinant clotting factors VIII and IX treat hemophilia A and B. Insulin, growth hormone, and the clotting factors each replaced an earlier product extracted from animal or human tissues or pooled human plasma, eliminating supply constraints and the risk of bloodborne pathogen transmission that became starkly apparent when contaminated plasma-derived clotting factors caused widespread HIV and hepatitis C infection in hemophilia patients during the 1980s.
A second large category is therapeutic antibodies – recombinant immunoglobulins designed to bind a specific molecular target on cancer cells, immune cells, cytokines, or pathogens. Monoclonal antibodies as a group account for the largest number of biopharmaceutical approvals and the highest sales of any drug class, with products such as adalimumab, pembrolizumab, and trastuzumab generating tens of billions of dollars in annual revenue. Recombinant subunit vaccines – including the hepatitis B and human papillomavirus vaccines and the protein-based COVID-19 vaccines – deliver a recombinant pathogen antigen to elicit immunity without using live virus. Recombinant proteins also underpin adjacent therapeutic platforms: the adeno-associated virus vectors used in gene therapy, for example, are produced by transient transfection of plasmids encoding recombinant viral capsid and helper proteins.
Outside medicine, recombinant proteins drive a substantial industrial enzyme market. Recombinant amylases, proteases, lipases, and cellulases produced in bacterial and fungal hosts are used in laundry and dishwasher detergents, food processing, leather and textile manufacturing, biofuel ethanol production, and animal feed. Recombinant chymosin, the calf stomach enzyme used in cheese-making, was the first recombinant enzyme approved for use in food and is now used in the great majority of industrial cheese production worldwide. Diagnostic kits, structural biology, drug discovery screens, protein engineering projects, and laboratory reagents likewise depend on recombinant proteins as their core inputs.
Limitations and Challenges
Despite five decades of refinement, recombinant protein production remains technically demanding and commercially expensive. Many proteins fail to fold correctly in bacterial hosts and accumulate as inclusion bodies, requiring laborious solubilization and refolding to recover active material. Eukaryotic expression systems fold complex proteins more reliably but are slower, costlier, and lower-yielding. Membrane proteins, large multi-subunit complexes, and proteins with extensive disulfide networks remain difficult to express in any system, which limits the structural and pharmacological study of important drug targets such as G-protein-coupled receptors and ion channels.
Glycosylation differences between hosts create both clinical and reproducibility problems. Yeast-produced proteins carry hyper-mannosylated glycans that are recognized as foreign by the human immune system; insect cells produce paucimannose glycans that lack the terminal sialic acid required for long serum half-life; and even CHO cell glycans differ subtly from human patterns and can introduce non-human sugars such as N-glycolylneuraminic acid. These differences affect a recombinant protein’s pharmacokinetics, immunogenicity, and effector function.
Reproducibility is challenged from a second direction as well: even within a single clonal CHO cell line, individual cells differ in transgene copy number, integration site, and glycosylation profile, producing a population of product molecules that vary in charge, glycan composition, and biological activity. Modern strategies to control this heterogeneity include site-specific transgene integration into pre-validated genomic landing pads, glycoengineered host cell lines with edited glycosyltransferase genes, and tighter control of bioreactor parameters such as dissolved oxygen, temperature, and pH.
Cost and access are the other defining limitations. Complex recombinant biologics can require very large investments in development, clinical trials, regulatory work, manufacturing facilities, and quality control. Manufacturing costs vary widely by molecule and process, but downstream chromatography resins, single-use bioreactor components, sterile operations, and analytical testing can all add substantial recurring expense. The result is that many biopharmaceuticals still carry high per-patient treatment costs, especially before biosimilar competition develops, and these prices can strain healthcare systems even in wealthy countries.
Future Directions
The next phase of recombinant protein production is being shaped by three converging developments. Continuous, intensified bioprocessing using perfusion bioreactors and integrated chromatography aims to reduce manufacturing footprints and capital costs, although fed-batch production remains the industry standard for many biologics. Cell-free expression systems, which carry out transcription and translation in cell extracts or mixtures of purified components rather than in intact cells, are moving from research curiosities toward more practical platforms for personalized therapeutics and difficult-to-express targets such as membrane proteins.
AI-driven protein design – recognized by the 2024 Nobel Prize in Chemistry awarded for computational protein design and structure prediction – is being combined with high-throughput recombinant expression to engineer proteins with properties rarely or never seen in nature, from enzymes that degrade plastic to therapeutic binders custom-designed against intractable disease targets. Together, these advances are likely to keep recombinant proteins central to biotechnology even as new modalities such as messenger RNA, gene therapy, and cell therapy expand the medicines pipeline alongside them.
Frequently Asked Questions
What is the difference between a natural protein and a recombinant protein? A natural protein is one extracted from the tissue or organism in which it is normally produced, such as insulin from animal pancreases or clotting factors from human plasma. A recombinant protein has the same amino acid sequence but is produced by a host cell into which the corresponding gene has been deliberately introduced, typically a bacterium, yeast, or mammalian cell line. The two molecules can be chemically identical, but the recombinant version offers advantages in supply, purity, and freedom from contaminants such as bloodborne viruses.
What was the first recombinant protein drug? Recombinant human insulin, marketed by Eli Lilly as Humulin, was the first recombinant protein drug approved for human use. The U.S. Food and Drug Administration approved it in October 1982, less than six months after Lilly submitted the new drug application. The product was produced in Escherichia coli using technology developed by Genentech and City of Hope National Medical Center, and it replaced animal-derived insulin extracted from cow and pig pancreases.
Why are most therapeutic recombinant proteins made in CHO cells rather than bacteria? Chinese hamster ovary (CHO) cells perform the post-translational modifications, especially N-linked glycosylation, that complex human proteins such as antibodies, clotting factors, and many hormones require to be active and non-immunogenic in patients. Bacteria such as E. coli grow faster and more cheaply but cannot add human-like sugar chains to proteins. CHO cells also tolerate suspension culture in large bioreactors and produce proteins with glycosylation patterns close enough to human that they are well tolerated clinically. As a result, CHO cells account for close to 90 percent of mammalian-cell-derived recombinant therapeutics in recent approval datasets.
What is a biosimilar of a recombinant protein? A biosimilar is a recombinant protein drug developed to be highly similar to an already-approved ‘reference’ biologic, typically launched after the reference product loses patent protection. Unlike small-molecule generics, biosimilars cannot be exact copies because each manufacturer uses its own proprietary cell line, culture conditions, and purification process, and these differences inevitably produce small variations in glycosylation and structure. Regulators therefore require extensive analytical characterization and bridging clinical studies to demonstrate that a biosimilar matches the reference product in efficacy and safety. The first U.S. biosimilar, filgrastim-sndz, was approved in March 2015.
How long does it take to develop a recombinant protein from gene to clinical-grade product? Generating a stable mammalian cell line that produces a candidate recombinant protein has traditionally taken six to twelve months, though platform CHO lines using pre-validated genomic landing pads can compress this to roughly three to four months for routine targets. Adapting the cell line to a chemically defined medium, optimizing the bioreactor process, and developing a downstream purification train usually adds another six to twelve months. Including regulatory submissions and clinical trials, moving a new recombinant protein from gene to approved drug remains a process measured in years rather than months.
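The pre-clinical part of that timeline can be added up directly. The sketch below assumes the two phases run one after the other, which the text implies but does not state, and uses only the month ranges quoted above. The helper name `total_range` is hypothetical.

```python
def total_range(phases):
    """Sum (low, high) month ranges for phases assumed to run sequentially."""
    low = sum(lo for lo, _ in phases)
    high = sum(hi for _, hi in phases)
    return low, high


# Month ranges taken from the answer above.
traditional = [(6, 12), (6, 12)]   # cell line generation, then process development
landing_pad = [(3, 4), (6, 12)]    # platform CHO line, then process development

print(total_range(traditional))  # (12, 24)
print(total_range(landing_pad))  # (9, 16)
```

On these figures, a platform cell line shortens the pre-clinical path by a few months, but clinical trials and regulatory review still dominate the overall schedule.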