"It is always the simple that produces the marvelous."
- Amelia Barr
Engineering eukaryotic cells to perform complex functions demands the coexpression of an increasing number of genes. This poses a challenge because polycistronic expression is far less prevalent in eukaryotes than in bacteria. Although well-studied polycistronic expression systems exist, newer and more effective alternatives are valuable tools for reducing the length and complexity of genetic constructs.
In the literature, we found a study by Yue et al. (2023) that generated a sequence, named IGG6, capable of driving multicistronic expression in various fungal species. More recently, Ma et al. (2024) renamed this sequence NAL10, for Nucleic Acid Linker, and successfully implemented it in plant and mammalian cells.
Here we discuss the most widely used systems for polycistronic expression in eukaryotes, IRES and 2A peptides, and compare them with the recently identified IGG6/NAL10 sequence, now available to the iGEM community in the Registry (BBa_K5466001).
One of the most studied strategies is the use of internal ribosome entry sites (IRES), RNA sequences native to viruses that promote translation of downstream open reading frames in a 5'-cap-independent manner.
Typically, in eukaryotic cells, the 7-methylguanosine cap constitutes a signal for the assembly of initiation factors and recruitment of the ribosome. IRES fold into complex secondary structures that can interact with initiation factors as well, leading to ribosomal positioning and translation.
Schematic representation of the IRES mechanism: recruitment of a ribosome by the IRES can drive translation in a 5'-cap-independent manner
IRES have been extensively applied in yeast and higher eukaryotes to build bicistronic and polycistronic constructs, permitting the production of separate, unmodified proteins. Three different types of IRES are known, and their performance varies with the cell line. Natural IRES are usually over 500 nucleotides long, although minimal alternatives have been optimised. However, IRES-driven translation is less efficient than 5'-cap-dependent translation, resulting in lower expression that might not meet the circuit's requirements (Wang & Marchisio, 2021).
Also originating from viruses, 2A peptides are oligopeptides 18-22 amino acids long that appear between two CDSs in certain regions of the viral genome. These peptides have a highly conserved sequence at the C-terminus (GDVEXNPGP) and are said to have a "self-cleavage" capacity, because they impair the formation of the glycyl-prolyl peptide bond at the end of that sequence.
This process, called ribosomal skip, permits the coexpression of genes separated by 2A peptide-coding sequences and therefore the construction of multicistronic sequences. Many 2A peptides with different efficiencies have been characterised, allowing for fine-tuning the levels of expression of different proteins in multigene constructs.
Mechanism of 2A peptides: three possible outcomes can occur: (i) readthrough with production of a fusion protein, (ii) ribosome skip with the production of two separate but modified proteins and (iii) ribosome drop-off with synthesis of the upstream protein
The short length of 2A peptides and their generally higher efficiency compared to IRES make them a good alternative for synthetic biology applications. However, a clear downside is that the sequences of the desired proteins are modified: the first remains fused to the 2A peptide (except for its C-terminal proline) and the second gains a proline residue at its N-terminus (Wang & Marchisio, 2021). These alterations can result in improper protein folding and malfunction, as well as interference with signal peptide function. In addition, a failure of ribosomal skipping can result in no production of the second protein, or in the synthesis of a fusion protein when readthrough occurs.
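The sequence changes described above can be made concrete with a short sketch. The Python snippet below is an illustration, not a design tool: it splits a translated sequence at the conserved 2A motif, assuming every ribosomal skip succeeds (no readthrough or drop-off). The function name split_at_2a, the toy flanking sequences and the 2A-like linker are hypothetical choices made for this example.

```python
import re

# Conserved C-terminal 2A motif from the text: G-D-V-E-X-N-P-G-P (X = any residue).
MOTIF_2A = re.compile(r"GDVE.NPGP")


def split_at_2a(polyprotein):
    """Split a translated sequence at each 2A site, assuming every skip succeeds.

    The break falls between the final glycine and proline of the motif, so the
    upstream product keeps the 2A tail (minus that proline) and the downstream
    product starts with the proline.
    """
    products = []
    start = 0
    for match in MOTIF_2A.finditer(polyprotein):
        cut = match.end() - 1  # index of the motif's final proline
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])
    return products


# Hypothetical toy sequences, not real proteins.
upstream = "MAAAK"
linker_2a = "QCTNYALLKLAGDVESNPGP"  # illustrative 2A-like peptide ending in the motif
downstream = "MTTTR"

products = split_at_2a(upstream + linker_2a + downstream)
print(products)
```

In this toy case the first product carries the 2A tail ending in glycine and the second begins with the extra proline, which is exactly the modification that an IRES or IGG6/NAL10 junction avoids.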
IGG6/NAL10 is a 9-bp-long sequence which resulted from the optimization of IGG1, an intergenic sequence that had been reported to generate functional operons in the fungus Glarea lozoyensis.
The optimized IGG6 sequence has been demonstrated to be functional for GFP bicistrons in various fungi including the yeasts Saccharomyces cerevisiae, Pichia pastoris and Yarrowia lipolytica, as well as the filamentous fungus Aspergillus nidulans.
In their work, Yue et al. also demonstrated polycistronic expression in Saccharomyces cerevisiae. IGG6-mediated expression of the zeocin-resistance gene (KanR) was achieved, providing resistance when positioned up to the fourth ORF. At the same time, expression of the remaining ORFs was confirmed by β-carotene production through the introduction of the carotenoid biosynthetic genes crtYB, crtE and crtI.
Cell growth in the presence of zeocin of strains expressing KanR gene at different positions of a polycistronic expression cassette. Figure from Yue et al., (2023)
To assess whether IGG6-separated coding sequences produced individual unmodified proteins, the researchers coexpressed mCherry and GFP with two different subcellular localization signals, targeting the peroxisome (MDH3) and the nucleus (NLS), respectively. As shown in the figure below, each fluorescence signal was observed only in the expected subcompartment, demonstrating that the proteins were produced separately and not as a fusion.
Fluorescence microscopy of IGG6-mediated mCherry-MDH3 and GFP-SV40 NLS-coexpressing S. cerevisiae strain. Figure from Yue et al., (2023)
Regarding the mechanism of IGG6, Yue et al. (2023) designed a series of bicistronic constructs harboring TDH3-FLAG followed by the GFP gene, with or without a translation blocking sequence (TBS) in different parts of the construct. As seen in the figure below, GFP expression was evident only when no TBS was added, showing that translation of the downstream message depends on translation of the upstream coding sequence.
Translation re-initiation at the distal gene GFP, mediated by IGG6. Translation of Tdh3p and GFP from different gene expression units was monitored by Western blotting and fluorescence microscopy, respectively. The scale bars of the fluorescence images are 20 μm. Red triangles depict translation blocking sequences (TBSs). Figure from Yue et al., (2023)
Based on this result, a mechanism of translation re-initiation was proposed, in which the transcribed RNA forms a loop that, through interaction with the ribosome, prevents its disassembly and promotes its movement towards the next translation initiation codon.
Overview of IGG6-mediated polycistronic expression: the IGG6 sequence promotes translation re-initiation by the ribosome, permitting the separate translation of consecutive CDSs in an mRNA
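Because re-initiation translates each CDS separately, every CDS in such a cassette presumably keeps its own start and stop codon, unlike a 2A fusion. The Python sketch below illustrates that design check under this assumption. The actual IGG6/NAL10 sequence is not reproduced in this text, so a 9-nt run of N is used as a placeholder; the function names check_cds and build_cassette and the toy CDSs are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Placeholder only: stands in for the 9-bp IGG6/NAL10 sequence, which is not given here.
LINKER_PLACEHOLDER = "N" * 9


def check_cds(cds):
    """Raise ValueError unless the CDS looks like a complete ORF."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    if not cds.startswith("ATG"):
        raise ValueError("CDS does not start with ATG")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS does not end with a stop codon")


def build_cassette(cds_list, linker=LINKER_PLACEHOLDER):
    """Join complete CDSs with the linker between each consecutive pair."""
    for cds in cds_list:
        check_cds(cds)
    return linker.join(cds.upper() for cds in cds_list)


# Toy CDSs for illustration (Met-Ala-stop and Met-Lys-stop), not real genes.
cassette = build_cassette(["ATGGCTTAA", "atgaaatga"])
print(cassette)
```

The check is deliberately minimal: it does not look for internal stop codons or context around the start codon, both of which a real design would need to consider.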
Ma et al. (2024) demonstrated that the NAL10 sequence is also functional in organisms distantly related to fungi. Using a NAL10-mediated bicistron expressing mCherry and GFP with distinct localization signals, they showed that the two fluorescence signals localized to different cellular subcompartments in both maize protoplasts and human 293T cells.
In maize protoplasts, NAL10 was used to produce mCherry with a Golgi localization signal and GFP with a nuclear localization signal.
Confocal microscopy of maize protoplast expressing GLS-mCherry-NAL10-NLS-GFP cassette. Figure from Ma et al. (2024)
In human 293T cells, a bicistron was implemented producing GFP and mCherry with an SV40 NLS. Green fluorescence was visible throughout the cytoplasm, while red fluorescence localized to the nucleus.
Human 293 T cells expressing mCherry-NAL10-NLS-GFP cassette. Figure from Ma et al. (2024)
To compare the performance of IGG6 with that of the previously described systems in Saccharomyces cerevisiae, six IRES with high ribosome-recruiting activities and two highly efficient 2A peptides were selected for the construction of GFP bicistrons.
Based on the fluorescence intensity of the resulting strains, IGG6 outperformed the IRES with a 12- to 130-fold increase in signal. Meanwhile, the IGG6-based system was less efficient than the 2A peptides, reaching 37-47% of their fluorescence intensity.
GFP-signal from IRES, 2A peptides and IGG6-mediated bicistrons. Six IRES (IRES8, IRES10, IRES32, IRES40, IRES41, IRES47) and two 2A peptides (ERBV2A, P2A) were chosen for the assay. Figure from Yue et al., (2023)
Therefore, IGG6/NAL10-based polycistronic expression constitutes a strong alternative to the existing methods due to its high efficiency and the unmodified sequence of the proteins produced.
The properties of this novel polycistronic expression system give it the potential to transform several aspects of synthetic biology, and new technologies based on it are already being developed.
In the context of metabolic engineering, the development of polycistronic genes in fungi, plants and animals can simplify the implementation of complex metabolic pathways. The successful introduction of a new pathway requires fine-tuning protein expression and ensuring stable production levels. Yue et al. (2023) developed a system called HACKing (Highly efficient and Accessible system by CracKing genes into the genome). Using a library of 65 validated driver genes, they employed GTR-CRISPR for the rapid, multiplexed integration of multiple genes of interest (GOIs) into bicistronic transcription units, eliminating the need for selection markers and enabling the simultaneous expression of GOI-encoded enzymes at stable, pre-calibrated levels. The HACKing system was validated through the rapid and efficient creation of an S. cerevisiae strain that produces high amounts of the triterpene squalene.
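The three systems also differ in how much sequence they add to a construct. As a rough illustration, the Python sketch below totals the linker length for a cassette of n genes using only the sizes quoted in this text: a lower bound of 500 nt for natural IRES, the upper bound of 22 amino acids (66 nt) for 2A peptides, and 9 bp for IGG6/NAL10. The function name linker_overhead is hypothetical, and the figures are approximations rather than measured construct sizes.

```python
# Approximate linker sizes in nucleotides, taken from the figures quoted in the text.
LINKER_NT = {
    "IRES": 500,       # natural IRES are usually over 500 nt; lower bound used
    "2A": 22 * 3,      # 18-22 aa peptides; upper bound, 3 nt per codon
    "IGG6/NAL10": 9,   # 9-bp sequence
}


def linker_overhead(n_genes):
    """Return total linker length (nt) per system for a cassette of n_genes CDSs."""
    if n_genes < 1:
        raise ValueError("n_genes must be at least 1")
    junctions = n_genes - 1  # one linker between each consecutive pair of CDSs
    return {name: size * junctions for name, size in LINKER_NT.items()}


overhead = linker_overhead(4)
for name, total in overhead.items():
    print(f"{name}: {total} nt of linker sequence for 4 genes")
```

For a four-gene cassette this gives at least 1500 nt of IRES sequence, up to 198 nt of 2A-coding sequence, and 27 nt of IGG6/NAL10.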
HACKing system: A host gene with an appropriate, pre-validated translation level under the desired cultivation conditions is selected for each GOI to serve as a driver. Figure from Yue et al., (2023)
IGG6/NAL10 also facilitates gene stacking, which combines multiple genes to enhance traits such as disease resistance, crop productivity, or other desirable horticultural characteristics in plants. In humans, it enables the simultaneous delivery of several therapeutic agents necessary for treating diseases.
In short, we have added to the Registry a new basic part named IGG6/NAL10, taken from the work of Yue et al. (2023) and Ma et al. (2024), which promises considerable advantages over IRES and 2A peptides for polycistronic expression in eukaryotes and opens new possibilities for engineering biological systems.
We highly encourage future iGEM Teams to further characterize this part and exploit its functionality to simplify multigene constructions for several applications.
Ma, X., Yue, Q., Miao, L., Li, S., Tian, J., Si, W., Zhang, L., Yang, W., Zhou, X., Zhang, J., Chen, R., Xu, Y., & Liu, X. (2024). A novel nucleic acid linker for multi-gene expression enhances plant and animal synthetic biology. The Plant Journal, 118(6), 1864–1871. https://doi.org/10.1111/tpj.16714
Wang, X., & Marchisio, M. A. (2021). Synthetic polycistronic sequences in eukaryotes. Synthetic and Systems Biotechnology, 6(4), 254–261. https://doi.org/10.1016/j.synbio.2021.09.003
Yue, Q., Meng, J., Qiu, Y., et al. (2023). A polycistronic system for multiplexed and precalibrated expression of multigene pathways in fungi. Nature Communications, 14(1), 4267. https://doi.org/10.1038/s41467-023-40027-0