Engineering Success

The Design-Build-Test-Learn (DBTL) cycle—also known as the engineering cycle—is essential in synthetic biology for optimizing and modularly controlling biological systems. Throughout the development of progRAM, we iterated through this cycle multiple times, integrating wet lab work, modeling, and human practices into the construction of our tapes and our experimental proof-of-concept. This iterative approach highlights our team’s ongoing efforts and commitment to rigorous scientific work.

Choice of RNA editor

DESIGN

The restoration of NanoLuciferase activity in our assays confirmed that the REPAIR v2 system functions as intended, but the observed editing efficiency was too low. Hence, we decided to first optimize the editing conditions in the assay before proceeding with further experiments.

The highest level of NanoLuciferase activity restoration was observed at a 1:5:5 ratio of NanoLuc plasmid, gRNA, and editor, respectively. This indicates that higher concentrations of the gRNA and editor are more effective at inducing RNA editing activity. With this experimental set-up we achieved a significant increase in editing efficiency: up to 13%, compared to 2% in the first set-up. This ratio (1:5:5) was subsequently adopted for future experiments, including tape-switching with both NLuc and XFP constructs.

Once the concept of molecular recording using sequential edits on an RNA tape was formed, we needed to choose the most suitable editing system. Through extensive research on various RNA-editing systems (refer to our Human Practices page for more details on our research sessions and expert consultations), we ruled out the LEAPER system, which leverages endogenous ADAR for programmable editing. LEAPER uses 100-200 nucleotide long gRNAs, which were too lengthy for our sequential recording tape design. We also considered RESCUE (RNA Editing for Specific C to U Exchange) (Abudayyeh et al., 2019), but its mechanism was incompatible with our readout system, which relies on the destruction of START codons. After several cycles of research sessions, we chose the REPAIR (RNA Editing for Programmable A to I Replacement) system (Cox et al., 2017), as it supported the sequential destruction of START codons without requiring large aptamer sequences. Since REPAIR v2 has been shown to be more efficient than the first version (Cox et al., 2017), we decided to begin our experiments with it.

To ensure the most efficient RNA editing by the REPAIR v2 system before proceeding with our planned tape-switching experiments, we decided to optimize the ratio of transfected plasmids in our assays.

To test the functionality of the REPAIR v2 system itself, we first designed a NanoLuciferase (NLuc) restoration assay (details on the Results page). The assay required three components: a plasmid encoding NanoLuciferase with an in-frame STOP codon in the N-terminus of its CDS that renders the protein non-functional; a plasmid encoding dPspCas13b/ADAR2DD (the REPAIR v2 system); and a plasmid encoding a gRNA to guide the dPspCas13b/ADAR2DD fusion to the target site. This system would restore NanoLuciferase activity only if the gRNA directed the dPspCas13b/ADAR2DD fusion to deaminate the target adenosine to inosine, which is read as guanosine during translation, thereby eliminating the STOP codon (UAG) in the CDS.
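The logic of the restoration assay can be illustrated with a minimal sketch. The reporter sequence below is a hypothetical toy fragment, not our NanoLuc construct, and inosine is written as G because the ribosome decodes it as guanosine. Under these assumptions, editing the adenosine of UAG yields UGG (tryptophan) and removes the premature STOP codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_a_to_i(rna, pos):
    """Deaminate the adenosine at 0-based position `pos`.

    Inosine is written as 'G' here because it is decoded as guanosine.
    """
    if rna[pos] != "A":
        raise ValueError("A-to-I editing requires an adenosine at the target position")
    return rna[:pos] + "G" + rna[pos + 1:]


def first_stop_index(rna):
    """Return the index of the first STOP codon in reading frame 0, or None."""
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical toy reporter: AUG GUC UAG ACA CUC (premature UAG at index 6)
reporter = "AUGGUCUAGACACUC"
edited = edit_a_to_i(reporter, 7)  # target the A of UAG
print(first_stop_index(reporter), first_stop_index(edited))
```

In the real assay the same principle applies to the full NanoLuc CDS: luminescence is expected only when the edit has removed the in-frame STOP codon.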

We updated the NanoLuciferase assay to test different plasmid ratios for optimal editing efficiency. The transfection ratios of plasmid encoding the non-functional NLuc, gRNA, and the dPspCas13b/ADAR2DD editor were varied in the following proportions, respectively: 1:1:1, 1:2:2, 1:3:3, 1:4:4, and 1:5:5.

To evaluate REPAIR, the NanoLuciferase restoration assay was performed on HEK293T cells previously transfected with the necessary plasmids. After 72h, luminescence readings were taken to monitor restoration levels, with control groups transfected without gRNA or the editor to rule out spontaneous recovery or off-target effects.

HEK293T cells were transfected with the specified plasmid combinations, and the NanoLuciferase restoration assay was performed to evaluate RNA editing efficiency at the different ratios.

1. Iteration

Minimal synthetic vector design

DESIGN

Successful cloning was achieved, and after cloning our composite part into the backbone, we could conclude that it functioned as intended (BBa_K5102000).

After choosing a base-editing system, we needed to select the most suitable backbone to insert our recording tape into. To minimize potential cross-interactions with our sensitive recording tape, we decided to design and clone our own minimal vector, which we refer to as pRAM (BBa_K5102000).

The construction of pRAM began with a pcDNA3.4-TOPO vector available to us in the lab. Building pRAM required five cloning steps. The first step involved removing the CDS via KLD reaction to create an empty vector. Next, the T7 promoter and T7 terminator were introduced through Gibson Assembly. In the third and fourth steps, we removed the HSV TK poly(A) signal, f1 ori, SV40 promoter, and NeoR/KanR using KLD reaction. Finally, the last two steps involved introducing point mutations to eliminate BsaI and BglII restriction sites via KLD.

Figure: Plasmid map of pcDNA and pRAM

The generated plasmids were evaluated via agarose gel electrophoresis and Sanger or whole plasmid sequencing several times along the way.

1. Iteration

RNA-recording tape and gRNA design

DESIGN

None of the tested spacers designed by manual nucleotide replacement yielded a correctly folding guide RNA (gRNA) within the expected parameters. We learned that manual work was too time-consuming and not able to yield the results we expected, making us realize the need for a more automated approach.

Recognizing the limitations of manual nucleotide replacement, we transitioned to a more systematic approach with automated tape and gRNA design that scores tapes based on the implemented functions. From the generated output, we selected the highest-scoring tapes for further experimental testing.

Due to inconclusive results with the NLuc Assay, we recognized its limitations for testing tape-switching efficiency in our system. Due to time and availability constraints, we opted to shift to testing with XFPs to explore whether their inherent readout output correlates better with successful tape-switching events in future attempts.

See results page

In addition to our RNA-editor and backbone-plasmid, our main goal was to design a synthetic recording tape that would feature a series of START codons in three different open reading frames. This design would allow for the downstream placement of fluorescent proteins, serving as an in vivo read-out system for the cellular events we intend to record. Since we had to design the recording tape from scratch, we started with a minimal design, which included only Kozak sequences and START codons placed in these three reading frames. In between are stretches of nucleotides that can be replaced by any of the three nucleotides: G, C, or U. The dPspCas13b/ADAR2DD fusion is directed to the target site by a guide RNA (gRNA). The gRNA is composed of spacers and a hairpin sequence essential for dPspCas13b function. So as a next step, we had to design the binding region of the spacer gRNA sequences to be complementary to the target with two mismatches: one that would disrupt unspecific binding, and an A to C mismatch at the target deamination site (See our parts page for more information on the tape-design process). The hairpin sequence was identical to the one used in the REPAIR v2 system developed by Cox et al., 2017.
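The spacer rule described above can be sketched as follows. This is a simplified illustration under stated assumptions: the target window is a hypothetical toy sequence far shorter than a real spacer, the function name is our own, and only the A to C mismatch at the deamination site is modelled. The second mismatch, intended to disrupt unspecific binding, is left out because its position depends on the individual tape.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_spacer(target, edit_pos):
    """Return a spacer (5'->3') complementary to `target` (5'->3'),
    with a C placed opposite the adenosine at `edit_pos`."""
    if target[edit_pos] != "A":
        raise ValueError("The deamination site must be an adenosine")
    # Base facing each target position, still in target orientation
    facing = [COMPLEMENT[base] for base in target]
    facing[edit_pos] = "C"  # A-C mismatch marks the adenosine to be edited
    # Reverse so the spacer is written 5'->3'
    return "".join(reversed(facing))


# Hypothetical toy window containing one START codon (A at index 6)
target = "GCCGCCAUGGCU"
spacer = design_spacer(target, 6)
print(spacer)
```

Pairing the spacer back against the target should reveal exactly one mismatch, located at the adenosine of the START codon.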

After realizing the limitations of manual nucleotide replacement, we decided to write dedicated code for designing the recording tape and guide RNAs (gRNAs), which would let us include more functions and parameters than we could optimize for manually. Apart from the correct folding of gRNAs and their specificity, we included functions addressing tape accessibility (by preventing the formation of unwanted secondary structures within the tape sequence), selectivity (by enhancing the binding affinity of gRNAs to their targets within the tape while minimizing affinity for sequence-similar mismatches), precision (by controlling nucleotide composition to minimize unintended local deamination events), and compatibility with BioBrick standards.

After obtaining sequences for our recording tapes, we proceeded to evaluate the switching states from state zero to one. Here, state zero represents a condition where none of the adenosines in START codons are deaminated, allowing translation of proteins in the first reading frame. State one occurs after the deamination of the first adenosine in the tape and is characterized by the disruption of the first START codon. This leads to a switch of the open reading frame by one. To preliminarily validate these switches in state, we designed a NanoLuciferase assay where the NanoLuc protein sequence was positioned in the second open reading frame.
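The state logic can be made concrete with a small sketch. The tape below is a hypothetical toy sequence, and the model assumes, as a simplification, that translation always starts at the first AUG encountered. Each recorded event replaces the A of that AUG with inosine (written as G), so the next START codon, located in a different reading frame, takes over.

```python
def active_frame(tape):
    """Reading frame (0, 1 or 2) of the first START codon, or None if none is left."""
    i = tape.find("AUG")
    return None if i < 0 else i % 3


def record_event(tape):
    """Deaminate the adenosine of the first START codon (inosine written as G)."""
    i = tape.find("AUG")
    if i < 0:
        raise ValueError("No START codon left to edit")
    return tape[:i] + "G" + tape[i + 1:]


# Hypothetical toy tape with START codons at indices 3, 10 and 17
tape = "GCCAUGGCUCAUGGCUCAUGGC"
frames = [active_frame(tape)]
for _ in range(3):
    tape = record_event(tape)
    frames.append(active_frame(tape))
print(frames)  # frame used in state 0, 1, 2, and after the final edit
```

In the NanoLuc configuration, the reporter sits in the second frame, so a signal is expected only once the tape has moved from state zero to state one.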

Following preliminary experiments with NanoLuciferase, we proceeded to validate our primary readout system involving fluorescent proteins (XFPs). We first designed composite parts in which tape state switching is visualized as a switch in fluorescent protein expression, with eUnaG (a fluorescent protein that can be translated in all reading frames) as a protein expression control. Additionally, in this experimental setup we extended our design to test multiple state switching events by adding all gRNAs that guide the dPspCas13b/ADAR2DD system to perform three sequential edits.

We manually replaced the missing nucleotides in our design with G, C, or U, ensuring that the nucleotide combinations between the START codons are as distinct as possible. Adenosines are excluded to minimize the inherent off-target effects of ADAR.

To automate the design process of recording tapes, we built a tape construction algorithm that systematically generates all possible tape sequences from a minimal consensus sequence using a brute-force approach and evaluates them based on their associated gRNAs (you can learn more about the algorithm on our Model page).
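The brute-force enumeration step can be sketched as below. This is an illustration only: the placeholder symbol N, the function name, and the toy consensus are assumptions, and the scoring of each candidate against its gRNAs (described on the Model page) is not reproduced here.

```python
from itertools import product


def enumerate_tapes(consensus, alphabet="GCU"):
    """Yield every tape obtained by filling each 'N' in the consensus
    with one nucleotide from `alphabet` (adenosine is excluded)."""
    slots = [i for i, base in enumerate(consensus) if base == "N"]
    for fill in product(alphabet, repeat=len(slots)):
        seq = list(consensus)
        for i, base in zip(slots, fill):
            seq[i] = base
        yield "".join(seq)


# Hypothetical toy consensus with two free positions -> 3**2 = 9 candidates
candidates = list(enumerate_tapes("AUGNNAUG"))
print(len(candidates), candidates[:3])
```

Because the number of candidates grows as 3 to the power of the number of free positions, a generator keeps memory use flat while each candidate is scored and only the best ones are retained.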

We cloned the chosen recording tapes in state zero upstream of the NanoLuc CDS (coding sequence) (BBa_K5102058). In this configuration, NanoLuc could only be expressed if the first adenosine in the START codon of the tape gets deaminated, disrupting the original START codon and causing a shift in reading frames. This shift would allow translation to proceed from the second open reading frame, validating the switch in states. Tapes were cloned into a pcDNA_Zeo_NLuc vector via Gibson assembly.

The assembly of the composite parts used four gBlocks (recording tape, T2A_miRFP670nano3_P2A_eUnaG, T2A_mScarlet-3_P2A_eUnaG, and T2A_mTagBFP2_P2A_eUnaG) together with our previously designed minimal vector, pRAM (learn more about this plasmid on our Parts page or in the Registry). DNA assembly was performed via Golden Gate assembly, using BsmBI-v2.

To evaluate our gRNAs, we first ran a BLAST search of our designs against the human genome. After checking for off-target effects, we analyzed the secondary structures formed by the gRNAs. For that, we tested the folding of over 30 potential tape designs using the publicly available ViennaRNA software.

To verify the output from the tape construction algorithm, we manually checked the top hits to ensure that stretches of C, G and U nucleotides in different regions of the tape were sufficiently distinct from one another. We also checked for a correct folding of gRNAs that maintains the necessary hairpin structure for the binding of dPspCas13b. Additionally, we conducted a BLAST search of our designed gRNAs against the human genome to check for potential off-target effects, with significantly better results than previously.

Successful cloning of the state zero tapes was confirmed via Sanger and whole plasmid sequencing. Tape switching was tested with the Nano-Glo® Luciferase Assay System (Promega, N110), which uses a reporter system that quantitatively detects the presence of NanoLuciferase, allowing us to assess whether the state switching has occurred as intended.

Successful cloning was confirmed via whole plasmid sequencing. Tape switching was tested via imaging of transfected HEK293T cells in a CellInsight CX7 Imager at specific time points (24h, 36h, 48h, 60h, 72h).

1. Iteration

Expression of fluorescent proteins

DESIGN

We learned that the proteins with similar excitation and emission wavelength can be imaged without cross-excitation on our CX7 imager. Therefore, we concluded that our selected fluorescent proteins should also perform well.

We found that it is not feasible to optimize the sequences to avoid STOP codons in all forward open reading frames. Specifically, combinations of methionine and lysine, as well as methionine and asparagine, cannot be optimized this way. Consequently, we had to mutate mTagBFP2 at one amino acid position and mScarlet-3 at two amino acid positions.
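The reason these pairs cannot be rescued by synonymous codons can be checked by enumeration: methionine is encoded only by ATG, and every codon for lysine (AAA, AAG) and asparagine (AAT, AAC) starts with A, so the +1 frame always reads TGA. The sketch below is a simplified check that only looks inside the encoded stretch and ignores flanking sequence; the function names are our own.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def codons_for(amino_acid):
    """All codons encoding a given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def stop_free_encodings(peptide):
    """DNA encodings of `peptide` without a STOP codon in frames +1 and +2."""
    hits = []
    for combo in product(*(codons_for(aa) for aa in peptide)):
        dna = "".join(combo)
        shifted = [dna[i:i + 3] for off in (1, 2) for i in range(off, len(dna) - 2, 3)]
        if not any(codon in STOPS for codon in shifted):
            hits.append(dna)
    return hits


print(stop_free_encodings("MK"))  # Met-Lys: no valid encoding
print(stop_free_encodings("MN"))  # Met-Asn: no valid encoding
print(stop_free_encodings("MR"))  # Met-Arg: CGN codons avoid the +1 TGA
```

This is consistent with the K16R substitution in mTagBFP2: arginine has codons that do not start with A, so the shifted-frame STOP codon can be avoided.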

Alignment of mutated and wild-type proteins generated an RMSD of 0.4 Å for mTagBFP2, 0.7 Å for mScarlet-3 and 1.2 Å for miRFP670nano3. All values are within acceptable ranges, and we concluded that the mutated fluorescent proteins are highly likely to fold correctly and remain functional.

We learned that our codon optimization tool is comparable to the commercially available software. While the commercial software efficiently optimized codons for a single reading frame, it lacked the capability to handle multi-frame scenarios. Our tool successfully optimized all three forward reading frames simultaneously, addressing global concerns like GC content, restriction sites, and splice site exclusion. This comparison highlighted the necessity of our custom tool for ensuring balanced optimization across frames, which the commercial software was unable to achieve.

We discovered that, although the algorithm effectively excluded an extensive database of canonical splice sites, this step alone was insufficient. A final screening step is crucial to ensure the removal of non-canonical splice sites from the output sequences.

XFPs are being expressed in HEK293T cells and can be imaged without cross-excitations.

We learned that while colony PCR and agarose gel electrophoresis provide initial insight into whether cloning was successful, whole plasmid sequencing is crucial to confirm the integrity of the entire construct. Point mutations or small insertions/deletions cannot be verified via cPCR, while they are important for the functionality of our readout system.

See results page

Each recorded state of the tape is visualized by expression of one of three fluorescent proteins. We selected three fluorescent proteins (miRFP670nano3, mScarlet-3, and mTagBFP2) based on criteria such as emission and excitation wavelengths, maturation time, and fluorescence intensity. We chose proteins with distinct excitation and emission wavelengths so that the three recording states can be clearly differentiated. Furthermore, we had to avoid the excitation and emission spectra of eUnaG, a fluorescent protein expressed in every state as an RNA transcription reference. We also opted for proteins with similar fluorescence intensity to obtain comparable measurements of the states. However, for preliminary analysis, we decided to test imaging of XFPs with similar excitation/emission wavelengths that were available to us in the lab: miRFP670nano3, mScarlet-I, and mTagBFP2.

After selecting the XFPs for our project, we performed in silico design of our reporter composite part, which will ultimately be positioned downstream of our designed recording tapes.

The decision on which amino acids to mutate in order to address STOP codons in one of the forward frames was based on structural similarity of amino acids, literature research, and alignment of proteins within the same family. Lysine at position 16 of mTagBFP2 was mutated to arginine based on the biochemical similarity of those amino acids (K16R). The M22V mutation in mScarlet-3 was chosen based on homology to mRed7, whereas N98V was chosen based on structural similarity of the amino acids.

We then needed to codon optimize our proteins in all three forward open reading frames for expression in HEK293T cells. While codon optimization tools are easily accessible, none of them allow for optimization in more than one frame. Therefore, we needed to create our own tool tailored to the project. This task required extensive research on the mechanisms of translation efficiency, codon usage bias, and mRNA stability specific to HEK293T cells. We had to ensure that the tool could optimize codon usage across all three reading frames without altering the amino acid sequence, while also eliminating problematic motifs such as premature stop codons, splice sites, and restriction sites. Additionally, controlling factors like GC content and preventing secondary structures that could trigger mRNA degradation pathways were critical to the success of the project.

During one of our iHP (Integrated Human Practices) meetings (learn more about the expert meetings on our HP wiki page), we realized that while our code effectively excludes canonical splice sites, non-canonical splice sites are more challenging to screen for.

The next stage of our engineering cycle was to design the cloning of fluorescent proteins into our minimal synthetic vector, pRAM. Each composite included a T2A site, one of the three XFPs (miRFP670nano3, mScarlet-3, or mTagBFP2), a P2A site, and eUnaG.

After successful imaging of single XFPs, we proceeded with the design of the final composite parts that include the recording tape-T2A-miRFP670nano3-P2A-eUnaG-T2A-mScarlet3-P2A-eUnaG-T2A-mTagBFP2-P2A-eUnaG.

After successful cloning, we moved on to validate the functionality of our readout system, which is dependent on the recorded state of the tape. We tested the ability to switch between one or more states by introducing all gRNAs designed to guide dPspCas13b/ADAR2DD for sequential RNA edits.

While testing our readout system using tapes in state zero was crucial, we wanted to go one step further in testing the complete recording system. We identified the need for additional controls with tapes that had already been modified once (state 1) or twice (state 2).

The selected fluorescent proteins (miRFP670nano3, mScarlet-I, mTagBFP2) were cloned individually into pcDNA_Zeo Mammalian Expression Vector.

The constructed composite part consists of the following sequence: T2A site-miRFP670nano3-P2A site-eUnaG-T2A-mScarlet3-P2A-eUnaG-T2A-mTagBFP2-P2A-eUnaG.

Using AlphaFold3, we predicted 3D structure models of the mutated versions of the proteins.

The code was built on the basis of a genetic algorithm. The implemented functions included frame-dependent fitness functions: codon adaptation index, frequency of optimal codons, mean of typical decoding rates, STOP codon exclusion, splice site exclusion, as well as global fitness functions: GC content, STOP codon exclusion, restriction site exclusion, repeat exclusion, splice site exclusion, sum of squared residuals. (See more in our Model page).
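To illustrate how global fitness terms of this kind can be combined, the sketch below scores a DNA sequence on two of the listed criteria: GC content and restriction site exclusion. It is not the implementation from our Model page. The function names, the GC target of 0.5, and the penalty weight are assumptions chosen for illustration; the recognition sequences are those of BsaI (GGTCTC) and BsmBI (CGTCTC).

```python
FORBIDDEN_SITES = ("GGTCTC", "CGTCTC")  # BsaI and BsmBI recognition sequences


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq):
    """Occurrences of each forbidden site on either strand."""
    return sum(seq.count(site) + seq.count(reverse_complement(site))
               for site in FORBIDDEN_SITES)


def global_fitness(seq, gc_target=0.5, site_penalty=1.0):
    """Higher is better: penalize GC deviation and every forbidden site."""
    return -abs(gc_content(seq) - gc_target) - site_penalty * count_sites(seq)


print(global_fitness("GCGCATAT"))    # balanced GC, no sites
print(global_fitness("ATGGTCTCAA"))  # contains a BsaI site
```

In a genetic algorithm, a score like this would be added to the frame-dependent terms so that selection favours candidates that satisfy all constraints at once.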

To address this, we incorporated a final manual quality control step. We used SpliceAI-based detection to screen for non-canonical splice sites, replacing any detected splice sites with synonymous codons. The code was written in Python and designed to run on Google Colab for ease of use.

The constructs were cloned via Gibson assembly, and then individually transfected into HEK293T cells.

All the components of the plasmid were assembled together via Golden Gate cloning with BsmBI-v2 Type IIS restriction enzyme.

HEK293T cells were transfected with plasmids containing the recording tape and XFPs, along with a plasmid encoding the REPAIR v2 editor and different combinations of gRNAs. Controls included conditions to assess potential off-target effects of the editor and ribosome slippage (detailed on the Results wiki).

We performed Golden Gate assembly to clone plasmids containing the fluorescent proteins and recording tapes in states one and two. After cloning, the plasmids were transfected into HEK293T cells, following the appropriate experimental setup (detailed on our Results page).

Successful cloning was confirmed via whole plasmid sequencing. To evaluate the cross-excitation of the selected XFPs and assess the imaging quality of the proteins in our laboratory, we imaged the transfected HEK293T cells using the CX7 imager.

Following the construction of the composite part, we tested the potential for optimizing the protein sequences to eliminate STOP codons in all three forward open reading frames.

Models generated by AlphaFold predictions were aligned to the wild-type proteins, which were also folded by AlphaFold. Alignment was performed in PyMOL, and the RMSD was calculated for each pair.
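For reference, the RMSD reported by such an alignment is the root of the mean squared distance between matched atoms. The sketch below assumes the two structures are already superposed and that atoms are matched one to one; it does not reproduce the superposition or outlier rejection that PyMOL performs, and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    coordinates, assuming the structures are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates in angstroms: every atom is displaced by 1 Å along z
wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(wild_type, mutant))
```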

To evaluate our tool we compared the output to a commercially available software for codon optimization in one reading frame and analyzed the differences in performance.

We tested the occurrence of splice sites in the top hits output, using a splicing probability cut-off score of 0.3. Any sites with a score higher than 0.3 were manually corrected by replacing the detected splice sites with synonymous codons.
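The cut-off step amounts to a simple filter over per-position scores. In the sketch below, the score dictionary is hypothetical example data standing in for the predictor's output, and the function name is our own; only the 0.3 threshold comes from our workflow.

```python
def flag_splice_sites(scores, cutoff=0.3):
    """Return the sorted positions whose splice probability exceeds the cut-off."""
    return sorted(pos for pos, score in scores.items() if score > cutoff)


# Hypothetical scores: sequence position -> predicted splice probability
example_scores = {12: 0.05, 87: 0.31, 140: 0.30, 201: 0.92}
print(flag_splice_sites(example_scores))
```

Positions returned by the filter are the ones that were then corrected by hand with synonymous codons.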

Successful cloning was confirmed via Sanger sequencing. Fluorescent proteins were imaged individually with the CellInsight CX7 imager.

The initial validation of the cloning process was performed via colony PCR, followed by agarose gel electrophoresis. Colonies showing DNA bands of the expected size were inoculated for overnight culture, and DNA was then miniprepped. The final confirmation of successful cloning was achieved through whole plasmid sequencing.

Transfected HEK293T cells were imaged at multiple time points (24h, 36h, 48h, 60h, 72h) to monitor XFP expression and analyze dynamic changes over time.

Cloning was verified through colony PCR and confirmed by whole plasmid sequencing.

1. Iteration

RNA Expression and Stability

DESIGN

The integration of stability-enhancing elements into our recording construct was successful, as verified through sequencing.

We found that 50 bp UTRs provided the best balance between enhancing translation efficiency and minimizing unfavorable secondary structures compared to longer or shorter UTRs.

Successful cloning was achieved, and subsequent work can continue.

Through building and testing our mathematical model, we gained valuable insights into the dynamics of RNA transcription and degradation within cells. Key learnings include:

- Model Validation: The successful recovery of true parameters from synthetic data using MLE confirmed the effectiveness of our parameter estimation method.
- Parameter Sensitivity: Understanding parameter sensitivity helps in identifying critical factors that influence RNA stability and expression levels.
- Practical Implications: The insights from the model inform strategies to optimize our RNA construct for enhanced stability and expression. By manipulating parameters such as transcription rates and degradation rates, we can predict and design constructs with desired expression profiles.
- Future Directions: Incorporating experimental data from qPCR analyses will allow us to refine the model parameters further, increasing the model’s predictive power. Extending the model to include additional biological factors, such as RNA-binding proteins or microRNAs, could provide a more comprehensive understanding of RNA dynamics.

During the development of our project, we explored several strategies to improve the stability of our RNA recording tape. These concerns were brought into discussion during several of our research sessions, as well as mentioned in our meetings with experts, and raised in specific and more general (feasibility) terms on our survey on molecular recording. More details on all these topics can be found in our STIR protocols and overall Human Practices page.

In addition to increasing the stability of our RNA construct, we aimed to enhance its expression. During an iHP meeting with Prof. Danny Nedialkova, it was suggested that incorporating a strong 5’ UTR could help achieve this. While exploring the idea, we came across a study demonstrating the use of synthetic 5’ UTRs to improve mRNA translation for gene editing delivered via mRNA.

After verifying the synthetic UTRs, we selected the highest-scoring UTR for cloning into our construct, which includes the recording tape and reporter system with XFPs. Cloning primers were designed to ensure that the synthetic UTR is placed on primer overhangs, allowing the primers to overlap and facilitate a Gibson assembly.

After integrating the necessary elements to enhance the expression and stability of our RNA construct in the plasmid, we aimed to test its performance within a cellular environment. To achieve this, we developed a predictive mathematical model to simulate RNA transcription and degradation dynamics in living cells. This model allows us to understand how our construct behaves over time and to optimize its performance accordingly.

Based on our research, we decided to incorporate the Woodchuck Hepatitis Virus (WHV) Posttranscriptional Regulatory Element (WPRE) and the beta-globin 3’UTR into our construct, as both elements are known to enhance transcript stability.

To design the synthetic UTRs, we utilized the Deep Learning Model developed by Castillo-Hair et al. (2024), which optimizes 5’ UTRs for efficient mRNA translation using generative neural networks and gradient descent (refer to the Model page for more details). The model, trained on polysome profiling data from randomized 5’ UTR libraries across multiple cell types, generated UTR sequences aimed at maximizing translation efficiency, accounting for Kozak sequences, upstream ORFs, and RNA secondary structures.

The cloning of the 5’ synthetic UTR into our tape plasmids was accomplished through Gibson Assembly.

In the 'build' phase, we developed a mathematical model to predict RNA concentration dynamics in a cell using a first-order ordinary differential equation (ODE). The transcription rate is captured with a double-sigmoid function, and the degradation rate is either constant or varies exponentially. We implemented the model in Python, solving the ODE using `solve_ivp` and estimated parameters with Maximum Likelihood Estimation (MLE).
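A minimal sketch of such a model is shown below, using `solve_ivp` from SciPy. The exact functional form is an assumption: transcription is written as the product of a rising and a falling sigmoid, degradation is kept constant, and all parameter names and values (`k_max`, `t_on`, `t_off`, `steep`, `delta`) are hypothetical placeholders rather than fitted values from our Model page.

```python
import numpy as np
from scipy.integrate import solve_ivp


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def transcription_rate(t, k_max, t_on, t_off, steep):
    """Double sigmoid: switches on around t_on and off around t_off."""
    return k_max * sigmoid(steep * (t - t_on)) * (1.0 - sigmoid(steep * (t - t_off)))


def simulate_rna(params, t_eval, r0=0.0):
    """Solve dR/dt = k(t) - delta * R for the RNA concentration R(t)."""
    k_max, t_on, t_off, steep, delta = params

    def rhs(t, r):
        return transcription_rate(t, k_max, t_on, t_off, steep) - delta * r

    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), [r0],
                    t_eval=t_eval, rtol=1e-8, atol=1e-10)
    return sol.y[0]


# Hypothetical parameters: time in hours, arbitrary concentration units
t = np.linspace(0.0, 72.0, 145)
rna = simulate_rna((10.0, 6.0, 48.0, 1.0, 0.2), t)
print(rna[80], rna[-1])  # plateau near k_max / delta, then decay
```

An exponentially varying degradation rate could be added by replacing the constant `delta` inside `rhs` with a function of time.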

Cloning was confirmed by whole plasmid sequencing.

We validated the optimized UTRs by calculating the mean ribosome load (MRL) as an indicator of translation efficiency. We also assessed the minimum free energy (MFE) of the UTRs using RNA folding tools to ensure minimal secondary structures that could interfere with ribosome scanning.

Cloning was verified via Sanger sequencing.

To validate and test our model, we generated synthetic data simulating RNA concentration over time, adding normally distributed noise to mimic experimental variability. This synthetic dataset serves as a proxy for experimental observations, allowing us to test the parameter estimation process in a controlled setting. We then applied the MLE approach to estimate the model parameters from this synthetic data. The optimization process adjusts the parameters to minimize the negative log-likelihood, effectively finding the parameter set that makes the observed data most probable under the model. Additionally, we plan to conduct quantitative PCR (qPCR) analysis to evaluate the behavior of our construct in HEK293T cells. This experimental data will provide real-world observations to further refine our model.
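The validation procedure can be sketched on a deliberately simplified version of the model. The example below assumes constant transcription and degradation rates, for which the ODE has a closed-form solution, and Gaussian measurement noise. The true parameter values, starting guess, and random seed are hypothetical, and Nelder-Mead is used here as one possible optimizer rather than the one from our Model page.

```python
import numpy as np
from scipy.optimize import minimize


def rna_model(t, k, delta):
    """Closed-form solution of dR/dt = k - delta * R with R(0) = 0."""
    return (k / delta) * (1.0 - np.exp(-delta * t))


def negative_log_likelihood(log_params, t, observed):
    """Gaussian negative log-likelihood; parameters are fitted on a log
    scale so that k, delta and sigma stay positive."""
    k, delta, sigma = np.exp(log_params)
    residuals = observed - rna_model(t, k, delta)
    n = observed.size
    return 0.5 * n * np.log(2.0 * np.pi * sigma ** 2) + np.sum(residuals ** 2) / (2.0 * sigma ** 2)


# Synthetic data from known (hypothetical) parameters plus normal noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 24.0, 49)
true_k, true_delta, true_sigma = 10.0, 0.2, 1.0
observed = rna_model(t, true_k, true_delta) + rng.normal(0.0, true_sigma, t.size)

# Minimize the negative log-likelihood from a deliberately wrong start
start = np.log([5.0, 0.1, 2.0])
fit = minimize(negative_log_likelihood, start, args=(t, observed),
               method="Nelder-Mead", options={"maxiter": 2000})
k_hat, delta_hat, sigma_hat = np.exp(fit.x)
print(k_hat, delta_hat, sigma_hat)
```

If the estimates land close to the values used to generate the data, the estimation procedure can be trusted to the same extent when applied to qPCR measurements.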

1. Iteration

References

Abudayyeh, O. O., Gootenberg, J. S., Franklin, B., Koob, J., Kellner, M. J., Ladha, A., Joung, J., Kirchgatterer, P., Cox, D. B. T., & Zhang, F. (2019). A cytosine deaminase for programmable single-base RNA editing. Science, 365(6451), 382–386. https://doi.org/10.1126/science.aax7063

Castillo-Hair, S., Fedak, S., Wang, B., Linder, J., Havens, K., Certo, M., & Seelig, G. (2024). Optimizing 5’UTRs for mRNA-delivered gene editing using deep learning. Nature Communications, 15(1), 5284. https://doi.org/10.1038/s41467-024-49508-2

Cox, D. B. T., Gootenberg, J. S., Abudayyeh, O. O., Franklin, B., Kellner, M. J., Joung, J., & Zhang, F. (2017). RNA editing with CRISPR-Cas13. Science, 358(6366), 1019–1027. https://doi.org/10.1126/science.aaq0180