Model | UTokyo - iGEM 2024

Overview

Introduction

Mathematical modeling is the representation of complex phenomena and processes in mathematical equations. Mathematical modeling is a very attractive tool in the interpretation and prediction of experimental results. In the DBTL cycle (Design, Build, Test, Learn), appropriate mathematical modeling can optimize Design, reduce the burden of Build and Test, deepen the knowledge in the Learn phase, and make this cycle more efficient.

"If all you have is a hammer, everything looks like a nail."

We were told the above saying on Human Practices with Dr. Sakuraba. This is the saying that describes how one's own preconceptions distort one's thinking and cause one to miss the essence of the issue at hand. The choice of which mathematical model to use in the Dry Lab is very important. If we stick to one method when making a choice, we may lose our objective. We have taken this saying that Dr. Sakuraba told us to keep in mind and carefully selected the best method for our purpose.

Model in Our Project

In our project, POIROT, we have developed a glaucoma detection device using miRNA biomarker in tear fluid. In the system we developed, the biomarkers are amplified as dsDNA through an isothermal amplification, and the amplified dsDNA is detected by lateral flow to determine positive or negative.

Modeling reduces experiments and accelerates the engineering cycle, and aids in the design as well as the further advancement of projects. Figure 1 outlines the role of the Model in our project. From here, we describe how our models have progressed along with the development of our project.

miRNA Selection

Bioinformatics analysis is often used for biomarker selection, using data from papers and databases. We also wanted to contribute to biomarker selection from the Dry Lab side. In the course of our literature review, we found that the research on biomarkers in tear fluid related to glaucoma had not progressed very far. Therefore, we asked Prof. Ochiya, who specializes in biomarker-based disease detection, if we could do a meta-analysis integrating data from a few papers to examine appropriate biomarkers. During Human Practices with Prof. Ochiya, he said that the current research on tear fluid biomarkers is a mixed bag and that it is technically very difficult to perform a meta-analysis and that it is not reliable. Therefore, Dry Lab decided not to use meta-analysis to select biomarkers. Instead, we selected miRNAs biomarkers for glaucoma from a highly reliable paper introduced to us by Prof. Ochiya ¹.

In considering what the Dry Lab could contribute, we focused on the sequence specificity needed for the amplification system. It is predicted that there are approximately 1600 different types of miRNAs in tear fluid ². Furthermore, miRNAs have subsets, and miRNAs belonging to the same subset have very similar sequences.

As described above, the sequence specificity of the amplification and detection system is very important when detecting miRNAs, which are expected to have many similar sequences. We decided to evaluate the sequence specificity needed for our amplification and detection system by detecting for the presence of similar sequences of the miRNAs we selected as biomarkers in the tear fluid. Our research revealed that miRNAs with sequences very similar to the miRNAs we selected as biomarkers are present in tear fluid.

These results indicate that Exponential amplification reaction (EXPAR), which was intended to be used as the amplification system, has insufficient sequence specificity. Dry Lab suggested that a more sequence-specific amplification method is necessary to ensure that only the target biomarker is amplified.

For more information, see:

TWJ Specificity

The suggestion in miRNA Selection showed that the amplification system requires an amplification method with high sequence specificity. Through our literature review, we chose Three-Way-Junction SDA (TWJ-SDA) as the amplification method with high sequence specificity. In the course of our research on TWJ-SDA, we realized that there is no solid theory as to why TWJ-SDA has high sequence specificity. Therefore, we theoretically identify the cause of the sequence specificity of TWJ-SDA.

We initially thought that Molecular Dynamics simulation (MD simulation) could be used to identify the cause of the sequence specificity. Therefore, we asked Dr. Sakuraba, who specialized in MD simulation. As a result of Human Practices with Dr. Sakuraba, we found that it is very difficult to identify the cause of the sequence specificity in MD simulation. We decided to use the ODE Model recommended by Dr. Sakuraba. The results of the analysis using the ODE Model revealed the reason why the amplification of TWJ-SDA is sequence-specific. TWJ-SDA keeps the association constant low by making the complex formation a multistep process. This is the reason for high sequence specificity. In the process of investigating the cause of the sequence specificity, we learned about the constraints of TWJ-SDA template and helper to have the sequence specificity.

For more information, see:

Sequence Design

The TWJ Specificity section has helped us to understand the constraints of TWJ-SDA template and helper to have the sequence specificity. However, these constraints are not the only ones that are important when designing the template and helper. Another important factor in sequence design is the Signal to Noise ratio (S/N ratio), which is the ratio of amplified products in the presence and absence of the target miRNA. The reason why this S/N ratio is important is that in TWJ-SDA, if the template and helper are not designed well, the reaction of template and helper alone will produce the amplified product even if the target miRNA is not present. We focused on the S/N ratio and decided to seek conditions that would increase the S/N ratio.

While the focus of the TWJ Specificity section was the sequence specificity of TWJ-SDA, the S/N ratio is the focus of this section. We built an ODE model different from the one used in the TWJ Specificity section and tried to predict the S/N ratio for various conditions. Through comparison with experimental results and improvement of the model, we were able to create the model that predicted the S/N ratio in the experimental results to some extent.

We have integrated the predictive model for S/N ratio with the constraints for being sequence specific identified in the TWJ Specificity section. The integrated model allows us to design the template and helper that will enable TWJ-SDA with sequence specificity and high S/N ratio.

For more information, see:

Amplification Comparison

From the miRNA Selection section, it was found that tear fluid can contain miRNAs that have similar sequences of the biomarker miRNAs. In order to distinguish these similar sequences, TWJ-SDA should be used. Then, is TWJ-SDA alone sufficient for amplification? If not enough, what amplification methods would be appropriate to add to TWJ-SDA? The goal of this section is to suggest from the model what amplification methods would be best to use in the amplification system.

First, the simulation results showed that TWJ-SDA alone was not sufficient for amplification. Therefore, we had to select further amplification methods to use after TWJ-SDA. Based on the conditions required for the amplification system, literature review, ³and Human Practices, we narrowed down the candidate amplification methods after TWJ-SDA to three : Exponential amplification reaction (EXPAR), 2step-SDA, and 3step-SDA.

In order to select the best one among these amplification methods, we developed an ODE Model for each amplification method. We compared the amplification efficiency, Positive-to-Negative Ratio, and robustness to time and enzyme activity of the three options using the ODE Model. As a result, we proposed 3step-SDA as the amplification method after TWJ-SDA. We not only contributed to the selection of the amplification reaction, but also provided support for Wet Lab by predicting the amount of reactants that would increase the amplification efficiency and robustness.

For more information, see:

CRISPR-Cas 3/12a MD Simulation

We decided to use CRISPR-Cas as a system to detect amplified dsDNA. There are several options for how many steps of SDA to connect to CRISPR-Cas in the signal amplification process. Although we suggested the appropriate amplification method in the Amplification Comparison section, as an alternative approach, we attempted to examine the most desirable number of n step-SDA steps using MD simulation. Based on the assumption that the higher the stability of the protein, crRNA, and DNA complexes, the better the detection ability, we evaluated the stability of each complex. As a result of MD simulation, we reached the conclusion that 3step-SDA was the most desirable amplification method after TWJ-SDA.The results in the Amplification Comparison section indicate that 3step-SDA is also expected to be superior in terms of amplification efficiency and robustness. Therefore, Dry Lab could confidently propose that 3step-SDA is the best amplification method after TWJ-SDA.

For more information, see

For the Future of Our System

"The model takes our project to the next level !"

POIROT was created for the detection of glaucoma. However, the system is highly flexible and can be applied to miRNA-based detection of a variety of diseases, not just glaucoma. Dry Lab focused on its applicability and wanted to help people to use our system with a minimum of required steps.

Similarity Sequence Search Software for miRNA

First, we developed software using the miRNA similarity search algorithm created in the miRNA Selection section. The software is designed to search not only for miRNAs in tear fluid, but also for similar sequences of miRNAs in blood plasma and leukocyte.

For more information, see:

Sequence Design Software for TWJ-SDA

Using the integrated model created in the Sequence Design section, we have also created software that makes it easy to design templates and helpers that will enable TWJ-SDA with sequence specificity and high S/N ratio. This software makes it easy to know the optimal template and helper without conducting experiments. One of the hurdles in applying POIROT's system is the design of templates and helpers to be used in TWJ-SDA. Our software facilitates the application of POIROT's system to other disease biomarker miRNAs.

For more information, see:

Software derived from models will allow us to use our system easily once you select miRNA biomarkers. These software will increase the applicability of our system to other diseases.

Conclusion

Our Models accompanied and accelerated the progress of the project. In particular, the Models contributed to the selection of the amplification system, which is the most important part of our system. In addition, it was able to suggest the appropriate amount of reactants in the amplification system. In addition, the model has become the foundation of the software and has strongly contributed to expanding the applicability of our system!

Contribution

We have created a manual explaining how to use NUPACK Analysis, which we used for modeling, and a manual explaining how to perform MD simulation.
for more information, see

We have summarized the formulation methods necessary for modeling the Lateral Flow Assay, although we did not perform simulations.

LFA

Lateral Flow Assay Modeling as Proposal of a Versatile Method for Simulation Preparation.

Lateral Flow Assay (LFA) is a user-friendly disease testing tool, making it suitable as the quantitative component of a device designed for home-based glaucoma detection. This study introduces a comprehensive model for the Lateral Flow Assay (LFA) that focuses on time-dependent flow rates driven by capillary action, enhancing conventional approaches that combine the advection-diffusion equation with reaction equations. While traditional models often assume constant flow rates, leading to significant inaccuracies in reaction results on the assay strip, our approach effectively addresses these limitations. This enhancement provides a more accurate representation of reaction dynamics and underscores the necessity for improved modeling techniques in LFA. Additionally, we propose a discretization approach for simulating the model, employing appropriate approximation methods for the advection and diffusion terms. Future research directions include optimization of LFA system conditions for improved interpretation of positive and negative results.

For more information, see

Source Code

The code used in the Model simulation can be obtained at the following link: GitLab

References

Cho, H.-k., Seong, H., Kee, C., Song, D. H., Kim, S. J., Seo, S. W., & Kang, S. S. (2022). MicroRNA profiles in aqueous humor between pseudoexfoliation glaucoma and normal tension glaucoma patients in a Korean population. Sci Rep, 12(1), 6217. https://doi.org/10.1038/s41598-022-09572-4
Tanaka, Y., Tsuda, S., Kunikata, H. et al. (2014). Profiles of Extracellular MiRNAs in the Aqueous Humor of Glaucoma Patients Assessed with a Microarray System. Sci Rep, 4, 5089. https://doi.org/10.1038/srep05089
Komiya, K., Noda, C., & Zamamura, M. (2024). Characterization of cascaded DNA generation reaction for amplifying DNA signal. New Generation Computing, 1-16. https://doi.org/10.1007/s00354-024-00249-2

MiRNA Selection

Abstract

Tear fluids contain various types of miRNA. Therefore, to quantify only the selected miRNA as a biomarker for disease, the sequence specificity of the amplification system is crucial. In the Dry Lab, we discovered multiple sequences similar to the miRNA chosen as the target in the POIROT, using the database obtained from a comprehensive search for miRNAs predicted to be present in tear fluids. It was found that identifying these as the target using SDA would be difficult, and it was favored to adopt a highly specific TWJ system for the amplification system.

Introduction

Sequence specificity is one of the most significant factors in the miRNA amplification and quantification system. In amplification systems with low specificity, amplification reactions can proceed due to not only the target miRNA but also miRNAs with similar sequence, resulting in output signals that do not accurately reflect the concentration of the target miRNA in the sample. Since numerous miRNAs are present in samples derived from humans, it is essential to avoid amplification reactions triggered by non-target miRNAs in order to accurately quantify the miRNA selected as a biomarker. In fact, it has been confirmed by the Wet Lab that when SDA without a TWJ complex is used, amplifications do occur even with miRNAs that have only a few differences in the bases compared to the target miRNA (refer to Results).

On the other hand, the required level of sequence specificity should be determined considering the miRNAs contained in the sample. If miRNAs with similar sequence to the sequence of target miRNA were not contained in the sample, the required level of sequence specificity would be lower. When using complex reactions involving many more types of molecules, such as TWJ, which is known for its high sequence specificity, there is a possibility of drawbacks such as reduced amplification efficiency. Hence, it cannot be conclusively stated that a system with higher specificity is always better. It is important to ensure reliability by designing a system that can accurately quantify the biomarker even under the presence of miRNAs that can potentially exist in the sample and interfere with the system.

With this in mind, the Dry Lab aimed to contribute to the Wet Lab and the project by using data from previous studies to search for similar sequences among the miRNAs that may be present in the sample.

Methods

There are few studies that have comprehensively explored miRNAs in tear fluids, and there was no appropriate data available for searching similar sequences of the miRNAs selected as biomarkers. Therefore, we considered a Meta-analysis, which integrates and analyzes multiple research reports. However, Human Practices to Professor Ochiya, an expert in disease-related miRNAs, has proven that it is quite challenging to extract raw data from different studies, align the data formats, and reanalyze them. It was also revealed that a Meta analysis on miRNAs in tear fluids, an area that is still not researched enough yet, would be unable to ensure reliability. Although research using a Meta-analysis to identify disease-specific biomarker miRNAs is actively conducted, no established method exists yet, and it would be difficult to develop a highly reliable method within the one-year duration of the project. Target miRNAs identified by unreliable methods would be nothing less than abdicating the responsibility as a scientist. It was concluded that it would be more appropriate to refer to the data from a single, reliable paper that has comprehensively explored miRNAs. We referred to data from the study 1 introduced by Professor Ochiya, which conducted a comprehensive exploration of miRNAs in aqueous humor, and we researched the presence of similar sequences in the sample.

The study 1 reported the detection of 1623 types of miRNAs in aqueous humor. However, the data presented in the study does not include the specific sequences of each miRNA, making it impossible to directly search for sequences similar to the target. Therefore, a database containing 1493 miRNAs was created by referencing miRBase Release 22.1 ², which includes numerous reported miRNA sequences, and extracting the human miRNAs with the same names as those included in the data of the study along with their sequences.

General similarity search software, such as BLAST, aims to evaluate the homology between sequences and uses its own scoring system to assess the statistical significance of sequence occurrence frequency ³. While such software is very useful for searching similar sequences, it does not take the interactions between nucleotides during evaluation into consideration. Therefore, it may not accurately assess the concerns of this project regarding the binding of miRNAs similar to the target and templates, and the potential for unintended amplification reactions. Consequently, a program was developed to search for similar sequences using simpler and more intuitive scoring metrics, such as the percentage of matching nucleotides and the number of mismatches. Although these metrics may not accurately evaluate the interactions between miRNAs with similar sequences and templates, they are easier for users to understand. This makes it easier to set search conditions according to the expected performance of the amplification system and helps narrow down similar sequences that should be considered in the selection of target miRNAs and system design, as well as sequences that should be verified through wet experiments or simulations.

In this program, all miRNAs from the database that have a similarity score above a certain threshold when compared to a given miRNA sequence are output. The algorithm is as follows:

Input the following three parameters: "The sequence of the miRNA for which similar sequences are to be searched", "The minimum length of the subsequence to be compared", "The minimum similarity score threshold for the output sequences".
Extract miRNAs from the database.
Compare the lengths of the input sequence and the extracted database sequences, and select an n such that (length of the shorter sequence) ≧ n ≧ (minimum subsequence length)
For each n, extract consecutive subsequences of n nucleotides from both the input sequence and the database sequences, and calculate the similarity (either the percentage of matching nucleotides or the number of mismatches).
Calculate the similarity for all possible n from step 3 and all possible subsequences from step 4, taking the maximum value as the similarity score between the two sequences.
Execute steps 2 to 5 for all sequences in the database, and display all miRNAs with similarity scores above the given threshold.

hoeghoge — *Figure 1.* Behavior of similar sequence search program.
The program extracts sequences from the database and evaluates their similarity to the target sequence by checking for base matches.

In this project, considering the concentration ratios of aqueous humor between glaucoma patients and the control group, hsa-miR-10b-5p, hsa-miR-375, and hsa-miR-30d-5p were selected as target miRNAs. Similarity searches for these three miRNAs were conducted using the aforementioned methods. Taking into account the results of specificity evaluation from wet experiments using the 22 nt miRNA hsa-let-7b and the let-7 family, the minimum subsequence length was set to 20, and similar sequences with a sequence match ratio of 80% or higher to the target miRNAs were extracted.

Results

hsa-miR-10b-5p

miRNAs that are similar in sequence to hsa-miR-10b-5p include hsa-miR-10a-5p, hsa-miR-100-5p, hsa-miR-99a-5p, and hsa-miR-99b-5p. miR-10a-5p consists of 23 nt, differing from miR-10b-5p at only the 12th nucleotide. miR-100-5p, miR-99a-5p, and miR-99b-5p consist of 22 nucleotides, with 4, 5, and 6 mismatches, respectively, when aligned with the 3' end of miR-10b-5p.

hsa-miR-375

As a result of the similarity sequence search, no miRNAs with sequences similar to hsa-miR-375 were found.

hsa-miR-30d-5p

miRNAs with sequences similar to hsa-miR-30d-5p were identified as hsa-miR-30a-5p and hsa-miR-30e-5p. All of these miRNAs are 22 nucleotides long, with hsa-miR-30a-5p having a mismatch of 1 nucleotide and hsa-miR-30e-5p having a mismatch of 2 nucleotides.

Conclusion and Future Prospects

As a result of the biomarker similarity sequence search for miRNAs expected to be present in tear fluid, similar sequences were found for miR-10b-5p and miR-30d-5p. Among these, there are miRNAs that differ by only 1 or 2 nucleotides from the biomarkers, making it challenging to distinguish them using simple SDA based on wet results. These similar miRNAs found in the aqueous humor are also thought to be present in tear fluid, potentially leading to false detections during quantitative analyses using tears. Therefore, it is supported that a more sequence-specific TWJ, rather than a simple SDA, should be employed in the amplification system, leading to improvements in the project.

The program created this time can search for similar sequences of miRNAs contained in various samples other than aqueous humor by changing the database being targeted. POIROT can be used for the quantification of any miRNA by modifying the template sequence. Therefore, when applied to samples other than tear fluid, it is expected to contribute to the appropriate selection of biomarkers, ensuring the specificity and quantitativeness of the system by searching for similar sequences of target miRNAs within a database that includes miRNAs present in the sample.

In the current search for similar sequences, a similarity evaluation metric focusing solely on the matches and mismatches of nucleotide sequences was used. However, it is also possible to evaluate the binding energy between miRNAs with similar sequences to the template and target, as well as assess the impact of these similar sequences on the amplification system through simulations using an ODE model. Since these evaluations take longer to compute compared to the similarity evaluation methods used in this analysis, extracting similar sequences from numerous miRNAs using this program enables efficient assessment.

The program and database used for the sequence search in this study have been incorporated into the software and are publicly available on GitLab. For more details, please visit: Software

References

Tanaka, Y., Tsuda, S., Kunikata, H. et al. (2014). Profiles of Extracellular MiRNAs in the Aqueous Humor of Glaucoma Patients Assessed with a Microarray System . Sci Rep 4, 5089. https://doi.org/10.1038/srep05089
Faculty of Biology, Medicine and Health, The University of Manchester. miRBase. https://www.mirbase.org
The National Center for Biotechnology Information. The Statistics of Sequence Similarity Scores. https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

TWJ specificity

Abstract

We created a model in the Dry Lab to reproduce the amplification results when mutations were introduced into the target in NJ Amplification and TWJ Amplification. By analyzing the model, we found that TWJ Amplification has superior sequence specificity compared to NJ Amplification because, in TWJ Amplification, the binding process involves the formation of a TWJ complex through multiple steps, resulting in a lower binding constant. This finding suggests that adjusting the thermodynamic stability of the TWJ complex can improve specificity, allowing us to propose sequence designs with superior specificity from the Dry lab.

Understand why TWJ Amplification has sequence specificity

Purpose

In Non-Junction (NJ) Amplification, the amplification reaction proceeds even with miRNAs that differ by several bases, whereas in TWJ Amplification, it has been experimentally reported in prior studies that amplification does not occur even with miRNAs that differ by only a single base from the target miRNA, demonstrating high sequence specificity ¹. We aimed to enhance the sequence specificity in Three-Way Junction (TWJ) Amplification by carefully designing the template and Helper DNA. To achieve this, it is crucial to accurately understand the factors contributing to the high sequence specificity in TWJ Amplification. Although prior reports suggest that this specificity arises from thermodynamic factors within the TWJ structure, the understanding is not yet developed enough to inform precise sequence design. To address this, we attempted to construct a new model to better understand why TWJ Amplification exhibits higher sequence specificity than NJ Amplification. Just as developing a treatment requires an accurate identification of the disease's cause, designing templates or primers with higher specificity requires a precise understanding of the underlying factors that give TWJ Amplification its superior specificity.

Wet Results

In our Wet Lab, we demonstrated that TWJ Amplification shows higher specificity than NJ Amplification by comparing cases where the target is hsa-let-7b (let-7b) with cases where mutations are introduced in the sequence, similarly to previous studies. Our next step is to build a model that can replicate these experimental results.

Methods

Models for Complex Formation and Amplification

We created models (Model 0 and Model 1) for NJ Amplification and TWJ Amplification, where the target binds with the template and helper to form double and TWJ complexes, respectively. To make it simple, in Model 1, we assumed that the target, helper and template bind simultaneously to form the TWJ complexes.

We use Model 0 and 1 for the simulation of complex formation, and Model P for the strand elongation by polymerase after the complex formation. In other words, we simulate NJ Amplification and TWJ Amplification by replacing T→YT3 in Figure 4. with Model 0 and 1, respectively. Here, we assume that W and Z react with SYBR GREEN and that fluorescence value corresponds to X+Z. Although Figure 4. represents the case for NJ Amplification, note that the corresponding parts for W and Z will change in the case of TWJ Amplification.

The Model above can be written using Mass Action as the following system of Ordinary Differential Equations (ODEs).

Equation 1. ODE for NJ Amplification (Model 0 & Model P)

\[ \begin{equation} \begin{aligned} \frac{d[X]}{dt} &= -k_1 [X] [T] + \frac{k_1}{a_1} [XT] \notag\\ \frac{d[T]}{dt} &= -k_1 [X] [T] + \frac{k_1}{a_1} [XT] \notag\\ \frac{d[XT]}{dt} &= k_1 [X] [T] - \frac{k_1}{a_1} [XT] - \frac{k_{\text{cat}} P_0}{32 m_1 c} [XT] \notag\\ \frac{d[W]}{dt} &= \frac{k_{\text{cat}} P_0}{32 m_1 c} [XT] - k_2 [W] [N] + \frac{k_{\text{cat}} P_0}{23 m_2 c} [Z] \notag\\ \frac{d[Z]}{dt} &= -\frac{k_{\text{cat}} P_0}{23 m_2 c} [Z] + k_2 [W] [N] \notag\\ \frac{d[N]}{dt} &= -b [N] \notag\\ \end{aligned} \tag{1} \end{equation} \]

Equation 2. ODE for TWJ Amplification(Model 1 & Model P)

\[ \begin{equation} \begin{aligned} \frac{d[X]}{dt} &= -k_1 [X] [P] [T] + \frac{k_1}{a_1} [XPT] \notag\\ \frac{d[P]}{dt} &= -k_1 [X] [P] [T] + \frac{k_1}{a_1} [XPT] \notag\\ \frac{d[T]}{dt} &= -k_1 [X] [P] [T] + \frac{k_1}{a_1} [XPT] \notag\\ \frac{d[XPT]}{dt} &= k_1 [X] [P] [T] - \frac{k_1}{a_1} [XPT] - \frac{k_{\text{cat}} P_0}{32 m_1 c} [XPT] \notag\\ \frac{d[W]}{dt} &= \frac{k_{\text{cat}} P_0}{32 m_1 c} [XPT] - k_2 [W] [N] + \frac{k_{\text{cat}} P_0}{23 m_2 c} [Z] \notag\\ \frac{d[Z]}{dt} &= -\frac{k_{\text{cat}} P_0}{23 m_2 c} [Z] + k_2 [W] [N] \notag\\ \frac{d[N]}{dt} &= -b [N] \notag\\ \end{aligned} \tag{2} \end{equation} \]

Nucleic acid Sequences

The binding constants in Model 0 and 1 depend on the nucleotide sequences. By using the sequences actually used in the Wet Lab, more accurate binding constants can be determined.

\[ \begin{array}{|c|c|c|} \hline \textbf{Type} & \textbf{Name} & \textbf{Sequence(5' \, \text{to} \, 3')} \\ \hline \text{target (p.m.)} & \text{let-7b} & \text{UGAGGUAGUAGGUUGUGUGGUU} \\ \hline \text{target (m.m.)} & \text{mismatch-5} & \text{UGAGAUAGUAGGUUGUGUGGUU} \\ \hline \text{primer (helper)} & & \text{AACCACACAACCCCAAA} \\ \hline \text{template (NJ)} & & \text{AAGTGTGTGTGTCCTCGCTGAG}\\ &&\text{GAACCACACAACCTACTACCTCATTT} \\ \hline \text{template (TWJ)} & & \text{AAGTGTGTGTGTCCTCGCTGA}\\ &&\text{GGTTGTTTTGGAATACTACCTCATTT} \\ \hline \end{array} \]

Table 1. target and primer Sequence

First, we compare the template that is fully complementary to the target (let-7b) with a sequence that has a single base mutation at the 5th position from the 5' end of let-7b (mismatch-5). NJ Amplification cannot distinguish between let-7b and mismatch-5 while TWJ Amplification can differentiate between the two (Figure 14.). According to Figure 14., with NJ Amplification, let-7b and mismatch-5 are amplified similarly. However, with TWJ Amplification, let-7b is amplified more strongly than mismatch-5.

Parameters

We assume that parameters other than binding constant a are independent of the sequence. (Dry lab_Model_Amplification_comparison) From onwards, we will continue to use this for the constants of Model P. The binding constant a is determined by using the Nearest Neighbor (NN) model, by deriving the free energy of the binding from Equation 3. and then applying Equation 4. The NN model is a theory used to work out the stability of DNA duplexes. Besides the binding of each base pair in the sequence, we take \(\Delta H\) and \(\Delta S\), which change depending on the adjacent base pairs, into consideration, to compute \(\Delta G\) of the binding of the entire duplex. Since, the experimental temperature is 37 ℃, \(\Delta G\) of the entire sequence can be calculated as follows.

\begin{equation} \begin{aligned} \Delta G = \Delta H - T\Delta S~(T = 310.15 K) \end{aligned} \tag{3} \end{equation}

\(\Delta G\) is from Table2 ², and Table3 ³.

The conversion of \(\Delta G\) due to the Buffer concentration was based on ⁴.

The binding constant a can be written using \(\Delta G\) as follows.

\begin{equation} \begin{aligned} a = \exp\left(\frac{-\Delta G}{RT}\right) \end{aligned} \tag{4} \end{equation}

Based on Mathews, D. H., & Turner, D. H. (2002), when forming TWJ complexes, we need to add \(\Delta G^\circ_{37\text{MBL}} = 3.64\ (kcal/mol)\) to the total free energy⁵. Furthermore, according to Hooyberghs, J., Van Hummelen, P., & Carlon, E. (2009), the difference in the free energy between let-7b and mismatch-5 is \(\Delta\Delta G =\Delta G_{\text{m.m.}} - \Delta G_{\text{p.m.}}= 2.7\ (kcal/mol)\) ⁶.

Using the parameters above and determining the binding constants of Model 0 and 1 (\(a\) and \(a'\) respectively), we can obtain Table 2.

\[ \begin{array}{|c|c|c|} \hline \textbf{} & \textbf{let-7b} & \textbf{mismatch-5} \\ \hline \textbf{Model 0} & a=5.0 \times 10^{11} & a=3.8 \times 10^{9} \\ \hline \textbf{Model 1} & a'=7.97 \times 10^{5} & a'=6.11 \times 10^{3} \\ \hline \end{array} \]

Table 2. Binding constant for Model 0, Model 1

Results

We simulated the amplification of W+Z using Model 0 for the complex formation of NJ Amplification and Model 1 for the complex formation of TWJ Amplification. (Figure 5.)

Note that, with NJ Amplification, the graph for let-7b and mismatch-5 completely overlap, because there is almost no difference between the two. From Figure 5. we can see that Model 0 is sufficient to show the lack of specificity in NJ Amplification. Model 1 shows that TWJ Amplification cannot distinguish between let-7b and mismatch-5, and this contradicts the eXPerimental results shown in Figure 1. This suggests that Model 1 is insufficient to show the characteristics of TWJ Amplification. Therefore, we decided to develop a new model.

New Methods

We could not reproduce the fact that TWJ Amplification can distinguish between let-7b and mismatch-5 in Model 1. In this section, we devise a new model that can differentiate the two.

New Complex Formation Models

The previous Model 1 was a model where 3 nucleic acids bind at once. Based on Dr. Sakuraba's advice from Human Practices, we created Model 2, where a TWJ complex is formed through a two-step binding process. Note that, since the binding is multi-step, there are multiple pathways to form a TWJ complex. Here, we ignored the reaction in which the helper and the template bind, in this model. This is because it can be predicted that when the helper and template bind, the polymerase elongation reaction proceeds even without the target, and we assume that this is the reason why the graph of fluorescence signal rises for NC. We ignored the pathway where the helper and template bind, because there is no need to take the rise of the graph for NC into account when we want to compare the specificity. Here, we call the pathways \(X→XT→XPT, X→XP→XPT\) as \(1→4, 2→3\), respectively. (Hereafter, we will refer to \(X→XT, XT→XPT, X→XP, XP→XPT\) as reactions \(1, 2, 3, 4\), respectively.)

As before, we will replace Model 2 by Y→YT3 in Model P. In this case, ODE can be written as Equation 5.

\begin{equation} \begin{aligned} \frac{d[X]}{dt} &= -k_1 [X] [T] + \frac{k_1}{a_1} [XT] - k_2 [X] [P] + \frac{k_2}{a_2} [XP] \notag\\ \frac{d[T]}{dt} &= -k_1 [X] [T] + \frac{k_1}{a_1} [XT] - k_3 [XP] [T] + \frac{k_3}{a_3} [XPT] \notag\\ \frac{d[P]}{dt} &= -k_2 [X] [P] + \frac{k_2}{a_2} [XP] - k_4 [XT] [P] + \frac{k_4}{a_4} [XPT] \notag\\ \frac{d[XT]}{dt} &= k_1 [X] [T] - \frac{k_1}{a_1} [XT] - k_4 [XT] [P] + \frac{k_4}{a_4} [XPT] \notag\\ \frac{d[XP]}{dt} &= k_2 [X] [P] - \frac{k_2}{a_2} [XP] - k_3 [XP] [T] + \frac{k_3}{a_3} [XPT] \notag\\ \frac{d[XPT]}{dt} &= k_3 [XP] [T] - \frac{k_3}{a_3} [XPT] + k_4 [XT] [P] - \frac{k_4}{a_4} [XPT] - \frac{k_{\text{cat}} P_0}{32 m_1 c} [XPT] \notag\\ \frac{d[W]}{dt} &= \frac{k_{\text{cat}} P_0}{32 m_1 c} [XPT] - k_5 [W] [N] + \frac{k_{\text{cat}} P_0}{23 m_2 c} [Z] \notag\\ \frac{d[Z]}{dt} &= -\frac{k_{\text{cat}} P_0}{23 m_2 c} [Z] + k_5 [W] [N] \notag\\ \frac{d[N]}{dt} &= -b [N] \notag\\ \end{aligned} \tag{5} \end{equation}

Equation 5. ODE for TWJ Amplification(Model 2 & Model P)

Parameters

Given that the differences in the reaction rate constants are small compared to the differences in binding constants due to the nucleic acid sequence variance, we assume \(k1=k2=k3=k4=11.4\) ⁷. The absolute values are based on the k1 obtained from the Fitting below. (Dry lab_Model_Amplification_comparison) The values of \(a_1, a_2, a_3, a_4\) are derived from the NN model and are presented in Table 3.

\[ \begin{array}{|c|c|c|} \hline \textbf{Model 2} & \textbf{let-7b} & \textbf{mismatch-5} \\ \hline a_1 & 3.06 & 2.35 \times 10^{-2} \\ \hline a_2 & 7.65 \times 10^{2} & 7.65 \times 10^{2} \\ \hline a_3 & 4.05 \times 10^{1} & 3.10 \times 10^{-1} \\ \hline a_4 & 3.81 \times 10^{3} & 3.81 \times 10^{3} \\ \hline \end{array} \]

Table 3. Binding constant for Model 2

As before, we will use mismatch-5 for the miRNA with a single base mutation. It is important to note that since the mutation is occurring at the fifth base from the 5' end of the target, only the binding constants \(a_1, a_3\) will show changes.

Results

The simulation result of the amplification of W+Z using Model 2 is shown below. (Figure 7.)

Figure 7. shows that by changing from Model 1 to Model 2, TWJ Amplification can distinguish between let-7b and mismatch-5, which matches the eXPerimental results obtained from the Wet Lab. What is crucial here is understanding why the change from Model 1 to Model 2 allows us to replicate the specificity of TWJ Amplification. By comprehending this, we can identify the underlying reasons for the specificity of TWJ Amplification.

Analysis

We developed two distinct models, Model 1 and Model 2, as representations of the TWJ complex in TWJ Amplification. However, based on the discussions thus far, we can conclude that Model 2 is the superior model. In this section, we will eXPlore the fundamental differences between Model 1 and Model 2.

Breaking Down Model2

As mentioned above, there are two differences between Model 1 and 2: the number of reaction pathways leading to complex formation and whether the complex is formed in a single step or two-step process. It is challenging to clarify which of these factors are significant in the differences in Model 1 and 2. By breaking down Model 2 to consolidate the two reaction pathways into one, we will isolate the difference between Model 1 and Model 2. This approach will allow us to identify the fundamental cause of the differences in specificity between the two models.

We simulated the amplification of W+Z using Model 2 (1→4) and Model 2 (2→3) as before.

From Figure 10., we can see that both Model 2 (1→4) and Model 2 (2→3) can distinguish between let-7b and mismatch-5. This indicates that the higher specificity of Model 2 compared to Model 1 is more significantly attributed to the two-step binding process rather than the existence of two reaction pathways.

Analysis on the Broken Down Model 2

In the previous section, we confirmed that both Model 2 (1→4) and Model 2 (2→3) can distinguish between let-7b and mismatch-5 when the reaction pathways are consolidated into one. In this section, we will demonstrate that the high specificity of TWJ Amplification is important because the reaction is multi-step, which keeps the binding constants low by manipulating the binding constants in both Model 2 (1→4) and Model 2 (2→3).

Model 2 (1→4)

Let \(W + Z (\text{p.m.})\) denote the value of \(W + Z\) when the target sequence is perfectly complementary to the template, and let \(W + Z (\text{m.m.})\) denote the value of \(W + Z\) when the target sequence contains a mutation. As in the previous methods, we assume that the mutation occurs at the fifth nucleotide from the 5' end of the target. Using Model 2 (1→4), we varied \(a_1\) and \(a_4\) as \(a_1 = 10^{n_1}\) and \(a_4 = 10^{n_2}\text{, where} -2 \leq n_1, n_2 < 8\), and examined \(\frac{F'}{F} = \frac{W + Z (\text{m.m.})}{W + Z (\text{p.m.})}\) at \(t = 3000\) (Figure 11.).

When let-7b and mismatch-5 were used as targets, the values of \(a_1 and a_4\) were as listed in Table 3, and under these conditions, Model 2 (1→4) exhibited specificity. However, Figure 11. shows that Model 2 (1→4) loses specificity as the values of \(a_1 and a_4\) increase. It is important to note that specificity is more sensitive to changes in \(a_1 than a_4\). The high specificity observed in Model 2 (1→4) suggests that maintaining a small binding constant, especially \(a_1\), by making the reaction multistep is crucial.

While decreasing \(a_1\) makes the reverse reaction more likely, it is not immediately obvious why the ratio \(\frac{W + Z (\text{m.m.})}{W + Z (\text{p.m.})}\) decreases. Therefore, we will next investigate the reason for this.

The forward reaction, reverse reaction, and net flux for \(X+T→XT\) can be expressed as follows. From here, we will use analogous notation to represent the reaction flux.

Equation 6. Flux, \(X\_XT\_f, X\_XT\_b, X\_XT \)

\begin{equation} \begin{aligned} X\_XT\_f &= k_1 [X] [T] \notag\\ X\_XT\_b &= \frac{k_1}{a_1} [XT] \notag\\ X\_XT&=k_1 \left( [X] [T] - \frac{1}{a_1} [XT] \right) \notag\\ \end{aligned} \tag{6} \end{equation}

From Figure 11., we can see that decreasing \(a_1\) will reduce \(\frac{X\_XT(\text{m.m.})}{X\_XT(\text{p.m.})}\), resulting in the increase in the specificity. Note that when \(a_1=1e+7\), the two curves overlap almost completely.

\begin{equation} \begin{aligned} \frac{X\_XT(\text{m.m.})}{X\_XT(\text{p.m.})} = \frac{X\_XT\_f(\text{m.m.}) - X\_XT\_b(\text{m.m.})}{X\_XT\_f(\text{p.m.}) - X\_XT\_b(\text{p.m.})} \notag\ \end{aligned} \tag{7} \end{equation}

Regardless of the value of \(a_1, X_XT_f\) can be considered almost unchanged. On the other hand, as \(a_1\) decreases, \(X\_XT_b\) increases. Therefore, the value of Equation 7. increases, and specificity improves. This can be understood with simple arithmetic. For example, assume that \(X\_XT\_f=1\) regardless of the magnitude of \(a_1\) (since Eq. (7) focuses on the ratio, the absolute values are not of great importance). If the values of \(X\_XT\_b\) are \(0.001, 0.01, 0.05, 0.5\) when \(a_1=1e+7,\ 7.6e+4,\ 1e+3,\ 7.6\), respectively, the values of Equation 7. become \(0.99\) and \(0.52\). This demonstrates that increasing \(X\_XT\_b\) by decreasing \(a_1\) is crucial for enhancing specificity.

Model 2 (2→3)

In the previous section, we varied \(a_1 and a_4\) in Model 2 (1→4). In this section, we investigated Model 2 (2→3) by varying \(a_2\_ \text{and}\_ a_3 \_\text{as}\_ a_2 = 10^{n_1} \_ \text{and}\_ a_3 = 10^{n_2}\), where \(-2 \leq n_1, n_2 \leq 7\), and examined \(\frac{F'}{F} = \frac{W + Z (\text{m.m.})}{W + Z (\text{p.m.})}\) at \(t=3000\) (Figure 13.).

When let-7b and mismatch-5 were used as targets, the values of \(a_2\) and \(a_3\) were as listed in Table 3, and under these conditions, Model 2 (2→3) exhibited specificity. However, Figure 13. shows that Model 2 (2→3) loses specificity as the values of \(a_2\) and \(a_3\) increase. It is important to note that specificity is more sensitive to changes in \(a_3\) than \(a_2\). The high specificity observed in Model 2 (2→3) suggests that maintaining a small binding constant, especially \(a_3\), by making the reaction multistep is crucial.

By making \(a_3\) small, \(XP\_XPT\_b\) increases, and due to the same reason as the previous section,
\begin{equation} \begin{aligned} \frac{XP\_XPT(\text{m.m.})}{XP\_XPT(\text{p.m.})} = \frac{XP\_XPT\_f(\text{m.m.}) - XP\_XPT\_b(\text{m.m.})}{XP\_XPT\_f(\text{p.m.}) - XP\_XPT\_b(\text{p.m.})} \notag\ \end{aligned} \tag{8} \end{equation}

will become smaller. What is significant in the specificity in Model 2 (2→3) is that \(XP\_XPT\_b\) increases as \(a_3\) decreases.

Conclusions

By breaking down Model 2 and analyzing Model 2 (1→4) and Model 2 (2→3), each of which has a single reaction pathway, we were able to identify the essential reason for the high specificity of TWJ Amplification. TWJ Amplification keeps the binding constant low by making the complex formation a multistep process. Among the binding constants, it is important that the one that becomes smaller when there is a single-nucleotide mismatch in the target is kept low—\(a_1\) in Model 2 (1→4) and \(a_3\) in Model 2 (2→3). Furthermore, we found that the threshold for specificity is \(a_1, a_3 < 1e+3\). When \(a_1\) and \(a_3\) are below \(1e+3\), specificity is maintained regardless of the values of \(a_2\) and \(a_4\).

The above conclusions are based on the case where the mutation is located at the fifth nucleotide from the 5' end (mismatch-5), but when the mutation is located at the 15th nucleotide from the 5' end (e.g., mismatch-15), it is \(a_2\) and \(a_4\) that are important for specificity, not \(a_1\) and \(a_3\). Specifically, due to the symmetry in Model 2, \(a_2\) corresponds to \(a_1\), and \(a_4\) corresponds to \(a_3\). The threshold values can also be determined through a similar discussion as for mismatch-5.

The Difference in the Specificity due to the Mutation Position

Up to this point, we have only considered sequences with mutations located at the 5th base position from the 5' end of the target (mismatch-5). As shown in Figure 14., NJ Amplification cannot distinguish between let-7b and mismatch-5, whereas TWJ Amplification can. However, even with TWJ Amplification, there are cases where single-base mutations cannot be distinguished from let-7b, depending on the position of the mutation. These are when the mutation occurs at the 1st, 9th, or 11th base position from the 5' end (mismatch-1, mismatch-9, mismatch-11). This occurs when the mutation is located near the strand's end or near the junction structure of the TWJ complex.

To consider why the specificity varies depending on the position of the mutation, we will now consider sequences with mutations located at the 1st, 9th, and 11th base positions from the 5' end (mismatch-1, mismatch-9, mismatch-11). Using the \(\Delta\Delta G\) values from Table 4, we will simulate mismatch-1, mismatch-9, and mismatch-11 and compare the results with those of mismatch-5 and let-7b.

\[ \begin{array}{|c|c|} \hline & \Delta\Delta G \, (\text{kcal/mol}) \\ \hline \textbf{mismatch-1} & 1.6 \\ \hline \textbf{mismatch-9} & 1.9 \\ \hline \textbf{mismatch-11} & 2.3 \\ \hline \end{array} \]

Table 4 Binding constant for Model2

Although the difference in specificity is not as pronounced as in Figure 14., we can confirm through simulation that mismatch-1, mismatch-9, and mismatch-11 exhibit lower specificity compared to mismatch-5. This is because mutations located in the middle of the double strand result in larger \(\Delta\Delta G\) values. When \(\Delta\Delta G\) is larger, the ratio \(\dfrac{a(\text{m.m.})}{a(\text{p.m.})}\) becomes smaller, and the specificity increases. The fact that we cannot observe as much difference in specificity as seen in Figure 14. suggests that the model used in this analysis is insufficient.

Application to Sequence Design

It has been determined that Model 2 is more suitable than Model 1 as a model for TWJ Amplification. Additionally, for Model 2 (1→4) and Model 2 (2→3) to exhibit specificity, it is necessary for \(a_1\) and \(a_3\) to be small. In actual TWJ Amplification, does reducing \(a_1\) and \(a_3\) similarly increase specificity? In previous discussions, \(a_1\) and \(a_4\), as well as \(a_2\) and \(a_3\), were varied simultaneously in the broken down models of Model 2. In this section, we will consider how to design sequences that enhance specificity by varying \(a_1\) and \(a_3\) simultaneously in the full Model 2.

To begin, in order to evaluate the specificity of mismatch-5, we will assume \(\Delta\Delta G = 2.7\) kcal/mol and vary \(a_1\) and \(a_3\) as \(a_1 = 10^{n_1}\) and \(a_3 = 10^{n_2}\), where \(-2 \leq n_1, n_2 < 8\). We will then examine \(\frac{F'}{F} = \frac{W + Z (\text{m.m.})}{W + Z (\text{p.m.})}\) at \(t=3000\).

\begin{equation} \begin{aligned} \Delta G_1 + \Delta G_2 &= \text{const.} \notag\\ \Delta G_1 + \Delta G_4 &= \Delta G_2 + \Delta G_3 = \text{const.} \notag\\ \end{aligned} \tag{9} \end{equation}

If the target length is constant and \(\Delta G\) is equal in the pathways \(1→4, 2→3\), it could be approximated to Eq. (9). Hence, the relationship among \(a_1, a_2, a_3, a_4\) can be written as Equation 10.

\begin{equation} \begin{aligned} a_2 &\propto \frac{1}{a_1} \notag\\ a_4 &= \frac{a_2 \cdot a_3}{a_1} \notag\\ a_1 &< a_3 \notag\\ \end{aligned} \tag{10} \end{equation}

It can be predicted from Figure 16. that when \(a_1 < a_3 < 1e+3\), there will be sufficient specificity. In fact, when target, template, and helper are of the sequence in the Table 1, \(a_1=3.06, a_3=4.05e+1\) can be obtained, and it can be shown that the specificity is sufficient from Figure 16.

Let the length of the binding region between the target and the template be \(tt\), and the length of the binding region between the helper and the template be \(ht\). We will investigate at what length of \(ht\) the specificity decreases, with \(tt\) being fixed. Here, the target is let-7b, and the template is the sequence of the template (TWJ) from Table 1. By varying \(ht\) to complement the template, we modify \(a_3\).

\[ \begin{array}{|c|c|c|} \hline \textbf{ht} & \Delta G \, (\text{kcal/mol}) & a_3 \\ \hline 3 & -8.76 & 1.50 \\ \hline 4 & -9.78 & 7.86 \\ \hline 5 & -10.8 & 4.05 \times 10^{1} \\ \hline 6 & -11.5 & 1.39 \times 10^{2} \\ \hline 7 & -12.0 & 2.96 \times 10^{2} \\ \hline 8 & -12.4 & 5.13 \times 10^{2} \\ \hline 9 & -13.1 & 1.75 \times 10^{3} \\ \hline 10 & -14.3 & 1.17 \times 10^{4} \\ \hline 11 & -15.8 & 1.35 \times 10^{5} \\ \hline \end{array} \]

Table 5. Ht, \(\Delta{G}, a_3\)

From Table 5, when the target and template sequences are set to those in Table 1, we can predict that in order to maintain sufficient specificity, \(ht > 8\) is incompatible.

Next, we will research how to improve the sequence to enhance the specificity for mismatch-1. Assuming \(\Delta\Delta G (\text{kcal/mol}) = 1.6\), we vary \(a_1\) and \(a_3\) as \(a_1 = 10^{n_1}\) and \(a_3 = 10^{n_2}\), where \(-2 \leq n_1, n_2 < 8\), and investigate \(\frac{F'}{F} = \frac{W + Z (\text{m.m.})}{W + Z (\text{p.m.})}\) at \(t=3000\).

Note that for mismatch-1, the range of \(a_3\) that yields specificity is slightly narrower compared to mismatch-5. Considering that in Figure 15., sufficient specificity was not obtained in the simulation when \(ht=5\), we can predict that to maintain sufficient specificity, \(ht \leq 4\) is required.

When examining ways to enhance the specificity of mismatch-15 or mismatch-17, the parameters to adjust are \(a_2\) and \(a_4\). As in the above discussion, adjusting the length of the binding regions between the helper and the target, as well as between the helper and the template, can maintain or improve specificity.

Future

We were able to replicate the fact obtained in the wet lab that the specificity for mismatch-5 is high and the specificity for mismatch-1 is low in TWJ Amplification using the model created in the Dry lab. Although we predicted the conditions to improve the specificity for mismatch-1 using this model, we did not have the time to confirm it in the Wet Lab. If we could verify the Dry lab predictions in the Wet Lab, it would further enhance the reliability of the model and allow it to be used for predicting other phenomena. Additionally, while we demonstrated that the specificity for mismatch-1, 9, and 11 is higher than that for mismatch-5, we were unable to reproduce the significant differences observed in the Wet Lab. Moving forward, it will be necessary to further investigate the reasons behind this.

Appendix

Based on the previous discussions, it can be anticipated that the high sequence specificity of TWJ Amplification is achieved by making the complex binding a multi-step process, particularly by keeping \(a_3\) small, which increases \(XP\_XPT\_b\) and ensures specificity. In this section, we will investigate the strong involvement of \(XP\_XPT\_b\) in the reaction specificity by manipulating \(k_2\) and \(k_3\) in Model 2 (2→3). We will vary \(k_2 = 10^{n_1}\) and \(k_3 = 10^{n_2}\) within the range of \(-4 \leq k_2, k_3 \leq 5\) and examine \(\frac{F'}{F} = \frac{W + Z (\text{m.m.})}{W + Z (\text{p.m.})}\) at \(t=3000\).

We can see from Figure 17. that the specificity is robust with respect to \(k_2\). We will analyze why the specificity improves when changing \(k_2\): \(1e+1\) (const.), \(k_3\): \(1e-3\)→\(1e-1\).

\begin{equation} \begin{aligned} XP\_XPT &= k_3 [XP] [T] - \frac{k_3}{a_3} [XPT] \notag\\ XPT\_W &= \frac{k_{\text{cat}} P_0}{32 m_1 c} [XPT] \notag\\ \end{aligned} \tag{11} \end{equation}

From Equation 11., we can immediately understand that \(XP\_XPT\) increases as \(k_3\) increases, which results in the increase in the amount of \(XPT\).

We can see from Figure 18. that increasing \(k_3\) decreases \(\frac{XP\_XPT(\text{mismatch-5})}{XP\_XPT(\text{let-7b})}\), which means that the specificity improved. Then why does \(\frac{XP\_XPT(\text{mismatch-5})}{XP\_XPT(\text{let-7b})}\) decrease when \(XP\_XPT\) increases?

\begin{equation} \begin{aligned} XP\_XPT\_f = k_3 [XP] [T] \notag\\ XP\_XPT\_b = \frac{k_3}{a_3} [XPT] \notag\\ \end{aligned} \tag{12} \end{equation}

From Equation 12., increasing \(k_3\) by a factor of \(1e+2\) results in \(XP\_XPT\_f\) also increasing by a factor of approximately \(1e+2\). However, since \(k_3\) significantly enhances \(XPT\), \(XP\_XPT\_b\) becomes more than \(1e+2\) times larger. Therefore, as discussed previously, \(\frac{XP\_XPT(\text{mismatch-5})}{XP\_XPT(\text{let-7b})}\) decreases. This again highlights the importance of \(XP\_XPT\_b\).

References

Zhang, Q., Chen, F., Xu, F., Zhao, Y., & Fan, C. (2014). Target-triggered three-way junction structure and polymerase/nicking enzyme synergetic isothermal quadratic DNA machine for highly specific, one-step, and rapid microRNA detection at attomolar level.Analytical Chemistry, 86(16), 8098-8105. https://doi.org/10.1021/ac501038r
Banerjee, D., Tateishi-Karimata, H., Ohyama, T., Ghosh, S., Endoh, T., Takahashi, S., & Sugimoto, N. (2020). Improved nearest-neighbor parameters for the stability of RNA/DNA hybrids under a physiological condition.Nucleic Acids Research, 48(21), 12042–12054. https://doi.org/10.1093/nar/gkaa572
SantaLucia, J., Jr., Allawi, H. T., & Seneviratne, P. A. (1996). Improved nearest-neighbor parameters for predicting DNA duplex stability.Biochemistry, 35(12), 3555-3562. https://doi.org/10.1021/bi951907q
SantaLucia, J. Jr., & Hicks, D. (2004). The thermodynamics of DNA structural motifs.Annual Review of Biophysics and Biomolecular Structure, 33(1), 415–440. https://doi.org/10.1146/annurev.biophys.32.110601.141800
Mathews, D. H., & Turner, D. H. (2002). Experimentally derived nearest-neighbor parameters for the stability of RNA three- and four-way multibranch loops.Biochemistry, 41(3), 869-880. https://doi.org/10.1021/bi011441d
Hooyberghs, J., Van Hummelen, P., & Carlon, E. (2009). The effects of mismatches on hybridization in DNA microarrays: Determination of nearest neighbor parameters.Nucleic Acids Research, 37(7), e53. https://doi.org/10.1093/nar/gkp109
Rauzan, B., McMichael, E., Cave, R., Sevcik, L. R., Ostrosky, K., Whitman, E., ... & Deckert, A. A. (2013). Kinetics and thermodynamics of DNA, RNA, and hybrid duplex formation. Biochemistry, 52(5), 765-772. https://doi.org/10.1021/bi3013005

Sequence Design

Abstract

Our ultimate goal is to establish the sequence design guideline for templates and helpers used in Three-Way-Junction SDA (TWJ-SDA). In this section, we focus on the Signal-to-Noise Ratio (S/N Ratio) of TWJ-SDA and build an ordinary differential equation (ODE) Model to predict templates and helpers that will result in a higher S/N ratio. Since the first ODE Model required improvement due to discrepancies with experimental results, we decided to create a new ODE Model. This new Model is closer to the experimental results by taking into account the influence of ab initio synthesis. By combining this new ODE Model with the template and helper conditions for sequence-specific amplification as determined in the TWJ Specificity section, we developed software to assist in the design of optimal templates and helpers for TWJ-SDA. This software will lower the hurdle to use our amplification and detection systems for various miRNAs, and will greatly expand the applicability of our project, POIROT.

Purpose

Based on the results of miRNA research at Dry Lab (For more information, see Model_miRNA_Selection), we decided to use TWJ-SDA, which has high sequence specificity, for our amplification system. For TWJ-SDA, we need to design the sequence of a helper and a template. Previous studies have investigated optimal helpers and templates for individual microRNAs (miRNAs) ¹. However, no general guidelines for the design of TWJ-SDA helpers and templates have been investigated. Our ultimate goal is to create guidelines for optimal helpers and templates design.

TWJ Specificity section (Model_TWJ_Specificity) helped to identify the causes of TWJ-SDA sequence specificity. This allowed us to know the conditions for sequence specificity. In this section, we focus on the Signal-to-Noise Ratio (S/N ratio) of TWJ-SDA and investigate the conditions for helpers and templates to increase the S/N ratio. The S/N ratio of TWJ-SDA can even be less than 1 if the helper and template are not well designed. Therefore, Seeking conditions for a high S/N ratio is crucial to achieving the desired results.

Method and Model

In designing the TWJ-SDA helper and template, it has to be decided how many bases the target miRNA and template are complementary and how many bases the helper and template are complementary. We tried to identify these two conditions for a high S/N ratio by using an ODE Model. An ODE Model of TWJ-SDA was also used to study TWJ specificity, but in this section it is the S/N ratio that is of particular interest. Therefore, we decided to create a different ODE Model from the ODE Model used in TWJ Specificity section.

ODE Model

The amplification reaction of TWJ-SDA start from the three pathways in Figure 2.

We simulated TWJ-SDA by creating the ODE Model that represents the reactions in Figure 2.

The equation of the ODE Model is Equation1.

Equation 1:

ODE for TWJ-SDA

\[ \begin{equation} \begin{aligned} \dfrac{d[X]}{dt} &= -k_1 [X] [T] + \dfrac{k_1}{a_1} [XT] - k_2 [X] [H] + \dfrac{k_2}{a_2} [XH] \notag\\ \dfrac{d[T]}{dt} &= -k_1 [X] [T] + \dfrac{k_1}{a_1} [XT] - k_3 [XH] [T] + \dfrac{k_3}{a_3} [XHT] - k_5 [H] [T] + \dfrac{k_5}{a_5} [HT] \notag\\ \dfrac{d[H]}{dt} &= -k_2 [X] [H] + \dfrac{k_2}{a_2} [XH] - k_4 [XT] [H] + \dfrac{k_4}{a_4} [XHT] - k_5 [H] [T] + \dfrac{k_5}{a_5} [HT] \notag\\ \dfrac{d[XT]}{dt} &= k_1 [X] [T] - \dfrac{k_1}{a_1} [XT] - k_4 [XT] [H] + \dfrac{k_4}{a_4} [XHT] \notag\\ \dfrac{d[XH]}{dt} &= k_2 [X] [H] - \dfrac{k_2}{a_2} [XH] - k_3 [XH] [T] + \dfrac{k_3}{a_3} [XHT] \notag\\ \dfrac{d[XHT]}{dt} &= k_3 [XH] [T] - \dfrac{k_3}{a_3} [XHT] + k_4 [XT] [H] - \dfrac{k_4}{a_4} [XHT] - \dfrac{k_{\text{cat}} P_0}{32 m c} [XHT] \notag\\ \dfrac{d[HT]}{dt} &= k_5 [H] [T] - \dfrac{k_5}{a_5} [HT] - \dfrac{k_{\text{cat}} P_0}{32 m c} [HT] \notag\\ \dfrac{d[W]}{dt} &= \dfrac{k_{\text{cat}} P_0}{32 m c} [XHT] - k_n [W] [N] + \dfrac{k_{\text{cat}} P_0}{23 m c} [Z] \notag\\ \dfrac{d[Z]}{dt} &= -\dfrac{k_{\text{cat}} P_0}{23 m c} [Z] + k_n [W] [N] \notag\\ \dfrac{d[P1]}{dt} &= \dfrac{k_{\text{cat}} P_0}{23 m c} [Z] \notag\\ \end{aligned} \end{equation} \] where, \[ \begin{equation} \begin{aligned} c &= 1 + \dfrac{[XHT] + [Z] + [W] + [HT]}{m}\notag\\ \end{aligned} \end{equation} \]

Assumption

The above model is based on the law of mass action. The polymerase and nickase reactions follow the Michaelis-Menten equation. The parameter c found in the denominator of the Michaelis-Menten term is acknowledging the competition between DNA species XHT, Z, W, HT for polymerase ². The amplification reaction from Path C is strictly different from that from Path A and Path B. We assumed that once the amplification reaction starts, the helper doesn't leave the template and the polymerase reaction proceeds in the same way as in XPT in HT, where there is no target miRNA. Therefore, in the equation, we assumed that the reaction proceeds from HT to W.

The binding rate between nucleic acids was set to \(k_1 = k_2 = k_3 = k_4 = k_5\) based on the study of Rauzan et al. ³. The binding speed was obtained by genetic algorithm fitting as described in the Amplification Comparison section (for more information, see Model_Amplification_Comparison).

The parameter values used are as follows.

Parameters:

We have simulated 12 different combinations of association constants \(a_1, a_2, a_3, a_4, a_5\). These association constants were calculated by Nearest-Neighbor Model ^7,⁸. These 12 combinations of association constants can be seen at Model_Sequence design.

Program for Simulation

The above ODE Model simulation was performed using python. The code used for simulation is available at the following link.

Please visit our GitLab.

Used Experimental Results

We used the results from Wet Lab to validate The ODE Model. As the target miRNA, hsa-miR-30d-5p was used. Amplification of TWJ-SDA was monitored by fluorescence from SYBR Green II. Experiments were performed under 4 different complementary strand length for helper and template (5 bp, 6 bp, 7 bp, 8 bp) and 3 complementary strand lengths for miRNA and template (10 bp, 11 bp, 12 bp), totaling 12 different conditions. For each condition, three experiments were performed with target miRNA 0nM and 10nM. The S/N ratio was calculated as Signal/Noise, where Signal and Noise were calculated by subtracting the value at 0 min as background from the amount of fluorescence at 60 min when miRNA was 10 nM and 0 nM, respectively.

Simulation of Model

The fluorescence in the experiment is mainly due to ssDNA, and in the simulation, it is assumed to correspond to the amplification product (P1) of TWJ-SDA. In the simulation, the concentration of P1 after 60 min when the target miRNA is 10 nM is Signal and the concentration of P1 after 60 min when the target miRNA is 0 nM is Noise. Signal and Noise units are in \(\mu M\). S/N ratio was calculated as Signal / Noise.

Figure 3 shows the Signal, Noise, and S/N ratio results when \(a_5\) and \(a_1\) are varied. A larger \(a_5\) means a longer length of the complementary strand of helper and template. On the other hand, a larger \(a_1\) means a longer length of the complementary strand of target miRNA and template. A 10-fold increase in \(a_5\) and \(a_1\), respectively, corresponds to an increase of about 1 bp in the length of the complementary strand. The results of this simulation indicate that the S/N ratio increases as the length of the helper and template complementary strand decreases.

Comparison with Experimental Results

Here we compare the simulation results with the results from Wet Lab. The left figure shows the experimental results, and the Wet Lab results suggest that the S/N ratio does not increase by a factor of 10 or more as in simulation.Furthermore, in the simulation, the smaller \(a_5\) is, the higher the S/N ratio is, but this doesn't seem to be the case in the experimental results. In all cases where length of TT was 10 bp, 11 bp, and 12 bp, the S/N ratio was the highest when the length of HT was 6 bp.

Interpretation

Comparing the simulation results and experimental results, the S/N ratio in simulation is more than 10, but in the experimental results, the S/N ratio is not that large. In addition, the helper and template conditions that give the maximum S/N ratio are different between the simulation predictions and the experimental results. The reason why the S/N ratio of the simulation is more than 10 is that the path C reaction is almost non-existent when the length of HT is short, and the Noise is very small. The simulation doesn't predict the experimental results at all. We thought that this model needs to be improved.

Model Improvements

What aspects of reality did our Model not represent well? From the comparison of simulation and experimental results, we thought that the difference from reality might be that the Noise is very small relative to the Signal when the length of HT is short. As a result of the literature review, we came up with a remedy. Zyrina's research shows that when there is a polymerase and some DNA, the polymerase works to produce a complementary strand even if there is no primer present ⁹. We hypothesized that this phenomenon called ab initio synthesis might also be occurring in TWJ Amplification. If ab initio synthesis is occurring, we can say that the amplification product (P1) is amplified by factors other than Path A, B, and C in both Signal and Noise cases. If we assume that ab initio synthesis is taking place, it will no longer be the case that the Noise is very small compared to Signal when the length of HT is short.

Considering the effect of ab inito synthesis, the ODE of the improved model is as in Equation 2. We have assumed that P1 is increasing at a constant rate by ab initio synthesis. Therefore, the change from the previous model is that \(a_b\) is added to the right-hand side of \(\dfrac{d[P1]}{dt}\). the value of \(a_b\) was determined so that the S/N ratio scale is consistent with the simulation and experimental results.

Equation 2:

New ODE for TWJ-SDA

\[ \begin{equation} \begin{aligned} \dfrac{d[X]}{dt} &= -k_1 [X] [T] + \dfrac{k_1}{a_1} [XT] - k_2 [X] [H] + \dfrac{k_2}{a_2} [XH] \notag\\ \dfrac{d[T]}{dt} &= -k_1 [X] [T] + \dfrac{k_1}{a_1} [XT] - k_3 [XH] [T] + \dfrac{k_3}{a_3} [XHT] - k_5 [H] [T] + \dfrac{k_5}{a_5} [HT] \notag\\ \dfrac{d[H]}{dt} &= -k_2 [X] [H] + \dfrac{k_2}{a_2} [XH] - k_4 [XT] [H] + \dfrac{k_4}{a_4} [XHT] - k_5 [H] [T] + \dfrac{k_5}{a_5} [HT] \notag\\ \dfrac{d[XT]}{dt} &= k_1 [X] [T] - \dfrac{k_1}{a_1} [XT] - k_4 [XT] [H] + \dfrac{k_4}{a_4} [XHT] \notag\\ \dfrac{d[XH]}{dt} &= k_2 [X] [H] - \dfrac{k_2}{a_2} [XH] - k_3 [XH] [T] + \dfrac{k_3}{a_3} [XHT] \notag\\ \dfrac{d[XHT]}{dt} &= k_3 [XH] [T] - \dfrac{k_3}{a_3} [XHT] + k_4 [XT] [H] - \dfrac{k_4}{a_4} [XHT] - \dfrac{k_{\text{cat}} P_0}{32 m c} [XHT] \notag\\ \dfrac{d[HT]}{dt} &= k_5 [H] [T] - \dfrac{k_5}{a_5} [HT] - \dfrac{k_{\text{cat}} P_0}{32 m c} [HT] \notag\\ \dfrac{d[W]}{dt} &= \dfrac{k_{\text{cat}} P_0}{32 m c} [XHT] - k_n [W] [N] + \dfrac{k_{\text{cat}} P_0}{23 m c} [Z] \notag\\ \dfrac{d[Z]}{dt} &= -\dfrac{k_{\text{cat}} P_0}{23 m c} [Z] + k_n [W] [N] \notag\\ \dfrac{d[P1]}{dt} &= \dfrac{k_{\text{cat}} P_0}{23 m c} [Z] + a_b\notag\\ \end{aligned} \end{equation} \] where, \[ \begin{equation*} \begin{aligned} c &= 1 + \dfrac{[XHT] + [Z] + [W] + [HT]}{m}\notag\\ \end{aligned} \end{equation*} \]

Parameters:

The parameters used are as follows.

Simulation of Improved Model

Figure 5 shows the Signal and Noise results of the new model. The difference from the previous model is that the Noise is more than \(0.3 \ \mu M\) even when the length of HT is short. This is due to the ab initio synthesis added in the new model.

Figure 6 shows the S/N ratio results of the new model. The new model with ab initio synthesis influence showed that S/N ratio was less than 10, which is consistent with the experimental results. In the old model, the shorter the length of HT, the higher S/N ratio, but in the new model, the shorter the length of HT is not necessarily better. It can be said that the new model is able to generally reproduce the trend of experimental results.

Comparison with Experimental Results

We will specifically examine whether this new model accurately predicts helper and template with large S/N Ratio.

Figure 7 shows the consistency between the order of S/N ratio in the simulation and the order of S/N ratio in the experiment. If the simulation perfectly predicts the experimental results, all plots lie on the line \(y=x\). From Figure 7, we can say that the new ODE Model predicts the experimental results to some extent.

Integration of Sequence Design Guidelines

We will integrate the discussion in this section with the constraints about template and helper that were identified in TWJ Specificity (Model_TWJ_Specificity). TWJ Specificity section found that the association constant \(a_3\) must be less than \(10^3\) to have sequence specificity. To predict the optimal TWJ-SDA template and helper, we need to integrate the prediction of the new model made in this section with the constraints obtained in TWJ Specificity section.

Implement The Algorithm in Software

Integrated sequence design guidelines can predict templates and helpers that allow TWJ-SDA to have high S/N ratio and sequence specificity. This allows us to omit many experiments looking for optimal conditions. Our project POIROT detects glaucoma using miRNA biomarker. The POIROT system can be applied to detect for various other diseases by changing the biomarker miRNAs. One of the hardest hurdles in its application is the design of TWJ-SDA template and helper. Therefore, we have created a software that predicts the optimal template and helper for each miRNA according to the integrative sequence design guidelines.

Algorythm of Our Software

Our TWJ-SDA sequence design software works as follows:

The miRNA sequence is entered by the user.
Extract miRNAs from the database.
helper and template candidates are prepared. The helper candidates have HT length of 4 bp, 5 bp, 6 bp, 7 bp, 8 bp. The template candidates have TT length of 9 bp, 10 bp, 11 bp, 12 bp.
After omitting the condition where \(a_3\) is greater than \(10^3\), simulation is performed to obtain the S/N ratio for all other conditions using the new model.
The template and helper with the highest predicted S/N ratio will be output.

Click here for more information on how to use the software

Conclusion

We designed the sequence with the goal of a high S/N ratio in TWJ-SDA. The ODE Model was improved to be consistent with the experimental results, and the improved model was able to predict the S/N ratio of the experimental results to some extent. By integrating this ODE Model with the constraints for sequence specificity obtained in TWJ Specificity section, we create software to design the optimal TWJ-SDA template and helper. This software will lower the hurdles to using our amplification and detection systems for various miRNAs, and will make the applicability of POIROT take a leap.

References

Zhang, Q., Chen, F., Xu, F., Zhao, Y., & Fan, C. (2014). Target-triggered three-way junction structure and polymerase/nicking enzyme synergetic isothermal quadratic DNA machine for highly specific, one-step, and rapid microRNA detection at attomolar level. Analytical chemistry, 86(16), 8098-8105. https://doi.org/10.1021/ac501038r
Özay, B., Murphy, S. D., Stopps, E. E., Gedeon, T., & McCalla, S. E. (2022). Positive feedback drives a secondary nonlinear product burst during a biphasic DNA amplification reaction. Analyst, 147(20), 4450-4461. https://doi.org/10.1039/D2AN01067D
Rauzan, B., McMichael, E., Cave, R., Sevcik, L. R., Ostrosky, K., Whitman, E., ... & Deckert, A. A. (2013). Kinetics and thermodynamics of DNA, RNA, and hybrid duplex formation. Biochemistry, 52(5), 765-772. https://doi.org/10.1021/bi3013005
New England Biolabs. (n.d.). Bst DNA Polymerase, Large Fragment. https://www.neb.com/ja-jp/products/m0275-bst-dna-polymerase-large-fragment
New England Biolabs. (2011, December 17). Can Bst DNA Polymerase be used at temperatures other than 65 °C ? https://www.neb.com/ja-jp/faqs/2011/12/17/can-bst-dna-polymerase-be-used-at-temperatures-other-than-65-176-c
SantaLucia Jr, J. (1998). A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proceedings of the National Academy of Sciences, 95(4), 1460-1465. https://doi.org/10.1073/pnas.95.4.1460
New England Biolabs. (n.d.). Effect of various temperatures on nicking endonucleases. https://www.neb.com/ja-jp/tools-and-resources/selection-charts/effect-of-various-temperatures-on-nicking-endonucleases
Banerjee, D., Tateishi-Karimata, H., Ohyama, T., Ghosh, S., Endoh, T., Takahashi, S., & Sugimoto, N. (2020). Improved nearest-neighbor parameters for the stability of RNA/DNA hybrids under a physiological condition. Nucleic Acids Research, 48(21), 12042-12054. https://doi.org/10.1093/nar/gkaa572
Zyrina, N. V., Antipova, V. N., & Zheleznaya, L. A. (2014). Ab initio synthesis by DNA polymerases. FEMS Microbiology Letters, 351(1), 1-6. https://doi.org/10.1111/1574-6968.12326

Amplification Comparison

Abstract

Our goal is to develop a home detection device for glaucoma. To this end, we explored the use of strand displacement amplification (SDA) methods that do not require complex temperature changes. In Dry lab, we constructed an ordinary differential equation (ODE) model to select appropriate amplification reactions. First, based on the experimental data, we fitted the ODE model parameters that were unknown and experimentally challenging to determine in order to replicate the experimental results in Wet lab. The ODE model revealed that three-way junction (TWJ-SDA) alone was insufficient to activate Cas due to low amplification efficiency. Based on previous researches, then, we proposed the alternative amplification systems, including TWJ-SDA > Exponential amplification reaction (EXPAR), TWJ-SDA > 2step-SDA and TWJ-SDA > 3step-SDA. About 2step-SDA and 3step-SDA were introduced precisely at Optimise Amplification. The amplification efficiency, the positive-to-negative signal ratio (P/N ratio), and robustness to enzyme activity and time were compared by using the ODE model. As a result, we proposed using TWJ-SDA > 3step-SDA as the best amplification system and suggested the optimal initial template concentration.

Fitting the parameters

Purpose

In constructing the ODE model, it is necessary to define the parameters used within the model. However, in this amplification system, the reaction rate constants for nucleic acid binding and nicking, and the Michaelis constant were unknown and difficult to determine experimentally. The goal of this section is to appropriately estimate these parameters based on wet lab experimental results and construct an ODE model that replicates a portion of these results.

Methods

In constructing the ODE model, the parameters like these were generally determined through fitting. We developed several ODE models throughout the project. As several ODE modellings of EXPAR had been constructed, the parameters on EXPAR were fitted first and the fitted parameters were used in the ODE models of other amplification systems.

In EXPAR, the process begins with the target binding to the template, followed by a polymerase extension reaction that produces double strands (Figure 1.). Next, nickase recognizes the nicking recognition sequence within the double strands, leading to nicking. The polymerase then initiates a new extension reaction from the nicking site, displacing the downstream single-strand DNA (ssDNA) encountered during synthesis and producing the full double-strands DNA (dsDNA). Since the displaced ssDNA has the same sequence as the target, it binds to the template again, initiating further extension reactions and achieving overall exponential amplification.

The ODE model for EXPAR in this project was constructed with the references to multiple prior studies that made ODE models of EXPAR (Figure 2.) ^1,² and the reaction mechanism of ultrasensitive DNA amplification reactions ³, which is similar to that of EXPAR. As X could bind to the 5' end of T, ODE models were built in these studies based on different scenarios, such as X binding to both the 3' and 5' ends of T or only to the 3' end. These ODE models were constructed for each case by Dry lab. Still, since no significant difference was observed in the amplification behaviour of the reaction products, we opted to construct the simpler ODE model, assuming X binds only to the 3' end of T to reduce the number of parameters.

This ODE model is formulated as shown in Equcation 1. The way to determine the reaction rates in this ODE model follows prior research ³, with the assumption that binding and dissociation reactions follow the law of mass action and enzyme reactions follow the Michaelis-Menten equation. For more assumptions in the construct of ODE model, see Assumption.
The experiments of EXPAR were conducted at 50 ℃, with an initial T concentration of 4.40e-2 µM and an initial X concentration ranging from 0 M (NC) to 1 pM-1 µM SYBR Green Ⅰ was used to detect fluorescence, which indicates the presence of dsDNA. Bst 2.0 polymerase, optimal at 65 ℃, was employed, and Nt.BstNBI nickase, optimal at 55 ℃, was used. Their initial concentrations were 1.50e-2 U/µL and 9.38e-2 U/µL, respectively. The parameters used in the ODE model are listed below in Table 1. Notably, \(k_{cat}P_{0}\) is expressed as such because \(k_{cat}\) and \(P_{0}\) appear only as a product and not individually. Given that 1 unit of Bst 2.0 polymerase is defined as the amount of the enzyme required to incorporate 25 nmol of dNTP into acid-insoluble material in 30 minutes at 65 ℃ ⁴, U is translated to mol as \(1U=25 nmol/1800 s/k_{cat}\), eliminating the need to measure the exact value of \(k_{cat}\). Since the fitting experiments were conducted at 50℃, where polymerase activity is reduced to 30-50% ⁵, we assumed 40% activity for the calculations. For the initial nickase concentration, we referred to prior studies ³ modelling the reaction in which the same enzyme was used and 0.2 U/µL corresponded to 2.6e-2 µM. Nt.BstNBI Nickase retains 100% activity at 50℃. Additionally ⁶, the experimental data suggested its activity remains constant during the reaction, though a prior research considered the heat inactivation of nickase. The association constants were calculated using NUPACK. For more detailed experimental methods, see Experiments_EXPAR.

\[ \begin{align} \dfrac{d[X]}{dt}&=-k_{1}[X][T]+\dfrac{k_{1}}{a}[XT]+\dfrac{k_{cat}}{17}\dfrac{P_{0}}{mc}[Y]\notag\\ \dfrac{d[T]}{dt}&=-k_{1}[X][T]+\dfrac{k_{1}}{a}[XT]\notag\\ \dfrac{d[XT]}{dt}&=k_{1}[X][T]-\dfrac{k_{1}}{a}[XT]-\dfrac{k_{cat}}{27}\dfrac{P_{0}}{mc}[XT]\notag\\ \dfrac{d[W]}{dt}&=\dfrac{k_{cat}}{27}\dfrac{P_{0}}{mc}[XT]-k_{2}[N][W]+\dfrac{k_{cat}}{17}\dfrac{P_{0}}{mc}[Y]\notag\\ \dfrac{d[Y]}{dt}&=k_{2}[N][W]-\dfrac{k_{cat}}{17}\dfrac{P_{0}}{mc}[Y]\notag\\ \end{align} \]

Equation 1. The equations of EXPAR.

Table 1. The parameters of EXPAR

The mean fluorescence of the three experiments for each target concentration, excluding the background, was as follows (Figure 3.).

The fluorescence was predicted by the simulation as follows (Figure 4.). Since the fluorescence is measured by using SYBR Green Ⅰ, which basically detects double-stranded DNA, it was assumed that the total concentration of W and Z was proportional to the fluorescence intensity, as well as the prior study ¹ that also performed fitting of an ODE model of EXPAR. When the concentration graph reaches a plateau, the total concentration of W and Z was approximately equal to the initial template concentration of 4.4e-2 μM as there was almost no T or XT in the simulation. Therefore, this value 4.4e-2 was assumed to correspond to the plateau fluorescence value of 3.00e5 in Figure 3. Both the fluorescence graph from the wet experiment and the concentration graph from the dry simulation reach a plateau. After reaching the plateau, however, the concentration remains stable in the simulation, while the experimental fluorescence continues to increase. According to advice from Professor Komiya in Human Practices, this might be due to SYBR Green Ⅰ emitting slight fluorescence in response to increasing amounts of X, which is much more abundant than dsDNA.

The parameters \(k_{1},k_{2},m\) was fitted using a genetic algorithm with a population size of 100 and 50 generations, and the used data was located between the dotted lines \((0, 2.80e+5)\) in Figure 4. In the simulation, the initial fluorescence was set to 0, but in the wet experiment, when the initial X concentration was high, the amplification began immediately, and before the solution was placed in the measurement device, the graph of the fluorescence started to rise. Hence, in Dry lab simulation, it was assumed that the reaction started, and that 1 minute later the fluorescence measurement began. The amplification of the negative control (NC) and of the high concentrations like 100 nM and 1 μM, where the fluorescences reached at 2.80e5 within the first minute was excluded from the data used in the fitting. The amplification of NC is thought to occur because of ab initio synthesis, which refers free dNTPs bind to the template without primers, triggering the extension reaction ⁷.

A genetic algorithm is a method that explores solutions by representing data (a set of proposed solutions) as "individuals" with genes and repeatedly performing operations such as selection, crossover, and mutation, selecting individuals with high fitness. In this project, we used a genetic algorithm to fit the reaction rate constants of nucleic acid binding and nicking, and the Michaelis constant.
First, in each generation, 100 individuals were determined with genes representing the three constants, and the fitness was defined by how closely the simulated results matched the experimental data based on the mean squared error (MSE). Individuals with smaller MSEs were considered fitter. The algorithm was designed so that, for each generation, individuals were defined to undergo crossover with probability of 50 %, mutation with that of 20 %, and copied with that of 30 %. For crossover, in which two individuals need to be selected, mutation and copying, in which one individual needs to be selected, a selection process known as "tournament selection" was used. The tournament selection involves randomly picking several individuals and selecting the best one among them. In this fitting process, two individuals were randomly chosen, and the individual the fitness of which is the higher was selected. The tournament selection was occured twice for crossover, and once for mutation and copying. By limiting the number of participants in each tournament, the randomness of this algorithm increases, promoting the diversity in the selected individuals.

For crossover, the "Blend Crossover" method was applied. Blend Crossover generates the new genes of the child individual by mixing the genes of two parent individuals. The gene of the child individual is randomly selected from an extended range of the parent individuals' genes, controlled by a parameter called \(\alpha\). Let's assume the two parent individuals \(P_{1}\),\(P_{2}\) have genes \(x_{1}\),\(x_{2}\), respectively. When performing Blend Crossover, the gene c of the child individual is randomly chosen from the following range.

\begin{align} c \in \left[ x_\text{min} - \alpha \cdot (x_\text{max} - x_\text{min}), x_\text{max} + \alpha \cdot x_\text{max} - x_\text{min} \right]\notag\\ \end{align}

In these equation, \(x_{\text{min}}, x_{\text{max}}\) were defined as follows.

\begin{align} x_{\text{min}} &= \min(x_1, x_2)\notag\\ x_{\text{max}} &= \max(x_1, x_2)\notag\\ \end{align}

\(\alpha\) is the parameter controlling the degree of expansion in the range for the child individual's genes and typically takes values in the range \(0 \leq \alpha \leq 1\). In this way, Blend Crossover allows the child individual's genes to explore beyond the range of the parent genes, expanding the search space and reducing the risk of getting stuck in a local optimum.

Next, in the mutation process for this fitting, each gene of an individual had a 20 % probability of being modified by adding a random value drawn from a Gaussian distribution. The Gaussian distribution has a mean of 0 and a standard deviation of 0.01. The child individual's gene is calculated as follows

\begin{align} x_{\text{new}} = x + N(0, 0.01)\notag\\ \end{align}

The small standard deviation allows for fine-tuning of the genes, and by gradually changing the genes, the search range is expanded, preventing the algorithm from settling into a local optimum. Mutation is essential for exploring solutions that cannot be found by crossover alone.

Additionally, copying occurs with a 30 % probability, and these three processes were repeated until the number of the child generation reaches 100. After that, the child generation became the current generation, and the same process was repeated for 50 generations. Finally, the individual with the highest fitness was selected and its genes were used as the fitted constants.

Result

The parameters obtained through fitting were listed in Table. 1, and the fluorescence of the simulation results compared with that of the experimental results was shown below (Figure 5).

Conclusion

The purpose of this section was to determine the unknown parameters in the ODE model of the amplification system in this project, specifically the reaction rate constants for nucleic acid binding, the nicking reaction, and the Michaelis constant, which are tough to obtain experimentally. By employing a genetic algorithm, we were able to fit the experimental results of EXPAR with the simulation results from the ODE model to evaluate these unknown constants. The determined values are summarised in Table 1.

The amplification rate of TWJ-SDA

Purpose

In the previous section discussing specificity, the high specificity of TWJ-SDA (Figure 6.) was demonstrated. For more precise information, see Model. However, the amplification efficiency of TWJ-SDA is limited by the nature of the reaction, where the target binds to the template and proceeds through strand displacement, resulting in a maximum yield of amplification products determined by the target concentration. The purpose of this section was to evaluate whether TWJ-SDA alone had sufficient effeciency as the amplification system in this project by using simulations based on the ODE model of TWJ-SDA. While the presence of amplification can be examined by observing the fluorescence, the actual concentration of the amplification products cannot be measured, making this modelling essential.

Method

We constructed the ODE model for TWJ-SDA based on the reaction mechanism of TWJ-SDA (Figure 7.). In this model, the binding of the target to the template and that of primer to the template were not considered, as they did not contribute to the overall amplification efficiency and the robustness of the amplification system, which is a key criterion for evaluating the amplification system. By excluding these complexes, it was also aimed to reduce the number of parameters in this ODE model.

Table 2. The concentrations of TWJ-SDA complexes.

Based on the above model diagram (Figure 7.), the following ODE was formulated (Equation 2.). Considering practical device implementations, the simulation was conducted setting the temperature, which was required to determine enzyme activity, to 37 ℃. Since the amplification efficiency did not change significantly when there was an ample amount of template, excess initial template concentration of 1 M and the initial target concentration of 100 fM were assumed in the simulation. The enzyme concentrations were optimised based on Wet experiments. For Polymerase, we used Bst. Large Fragment, which has an optimal temperature of 65 ℃, with an initial concentration of 6.25e-2 U/µL, and for Nickase, we used Nb.BbvCI, which has an optimal temperature of 37 ℃, with an initial concentration of 1.20e-1 U/µL. As for the sequences, the target was designated as a biomarker for glaucoma, with the template and primer being complementary to it.

The constants used in the ODE were referenced from fitting, NEB, and NUPACK (Tab. 3). The definition of 1 unit of the Polymerase, Bst Large Fragment, is the amount of enzyme required to incorporate 10 nmol of dNTP into acid-insoluble material in 30 minutes at 65 ℃ ⁹, expressed as \(1 U=10 nmol/1800 s/k_{cat}\). The Polymerase activity at 37 ℃ is calculated to be 12.5%, as it ranges from 10 % to 15 % ¹⁰. For the initial concentration of Nickase, we performed unit conversion from U to mol in the same manner as done in EXPAR, assuming the Nickase, Nt. BbvCI, operated at 100 % activity at the optimal temperature of 37 ℃ ⁶.

\begin{align} \frac{d[C]}{dt} &= -k_{1}[C][T] + \frac{k_{1}}{a}[CT]\notag \\ \frac{d[T]}{dt} &= -k_{1}[C][T] + \frac{k_{1}}{a}[CT]\notag \\ \frac{d[CT]}{dt} &= k_{1}[C][T] - \frac{k_{1}}{a}[CT] - \frac{k_{cat}}{26}\frac{P_{0}}{mc}[CT]\notag \\ \frac{d[W]}{dt} &= \frac{k_{cat}}{26}\frac{P_{0}}{mc}[CT] - k_{2}[N][W] + \frac{k_{cat}}{18}\frac{P_{0}}{mc}[Z]\notag \\ \frac{d[Z]}{dt} &= k_{2}[N][W] - \frac{k_{cat}}{18}\frac{P_{0}}{mc}[Z]\notag \\ \frac{d[X]}{dt} &= \frac{k_{cat}}{18}\frac{P_{0}}{mc}[Z]\notag\\ \end{align}

Equation 2. The equations of TWJ-SDA.

Table 3. The parameters of TWJ-SDA

Results

When considering the reaction time upon device implementation, it was estimated to be around 40 minutes. However, the simulation results (Figure 8.) indicated that after 40 minutes, the concentration of the amplification product reached approximately only 3 pM. This concentration is far from the required 2.5 nM concentration of the PAM sequence needed for Cas activation, which was shown by the Wet result, making it clear that achieving this efficiency with TWJ-SDA alone is virtually impossible. For more information about the result of Cas activation, see Results.

Conclusion

From the above, it is evident that in our project to develop a glaucoma detection device, the amplification efficiency of TWJ-Amp is insufficient. The ODE model suggests that it is necessary to combine TWJ-Amp with a more efficient amplification reaction to achieve the desired results.

Optimise Amplification

Purpose

The TWJ-SDA model indicated that it was insufficient for the amplification of PAM-containing dsDNA to the approximately required 2.5 nM for Cas recognition from the result of experiment. For more information about the result of Cas activation, see Results.
Therefore, it was necessary to use the amplified product of TWJ-SDA as a target and conduct other reactions with high amplification efficiency. Based on a review of prior studies, linking EXPAR and Multi step-SDA ¹¹ were proposed, suggesting TWJ-SDA > EXPAR, TWJ-SDA > 2step-SDA and TWJ-SDA > 3step-SDA as the overall amplification system (Figure 9., 10., 11.). However, experimentally conducting these three amplification systems was challenging due to cost constraints (money, effort, and time). Consequently, the objective of this section was to propose the optimal amplification system and initial concentrations using simulations based on the ODE model.

Methods

We constructed the ODE models of the three amplification systems (Figure 12., 13., 14.; Equation. 3., 4., 5.; Table 4., 5., 6.). In the previous ODE models for EXPAR and TWJ-SDA, the binding of amplification products to the 5' end of the template was not considered. However, since when combining multiple amplification reactions, this binding can inhibit subsequent amplification reactions, the ODE models that account for this factor were constructed. Since there was no existing ODE model for Multi step-SDA in prior studies, the ODE models were constructed based on the reaction mechanisms (Figure 9, 10, 11) as those of EXPAR and TWJ-SDA. Assuming practical device implementations, the simulation was conducted setting the temperature, which is required to determine enzyme activity, to 37 ℃, and using enzyme concentrations optimised through actual Wet experiments. For Polymerase, Bst. Large Fragment, which has an optimal temperature of 65 ℃, was used with an initial concentration of 8.00e-2 U/µL, and for Nickase, Nb.BbvCI, which has an optimal temperature of 37 ℃, was used with an initial concentration of 2.00e-1 U/µL. As for sequences, they could be freely changed for the amplification products, including TWJ-SDA, and were discussed in other sections. Thus, they were omitted from these ODE models discussion as they were irrelevant to the overall amplification efficiency and the robustness of the amplification system. The constants used in the ODE were determined in the same way as for TWJ-SDA. For binding constants, we used the same constants if the base pairs near the binding sites remain unchanged (Table 4., 5., 6.).

Detail:

\begin{align} \frac{d[C]}{dt} &= -k_{1}[C][T_{1}] + \frac{k_{1}}{a_{1}}[CT_{1}] - k_{1}[C][X_{2}T_{1}] + \frac{k_{1}}{a_{1}}[X_{2}CT_{1}]\notag \\ \frac{d[T_{1}]}{dt} &= -k_{1}[C][T_{1}] + \frac{k_{1}}{a_{1}}[CT_{1}] - k_{1}[X_{2}][T_{1}] + \frac{k_{1}}{a_{2}}[X_{2}T_{1}]\notag \\ \frac{d[CT_{1}]}{dt} &= k_{1}[C][T_{1}] - \frac{k_{1}}{a_{1}}[CT_{1}] - k_{1}[X_{2}][CT_{1}] + \frac{k_{1}}{a_{2}}[X_{2}CT_{1}] - \frac{k_{cat}}{28}\frac{P_{0}}{mc}[CT_{1}]\notag \\ \frac{d[W_{1}]}{dt} &= \frac{k_{cat}}{28}\frac{P_{0}}{mc}[CT_{1}] - k_{2}[N][W_{1}] + \frac{k_{cat}}{22}\frac{P_{0}}{mc}[Z_{1}]\notag \\ \frac{d[Z_{1}]}{dt} &= k_{2}[N][W_{1}] - \frac{k_{cat}}{22}\frac{P_{0}}{mc}[Z_{1}] + \frac{k_{cat}}{6}\frac{P_{0}}{mc}[X_{2}CT_{1}]\notag \\ \frac{d[X_{2}CT_{1}]}{dt} &= k_{1}[CT_{1}][Z_{2}] - \frac{k_{1}}{a_{2}}[X_{2}CT_{1}] - \frac{k_{cat}}{6}\frac{P_{0}}{mc}[X_{2}CT_{1}] + k_{1}[C][X_{2}T_{1}] - \frac{k_{1}}{a_{1}}[X_{2}CT_{1}]\notag \\ \frac{d[X_{2}T_{1}]}{dt} &= k_{1}[X_{2}][T_{1}] - \frac{k_{1}}{a_{2}}[X_{2}T_{1}] - k_{1}[C][X_{2}T_{1}] + \frac{k_{1}}{a_{1}}[X_{2}CT_{1}]\notag \\ \frac{d[X_{2}]}{dt} &= -k_{1}[X_{2}][CT_{1}] + \frac{k_{1}}{a_{2}}[X_{2}CT_{1}] - k_{1}[X_{2}][T_{1}] + \frac{k_{1}}{a_{2}}[X_{2}T_{1}]\notag \\ &\quad + \frac{k_{cat}}{22}\frac{P_{0}}{mc}[Z_{1}] + \frac{k_{cat}}{22}\frac{P_{0}}{mc}[Z_{2}] - k_{1}[X_{2}][T_{3}] + \frac{k_{1}}{a_{2}}[X_{2}T_{3}]\notag \\ \frac{d[T_{2}]}{dt} &= k_{1}[X_{2}][T_{2}] - \frac{k_{1}}{a_{2}}[X_{2}T_{2}^{3}] + k_{1}[X_{2}][T_{2}] - \frac{k_{1}}{a_{2}}[X_{2}T_{2}^{5}]\notag \\ \frac{d[X_{2}T_{2}^{5}]}{dt} &= k_{1}[X_{2}][T_{2}] - \frac{k_{1}}{a_{2}}[X_{2}T_{2}^{5}] - k_{1}[X_{2}][X_{2}T_{2}^{5}] + \frac{k_{1}}{a_{2}}[X_{2}T_{2}] - \frac{k_{cat}}{22}\frac{P_{0}}{mc}[X_{2}T_{2}^{5}]\notag \\ \frac{d[X_{2}T_{2}^{3}]}{dt} &= k_{1}[X_{2}][T_{2}] - \frac{k_{1}}{a_{2}}[X_{2}T_{2}^{3}] - k_{1}[X_{2}][X_{2}T_{2}^{3}] + \frac{k_{1}}{a_{2}}[X_{2}T_{2}]\notag \\ \frac{d[X_{2}T_{2}]}{dt} &= k_{1}[X_{2}][X_{2}T_{2}^{5}] - \frac{k_{1}}{a_{2}}[X_{2}T_{2}] + k_{1}[X_{2}][X_{2}T_{2}^{3}] - \frac{k_{1}}{a_{2}}[X_{2}T_{2}] - \frac{k_{cat}}{4}\frac{P_{0}}{mc}[X_{2}T_{2}]\notag \\ \frac{d[Z_{2}]}{dt} &= k_{2}[N][W_{2}] - \frac{k_{cat}}{22}\frac{P_{0}}{mc}[Z_{2}] + \frac{k_{cat}}{4}\frac{P_{0}}{mc}[X_{2}T_{2}]\notag \\ \frac{d[W_{2}]}{dt} &= \frac{k_{cat}}{26}\frac{P_{0}}{mc}[X_{2}T_{2}^{5}] - k_{2}[N][W_{2}] + \frac{k_{cat}}{22}\frac{P_{0}}{mc}[Z_{2}]\notag \\ \frac{d[T_{3}]}{dt} &= -k_{1}[X_{2}][T_{3}] + \frac{k_{1}}{a_{2}}[X_{2}T_{3}]\notag \\ \frac{d[X_{2}T_{3}]}{dt} &= k_{1}[X_{2}][T_{3}] - \frac{k_{1}}{a_{2}}[X_{2}T_{3}] - \frac{k_{cat}}{26}\frac{P_{0}}{mc}[X_{2}T_{3}]\notag \\ \frac{d[W_{4}]}{dt} &= \frac{k_{cat}}{26}\frac{P_{0}}{mc}[X_{2}T_{3}] \end{align}

Equation 3. The equations of TWJ-SDA > EXPAR.

Table 4. The parameters of TWJ-SDA > EXPAR.

Detail:

\begin{align} \frac{d[C]}{dt} &= - k_{1} [T_{1}] [C] + \frac{k_{1}}{a_{1}} [CT_{1}] - k_{1} [C] [X_{2}T_{1}] + \frac{k_{1}}{a_{4}} [X_{2}CT_{1}] \notag\\ \frac{d[T_{1}]}{dt} &= - k_{1} [T_{1}] [C] + \frac{k_{1}}{a_{1}} [CT_{1}] - k_{1}[T_{1}][X_{2}] + \frac{k_{1}}{a_{3}} [X_{2}T_{1}] \notag\\ \frac{d[CT_{1}]}{dt} &= k_{1} [C] [T_{1}] - \frac{k_{1}}{a_{1}} [CT_{1}] - k_{1} [CT_{1}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}CT_{1}] - \frac{k_{cat}}{28} \frac{P_{0}}{mc} [CT_{1}] \notag\\ \frac{d[Z_{1}]}{dt} &= k_{2} [N] [W_{1}] - \frac{k_{cat}}{22} \frac{P_{0}}{mc} [Z_{1}] + \frac{k_{cat}}{6} \frac{P_{0}}{mc} [X_{2}CT_{1}] \notag\\ \frac{d[W_{1}]}{dt} &= \frac{k_{cat}}{28} \frac{P_{0}}{mc} [CT_{1}] - k_{2} [N] [W_{1}] + \frac{k_{cat}}{22} \frac{P_{0}}{mc} [Z_{1}] \notag\\ \frac{d[X_{2}CT_{1}]}{dt} &= k_{1} [CT_{1}] [X_{2}] - \frac{k_{1}}{a_{2}} [X_{2}CT_{1}] - \frac{k_{cat}}{6} \frac{P_{0}}{mc} [X_{2}CT_{1}] + k_{1} [C] [X_{2}T_{1}] - \frac{k_{1}}{a_{4}} [X_{2}CT_{1}] \notag\\ \frac{d[X_{2}T_{1}]}{dt} &= - \frac{k_{1}}{a_{3}} [X_{2}T_{1}] + k_{1} [T_{1}] [X_{2}] - k_{1} [C] [X_{2}T_{1}] + \frac{k_{1}}{a_{4}} [X_{2}CT_{1}] \notag\\ \frac{d[X_{2}]}{dt} &= \frac{k_{cat}}{22} \frac{P_{0}}{mc} [Z_{1}] - k_{1} [T_{1}] [X_{2}] + \frac{k_{1}}{a_{3}} [X_{2}T_{1}] - k_{1} [CT_{1}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}CT_{1}] + \frac{k_{1}}{a_{5}} [X_{2}T_{2}] \notag\\ & - k_{1} [X_{2}] [T_{2}] - k_{1} [X_{2}] [X_{3}T_{2}] + \frac{k_{1}}{a_{8}} [X_{3}X_{2}T_{2}] \notag\\ \frac{d[T_{2}]}{dt} &= - k_{1} [T_{2}] [X_{2}] + \frac{k_{1}}{a_{5}} [X_{2}T_{2}] - k_{1} [T_{2}] [X_{3}] + \frac{k_{1}}{a_{7}} [X_{3}T_{2}] \notag\\ \frac{d[X_{2}T_{2}]}{dt} &= k_{1} [X_{2}] [T_{2}] - \frac{k_{1}}{a_{5}} [X_{2}T_{2}] - \frac{k_{cat}}{26} \frac{P_{0}}{mc} [X_{2}T_{2}] - k_{1} [X_{2}T_{2}] [X_{3}] + \frac{k_{1}}{a_{6}} [X_{3}X_{2}T_{2}] \notag\\ \frac{d[Z_{2}]}{dt} &= k_{2} [N] [W_{2}] - \frac{k_{cat}}{24} \frac{P_{0}}{mc} [Z_{2}] + \frac{k_{cat}}{2} \frac{P_{0}}{mc} [X_{3}X_{2}T_{2}] \notag\\ \frac{d[W_{2}]}{dt} &= \frac{k_{cat}}{26} \frac{P_{0}}{mc} [X_{2}T_{2}] - k_{2} [N] [W_{2}] + \frac{k_{cat}}{24} \frac{P_{0}}{mc} [Z_{2}] \notag\\ \frac{d[X_{3}X_{2}T_{2}]}{dt} &= k_{1} [X_{2}T_{2}] [X_{3}] - \frac{k_{1}}{a_{6}} [X_{3}X_{2}T_{2}] - \frac{k_{cat}}{2} \frac{P_{0}}{mc} [X_{3}X_{2}T_{2}] + k_{1} [X_{2}] [X_{3}T_{2}] - \frac{k_{1}}{a_{8}} [X_{3}X_{2}T_{2}] \notag\\ \frac{d[X_{3}T_{2}]}{dt} &= k_{1} [T_{2}] [X_{3}] - \frac{k_{1}}{a_{7}} [X_{3}T_{2}] - k_{1} [X_{2}] [X_{3}T_{2}] + \frac{k_{1}}{a_{8}} [X_{3}X_{2}T_{2}] \notag\\ \frac{d[X_{3}]}{dt} &= \frac{k_{cat}}{24} \frac{P_{0}}{mc} [Z_{2}] - k_{1} [T_{2}] [X_{3}] + \frac{k_{1}}{a_{7}} [X_{3}T_{2}] - k_{1} [X_{2}T_{2}] [X_{3}] + \frac{k_{1}}{a_{6}} [X_{3}X_{2}T_{2}] + \frac{k_{1}}{a_{9}} [X_{3}T_{3}] \notag\\ & - k_{1} [X_{3}] [T_{3}] - k_{1} [X_{3}] [X_{4}T_{3}] + \frac{k_{1}}{a_{12}} [X_{4}X_{3}T_{3}] \notag\\ \frac{d[T_{3}]}{dt} &= - k_{1} [T_{3}] [X_{3}] + \frac{k_{1}}{a_{9}} [X_{3}T_{3}] - k_{1} [T_{3}] [X_{4}] + \frac{k_{1}}{a_{11}} [X_{4}T_{3}] \notag\\ \frac{d[X_{3}T_{3}]}{dt} &= k_{1} [X_{3}] [T_{3}] - \frac{k_{1}}{a_{9}} [X_{3}T_{3}] - \frac{k_{cat}}{26} \frac{P_{0}}{mc} [X_{3}T_{3}] - k_{1} [X_{3}T_{3}] [X_{4}] + \frac{k_{1}}{a_{10}} [X_{4}X_{3}T_{3}] \notag\\ \frac{d[Z_{3}]}{dt} &= k_{2} [N] [W_{3}] - \frac{k_{cat}}{24} \frac{P_{0}}{mc} [Z_{3}] + \frac{k_{cat}}{2} \frac{P_{0}}{mc} [X_{4}X_{3}T_{3}] \notag\\ \frac{d[W_{3}]}{dt} &= \frac{k_{cat}}{26} \frac{P_{0}}{mc} [X_{3}T_{3}] - k_{2} [N] [W_{3}] + \frac{k_{cat}}{24} \frac{P_{0}}{mc} [Z_{3}] \notag\\ \frac{d[X_{4}X_{3}T_{3}]}{dt} &= k_{1} [X_{3}T_{3}] [X_{4}] - \frac{k_{1}}{a_{10}} [X_{4}X_{3}T_{3}] - \frac{k_{cat}}{2} \frac{P_{0}}{mc} [X_{4}X_{3}T_{3}] + k_{1} [X_{3}] [X_{4}T_{3}] - \frac{k_{1}}{a_{12}} [X_{4}X_{3}T_{3}] \notag\\ \frac{d[X_{4}T_{3}]}{dt} &= k_{1} [T_{3}] [X_{4}] - \frac{k_{1}}{a_{11}} [X_{4}T_{3}] - k_{1} [X_{3}] [X_{4}T_{3}] + \frac{k_{1}}{a_{12}} [X_{4}X_{3}T_{3}] \notag\\ \frac{d[X_{4}]}{dt} &= \frac{k_{cat}}{24} \frac{P_{0}}{mc} [Z_{3}] - k_{1} [T_{3}] [X_{4}] + \frac{k_{1}}{a_{11}} [X_{4}T_{3}] - k_{1} [X_{3}T_{3}] [X_{4}] + \frac{k_{1}}{a_{10}} [X_{4}X_{3}T_{3}] \notag \end{align}

Equation 4. The equations of TWJ-SDA > 2step-SDA

table5

Table 5. The parameters of TWJ-SDA > 2step-SDA

Detail:

\begin{align} \frac{d[C]}{dt} &= - k_{1} [T_{1}] [C] + \frac{k_{1}}{a_{1}} [CT_{1}] - k_{1} [C] [X_{2}T_{1}] + \frac{k_{1}}{a_{1}} [X_{2}CT_{1}]\notag \\ \frac{d[T_{1}]}{dt} &= - k_{1} [T_{1}] [C] + \frac{k_{1}}{a_{1}} [CT_{1}] - k_{1} [T_{1}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}T_{1}]\notag \\ \frac{d[CT_{1}]}{dt} &= k_{1} [C] [T_{1}] - \frac{k_{1}}{a_{1}} [CT_{1}] - k_{1} [CT_{1}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}CT_{1}] - \frac{k_{cat}P}{28 mc} [CT_{1}]\notag \\ \frac{d[X_{1}]}{dt} &= k_{2N} [W_{1}] - \frac{k_{cat}P}{22 mc} [X_{1}] + \frac{k_{cat}P}{6 mc} [X_{2}CT_{1}]\notag \\ \frac{d[W_{1}]}{dt} &= \frac{k_{cat}P}{28 mc} [CT_{1}] - k_{2N} [W_{1}] + \frac{k_{cat}P}{22 mc} [X_{1}]\notag \\ \frac{d[X_{2}CT_{1}]}{dt} &= k_{1} [CT_{1}] [X_{2}] - \frac{k_{1}}{a_{2}} [X_{2}CT_{1}] - \frac{k_{cat}P}{6 mc} [X_{2}CT_{1}] + k_{1} [C] [X_{2}T_{1}] - \frac{k_{1}}{a_{1}} [X_{2}CT_{1}]\notag \\ \frac{d[X_{2}T_{1}]}{dt} &= - \frac{k_{1}}{a_{2}} [X_{2}T_{1}] + k_{1} [T_{1}] [X_{2}] - k_{1} [C] [X_{2}T_{1}] + \frac{k_{1}}{a_{1}} [X_{2}CT_{1}]\notag \\ \frac{d[X_{2}]}{dt} &= \frac{k_{cat}P}{22 mc} [X_{1}] - k_{1} [T_{1}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}T_{1}] - k_{1} [CT_{1}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}CT_{1}]\notag \\ &\quad + \frac{k_{1}}{a_{2}} [X_{2}T_{2}] - k_{1} [X_{2}] [T_{2}] - k_{1} [X_{2}] [X_{3}T_{2}] + \frac{k_{1}}{a_{2}} [X_{3}X_{2}T_{2}]\notag \\ \frac{d[T_{2}]}{dt} &= - k_{1} [T_{2}] [X_{2}] + \frac{k_{1}}{a_{2}} [X_{2}T_{2}] - k_{1} [T_{2}] [X_{3}] + \frac{k_{1}}{a_{6}} [X_{3}T_{2}]\notag \\ \frac{d[X_{2}T_{2}]}{dt} &= k_{1} [X_{2}] [T_{2}] - \frac{k_{1}}{a_{2}} [X_{2}T_{2}] - \frac{k_{cat}P}{26 mc} [X_{2}T_{2}] - k_{1} [X_{2}T_{2}] [X_{3}] + \frac{k_{1}}{a_{3}} [X_{3}X_{2}T_{2}]\notag \\ \frac{d[X_{3}]}{dt} &= k_{2N} [W_{2}] - \frac{k_{cat}P}{24 mc} [X_{3}] + \frac{k_{cat}P}{2 mc} [X_{3}X_{2}T_{2}]\notag \\ \frac{d[W_{2}]}{dt} &= \frac{k_{cat}P}{26 mc} [X_{2}T_{2}] - k_{2N} [W_{2}] + \frac{k_{cat}P}{24 mc} [X_{3}]\notag \\ \frac{d[X_{3}X_{2}T_{2}]}{dt} &= k_{1} [X_{2}T_{2}] [X_{3}] - \frac{k_{1}}{a_{3}} [X_{3}X_{2}T_{2}] - \frac{k_{cat}P}{2 mc} [X_{3}X_{2}T_{2}] + k_{1} [X_{2}] [X_{3}T_{2}] - \frac{k_{1}}{a_{2}} [X_{3}X_{2}T_{2}]\notag \\ \frac{d[X_{3}T_{2}]}{dt} &= k_{1} [T_{2}] [X_{3}] - \frac{k_{1}}{a_{3}} [X_{3}T_{2}] - k_{1} [X_{2}] [X_{3}T_{2}] + \frac{k_{1}}{a_{2}} [X_{3}X_{2}T_{2}]\notag \\ \frac{d[X_{3}]}{dt} &= \frac{k_{cat}P}{24 mc} [X_{3}] - k_{1} [T_{2}] [X_{3}] + \frac{k_{1}}{a_{3}} [X_{3}T_{2}] - k_{1} [X_{2}T_{2}] [X_{3}] + \frac{k_{1}}{a_{3}} [X_{3}X_{2}T_{2}]\notag \\ &\quad + \frac{k_{1}}{a_{3}} [X_{3}T_{3}] - k_{1} [X_{3}] [T_{3}] - k_{1} [X_{3}] [X_{4}T_{3}] + \frac{k_{1}}{a_{3}} [X_{4}X_{3}T_{3}]\notag \\ \frac{d[T_{3}]}{dt} &= - k_{1} [T_{3}] [X_{3}] + \frac{k_{1}}{a_{3}} [X_{3}T_{3}] - k_{1} [T_{3}] [X_{4}] + \frac{k_{1}}{a_{4}} [X_{4}T_{3}]\notag \\ \frac{d[X_{3}T_{3}]}{dt} &= k_{1} [X_{3}] [T_{3}] - \frac{k_{1}}{a_{3}} [X_{3}T_{3}] - \frac{k_{cat}P}{26 mc} [X_{3}T_{3}] - k_{1} [X_{3}T_{3}] [X_{4}] + \frac{k_{1}}{a_{4}} [X_{4}X_{3}T_{3}]\notag \\ \frac{d[X_{4}]}{dt} &= k_{2N} [W_{3}] - \frac{k_{cat}P}{24 mc} [X_{4}] + \frac{k_{cat}P}{2 mc} [X_{4}X_{3}T_{3}]\notag \\ \frac{d[W_{3}]}{dt} &= \frac{k_{cat}P}{26 mc} [X_{3}T_{3}] - k_{2N} [W_{3}] + \frac{k_{cat}P}{24 mc} [X_{4}]\notag \\ \frac{d[X_{4}X_{3}T_{3}]}{dt} &= k_{1} [X_{3}T_{3}] [X_{4}] - \frac{k_{1}}{a_{4}} [X_{4}X_{3}T_{3}] - \frac{k_{cat}P}{2 mc} [X_{4}X_{3}T_{3}] + k_{1} [X_{3}] [X_{4}T_{3}] - \frac{k_{1}}{a_{3}} [X_{4}X_{3}T_{3}]\notag \\ \frac{d[X_{4}T_{3}]}{dt} &= k_{1} [X_{4}] [T_{3}] - \frac{k_{1}}{a_{4}} [X_{4}T_{3}] - k_{1} [X_{3}] [X_{4}T_{3}] + \frac{k_{1}}{a_{3}} [X_{4}X_{3}T_{3}]\notag \\ \end{align}

Equation 6. The equations of TWJ-SDA > 3step-SDA.

Table 6. The parameters of TWJ-SDA > 3step-SDA

The amplificataion optimization was based on these ODE models. Firstly, for each amplification system, a lot of simulations were conducted using constructed ODE models with several initial template concentrations. The initial template concentrations (the initial \(T_{n})\) concentration described as \((T_{n0})\) were proposed by Wet, assuming amplification would occur in the experiment, with \(T_{10}\) set to 50 pM, 500 pM, and 5 nM. The ratio of the initial template concentrations in consecutive reactions varied from 1 to 5 in increments of 0.5. Those simulations in which the initial concentration exceeded 1µM were excluded because it was predicted that amplification reactions would be hindered by crowding in the reaction environment. The reason why we used the concentration ratio as the parameter was that the initial template concentration of one reaction was expected to be greater than that of the previous reaction, and this could be satisfied by setting the concentration ratio to 1 or more. Then, the two criteria were established that must be achieved at the very least. The first was the amplification efficiency. More specifically, the concentration of dsDNA containing the PAM sequence after 40 minutes, which is considered as the amplification time when used as a device, must be at least 2.5 nM, the concentration required for Cas activation. For more information about the result of Cas activation, see Results.

The second is the ratio of P/N ratio. Here, the simulation with a target concentration of 100 fM was considered Positive, and one with 1 fM was considered Negative. The criterion was set so that the ratio of the concentration of dsDNA containing the PAM sequence after 40 minutes at 100 fM to that at 1 fM would be at least 50, which is predicted to be the minimum requirement for quantification in lateral flow assay (LFA).

Using the simulations of several initial template concentrations from each amplification system that met these two criteria, the amplification systems were compared. Two aspects of robustness were used as the comparison criteria. The first was the robustness of PAM-containing dsDNA. By comparing the ratio of the dsDNA concentration containing the PAM sequence at 35 minutes to that at 40 minutes, the robustness of the PAM sequence to time was compared across systems, and the robustness to polymerase was compared by analysing the ratio of the dsDNA concentration containing the PAM sequence after 40 minutes to that when the \(k_{cat}P\), was reduced to 80 %, as well as the robustness of the PAM sequence to nickase when \(k_{2}\) was reduced to 80 %.

The second was the robustness of P/N ratio. By comparing P/N ratio at 35 minutes to that at 40 minutes, the robustness of the P/N ratio sequence to time was compared across systems, and the robustness to polymerase was compared by analysing P/N ratio after 40 minutes to that when the polymerase activity, represented by \(k_{cat}P\), was reduced to 80 %, as well as the robustness of the P/N ratio to nickase when \(k_{2}\) was reduced to 80 %.

Finally, in the selected optimal amplification system, considering the inhibition of amplification due to crowding in the reaction environment, which could not be accounted for in these ODE models, the optimal initial template concentration was chosen to minimise the total initial template concentration.

Results

First, the histogram of the concentrations of dsDNA including the PAM sequence at 40 minutes in TWJ-SDA > EXPAR at multiple initial template concentrations was as follows (Figure 15.). The values located to the right of the red vertical line indicating 2.5 nM met the criteria for amplification efficiency.

Regarding the initial template conditions of simulations that met the criteria for amplification efficiency, the histogram of P/N ratio was as follows (Figure 16.). The values located to the right of the red vertical line indicating 50 met the criteria for the P/N ratio.

In TWJ-SDA > EXPAR, there were 22 patterns of initial template concentration sets that met the criteria for both amplification efficiency and P/N ratio. Next, the histogram of the concentrations of dsDNA including the PAM sequence in TWJ-SDA > 2step-SDA at various initial template concentrations was shown below (Figure 17.). Since there were no values located to the right of the red vertical line indicating 2.5 nM, none met the criteria for amplification efficiency, suggesting that TWJ-SDA > 2step-SDA was unsuitable as an amplification system in this project.

The histogram of the concentrations of dsDNA including the PAM sequence in TWJ-SDA > 3step-SDA at various initial template concentrations was shown below (Figure 18.). The values located to the right of the red vertical line indicating 2.5 nM meet the criteria for amplification efficiency.

Regarding the initial template conditions of simulations that met the criteria for amplification efficiency, the histogram of P/N ratio was as follows (Figure 19.). The values located to the right of the red vertical line indicating 50 met the criteria for the P/N ratio.

In the TWJ-SDA > 3step-SDA system, there were 995 patterns of initial template concentration sets that met the criteria for amplification efficiency and P/N ratio. Based on these results, the robustness of TWJ-SDA > EXPAR and TWJ-SDA > 3step-SDA were compared in terms of time and enzyme. The histograms of the six robustness values at initial template concentrations meeting the criteria for amplification efficiency and P/N ratio in each amplification system are shown below (Figure 20. - Figure 31.).

The closer these robustness values are to 1, the more robust the system is considered to be. In all six values, it was believed that TWJ-SDA > 3step-SDA demonstrated greater robustness. Therefore, TWJ-SDA > 3step-SDA was proposed as the optimal amplification system for this project.

Next, we determined the initial template concentrations in TWJ-SDA > 3step-SDA that satisfied the criteria for amplification efficiency and P/N ratio while minimising the sum of the initial template concentrations. The optimal initial template concentrations are as follows: T10 = 5 nM, T20 = 15 nM, T30 = 45 nM, T40 = 135 nM, and T50 = 473 nM.

Conclusion

The purpose of this section was to estimate the optimal amplification system and initial template concentrations through simulations using the ODE model. Based on the perspectives of amplification efficiency, P/N ratio, and robustness to time and enzymes, it was proposed to perform amplification under the conditions of T10 = 5 nM, T20 = 15 nM, T30 = 45 nM, T40 = 135 nM, and T50 = 473 nM using TWJ-SDA > 3step-SDA.

Assumption

In constructing the ODE model for amplification reactions, the following assumptions were made:

The binding and dissociation of nucleic acids, as well as the nicking by nickase, follow the law of mass action.
The extension reaction by polymerase follows the Michaelis-Menten equation.
The enzyme activity remains constant within the same reaction.
The binding constants follow the equilibrium concentration ratios of reactants and products predicted by NUPACK.
The reaction rate constants of nucleic acids, the turnover number of polymerase, and the Michaelis constant are equal across EXPAR, SDA, and TWJ.
In the case of a single amplification reaction, the impact of the amplification product attaching to the 3' end of the template can be ignored.
In EXPAR, the fluorescence generated by SYBR-Green I primarily derives from the complete double strand W and Z, and additionally from excess X.
In EXPAR, one minute has elapsed from the actual start of the reaction to the fluorescence measurement.
The enzyme activity decreases as the temperature deviates from the optimal range.
The inhibition of reactions due to the congestion of the reaction system is ignored.

References

Van Ness, J., Van Ness, L. K., & Galas, D. J. (2003). Isothermal reactions for the amplification of oligonucleotides. Proceedings of the National Academy of Sciences, 100(8), 4504-4509. https://doi.org/10.1073/pnas.0730811100
Chen, J., Zhou, X., Ma, Z., Lin, X., Dai, Z., & Zou, X. (2016). Asymmetric exponential amplification reaction on a toehold/biotin featured template: an ultrasensitive and specific strategy for isothermal microRNAs analysis. Nucleic acids research, 44(15), e130-e130. https://doi.org/10.1093/nar/gkw504
Özay, B., Murphy, S. D., Stopps, E. E., Gedeon, T., & McCalla, S. E. (2022). Positive feedback drives a secondary nonlinear product burst during a biphasic DNA amplification reaction. Analyst, 147(20), 4450-4461. https://doi.org/10.1039/D2AN01067D
New England Biolabs. (n.d.). Bst 2.0 DNA Polymerase. https://www.neb.com/ja-jp/products/m0537-bst-20-dna-polymerase
New England Biolabs. (2012, August 28). Can Bst 2.0 DNA Polymerase be used at temperatures other than 65 ℃? https://www.neb.com/ja-jp/faqs/2012/08/28/can-bst-2-0-dna-polymerase-be-used-at-temperatures-other-than-65-c
New England Biolabs. (n.d.). Effect of various temperatures on nicking endonucleases. https://www.neb.com/ja-jp/tools-and-resources/selection-charts/effect-of-various-temperatures-on-nicking-endonucleases
Antipova, V. N., Zheleznaya, L. A., & Zyrina, N. V. (2014). Ab initio DNA synthesis by Bst polymerase in the presence of nicking endonucleases Nt. AlwI, Nb. BbvCI, and Nb. BsmI. FEMS Microbiology Letters, 357(2), 144-150. https://doi.org/10.1111/1574-6968.12511
Zhang, Q., Chen, F., Xu, F., Zhao, Z., & Fan, C. (2014). Target-triggered three-way junction structure and polymerase/nicking enzyme synergetic isothermal quadratic DNA machine for highly specific, one-step, and rapid microRNA detection at attomolar level. Analytical chemistry, 86(16), 8098-8105. https://doi.org/10.1021/ac501038r
New England Biolabs. (n.d.). Bst DNA Polymerase, Large Fragment. https://www.neb.com/ja-jp/products/
New England Biolabs. (2011, December 17). Can Bst DNA Polymerase be used at temperatures other than 65 ℃? https://www.neb.com/ja-jp/faqs/2011/12/17/can-bst-dna-polymerase-be-used-at-temperatures-other-than-65-176-c
Komiya, K., Noda, C., & Zamamura, M. (2024). Characterization of cascaded DNA generation reaction for amplifying DNA signal. New Generation Computing, 1-16. https://doi.org/10.1007/s00354-024-00249-2

MD Simulation

Abstract

Using the Molecular Dynamics (MD) simulation on the Cascade-crRNA-DNA complex and the Cas12a-crRNA-DNA complex, to compare the RMSF and the number of hydrogen bonds formed between protein and crRNA of different combinations. From the comparison, it was predicted that the stability at the equilibrium is at the highest when the DNA recognized by the Cas12a is the ds-template3. We assumed that the more stable the complex, the higher the CRISPR-Cas's ability to detect, and thus concluded that 3step-SDA is the most desirable amplification method. This result aligns with the conclusion regarding the preferable amplification method as discussed in Dry Lab_Model_Amplification_comparison.

Purpose

We plan to use either CRISPR-Cas3 or CRISPR-Cas12a in the final stage of detecting the target miRNA. Additionally, there are several options for the crRNA, and there is flexibility in the number of multistep-SDA stages that can be inserted before connecting to CRISPR-Cas. Here, we will consider the most desirable options by varying the crRNA and the complementary DNA to the crRNA for both CRISPR-Cas3 and CRISPR-Cas12a.

Methods

The stability of the CRISPR-Cas3 complex will be evaluated using Molecular Dynamics (MD) simulation for the complex of crRNA, DNA, and Cascade, while the stability of the CRISPR-Cas12a complex will be evaluated for the crRNA-DNA-Cas12a complex. Table 1. shows the combinations of Protein, crRNA, and DNA used in the simulations. After connecting n step SDA and dsDNA elongation, we call the dsDNA recognized by CRISPR-Cas as ds-template.

\[ \begin{array}{|c|c|c|} \hline \textbf{Protein} & \textbf{crRNA} & \textbf{DNA} \\ \hline \text{Cascade} & \text{SARS-CoV-2 N2} & \text{Positive Control (PC)} \\ \hline \text{Cascade} & \text{SARS-CoV-2 N2} & \text{ds-template2} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template0} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template1} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template2} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template3} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{Positive Control (PC)} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template0} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template1} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template2} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template3} \\ \hline \end{array} \]

Table 1. crRNA sequences for different proteins and species

The initial structure of MD simulation uses structure prediction from AlphaFold3¹. The amino acid sequences of Cascade and Cas12a are the same as the protein of PDB ID:5H9F and PDB ID:5ID6, respectively. The base sequences of crRNA and DNA are shown in Table 2. and 3., respectively.

In the MD simulation of Cascade and Cas12a, GROMACS and GENESIS were used, respectively ^2,^3,^4,^5,^6,^7,^8,⁹. The initial structure does not contain water. After the solvation, ionization, energy minimization, and equilibration with constant pressure and temperature of the protein, a full 2 ns molecular dynamics simulation was conducted.

Sequences

Protein	Species	Sequence (5' to 3')
Cascade	SARS-CoV-2 N2 crRNA	GUGUUCCCCGCGCCAGCGGGGAUAAACCGUGCAAUUUGCG GCCAAUGUUUGUAAUCAGUUCGUGUUCCCCGCGCCAGCGGGGAUAAACCG
Cas12a	EMX1	AAUUUCUACUAAGUGUAGAUAAUGACUAGGGUGGGCAACCA
Cas12a	SARS-CoV-2 N2	AAUUUCUACUAAGUGUAGAUGAACGCUGAAGCGCUGGGGG

Table 2. crRNA sequences for different proteins and species

Protein	Species	Name	Sequence (5' to 3')
Cascade	SARS-CoV-2 N2	PC	AGGAACTAATCAGACAAGGAACTGATTACAAACATTGGCCGCAAATTGCACAATTTGCCC \| GGGCAAATTGTGCAATTTGCGGCCAATGTTTGTAATCAGTTCCTTGTCTGATTAGTTCCT
Cascade	SARS-CoV-2 N2	ds-template2	TTGAAGGAACTGATTACAAACATTGGCCGCAAATTGCATTCCTGCTGAACTGAGCCACCTCA \| TGAGGTGGCTCAGTTCAGCAGGAATGCAATTTGCGGCCAATGTTTGTAATCAGTTCCTTCAA
Cascade	SARS-CoV-2 N2	ds-template3	TTGAAGGAACTGATTACAAACATTGGCCGCAAATTGCAGCCCTGTACAATGCTGCTCCTCA \| TGAGGAGCAGCATTGTACAGGGCTGCAATTTGCGGCCAATGTTTGTAATCAGTTCCTTCAA
Cas12a	EMX1	ds-template0	CTTATCAGACTGATGTTGATTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG \| CTAGTCATTGAATGACTAGGGTGGGCAACCACAAATCAACATCAGTCTGATAAG
Cas12a	EMX1	ds-template1	ACATTGTCTGCTGGGTTTCTTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG \| CTAGTCATTGAATGACTAGGGTGGGCAACCACAAAGAAACCCAGCAGACAATGT
Cas12a	EMX1	ds-template2	TGGCTCAGTTCAGCAGGAATTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG \| CTAGTCATTGAATGACTAGGGTGGGCAACCACAAATTCCTGCTGAACTGAGCCA
Cas12a	EMX1	ds-template2	TGGCTCAGTTCAGCAGGAATTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG \| CTAGTCATTGAATGACTAGGGTGGGCAACCACAAATTCCTGCTGAACTGAGCCA
Cas12a	EMX1	ds-template3	AGCAGCATTGTACAGGGCTTTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG \| CTAGTCATTGAATGACTAGGGTGGGCAACCACAAAAGCCCTGTACAATGCTGCT
Cas12a	SARS-CoV-2 N2	PC	GCCGCAAATTGCACAATTTGCCCCCAGCGCTTCAGCGTTCTTCGGAATGTCGCAGCTTGG \| CCAAGCTGCGACATTCCGAAGAACGCTGAAGCGCTGGGGGCAAATTGTGCAATTTGCGGC
Cas12a	SARS-CoV-2 N2	ds-template0	CTTATCAGACTGATGTTGATTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG \| CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAATCAACATCAGTCTGATAAG
Cas12a	SARS-CoV-2 N2	ds-template1	ACATTGTCTGCTGGGTTTCTTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG \| CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAAGAAACCCAGCAGACAATGT
Cas12a	SARS-CoV-2 N2	ds-template2	TGGCTCAGTTCAGCAGGAATTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG \| CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAATTCCTGCTGAACTGAGCCA
Cas12a	SARS-CoV-2 N2	ds-template3	AGCAGCATTGTACAGGGCTTTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG \| CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAAAGCCCTGTACAATGCTGCT

Table 3. DNA Target and Primer Sequences

In Table 1. and Table 2., complementary binding crRNA and DNA sequences are underlined.

RMSD

Video 1. 2 ns motion of Cas12a-crRNA-DNA complex obtained by molecular dynamics simulation.

Root Mean Square Deviation (RMSD) represents the difference between two structures, the target structure and the initial structure. If the RMSD converges to a constant value, it can be considered that the system has reached equilibrium. After completing the full 2 ns MD simulation, the RMSD is calculated using MDAnalysis ¹⁰.

\begin{equation} \begin{aligned} \text{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{r}_i(t) - \mathbf{r}_i(0) \right)^2} \notag\\ \end{aligned} \end{equation}

Then, we conducted the MD simulation using the combinations in Table 1, and the obtained RMSD are shown in Figure 1, 2, and 3.

As seen in Figure 1., the Cascade-crRNA-DNA complex does not show a flat line, particularly regarding PC, during the 2 ns simulation, indicating that the complex has not reached equilibrium. In contrast, many of the Cas12a-crRNA-DNA complexes exhibit a flat line within 2 ns, suggesting that they have reached equilibrium.

When analyzing larger structures, the time required for simulation is long. Cascade is a complex of multiple proteins, and due to its significantly larger structure compared to Cas12a, the 2 ns simulation time was insufficient for it to reach equilibrium. In the following analysis, we will focus on the Cas12a complexes, where many combinations show RMSD convergence, i.e. equilibration, within the 2 ns simulation time, to analyze the stability of the complex at equilibrium.

Analysis

RMSF

Root Mean Square Fluctuation (RMSF) calculates the deviation from the average structure and represents fluctuations from the equilibrium state. It allows us to identify fluctuating regions or residues, which can be used to evaluate flexibility and consider functionality. Here, RMSF is calculated to assess the stability of the complex in its equilibrium state. The RMSF of the i-th residue is defined by the following equation. When plotting the RMSF for different DNA sequences, the graph becomes difficult to interpret due to the large amount of noise, causing overlap between the lines. Therefore, a Gaussian filter was applied to remove the noise, making it easier to compare RMSF across different DNA sequences.

\begin{equation} \begin{aligned} \text{RMSF}_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle \right)^2} \notag\\ \end{aligned} \end{equation}

Number of hydrogen bonds

To see the interactions between the protein and crRNA, we examined the time evolution of the number of hydrogen bonds formed between them ¹¹. The criteria for a hydrogen bonds were defined as \(donor-acceptor\ distance < 3\text{\AA},\ donor-hydrogen-acceptor\ angle > 150^{\circ}\). The donor was considered to be the protein, and the acceptor was the nucleic acid. Since the number of hydrogen bonds also exhibits significant noise, a Gaussian filter was applied to remove the noise, making it easier to compare the number of hydrogen bonds across different DNA sequences.

Conclusions

A smaller RMSF value indicates smaller fluctuations from the equilibrium state, meaning the complex is more stable. Additionally, the more hydrogen bonds formed between Cas12a and crRNA, the stronger the interaction between them, leading to increased stability of the complex. Here, we evaluate DNA sequences where the complex stability is higher as superior.

For the combination of Cas12a and EMX1 crRNA, as shown in Figure 4., it was confirmed that the RMSF of DNA ds-template1 and ds-template2 is relatively large. Furthermore, Figure 6. shows that the number of hydrogen bonds is relatively low when the DNA is ds-template0 or ds-template2. These results suggest that the complex stability is highest when the DNA is ds-template3. In our Dry Lab, we predicted that when 3step-SDA is connected with dsDNA elongation, the signal amplification efficiency, as well as the robustness of the amplification efficiency in terms of time, enzyme, and S/N ratio, would be the highest (Dry lab_Model_Amplification_comparison). By conducting MD simulations and demonstrating that the stability of the Cas12a-crRNA-DNA complex is highest when the DNA is ds-template3, we were able to propose from a different approach that linking 3step-SDA with dsDNA elongation is the preferable amplification method.

On the other hand, as shown in Figure 5. and Figure 7., for the combination of Cas12a and SARS-CoV-2 N2 crRNA, there are no clear differences in terms of RMSF or the number of hydrogen bonds among ds-template0, ds-template1, ds-template2, and ds-template3. This indicates that there is no significant difference in complex stability among the four DNA sequences. Therefore, we concluded that the 3step-SDA, predicted to be superior in terms of amplification efficiency and robustness, is also preferable when using SARS-CoV-2 N2 crRNA.

Future

In this discussion, we implicitly assumed that the more stable the equilibrium state of the Cas12a-crRNA-DNA complex, the more favorable it is for DNA detection. However, this assumption lacks sufficient evidence, and further investigation is needed to determine whether it is indeed correct. Additionally, there are residues that exhibit notably high RMSF values (for instance, around residue 600 for ds-template2, and around residue 1100 for both ds-template1 and ds-template2 in Figure 4.). Residues with high RMSF values may be involved in binding interactions. Further research is needed to explore the relationship between the differences in DNA and the residues with relatively high RMSF values, and this could potentially lead to the design of crRNA that reduces RMSF.

Appendix

We create an MD simulation manual to help future iGEM teams run MD simulations smoothly. The manual describes how to use GROMACS ³ and GENESIS ^12,¹³, which we used.

The manual is as follows,

MD Manual

References

Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., Ronneberger, O., Willmore, L., Ballard, A. J., Bambrick, J., Bodenstein, S. W., Evans, D. A., Hung, C.-C., O’Neill, M., Reiman, D., Tunyasuvunakool, K., Wu, Z., Žemgulytė, A., Arvaniti, E., Beattie, C., … Jumper, J. M. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630493-500. https://doi.org/10.1038/s41586-024-07487-w
Abraham, M., Alekseenko, A., Basov, V., Bergh, C., Briand, E., Brown, A., Doijade, M., Fiorin, G., Fleischmann, S., Gorelov, S., Gouaillardet, G., Gray, A., Irrgang, M. E., Jalalypour, F., Jordan, J., Kutzner, C., Lemkul, J. A., Lundborg, M., Merz, P., … Lindahl, E. (2024). GROMACS 2024.3 Source code (2024.3). Zenodo. https://doi.org/10.5281/zenodo.13456374
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl (2015). GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 119-25. https://doi.org/10.1016/j.softx.2015.06.001
Páll, S., Abraham, M.J., Kutzner, C., Hess, B., Lindahl, E. (2015). Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS.In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale, 87593-27. https://doi.org/10.1007/978-3-319-15976-8_1
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl. (2013). GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit.Bioinformatics, 29845-854. https://doi.org/10.1093/bioinformatics/btt055
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl. (2008). GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation.J. Chem. Theory Comput., 4(3), 435-447. https://doi.org/10.1021/ct700301q
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C. Berendsen. (2005). GROMACS: Fast, Flexible and Free.J. Comp. Chem., 26 1701-1719. https://doi.org/10.1002/jcc.20291
E. Lindahl and B. Hess and D. van der Spoel. (2001). GROMACS 3.0: A package for molecular simulation and trajectory analysis.J. Mol. Mod., 7 306-317. https://doi.org/10.1007/s008940100045
H. J. C. Berendsen, D. van der Spoel and R. van Drunen. (1995). GROMACS: A message-passing parallel molecular dynamics implementation.Comp. Phys. Comm., 91 43-56. https://doi.org/10.1016/0010-4655(95)00042-E
MDAnalysis 4.2.4. Calculating root mean square quantities. https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/rms.html
Smith, P., Ziolek, R. M., Gazzarrini, E., Owen, D. M., & Lorenz, C. D. (2019). On the interaction of hyaluronic acid with synovial fluid lipid membranes.Physical Chemistry Chemical Physics, 19. https://doi.org/10.1039/C9CP01532A
Kobayashi, C., Jung, J., Matsunaga, Y., Mori, T., Ando, T., Tamura, K., ... & Sugita, Y. (2017). GENESIS 1.1: A hybrid‐parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms. Journal of Computational Chemistry, 38, 2193-2206. http://dx.doi.org/10.1002/jcc.24874
Jung, J., Mori, T., Kobayashi, C., Matsunaga, Y., Yoda, T., Feig, M., & Sugita, Y. (2015). GENESIS: a hybrid‐parallel and multi‐scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations. Wiley Interdisciplinary Reviews: Computational Molecular Science, 5(4), 310-323. https://doi.org/10.1002/wcms.1220

Lateral Flow Assay

Overview

The principles of LFA differ depending on the reagents used. We assembled the LFA with reference to CONAN developed by Yoshimi et al. ¹.
The strip used was pre-installed with AuNPs with rb anti FITC antibody on the conjugate pad, streptavidin on the control line (C-line), and anti rb antibody on the test line (T-line). FITC-ssDNA-biotin was used as a reporter ². When CRISPR-Cas3 is activated, the ssDNA within the reporter molecule is cleaved, producing FITC-labeled reporter fragments and biotin-labeled reporter fragments. Using this, both the T-line and C-line turn red if the test is positive, and only the C-line turns red if the test is negative.
For more information about the principle of LFA, see Proposed Implementation_Selection of Amplification and Quantitative Detection Module.
For more information about the actual experimental procedure of LFA, see Experiments.

1. Preliminary Experiments

Purpose:

We conducted an experiment to confirm the performance of the strip used. The purpose of this experiment was to confirm that only the C-line turns red when a reaction solution containing only FITC-ssDNA-biotin is used.
According to the product guide ², if the concentration of FITC-ssDNA-biotin is not appropriate or the viscosity of the buffer is low, the T-line may turn red even if ssDNA is not cleaved. Therefore, we referred to the product guide and adjusted the concentration of FITC-ssDNA-biotin to 1.00 pM/LFA. A comparison was made between using the assay buffer that comes with the product as is and adding 5% polyethylenglycol (PEG) to the assay buffer.
The product used this time is designed to be used by standing the strip vertically, immersing the bottom part in the developing solution, and using the developing solution to flow the reaction solution. However, we are convinced that POIROT would be easier to use by placing the strip horizontally so that the user did not have to hold the strip vertically during the reaction. So we conducted an experiment with the strip horizontal. We kept the strip horizontal without touching it by folding a piece of paper to create a stand like the one pictured below.

Result:

After 5 min of development, the following bands were obtained. The upper strip had no PEG added to the buffer, and the lower strip had PEG added to the buffer.

Consideration:

When PEG was not added to the assay buffer, not only the C-line but also the T-line became colored. On the other hand, when 5% PEG 6000 was added to the buffer, the T-line did not turn red and only the C-line turned red. When performing quantitative determination using this product, it was shown that it is necessary to increase the viscosity of the assay buffer by adding PEG, etc.

2. 1step-SDA > Cas3 > LFA

Purpose:

We conducted an experiment to determine whether LFA functions when connected from the amplification system. First, we thought about connecting from a simple 1step-SDA.
A case where primer2 of multistep-SDA was added to the reaction solution at a final concentration of 5 nM was compared with a case where primer2 was not added (NC).

Result:

After 10 min of development, the following bands were obtained. The upper strip is NC and the lower strip is when primer2 was added.

Consideration:

Even when primer2 was added, no T-line coloration was observed.
The graph below shows the fluorescence change measured using an FQ probe up to 20 minutes later when 1step-SDA was connected to Cas3. Regardless of whether the final concentration of ds-template is 8 nM or 16 nM, an increase in fluorescence intensity can be confirmed when the final concentration of primer2 is 4 nM.

The conditions for this experiment were an incubation time of 20 min, ds-template (12 nM), and primer2 (5 nM) or NC (primer2: 0 M). From the above graphs, it is expected that a sufficient amount of ssDNA would be cleaved when the primer is added, and no cleavage would occur when the primer2 is not added. That is, if primer2 was present, both the C-line and T-line would be colored, and if primer2 was not present, only the C-line would be colored. However, in reality, even when primer2 was added, the T-line did not turn red.
Furthermore, the color of the C-lines were extremely pale compared to when PEG 6000 was not added to the assay buffer in experiment 1. AuNPs with rb anti FITC antibody flowing on the conjugate pad are captured by either C-line or T-line. Therefore, if the T-line is not colored, it is expected that the C-line will be strongly colored. However, in this experiment, the coloring of the C-line was weak even though the T-line was not colored. From this, it is thought that the binding between FITC-ssDNA-biotin or FITC-labeled reporter fragment and AuNPs with rb anti FITC antibody did not occur sufficiently on the conjugate pad, and the AuNPs with rb anti FITC antibody remained on the conjugate pad.

Conclusion

In the product used this time, it was suggested that the negative control only works by increasing the viscosity of the assay buffer. However, if the viscosity of the assay buffer is made too high, the reporter and AuNPs with rb anti-FITC antibody will not be able to bind in sufficient quantities on the conjugate pad, and the T-line may not be colored even under conditions that should be positive. When incorporating this product into POIROT, it is necessary to optimize the viscosity of the buffer.

References

Yoshimi, K., Takeshita, K., Yamayoshi, S., Shibumura, S., Yamauchi, Y., Yamamoto, M., Yotsuyanagi, H., Kawaoka, Y., & Mashimo, T. (2022). CRISPR-Cas3-based diagnostics for SARS-CoV-2 and influenza virus.iScience 25, 103830, 1-13. https://doi.org/10.1016/j.isci.2022.103830
Milenia Biotec GmbH. (n.d.). CRISPR/Cas-based Detection Methods and HybriDetect: Universal Test Strips - Individual Readout. https://www.milenia-biotec.com/uploads/2019/07/improved-CRISPR-readout_final-1.pdf