Why do we need a threshold detection system?

A disease like Multiple Sclerosis (MS) can cause miRNAs to be dysregulated compared to a healthy person.¹ The miRNAs measured by our diagnostic test are generally always present in the blood, just in lower or higher concentrations than in a healthy control. Solely detecting the presence or absence of an miRNA is therefore not sufficient for a correct test result. To solve this problem, a mechanism was designed that could distinguish between the concentrations below or above the healthy control. We named this design the "threshold system". Ideally, there would be no output of the threshold system below the threshold concentration and thus no activation of the subsequent toehold switch (Figure 1). Once the threshold has been passed, output RNA will be produced, which enters the ribocomputing toehold switch circuit. In this sense, we would also create a binary input for the ribocomputing circuit, which is beneficial for its functioning.

The ideal dose-response curve created by our threshold system. The system will produce little output for a healthy input concentration, but a sharp switch to a higher output for an input concentration associated with a disease. Here, the input would be an (amplified) miRNA and the output would be another RNA. The curve can be mathematically described by the Hill function.

Toehold mediated strand displacement reaction can create a threshold

Toehold mediated strand displacement reaction with fuel

In literature, we found an RNA-based reaction that could achieve the desired behaviour (Figure 1).² This reaction consists of four steps (Figure 2). First, the input miRNA (X) binds to a threshold complex (TH) to catch away the amount of miRNA associated with a healthy control (Figure 2a). Second, the unbound input (X) can initiate a toehold mediated strand displacement reaction (TMSD), releasing the output strand (O) (Figure 2b). The output strand (O) was designed to bind to a reporter (R) to produce a fluorescent output signal (Figure 2c). A fuel will reverse the strand displacement reaction, releasing X (Figure 2d). This advantageously speeds up the production of O, as X can once again initiate the TMSD reaction. The complete reaction (hereon named TMSD-F) results in a dose-response curve that already comes close to our desired behaviour, but it would benefit from further optimisation (Figure 3).

The toehold mediated strand displacement with fuel (TMSD-F) system.² a) The TMSD reaction, where annealing of the input strand X to C1 will result in the production of the output strand O. b) The threshold reaction, where input strand X will anneal to the threshold strand. c) The fuel reaction, where the fuel strand releases the input X from C2. d) The reporter reaction, where output strand O displaces the quencher, allowing the fluorophore to produce fluorescence. The universal toehold is shown in COLOUR.

Dose-response curves of our optimised TMSD-F system. The simulations show a perfect system (purple), the TMSD-F from Qian & Winfree (2011) (light-blue) and the optimised TMSD-F (orange) for a threshold value of 50. The input is the miRNA, the output is a fluorescent signal. Our optimised TMSD-F had a reduced basal expression and higher threshold accuracy.

Simplified toehold mediated strand displacement reaction without fuel

In a cell-free test harbouring a complex system like ours, simplicity is of great importance. Therefore, a simplified version of the TMSD-F described above was engineered in our lab. Essentially, the threshold reaction and the TMSD still take place, but because of the simplified sequence design, the fuel reaction is removed (Figure 4). The output strand C will bind to an aptamer to create a fluorescent signal. The simplified system will be referred to as TMSD-NF (toehold mediated strand displacement with no fuel).

The TMSD system engineered by our team. a) The threshold reaction, where input A binds to threshold strand B. b) The TMSD reaction, where A releases output strand C, which will bind to the aptamer D after. No fuel reaction is possible because B is both the threshold complex as well as part of the initial TMSD complex. Strand A can only anneal to BC once there is more A present than loose B (i.e. A has passed the threshold concentration).

Optimisation of the TMSD-F

We could now solely focus on our engineered TMSD-NF, but the comparison to the TMSD-F also interested us. In this way, we could judge the ability of both systems to create a dose-response curve better and discuss which system would better suit future test requirements. Furthermore, the TMSD-F was validated experimentally by Qian et al. 2011, where their model was fitted to data.² This gave us an excellent starting point that later guided us in designing the TMSD-NF.

Based on our simulations, the TMSD-F shows a dose-response curve with high leakiness and not enough accuracy compared to the desired dose-response curve (Figure 3). Therefore, the first step was to optimise this existing system through modelling. The optimisation framework was taken from existing work in multi-objective optimisation (MOO; see the dropdown box for more details).³ The desired threshold switch can be scored with a Hill function

Output = Intercept + Maximum * \frac{Intercept^n}{K^n + Intercept^n} which describes the curves generally seen in dose-response graphs (Figure 1).⁴
This resulted in four optimisation objectives:
1. A high threshold accuracy: the switch should occur at the desired input concentration. This is defined by calculating K from the Hill function. This value should be as close to the threshold input value as possible.
2. A high slope: to ensure a sharp switch at the input concentration. This is defined by calculating n from the Hill function, which should be as high as possible.
3. A large maximum output: the output of the system should reach the maximum possible production. This is defined as log_{10}(\frac{max(output)}{output(input=0.001)}).
4. A small basal expression before the threshold: the production of output should be limited before the threshold is reached. This is defined by limiting the sum of the output before the threshold input concentration to a user-input value.

Setting up the mathematical reaction equations

The TMSD-F system can be described in four main reactions, which describe the threshold reaction, the output strand reaction, the fuel reaction and the fluorescent output production, respectively: X + TH {\stackrel{k_f}{\rightarrow}} W X + C1 \underset{k_s}{\stackrel{k_s}{\rightleftharpoons}} C2 + O C2 + F \underset{k_s}{\stackrel{k_s}{\rightleftharpoons}} C3 + X O + R {\stackrel{k_s}{\rightarrow}} G

Notice that in the threshold reaction, k_f (fast kinetic rate) is assumed to be faster than the other rate k_s (slow kinetic rate), because the binding length of the input (X) to the threshold (TH) is longer and thus stronger than to the TMSD complex (C1) (Figure 2). The initial binding in the TMSD is the rate-limiting step.⁵ Therefore, the threshold, output strand, and fuel reaction can be approximated by the same rate k_s as they are all initiated by the same toehold sequence, called a universal toehold (Figure 2). The universal toehold results in some possible side reactions between the different components, of which two examples are:

O + C1 \underset{k_{rf}}{\stackrel{k_f}{\rightleftharpoons}} C4 F + TH \underset{k_{rs}}{\stackrel{k_f}{\rightleftharpoons}} C10 where, k_{rf} is a faster reverse rate than k_{rs}. Moreover, a leakiness reaction is modelled as:

C1 + F \underset{k_l}{\stackrel{k_l}{\rightleftharpoons}} C3 + O

In total, this gave us five reaction parameters to optimise, namely: k_f, k_s, k_{rf}, k_{rs} and k_l. The upper and lower bounds of these values were taken from literature or defined by ourselves.^6,7 The initial conditions were taken from Qian & Winfree (2011).

Setting the objectives as optimisation constraints
The third and fourth optimisation objectives can be set as constraints on the optimisation, meaning that solutions not fulfilling this constraint are discarded. We decided that the maximum output should be the same as in the reference paper, with an increase in the objective value of almost 20. Multiple values for the maximum basal expression were tested.
To optimise both objectives 1 and 2, a small trick is employed, as it is hard to maximise two objectives at once. Here, objective 2 is split into consecutive intervals (Figure 5). For each interval, objective 2 is set as a constraint, while objective 1 is optimised. If there are in total 10 intervals for objective 2, 10 separate optimisations will be done, and thus 10 solutions are produced. Through this, we can compare the effect objectives 1 and 2 have on each other.

Schematic of multi-objective optimisation. Splitting up objective 2 into multiple intervals, allows us to only optimise objective 1 for each interval. For each interval, one solution is generated. Image adapted from Otero-Muras \& Banga, 2017.

The optimisation algorithms
The optimisation was done with a hybrid solver, combining the global solver eSS (enhanced scatter search) and the local solver fmincon in MATLAB.¹⁰ After 10 iterations of the global solver, the local solver was used instead to further refine the optimisation. This was repeated until 10,000 evaluations were done. In the global solver, 320 diverse solutions were generated in the initial stage to ensure a large part of the parameter space was searched. From these, 20 solutions were put in the reference set, which is improved upon in each iteration of the optimisation.³

The optimal reaction rates for creating a sharp threshold switch

Through the comparison of three optimisations varying in maximum values of objective 4, we found that limiting the basal production to 15 before the threshold value decreased the basal expression without compromising the accuracy of the switch (overlap of blue and green dots in Figure 6). Limiting the basal expression to 7.5 detrimentally reduces the accuracy of the switch (red dots in Figure 6). Another generally observed trend is the reduction in threshold accuracy (objective 1) when the slope (objective 2) is decreased. From this plot, it was determined that the optimisation for constraint interval 18-16 on objective 2 was most optimal because it has a high slope and high accuracy. We have now optimised the TMSD-F to produce less output before the threshold is reached (objective 3), have a steeper slope (objective 2), as well as a greatly improved threshold accuracy (objective 1) (Figure 3). The parameters for this solution were: k_f = 3.4*10^7\: M^{-1} \: s^{-1}, k_s = 2.5*10^4\: M^{-1} \: s^{-1}, k_{rf} = 32.3 \: s^{-1}, k_{rs} = 0.13 \: s^{-1}, k_l = 0.02 \: M^{-1} \: s^{-1}.

Comparative results for the main objectives: the threshold accuracy (objective 1) and the slope (objective 2). The accuracy (objective 1) is denoted as the absolute difference of the calculated K from the dose-response curve and the threshold value (50). The numbers at the dots represent the upper boundary of the slope constraint (objective 2) in the optimisation. The points are coloured for their values for objective 4: no limit on the basal expression (blue), a limit of 7.5 (red) and 15 (green).

We next wanted to test the impact of small perturbations in reaction rates on the system. This could give an idea about the robustness of the system and also where potential improvements could be made. A sensitivity analysis on the optimised TMSD-F informed us that small perturbations in k_l do not influence the fluorescence output, while changes in k_{rf} and k_{rs} do (Figure 7). The k_l rate, describing the leakiness of the system, should have little effect on the produced fluorescent signal, and so we do not have to worry that leakiness would negatively impact our system. The rates k_{rf} and k_{rs} describe the side reactions involving the universal toehold. These results indicate a universal toehold might not be the best sequence design for our purpose. By removing the universal toehold, the side reactions would be eliminated and the whole system would be stabilised.

Parameter sensitivity analysis on the output fluorescence of the optimised TMSD-F system. We tested this for a dose below (30), at (50) and above (70) the threshold value of 50. The sensitivities were normalised for their parameter value. A positive sensitivity indicates an increase in fluorescence when a respective parameter is increased, while a negative sensitivity indicates a decrease in fluorescence when a respective parameter is increased.

Modelling the simple TMSD with no fuel

Setting up the reaction equations

With the optimised rates of the TMSD-F, we can also make predictions about the TMSD-NF (Figure 4). We assume that the rates describing the binding of the input (A) to the threshold (B) and of the input (A) to the strand displacement complex (BC) are the same in both the TMSD-F and the TMSD-NF. Thus, k_1 = k_f and k_2 = k_s. The rate describing the aptamer binding reaction, k_3, was assumed to be equal to k_2. When designing this system in NUPACK we found that no side reactions occur and thus none were included in the mathematical model. Furthermore, it is assumed that B and C will dimerise before C can bind to D and before A is added. The reactions describing the system are:
A + B {\stackrel{k_1}{\rightarrow}} AB A + BC \underset{k_2}{\stackrel{k_2}{\rightleftharpoons}} C + AB C +D {\stackrel{k_3}{\rightarrow}} O

Dose-response curve of TMSD-NF

The TMSD-NF system achieved a sharp threshold but was slightly different compared to the TMSD-F. The basal expression was greatly reduced. Besides, the TMSD-NF increases in output only after the threshold is reached but can reach lower maximum output levels. It was hypothesised that this is caused by the absence of a fuel reaction. Indeed, when we remove the fuel reaction from TMSD-F, similar behaviour is observed (Figure 8).

Dose-response curves of the perfect response (for TMSD-NF) the TMSD-NF system and the TMSD-F system without the fuel reaction. The input is the miRNA, the output is a fluorescent signal from either of the systems.

Further optimising the system

The TMSD-NF system is sensitive to the amount of C & D in the system compared to A & B. If B is lower than C &D, the switch is not fast enough and increases linearly after the threshold value is passed (Figure 9). In our diagnostic test, we cannot change the concentration of B as it is determined by the concentration associated with healthy controls. Thus, we need to find a way to establish better scalable dose-response curves using other parts of the system.

We were able to scale the TMSD-F system for multiple threshold values by changing the other input concentrations for the TMSD-F system in a fixed ratio (Figure 9).² Therefore, we performed a qualitative analysis on the TMSD-NF system to find the optimal ratio in input concentrations. It is important that the output is kept as high as possible, while maintaining the steep switch in the curve. Here, the concentrations of C and D are dependent on the concentration of B since C=D=\frac{B}{r} , where r is iteratively changed to find the best dose-response curve. It was found that r=3 results in a good curve, where the switch has a steep curve, but also sufficient maximum output is produced (Figure 10). Thus, for a functioning TMSD-NF system, three times as little C&D compared to B is necessary. Another observation from this modelling experiment was that the dose-response curve flattens once D has been saturated (Figure 10). The maximum output concentrations are equal to the input D concentration. This was communicated to the wet lab to improve experimental performance.

Comparison of TMSD-F (orange) and TMSD-NF (red) in producing dose-response curves for different threshold values. The curves were simulated for three different threshold values (0.1, 0.5, 2.5). In TMSD-F, the ratio is as follows: C1: 2*threshold value; F: 4*threshold value; TH: threshold value and R: 3*threshold value. In TMSD-NF, the concentration of C and D are kept constant at 0.5. The TMSD-F curves have a constant shape, while the TMSD-NF curve improves at higher threshold values.

Simulated dose-response curves for varying levels of r. The threshold value is set to 2. Lower concentrations of C&D (caused by higher values of r), result in a sharper dose-response curve and a lower maximum output.

For our test, it is also important to know how long it would take for the two TMSD systems to reach their final steady states, as this indicates how long it would take for the test to change colour. The original TMSD-F takes 2500 seconds, while for the optimised system, 10000 seconds were needed. Potentially, by decreasing the basal expression before the threshold, we slowed down the entire system. In particular, the doses around the threshold concentration take a long time to reach a steady state, while doses further away from the threshold achieve their steady states quicker. Thus, in perfecting the threshold mechanism, we did increase the test run-time. Interestingly, the TMSD-NF can reach steady states within 5 seconds for all input doses and gives a great time advantage.

Parameter space exploration

In general, the threshold system should work when the reaction rate for the threshold (k_f, k_1) is faster than the reaction for the output production (k_s, k_2).² We were interested in finding out how big this difference needs to be and what ranges of parameters show good dose-response curves for the TMSD-NF. Additionally, knowing the permissible ranges of parameters tells us something about the sequence design of the RNA strands in the threshold system. Recent developments have shown that reaction rates of dimerisation and TMSD can be calculated based on sequence alone.^5,8,9 Therefore, for each amplified miRNA that needs to go into a thresholding system, we can determine if the reaction is within the permissible parameter ranges. From an exploratory sensitivity analysis, we already knew that the output of the system is not highly sensitive to the parameters in the system, which indicated a large permissible parameter range.

To analyse the permissible parameters, a Latin hypercube sample was created of the whole parameter space. For each of the 10,000 tested parameter sets, a dose-response curve is simulated, and the behaviour is scored according to slightly adjusted objectives as previously used in the MOO (see dropdown box).

Objectives for TMSD-NF dose-response
1. Maximum output: We know from our previous tests that the optimal maximum possible output is three times lower than the threshold value. Therefore, objective 1, which should be as low as possible, can be described as: obj1 = max(response) - \frac{threshold \: value}{3} 2. Low basal expression: we want the basal expression before the threshold value to be as low as possible. Therefore, objective 2 can be calculated by summing up the output of the system until the threshold value. Again, this value should be as low as possible.
3. High slope: The slope at the threshold value should be as high as possible, which is calculated by the area under the gradient curve from the threshold value onwards. The bigger this area, the better.
4. Threshold accuracy: It should be judged whether the output starts to rise at the threshold value or a different point. For this, we judge the change in the gradient of a point just before and right after the threshold value. The higher this value, the sharper the switch.
The output of each objective is rescaled to be within a corresponding value between 0 and 10, after which they are added to create a final score for each parameter set. Objective 1 is weighted lower than the other three (0.1 vs. 0.3, respectively).

Latin hypercube sampling
In Latin hypercube sampling, the parameter sets are picked such that they are equally distributed over the whole parameter space. We set the boundaries for k_1 to the same values as k_f and the boundaries for k_2 and k_3 were set to the same values as k_s.

Parameter space exploration. Parameter sets from the Latin hypercube sampling plotted with their \frac{k_1}{k_2} ratio (log scale) and score, where 10 is a perfect score. The colour bar represents the log_{10}(k_3). From around a \frac{k_1}{k_2} ratio of 1000 onwards, each parameter set will give the best possible dose-response curve.

It was determined that k_1 should at least be 1000 times higher than k_2 to reach sufficient scores for all values of k_3 (Figure 11). In other words, from that point onwards it does not matter how strong the binding of the output from the TMSD (C) to the aptamer is (D). At lower \frac{k_1}{k_2} ratios, a weaker binding of C to D could restore the system behaviour. This indicates that even though C might be produced quicker because of the input annealing first to the TMSD complex instead of the threshold, a larger pool of C has to accumulate before it can activate the aptamer D.

Besides, these results tell us that a large range of parameters is permissible, as long as k_1 >> k_2. This suggests that most miRNA sequences should be able to function in the TMSD-NF threshold system as long as the threshold sequence can be long enough to create strong annealing and thus a higher k_1 compared to k_2. Otherwise, a weaker binding to the aptamer could restore the functioning.

Future experimental and modelling recommendations

Through modelling the threshold system, we found that our engineered TMSD-NF system produced a dose-response curve with a sharp switch at the threshold value. For this, it is important that the concentrations of C&D compared to B are well balanced. Next, we also learned that the sequence design of the RNA strands in the TMSD-NF system should always lead to a good dose-response curve if the threshold annealing sequence is longer than the toehold binding region in the TMSD-NF reaction. These points improved the system design in the lab.

However, while testing the TMSD-NF experimentally, it was discovered that one assumption of the mathematical model is likely violated. The NUPACK model used an input concentration of C and D in a 1:1 ratio, missing that, if free D is still in the solution, A will bind to D and produce fluorescence. The sequences could be redesigned such that A cannot bind to D anymore, but the model could also be adapted to include an interaction term between A and D. This way we can learn more about the behaviour of the TMSD-NF system under this updated assumption.

A future extension to our work would be to model the maximum concentration of free D that still produces good dose-response curves. These results could guide the production of D in the wet lab. Additionally, the parameter space analysis could be reperformed to find the maximum binding rate of A to D. This in turn could direct the sequence design of the RNA strands in the TMSD-NF system, where caution is paid to the binding strength of A and D. Unfortunately, we did not have time to work this out and include it in a rebuild in the DBTL cycle. Nevertheless, the modelling of the threshold system has shown the possibilities of the two alternative systems in distinguishing the concentration of an miRNA to be healthy or indicative of MS.

(1)

Ho, P. T. B.; Clark, I. M.; Le, L. T. T. MicroRNA-Based Diagnosis and Therapy. International Journal of Molecular Sciences 2022, 23 (13), 7167. https://doi.org/10.3390/ijms23137167.

(2)

Qian, L.; Winfree, E. Scaling Up Digital Circuit Computation with DNA Strand Displacement Cascades. Science 2011, 332 (6034), 1196–1201. https://doi.org/10.1126/science.1200520.

(3)

Otero-Muras, I.; Banga, J. R. Automated Design Framework for Synthetic Biology Exploiting Pareto Optimality. ACS Synthetic Biology 2017, 6 (7), 1180–1193. https://doi.org/10.1021/acssynbio.6b00306.

(4)

Ang, J.; Harris, E.; Hussey, B. J.; Kil, R.; McMillen, D. R. Tuning Response Curves for Synthetic Biology. ACS Synthetic Biology 2013, 2 (10), 547–567. https://doi.org/10.1021/sb4000564.

(5)

Akay, A.; Reddy, H. N.; Galloway, R.; Kozyra, J.; Jackson, A. W. Predicting DNA Toehold-Mediated Strand Displacement Rate Constants Using a DNA-BERT Transformer Deep Learning Model. Heliyon 2024, 10 (7), e28443. https://doi.org/10.1016/j.heliyon.2024.e28443.

(6)

Zhang, D. Y.; Winfree, E. Control of DNA Strand Displacement Kinetics Using Toehold Exchange. Journal of the American Chemical Society 2009, 131 (47), 17303–17314. https://doi.org/10.1021/ja906987s.

(7)

Machinek, R. R. F.; Ouldridge, T. E.; Haley, N. E. C.; Bath, J.; Turberfield, A. J. Programmable Energy Landscapes for Kinetic Control of DNA Strand Displacement. Nature Communications 2014, 5 (1), 5324. https://doi.org/10.1038/ncomms6324.

(8)

Berleant, J.; Berlind, C.; Badelt, S.; Dannenberg, F.; Schaeffer, J.; Winfree, E. Automated Sequence-Level Analysis of Kinetics and Thermodynamics for Domain-Level DNA Strand-Displacement Systems. Journal of The Royal Society Interface 2018, 15 (149), 20180107. https://doi.org/10.1098/rsif.2018.0107.

(9)

Zhang, J. X.; Yordanov, B.; Gaunt, A.; Wang, M. X.; Dai, P.; Chen, Y.-J.; Zhang, K.; Fang, J. Z.; Dalchau, N.; Li, J.; Phillips, A.; Zhang, D. Y. A Deep Learning Model for Predicting Next-Generation Sequencing Depth from DNA Sequence. Nature Communications 2021, 12 (1), 4387. https://doi.org/10.1038/s41467-021-24497-8.