Why do we need a threshold detection system?
A disease like Multiple Sclerosis (MS) can cause miRNAs to be
dysregulated compared to a healthy person.1 The miRNAs measured by our
diagnostic test are generally always present in the blood, just in lower
or higher concentrations than in a healthy control. Solely detecting the
presence or absence of an miRNA is therefore not sufficient for a
correct test result. To solve this problem, a mechanism was designed
that could distinguish between the concentrations below or above the
healthy control. We named this design the "threshold system". Ideally,
there would be no output of the threshold system below the threshold
concentration and thus no activation of the subsequent toehold switch (Figure 1).
Once the threshold has been passed, output RNA will be produced, which
enters the ribocomputing toehold switch circuit. In this sense, we would also
create a binary input for the ribocomputing circuit, which is beneficial
for its functioning.
Toehold mediated strand displacement reaction can create a threshold
Toehold mediated strand displacement reaction with fuel
In literature, we found an RNA-based reaction that could achieve the
desired behaviour (Figure 1).2
This reaction consists of four steps (Figure 2). First, the
input miRNA (X) binds to a threshold complex (TH) to catch away the
amount of miRNA associated with a healthy control (Figure 2a). Second,
the unbound input (X) can initiate a toehold mediated strand
displacement reaction (TMSD), releasing the output strand (O) (Figure
2b). The output strand (O) was designed to bind to a reporter (R) to
produce a fluorescent output signal (Figure 2c). A fuel will reverse the
strand displacement reaction, releasing X (Figure 2d). This
advantageously speeds up the production of O, as X can once again
initiate the TMSD reaction. The complete reaction (hereon named TMSD-F)
results in a dose-response curve that already comes close to our desired
behaviour, but it would benefit from further optimisation (Figure 3).
Simplified toehold mediated strand displacement reaction without fuel
In a cell-free test harbouring a complex system like ours, simplicity
is of great importance. Therefore, a simplified version of the TMSD-F
described above was engineered in
our lab. Essentially, the threshold reaction and the TMSD still take
place, but because of the simplified sequence design, the fuel reaction
is removed (Figure 4). The output strand C will bind to an
aptamer to create a fluorescent signal. The simplified system will be
referred to as TMSD-NF (toehold mediated strand displacement with no
fuel).
Optimisation of the TMSD-F
We could now solely focus on our engineered TMSD-NF, but the comparison to the TMSD-F also interested us. In this way, we could judge the ability of both systems to create a dose-response curve better and discuss which system would better suit future test requirements. Furthermore, the TMSD-F was validated experimentally by Qian et al. 2011, where their model was fitted to data.2 This gave us an excellent starting point that later guided us in designing the TMSD-NF.
Based on our simulations, the TMSD-F shows a dose-response curve with high leakiness and not enough accuracy compared to the desired dose-response curve (Figure 3). Therefore, the first step was to optimise this existing system through modelling. The optimisation framework was taken from existing work in multi-objective optimisation (MOO; see the dropdown box for more details).3 The desired threshold switch can be scored with a Hill function
Output = Intercept + Maximum *
\frac{Intercept^n}{K^n + Intercept^n} which describes the curves
generally seen in dose-response graphs (Figure 1).4
This resulted in four optimisation objectives:
1. A high threshold accuracy: the switch should occur
at the desired input concentration. This is defined by calculating
K from the Hill function. This value should be as close to the
threshold input value as possible.
2. A high slope: to ensure a sharp switch at the input
concentration. This is defined by calculating n from the Hill
function, which should be as high as possible.
3. A large maximum output: the output of the system
should reach the maximum possible production. This is defined as log_{10}(\frac{max(output)}{output(input=0.001)}).
4. A small basal expression before the threshold: the
production of output should be limited before the threshold is reached.
This is defined by limiting the sum of the output before the threshold
input concentration to a user-input value.
Setting up the mathematical reaction equations
The TMSD-F system can be described in four main reactions, which describe the threshold reaction, the output strand reaction, the fuel reaction and the fluorescent output production, respectively: X + TH {\stackrel{k_f}{\rightarrow}} W X + C1 \underset{k_s}{\stackrel{k_s}{\rightleftharpoons}} C2 + O C2 + F \underset{k_s}{\stackrel{k_s}{\rightleftharpoons}} C3 + X O + R {\stackrel{k_s}{\rightarrow}} G
Notice that in the threshold reaction, k_f (fast kinetic rate) is assumed to be faster than the other rate k_s (slow kinetic rate), because the binding length of the input (X) to the threshold (TH) is longer and thus stronger than to the TMSD complex (C1) (Figure 2). The initial binding in the TMSD is the rate-limiting step.5 Therefore, the threshold, output strand, and fuel reaction can be approximated by the same rate k_s as they are all initiated by the same toehold sequence, called a universal toehold (Figure 2). The universal toehold results in some possible side reactions between the different components, of which two examples are:
O + C1 \underset{k_{rf}}{\stackrel{k_f}{\rightleftharpoons}} C4 F + TH \underset{k_{rs}}{\stackrel{k_f}{\rightleftharpoons}} C10 where, k_{rf} is a faster reverse rate than k_{rs}. Moreover, a leakiness reaction is modelled as:
C1 + F \underset{k_l}{\stackrel{k_l}{\rightleftharpoons}} C3 + O
In total, this gave us five reaction parameters to optimise, namely:
k_f, k_s, k_{rf}, k_{rs} and k_l. The upper and lower bounds of these
values were taken from literature or defined by ourselves.6,7 The initial conditions were
taken from Qian & Winfree (2011).
Setting the objectives as optimisation constraints
The third and fourth optimisation objectives can be set as constraints
on the optimisation, meaning that solutions not fulfilling this
constraint are discarded. We decided that the maximum output should be
the same as in the reference paper, with an increase in the objective
value of almost 20. Multiple values for the maximum basal expression
were tested.
To optimise both objectives 1 and 2, a small trick is employed, as it is
hard to maximise two objectives at once. Here, objective 2 is split into
consecutive intervals (Figure 5). For
each interval, objective 2 is set as a constraint, while objective 1 is
optimised. If there are in total 10 intervals for objective 2, 10
separate optimisations will be done, and thus 10 solutions are produced.
Through this, we can compare the effect objectives 1 and 2 have on each
other.
The optimisation algorithms
The optimisation was done with a hybrid solver, combining the global solver eSS (enhanced scatter search) and the local solver fmincon in MATLAB.10 After 10 iterations of the global solver, the local solver was used instead to further refine the optimisation. This was repeated until 10,000 evaluations were done. In the global solver, 320 diverse solutions were generated in the initial stage to ensure a large part of the parameter space was searched. From these, 20 solutions were put in the reference set, which is improved upon in each iteration of the optimisation.3
The optimal reaction rates for creating a sharp threshold switch
Through the comparison of three optimisations varying in maximum
values of objective 4, we found that limiting the basal production to 15
before the threshold value decreased the basal expression without
compromising the accuracy of the switch (overlap of blue and green dots
in Figure 6). Limiting the basal expression to
7.5 detrimentally reduces the accuracy of the switch (red dots in Figure
6). Another generally observed trend
is the reduction in threshold accuracy (objective 1) when the slope
(objective 2) is decreased. From this plot, it was determined that the
optimisation for constraint interval 18-16 on objective 2 was most
optimal because it has a high slope and high accuracy. We have now
optimised the TMSD-F to produce less output before the threshold is
reached (objective 3), have a steeper slope (objective 2), as well as a
greatly improved threshold accuracy (objective 1) (Figure 3). The parameters for this solution
were: k_f = 3.4*10^7\: M^{-1} \: s^{-1}, k_s =
2.5*10^4\: M^{-1} \: s^{-1}, k_{rf} = 32.3 \: s^{-1}, k_{rs} = 0.13 \:
s^{-1}, k_l = 0.02 \: M^{-1} \: s^{-1}.
We next wanted to test the impact of small perturbations in reaction
rates on the system. This could give an idea about the robustness of the
system and also where potential improvements could be made. A
sensitivity analysis on the optimised TMSD-F informed us that small
perturbations in k_l do not influence
the fluorescence output, while changes in k_{rf} and k_{rs} do (Figure 7). The k_l rate, describing the leakiness of the
system, should have little effect on the produced fluorescent signal,
and so we do not have to worry that leakiness would negatively impact
our system. The rates k_{rf} and k_{rs} describe the side reactions involving
the universal toehold. These results indicate a universal toehold might
not be the best sequence design for our purpose. By removing the
universal toehold, the side reactions would be eliminated and the whole
system would be stabilised.
Modelling the simple TMSD with no fuel
Setting up the reaction equations
With the optimised rates of the TMSD-F, we can also make predictions
about the TMSD-NF (Figure 4). We assume
that the rates describing the binding of the input (A) to the threshold
(B) and of the input (A) to the strand displacement complex (BC) are the
same in both the TMSD-F and the TMSD-NF. Thus, k_1 = k_f and k_2 =
k_s. The rate describing the aptamer binding reaction, k_3, was assumed to be equal to k_2. When designing this
system in NUPACK we found that no side reactions occur and thus none
were included in the mathematical model. Furthermore, it is assumed that
B and C will dimerise before C can bind to D and before A is added. The
reactions describing the system are:
A + B {\stackrel{k_1}{\rightarrow}}
AB A + BC
\underset{k_2}{\stackrel{k_2}{\rightleftharpoons}} C + AB C +D {\stackrel{k_3}{\rightarrow}} O
Dose-response curve of TMSD-NF
The TMSD-NF system achieved a sharp threshold but was slightly different compared to the TMSD-F. The basal expression was greatly reduced. Besides, the TMSD-NF increases in output only after the threshold is reached but can reach lower maximum output levels. It was hypothesised that this is caused by the absence of a fuel reaction. Indeed, when we remove the fuel reaction from TMSD-F, similar behaviour is observed (Figure 8).
Further optimising the system
The TMSD-NF system is sensitive to the amount of C & D in the system compared to A & B. If B is lower than C &D, the switch is not fast enough and increases linearly after the threshold value is passed (Figure 9). In our diagnostic test, we cannot change the concentration of B as it is determined by the concentration associated with healthy controls. Thus, we need to find a way to establish better scalable dose-response curves using other parts of the system.
We were able to scale the TMSD-F system for multiple threshold values by
changing the other input concentrations for the TMSD-F system in a fixed
ratio (Figure 9).2 Therefore, we performed a
qualitative analysis on the TMSD-NF system to find the optimal ratio in
input concentrations. It is important that the output is kept as high as
possible, while maintaining the steep switch in the curve. Here, the
concentrations of C and D are dependent on the concentration of B since
C=D=\frac{B}{r} , where r is
iteratively changed to find the best dose-response curve. It was found
that r=3 results in a good curve, where the switch has a steep
curve, but also sufficient maximum output is produced (Figure 10). Thus, for a functioning
TMSD-NF system, three times as little C&D compared to B is
necessary. Another observation from this modelling experiment was that
the dose-response curve flattens once D has been saturated (Figure 10). The maximum output
concentrations are equal to the input D concentration. This was
communicated to the wet lab to improve experimental performance.
For our test, it is also important to know how long it would take for the two TMSD systems to reach their final steady states, as this indicates how long it would take for the test to change colour. The original TMSD-F takes 2500 seconds, while for the optimised system, 10000 seconds were needed. Potentially, by decreasing the basal expression before the threshold, we slowed down the entire system. In particular, the doses around the threshold concentration take a long time to reach a steady state, while doses further away from the threshold achieve their steady states quicker. Thus, in perfecting the threshold mechanism, we did increase the test run-time. Interestingly, the TMSD-NF can reach steady states within 5 seconds for all input doses and gives a great time advantage.
Parameter space exploration
In general, the threshold system should work when the reaction rate for the threshold (k_f, k_1) is faster than the reaction for the output production (k_s, k_2).2 We were interested in finding out how big this difference needs to be and what ranges of parameters show good dose-response curves for the TMSD-NF. Additionally, knowing the permissible ranges of parameters tells us something about the sequence design of the RNA strands in the threshold system. Recent developments have shown that reaction rates of dimerisation and TMSD can be calculated based on sequence alone.5,8,9 Therefore, for each amplified miRNA that needs to go into a thresholding system, we can determine if the reaction is within the permissible parameter ranges. From an exploratory sensitivity analysis, we already knew that the output of the system is not highly sensitive to the parameters in the system, which indicated a large permissible parameter range.
To analyse the permissible parameters, a Latin hypercube sample was
created of the whole parameter space. For each of the 10,000 tested
parameter sets, a dose-response curve is simulated, and the behaviour is
scored according to slightly adjusted objectives as previously used in
the MOO (see dropdown box).
Objectives for TMSD-NF dose-response
1.
Maximum output: We know from our previous tests that
the optimal maximum possible output is three times lower than the
threshold value. Therefore, objective 1, which should be as low as
possible, can be described as: obj1 =
max(response) - \frac{threshold \: value}{3} 2. Low basal
expression: we want the basal expression before the threshold
value to be as low as possible. Therefore, objective 2 can be calculated
by summing up the output of the system until the threshold value. Again,
this value should be as low as possible.
3. High slope: The slope at the threshold value should be as high as possible, which is calculated by the area under the gradient curve from the threshold value onwards. The bigger this area, the better.
4. Threshold accuracy: It should be judged whether the output starts to rise at the threshold value or a different point. For this, we judge the change in the gradient of a point just before and right after the threshold value. The higher this value, the sharper the switch.
The output of each objective is rescaled to be within a corresponding value between 0 and 10, after which they are added to create a final score for each parameter set. Objective 1 is weighted lower than the other three (0.1 vs. 0.3, respectively).
Latin hypercube sampling
In Latin hypercube sampling, the parameter sets are picked such that they are equally distributed over the whole parameter space. We set the boundaries for k_1 to the same values as k_f and the boundaries for k_2 and k_3 were set to the same values as k_s.
It was determined that k_1 should at least be 1000 times higher than k_2 to reach sufficient scores for all values of k_3 (Figure 11). In other words, from that point onwards it does not matter how strong the binding of the output from the TMSD (C) to the aptamer is (D). At lower \frac{k_1}{k_2} ratios, a weaker binding of C to D could restore the system behaviour. This indicates that even though C might be produced quicker because of the input annealing first to the TMSD complex instead of the threshold, a larger pool of C has to accumulate before it can activate the aptamer D.
Besides, these results tell us that a large range of parameters is permissible, as long as k_1 >> k_2. This suggests that most miRNA sequences should be able to function in the TMSD-NF threshold system as long as the threshold sequence can be long enough to create strong annealing and thus a higher k_1 compared to k_2. Otherwise, a weaker binding to the aptamer could restore the functioning.
Future experimental and modelling recommendations
Through modelling the threshold system, we found that our engineered TMSD-NF system produced a dose-response curve with a sharp switch at the threshold value. For this, it is important that the concentrations of C&D compared to B are well balanced. Next, we also learned that the sequence design of the RNA strands in the TMSD-NF system should always lead to a good dose-response curve if the threshold annealing sequence is longer than the toehold binding region in the TMSD-NF reaction. These points improved the system design in the lab.
However, while testing the TMSD-NF experimentally, it was discovered that one assumption of the mathematical model is likely violated. The NUPACK model used an input concentration of C and D in a 1:1 ratio, missing that, if free D is still in the solution, A will bind to D and produce fluorescence. The sequences could be redesigned such that A cannot bind to D anymore, but the model could also be adapted to include an interaction term between A and D. This way we can learn more about the behaviour of the TMSD-NF system under this updated assumption.
A future extension to our work would be to model the maximum concentration of free D that still produces good dose-response curves. These results could guide the production of D in the wet lab. Additionally, the parameter space analysis could be reperformed to find the maximum binding rate of A to D. This in turn could direct the sequence design of the RNA strands in the TMSD-NF system, where caution is paid to the binding strength of A and D. Unfortunately, we did not have time to work this out and include it in a rebuild in the DBTL cycle. Nevertheless, the modelling of the threshold system has shown the possibilities of the two alternative systems in distinguishing the concentration of an miRNA to be healthy or indicative of MS.