| BNDS-China - iGEM 2024

Press to download a PDF version of our model.

Modeling is essential to simulating and understanding dynamic systems and genetic parts in biology. We could subsequently improve our designs through the insights it provides. In synergy with our wet lab, modeling contributes to our ultimate implementation of the probiotic platform as a supplement product. Specifically, we are more equipped with the details about the genetic circuits, metabolic stress on chassis, biosensors functioning, and supplement administration to ensure the project feasibility and practicality. Next, we will expound on each corresponding model.

1. Genetic Circuit Calculator

The genetic circuit is composed of multiple parts integrated. We aim to design a "calculator" that simulates the complicated sensing and expression process to predict the circuit's performance under various inputs similar to the Cello 2.0 program (Nielsen et al, 2016; Jones et al., 2022). Specifically, users could utilize our calculator program to model the dynamic output of arbitrary circuits. The results of this calculator can serve as a reference for our experiments, design, and implementation.

Assumptions and Parameters

Throughout the modeling process, we've unified the units of time and concentrations as minutes and millimoles/liter respectively. Hence, we will refer to the following parameters:


Name	Definition
$t / min$	Time in hours is the independent variable of the following functions.
$y / mM$	The concentration of the stimulated part over time in the stimulation model.
$x / mM$	The concentration of the stimulator part over time in the stimulation model.
$α$	The constant of production rate in the stimulation model.
$β$	The constant of degradation rate in the stimulation model.
$y_{m i n} / mM$	The concentration of the minimum transcriptional output.
$y_{m i n} / mM$	The concentration of the maximum transcriptional output.
$K$	The half-maximal constant for the hill function.
$n$	The hill constant of the hill function.
$P / mM$	The transcriptional output of the promoter over time in the expression model.
$ϵ$	The conversion factor in the expression model.
$C / mM$	The constant rate of the promoter functions in the transcriptional output model.

For practical convenience when modeling, we assume the following basic assumptions:

We assume the production and degradation rate of molecules, such as that of mRNA, enzymes, and metabolites related to GABA, follows mass-action kinetics. The rate of change in a reactant's concentration is proportional to its present concentration. For example, the rate of degradation $\frac{d y}{d t}$ is assumed to be proportional to $y$ in $\frac{d y}{d t} = αx - β y$
We assume the system reaches a dynamic equilibrium where the rates of production and degradation balance each other over time. For example, the parameter $β$ that represents natural decay processes is constant for any type of molecules.
We assume a cooperative binding process or a threshold effect where the response, such as transcriptional output $P$ , is nonlinear and saturates at higher concentrations of the inhibiting molecule. Thus, a repression relationship could be described by the hill function. For example, $\frac{K ^{n}}{K ^{n} + x ^{n}}$ suggests that $x$ affects the transcriptional output $P$ of the repressed part in a nonlinear fashion.

Stimulation Model

The stimulation model assumes that higher $x$ leads to higher $y$ whereas $y$ stops increasing when $x$ is very low. Thus, when a part with $x$ concentration stimulates the expression of another part with $y$ concentration, the two parts satisfy the following differential equation:

\frac{d y}{d t} = αx - β y

Repression Model

The repression model assumes that higher $x$ lowers $y$ and $y$ reaches 0 when $x$ is very high. However, there's no ideal repression in reality, with leaky expressions leading to minimum amounts of $y$ . Thus, when a part with $x$ concentration inhibits the expression of another part with $y$ concentration, the two parts satisfy the following differential equation:

\frac{d y}{d t} = α (y_{m i n} + (y_{m a x} - y_{m i n}) \frac{K ^{n}}{K ^{n} + x ^{n}}) - β y

Expression Model

The expression model assumes that higher $P$ leads to higher change in $y$ whereas $y$ reaches 0 when $P$ is very low. Thus, when the expression of a part with $y$ concentration is induced by a promoter with transcriptional output $P$ , they satisfy the following differential equation:

\frac{d y}{d t} = ϵ P - β y

Transcriptional Output Model

The expression model is based on the assumption that promoter functions at a constant rate $C$ but the final output $P$ reaches equilibrium due to degradation illustrated by the following differential equation:

\frac{d P}{d t} = C - βP

Multi-layered Relation Model

The multi-layered relation model assumes that the outer layer relation affects the inner layer relation via its relation with an intermediate substance. If $A$ represses $B$ and $C$ represses this relation, we assume that an intermediate $D$ exists between $A$ and $B$ as a messenger while $C$ represses it. This largely follows biochemical processes, the messenger ranging from protein to RNA to simple compounds. The higher $C$ is the stronger the relation it acts upon, indicated by $A$ in the inner layer. Depending on the specific relation, the part involved in the outer layer relation ( $C$ ) and the part that initiates the inner layer relation ( $A$ ) satisfy the following differential equations:

⎩ ⎨ ⎧ \frac{d y}{d t} \frac{d y}{d t} = αx - β y OR = α (y_{m i n} + (y_{m a x} - y_{m i n}) \frac{K ^{n}}{K ^{n} + x ^{n}}) - β y

The user needs to break down the genetic circuit into basic segments that begin with a promoter and end with a terminator. Then, the user inputs all the relations, either stimulation or repression between the segments. The calculator sets up the differential equations segment by segment, though the relationships are analyzed after the system of differential equations is complete. To analyze complex relations like "repress of repression" efficiently, we assume the outer layer of relation directly acts on the "process" of the inner layer in multiple layers of relations. Therefore, relations may need simplifying before input concentrations over time and equation coefficients are inputted. We programmed the calculator with Python. The full code is in our project's public GitHub repository, as are all our codes for the dry lab. We include the essentials of our code here:

# Define a function for the derivatives
        def model(y, t):
            N = len(y)  # Number of variables
            dydt = np.zeros(N)
            parts_map = {} # Name -> number
            current_number = 0
            for i in range(0, len(user_structures)):
                current_structure = user_structures[i]
                current_part = 0
                constant_rate = 0.8
                death_rate = 0.5
                growth_rate = 0.6
                while current_part < len(current_structure):
                    alpha, beta, y_min, y_max, K, n = parameters[current_structure[current_part]]
                    parts_map[current_structure[current_part]] = current_number
                    if current_part == 0:
                        promoter_number = current_number
                        dydt[current_number] = constant_rate - death_rate * y[current_number] # assume promoters work at a constant rate
                        dydt[current_number + N//2] = constant_rate - death_rate * y[current_number + N//2]
                    else:
                        dydt[current_number] = alpha * y[promoter_number + N//2] - beta * y[current_number] # protiens that are connected after the promoters grow by a rate proportional to the concentration of the promoters
                        dydt[current_number + N//2] = alpha * y[promoter_number + N//2] - beta * y[current_number + N//2] # the plus N/2 in the second bracket might go wrong
                    current_number += 1
                    current_part += 1
            for key, value in user_relations.items():
                current_relation = value # [A, relation, B]
                A = parts_map[current_relation[0]] # serial number
                relation = current_relation[1]
                if relation == '2': #repress
                    if user_relations.get(current_relation[2]) is None: # B is not a relation, it might be a promoter or a protein
                        alpha, beta, y_min, y_max, K, n = parameters[current_relation[2]]
                        B = parts_map[current_relation[2]] 
                        dydt[B] = alpha * (y_min + (y_max - y_min) * ((K**n) / (K**n + y[A + N//2]**n))) - beta * y[B]
                        dydt[B + N//2] = alpha * (y_min + (y_max - y_min) * ((K**n) / (K**n + y[A + N//2]**n))) - beta * y[B + N//2]
                    else: # B is a relation, then the current relation affects the amount of effective concentration of the repressor in relation B
                        alpha, beta, y_min, y_max, K, n = parameters[user_relations[current_relation[2]][0]]
                        B = parts_map[user_relations[current_relation[2]][0]] 
                        dydt[B + N//2] = alpha * (y_min + (y_max - y_min) * ((K**n) / (K**n + y[A + N//2]**n))) - beta * y[B + N//2]
                else: #stimulate
                    if user_relations.get(current_relation[2]) is None: # B is not a relation, it might be a promoter or a protein
                        alpha, beta, y_min, y_max, K, n = parameters[current_relation[2]]
                        B = parts_map[current_relation[2]] 
                        dydt[B] = alpha * y[A + N//2] - beta * y[B]
                        dydt[B + N//2] = alpha * y[A + N//2] - beta * y[B + N//2]
                    else: # B is a relation, then the current relation affects the amount of effective concentration of the repressor in relation B
                        alpha, beta, y_min, y_max, K, n = parameters[user_relations[current_relation[2]][0]]
                        B = parts_map[user_relations[current_relation[2]][0]] 
                        dydt[B + N//2] = alpha * y[A + N//2] - beta * y[B + N//2]
            return dydt

After solving the system of differential equations via scipy repository, we visualize the concentrations of output molecules and all intermediate molecules over time. Therefore, the user needs to input: the names of the proteins and promoters, the segments (starting from a promoter and ending at the terminator) of the circuit, and the relations (repress or stimulate) between parts. The following is an example of an anticipated outcome that verifies the validity of our calculator. We input the relations demonstrated (Figure. 1) according to the following format (Figure. 2) and the calculator produces the following graphs (Figure. 3).

Figure 1. A simple example of a "NOT" gate genetic circuit.

Figure 2. Interaction between our calculator and user when inputing necessary logic and parameters for the "NOT" gate example.

The calculator constructs the following system of differential equations:

⎩ ⎨ ⎧ \frac{d P _{1}}{d t} \frac{d a}{d t} \frac{d P _{2}}{d t} \frac{d b}{d t} = C_{1} - β_{1} P_{1} = α_{a} P_{1} - β_{a} a = α_{2} (y_{m i n} + (y_{m a x} - y_{m i n}) \frac{K _{2}^{n_{2}}}{K _{2}^{n_{2}} + a ^{n_{2}}}) - β_{2} P_{2} = α_{b} P_{2} - β_{b} b

Figure 3. Output of concentrations of all related molecules in the "NOT" gate example.

From the actual graph, we see that protein $a$ needs promoter $P_{1}$ to transcript, hence, in the outcome graph, we see that $a$ grows slowly at the beginning when the concentration of $P_{1}$ is low. Furthermore, protein $a$ represses promoter $P_{2}$ . Accordingly, as the concentration of $a$ gets higher the growth rate of $P_{2}$ decreases, and the concentration of $P_{2}$ eventually falls due to degradation. Similarly, since protein $b$ needs $P_{2}$ to transcript, as the concentration of $P_{2}$ decreases, the rate of growth of $b$ also decreases. Finally, notice that the concentration of the repressed substance does not fall to zero, thus, simulating the actual situation rather than ideally modeling repression as a closing switch.

Application

First, we consider the negative feedback of IAA. We input the relations demonstrated (Figure. 4) according to the following format (Figure. 5) and the calculator produces the following graphs (Figure. 6). From the results, we can see that the concentration of J₂₃₁₁₉ will rise continuously until an equilibrium because it is not affected by any other parts. The concentration of iacR rises as the concentration of J₂₃₁₁₉ increases because iacR is promoted by J₂₃₁₁₉. The concentration of P_iacR initially rises, however, as the concentration of iacR increases, the concentration of P_iacR decreases, because it is repressed by iacR. The concentrations of iadCDE and GFP both remain at a low level, because the concentration of P_iacR is low, and iadCDE and GFP are promoted by P_iacR. The concentration of IAA is relatively higher than iadCDE because the concentration of iadCDE is constrained by its repressed promoter, P_iacR.

Figure 4. The negative feedback loop of IAA genetic circuit.

Figure 5. Interaction between our calculator and user when inputing necessary logic and parameters for the IAA negative feedback example.

Figure 6. Output of concentrations of all related molecules in the IAA negative feedback example.

Next, we consider the negative feedback of butyrate. We input the relations demonstrated (Figure. 7) according to the following format (Figure. 8) and the calculator produces the following graphs (Figure. 9). From the results, we can see that the segment starting with promoter P₂₃₁₀₁ does not need anything to stimulate it nor has anything to repress it. Therefore, it naturally becomes the part with the greatest concentration and induces the continual increase in lrp concentration. With the slow rise in butyrate concentration, there is an even slower increase in P_pchA and the gene it induces, cI. However, as cI inhibits P_lam, we see a decrease in its transcriptional output and the genes it regulates, Tes4 and RFP, though neither had a distinct decrease in their concentration. This demonstrates the delay in biological genetic circuits.

Figure 7. The negative feedback loop of butyrate genetic circuit.

Figure 8. Interaction between our calculator and user when inputing necessary logic and parameters for the butyrate negative feedback example.

Figure 9. Output of concentrations of all related molecules in the butyrate negative feedback example.

Conclusion

Our calculator is capable of predicting any complex genetic circuit due to its generality, and its straightforward input system makes the calculator a simple tool for future iGEM teams to use. By inputting the structure of our team’s circuit and incorporating the basic mass action kinetic function and hill function, the calculator predicts the results, visualizing the change of each substance and giving insights for experiments. It is also useful if we want to adjust the genetic circuit, preventing us from working from scratch and constructing a new model for the changed circuit. However, our current outputs are graphs of concentration as a function of time, which might be less accurate considering the chemical reactions in the circuit produce products that are described as probability distributions rather than expected values. Therefore, our future goal is to establish a probability distribution predicting calculator.

2. Metabolic Stress Model

Concerned with the metabolic stress of the genetic pathways we integrated into the genome of our engineered E. coli Nissle 1917 (EcN), we want to analyze whether this leads to reduced fitness in our engineered EcN so it may be outnumbered and outcompeted by existing gut microbiota. We also find estimating and simulating metabolic stress on expression systems commonplace across research projects, so our modeling process may provide insights for other iGEM projects.

Assumptions and Parameters

Throughout the modeling process, we've unified the units of concentration, time, and EcN population as millimoles/liter, hours, and $OD_{600}$ respectively. Hence, we will refer to the following parameters:


Name	Definition
$t / h$	Time in hours is the independent variable of the following functions.
$P (t) / OD_{600}$	The theoretical logistic growth model represents EcN population colonizing the human gut as a function of time.
$P_{m a x} / OD_{600}$	Human gut environment carrying capacity for EcN under logistic growth.
$r$	Intrinsic EcN population growth rate under the logistic growth model's theoretical conditions.
$λ$	The constant representing butyrate's repression of bacteria growth.
$α$	The constant of production rate of Tes4 enzyme.
$β$	The constant of degradation rate of Tes4 enzyme.
$γ$	The constant of degradation rate of transcriptional output of the promoter that initiates Tes4 enzyme.
$δ$	The constant representing bacteria's regulation of butyrate concentration.
$p / mM$	The transcriptional output of the promoter that initiates Tes4 gene expression.
$E / mM$	The concentration of Tes4 enzyme that catalyze reaction from Butyryl-ACP to Butyrate.
$S / mM$	The concentration of Butyryl-ACP.
$B / mM$	The concentration of Butyrate.
$C / mM$	The constant rate of the promoter functions in the transcriptional output model.
$K$	The half-maximal constant for the Michaelis-Menten equation.
$k$	The rate constant for the Michaelis-Menten equation.

We assume the following basic assumptions for all the following metabolic stress models for convenience and unity when modeling:

We assume that the bacteria's environment, such as temperature, pH, and nutrient availability, is relatively stable. The system is at a pseudo-steady state, so metabolite concentrations and chemical reactions don't change significantly over time. Therefore, EcN growth adheres to the logistic growth model while its metabolic pathways function predictably and coefficients or parameters are constants.
Reaction rates are either limited by enzyme kinetics, which can be described by the Michaelis-Menten equation, or proportional to reactants as in spontaneous first or second-order chemical reactions.

Bacteria Logistic Growth

The growth of our engineered probiotic, E. coli Nissle 1917 (EcN) is referred to repeatedly throughout our models. Moreover, EcN is a chassis widely utilized throughout biological research. Therefore, based on the comprehensive fluorescence kinetics data we've gathered (Figure. 10), we calculated the parameters $r = 0.49700362$ , $P_{0} = 0.00121806$ and $P_{m a x} = 0.66132188$ with the following logistic growth model fitting:

\frac{d P ( t )}{d t} = r P (t) (1 - \frac{P ( t )}{P _{m a x}})

Figure 10. Data on logistic growth of EcN in vitro from fluorescence kinetics experiments.

Simplified Model

Under this simplified scenario, we consider that the concentration of our targeted metabolite is catalyzed by an enzyme to form metabolite proportional to the growth of our engineered EcN. Therefore, we assume that the metabolic pathway is a linear one-way flux from substrate to product while the biochemical reactions of potential intermediate metabolites can be encompassed by one enzyme kinetics. We also assume that the metabolic reactions only produce the desired product with minimal or no byproducts. Following the previous modeling process in Genetic Circuit Calculator construction, we derive the following equations:

⎩ ⎨ ⎧ \frac{d p}{d t} \frac{d E}{d t} \frac{d B}{d t} \frac{d P ( t )}{d t} = C - γ p = α p - βE = \frac{k ES}{K + E} - δ P = r P (t) (1 - \frac{P ( t )}{P _{m a x}}) - λ B

Programming with Python to solve the system of differential equations above, we derived the following dynamic (Figure. 11).

Figure 11. Simulation of inhibitory reactions between butyrate and EcN population.

Web-based Model

However, the experimental results represent a scenario so much more complicated than the assumed conditions in the simplified model that it fails to capture the nature of metabolic pathways. On the other hand, if we consider each metabolite as dots and biochemical reactions as the lines linking them, a web-based model is highly applicable. From the metabolic pathway containing butyrate in EcN, we could simplify the relationship to four metabolites: butyrate, Acetyl-CoA, Succinate, and Acetate (Figure. 12).

Figure 12. Strategies of pyruvate catabolism by the human gut microbiome (Oliphant et al., 2019).

We assume that butyrate is the key metabolite indirectly related to EcN growth with the following differential equation:

\frac{d P ( t )}{d t} = r P (t) (1 - \frac{P ( t )}{P _{m a x}}) - λ B

When modeling the relationships (biochemical reactions) between nodes (metabolites), we assume they can be unrelated, related in a simple chemical reaction, or related in an enzyme-catalyzed reaction in either one of the following differential equations (see unrelated as simple chemical reaction with $k = 0$ ). Suppose that this is a one-way reaction from reactant $R$ to product $P$ :

⎩ ⎨ ⎧ \frac{d P}{d t} \frac{d P}{d t} = k RP OR = \frac{k R}{K + R}

For our scenario with butyrate, we produced the following results (Figure. 13).

Figure 13. Output of concentrations of all butyrate-related molecules and EcN population.

This is the essentials of our code for the web-based model:

def model(y, t, params, n):
            dydt = np.zeros(n)
            for i in range(n):
                for j in range(n):
                    if i != j:
                        if i in K_m:
                            K = K_m[i]
                            v = v_max[i]
                            dydt[i] -= v * y[i] / (K + y[i])
                            dydt[j] += v * y[i] / (K + y[i])
                        else:
                            k_f = params[i][j]  # Forward rate constant
                            dydt[i] -= k_f * y[i] * y[j]   
                            dydt[j] += k_f * y[i] * y[j] 
        
            return dydt

Conclusion

According to both the visualizations of our simplified model and web-based model, we observe that expression of metabolites have impact on the growth of EcN. Therefore, we're justified in incorporating the "food input" sensor VFA0359 or BreR in the genetic circuit to reduce our chassis' metabolic stress and increase its fitness when colonizing the gut environment.

3. Butyrate Biosensor Analysis

In our experiment, we found that the butyrate sensor, HpdR, has been unable to function properly. Therefore, we employed molecular dynamics simulation to try to identify our issue.

Firstly, we used Autodock-Vina to dock HpdR with butyrate, obtaining the static binding structure of the small molecule with the protein (Figure. 14).

Figure 14. Structural diagram of protein HpdR binding to butyrate.

Subsequently, we performed molecular dynamics simulation using Discovery Studio and conducted Hdock docking with DNA. The results showed that the docking score after molecular dynamics was higher than the docking score before molecular dynamics. The pre-molecular dynamics score was -306.7, and the post-molecular dynamics score was -322.08. This implies that after docking with butyrate, the docking between HpdR and PhpdH became more compact, which does not conform to the experimental results provided by the reference literature.

Upon analysis, we discovered that SarJ, a gene fragment in the biosensor, would tightly dock with the protein conformation after folding and butyrate docking, thereby blocking the transcription process (Figure. 15).

Figure 15. The structure of protein HpdR after molecular dynamics simulation bound to DNA PhpdH.

Later, we attempted to create PhpdH $Δ$ SarJ, but due to equipment failure, we did not obtain the final experimental results. We re-analyzed using the same method and obtained the following results (Figure. 16).

Figure 16. The structure of protein HpdR before/after MD bound to DNA PhpdHΔSarJ.

We found that the protein docking score after 200ps of docking was somewhat lower compared to the protein-DNA docking score before the molecular dynamics simulation. Upon analyzing the active sites, we discovered that there was a noticeable release or a significant decrease in the bonding of nucleic acids in the RBS region. Therefore, we believe that SarJ is the cause of the experimental failure.

Next, we will attempt to repeat the experiment and believe that the results will confirm our analysis.

4. AHL Biosensor Design

In the human small intestine, 3-oxo-C12:2 (3OC12:2) is a specific factor, and we aim to identify a protein that allows the expression of a specific protein using 3OC12:2 as an inducer. Unfortunately, our literature search did not yield any relevant proteins.

Attempt to generate an active site and perform homology modeling

First, we attempted to use RF diffusion to generate a model of a target protein that can bind with 3OC12:2. Then, we used the BLAST tool to search for transcriptional control proteins with similar binding sites for mutagenesis. Unfortunately, despite generating multiple target site sets for BLAST, we did not find suitable proteins (Figure. 17).

Figure 17. Protein structures designed using RF Diffusion and BLAST results.

Attempt to identify target proteins with mutational potential

Since there were no proteins suitable for direct mutation of the active site, we attempted docking 3OC12:2 with common regulatory proteins. We found that AhrC, which interacts with arginine to regulate DNA transcription, can also dock with 3OC12:2 at similar sites (Figure. 18). Therefore, we decided to modify the AhrC protein structure in hopes of discovering an AhrC-like protein that can competitively bind to the target molecule and regulate downstream gene transcription.

Figure 18. AhrC docking structure with arginine and 3OC12:2.

Target Protein Analysis and Active Site Transfer

Structural Analysis of AhrC

C-terminal domain: Responsible for the recognition and binding of arginine, it is a key region for regulating the activity of the transcription factor. This region typically includes a pocket-like binding site suitable for interaction with arginine molecules. It involves some polar and charged amino acid residues, forming hydrogen bonds and electrostatic interactions with the guanidinium and carboxyl groups of arginine.

N-terminal domain: Responsible for binding to the promoter double-stranded DNA, regulating gene transcription. The structure of this region is relatively fixed and directly affects the transcriptional regulatory function of AhrC.

Analysis of the Active Residues of AhrC

Through alanine scanning analysis using Discovery Studio, we can essentially identify the arginine-binding site in the C-terminal of the protein, and the range of amino acids maintaining the pocket structure as 97-126 (with a free energy difference of greater than or equal to 1 after mutation). To maintain the pocket structure, we have decided to retain residues 68-146 from the crystal structure when transferring the pocket (Figure. 19).

Figure 19. Alanine scanning results of AhrC small molecule docking domain.

Modification of the Active Residues of AhrC

QsrC (PDB: 3SZT) is a protein that can dock with 3OC12. By analyzing its sequence and crystal structure, we can determine its active site and interaction with 3OC12. Since the difference between 3OC12 and 3OC12:2 lies in the double bond, we believe that the activity will not be significantly affected. Therefore, we transferred the entire active site region of QsrC to AhrC to obtain the final sequence.

Verification

We used Autodock-Vina to dock 3-oxo-C12:2 to the modified sequence and exported the complex structure using PyMOL (Figure. 20).

Figure 20. The docking results of AhrC_like protein with small molecules.

Afterward, we conducted molecular dynamics (MD) simulations using Discovery Studio. The simulation was performed with the CHARMM36 force field. Due to computational resource limitations, we could only simulate for 300 ps, but based on our energy analysis, we believe this is sufficient to observe some changes in the DNA docking site. While the simulation could not perfectly capture the two states of the protein before and after modification, we observed a trend where the protein becomes more compact after the small molecule binds.

Next, we used HDOCK to dock the DNA from the original crystal structure with the protein both before and after the molecular dynamics simulation. This allowed us to obtain theoretical values for the closeness and docking scores of the protein-DNA interaction, both before and after small molecule binding. We found that, compared to the pre-binding protein structure, the interaction between DNA and the protein showed a significant decrease due to the small molecule causing the protein structure to become more compact. The reduction in the number of bonds resulted in a meaningful drop in the docking score. We believe that with a longer MD simulation time, or in real experimental conditions, the DNA will be released, allowing transcription to be initiated (Figure. 21).

Figure 21. Structure and score of AhrC_like protein before and after molecular dynamics binding to DNA.

However, due to time and financial constraints, we are unable to validate this through further experiments at the moment. In the future, we plan to conduct ITC experiments or use assays like qPCR and Western Blot (WB) in E. coli to detect downstream products and verify our hypothesis.

Sequence of Protein

  > sp|P17893|ARGR_BACSU Arginine repressor OS=Bacillus subtilis (strain 168) OX=224308 GN=argR PE=1 SV=1
      MNKGQRHIKIREIITSNEIETQDELVDMLKQDGYKVTQATVSRDIKELHLVKVPTNNGSYKYSLPADQRFNPLSKLKRALMDAFVKIDSASHMIVLKTMPGNAQAIGALMDNLDWDEMMGTICGDDTILIICRTPEDTEGVKNRLLELL
          
  > tr|G3XD77|G3XD77_PSEA Quorum-sensing control repressor OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=qscR PE=4 SV=1
      MHDEREGYLEILSRITTEEEFFSLVLEICGNYGFEFFSFGARAPFPLTAPKYHFLSNYPGEWKSRYISEDYTSIDPIVRHGLLEYTPLIWNGEDFQENRFFWEEALHHGIRHGWSIPVRGKYGLISMLSLVRSSESIAATEILEKESFLLWITSMLQATFGDLLAPRIVPESNVRLTARETEMLKWTAVGKTYGEIGLILSIDQRTVKFHIVNAMRKLNSSNKAEATMKAYAIGLLN
          
  > AhrC-like protein
      MNKGQRHIKIREIITSNEIETQDELVDMLKQDGYKVTQATVSRDIKELHLVKVPTNNGSYKYSLPADQRFNPLSKLKRALMDAFVKIDSASHMIVLKTMPGNAQAIGALMDNLDWDEMMGTICGDDTILIICRTPEDTEGVKNRLLELLSFGARAPFPLTAPKYHFLSNYPGEWKSRYISEDYTSIDPIVRHGLLEYTPLIWNGEDFQENRFFWEEALHHGIRHGWSIPVRGKYGLISMLSLVRSSESIAATEILEKESFLLWITSMLQATFGDLLAPRIVPESNVRLTARETEMLKWTAVGKTYGEIGLILSIDQRTVKFHIVNAMRKLNSSNKAEATMKAYAIGLLN
          
  > DNA sequence that combine with protein
      CATGAATAAAAATTCAAG

5. Pharmacokinetics Model

As our goal is to design a probiotic supplement that could mediate key metabolites in the gut environment in the long run, determining the optimum drug dosage and dosage regimen to optimize therapeutic benefit and its half-life is essential. We implemented the design of classical dose-response modeling to retain a balanced effective E. coli Nissle 1917 (EcN) population above 90% of its maximum population with data from fluorescence kinetics experiments and literature reviews. We predict the optimum dosage regimen with $T$ days between each dosage and the corresponding maximum EcN population, $P_{m a x}$ . To this end, we propose the following model based on logistic growth and population decay over time.

Assumptions and Parameters

Throughout the modeling process, we've unified the units of time and drug concentration/EcN population as hours and $OD_{600}$ respectively. Hence, we will refer to the following parameters:


Name	Definition
$t / h$	Time in hours is the independent variable of the following functions.
$H / OD_{600}$	Maximum safe drug concentration.
$L / OD_{600}$	Minimum effective drug concentration.
$C / OD_{600}$	Every single EcN dosage given at the beginning of each course of treatment.
$n$	The number of repeated dosages given.
$C (t) / OD_{600}$	The concentration of the drug as a function of time.
$R_{i} / OD_{600}$	The concentration of the drug at the end of its dosage. $i \in N, [1, n]$ .
$k$	Elimination constant (the rate at which the drug is metabolized).
$T / h$	The time interval of a course of treatment, or the time between 2 EcN dosage administrations.
$T_{m a x} / h$	The maximum time interval of a course of treatment that satisfies our constraint of $P_{m a x} - P (t) \leq 10% P_{m a x}$ for all $t$ in a course of treatment.
$G (t) / OD_{600}$	The theoretical logistic growth model represents EcN population colonizing the human gut as a function of time.
$G_{0} / OD_{600}$	Initial EcN population of the logistic growth model.
$G_{m a x} / OD_{600}$	Human gut environment carrying capacity for EcN under logistic growth.
$r$	Intrinsic EcN population growth rate under the logistic growth model's theoretical conditions.
$L (t) / OD_{600}$	Loss model for EcN population that fails to colonize the human gut as a function of time.
$λ$	Intrinsic rate that EcN population fails to colonize the human gut concerning its current population in the gut.
$P (t) / OD_{600}$	The EcN population presence in the human gut as a function of time.
$P_{m a x} / OD_{600}$	The maximum EcN population presence in the human gut during each course of treatment.

For practical convenience when modeling, we assume the following basic assumptions for the all of the following models:

There is a maximum population size for EcN (carrying capacity $G_{m a x}$ ) that the human gut environment can support. This is the upper limit of population size due to limited resources such as nutrients, space, and other necessities. The intrinsic growth rate $r$ of the population is dependent on its current density. As the population size increases, the growth rate decreases due to the increasing competition for resources. Conversely, the population has the potential to grow exponentially in the absence of those limiting factors.
The environment is assumed to be homogeneous, meaning that all individuals in the population experience the same conditions, which is also consistent over time. Therefore, parameters that don't reflect the intrinsic properties of our engineered EcN, such as $G_{ma x}$ are assumed to be constant over time in our model whereas those reflecting intrinsic properties such as $r$ and $λ$ may change.
Although it is assumed in the logistic growth model that there is no net movement of EcN into or out of its population, the initial population $G_{0}$ changes from single dosage EcN population $C$ after the first course of treatment on repeated dosage administrations over multiple courses of treatment. Therefore, we consider the end population of a course of treatment in addition to the single dosage population as the initial population $G_{0}$ for the next course of treatment.
Apart from the limitations of resources such as nutrients on population growth, EcN population fails to colonize the human gut due to competitive exclusion or immunoreaction, thus excreted with feces. Therefore, we considered a loss function of EcN population, $P (t)$ , assuming that the more EcN population is present in the gut environment, the more EcN fails to colonize due to interspecific and intraspecific competition. Therefore, the rate of losing EcN population is proportional to the current EcN population, $G (t) - L (t)$ .
The logistic growth of the EcN population starts with a population much less than $G_{m a x}$ . The existing gut microbiota far outnumber and outcompete our engineered EcN at the beginning of the first supplement dosage. Thus, the rate of population loss for our engineered EcN will surpass that of population growth as they fail to colonize the gut environment. Hence, without further administration of EcN dosage, the population of EcN present in the gut environment is expected to drop to a value between $G_{0}$ and $G_{m a x}$ .

Drug Dosage Model

This model is a preliminary model assuming that administered EcN behaves similarly to the average pharmaceutical product, overlooking its intrinsic growth as living organisms due to strong selection pressure from factors like gut microbiota and immunoreaction. Therefore, we assume that the rate of elimination of our probiotic "drug" is linearly related to its current concentration in the gut. This leads to a proportionate decrease from the current drug level, described by the following differential equation. This shows that drug concentration drops exponentially over time, decaying at a rate of $e^{- k t}$ .

\frac{d C ( t )}{d t} C (t) = - k C (t) = C e^{- k t}

Consider repeated dosages after each time interval $T$ over multiple courses of treatment. Each dose "inherits" the drug level in the gut from previous administrations. The repeated dosages are consistent with the initial dosage $C$ . Therefore, the $n$ th dose corresponds to the drug concentration through iteration:

R_{1} C_{i} (t) R_{i} (t) R_{n} (t) R = C e^{- k T} = C + R_{i} = C_{i - 1} (t) e^{- k T} = C (e^{- k T} + e^{- 2 k T} + \dots + e^{- nk T}) = C \frac{e ^{- k T} ( 1 - e ^{- nk T} )}{1 - e ^{- k T}} = n \to \infty lim R_{n} (t) = \frac{C e ^{- k T}}{1 - e ^{- k T}} = \frac{C}{e ^{k T} - 1}

To ensure the drug concentration remains between minimum and maximum safe drug concentrations, we let $C = H - L$ . For long-term stability, we assume $R = L = \frac{C}{e ^{k T} - 1}$ so $H = C + R$ . Therefore, $T = \frac{1}{k} ln \frac{C + L}{L} = \frac{1}{k} ln \frac{H}{L}$ . We draw an illustration of our probiotic supplement functioning according to the repeated dosage model (Figure. 22) where the blue line shows a peak in concentration between doses, then decays until the next dose is given. From our illustration, we ascertained the optimum dosage interval $T$ at about 6.25 days.

Figure 22. The fluctuation of drug concentration (EcN population colonizing the gut) with respect to time (blue line) and 90% of the "maximum safe concentration" (red line).

Integrated Model

Gut microbiota growth essentially follows the logistic growth function, shown in the equation below. This growth is limited by a carrying capacity $G_{m a x}$ , which stands for the maximum sustainable population that can be present in the gut. The rate constant $r$ governs the growth rate of the bacteria replication.

\frac{d G ( t )}{d t} = r \cdot G (t) (1 - \frac{G ( t )}{G _{m a x}})

This logistic growth function results in the population increasing leading up to $G_{m a x}$ and then decreasing upon approaching this carrying capacity. $G (t)$ could be represented by the following equation:

G (t) = \frac{G _{m a x}}{1 + ( \frac{G _{m a x} - G _{0}}{G _{0}} ) e ^{- r t}}

Along with this growth is a natural decay in population that occurs over time, due to various factors such as immunoreaction, competitive exclusion, or even reduced fitness of engineered EcN due to metabolic stress. Such decay can be thought of as a loss function that changes proportionally to the current population $P (t)$ . Therefore, the EcN population that succeeds in colonizing the gastrointestinal tract are represented by:

P (t) = G (t) - L (t)

Where the loss function is represented as follows, $λ$ indicating the rate of decay of the microbiome population per unit time:

\frac{d L ( t )}{d t} = λ \cdot (G (t) - L (t))

To optimize the time interval of periodic dosing, we approach the problem by setting the derivative of EcN population colonizing the gut, $\frac{d P ( t )}{d t}$ , as 0. Then, we performed a nonlinear optimization with the constraints that $\frac{d P ( t )}{d t} = 0$ and $P_{m a x} - P (t) \leq 10% P_{m a x}$ . We found that the optimized time interval $T$ is about 200h/8 days (Figure. 22.).

Figure 23. Change in engineered EcN gut population and loss population with respect to time.

Conclusion

The model we presented here establishes a mathematical basis for optimizing the dosing interval $T$ to maximize the EcN population colonizing the human gut. From the comprehensive results of the two models, we ascertained the optimum dosing interval to be about a week. This suggests that for the long-term function of our probiotic platform, increasing the population of EcN that successfully colonizes the gut is essential.

References

Nielsen, A. A., Der, B. S., Shin, J., Vaidyanathan, P., Paralanov, V., Strychalski, E. A., ... & Voigt, C. A. (2016). Genetic circuit design automation. Science, 352(6281), aac7341.

Jones, T. S., Oliveira, S. M., Myers, C. J., Voigt, C. A., & Densmore, D. (2022). Genetic circuit design automation with Cello 2.0. Nature protocols, 17(4), 1097-1113.

Oliphant, K., & Allen-Vercoe, E. (2019). Macronutrient metabolism by the human gut microbiome: major fermentation by-products and their impact on host health. Microbiome. 2019; 7: 91. Iss, 1, 91.

Menu

1. Genetic Circuit Calculator

Assumptions and Parameters

Stimulation Model

Repression Model

Expression Model

Transcriptional Output Model

Multi-layered Relation Model

Application

Conclusion

2. Metabolic Stress Model

Assumptions and Parameters

Bacteria Logistic Growth

Simplified Model

Web-based Model

Conclusion

3. Butyrate Biosensor Analysis

4. AHL Biosensor Design

Attempt to generate an active site and perform homology modeling

Attempt to identify target proteins with mutational potential

Target Protein Analysis and Active Site Transfer

Structural Analysis of AhrC

Analysis of the Active Residues of AhrC

Modification of the Active Residues of AhrC

Verification

Sequence of Protein

5. Pharmacokinetics Model

Assumptions and Parameters

Drug Dosage Model

Integrated Model

Conclusion

References