SUSTechOCEAN - iGEM 2024

Section I

Aims

To guide the experiment through modeling by comparing the efficiency of Carbonic Anhydrase (CA) production by Vibrio natriegens when glucose or sucrose is used as the carbon source.

Problem Framing and Approach

Vibrio natriegens does not naturally produce CA. We achieved CA production by introducing a plasmid containing the CA gene. Since CA is an enzyme and not a reactant in the metabolic model, we cannot directly evaluate Vibrio natriegens' CA production efficiency by measuring CA output in the model. However, CA acts as a catalyst for the reaction between carbon dioxide and water that forms bicarbonate. Thus, from a modeling perspective, Flux Balance Analysis (FBA) lets us indirectly assess Vibrio natriegens' CA production efficiency by monitoring the rate of this reaction.

Assumptions

► The carbon source is sufficiently abundant, and its shortage does not reduce Vibrio natriegens' CA production efficiency.
► The enzyme concentration is linearly related to the reaction rate, meaning the higher the enzyme content, the faster the reaction.
► No other factors besides the enzyme affect the reaction rate.

Tools

► Python 3.8.18 with the cobra package (Constraint-Based Reconstruction and Analysis).
► Models: iLC858.sbml ^[1] and Recon3D.xml ^[2].

Experiment

Since Vibrio natriegens does not naturally produce CA, the existing models cannot directly simulate the experiment. We identified two models: iLC858.sbml, which provides and configures the basic growth environment of Vibrio natriegens, including all associated reactions, and simulates the required medium conditions as closely as possible; and Recon3D.xml, which contains the HCO3 equilibration reaction associated with the CA3 gene. By incorporating this reaction into the first model, we simulate the introduction of the plasmid, which creates a new reaction pathway.

Upon examining the reaction components, we found that all necessary substances exist in the first model, allowing us to construct the CA-catalyzed bicarbonate production pathway within it. The rate of this reaction reflects the rate of CA production driven by the CA3 gene, thus quantifying Vibrio natriegens' CA production efficiency. The reaction's lower and upper bounds were set to -1000 and 1000, indicating that it is reversible and reaches a steady rate at equilibrium.

Both glucose and sucrose were found among the substrates in the model. By setting the input of one of them to zero at the outset, we ensured that the carbon source input was isolated to a single type. The optimization criterion was set to maximize biomass, and we ran the model twice. The first run was a pre-experiment to simulate natural conditions in the medium. During the second run, we varied the carbon source input rate and monitored the growth rate and biomass at equilibrium, thus evaluating Vibrio natriegens' CA production efficiency and plotting the corresponding curves.
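
For readers unfamiliar with FBA, the optimization solved in each run can be written in its standard linear-programming form. The symbols below are not defined in the text and follow the usual FBA convention: \(S\) is the stoichiometric matrix, \(v\) the vector of reaction fluxes, \(c\) a vector that selects the biomass reaction, and \(v_{min}\), \(v_{max}\) the flux bounds. Under this reading, the added CA-catalyzed reaction enters as one extra column of \(S\), and isolating the carbon source amounts to setting the uptake bound of the unused sugar to zero.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v_{min} \le v \le v_{max}
\end{aligned}
\qquad\text{with the added reaction}\qquad
\mathrm{CO_2} + \mathrm{H_2O} \rightleftharpoons \mathrm{HCO_3^-} + \mathrm{H^+}
```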

From the experimental results, we observe that under varying input rates, when sucrose is used as the carbon source, both the growth rate and biomass are slightly higher compared to when glucose is used.

Section II

Aims

► Identify at a glance the areas of Daya Bay with higher calcium ion concentration by visualizing the calcium ion concentration distribution.
► Show the importance and necessity of introducing CA and CARPS genes to enhance the calcium ion concentration in seawater for mineralization experiments.

Experimental background analysis

In the implementation phase of the mineralization experiment, this study aims to promote the precipitation of calcium carbonate minerals through bioengineering technology. Specifically, the CA gene expression system will be used to enhance the solubility of carbon dioxide in water, while the CARPS gene expression system will be used to promote the combination of calcium ions and carbonate ions, thereby promoting the crystallization process of calcium carbonate minerals. Given that areas with higher calcium ion concentrations are more conducive to the nucleation and growth of minerals, and considering that genetic engineering methods cannot directly regulate the calcium ion concentration in seawater, this study will focus on exploring areas with higher calcium ion concentrations in seawater to optimize the placement of the experimental device, thereby improving the actual application efficiency and the output rate of mineral precipitation.

Data Source

► The experimental team set up monitoring stations at five locations in the Daya Bay area (Baguang, Dongshan, Yangmeikeng, Nuclear Power Plant, and Tung Chung) and conducted offshore operations to collect calcium ion concentration data at different times.
► The Chinese Ecosystem Research Network (CERN) provided historical data on dissolved inorganic salt concentrations in seawater in the Daya Bay area in 2010 and 2011.

Modeling Approach

► Data processing: Slice by month and fill missing values.
► A continuous grid of values is created through two-dimensional interpolation, which represents the estimated value of inorganic carbon concentration within a given longitude and latitude range. This grid is then converted into heat map data, where the value of each grid point represents the inorganic carbon concentration at that location.
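
The gridding step can be sketched as follows. The text does not name the interpolation method, so this sketch uses inverse-distance weighting as a stand-in; the station coordinates, the values, and the function names `idw` and `make_grid` are all hypothetical and serve only to illustrate how scattered station readings become heat map data.

```python
import math


def idw(stations, lon, lat, power=2.0):
    """Inverse-distance-weighted estimate at (lon, lat).

    stations: list of (lon, lat, value) tuples.
    """
    num = 0.0
    den = 0.0
    for s_lon, s_lat, value in stations:
        d = math.hypot(lon - s_lon, lat - s_lat)
        if d == 0.0:
            return value  # query point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def make_grid(stations, lon_range, lat_range, n_lon, n_lat):
    """Return an n_lat x n_lon nested list of interpolated values."""
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range
    grid = []
    for i in range(n_lat):
        lat = lat_min + (lat_max - lat_min) * i / (n_lat - 1)
        row = []
        for j in range(n_lon):
            lon = lon_min + (lon_max - lon_min) * j / (n_lon - 1)
            row.append(idw(stations, lon, lat))
        grid.append(row)
    return grid


# Hypothetical station positions and readings (illustrative only).
stations = [
    (114.50, 22.55, 390.0),
    (114.60, 22.60, 410.0),
    (114.70, 22.50, 400.0),
]
grid = make_grid(stations, (114.5, 114.7), (22.5, 22.6), n_lon=5, n_lat=3)
```

Each entry of `grid` can then be passed to any plotting routine as one heat map cell. A weighted average of this kind never leaves the range of the measured values, which is one reason it is a cautious choice when only a few stations are available.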

Specific Analyses and Results

Visualization of Calcium Ion Concentration Distribution

Using the data provided by the experimental group, the modeling group created a dynamic visualization of how the calcium ion concentration distribution changes over time.

At the same time, for the data provided by CERN, we selected the dissolved inorganic salt concentration data in February and August 2010 as representatives, and then drew a distribution map of calcium ion concentration in the area.

The two visualization results helped the experimental team to identify areas with higher calcium ion concentrations in the Daya Bay area to a certain extent.

The Necessity of Genetic Experiments

During the background investigation, the team obtained the geographical distribution of dissolved organic carbon (DOC) in Daya Bay in different seasons from HUANG Dao-jian (2012) ^[3].

When reviewing the literature ^[3] on surface dissolved organic carbon (DOC) concentrations in the Daya Bay area, the team found that the DOC concentration in the area fluctuated between 0.953 and 2.109 \(mg \cdot L^{-1}\), which is a relatively low value. In view of this, introducing CA and CARPS genes to enhance the calcium ion concentration in seawater has important scientific significance and application value for promoting the formation of calcium carbonate minerals. Increasing the calcium ion concentration through genetic engineering can provide more favorable conditions for the precipitation of calcium carbonate minerals, which enhances the mineralization potential of the area.

Section III

Aims

Under a specific light exposure time and intensity, find the ocean depth threshold that ensures the suicide of Vibrio natriegens and minimizes the spread of toxic proteins, so as to meet biosafety requirements.

Experimental Background Analysis

When designing the suicide experiment for Vibrio natriegens, the experimental team's goal was to minimize the spread of the toxic protein KillerRed to maintain biosafety. According to Maria E. Bulina's research in 2005, KillerRed has very low, almost zero, phototoxicity under blue light (460 nm to 490 nm), indicating that blue light does not activate the toxicity of KillerRed.^[4] Therefore, the experimental team plans to use red or green light to activate the phototoxicity of KillerRed, while blue light is used during the growth period to promote KillerRed expression without triggering toxicity. This strategy is designed to effectively control Vibrio natriegens while ensuring the biosafety of the experiment and the environment.

Modeling Approach

► Surrogate Modeling: Use DsRed protein instead of KillerRed protein for research (the feasibility will be shown later).
► Visible light attenuation model: Obtain the light attenuation coefficients for different wavelengths of light.
► Beer-Lambert Law: When the wavelength of light and the light intensity at sea level are determined, the actual light intensity is calculated from the depth of the ocean.

Tools

► WebPlotDigitizer software: Extract and analyze image data.

Specific Modeling Process and Results

The Feasibility of Surrogate Modeling

In the absence of direct literature support, this study used an indirect method to study the growth mechanism of toxic proteins under blue light. Since the molecular weight of DsRed protein is similar to that of KillerRed protein, and the expression level of DsRed protein is stable and has low toxicity to host cells, DsRed protein was used in the study to simulate the expression of KillerRed protein in host cells, so as to infer the behavior of KillerRed protein.

At the same time, since the culture system is not affected by green light or red light, DsRed protein can accurately simulate the enrichment and high-concentration maintenance process of KillerRed protein in host cells, which is crucial for studying the growth mechanism of KillerRed protein under blue light conditions.

Confirm Lighting Conditions

In the natural environment, according to the central limit theorem, it can be assumed that the content of toxic proteins in Vibrio natriegens cells follows a normal distribution. Assuming that the expected killing rate is constant, the modeling team needs to verify whether the content of toxic proteins reaches the critical value corresponding to that killing rate under a specific blue light irradiation intensity or irradiation time, in order to prove the feasibility of experimental killing.

In Ohlendorf (2012), it was observed that the accumulation of DsRed protein under blue light irradiation increased exponentially with the increase of light intensity and time until it reached a plateau.^[5] This pattern can be visualized by the fitting curve in the figure, thus predicting the steady-state expression level of DsRed protein under specific light conditions.

By using WebPlotDigitizer software, the study found that when the blue light intensity was 100 \(nW \cdot cm^{-2}\), the time required for DsRed protein expression to reach steady state was at least 4.633 hours, and the mean protein content was about 0.904. When the blue light intensity was at least 35.228 \(nW \cdot cm^{-2}\), the final expression of DsRed protein tended to be stable, with a mean of about 0.705. Based on these data, it can be reasonably inferred that if the blue light intensity exceeds 35 \(nW \cdot cm^{-2}\) (equivalent to 376.736 lx) and the illumination time exceeds 5 hours, the content of DsRed protein is sufficient to achieve effective killing efficiency.

Ocean Depth Upper Threshold (Blue Light Band Analysis)

In this study, in order to accurately measure the blue light intensity under experimental conditions and correlate it with the seawater depth, the team introduced seawater depth as a variable and drew on the model of seawater attenuation of visible light proposed in the study of Li Li (2014).^[6] Considering the similarity between the laboratory setting and the seawater environment studied by Li Li, the research team adopted the total attenuation coefficient of \(0.1105m^{-1}\) for light with a wavelength of 475nm at a chlorophyll concentration of \(0.01mg/m^3\) provided in the literature. This band covers the range of blue light (460nm~490nm) for subsequent calculation of blue light intensity.

Sea area (chlorophyll concentration)	Pure sea water	\( 0.01 \, \text{mg/m}^3 \)	\( 0.1 \, \text{mg/m}^3 \)
Attenuation coefficient (\( m^{-1} \))	0.0198	0.1105	0.3052

Based on the total attenuation model of visible light by seawater, the actual blue light intensity can be calculated by the Beer-Lambert law, given the sea surface light intensity and depth. The formula is as follows:
\(I(z) = I_0 \cdot e^{-k(\lambda) \cdot z}\)
In this model, \(z\) represents the depth of the ocean in meters (m), \(I(z)\) represents the light intensity at depth \(z\) in lux (lx), \(I_0\) is the blue light intensity at sea level, and \(k(\lambda)\) is the total attenuation coefficient of blue light at a specific wavelength \(\lambda\), which is taken as \(0.1105 m^{-1}\) in this study.

The experimental group conducted field measurements of the sea level light intensity at a specific geographical location (longitude 113.07325, latitude 21.89165). The measured light intensity values ranged from 25,000 lux (lx) to 185,400 lux (lx). This study selected the sea level light intensity at this geographical location as a reference benchmark to verify the rationality of the experimental design.

Specifically, when the sea level light intensity is the lowest at 25,000 lx, the calculation shows that the experimental conditions can be met when the sea depth does not exceed 38 meters. When the sea level light intensity is the measured mean of 101,620 lx, the experimental conditions can also be met when the sea depth does not exceed 51 meters. These calculations provide theoretical support and parameter range for subsequent experimental design.
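
The depth limits quoted above follow from solving the Beer-Lambert law for \(z\): setting \(I(z)\) equal to the required intensity gives \(z = \ln(I_0 / I(z)) / k\). The sketch below reproduces that arithmetic with the values stated in the text (attenuation coefficient 0.1105 per meter, blue light threshold 376.736 lx); the function name `depth_limit` is hypothetical.

```python
import math


def depth_limit(i_surface_lx, i_required_lx, k_per_m):
    """Depth at which I(z) = I0 * exp(-k * z) falls to the required intensity."""
    return math.log(i_surface_lx / i_required_lx) / k_per_m


K_BLUE = 0.1105       # m^-1, total attenuation coefficient quoted in the text
I_REQUIRED = 376.736  # lx, blue light threshold quoted in the text

for i0 in (25_000, 101_620):  # lowest and mean measured surface intensities
    z = depth_limit(i0, I_REQUIRED, K_BLUE)
    print(f"I0 = {i0} lx -> depth limit = {z:.1f} m")
```

With these inputs the limits come out at roughly 38 m and just under 51 m, in line with the figures above.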

Ocean Depth Lower Threshold (Red Light Band Analysis)

In order to verify the potential of DsRed protein as a substitute for KillerRed protein, the study must ensure that both proteins do not show toxicity under low-intensity green light or red light. Studies have shown that the lower limit of the light intensity threshold for DsRed protein to produce toxicity under green and red light is 10 \(nW \cdot cm^{-2}\), namely 107.63 lux (lx). Therefore, the subsequent experimental design should be carried out at a light intensity below this threshold to ensure the accuracy and reliability of the experiment.

We take the red light band as an example (wavelength between 620 nm and 750 nm) and select red light with a wavelength of 685 nm as a typical representative for analysis. The study by Li Li (2014) on the visible light attenuation model gives the attenuation coefficients of light of different wavelengths under different chlorophyll concentrations (as shown in the figure below). Using WebPlotDigitizer software, we found that when the chlorophyll concentration is \(0.01 mg/m^{3}\), the light attenuation coefficient of red light with a wavelength of 685 nm is \(0.5325 m^{-1}\).

According to the Beer-Lambert law, when the sea level light intensity is at the maximum measured value of 185,400 lx, red light cannot stimulate the toxicity of DsRed protein once the sea depth exceeds 18.5 meters. Furthermore, when the sea level light intensity takes the measured mean of 101,620 lx, the calculation shows that red light essentially cannot stimulate the toxicity of DsRed protein once the sea depth exceeds 17 meters.

Conclusion

When designing a suicide experiment for Vibrio natriegens under natural light conditions, the depth of seawater is a key factor, since it affects both the effectiveness of the suicide mechanism and the spread of toxic proteins. According to the experimental design, the suicide effect is best when the seawater depth is between 18.5 meters and 28 meters, which can effectively prevent the spread of toxic proteins and achieve the goal of biosafety. When the seawater depth is between 17 meters and 56 meters, the suicide effect basically meets the requirements, largely prevents the spread of toxic proteins, and basically meets the goal of biosafety.

Section IV

Aims

We obtained a set of data from the experimental group that describes the optical density (OD) of Vibrio natriegens over time, representing the growth curve of the bacteria. Since the experimental data cannot be measured continuously, and measurements are sometimes taken at uneven intervals, we need to fit a curve to obtain continuous biomass information. This will enable us to predict and evaluate the growth of Vibrio natriegens.

Problem Introduction

Why Linear Regression Won't Work

After initially examining the scatter plot of Vibrio natriegens optical density over time, we found that the relationship is not linear. A linear regression model would not fully capture the characteristics of the growth curve. Therefore, we must use non-linear methods for data fitting.

Limitations of Conventional Non-Linear Fitting

Conventional non-linear fitting methods have the following limitations:

► They require assumptions about data distribution.
► They may overfit by capturing excessive variations in the data.

Additionally, the growth curve has time-related characteristics, meaning that changes in the data are related to time. Among neural networks, a Recurrent Neural Network (RNN) can handle both time series and non-linear fitting, making it a good choice for this problem.

Mathematical Model of Growth

Huang's Model

In our literature review, we found that microbial growth can be divided into phases and can be described as a continuous and differentiable curve. Huang ^[7] ^[8] developed a basic model to describe bacterial growth, which divides the process into three phases: lag, exponential, and stationary. The complete mathematical formula for the growth curve is as follows:
\(Y = Y_0 + Y_{max} - \ln\{e^{Y_0} + [e^{Y_{max}} - e^{Y_0}]e^{-\mu_{max}B(t)}\}\)
where \(B(t)\) is expressed as:
\(B(t) = t + \frac{1}{\alpha}\ln\frac{1+e^{-\alpha(t-\lambda)}}{1+e^{\alpha\lambda}}\)
In these formulas:

► \(Y_{max}\), \(\mu_{max}\), \(\alpha\), and \(\lambda\) are the parameters to be fitted.
► \(t\) is time.

Model Application:
In the original formula, \(Y\) represents the logarithm of cell counts, but in our experiment, we record optical density, which also reflects biomass. Therefore, this formula can be applied to our data.

Why Use RNN Instead of Traditional Methods

RNN vs. Traditional Numerical Solutions

In conventional non-linear fitting, Huang^[9] used the differential equation form of the model for optimization. The differential equation is derived by differentiating \(Y\) with respect to time \(t\):
\(\frac{dY}{dt} = \frac{\mu_{max}}{1+e^{-\alpha(t-\lambda)}}(1-e^{Y-Y_{max}})\)
Huang used the ode45 method in MATLAB to solve this differential equation. However, this method is less robust when the data set is large, and it may experience convergence issues in regions where the data changes significantly.

By observation, we found that an RNN can actually correspond to a numerical solution of ODEs. Specifically:

ODE's Numerical Solution
The iterative formula introduced by Euler is as follows:
\(x(t+h)=x(t)+hf(x(t),t)\)
where \(h\) is the step size.

RNN and ODE's Numerical Solution
In an RNN, the recursive formula is \(y_t=f(y_{t-1},x_t,t)\), which is similar to the ODE's iterative formula. In fact, we can use \(h\) as the time unit, treating \(t=nh\), and the iterative formula becomes:
\(x((n+1)h)=x(nh)+hf(x(nh),nh)\)
Thus, the numerical solution of ODEs can be seen as a special case of an RNN.
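
To make the correspondence concrete, the sketch below unrolls the Euler recursion for the differential form of Huang's model and compares it with the closed-form curve. It uses the fitted parameter values reported in the results table; the initial value `Y0`, the step size, and the function names are assumptions chosen only for illustration, and plain Python stands in for the torch implementation.

```python
import math

# Fitted values from the results table; Y0 is an assumed initial value.
MU_MAX, ALPHA, LAM, Y_MAX = 0.1316, 1.6364, 1.6417, 1.2153
Y0 = 0.05  # hypothetical starting value


def dydt(y, t):
    """Right-hand side of the differential form of Huang's model."""
    gate = 1.0 / (1.0 + math.exp(-ALPHA * (t - LAM)))
    return MU_MAX * gate * (1.0 - math.exp(y - Y_MAX))


def closed_form(t):
    """Closed-form Y(t) of Huang's model."""
    b = t + (1.0 / ALPHA) * math.log(
        (1.0 + math.exp(-ALPHA * (t - LAM))) / (1.0 + math.exp(ALPHA * LAM))
    )
    return Y0 + Y_MAX - math.log(
        math.exp(Y0) + (math.exp(Y_MAX) - math.exp(Y0)) * math.exp(-MU_MAX * b)
    )


def euler_rollout(y_start, h, n_steps):
    """Recurrent update y_{n+1} = y_n + h * f(y_n, n*h), one step per grid point."""
    ys = [y_start]
    for n in range(n_steps):
        ys.append(ys[-1] + h * dydt(ys[-1], n * h))
    return ys


h = 0.01
ys = euler_rollout(Y0, h, n_steps=2400)  # covers 24 time units
max_err = max(abs(y - closed_form(n * h)) for n, y in enumerate(ys))
print(f"max |Euler - closed form| = {max_err:.2e}")
```

Each pass through the loop plays the role of one recurrent step: the previous state is the only memory, and the model parameters are shared across all steps, which is what allows them to be trained by backpropagation through the unrolled sequence.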

Methodology

Model Training Process

Parameter Initialization:
The parameters to be fitted include \(\mu_{max}\), \(\alpha\), \(\lambda\), and \(Y_{max}\). According to Huang^[9], we observed that these parameters are of the order of magnitude of \(10^1\). Therefore, we initialized these parameters to 1 and set them as trainable parameters for gradient updates during training.
Forward Propagation:
In the forward function, we implement the recursive calculation, where each step's value is generated by adding the previous step's value to the product of the step size and the derivative value.
Non-Uniform Time Measurements:
Since the experimental data has non-uniform time intervals, some values need to be masked during the loss calculation. We only calculate the loss for the measured time points.
Optimization:
We used the Adam optimizer, with an initial learning rate of 0.01. The learning rate is automatically adjusted during training to find the optimal solution. After multiple rounds of testing, we found that training for 100 epochs yielded the best results.
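
The masking step for non-uniform measurements can be sketched as follows. This is a plain-Python illustration rather than the torch code used for training; the measurement times, the values, and the names `align_to_grid` and `masked_mse` are hypothetical, and the `predicted` list is a stand-in for the output of the recurrent forward pass.

```python
def align_to_grid(times, values, h, n_steps):
    """Place measurements taken at irregular times onto a uniform grid of step h."""
    observed = [0.0] * (n_steps + 1)
    mask = [False] * (n_steps + 1)
    for t, v in zip(times, values):
        idx = round(t / h)  # nearest grid point
        observed[idx] = v
        mask[idx] = True
    return observed, mask


def masked_mse(predicted, observed, mask):
    """Mean squared error over the grid points that were actually measured."""
    sq = [(p - o) ** 2 for p, o, m in zip(predicted, observed, mask) if m]
    if not sq:
        raise ValueError("mask selects no points")
    return sum(sq) / len(sq)


# Hypothetical measurements at uneven times, on a grid with step 0.5.
times = [0.0, 0.5, 2.0, 4.5]
values = [0.05, 0.06, 0.20, 0.70]
observed, mask = align_to_grid(times, values, h=0.5, n_steps=10)

# Stand-in for the model output on the same grid (11 points).
predicted = [0.05 + 0.07 * n for n in range(11)]
loss = masked_mse(predicted, observed, mask)
```

Only the four measured grid points contribute to `loss`; the unmeasured points still take part in the forward recursion, but they add nothing to the gradient.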

Tools

► Python 3.8.18.
► torch 1.12.1.

Result

Fitting Curve

The fitted growth curve is shown below:

Fitting Parameters

The fitted parameters are shown in the table below:

Parameters	\( \mu_{max} \)	\( \alpha \)	\( \lambda \)	\( Y_{max} \)
Values	0.1316	1.6364	1.6417	1.2153

Conclusion

By using RNN for non-linear fitting, we were able to capture the dynamic changes of the Vibrio natriegens growth curve more effectively, especially when faced with non-uniform time measurements. After training for 100 epochs, we obtained the optimal fitting result.

Section V

Aims

Provide guidance for experiments through modeling: predict the structure of the mCherry reporter protein connected to CaBP-ChBD and determine its validity.

Problem Framing, Approach and Tools

During the expression validation of the CaBP-ChBD engineered protein, the experimental group observed blue fluorescence under the fluorescence microscope. The members of the experimental group initially hypothesized that this phenomenon could be due to two potential reasons: denaturation of the mCherry reporter protein or a malfunction of the fluorescence microscope. Considering that repairing the fluorescence microscope would take a long time, the experimental group requested our assistance in using AlphaFold to predict the protein structure, helping them to intuitively analyze whether the protein had denatured.

Modeling Result

Among the results obtained from multiple runs of the model, the protein structure model with the highest confidence indicated that the chromophore of mCherry connected to CaBP-ChBD had not denatured. We also analyzed several other structures and found no obvious denaturation, ruling out issues with the target gene itself. After this feedback was provided to the experimental group, they decided to borrow another fluorescence microscope to conduct their experiments.

Most reliable structure

Other structures of model's prediction

Experiments

Result of First Round:

Result of Second Round:

References

[1] Coppens, L., Tschirhart, T., Leary, D. H., Colston, S. M., Compton, J. R., Hervey, W. J., 4th, Dana, K. L., Vora, G. J., Bordel, S., & Ledesma-Amaro, R. (2023). Vibrio natriegens genome-scale modeling reveals insights into halophilic adaptations and resource allocation. Molecular Systems Biology, 19(4), e10523. https://doi.org/10.15252/msb.202110523

[2] Brunk, E., Sahoo, S., Zielinski, D. et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol 36, 272–281 (2018). https://doi.org/10.1038/nbt.4072

[3] HUANG Dao-jian, GUO Zhen-ren, QI Shi-bin, et al. Spatial-temporal distribution of dissolved organic carbon in the Daya Bay[J]. Ecological Science, 2012, 31(05): 548-552.

[4] Bulina ME, Chudakov DM, Britanova OV, Yanushevich YG, Staroverov DB, Chepurnykh TV, Merzlyak EM, Shkrob MA, Lukyanov S, Lukyanov KA. A genetically encoded photosensitizer. Nat Biotechnol. 2006 Jan;24(1):95-9. doi: 10.1038/nbt1175. Epub 2005 Dec 20. PMID: 16369538.

[5] Ohlendorf R, Vidavski RR, Eldar A, Moffat K, Möglich A. From dusk till dawn: one-plasmid systems for light-regulated gene expression. J Mol Biol. 2012 Mar 2;416(4):534-42. doi: 10.1016/j.jmb.2012.01.001. Epub 2012 Jan 8. Erratum in: J Mol Biol. 2014 Jan 24;426(2):500. PMID: 22245580.

[6] Li Li. Underwater Portable Video Communication System Based On Blue Light LED [D]. Nanjing University of Posts and Telecommunications, 2014.

[7] Huang, L. (2008). Growth kinetics of Listeria monocytogenes in broth and beef frankfurters—determination of lag phase duration and exponential growth rate under isothermal conditions. Journal of Food Science, 73, E235-E242.

[8] Huang, L. (2010). Growth kinetics of Escherichia coli O157:H7 in mechanically-tenderized beef. International Journal of Food Microbiology, 140, 40-48.

[9] Huang, L. (2013). Optimization of a new mathematical model for bacterial growth. Food Control, 32, 283-288.