To provide guidance for the experiment through modeling: comparing the efficiency of Carbonic Anhydrase (CA) production by Vibrio natriegens when using glucose and sucrose as the carbon sources.
Section I
Aims
Problem Framing and Approach
Vibrio natriegens does not naturally produce CA. We achieved CA production by introducing a plasmid containing the CA gene. Since CA is an enzyme rather than a reactant, its output does not appear as a flux in the model, so we cannot directly evaluate Vibrio natriegens' CA production efficiency by measuring CA output. However, CA acts as a catalyst that facilitates the reaction between carbon dioxide and water to form bicarbonate. Thus, from a modeling perspective, by using Flux Balance Analysis (FBA), we can indirectly assess Vibrio natriegens' CA production efficiency by monitoring the rate of this reaction.
Assumptions
► The carbon source is sufficiently abundant, and its shortage does not reduce Vibrio natriegens' CA production efficiency.
► The enzyme concentration is linearly related to the reaction rate, meaning the higher the enzyme content, the faster the reaction.
► No other factors besides the enzyme affect the reaction rate.
Tools
► Python 3.8.18 package cobra (Constraint-Based Reconstruction and Analysis).
► Models: iLC858.sbml [1] and Recon3D.xml [2].
Experiment
Since Vibrio natriegens does not naturally produce CA, the existing models cannot directly simulate the experiment. We identified two models: iLC858.sbml, which provides and configures the basic growth environment of Vibrio natriegens, including all associated reactions, and simulates the required medium conditions as closely as possible; and Recon3D.xml, which contains the HCO3 equilibration reaction associated with the CA3 gene. By incorporating this reaction into the first model, we simulate the plasmid's introduction to create a new reaction pathway.
Upon examining the reaction components, we found that all necessary substances exist in the first model, allowing us to construct the CA-catalyzed bicarbonate production pathway within it. The rate of this reaction reflects the rate of CA production driven by the CA3 gene, thus quantifying Vibrio natriegens' CA production efficiency. The reaction's lower and upper bounds were set to -1000 and 1000, respectively, indicating that it is reversible and reaches a steady rate at equilibrium.
Both glucose and sucrose were found among the substrates in the model. By setting the input of one of them to zero in the initial conditions, we ensured that the carbon source input was isolated to a single type. The optimization criterion was set to maximize biomass, and we ran the model twice. The first run was a pre-experiment to simulate natural conditions in the medium. During the second run, we altered the carbon source input rates and monitored growth rates and biomass at equilibrium, thus evaluating Vibrio natriegens' CA production efficiency and plotting the corresponding curves.
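The procedure described above can be sketched with cobra as follows. This is a minimal sketch rather than the team's actual script: the metabolite IDs, the exchange reaction IDs, and the reaction ID `CA_HCO3E` are hypothetical placeholders that would have to be replaced with the identifiers actually used in iLC858.sbml, the bounds of -1000 and 1000 are our reading of the reversible reaction, and the model's default objective is assumed to be the biomass reaction.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical identifiers: the real IDs in iLC858.sbml may differ.
CO2_ID, H2O_ID, HCO3_ID, H_ID = "co2_c", "h2o_c", "hco3_c", "h_c"
CARBON_EXCHANGES = {"glucose": "EX_glc__D_e", "sucrose": "EX_sucr_e"}


def ca_stoichiometry() -> Dict[str, float]:
    """CO2 + H2O <=> HCO3- + H+ as a metabolite-id -> coefficient map."""
    return {CO2_ID: -1.0, H2O_ID: -1.0, HCO3_ID: 1.0, H_ID: 1.0}


def scan_carbon_source(
    model_path: str, source: str, uptake_rates: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """Return (uptake rate, growth rate, CA reaction flux) for each uptake rate."""
    import cobra  # imported lazily so the helper above works without cobra
    from cobra import Reaction

    model = cobra.io.read_sbml_model(model_path)

    # Add the CA-catalyzed reaction to mimic the introduced plasmid.
    rxn = Reaction("CA_HCO3E")  # hypothetical reaction id
    model.add_reactions([rxn])
    rxn.add_metabolites(
        {model.metabolites.get_by_id(m): c for m, c in ca_stoichiometry().items()}
    )
    rxn.bounds = (-1000.0, 1000.0)  # reversible (assumed bounds)

    results = []
    for rate in uptake_rates:
        with model:  # bound changes are reverted when the block exits
            for name, ex_id in CARBON_EXCHANGES.items():
                exchange = model.reactions.get_by_id(ex_id)
                # Uptake is a negative exchange flux; the other source is closed.
                exchange.lower_bound = -rate if name == source else 0.0
            solution = model.optimize()  # assumes the objective is biomass
            results.append((rate, solution.objective_value, solution.fluxes[rxn.id]))
    return results
```

Calling `scan_carbon_source("iLC858.sbml", "glucose", [2, 5, 10])` and then the same with `"sucrose"` would give two series that can be plotted against each other, provided the placeholder IDs have been adapted to the model.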
From the experimental results, we observe that under varying input rates, when sucrose is used as the carbon source, both the growth rate and biomass are slightly higher compared to when glucose is used.
Section II
Aims
► Intuitively identify the areas with higher calcium ion concentration in the Daya Bay area through visualization analysis of the calcium ion concentration distribution.
► Show the importance and necessity of introducing CA and CARPS genes to enhance the calcium ion concentration in seawater for mineralization experiments.
Experimental background analysis
In the implementation phase of the mineralization experiment, this study aims to promote the precipitation of calcium carbonate minerals through bioengineering technology. Specifically, the CA gene expression system will be used to enhance the solubility of carbon dioxide in water, while the CARPS gene expression system will be used to promote the combination of calcium ions and carbonate ions, thereby promoting the crystallization process of calcium carbonate minerals. Given that areas with higher calcium ion concentrations are more conducive to the nucleation and growth of minerals, and considering that genetic engineering methods cannot directly regulate the calcium ion concentration in seawater, this study will focus on exploring areas with higher calcium ion concentrations in seawater to optimize the placement of the experimental device, thereby improving the actual application efficiency and the output rate of mineral precipitation.
Data Source
► The experimental team set up monitoring stations at five locations in the Daya Bay area (Baguang, Dongshan, Yangmeikeng, Nuclear Power Plant, and Tung Chung) and conducted offshore operations to collect calcium ion concentration data at different times.
► The Chinese Ecosystem Research Network (CERN) provided historical data on dissolved inorganic salt concentrations in seawater in the Daya Bay area in 2010 and 2011.
Modeling Approach
► Data processing: Slice by month and fill missing values.
► A continuous grid of values is created through two-dimensional interpolation, which represents the estimated value of inorganic carbon concentration within a given longitude and latitude range. This grid is then converted into heat map data, where the value of each grid point represents the inorganic carbon concentration at that location.
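To illustrate how scattered station measurements become a regular grid for a heat map, here is a minimal sketch using inverse distance weighting (IDW). IDW is swapped in purely for illustration; the text above does not say which two-dimensional interpolation method was used. The station coordinates and concentration values are hypothetical placeholders, not measured data.

```python
from typing import List, Sequence, Tuple

Station = Tuple[float, float, float]  # (longitude, latitude, concentration)


def idw(lon: float, lat: float, stations: Sequence[Station], power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (lon, lat)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for s_lon, s_lat, value in stations:
        dist_sq = (lon - s_lon) ** 2 + (lat - s_lat) ** 2
        if dist_sq == 0.0:
            return value  # exactly on a station: return its measurement
        weight = 1.0 / dist_sq ** (power / 2.0)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def make_grid(
    stations: Sequence[Station],
    lon_range: Tuple[float, float],
    lat_range: Tuple[float, float],
    n_lon: int,
    n_lat: int,
) -> List[List[float]]:
    """Rows follow latitude, columns follow longitude; each cell is an estimate."""
    lon0, lon1 = lon_range
    lat0, lat1 = lat_range
    grid = []
    for i in range(n_lat):
        lat = lat0 + (lat1 - lat0) * i / (n_lat - 1)
        row = [
            idw(lon0 + (lon1 - lon0) * j / (n_lon - 1), lat, stations)
            for j in range(n_lon)
        ]
        grid.append(row)
    return grid


# Hypothetical stations (placeholder coordinates and values).
stations = [
    (114.50, 22.60, 395.0),
    (114.60, 22.70, 402.0),
    (114.70, 22.55, 388.0),
    (114.75, 22.75, 410.0),
    (114.55, 22.50, 391.0),
]
heatmap = make_grid(stations, (114.45, 114.80), (22.45, 22.80), n_lon=30, n_lat=30)
```

The resulting `heatmap` is a nested list that a plotting library can render directly as an image, with one such grid per monthly slice.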
Specific Analyses and Results
Visualization of Calcium Ion Concentration Distribution
Using the data provided by the experimental group, the modeling group created a dynamic visualization showing how the calcium ion concentration distribution changes over time.
At the same time, for the data provided by CERN, we selected the dissolved inorganic salt concentration data in February and August 2010 as representatives, and then drew a distribution map of calcium ion concentration in the area.
The two visualization results helped the experimental team to identify areas with higher calcium ion concentrations in the Daya Bay area to a certain extent.
The Necessity of Genetic Experiments
During the background investigation, the team obtained the geographical distribution of dissolved organic carbon (DOC) in Daya Bay in different seasons from HUANG Dao-jian (2012) [3] .
When reviewing the literature [3] on surface dissolved organic carbon (DOC) concentrations in the Daya Bay area, the team found that the DOC concentration in the area fluctuated between 0.953 and 2.109 \(mg \cdot L^{-1}\), which is a relatively low value. In view of this, introducing CA and CARPS genes to enhance the calcium ion concentration in seawater has important scientific significance and application value for promoting the formation of calcium carbonate minerals. Increasing the calcium ion concentration through genetic engineering can provide more favorable conditions for the precipitation of calcium carbonate minerals, which enhances the mineralization potential of the area.
Section III
Aims
Under a specific light exposure time and intensity, find the ocean depth threshold that ensures the suicide of Vibrio natriegens and minimizes the spread of toxic proteins, in order to meet biosafety requirements.
Experimental Background Analysis
When designing the suicide experiment for Vibrio natriegens, the experimental team's goal was to minimize the spread of the toxic protein KillerRed to maintain biosafety. According to Maria E. Bulina's research in 2005, KillerRed has very low phototoxicity under blue light (460nm~490nm), almost zero, indicating that blue light does not activate the toxicity of KillerRed.[4] Therefore, the experimental team plans to use red or green light to activate the phototoxicity of KillerRed, while blue light is used during the growth period of KillerRed to promote its expression without triggering toxicity. This strategy is designed to effectively control Vibrio natriegens while ensuring the biosafety of the experiment and the environment. In the natural environment, according to the central limit theorem, it can be assumed that the content of toxic proteins in Vibrio natriegens cells follows a normal distribution. Assuming that the expected killing rate is constant, the modeling team needs to verify whether the content of toxic proteins reaches the critical value corresponding to the killing rate under a specific blue light irradiation intensity or irradiation time to prove the feasibility of experimental killing.
Modeling Approach
► Visible light attenuation model: Obtain the light attenuation coefficients for different wavelengths of light.
► Beer-Lambert Law: When the wavelength of light and the light intensity at sea level are determined, the actual light intensity is calculated from the depth of the ocean.
Tools
Specific Modeling Process and Results
The Feasibility of Surrogate Modeling
In the absence of direct literature support, this study used an indirect method to study the growth mechanism of toxic proteins under blue light. Since the molecular weight of DsRed protein is similar to that of KillerRed protein, and the expression level of DsRed protein is stable and has low toxicity to host cells, DsRed protein was used in the study to simulate the expression of KillerRed protein in host cells, so as to infer the behavior of KillerRed protein.
At the same time, since the culture system is not affected by green light or red light, DsRed protein can accurately simulate the enrichment and high-concentration maintenance process of KillerRed protein in host cells, which is crucial for studying the growth mechanism of KillerRed protein under blue light conditions.
Confirm Lighting Conditions
In Ohlendorf (2012), it was observed that the accumulation of DsRed protein under blue light irradiation increased exponentially with the increase of light intensity and time until it reached a plateau.[5] This pattern can be visualized by the fitting curve in the figure, thus predicting the steady-state expression level of DsRed protein under specific light conditions.
By using WebPlotDigitizer software, the study found that when the blue light intensity was 100 \(nW \cdot cm^{-2}\), the time required for DsRed protein expression to reach steady state was at least 4.633 hours, and the mean protein content was about 0.904. When the blue light intensity was at least 35.228 \(nW \cdot cm^{-2}\), the final expression of DsRed protein tended to be stable, with a mean of about 0.705. Based on these data, it can be reasonably inferred that if the blue light intensity exceeds 35 \(nW \cdot cm^{-2}\) (equivalent to 376.736 lx) and the illumination time exceeds 5 hours, the content of DsRed protein is sufficient to achieve effective killing efficiency.
Ocean Depth Lower Threshold (Blue Light Band Analysis)
In this study, in order to accurately measure the blue light intensity under experimental conditions and correlate it with the seawater depth, the team introduced seawater depth as a variable and drew on the model of seawater attenuation of visible light proposed in the study of Li Li (2014).[6] Considering the similarity between the laboratory setting and the seawater environment studied by Li Li, the research team adopted the total attenuation coefficient of \(0.1105m^{-1}\) for light with a wavelength of 475nm at a chlorophyll concentration of \(0.01mg/m^3\) provided in the literature. This band covers the range of blue light (460nm~490nm) for subsequent calculation of blue light intensity.
Sea area (chlorophyll concentration) | Pure sea water | \( 0.01 \, \text{mg/m}^3 \) | \( 0.1 \, \text{mg/m}^3 \) |
---|---|---|---|
Attenuation coefficient (\( m^{-1} \)) | 0.0198 | 0.1105 | 0.3052 |
Based on the total attenuation model of visible light by seawater, the actual blue light intensity can be calculated by the Beer-Lambert law, given the sea surface light intensity and depth. The formula is as follows:
\(I(z) = I_0 \cdot e^{-k(\lambda) \cdot z}\)
In this model, \(z\) represents the depth of the ocean in meters (m), \(I(z)\) represents the light intensity at depth \(z\) in lux (lx), \(I_0\) is the blue light intensity at sea level, and \(k(\lambda)\) is the total attenuation coefficient of blue light at a specific wavelength \(\lambda\), which is taken as \(0.1105m^{-1}\) in this study.
The experimental group conducted field measurements of the sea level light intensity at a specific geographical location (longitude 113.07325, latitude 21.89165). The measured light intensity values ranged from 25,000 lux (lx) to 185,400 lux (lx). This study selected the sea level light intensity at this geographical location as a reference benchmark to verify the rationality of the experimental design.
Specifically, when the sea level light intensity is the lowest at 25,000 lx, the calculation shows that the experimental conditions can be met when the sea depth does not exceed 38 meters. When the sea level light intensity is the measured mean of 101,620 lx, the experimental conditions can also be met when the sea depth does not exceed 51 meters. These calculations provide theoretical support and parameter range for subsequent experimental design.
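The blue-band depth limits can be checked by inverting the Beer-Lambert law for depth, \(z = \ln(I_0 / I_{th}) / k\). The sketch below assumes that the threshold \(I_{th}\) is the 376.736 lx stated earlier for blue light and that \(k = 0.1105m^{-1}\); the function names are illustrative only.

```python
import math


def intensity_at_depth(i0_lx: float, k_per_m: float, depth_m: float) -> float:
    """Beer-Lambert law: I(z) = I0 * exp(-k * z)."""
    return i0_lx * math.exp(-k_per_m * depth_m)


def depth_for_intensity(i0_lx: float, threshold_lx: float, k_per_m: float) -> float:
    """Depth at which the intensity has decayed to the threshold."""
    return math.log(i0_lx / threshold_lx) / k_per_m


K_BLUE = 0.1105              # total attenuation coefficient at 475 nm, in 1/m
BLUE_THRESHOLD_LX = 376.736  # blue light intensity needed for expression, in lx

for i0 in (25_000.0, 101_620.0):
    z = depth_for_intensity(i0, BLUE_THRESHOLD_LX, K_BLUE)
    print(f"I0 = {i0:,.0f} lx: blue light stays above the threshold down to about {z:.1f} m")
```

Rounded to whole meters, the two depths come out at 38 m and 51 m, in agreement with the values quoted above.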
Ocean Depth Lower Threshold (Red Light Band Analysis)
In order to verify the potential of DsRed protein as a substitute for KillerRed protein, the study must ensure that both proteins do not show toxicity under low-intensity green light or red light. Studies have shown that the lower limit of the light intensity threshold for DsRed protein to produce toxicity under green and red light is 10 \(nW \cdot cm^{-2}\), namely 107.63 lux (lx). Therefore, the subsequent experimental design should be carried out at a light intensity below this threshold to ensure the accuracy and reliability of the experiment.
We take the red light band as an example (wavelength between 620 nanometers and 750 nanometers), and we select red light with a wavelength of 685 nanometers as a typical representative for analysis. Li Li (2014)'s study on the visible light attenuation model gave the results of the attenuation coefficient of light of different wavelengths under different chlorophyll concentrations (as shown in the figure below). Using WebPlotDigitizer software, we found that when the chlorophyll concentration is \(0.01 mg/m^{3}\), the light attenuation coefficient of red light with a wavelength of 685nm is 0.5325.
According to the Beer-Lambert law, when the sea level light intensity is at the maximum of the measurement range, 185,400 lx, red light cannot stimulate the toxicity of DsRed protein when the sea depth exceeds 18.5 meters. Furthermore, when the sea level light intensity takes the average value of the measurement range, 101,620 lx, the calculation results show that red light basically cannot stimulate the toxicity of DsRed protein when the sea depth exceeds 17 meters.
Conclusion
When designing a suicide experiment for Vibrio natriegens under natural light conditions, the depth of seawater is a key factor, which affects the effectiveness of the suicide mechanism and the spread of toxic proteins. According to the experimental design, the suicide effect is best when the seawater depth is between 18.5 meters and 28 meters, which can effectively prevent the spread of toxic proteins and achieve the goal of biosafety. In the range of seawater depth between 17 meters and 56 meters, the suicide effect basically meets the requirements, can better prevent the spread of toxic proteins, and basically meets the goal of biosafety.
Section IV
Aims
We obtained a set of data from the experimental group that describes the optical density (OD) of Vibrio natriegens over time, representing the growth curve of the bacteria. Since the experimental data cannot be measured continuously, and measurements are sometimes taken at uneven intervals, we need to fit the curve to get continuous biomass information. This will enable us to predict and evaluate the growth of Vibrio natriegens.
Problem Introduction
Why Linear Regression Won't Work
After initially examining the scatter plot of Vibrio natriegens optical density over time, we found that the relationship is not linear. A linear regression model would not fully capture the characteristics of the growth curve. Therefore, we must use non-linear methods for data fitting.
Limitations of Conventional Non-Linear Fitting
► They may overfit by capturing excessive variations in the data.
Mathematical Model of Growth
Huang's Model
► \(t\) is time.
In the original formula, \(Y\) represents the logarithm of cell counts, but in our experiment, we record optical density, which also reflects biomass. Therefore, this formula can be applied to our data.
Why Use RNN Instead of Traditional Methods
RNN vs. Traditional Numerical Solutions
In conventional non-linear fitting, Huang[9] used the differential equation form of the model for optimization. The differential equation is derived by differentiating \(Y\) with respect to time \(t\):
\(\frac{dY}{dt} = \frac{\mu_{max}}{1+e^{-\alpha(t-\lambda)}}(1-e^{Y-Y_{max}})\)
Huang used the ode45 method in MATLAB to solve this differential equation. However, this method is less robust when the data is large, and it may experience convergence issues in regions where the data changes significantly.
By observation, we found that an RNN can actually correspond to a numerical solution of ODEs. Specifically:
ODE's Numerical Solution
The iterative formula introduced by Euler is as follows:
\(x(t+h)=x(t)+hf(x(t),t)\)
where \(h\) is the step size.
RNN and ODE's Numerical Solution
In an RNN, the recursive formula is \(y_t=f(y_{t-1},x_t,t)\), which is similar to the ODE's iterative formula. In fact, we can use \(h\) as the time unit, treating \(t=nh\), and the iterative formula becomes:
\(x((n+1)h)=x(nh)+hf(x(nh),nh)\)
Thus, the numerical solution of ODEs can be seen as a special case of RNN.
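The correspondence can be made concrete with a few lines of code. The sketch below writes the Euler iteration as a recurrence in which each state depends only on the previous state and the step index, which is the same pattern as an RNN cell. The function name `euler_rollout` and the test equation \(dx/dt=-x\) are illustrative choices, not part of the original work.

```python
from typing import Callable, List


def euler_rollout(
    f: Callable[[float, float], float], x0: float, h: float, n_steps: int
) -> List[float]:
    """Apply x((n+1)h) = x(nh) + h * f(x(nh), nh) for n = 0 .. n_steps - 1."""
    xs = [x0]
    for n in range(n_steps):
        # The "hidden state" xs[-1] is updated from itself and the time n*h.
        xs.append(xs[-1] + h * f(xs[-1], n * h))
    return xs


def decay(x: float, t: float) -> float:
    """Right-hand side of dx/dt = -x, used here only as a simple test case."""
    return -x


trajectory = euler_rollout(decay, x0=1.0, h=0.1, n_steps=10)
```

For this test equation every step multiplies the state by \(1-h\), so the final value equals \(0.9^{10}\), a coarse approximation of \(e^{-1}\) that improves as \(h\) shrinks.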
Methodology
Model Training Process
- Parameter Initialization:
The parameters to be fitted include \(\mu_{max}\), \(\alpha\), \(\lambda\), and \(Y_{max}\). According to Huang[9], we observed that these parameters are of the order of magnitude of \(10^1\). Therefore, we initialized these parameters to 1 and set them as trainable parameters for gradient updates during training.
- Forward Propagation:
In the forward function, we implement the recursive calculation, where each step's value is generated by adding the previous step's value to the product of the step size and the derivative value.
- Non-Uniform Time Measurements:
Since the experimental data has non-uniform time intervals, some values need to be masked during the loss calculation. We only calculate the loss for the measured time points.
- Optimization:
We used the Adam optimizer, with an initial learning rate of 0.01. The learning rate is automatically adjusted during training to find the optimal solution. After multiple rounds of testing, we found that training for 100 epochs yielded the best results.
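The forward pass and the masked loss can be sketched in plain Python as follows. This is an illustrative sketch, not the team's torch implementation: it leaves out the gradient-based parameter update (Adam) and shows only how the recurrence produces predictions on a uniform grid and how the loss is restricted to measured time points. The initial value, step size, and observations are hypothetical; the parameter values are the fitted ones reported in the Result section.

```python
import math
from typing import Dict, List


def huang_rate(y: float, t: float, mu_max: float, alpha: float, lam: float, y_max: float) -> float:
    """dY/dt from the differential form of Huang's model."""
    return mu_max / (1.0 + math.exp(-alpha * (t - lam))) * (1.0 - math.exp(y - y_max))


def forward(y0: float, h: float, n_steps: int, params: Dict[str, float]) -> List[float]:
    """Recurrent forward pass: y_{n+1} = y_n + h * dY/dt(y_n, n*h)."""
    ys = [y0]
    for n in range(n_steps):
        ys.append(ys[-1] + h * huang_rate(ys[-1], n * h, **params))
    return ys


def masked_mse(pred: List[float], observed: Dict[int, float]) -> float:
    """Mean squared error over the grid indices that were actually measured."""
    errors = [(pred[i] - y) ** 2 for i, y in observed.items()]
    return sum(errors) / len(errors)


params = {"mu_max": 0.1316, "alpha": 1.6364, "lam": 1.6417, "y_max": 1.2153}
pred = forward(y0=0.05, h=0.1, n_steps=200, params=params)  # y0 and h are assumed

# Hypothetical measurements at uneven times, keyed by grid index (time = index * h).
observed = {0: 0.05, 15: 0.06, 40: 0.30, 120: 0.95}
loss = masked_mse(pred, observed)
```

In a torch version, the four parameters would be trainable tensors and `loss` would be backpropagated through the unrolled recurrence; indexing only the measured grid points plays the role of the mask.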
Tools
► Python 3.8.18.
► torch 1.12.1.
Result
Fitting Curve
The fitted growth curve is shown below:
Fitting Parameters
The fitted parameters are shown in the table below:
Parameters | \( \mu_{max} \) | \( \alpha \) | \( \lambda \) | \( Y_{max} \) |
---|---|---|---|---|
Values | 0.1316 | 1.6364 | 1.6417 | 1.2153 |
Conclusion
By using RNN for non-linear fitting, we were able to capture the dynamic changes of the Vibrio natriegens growth curve more effectively, especially when faced with non-uniform time measurements. After training for 100 epochs, we obtained the optimal fitting result.
Section V
Aims
Provide guidance for experiments through modeling: predict the structure of the mCherry reporter protein connected to CaBP-ChBD and determine its validity.
Problem Framing, Approach and Tools
During the expression validation of the CaBP-ChBD engineered protein, the experimental group observed blue fluorescence under the fluorescence microscope. The members of the experimental group initially hypothesized that this phenomenon could be due to two potential reasons: denaturation of the mCherry reporter protein or a malfunction of the fluorescence microscope. Considering that repairing the fluorescence microscope would take a long time, the experimental group requested our assistance in using AlphaFold to predict the protein structure, helping them to intuitively analyze whether the protein had denatured.
Modeling Result
Among the results obtained from multiple runs of the model, the protein structure model with the highest confidence indicated that the chromophore of mCherry connected to CaBP-ChBD had not denatured. We also analyzed several other structures and found no obvious denaturation, ruling out issues with the target gene itself. After this feedback was provided to the experimental group, they decided to borrow another fluorescence microscope to conduct their experiments.
Most reliable structure
Other structures of model's prediction
Experiments
Result of First Round:
Result of Second Round: