Model

Part 1: Protein Engineering Module: Molecular dynamics simulation of TFD and rare earth coordination

1.1 Overview

TFD-EE is a C2-symmetric dimer protein capable of adsorbing rare earth ions, with a total of eight β-sheet in the barrel, and four C4-symmetric glutamates located on four separate β-sheet (Figure 1). Lanthanide trivalent cations are usually oxyphilic and can form coordination bonds with two oxygen atoms on the side chain of glutamate residues. In the presence of eight ligand oxygen atoms, lanthanide ions can be tightly bound to the protein cavity.

To test for protein binding, Shane J. Caldwell et al.[1] introduced a tryptophan (N6W mutation) as a sensitizer antenna effect to detect Tb(III) luminescence [2]. That is, when the protein binds Tb(III), the excitation wavelength of 280 nm is given, and the indole ring on tryptophan can absorb energy and transfer it to Tb(III), thus making Tb(III) emit characteristic fluorescence, and the adsorption of Tb(III) can be quantitatively detected according to the fluorescence intensity.

To facilitate subsequent experiments, we fused this dimer protein into a single-stranded TIM-FD-Single (TFD-S) and gave the sequence to AlphaFold3 for prediction, finding that its spatial structure was almost identical to that of the TFD-EE protein. The characterization experiments showed that its performance was similar to that of TFD-EE dimer. Subsequently, site-specific mutation will be carried out on the basis of TFD-S.

Figure 1. Combination of TFD-EE and Tb(III)

Figure 2. Combination of TFD-S and Tb(III)

We hope to further modify the protein to enhance its ability to trap rare earth ions. There are two main ideas: one is to enhance the binding ability of the protein to a single rare earth ion, search for mutation sites near the original four glutamic acids, and introduce new glutamic acid, aspartic acid or asparagine residues to increase oxygen coordination. The second is to increase the number of rare earth ions that a single protein can capture, look for four sites on the same β-sheet that are similar to the original four binding sites, and mutate them into glutamate or aspartic acid.

After mutating the amino acid sequence on PyMol, we submitted the results to AlphaFold3 for prediction, and finally obtained eight possible mutants, all of which have the mutated amino acid side chains towards the internal cavity of TFD. In order to exclude mutants with reduced rare earth coordination ability, we decided to first simulate the binding of proteins and rare earth ions with molecular dynamics simulation (MD Simulation).

1.2 Molecular dynamics simulation

The crystal structure of TFD-S is predicted by AlplaFold3, which is a TFD single-chain protein with a terbium ion. MD simulation was performed using CHARMM36 protein Force Field 45 in the GROMACS2022.3 package 46. The force constant of the harmonic bias potential is 1000 kJ∙mol^-1∙nm^-2. Set the 10 Å cuboid box where the edge is widest from the protein, use the TIP3P water model in it, and add sodium ions to the solution to maintain the neutrality of the system.

Eleven TFD-S-X(III) ( X(III) means Tb(III) or La(III) or Lu(III) ) systems were simulated at 300 K. To begin with, the system is minimized by 50,000 steps using the steepest descent algorithm. In the pre-balancing phase, the system is gradually heated to 300 K at 1 atm using a 200ps NVT set and 200ps NPT set. A 50ns production run followed to collect the balanced configuration for each 2ps interval. A speed scaling thermostat 49 with a time constant equal to 0.1ps was used to keep the temperature constant throughout the simulation. To maintain the pressure, the Berendsen pressure coupler was used in the NPT pre-balance run and the Parrinello-Rahman pressure coupler was used in the production run, with the pressure time constant and isothermal compression rate set at 2ps and 4.5×10^-5 bar^-1, respectively. In the whole simulation process, the integral time step of the motion equation is 2fs. The cut-off value for the non-bonding interaction is 12Å. The particle grid Ewald algorithm 53 is used to calculate long range electrostatic interactions. [2]

Finally, the data results of 12 systems are obtained. The results showed that the average distance between the 8 active oxygen atoms of TFD-S and terbium ions was 2.41 Å (Figure 3). In the coordination structure, the distance between the ligand and the central atom can show the size of the binding force, but if the distance is too far, it is difficult to effectively bind. Among all the protein mutants, only the original four mutant TIM-FD-Change (TFD-C) with isoleucine mutation to glutamate at positions 6, 52, 175, and 221 was similar to 2.41 Å (Figure 5), so we believed that this mutation had the greatest possibility to achieve our goal.

In addition, in order to detect the binding of rare earth ions to proteins, glutamine at site 154 and site 323 of TFD-C was mutated into tryptophan, and TIM-FD-Mutation (TFD-M) was obtained, whose sequence is shown below (Figure 4) [3]. The final MD results of Tb(III), La(III) and Lu(III) ions are shown in the figure below (Figure 5).

Figure 3. Schematic diagram of binding of eight active oxygen atoms to terbium ions in TFD-S. The Tb(III) ion is shown as green sphere, and coordination bonds are shown as dashed lines. The numbers near the coordination bonds represent the distances between the atoms, with the unit of Å.

Figure 4. Comparison of four protein sequences

Figure 5. Schematic diagram of TFD-M combined with Tb(III), La(III), Lu(III), molecular dynamics simulation process, and kinetic curve. The ordinate of the RMSD curve is the root-mean-square deviation, indicating the distance the system moves.

1.3 Results

The molecular dynamics simulation results based on TFD-M, La(III), Tb(III) and Lu(III) are satisfactory, and we believe that the probability of TFD-M adsorbing rare earth ions is high. In addition, in order to test whether the new site can adsorb rare earth ions, the original binding site of TFD-M was mutated back to the original TFD state, that is, E35V/E158K/E204V/E327K mutation was introduced into TFD-M. The resulting new mutant is called TIM-FD-Reduction (TFD-R) (Figure 6). After adding terbium ions to the pdb file, similar PyMol and molecular dynamics simulations were carried out, and the final results were shown in Figure 7. Subsequently, we will conduct wet experimental characterization of two mutants, TFD-M and TFD-R, as detailed in our Results page.

Figure 6. Combination of TFD-R and Tb(Ⅲ)

Figure 7. Molecular dynamics simulation process of TFD-R combined with Tb(Ⅲ), and kinetic curve. The ordinate of the RMSD curve is the root-mean-square deviation, indicating the distance the system moves.

Part 2: Yeast Surface Display Module: Lanthanide adsorption kinetics and adsorption equilibrium modeling

2.1 Overview

After the lanthanide metal binding protein TFD was displayed on the surface of yeast cells to construct a whole-cell biosorbent, we focused on the mathematical modeling of the adsorption process of lanthanide metal ions by engineered yeast, including adsorption kinetics and adsorption equilibrium. We hoped to help us understand the dynamic process of rare earth adsorption through mathematical fitting modeling, evaluate the adsorption capacity of engineered yeast to lanthanide metal ions, and predict the optimal cycle of biosorption for feasibility verification and subsequent design.

2.2 Kinetic modeling of lanthanide adsorption

In the kinetic part of the adsorption of lanthanide metal ions, we tried to use the pseudo-first-order and pseudo-second-order formulas to fit the kinetic experimental data:\(\frac{dq}{dt} = k_1(q_e - q_t) \) and \(\frac{dq}{dt} = k_2(q_e - q_t)^2\),( \(q\) and \(q_t\) are the adsorption signal strength, \(q_e\) is the equilibrium adsorption signal strength), [5] these two kinetic models are widely used to describe the experimental data of biomass adsorption of heavy metals. [6] In the “kinetic” range before the adsorption reaches saturation, we refer to the adsorption data of Tb³⁺ and Yb³⁺ by engineering strains expressing Lanthanide Binding Tag (LBT) [7] or lanmodulin (LanM) [5] in the previous papers. We evaluated the adsorption of lanthanide metal ions by the engineered strain similarly. The adsorption of lanthanide metal ions is more in line with the pseudo-first-order kinetic model, and the corresponding correlation coefficients (R²) value is higher.

2.3 Modeling of lanthanide adsorption equilibrium

In the equilibrium stage, we refer to the current mainstream equilibrium adsorption models: Langmuir and Freundlich isothermal models [8]. The Langmuir isotherm is an empirical model. It is assumed that the adsorption process occurs at a specific position on the surface of the adsorbent, resulting in a single-molecule adsorption layer. Different from Langmuir isotherm model, Freundlich model is not limited to monolayers, extending its application to the multilayer, reversible, and non-ideal adsorption processes.

The Langmuir isothermal equation is \(q_e = \frac{K_LC_e}{1+a_LC_e}\), ( \(q_e\) is the concentration of lanthanide metal ions in the equilibrium state of stem cells, \(C_e\) is the concentration of lanthanide metal ions in the solution system, \(K_L\) and \(a_L\) are Langmuir isothermal constants). The equilibrium saturated adsorption capacity of the engineering strain is determined by fixing the amount of lanthanide metal ions added, and the experimental data points are substituted into the linear expression \(\frac{C_e}{q_e} = \frac{1}{K_L} + \frac{a_L}{K_L}C_e \) . After fitting the experimental data, the theoretical monolayer saturated capacity \(q_{max}\)(\(\frac{K_F}{a_L}\)) can be obtained.

The Freundlich isothermal equation is \(q_e = K_LC_e^{ \frac{1}{n} }\), (\(K_F\) is the Freundlich isothermal constant). The same way is substituted into the logarithmic expression \(lnq_e = lnK_F + \frac{1}{n}lnC_e\) for fitting [5].

According to the R² results of the relevant literature data, the engineered yeast displayed on the surface has a high degree of fitting with the Langmuir model, that is, the lanthanide metal ions form a monolayer adsorption on the surface of the engineered strain. In theory, the engineered yeast TFD protein is more evenly distributed on the cell surface after the surface display modification. Each protein binds to a lanthanide metal ion, and the interaction between each other is weak due to the low expression level, which conforms to the applicable conditions of the Langmuir model.

2.4 Conclusions

Since the commonly used elemental analysis methods ICP-MS or ICP-OES are expensive, we had not been able to accurately characterize the modeling results of lanthanide adsorption kinetics and lanthanide adsorption equilibrium. In addition, although our lanthanide binding protein TFD has the characteristics of antenna effect to sensitize lanthanide luminescence after combining rare earth ions (see our Measurement page for details), in actual experiments, we found that the engineered yeast strain displaying TFD protein on the surface was not enough to produce a significant fluorescence signal of lanthanide metal ion adsorption change, which might be due to the inapplicability of the detection method when applied to cell objects. However, in any case, our lanthanide adsorption kinetics and adsorption equilibrium modeling process can be used as a reference for the future.

Part 3: Biofilm Module: biofilm formation modeling

3.1 Overview

When the modified yeast itself can adsorb lanthanide metal ions, we hope to adhere it to a stable carrier to form a biofilm, and the formation process of the biofilm is affected by many factors. Therefore, it is necessary to establish a corresponding model to estimate the optimal nutrient addition amount, biofilm formation time, and membrane stability parameters, etc. Data provide theoretical support for the engineering practice of the entire biomining line. Among them, some of the modeling refers to the data from classical literature. Due to the limitation of time, instruments, and other resources, we have not been fully verified. Here, we only refer to some theories and formulas, and verify them using our experimental conclusions.

3.2 One-dimensional continuum model

Biofilm modeling began with a low-dimensional continuous description based on kinematics and translational diffusion. Later, a more complex microscopic kinetic mechanism was introduced, resulting in abnormal diffusion and dissipation of various components in the biofilm. Recently, discrete-continuous, multi-fluid, and single-fluid multi-component models have been used to study the coupling of biofilm and bulk fluid (or solvent), and the whole biofilm-bulk fluid system is regarded as a system composed of various components [9].

The development of biofilms is a multistage process. First, free-floating planktonic cells land on a damp surface and later get attached to it. After a sufficiently high cell density is reached, cells begin excreting an extracellular polymeric substance (EPS) and forming a EPS matrix anchored on the surface. The strain population inside the matrix grows as well by consuming nutritious substrates supplied by the surrounding environment. As a result, the biofilm begins to grow. Biofilm growth is a dynamical process whose structure is largely influenced by environmental conditions including the substrate diffusion limitation and the shear stress of the bulk fluid [9]. Here, we just use the one-dimensional continuum model proposed by Wanner and Gujer to predict the biofilm formation process [10].

Model development is based on the following basic assumptions:

1. The total bulk liquid volume remains constant, while the biofilm volume increases.

2. Sugars and nitrogen are uniformly distributed in the liquid phase of the reactor.

3. Sugars and nitrogen diffuse from the bulk liquid to the biofilm.

4. Sugars and nitrogen are consumed only inside the biofilm.

5. Biofilm growth of fungus occurs uniformly on top of the liquid phase.

6. The biofilm porosity remains constant in time and space.

7. Detachment/attachment processes of fat-free biomass and lipids from/onto the surface of biofilm are assumed to be negligible.

8. Sugars are used for fat-free biomass growth and lipid synthesis.

9. Biofilm consists of two components: fat-free biomass and lipids. Thus, biofilm thickness increases due to the fat-free biomass growth and lipid accumulation.

10. The densities of fat-free biomass and lipids remain constant.

11. Lipid accumulation starts after nitrogen depletion in the reactor while cellular lipids turnover occurs after sugar consumption.

Mass balance of sugar and nitrogen in large quantities of liquids:

\( V_L \frac{dS_B}{dt} = -AD_S\phi \frac{\partial S}{\partial z}|_{z=H} - AU_f \phi S|_{z=H} - AU_f(1-\phi)S|_{z=H} = -AD_S\phi \frac{\partial S}{\partial z}|_{z=H} - AU_f S|_{z=H} \)

\( V_L \frac{dN_B}{dt} = -AD_N\phi \frac{\partial N}{\partial z}|_{z=H} - AU_f \phi N|_{z=H} - AU_f(1-\phi)N|_{z=H} = -AD_S\phi \frac{\partial N}{\partial z}|_{z=H} - AU_f N|_{z=H} \)

Among them, \(S_B\) and \(N_B\) are the concentrations of sugar and nitrogen in the bulk liquid phase (g/cm³ reactor liquid volume), \( V_L\) is the bulk liquid phase volume (cm³), \( \phi \) is the porosity of the biofilm, \(S\) and \(N\) are the concentrations of sugar and nitrogen in the biofilm (g/cm³ biofilm), \(D_S\) and \(D_N\) are the diffusion coefficients of sugar and nitrogen (cm²/h), \(A\) is the total surface area of the biofilm (cm²), \(U_f\) is the moving speed of the biofilm surface due to the growth of the biofilm (cm/h), and \(H\) is the thickness of the biofilm (cm). It represents the diffusion of sugar and nitrogen from bulk liquid to biofilm, respectively. The second term considers the displacement effect of the biofilm surface in space, and the third term describes the advection transport of dissolved components caused by the displacement of the biofilm surface.

The mass balance equation simulates the development of fat-free biomass and lipids in biofilms in time and space:

\( \frac{dX}{dt} = -U_f \frac{dX}{dz} + [(\mu_{SN} + \mu_L) X - \frac{X}{1-\phi} (\frac{\mu_{SN} + \mu_L}{\rho_X} X + \frac{q_L - \frac{1}{Y_{X/L} }\mu_L } {\rho_L} X)] \)

\( \frac{dL}{dt} = - U_f \frac{dL}{dz} + \left[ \left( q_L - \frac{1}{Y_{X/L} } \mu_L \right) X - \frac{L}{1-\phi} \left( \frac{\mu_{SN} + \mu_L}{\rho_X} X + \frac{q_L - \frac{1}{Y_{X/L} } \mu_L}{\rho_L} X \right) \right] \)

Among them, \(Y_{X/L}\) is the fat-free biomass growth yield coefficient of lipid (g fat-free biomass/g lipids), \(\rho_X \) is the cell density of fat-free biomass (g fat-free biomass/cm³ fat-free biomass), and \(\rho_L \) is the cell density of lipid (g lipid/cm³ lipids). The porosity \(\phi \) of the biofilm is assumed to be constant.

The changes of sugar and nitrogen concentration in biofilm were described by reaction-diffusion equation:

\( \phi \frac{\partial S}{\partial t} = (1 - \phi) U_f \frac{\partial S}{\partial z} + D_S \phi \frac{\partial^2 S}{\partial z^2} - \frac{1}{Y_{X/S} } \mu_{SN} X - \frac{1}{Y_{L/S} } q_L X \)

\( \phi \frac{\partial N}{\partial t} = (1 - \phi) U_f \frac{\partial N}{\partial z} + D_N \phi \frac{\partial^2 N}{\partial z^2} - \frac{1}{Y_{X/N} } \mu_{SN} X \)

Among them, \(X \) is the concentration of fat-free biomass in the biofilm (g/cm³ biofilm), \(Y_{L/S} \) (g fat-free biomass/g sugar), \(Y_{X/N} \) (g lipid/g sugar) and (g fat-free biomass/g nitrogen) are the yield coefficients of fat-free biomass and lipid produced on sugar and fat-free biomass produced on nitrogen, respectively, \(q_L \) is the specific accumulation rate of lipid (g lipid/g fat-free biomass h), and \( \mu_{SN} \) is the specific growth rate of fat-free biomass (1/h). In fact, the effect of this process on the changes in sugar and nitrogen concentrations in the biofilm is negligible, but to make the mass balance equation accurate, the flux must be included. The second item simulates the diffusion transport of sugar and nitrogen in the biofilm. The third term of the two equations describes the consumption of sugar and nitrogen by the growth of fat-free biomass in the biofilm. Finally, the last item in the equation describes the consumption of sugar to form lipid accumulation.

The change of biofilm thickness can be obtained by the following formula:

\( \frac{dH}{dt} = \int_{0}^{H} (\frac{\mu_{SN} + \mu_L}{(1-\phi)\rho_X} X + \frac{q_L - \frac{1}{Y_{X/L} }\mu_L}{(1-\phi)\rho_L} X ) dz \)

where \(H\) is the total biofilm thickness (cm).

The simplified equations are substituted into the boundary conditions:

\( \frac{\partial{S(0,t)} }{\partial{z} } = 0 \)

\( \frac{\partial{N(0,t)} }{\partial{z} } = 0 \)

\( S(H,t) = S_B \)

\( N(H,t) = N_B \)

and the initial conditions are:

\( S_B(0) = S_{in} \)

\( N_B(0) = N_{in} \)

\( S(z,0) = 0 \)

\( N(z,0) = 0 \)

\( H(0) = H_{in} \)

where \(S_{in}\) , \(N_{in}\) are the initial concentrations in the reactor (g/cm³) and \(H_{in}\) is the initial biofilm thickness [11].

The model includes a more flexible description of the transport of dissolved components in the biofilm and considers the diffusion and transport of particle components in the biofilm solid matrix, the change of the liquid phase volume fraction (porosity) of the biofilm, and the simultaneous separation and attachment of cells and particles on the biofilm surface. This extended model can reproduce most of the new experimental data.

3.3 Other biofilm models

In addition to the one-dimensional continuum model, there are other models related to biofilm characteristics, such as DLA model for biofilm growth and pattern formation, discrete-continuum / cellular automata (CA) model, the biofilm model coupled with biomass and flow, and individual-based modeling (IbM).

Although the methods used are different, the basic processes are usually introduced in some way, such as attachment, microbial growth, nutrient absorption, cell death, extracellular product formation, separation, and some chemical processes, which are mainly supported by classical theories such as mass balance equation, Fick's law and Navier-Stokes equation. For example, in the individual-based model, microbial growth is introduced through cell division, and cell division has a set of rules to control the structural changes of the matrix after the introduction of daughter cells. On the other hand, in the model where biomass is regarded as a continuum, growth can be described by continuous biomass expansion and movement. These models can focus on describing biofilms at the scale of the entire colony or the level of a single cell, taking into account the details of the cell structure and how it affects cell behavior [12].

3.4 Conclusions

In the experimental verification of biofilm formation, we adhered the engineered yeast to the surface of agar plate, fluidized bed carrier filler, polystyrene petri dish plate, and other carriers for observation. The engineered yeast overexpressing gene FLO11 was incubated with the fluidized bed carrier filler in the medium, and it took about 3 days to form a stable and most biofilm, and the biofilm was stained with ammonium oxalate-crystal violet staining solution to observe the morphology under the microscope. See our Results page for details. However, due to the lack of precise measuring instruments and ideal test carriers, we only transferred the one-dimensional continuum model appropriately and substituted it into the analysis of the physical and chemical processes of biofilm formation and shedding to provide theoretical references for the experimental conclusions. We look forward to further enriching the content of the modeling part and establishing a model that is more suitable for the characteristics of our engineered yeast when the experimental conditions allow.

References

Proceedings of the National Academy of Sciences

Peptide libraries: methods and protocols

[3] Tutorials and Webinars — GROMACS webpage https://www.gromacs.org documentation

Nucleic acids research

Yarrowia lipolytica. Journal of Hazardous Materials

Journal of Hazardous Materials

E. coli. Journal of Hazardous Materials

Journal of hazardous materials

Solid State Communications

Biotechnology and bioengineering

Bioresource technology