Overview

Modeling a pivotal role in the field of life sciences, especially in the construction of protein structures. It allows researchers to predict and visualize the three-dimensional shapes of proteins based on their amino acid sequences. In this year's project, successfully achieving the FIST project requires not only experimental verification but also essential preliminary modeling work. Good modeling can help avoid some detours in the experimental phase, much like standing on the shoulders of giants.

Firstly, An important aspect of constructing a fusion protein is that it must not affect the physiological activity of the protein, meaning it should not interfere with its binding to the receptor. To achieve this, it is necessary to carry out the prediction of essential protein active sites and the analysis of the three-dimensional structure of the fusion protein. In addition, to enhance the stability and binding efficiency of the fusion protein in physiological conditions, it is also feasible to make appropriate edits to certain amino acid sites. Our modeling work will focus on the two points mentioned above.

Prediction of the Activated Sites of FGF21

There is stability concern regarding the fusion protein of FGF21 with LMWP, as the fusion may impact the three-dimensional structure of FGF21, particularly in its receptor interactions, thereby reducing its functional efficiency. To investigate this, we performed a protein-protein interaction (PPI) network analysis using the STRING database (illustrated in the figure 1).

The receptor affinity score ranges from 0 to 1 and is calculated by integrating probabilities from various evidence channels, adjusted for the likelihood of random interactions (von Mering et al., 2005). To compute the combined score, we first remove a 'prior' probability (p=0.041) from each channel's score, then combine the scores as follows:

1. For each of the scores for the individual channels (s_i) remove the prior (p=0.041):

$ s_i = \frac{p}{1 - p} $

2. Combine the scores of the channels:

$ s_{\text{running total}} = 1 - s_{\text{running total}}(1 - s_{i, \text{no prior}}) $

3. Add the prior back (once):

$ s_{\text{total}} = s_{\text{running total}} + p(1 - s_{\text{running total}}) $

Using this formula, we calculated and compared the receptor affinity scores of highly homologous derivatives of FGF21 and constructed a PPI network for the top ten receptors. The core receptors with scores above 0.99 included KLB (0.999), FGFR1 (0.998), and FGFR4 (0.992), closely aligning with the receptors of unmodified native FGF21. This validation suggests that the FGF21-LMWP fusion protein does not introduce new active sites or alter existing ones.

Figure 1. The PPI network of potential receptors of FGF21-LMWP fusion protein.

Prediction of FGF21 Stablility

A fusion protein combining fibroblast growth factor 21 (FGF21) with a transmembrane peptide, low molecular weight protamine (LMWP), was designed to enhance the transport and circulation of FGF21 during the protein engineering phase. We assessed the stability of this fusion protein. Using PyMOL for visualization, we examined the three-dimensional conformation of the FGF21-LMWP complex. As shown in Figure 2, the active sites of FGF21 are highlighted in magenta, while the LMWP segment is depicted in yellow. The LMWP peptide displays a relatively loose conformation with an undefined spatial structure and low polarity. These characteristics suggest minimal risk of detrimental intramolecular interactions between LMWP and FGF21, which could otherwise obstruct FGF21's binding sites to its receptor, FGFR.

Figure 2. The FGF21-LMWP fusion protein with its active site.

Figure 3.The hydrophobicity analysis of FGF21-LMWP fusion protein.

Regarding the P9 protein, there is currently no research on its active sites. Creating fusion proteins of P9-cAAnchor with novel structures and functions implies the need for protein design. De novo protein design is a long-standing fundamental goal of synthetic biology, but this complex and challenging task is primarily hindered by the difficulty of reliably predicting the 3D structure of proteins from amino acid sequences. Machine learning algorithms, such as AlphaFold2, may eliminate this obstacle. Therefore, we utilized AlphaFold2 for molecular docking (Zhenyu Yang et al., 2003). In Figure 4, the green portion represents the P9 protein, the yellow represents the P9 protein receptor ICAM2, and the light blue is the cAAnchor protein. The outcomes of the molecular docking demonstrate that the key interaction sites are situated at asparagine at position 633 to valine at position 636.

Hydropathy predictions suggest that the P9 protein and the cAAnchor protein exhibit robust hydrophobic interactions, which promote their tight binding in aqueous solutions instead of forming a loose structure that might potentially conceal active sites.

Figure 4. The P9-cA_Anchor fusion protein with its active site.

Construction of Enhanced FGF21(EFGF21)

Previous research has reported low stability for native FGF21 (Onuma Y et al., 2015). To address this, we designed a more stable version, termed enhanced FGF21(EFGF21). We employed SWISS-MODEL for homology modeling, using fibroblast growth factor 21 with the protein ID 6m6e as the template based on both sequence coverage and identity. The sequence identity after alignment exceeds 40% (some suggest 35%), indicating that they are homologous proteins belonging to the same family.

We predicted the three-dimensional structure and physicochemical properties using appropriate methods for homologous protein modeling. Initially, we generated Ramachandran plots (Fig. 5, Fig 6), which illustrate the two dihedral angles of the α-carbon: φ (phi) for the C-N bond and ψ (psi) for the C-C bond. While these bonds can theoretically rotate freely, real molecular structures are constrained by steric hindrance and atomic interactions, leading to allowed and disallowed regions in the Ramachandran plot. This plot assesses the quality of homology modeling.

When comparing the Ramachandran plot of EFGF21 to that of FGF21, we observed that the amino acids in EFGF21, designed for enhanced stability, are more concentrated in the green allowed region. This indicates that EFGF21 adheres more closely to stereochemical rules.

Figure 5. The Ramachandran plot of FGF21.

Figure 6. The Ramachandran plot of EFGF21.

Drawing Enginerring Bacteria Growth Curve

The mathematical expression of the E. coli growth model is derived from the exponential growth model known as the Malthus model. However, to account for environmental factors and limitations such as nutrient availability and space, we introduce modifications to this basic model.

Initial Assumptions and Model Setup

Let $ N $ represent the number of bacteria at any given time $ t $.

$ N_0 $ is the initial bacterial count.

$ r $ is the natural, constant growth rate of bacteria in the absence of any environmental stressors.

The initial growth model is given by the differential equation:

$ \frac{dN}{dt} = rN $

Incorporating Environmental Resistance

The growth environment's capacity is finite, thus limiting bacterial growth.

We introduce a carrying capacity $ K $, which is the maximum population size that the environment can sustain.

The model is adjusted to include this carrying capacity, leading to the logistic growth model:

$ \frac{dN}{dt} = rN \left(1 - \frac{N}{K}\right) $

Parameter Adjustments

The natural growth rate $ r $ is adjusted based on the bacterial count to reflect increased environmental resistance with higher population densities.

This adjustment introduces a quadratic term to model the saturation effect:

$ \frac{dN}{dt} = r_0 N \left(1 - \frac{N}{K}\right) - \alpha N^2 $

Here, $ r_0 $ is the maximum specific growth rate when $ N $ is near zero, and $ \alpha $ is a parameter that scales the impact of population density on growth rate.

Hypothesis and Parameter Relationships

We hypothesize that as $ N $ approaches the carrying capacity $ K $, the growth rate $ r $ decreases to zero.

At $ N = K $, $ r(N) = 0 $, thus providing a relationship:

$ \alpha = \frac{r_0}{K} $

Model Equations

The final form of the model used for fitting experimental data is:

$ \frac{dN}{dt} = r_0 N \left(1 - \frac{N}{K}\right) - \frac{r_0 N^2}{K} $

This equation accounts for both exponential growth at low densities and the slowing of growth as $ N $ approaches $ K $.

Due to the scarcity of detailed analyses on fitting the Logistic equation based on OD measurements online, we are open-sourcing the code we used to create growth curves for everyone's reference and learning.

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from scipy.optimize import curve_fit

from sklearn.cluster import KMeans

file_path = r'C:\Users\huang\Desktop\生长.xlsx'

df = pd.read_excel(file_path, sheet_name='Sheet1', usecols='C:M', skiprows=1)

U_1 = df.values

U_1[:, 1:] = U_1[:, 1:] / 1000

def growth_curve(x, b1, b2, b3):

return b1 / (1 + b2 * np.exp(-b3 * x))

a_num = 1

b_num = []

Rsquare = []

plt.figure(figsize=(10, 6))

for i in range(3, U_1.shape[1]):

x = U_1[:, 0]

y = U_1[:, i]

initial_params = [1, 0.6, 4.3]

popt, _ = curve_fit(growth_curve, x, y, p0=initial_params)

y1 = growth_curve(x, *popt)

residuals = y - y1

ss_res = np.sum(residuals ** 2)

ss_tot = np.sum((y - np.mean(y)) ** 2)

r_squared = 1 - (ss_res / ss_tot)

Rsquare.append(r_squared)

plt.plot(x, y, '*', label=f'原始数据 {i}')

x1 = np.linspace(min(x), max(x), 300)

y1_fit = growth_curve(x1, *popt)

plt.plot(x1, y1_fit, label=f'拟合曲线 {i}')

b_num.append(popt)

a_num += 1

plt.xlabel('生长时间(h)')

plt.ylabel('OD$_{600}$')

plt.legend(loc='best')

plt.xticks(np.arange(min(x), max(x) + 2, 2))

plt.show()

b_num = np.array(b_num)

kmeans = KMeans(n_clusters=3).fit(b_num[:, [0, 2]])

cidx = kmeans.labels_

plt.figure(figsize=(8, 5))

for cluster in range(3):

plt.scatter(b_num[cidx == cluster, 0], b_num[cidx == cluster, 2], label=f'聚类 {cluster + 1}')

for i, txt in enumerate(range(1, len(b_num) + 1)):

plt.text(b_num[i, 0], b_num[i, 2] * 1.04, str(txt))

plt.xlabel('k')

plt.ylabel('r')

plt.legend()

plt.show()

In developing a mathematical model to describe the growth dynamics of Lactococcus lactis, the following key assumptions are made to simplify the system:

Growth Rate Dependency

The growth rate of Lactococcus lactis is only a function of the current bacterial population size, implying that environmental factors such as pH and temperature, once optimized, remain constant and do not influence the growth rate.

Fixed Environment Volume

The culture medium's volume is fixed, introducing a carrying capacity that restricts the maximum population size, beyond which no further increase in population is possible.

Adequate and Constant Nutrient Supply

The culture medium is assumed to have a sufficient and constant supply of nutrients necessary to support the growth of Lactococcus lactis throughout the experiment.

Homogeneous Environment

The culture is homogeneously mixed, ensuring an even distribution of nutrients and bacteria at any given time.

No Genetic Variations

Genetic mutations or variations that might alter the growth characteristics are not considered over the short duration of the experiment.

The raw data obtained from the experiment, which included the optical density (OD) readings at 600 nm, were converted into bacterial concentration growth curves. Using a Python program, the OD600nm readings were translated into concentration units (CFU·ml^-1), allowing for the estimation of bacterial numbers.

Growth curve of lactic acid bacteria containing the plasmid BBa-K5283023

Future Work

Our modeling work primarily focus on simulating protein structures in this year, while also providing some useful insights related to bacterial growth for the wider iGEMer community. However, we have a long way to go. Although we attempted to establish a model for drug absorption in the human intestinal lumen, the complexities of influencing factors and environmental conditions prevented us from achieving satisfactory results. Moving forward, we hope to establish a robust model for drug absorption across the intestinal mucosal barrier in an in vitro environment based on our organ-on-a-chip hardware, which will provide ideas and insights for further experimental design.

Reference

Elisabeth Gasteiger, Christine Hoogland, Alexandre Gattiker, Séverine Duvaud, Marc R. Wilkins, Ron D. Appel, and Amos Bairoch. Protein Identification and Analysis Tools on the ExPASy Server.

Onuma Y, Yeo CY, Whitman M. XCR2, one of three Xenopus EGF-CFC genes, has a distinct role in the regulation of left-right patterning. Development. 2006 Jan;133(2):237-50. doi: 10.1242/dev.02188. Epub 2005 Dec 8. PMID: 16339189.

von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D433-7. doi: 10.1093/nar/gki005. PMID: 15608232; PMCID: PMC539959.

Yang, Z., Zeng, X., Zhao, Y. *et al.* AlphaFold2 and its applications in the fields of biology and medicine. *Sig Transduct Target Ther* **8**, 115 (2023).