MOLECULAR MODELING
  • Molecular Modeling
  • Reference

In our project, both the fusion protein and TRn protein are abundant in β-sheet structures, enabling their tight binding through hydrogen bond formation, which facilitates the repair process. When the material sustains damage, the hydrogen bonds can reform, thereby realizing self-healing. To better explore the protein structures and their binding processes, validate our project design, and assist the wet-lab team, we employed tools such as SWISS-MODEL and AlphaFold3 to predict the structures of the fusion protein and TRn protein. Following this, we utilized GRAMM for docking between them and GROMACS for molecular dynamics simulations to verify our experimental design. These efforts aim to elucidate the underlying mechanisms and support the further development of our self-healing material system. Fig. 1 illustrates our workflow.

Fig. 1 | Workflow of molecular Modeling

Our wet-lab experiments involved numerous interesting proteins, and we aimed to delve into their structures. Given that only amino acid sequences of the involved proteins are available, we utilized tools such as SWISS-MODEL and AlphaFold3 for homology modeling to construct their three-dimensional structures.

The fusion protein plays a crucial role in our material. It comprises TRn4, Mfp5, and a GS linker (which links the terminal proteins). However, the structure data for these proteins are not available in commonly used protein databases such as RCSB, UniProt, and NCBI. Consequently, we must predict their structures separately to validate their properties. Fortunately, potential structural features of TRn4 and Mfp5 are accessible from existing literature, guiding our predictions and enhancing accuracy. We will prioritize predicting these two proteins first, and based on their high-confidence predicted structures, we will proceed to model the overall fusion protein to ensure the highest possible credibility of our final structure.

Prediction of TRn4

Initially, we utilized SWISS-MODEL to generate structural predictions, and the result is shown in Fig. 2a. We found that the higest GMQE (Global Model Quality Estimation) value for TRn4 only reached 0.32, and other metrics also fell short, making us question the reliability of the results. Additionally, the short β-sheet length observed in the structure did not align with what was reported in the literature. Therefore, we did not use the results obtained from SWISS-MODEL. We continued to explore other alternative methods. (GMQE is an indicator used to evaluate the quality of protein models, ranging from 0 to 1. A GMQE value closer to 1 indicates a higher quality and reliability of the protein model. Generally, a GMQE value of 0.5 or higher is considered to represent a relatively good model quality.)

Subsequently, we employed tools such as I-TASSER and Modeller, but the results remained unsatisfactory. Ultimately, using AlphaFold, we successfully predicted a TRn4 model with β-sheet structures, achieving a pTM (predicted Template Modeling) score of 0.7. Generally, a pTM score above 0.5 suggests that the overall predicted fold of the complex is similar to the true structure. Therefore, we selected this model as our final result and proceeded with subsequent simulations. The result is shown in Fig. 2b.(pTM is a measure of how accurately AlphaFold-Multimer predicts the overall structure of a complex, representing the predicted TM score comparing the predicted and hypothetical true structures. A TM score above 0.5 indicates that the predicted fold likely resembles the true structure.)

Fig. 2 | Predictions of protein structure

Prediction of Mfp5

Firstly, we attempted to predict the structure of Mfp5 using AlphaFold, but the predicted result had a pTM score of only 0.3, indicating substantial error and low reliability in the predicted structure. Consequently, we turned to SWISS-MODEL to predict the structure of Mfp5. The GMQE value for Mfp5 obtained from SWISS-MODEL was 0.57, suggesting a relatively reliable model. Therefore, we selected this model as our final predicted structure. The result is shown in Fig. 2c.

Prediction of fusion protein

The fusion protein comprises TRn4, Mfp5, and a GS linker (amino acid sequence: GGGGSGGGGSGGGGS), designed to connect TRn4 and Mfp5. Given the limitations of SWISS-MODEL in predicting large protein structure, we chose to use AlphaFold for predicting the structure of this fusion protein. The result of fusion protein is shown in Fig. 2d.

The TRn4 and Mfp5 segments of the fusion protein displayed structures consistent with their individually predicted counterparts, confirming that our fusion protein design can effectively retain the respective functions of TRn4 and Mfp5, as expected. As the next step, we will proceed with molecular dynamics simulations to assess the stability of the fusion protein's structure. This will provide crucial insights into the dynamic behavior and potential conformational changes of the fusion protein under different conditions, ensuring its functionality and reliability for further applications. The molecular dynamics simulation result and root mean square deviation of the fusion protein are shown in Fig. 3 and Fig. 4, respectively.

Fig. 3 | Molecular dynamics simulation for fusion protein

Fig. 4 | Stability analysis

To determine if a variation trend exists in the RMSD graph, we applied the Cox-Stuart trend test to examine the data trend of the fusion protein during the 2ns-3ns stage of the simulation process. The Cox-Stuart trend test is a non-parametric test method that can examine whether a set of data has an upward or downward trend over time. We first made the following hypotheses:

$H_0$: There is no growth trend; $H_1$: There is a growth or decrease trend

Let: $c=n/2$, if n is even; $c=(n+1)/2$, if n is odd, where n is the number of data points

Take the data pairs$\left ( x_i,{x_i}_{+c} \right) $, $D_i=x_i-x_{i+c}$, where $S^+$ is the number of positive value in $D_i$, and $S^-$ the number of negative value in $D_i$. When there are too many positive or negative signs, it is considered that there is a trend in the data. Under $H_0$, $D_i$ follows a binomial distribution, which can be converted into a sign test problem.

Thus, we calculated the p value is 0.08543314 > 0.05, so we accept $H_0$, i.e., there is no trend in RMSD after 2 ns, indicating that the fusion protein can maintain a relatively stable structure after 2 ns.

The 3ns molecular dynamics simulation results indicate that the overall conformation of the protein remains relatively stable throughout the simulation process, with no significant structural dissociation or drastic deformations. Trajectory analysis provided insights into the flexibility or rigidity of specific regions, which may correlate with the protein's function. These observations provide valuable references for subsequent experiments and further simulations, facilitating a deeper understanding of the fusion protein's behavior and potential applications.

Prediction of TRn25

The excessive length of TRn25's amino acid sequence posed challenges, rendering traditional prediction tools such as SWISS-MODEL, I-TASSER, and AlphaFold ineffective. The significant discrepancies among the predictions from different software highlighted the complexity of the task. Although Modeller's predictions ranked highly in its scoring system and aligned with the expected β-sheet-rich network structure, their similarity to the template raised concerns about reliability, complicating our efforts.

After reviewing relevant literature, we discovered that TRn25 is a highly flexible, mesh-like protein with an amorphous shape, which may explain why existing prediction tools are unable to accurately predict its structure. Since TRn25 consists of repeating TRn5 units, and TRn5 shares similar structural properties with TRn25 at the docking regions and sites, we chose to use TRn5 as a substitute for TRn25 in the fusion protein's docking process. We use AlphaFold to predict TRn5, and the results are shown in the Fig. 5.

Fig. 5 | TRn5 predicted by AlphaFold

Hypothesis and Exploration on the Amorphous Region in TRn4

While predicting TRn4 with various tools, we encountered a puzzling phenomenon: apart from the regions where β-sheet structures were expected based on previous research papers, the regions that were expected to be amorphous consistently displayed numerous unpredicted β-sheet. This anomaly has been a persistent challenge for our team. To clarify the structure of these amorphous regions, we discussed it with our wet-lab colleagues. After a thorough examination, we formulated the hypothesis that the amorphous regions might be influenced by surrounding proteins, leading to the formation of a certain number of β-sheet. To test this hypothesis, we designed and conducted predictive experiments aimed at validating our conjecture.

We extracted the sequences that were originally expected to be amorphous regions and replaced the sequences with 0GS linker, 1GS linker, 2GS linker, and GS linker of equal length to the original β-sheet region, respectively (Fig. 6a-d). Subsequently, we conducted four sets of experiments using AlphaFold, with TRn4 serving as the control group, to compare the structural changes that occurred in the amorphous regions. Our findings were as follows:The 0GS-linker group, where four repetitive sequences of the amorphous regions were directly concatenated without any linker, resulted in a completely disorganized structure; The result from the 1GS-linker group showed a slightly more organized but still disordered structure, without forming any β-sheet regions; The 2GS-linker group exhibited longer regions of regularity compared to the 1GS-linker group, with the gradual formation of β-sheet structures within the previously anticipated amorphous regions; As the length of the GS-linker increases, the amorphous regions gradually became more ordered and resemble β-sheet structures. Eventually, when the length of the GS-linker matches the length of the β-sheet regions in TRn4, the newly formed protein displayed a similar extent and fixed number of β-sheet structures in the amorphous regions as those found in TRn4 itself. The result is shown in Fig. 6d. This observation suggested that the structure of the amorphous regions is highly susceptible to the influence of surrounding sequences.

Fig. 6 | Predictions of different GS linker groups

This experiment comfirmed our hypothesis and enhanced the accuracy and rationality of predicting the molecular structure of fusion proteins. The increased presence of β-sheet structures increases the likelihood of hydrogen bond interactions between proteins, leading to the formation of more compact and stable macromolecular structures upon docking. The experimental results demonstrated that the biological protein material we have engineered possesses enhanced self-healing capabilities.

After predicting the structure of fusion protein and TRn5, we proceeded to predict their docking scenario. Given that β-sheet easily form hydrogen bonds among themselves, TRn25, which possesses abundant β-sheet structures, and fusion protein are likely to interact through hydrogen bonds. Consequently, we needed to predict their binding domain to validate Wet-lab design.

First, we tried to use GROMACS to dock a β-sheet chain with the fusion protein. The result is shown in the Fig. 7, but this docking does not match our reference, and the two proteins are not docked through the interaction of β-sheet. Then we docked the fusion protein with TRn5 using GRMMA, and the result is shown in Fig. 8. As we can see, TRn5 and the fusion protein are connected through the β-sheet, which is in line with our reference. Next, we will proceed with molecular dynamics simulations.

Fig. 7 | Docking of the fusion protein with a single β-sheet

Fig. 8 | Docking of the fusion protein with TRn5

The molecular dynamics simulation process and data analysis of the docking results are shown in Fig. 9 and Fig. 10, respectively. It can be seen that the overall structure of the binding proteins remains unchanged during the 4ns simulation, and the interaction between β-sheet remains stable. Next, we utilized the cox-Stuart trend test, just as we did before, to analyze the data after 3ns. and obtained the p-value is 0.2354395. This indicated that the docked proteins can remain stable in soft robot's working environment and maintain their respective biological functions.

Fig. 9 | Molecular dynamic simulation of fusion protein with TRn5

Fig. 10 | Stability analysis

Initially, we attempted to use Gaussian to calculate the solvation-free energy of our proteins, but found that Gaussian is only suitable for small molecules and has certain limitations for large molecules. Subsequently, we referred to the relevant content on the GROMACS website and developed a method to calculate the solvation-free energy of proteins through free energy perturbation, which is used to evaluate the hydrophilicity of proteins.

Firstly, we introduced a coupling parameter λ to smoothly transition between state A and state B. In state A (λ=0), all interactions between the protein and the solvent completely exist, such as van der Waals forces and electrostatic interactions, and the protein is fully solvated. As the value of λ increases, the interactions between the protein and the solvent gradually decrease. In state B (λ=1), these interactions are completely turned off, and the protein in the simulation is in a vacuum state or an approximate vacuum state. Here, we chose to use 15 λ points to describe the transition from state A to state B, collecting sufficient data to thoroughly sample the phase space and generate reliable ∂H/∂λ and ΔH curves. Where H represents the Hamiltonian, which is equal to the energy for an isolated system. Finally, we used GROMACS commands to obtain $\varDelta G_{AB}$ (The free energy change of the system from state A to state B, also known as the solvation free energy. If the energy is greater than zero, it means that the protein is more hydrophobic than hydrophilic).

Fig. 11 | Fusion protein in different states

The states of the fusion protein at λ=0 and λ=1 are shown in Fig. 11a, b. Although we altered the interaction forces between the fusion protein and water molecules, theoretically, the intramolecular interactions within the protein should remain constant. Due to changes in the environment (e.g., from solution to vacuum), the protein may undergo some degree of conformational changes. For instance, under fully desolvated conditions, the protein might partially unfold or adopt a looser structure. Nevertheless, these structural changes are generally not drastic since the intramolecular interactions within the protein exist, so the response to the altered environment may lead to subtle adjustments in secondary or tertiary structure. Finally, we obtain the free energy of the fusion protein to be $8603.48±35.15KJ·mol^{-1}$, indicating that it has strong hydrophobicity.

  1. Cohen FE, Sternberg MJ, Taylor WR. Analysis and prediction of protein beta-sheet structures by a combinatorial approach. Nature. 1980; 285(5764): 378-382.
  2. Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024; 630(8016): 493-500.
  3. Waterhouse A, Bertoni M, Bienert S, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018; 46(W1): W296-W303.
  4. Drake ZC, Seffernick JT, Lindert S. Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling. Nat Commun. 2022; 13(1): 7846.