Model | OUC-Haide - iGEM 2024

Cell Wall Anchoring Design

Goals

We searched for and optimized the cell wall anchoring sequence in Aureobasidium melanogenum BZ-11 to achieve effective protein secretion and stable cell wall anchoring. This involved identifying and predicting the N-terminal signal peptide and C-terminal anchoring sequence, ensuring minimal polarity changes and correct cleavage site localization, followed by iterative modifications of the sequence.

Sequence Alignment

The cell wall anchoring sequence consists of an N-terminal signal peptide responsible for protein secretion and a C-terminal anchoring sequence that attaches the protein to the cell wall. By aligning sequences and predicting conserved domains against the cell wall proteins of Yarrowia lipolytica (CWP) ^[1] and Ogataea angusta (Sed, Tip) ^[2], we identified the corresponding homologous cell wall anchoring protein sequences in A. melanogenum BZ-11, referred to as AM.CWP, AM.Sed, and AM.Tip. We then predicted the N-terminal signal peptides and C-terminal anchoring regions of these sequences using SignalP and the big-PI Fungal Predictor website.

Following our instructor's advice, we considered that in the secretory pathway the N-terminal signal peptide is cleaved and the C-terminal anchoring sequence is added, and that changes in folding and polarity near the junction with the target protein can shift the cleavage site. To optimize the signal peptide sequence, we fused the predicted N-terminal and C-terminal sequences with the target protein and used SignalP to predict the cleavage site of the fusion. A shifted cleavage site indicated that polarity changes were affecting the cleavage point. By iteratively adding amino acids and comparing the modified sequences with the target protein's recombinant sequence, we arrived at a sequence that minimizes polarity changes, providing crucial support for subsequent wet-lab experiments.
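The iterative check described above can be sketched in Python. This is a minimal illustration, not the team's actual workflow: SignalP was used through its website, so the predictor here is a hypothetical callable `predict_cleavage` that returns the predicted cleavage position for a sequence. The function names and the idea of taking the added residues from the native sequence immediately downstream of the signal peptide are assumptions made for the example.

```python
from typing import Callable, Optional


def build_fusion(signal_peptide: str, spacer: str, target: str, anchor: str) -> str:
    """Assemble signal peptide + added residues + target protein + C-terminal anchor."""
    return signal_peptide + spacer + target + anchor


def find_minimal_spacer(
    signal_peptide: str,
    native_downstream: str,
    target: str,
    anchor: str,
    predict_cleavage: Callable[[str], int],
    max_added: int = 10,
) -> Optional[str]:
    """Add native residues one at a time until the predicted cleavage site
    falls directly after the signal peptide again.

    predict_cleavage is a stand-in (assumption) for a SignalP prediction:
    it must return the number of residues removed from the N-terminus.
    Returns the shortest spacer that restores the site, or None.
    """
    expected_site = len(signal_peptide)
    limit = min(max_added, len(native_downstream))
    for n_added in range(limit + 1):
        spacer = native_downstream[:n_added]
        fusion = build_fusion(signal_peptide, spacer, target, anchor)
        if predict_cleavage(fusion) == expected_site:
            return spacer
    return None
```

In practice each candidate fusion would be submitted to SignalP and the reported cleavage position fed back into the loop by hand or through a small wrapper.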

Fig. 1: Original, EGFP, VHb signal peptide prediction for AM.CWP

Since we need to use EGFP in subsequent experiments, we predicted the signal peptide sequences for both EGFP and VHb (Fig. 1). The results show that the EGFP construct carries three additional amino acids, ‘AYI’, compared to the original sequence, while the VHb construct carries four additional amino acids, ‘AYIN’. These added amino acids keep the predicted cleavage site in place, which we expect to preserve the stability and functionality of the proteins.

We made the same predictions for AM.Sed and AM.Tip.

Summary

By comparing against the sequences of Yarrowia lipolytica (CWP) and Ogataea angusta (Sed, Tip), we identified the homologous cell wall anchoring sequences in A. melanogenum BZ-11. Using the SignalP and big-PI Fungal Predictor tools, along with guidance from our instructor, we predicted the N-terminal signal peptides and C-terminal anchor regions for these sequences fused to VHb and EGFP, providing crucial support for subsequent wet-lab experiments.

Pre-Treatment Structure

Goals

We aimed to ensure the accuracy and reliability of the models of VHb and its modified variants. We repaired the missing residues in the wild-type VHb and used AlphaFold3 to predict the structures of the modified proteins. The reliability of the predicted structures was verified with multidimensional assessment tools, and the signal peptide was removed to reflect the actual state of the protein. This process provides a high-quality protein structure model for further analysis and experimentation.

Repair Residue

We used the wild-type VHb as the base protein, but found that some residues were missing from its structure. We used UCSF Chimera to rebuild the missing residues in VHb and selected the optimal conformation as the output.

AlphaFold3 Prediction

We first tried SWISS-MODEL for homology modeling, but found that the cell wall anchoring sequence added to the protein could not be modeled. We therefore switched to AlphaFold3 to predict the structures of the three modified proteins, AM.CWP, AM.Sed, and AM.Tip, selecting the results with the highest confidence as the output (Fig. 2).

Fig. 2: AlphaFold3 prediction results (Blue indicates higher confidence, orange indicates lower confidence)

To better assess the reliability of the predicted structures, we used the SAVES website for protein evaluation. It assesses the structure along four dimensions:
1. ERRAT (identifies non-bonded interactions and detects spatial arrangement errors)
2. Verify3D (evaluates the compatibility of each amino acid with its environment)
3. WHATCHECK (assesses geometric structure and physical properties)
4. PROCHECK (evaluates the rationality of backbone and side-chain conformations)

Among these, only Verify3D resulted in a "Fail", which we tentatively attribute to the unstable structure of the linear region of the model. Overall, we consider the AlphaFold3 results acceptable.

Remove Signal Peptide

Since the signal peptide is removed during protein processing, we excluded the N-terminal signal peptide from the molecular model so that the structure resembles the actual state of the protein as closely as possible.
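As an illustration of this trimming step, the sketch below drops the signal peptide residues from a structure file. It assumes the model has been saved in fixed-column PDB format (AlphaFold3 output would first need converting from mmCIF) and that residues are numbered from 1. The function name `strip_signal_peptide` and its parameters are hypothetical.

```python
def strip_signal_peptide(pdb_text: str, signal_length: int, chain: str = "A") -> str:
    """Remove ATOM records of the first `signal_length` residues of one chain.

    Relies on the fixed PDB columns: chain ID in column 22 and residue
    number in columns 23-26. Other records (HETATM, TER, END, ...) are kept.
    Residues are not renumbered.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
            residue_number = int(line[22:26])
            if residue_number <= signal_length:
                continue  # residue belongs to the cleaved signal peptide
        kept.append(line)
    return "\n".join(kept)
```

The `signal_length` argument would be the cleavage position reported by SignalP for the construct in question.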

Summary

We repaired the missing residues in the wild-type VHb to ensure the model's integrity and used AlphaFold3 to predict the structures of three modified proteins (AM.CWP, AM.Sed, AM.Tip), selecting the results with the highest confidence. To validate the predictions, we conducted a multidimensional assessment. We also removed the signal peptide from the protein to reflect its structure in the actual processed state. These steps provide a reliable foundation for subsequent structural analysis and experiments.

Molecular Docking

Goals

Hemoglobin, as a core protein for oxygen transport, relies on heme binding for its function. This binding is what allows the VHb protein to carry oxygen efficiently within the cell. To make sure the sequences we add do not irreversibly disrupt the assembly of VHb, and to select the most stable combination, we performed molecular docking between the VHb protein and heme. The aim is to assess the binding affinity of the three modified proteins (AM.CWP, AM.Sed, AM.Tip) for heme in order to determine which protein is likely to show the strongest activity. This helps us understand how well the modified proteins bind heme and provides suggestions for subsequent structural optimization and experimental validation.

Molecular Docking

We performed molecular docking using the three proteins predicted by AlphaFold3 (AM.CWP, AM.Sed, AM.Tip) as well as the wild-type, evaluating VHb activity through the docking affinity between VHb protein and heme. This allowed us to determine which modified protein exhibits the strongest activity and incorporate it into our system.

First, we used the wild-type protein to identify the docking pocket, improving docking efficiency and accuracy. In the AM.Tip protein model, we discovered that the C-terminal anchoring sequence was embedded in the active pocket of the VHb protein (Fig. 3).

Fig. 3: AM.Tip prediction results(Yellow represents the anchoring sequence, red indicates the incorrectly predicted sequence, and blue represents the sequence near the active pocket)

Although the SAVES website did not report a serious structural error, we do not consider this model acceptable. We speculate that the C-terminal structure happens to match the shape of the active pocket, which also suggests that the C-terminal sequence may interfere with the binding of the VHb protein to heme. This observation may be useful for future structural optimization and for other iGEM teams. We then repeated the previous cycle and proceeded to molecular docking.

We used AutoDock Vina for the simulation. To validate the protocol, we first redocked heme into the wild-type VHb and compared the docked pose with the actual structure by calculating the RMSD.

RMSD (Root Mean Square Deviation) assesses structural similarity by measuring the deviation between two sets of atomic coordinates:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert r_i - r_{i,0}\right\rVert^{2}}$$

where r_i is the position of atom i at a particular moment in time, r_{i,0} is the position of the same atom in the reference structure, and N is the total number of atoms.

Fig. 4: Wild-type VHb molecular docking results

The calculated RMSD is 0.16 Å, well below the 2 Å threshold, indicating that our docking protocol is reliable and can be applied to the modified proteins.
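For readers who want to reproduce this kind of redocking check, the sketch below computes the RMSD between a docked pose and a reference pose without any superposition, which is the usual convention when both poses sit in the same receptor frame. It is a minimal illustration using plain coordinate lists. The function name `rmsd` and the assumption that atoms are listed in the same order in both poses are ours.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation between two equally ordered coordinate sets (in Å)."""
    if len(pose) != len(reference) or len(pose) == 0:
        raise ValueError("pose and reference must be non-empty and the same length")
    squared_sum = 0.0
    for (x, y, z), (x0, y0, z0) in zip(pose, reference):
        squared_sum += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
    return math.sqrt(squared_sum / len(pose))


def redocking_ok(pose: Sequence[Coord], reference: Sequence[Coord], cutoff: float = 2.0) -> bool:
    """Apply the 2 Å acceptance criterion used in the text."""
    return rmsd(pose, reference) < cutoff
```

Only the heavy atoms of heme would normally be included, since hydrogens are often placed differently by preparation tools.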

Next, we performed molecular docking of the three proteins (AM.CWP, AM.Sed, AM.Tip) with heme, applying multiple screening criteria, including heme position, orientation of the propionic acid groups, docking interactions, and binding affinity, among others.

After repeated screening and re-docking, we finally selected the most favorable docking results.

Analysis Result

Fig. 5: AutoDock Vina docking results (Red dashed lines: metal coordination, blue dashed lines: hydrophobic interactions, yellow solid lines: hydrogen bonds)

Fig. 6: Docking affinity results

The affinities of the modified proteins are very close to one another, but all are lower than the affinity between the wild-type protein and heme (Fig. 6). This indicates that adding the anchor sequence does affect heme binding, and therefore likely the activity, of the VHb protein, which is a useful caution for future iGEM teams. The affinity of AM.Tip is higher than that of AM.Sed and AM.CWP, but because the structure prediction for AM.Tip is questionable, we cannot determine the best candidate based on affinity alone. Further wet-lab experiments are required to settle this.

Summary

We redocked the wild-type VHb and obtained a low RMSD, confirming the reliability of the docking process and providing a solid foundation for subsequent docking analyses. We then docked the three AlphaFold3-predicted modified proteins (AM.CWP, AM.Sed, AM.Tip) with heme. By evaluating factors such as affinity, binding site, docking interactions, and binding energy, we found that all modified proteins had lower affinities than the wild-type VHb, indicating that the added anchor sequences affect the binding between VHb and heme. Although AM.Tip showed higher affinity than AM.CWP and AM.Sed, its structure prediction is questionable, so we cannot determine the best candidate based on affinity alone. Further wet-lab experiments are required to verify the most suitable protein sequence for use in the system.

Molecular Dynamics Simulation

Goals

We will perform molecular dynamics (MD) simulations on the optimal protein selected from wet-lab experiments. MD simulation is a computational method used to study the time-dependent behavior of molecules. It allows us to dynamically monitor protein behavior at the atomic level under different conditions, such as conformational changes, molecular interactions, and solvent effects. Compared to molecular docking, MD simulations provide more realistic dynamic information. We will evaluate the stability of the protein and its interaction with heme using parameters like RMSD and RMSF, and analyze how residue fluctuations impact protein function and stability.

GROMACS Simulation

After screening through wet-lab experiments, we selected AM.CWP as the optimal protein for expression. We decided to use MD simulations to dynamically monitor various parameters of the AM.CWP protein. Our system contains a metal center, and although some force fields provide parameters for transition metals, setting them up can be complex. After extensive exploration and comparison, we chose CHARMM-GUI ^[3] to generate our force field files, which saved us a significant amount of time. We then used GROMACS for the final MD simulation calculations.

Analysis Result

We performed a 10,000 ps (10 ns) MD simulation of the protein and analyzed the RMSD and RMSF.
RMSD and RMSF are important parameters used to analyze the dynamic behavior and structural stability of proteins or other large molecules. They are crucial in understanding conformational changes and mobility during the simulation process.

Fig. 7: RMSD fluctuation graph

We calculated the RMSD for both the protein and heme (Fig. 7). From the graphs, we can see that after 2500 ps heme stabilizes, with the RMSD fluctuating around 1 Å, indicating stable binding. The RMSD of the protein, however, varied considerably, reaching up to 5 Å. This suggests that the conformation of the protein changes quite dramatically, and we investigate the reasons for this below.
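The plateau described above can be quantified rather than read off the graph. The sketch below parses the text of an RMSD `.xvg` file and reports the mean and standard deviation after a chosen equilibration time. It assumes the usual GROMACS layout (comment and header lines starting with `#` or `@`, then a time column in ps and an RMSD column in nm). The function names and the 2500 ps cutoff passed in the usage are illustrative.

```python
import statistics
from typing import List, Tuple


def parse_xvg(text: str) -> List[Tuple[float, float]]:
    """Return (time, value) pairs from .xvg text, skipping '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def plateau_stats(rows: List[Tuple[float, float]], start_ps: float,
                  nm_to_angstrom: float = 10.0) -> Tuple[float, float]:
    """Mean and population standard deviation (in Å) of values at or after start_ps."""
    values = [value * nm_to_angstrom for time, value in rows if time >= start_ps]
    if not values:
        raise ValueError("no data points at or after start_ps")
    return statistics.mean(values), statistics.pstdev(values)
```

A call such as `plateau_stats(parse_xvg(open("rmsd_heme.xvg").read()), 2500.0)` (file name hypothetical) would give the average heme RMSD over the stable part of the trajectory.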

RMSF (Root Mean Square Fluctuation): Unlike RMSD, which measures the overall deviation of the molecule relative to the reference structure, RMSF focuses on the local fluctuations of each atom or residue, reflecting the flexibility or mobility of specific regions of the molecule:

$$\mathrm{RMSF}_i = \sqrt{\frac{1}{T}\sum_{j=1}^{T}\left\lVert x_i(t_j) - X_i\right\rVert^{2}}$$

where T is the number of simulation time steps, x_i(t_j) is the position of atom i at time step j, and X_i is the average position of atom i over the simulation. RMSF thus reports the average fluctuation of individual residues during the simulation. A high RMSF value indicates greater flexibility in that region, which can help locate functionally important regions such as active sites.
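A direct implementation of this definition is sketched below. It is a minimal illustration in plain Python and assumes the trajectory frames have already been fitted to a common reference, so that overall rotation and translation do not inflate the values. The function name `rmsf_per_atom` and the data layout (a list of frames, each a list of coordinate tuples) are our own choices.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf_per_atom(trajectory: Sequence[Sequence[Coord]]) -> List[float]:
    """Compute the RMSF of every atom over all frames of a fitted trajectory."""
    n_frames = len(trajectory)
    if n_frames == 0:
        raise ValueError("trajectory has no frames")
    n_atoms = len(trajectory[0])
    if any(len(frame) != n_atoms for frame in trajectory):
        raise ValueError("all frames must contain the same number of atoms")

    result = []
    for i in range(n_atoms):
        # Average position X_i of atom i over the simulation.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Sum of squared distances from the average position.
        squared_sum = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        )
        result.append(math.sqrt(squared_sum / n_frames))
    return result
```

Per-residue values are usually obtained by applying this to the C-alpha atoms only, one atom per residue.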

Fig. 9: RMSF fluctuation graph

We performed RMSF calculations for the protein, and the results show significant fluctuations near residue 185, reaching up to 7 Å, indicating high flexibility in this region (Fig. 9). However, this is not our active site. Based on the RMSD results and a comparison of the protein structures, we speculate that this fluctuation is caused by our linear anchoring sequence.

We also visualized the 10 ns trajectory of the protein complex to assist with data observation. The video shows that the factor affecting protein stability the most is the added C-terminal sequence, which is consistent with our hypothesis. Because it is in a linear form, it is highly unstable during the dynamics simulation. We do not expect this instability in our actual biological system, where the sequence binds to the cell wall and its conformation is stabilized.

Summary

Through a 10 ns MD simulation of the AM.CWP protein in complex with heme, we found that heme binding is highly stable, with the heme RMSD fluctuating around 1 Å. The overall conformational changes of the protein are significant, however, with a higher RMSD that we attribute mainly to the added linear anchoring sequence. RMSF results indicate significant fluctuations near residue 185. Based on visual observations and comparisons, we hypothesize that this instability will be mitigated in the actual biological system by binding to the cell wall, which supports our model and provides theoretical support for the wet-lab experiments.

References

[1] Jaafar, L., & Zueco, J. (2004). Characterization of a glycosylphosphatidylinositol-bound cell-wall protein (GPI-CWP) in Yarrowia lipolytica. Microbiology, 150(1), 53–60. https://doi.org/10.1099/mic.0.26430-0
[2] Fujii, T., Shimoi, H., & Iimura, Y. (1999). Structure of the glucan-binding sugar chain of Tip1p, a cell wall protein of Saccharomyces cerevisiae. Biochimica et Biophysica Acta (BBA) - General Subjects, 1427(2), 133–144. https://doi.org/10.1016/s0304-4165(99)00012-4
[3] Brooks, B. R., Brooks, C. L., Mackerell, A. D., Nilsson, L., Petrella, R. J., Roux, B., Won, Y., Archontis, G., Bartels, C., Boresch, S., Caflisch, A., Caves, L., Cui, Q., Dinner, A. R., Feig, M., Fischer, S., Gao, J., Hodoscek, M., Im, W., & Kuczera, K. (2009). CHARMM: The biomolecular simulation program. Journal of Computational Chemistry, 30(10), 1545–1614. https://doi.org/10.1002/jcc.21287
