Modeling

How can our system impact the life cycle of E. coli?

Because glycosylation will take place in the periplasm of our engineered E. coli strain, we must consider the off-target insertion of glycans into native proteins that are also transported to the periplasm by the SEC system. Since oligosaccharyltransferases (OSTs) act on unfolded proteins, we do not need to worry about proteins transported to the periplasm by the Tat system, which only transports already folded proteins.

To identify the proteins that could be glycosylated by our system once it is introduced into the E. coli BL21(DE3) strain, we compiled all the coding sequences (CDS) from this strain’s genome (NCBI Reference Sequence: NZ_CP053602.1). We then searched for proteins containing one or more motifs of the type “N[^P][ST]”, that is, an asparagine followed by any residue except proline and then by a serine or threonine. This search yielded a FASTA file containing 2,565 proteins, each with at least one candidate glycosylation site.
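The motif search described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the project: the function names (`parse_fasta`, `sequon_positions`, `filter_glycosylatable`) and the three example sequences are hypothetical, and the only element taken from the text is the pattern N[^P][ST].

```python
import re

# Pattern from the text: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sites (e.g. "NNSS") be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]  # keep only the identifier
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def sequon_positions(seq):
    """1-based positions of the Asn residue in every motif match."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def filter_glycosylatable(records):
    """Keep only the proteins with at least one motif match."""
    return {h: s for h, s in records.items() if sequon_positions(s)}


# Hypothetical toy proteome, for illustration only.
example = """>protA hypothetical
MKNATLLNPSA
>protB hypothetical
MKLLPPGA
>protC hypothetical
MNNSSK
"""

records = parse_fasta(example)
hits = filter_glycosylatable(records)
for name, seq in hits.items():
    print(name, sequon_positions(seq))
```

In this toy input, protA keeps the site at N3 but not the N-P-S stretch at N8, protB has no asparagine at all, and protC has two overlapping sites.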

Since our goal is to identify only the proteins that can be transported to the periplasm by the SEC system, we used the SignalP 5.0 software (1) to search for signal peptides recognized by this system. We used as input the glycosylation motif-containing proteins from E. coli BL21(DE3).

The table resulting from the signal peptide search was converted into a Pandas DataFrame (2), from which we kept the proteins predicted as SP (Sec/SPI) or LIPO (Sec/SPII), the two types of signal peptide that direct proteins to the SEC system. After filtering, we obtained a list of 410 proteins that possess candidate glycosylation sites and are predicted to be translocated into the periplasm by the SEC system.
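The filtering step can be illustrated with a short pandas sketch. It assumes the tab-separated summary layout that SignalP 5.0 typically writes (one comment line, then a header line starting with "# ID" and a "Prediction" column); the mock table, the protein identifiers and the function name `sec_substrates` are hypothetical, so the column names and labels should be checked against the actual output file.

```python
import io

import pandas as pd

# Mock summary table, assumed to mimic the SignalP 5.0 layout.
# All identifiers and probabilities are invented for illustration.
MOCK_SUMMARY = (
    "# SignalP-5.0\tOrganism: gram-\n"
    "# ID\tPrediction\tSP(Sec/SPI)\tTAT(Tat/SPI)\tLIPO(Sec/SPII)\tOTHER\tCS Position\n"
    "protA\tSP(Sec/SPI)\t0.95\t0.01\t0.03\t0.01\tCS pos: 21-22\n"
    "protB\tOTHER\t0.02\t0.01\t0.01\t0.96\t\n"
    "protC\tLIPO(Sec/SPII)\t0.04\t0.01\t0.94\t0.01\tCS pos: 18-19\n"
    "protD\tTAT(Tat/SPI)\t0.03\t0.93\t0.02\t0.02\tCS pos: 30-31\n"
)

# Prediction labels that correspond to SEC-dependent signal peptides.
SEC_LABELS = {"SP(Sec/SPI)", "LIPO(Sec/SPII)"}


def sec_substrates(handle):
    """Load a summary table and keep only the Sec/SPI and Sec/SPII rows."""
    df = pd.read_csv(handle, sep="\t", skiprows=1)  # skip the first comment line
    df = df.rename(columns={"# ID": "ID"})
    return df[df["Prediction"].isin(SEC_LABELS)].reset_index(drop=True)


sec_df = sec_substrates(io.StringIO(MOCK_SUMMARY))
print(sec_df[["ID", "Prediction"]])
```

With a real run, the same function could be given the path to the summary file instead of the in-memory mock.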

With these data, we intend to determine which of these proteins may have their functions altered by the glycans that could be added to them.

REFERENCES

  1. José Juan Almagro Armenteros, Konstantinos D. Tsirigos, Casper Kaae Sønderby, Thomas Nordahl Petersen, Ole Winther, Søren Brunak, Gunnar von Heijne and Henrik Nielsen. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nature Biotechnology, 37, 420-423, doi:10.1038/s41587-019-0036-z (2019)
  2. The pandas development team. (2024). pandas-dev/pandas: Pandas (v2.2.3). Zenodo. https://doi.org/10.5281/zenodo.13819579

Introduction to protein molecular dynamics

Molecular dynamics (MD) simulations are used to predict how each atom of a macromolecule will move over time. These predictions are based on physical laws that, combined, govern the movement of the system. The laws are encoded in a force field, whose terms are generally divided into bonded (or intramolecular) and non-bonded (or intermolecular) interactions [1, 2]. Each interaction is modeled with an appropriate function, whose parameters are fitted to experimental data or quantum calculations. Bonded interactions describe atoms that are directly connected by chemical bonds and comprise the bond stretching energy (modeled as a harmonic function), the bond angle energy (the angular deformation between three atoms connected by two bonds), and the torsional or dihedral energy (which describes rotation around a chemical bond). Non-bonded interactions, in contrast, describe forces between atoms or molecules that are not directly chemically bonded, namely van der Waals and electrostatic interactions. Van der Waals interactions are described by the Lennard-Jones energy, while electrostatic interactions are calculated from Coulomb's law.
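The energy terms listed above can be written out as small Python functions. This is a sketch of common textbook forms rather than the force field used in our simulations: the harmonic angle term and the periodic cosine dihedral term are assumptions (the text only names these contributions), the factor 1/2 in the harmonic terms varies between force fields, and every function and parameter name is hypothetical. Units follow a nm, kJ/mol and elementary-charge convention.

```python
import math

# Coulomb constant 1/(4*pi*eps0) in kJ mol^-1 nm e^-2.
COULOMB_CONSTANT = 138.935458


def harmonic_bond(r, r0, k):
    """Bond stretching: harmonic penalty around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def harmonic_angle(theta, theta0, k):
    """Angle bending (assumed harmonic); angles in radians."""
    return 0.5 * k * (theta - theta0) ** 2


def periodic_dihedral(phi, k, n, phi0):
    """Torsion (assumed periodic cosine form) with multiplicity n."""
    return k * (1.0 + math.cos(n * phi - phi0))


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: 12-6 Lennard-Jones potential."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def coulomb(r, q1, q2):
    """Electrostatic term from Coulomb's law for point charges q1 and q2."""
    return COULOMB_CONSTANT * q1 * q2 / r


# Illustrative (non-physical) parameter values.
sigma, epsilon = 0.34, 0.36
r_min = 2 ** (1 / 6) * sigma  # distance at which the LJ energy is lowest
print(lennard_jones(r_min, epsilon, sigma))
print(coulomb(1.0, 1.0, -1.0))
```

The total potential energy of a configuration is the sum of these terms over all bonds, angles, dihedrals and non-bonded atom pairs; the forces used to propagate the atoms are its negative gradient.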

At the molecular level, MD simulations provide detailed insights into important biomolecular processes, such as conformational changes, ligand interactions, and protein folding. Furthermore, MD can predict how biomolecules will respond, at the atomic level, to perturbations such as mutations, post-translational modifications (such as phosphorylation and glycosylation), protonation, or the addition or removal of a ligand. These simulations are often combined with a wide range of experimental structural biology techniques such as X-ray crystallography, cryo-electron microscopy (cryo-EM), nuclear magnetic resonance (NMR), electron paramagnetic resonance (EPR), and Förster resonance energy transfer (FRET).

We applied MD to the study of wild-type β-glucocerebrosidase (GCaseWT) and two mutants, N19E (GCaseMut1) and N19D (GCaseMut2). Our study focused on the stability of glycosylated GCaseWT and its mutants, maintaining Man3GlcNAc2 glycans at the N59, N146, and N270 residues, as seen in the workflow (Figure 1).

Figure 1. Workflow of this study. Homology modeling and molecular dynamics simulations of the GCaseWT and its variants N19E and N19D.

To construct accurate structural models, the imiglucerase structure obtained from X-ray crystallography (PDB ID 2J25) was used as a template in the SWISS-MODEL web server. Mutant structures were generated by introducing the N19E and N19D substitutions into the sequence submitted to SWISS-MODEL. Glycans were added using the CHARMM-GUI web server (https://www.charmm-gui.org/). The system was equilibrated using NVT and NPT ensembles, followed by a 100 ns molecular dynamics simulation. The molecular dynamics results are shown in Figures 2 and 3.

Figure 2. Radius of gyration (Rg) and root-mean-square deviation (RMSD) analysis for GCase variants. The plot shows the Rg and RMSD for GCase Wild Type (WT), Mutant 1 (Mut1), and Mutant 2 (Mut2) over the simulation time. The Rg is used to evaluate the compactness and overall structural stability of the proteins, where consistent values indicate stable conformations. The RMSD measures structural deviation from the initial configuration, with smaller fluctuations indicating higher stability. Together, these metrics offer insight into the conformational stability of the GCase variants during the simulation.

Figure 3. Root-mean-square fluctuation (RMSF) analysis. The RMSF values for GCase Wild Type (WT), Mutant 1 (Mut1), and Mutant 2 (Mut2) are presented, illustrating the backbone fluctuations within each variant over the simulation period. RMSF measures how far each position moves around its average location, indicating regions of stability and flexibility within the protein structure. Higher RMSF values indicate more flexible regions, while lower values indicate more rigid ones.
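The three metrics in Figures 2 and 3 have simple definitions, sketched below with NumPy on a synthetic trajectory. This is an illustration only, not the analysis pipeline behind the figures: it assumes the frames have already been superimposed on the reference structure (no fitting is performed here), the random coordinates are invented, and the function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return np.sqrt(np.average(sq_dist, weights=masses))


def rmsd(coords, reference):
    """RMS deviation of one frame from a reference; assumes prior alignment."""
    diff = coords - reference
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))


def rmsf(trajectory):
    """Per-atom RMS fluctuation around the time-averaged position.

    trajectory has shape (n_frames, n_atoms, 3).
    """
    mean_pos = trajectory.mean(axis=0)
    sq_dev = np.sum((trajectory - mean_pos) ** 2, axis=2)
    return np.sqrt(sq_dev.mean(axis=0))


# Synthetic data: 100 frames of 50 "atoms" jittering around a reference.
rng = np.random.default_rng(0)
reference = rng.normal(scale=10.0, size=(50, 3))
trajectory = reference + rng.normal(scale=0.5, size=(100, 50, 3))
masses = np.ones(50)

rg_series = np.array([radius_of_gyration(f, masses) for f in trajectory])
rmsd_series = np.array([rmsd(f, reference) for f in trajectory])
rmsf_per_atom = rmsf(trajectory)
print(rg_series.mean(), rmsd_series.mean(), rmsf_per_atom.mean())
```

Rg and RMSD yield one value per frame (a time series, as in Figure 2), whereas RMSF yields one value per atom or residue (a profile along the sequence, as in Figure 3).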

Overall, our results indicate that the mutants exhibited a stable radius of gyration throughout the simulation, which suggests that the predicted and simulated structures are close to the structures these proteins would actually adopt. The RMSD values of the three proteins were comparable, allowing us to conclude that the apo state of GCase and its mutants shows good conformational stability. The minor variations observed, which did not exceed 3 Å, are attributable to statistical fluctuations, further supporting the proteins' stable conformational behavior.

In addition to good stability, we also observed that the average fluctuations of the residues were preserved regardless of the mutations. This suggests that the protein may retain its functionality even in the presence of these alterations. The movies show the conformational changes of GCaseWT and its mutants N19E and N19D. However, further experiments are essential to validate these results. Experimental studies would show whether the enzyme and its mutants bind their substrate with comparable affinity, which would support the hypothesis that the protein's functionality is not compromised by these mutations. This investigation is key for understanding the functional implications of the mutations and their potential impact on biotechnological and therapeutic applications.

REFERENCES

[1] Schaffer LV, Ideker T. Mapping the multiscale structure of biological systems. Cell Syst. 2021 Jun 16;12(6):622-635. doi: 10.1016/j.cels.2021.05.012. PMID: 34139169; PMCID: PMC8245186.

[2] Sinha S, Tam B, Wang SM. Applications of Molecular Dynamics Simulation in Protein Study. Membranes (Basel). 2022 Aug 29;12(9):844. doi: 10.3390/membranes12090844. PMID: 36135863; PMCID: PMC9505860.