Molecular simulation
1 Introduction
Molecular simulation is a computational technique based on theoretical chemistry and classical mechanics, allowing for the quantitative prediction of molecular structures, properties, and chemical reactions that are often difficult to determine through experimental approaches. It is widely used to study molecular interactions and their dynamic behavior under various conditions. In this project, wet-lab experiments have demonstrated that RUP2 can competitively bind to UVR8, thereby displacing COP1 and facilitating the export of UVR8 from the nucleus, leading to negative feedback regulation of target protein expression. However, the underlying molecular mechanism responsible for this phenomenon remains unclear. Therefore, we aim to calculate the binding free energies of the UVR8-COP1 and UVR8-RUP2 protein complexes and compare the magnitudes of these values to provide a thermodynamic explanation for the observed experimental results.
Based on preliminary discussions and analyses, we have decided to use Gromacs 2024.31for molecular dynamics simulations of five proteins. After identifying the stable periods during the dynamics simulations, we will use the gmxMMPBSA tool2 to compute the MMPBSA binding free energies of the two protein complexes. By comparing their respective free energy values, we hope to offer theoretical insights into the molecular basis of the experimental observations.
2 Molecular Dynamics Simulation
Molecular dynamics (MD), as one of the core techniques in molecular simulations, is a method based on Newtonian mechanics to simulate the motion of molecular systems. By numerically integrating Newton's equations of motion, MD simulates the time evolution of molecules, enabling the analysis of thermodynamic quantities and other macroscopic properties of the system. It allows us to observe how molecules move and interact under varying environmental conditions, such as changes in temperature and pressure, thus aiding in the elucidation of the microscopic mechanisms underlying macroscopic physical phenomena.
2.1 Dynamic Simulation Process
The molecular dynamics simulation process for protein molecules can be summarized in the following steps:
- Acquisition and Preprocessing of Protein Files
First, we search for the required protein information on the UniProt website3. After identifying the appropriate entry, we download the corresponding protein's PDB file from the RCSB PDB database4. The PDB file is opened with PyMOL5, and solvent molecules, ions, and ligand molecules are removed. The modified PDB file is then processed with PDBFixer6to repair missing atoms in certain residues, resulting in a fully preprocessed PDB file.
- Conversion from PDB to GRO Files
Next, the preprocessed PDB file is imported into GROMACS and converted into GROMACS-compatible files: gro, itp, and top. After several trials, we selected the CHARMM27 force field7 for all five protein systems. Additionally, considering the real solution environment in the cytoplasm and nucleus, we maintained the terminal and side chain amino and carboxyl groups in their -NH3+ (Pro-NH2+) and -COO- forms to enhance the binding strength between proteins.
- Solvent and Ion Filling
Then, the TIP3P water model8 was selected to create the solvent environment. Given the significantly larger volume of protein complexes compared to monomeric proteins, which can easily leave the solvent box during simulation and destabilize the system, we created a dodecahedral solvent box for all proteins, ensuring a 1.5 nm distance between the protein and the box surface. After filling the solvent, we noticed that the proteins carried several positive/negative charges. Considering that the simulated environment represents the cytoplasm of human skin cells, we added NaCl to the solvent box to neutralize the system, ensuring a net charge of zero.
- Energy Minimization and System Equilibration
Subsequently, we performed energy minimization, NVT, and NPT equilibration steps9 using the em.mdp, nvt.mdp, and npt.mdp parameter files, respectively. By monitoring potential energy, pressure, and temperature changes after each step, we continuously adjusted the parameters in the mdp files until the system reached a fully equilibrated state.
- Molecular Dynamics Simulation
After all preprocessing and equilibration steps were completed, we carried out molecular dynamics simulations on the optimized gro files. A preliminary short-term simulation was conducted to observe the molecular trajectories. Based on the outcomes of this preliminary simulation, the parameters in the md.mdp file were refined. Once the short-term simulation results met our expectations, we proceeded with a 100-ns long molecular dynamics simulation.
- Correction of Simulation Trajectories
Upon completion of the 100-ns simulation, we obtained the simulated molecular trajectory file (xtc). The raw trajectory file was then processed by applying periodic boundary conditions to eliminate edge effects, removing translational and rotational motions of the protein, and smoothing the trajectory to better reflect real-world behavior. After these corrections, a new trajectory file was generated for subsequent analysis.
- Analysis of Simulation Results
Finally, we extracted the molecular motion data from the corrected trajectory file. The first frame of the protein conformation was chosen as a reference structure for the simulation. We calculated the root mean square deviation (RMSD) over the 100-ns period to assess whether the protein reached a stable state10. Additionally, we generated a Ramachandran plot to display the dihedral angle distribution of all residues in the protein, evaluating the structural reasonability of the conformation.

Fig.1 Modeling Process of Molecular Simulation
2.2 Display and Analysis
After completing the molecular dynamics simulations for all five proteins, we generated several plots: the potential energy variation curves during energy minimization, the RMSD (Root Mean Square Deviation) variation curves and their probability density distribution during the dynamics simulations, and the Ramachandran plots for the two protein complexes.
From the potential energy variation curves, it is clear that each protein successfully underwent energy minimization and reached a lower potential energy state, which is favorable for subsequent molecular dynamics simulations.
Fig.2 Energy Minimization Energy with Time
The RMSD variation curves reveal that all five proteins reached a stable state in the given simulation environment, with the RMSD values gradually converging to a steady level. The RMSD values for the two protein complexes are also concentrated, reflecting the stability of the protein conformations.
Fig.3 The Fluctuation of RMSD During the 100 ns Dynamics Simulation Process.
Fig.4 The RMSD Fluctuation Distribution of UVR8-COP1 and UVR8-RUP2.
The Ramachandran plots 11for the two protein complexes show that most backbone dihedral angles are primarily distributed in the upper left region (β-sheets), the left-center region (α-helices), and the right-center region (left-handed α-helices). The majority of the points fall within the fully allowed regions, followed by the allowed regions, with only a few points located in disallowed regions, indicating the rationality of the simulation results.
Fig.5 Ramachandran Plots of UVR8-RUP2 and UVR8-COP1
3 Free Energy Calculation
Free energy calculation is an important application in molecular simulations, aimed at quantitatively assessing the stability and strength of molecular binding to a target receptor. By performing free energy calculations, it is possible to predict the binding affinity of ligands interacting with targets, which holds significant importance for molecular screening and design. Several methods for calculating binding free energy have been developed, with commonly used approaches including free energy perturbation (FEP), thermodynamic integration (TI)12, and molecular mechanics Poisson-Boltzmann surface area/Generalized Born surface area (MM-PBSA/MM-GBSA)13.
Based on our preliminary exploration of these methods and a review of relevant literature, we have found that while FEP and TI offer high precision by accurately calculating the energy differences between different states to optimize ligands, these methods are overly complex for calculating the binding free energy of protein complexes12 in our project. They typically require extensive sampling and long simulation times, making them impractical for this particular case. In contrast, although MM-P(G)BSA has limitations in terms of accuracy and struggles to capture the detailed thermodynamic pathways involved in the binding process, it offers a relatively fast calculation speed and performs well when applied to large-scale systems such as protein-protein interactions13. Therefore, we have decided to use the MM-PBSA method to calculate the binding free energy of the two protein complexes.
3.1 Introduction of MM-PBSA14
According to thermodynamic principles, free energy can be decomposed into enthalpic and entropic contributions.
\[
G=H-TS
\]
According to the definition of enthalpy, in vacuum conditions, \( H=U \). Therefore, for solution systems, enthalpy can be decomposed into:
\[ G=U+G_{sol}-TS \]
The internal energy \( U \) of the solute in vacuum, also known as the gas-phase internal energy of the solute, can be decomposed into:
\[ U=E_{int}+E_{ele}+E_{vdW} \]
Here, \(E_{int}\) represents the internal energy of the molecule, including energy variations due to changes in bond lengths, bond angles, and dihedral angles; \(E_{ele}\) denotes the electrostatic energy; and \(E_{vdW}\) refers to the van der Waals energy.
The solvation energy \(G_{sol}\) includes the internal energy and entropy changes of the solute upon entering the solvent, as well as the work required to displace the solvent. If all solvent molecules are explicitly included in the simulation to account for solvent effects, the computational cost would be significantly high. Therefore, implicit solvent interactions are introduced. Accordingly, the solvation energy can be decomposed into:
\[ G_{sol}=G_{ele}+G_{dis}+G_{rep}+G_{cav}+G{the}+P\Delta V \]
Here, \(G_{ele}\) represents the electrostatic contribution; \(G_{dis}\) is the dispersion energy, reflecting van der Waals interactions; \(G_{rep}\) is the repulsion energy, accounting for short-range interactions caused by the Pauli exclusion principle; \(G_{cav}\) is the cavitation energy, which refers to the energy required to form a cavity in the solvent; and \(G{the}\) represents the thermal energy term, explaining the changes in solute vibrations and rotations.
By classifying the interactions, they can be divided into polar and non-polar components.
\[ \Delta G^0_{solv}=\Delta G^0_{solv,polar}+\Delta G^0_{solv,nonpolar} \]
The contribution of the non-polar component is relatively simple to calculate and can be considered proportional to the solvent-accessible surface area (SASA), reflecting the hydrophobic effect. It can be expressed as:
\[ \Delta G^0_{solv,nonpolar}=\gamma SASA+\beta \]
For the polar contribution to the solvation energy, it can be considered equivalent to the electrostatic potential energy. In gas-phase molecular simulations, the electrostatic potential can be obtained by solving the standard Poisson equation.
\[ \nabla ^2\phi (r)=-4\pi \rho (r) \]
Here, \(\phi (r)\) represents the electrostatic potential as a function of the charge density distribution.
By introducing a continuous dielectric cavity described by a dielectric distribution function, the solvent effects can be implicitly included in the model.
\[ \nabla\varepsilon(r)\nabla \phi (r)=-4\pi \rho (r) \]
If the solvent also contains mobile ions, an additional term is required to account for the contribution of the ions. The charge density distribution caused by the ions depends on the distribution of the electrostatic potential, leading to a nonlinear differential equation.
\[ \nabla\varepsilon(r)\nabla \phi (r)=-4\pi (\rho (r)+\rho_{ions}[\phi](r)) \]
Furthermore, this can be further decomposed into the sum of contributions from each type of ion.
\[ \nabla\varepsilon(r)\nabla \phi (r)=-4\pi (\rho (r)+\gamma [\varepsilon](r)eN_A\sum_{i=1}^{m}Z_ic_i[\phi](r)) \]
Based on the Boltzmann distribution, it can be expressed as:
\[ c_i[\phi](r)=c_i^\infty exp(-\frac{Z_ie\phi(r)}{k_BT} ) \]
This equation applies when the system reaches equilibrium between thermal fluctuations and electrostatic forces. If the electrostatic energy of the system is much smaller than \(k_BT\), the nonlinear function can be approximated by a linear function, simplifying the subsequent calculations. This approximation leads to the linearized Poisson-Boltzmann equation, which is more tractable for analytical or numerical solutions.
Based on the fundamental thermodynamic principles and derivations above, the basic principle of MM/PB(GB)SA is that, when calculating the binding free energy difference between two solvated molecules in their bound and unbound states, the total binding free energy in the solvent is decomposed into two parts: the binding free energy in vacuum and the solvation energy.
Fig.6 The Principal of MM-PBSA
The solvation energy can be calculated using the previously discussed formulas for polar and non-polar solvation energies. Meanwhile, the binding free energy in vacuum can be obtained by calculating the free energies of the complex, receptor, and ligand individually.
\[ \Delta G^0_{bind,vacuum}=\Delta G^0_{complex,vacuum}-\Delta G^0_{receptor,vacuum}-\Delta G^0_{ligand,vacuum} \]
For each molecule, the binding free energy can also be expressed as the sum of various energy contributions. Specifically, it can be broken down as follows:
\[ \Delta G^0_{vacuum}=\Delta E^0_{MM}-T\Delta S^0=(\Delta E^0_{int}+\Delta E^0_{vdW}+\Delta E^0_{ele})-T\Delta S^0 \]
If the receptor-ligand complex undergoes minimal conformational changes upon binding, the entropy contribution can be neglected when calculating the difference. In this case, the binding free energy simplifies to:
\[ \Delta G^0_{vacuum}=\Delta E^0_{MM}=(\Delta E^0_{int}+\Delta E^0_{vdW}+\Delta E^0_{ele})\]
In summary, the binding free energy between molecules in a solution system can be calculated using this approach. By applying a series of approximations to compute the energies, including vacuum molecular mechanics energy and solvation energy, the method is known as MM-PB(GB)SA. This approach provides an efficient way to estimate the binding free energy in solvated systems by balancing accuracy with computational efficiency, making it widely used in computational chemistry and molecular simulations.
3.2 Process of Calculating the Binding Free Energy
The process of calculating the binding free energy for protein complexes using the MM-PB(GB)SA method can be divided into the following steps:
- Selecting the trajectory from the stable time period of the motion.
By conducting a visual analysis of the results from the previous phase of molecular dynamics simulations, we selected the trajectory where both the complex and its constituent chains remained in a stable state throughout the simulation as the reference trajectory range for binding free energy calculations. Based on the analysis of the first phase results, for the UVR8-COP1 complex, we selected the 50-100 ns simulation trajectory, and for the UVR8-RUP2 complex, we selected the 60-100 ns simulation trajectory. The original trajectories were truncated using Gromacs, yielding motion trajectory files of the protein complexes in their stable states.
- Select reference conformations, generate and modify new index files.
After testing different reference conformations, we found that selecting the protein conformation from the point where the system first enters a stable state as the reference conformation results in smaller and more accurate errors in subsequent free energy calculations. Using the selected reference conformation, we generated new index files and established index entries for chains A and B based on the original protein information, ensuring consistency with the data in the top and itp files. This prepares the system for the upcoming calculations.
- Calculation of MM/PBSA Binding Free Energy.
The binding free energy between the two chains was calculated using the installed gmx MMPBSA tool. By adjusting the calculation parameters in the mmpbsa.in file and repeatedly performing free energy calculations, the results were statistically analyzed to demonstrate that there is a significant difference in the binding free energy between the two complexes. This difference helps to validate the observations from wet lab experiments.
- Significance Analysis of the Binding Free Energy Difference Between Two Complexes
Since this is an inferential statistical problem involving two samples with unknown population parameters, we decided to use the t-test to perform significance testing. First, we propose the following hypotheses:
\[
\begin{matrix}
H_0: \text{There is no significant difference in the binding free energies of the two complexes,} \\
\text{meaning the mean binding free energies of the two complexes are equal.}
\end{matrix}
\]
\[
\begin{matrix}
H_1:\text{There is a significant difference in the binding free energies of the two complexes,} \\
\text{meaning the mean binding free energies of the two complexes are not equal.}
\end{matrix}
\]
Next, we performed a test for homogeneity of variances between the two groups of data using Levene's test, where the difference in variances between the two groups was used as the test statistic. The test result yielded a p-value of 0.367, which is greater than 0.05, meaning we failed to reject the null hypothesis, indicating that the variances are homogeneous. Therefore, we applied Welch's t-test to determine whether there is a significant difference between the means of the two groups.
The constructed test statistic \(t\) is given by:
\[
t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
\]
Here, \(s_p^2\) is
\[
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
\]
After substituting the data into the formula, we calculated \(t\)=7.28, with a p-value less than 0.05. Therefore, we reject the null hypothesis, concluding that there is a significant difference in the mean binding free energies between the UVR8-COP1 group and the UVR8-RUP2 group. This result validates the conclusions drawn from the wet-lab experiments, further supporting the experimental findings.

Fig.7 Modeling Process of Binding Free Energy Calculation
3.3 Display and Analysis
Based on the descriptive statistical results of 11 binding free energy calculations for the two protein complexes, it was observed that the binding free energy values for the same protein complex under different parameters exhibited significant variability (±30 kCal/mol). After discussion, we believe this may be due to the fact that, even in the stable phase of RMSD, the protein conformations can still differ significantly.15 While the deviation from the standard conformation is small, the conformations can vary greatly from each other. The inconsistent selection of conformations under different search ranges and intervals may ultimately lead to large variations in binding free energy. However, after multiple statistical analyses, using hypothesis testing and violin plots, we found that the overall trends are consistent with experimental conclusions, namely that RUP2 has a stronger affinity for UVR8 compared to COP1, thus validating the final conclusion.

Fig.8 Comparison of Binding Free Energy Between UVR8-COP1 and UVR8-RUP2.
Fig.9 Violin Plot Comparing the Binding Free Energy Distribution Between UVR8-COP1 and UVR8-RUP2.
Fig.10 uvr8-cop1.gif
Fig.11 uvr8-rup2.gif
Reference
- Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. (2015). GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1-2, 19-25.
- Valdés-Tresanco, M. S., Valdés-Tresanco, M. E., Valiente, P. A., & Moreno, E. (2021). gmx_MMPBSA: A new tool to perform end-state free energy calculations with GROMACS. Journal of Chemical Theory and Computation, 17(10), 6281-6291.
- The UniProt Consortium. (2021). UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Research, 49(D1), D480–D489.
- Burley, S. K., Berman, H. M., Bhikadiya, C., Bi, C., Chen, L., Di Costanzo, L., ... & Zardecki, C. (2019). RCSB Protein Data Bank: Biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology, and energy. Nucleic Acids Research, 47(D1), D464-D474.
- Schrödinger, LLC. (2015). The PyMOL molecular graphics system, version 2.0.
- Eastman, P., & Pande, V. S. (2017). PDBFixer: Automatically improving PDB files. Protein Science, 26(12), 2626-2632.
- MacKerell, A. D., Bashford, D., Bellott, M., Dunbrack, R. L., Evanseck, J. D., Field, M. J., ... & Karplus, M. (1998). All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B, 102(18), 3586-3616.
- Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., & Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79(2), 926-935.
- Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A., & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics, 81(8), 3684-3690.
- McGibbon, R. T., Beauchamp, K. A., Harrigan, M. P., Klein, C., Swails, J. M., Hernández, C. X., ... & Pande, V. S. (2015). MDTraj: A modern open library for the analysis of molecular dynamics trajectories. Biophysical Journal, 109(8), 1528-1532.
- Ramachandran, G. N., Ramakrishnan, C., & Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology, 7(1), 95-99.
- Shirts, M. R., & Pande, V. S. (2005). Solvation free energies of amino acid side chain analogs for common molecular mechanics water models. Journal of Chemical Physics, 122(13), 134508.
- Genheden, S., & Ryde, U. (2015). The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opinion on Drug Discovery, 10(5), 449-461.
- Hou, T., Wang, J., Li, Y., & Wang, W. (2011). Assessing the performance of the molecular mechanics Poisson-Boltzmann surface area and molecular mechanics generalized Born surface area methods in reproducing the binding free energies of protein-protein complexes. Journal of Chemical Information and Modeling, 51(1), 69-82.
- Gohlke, H., & Case, D. A. (2004). Converging free energy estimates: MM-PB(GB)SA studies on the protein-protein complex Ras-Raf. Journal of Computational Chemistry, 25(2), 238-250.