Overview

Molecular Dynamics Documentation

Molecular dynamics documentation

Created by Vishwaa Kannan

Any files used or created by Amber or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.

https://gitlab.igem.org/2024/software-tools/gcm-ky

Fixing last year's simulations

The error with the previous year’s simulation was how I set Matplotlib to parse the data files. This was the previous code:

              step = data[:,0]
              potential_energy = data[:,1]
              temperature = data[:,2]
              volume = data[:,3]
              kinetic_energy = data[:,4]
              total_energy = data[:,5]

The order was corrected to parse the data. This is the corrected code:

              step = data[:,0]
              potential_energy = data[:,1]
              temperature = data[:,4]
              volume = data[:,5]
              kinetic_energy = data[:,2]
              total_energy = data[:,3]

The graph part was re-run to obtain the correct graphs for pfoa luxr:

Docking of Human Liver Fatty Acid Binding Protein (FAB) with PFOA

The purpose of docking FAB and PFOA is to confirm that FAB works. We tested FAB in the lab and did not observe any noticeable fluorescence. So, we have docked these two into a complex, which we used for further calculations down the road.

Preparation of docking

We used AutoDock Vina on Windows, installed via their MSI installer.

https://vina.scripps.edu/downloads/

We used Alpha Fold 3 to fold the FAB sequence into a Protein Data Bank (PDB). Below is the URL and sequence we used.

https://alphafoldserver.com/

MSFSGKYQLQSQENFEAFMKAIGLPEELIQKGKDIKGVSE
IVQNGKHFKFTITAGSGSGSHNVYIMADKQKNGIKVNFKI
RHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSAL
SKDPNEKRDHMVLLEFVTAAGITHGMDELYKGGTGGSMRK
GEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLT
LKFICTTGKLPVPWPTLVTTFGYGVQCFARYPDHMKQHDF
FKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRI
ELKGIDFKEDGNILGHKLEYNYNGGKVIQNEFTVGEECEL
ETMTGEKVKTVVQLEGDNKLVTTFKNIKSVTELNGDIITN
TMTLGDIVFKRISKRI

We also retrieved the Structured Data File (SDF) for PFOA from PubChem by downloading the 3D conformer here:

https://pubchem.ncbi.nlm.nih.gov/compound/9554\#section=2D-Structure

In a molecule viewer application, we used Chimera X (which can be installed via their app). We opened the FAB PDB, added hydrogens, and saved the output file as a PDB.


              Tools > Structure Editing > Add Hydrogens > Also consider H-bonds (slower) > Ok

Then, in a new window, we opened the SDF for PFOA, deleted the hydrogen, and saved the output file as a PDB. This is essential, as PFOA is naturally deprotonated.


              Select > Chemistry > Element > H  
              Actions > Atoms/Bonds > Delete

Creating AutoDock Vina Files

We then need to save our PDB files as AutoDock Vina files (PDBQT); this step is relatively simple.

To start AutoDock Vina, type “AutoDock Tools” in the search bar and run the application. After a short period, the GUI will load. Load the receptor PDB into AutoDock Vina by dragging it into the application. Then follow these commands to edit and save the receptor (FAB) file:


              Edit > Hydrogens > Add > Polar only > noBondOrder > yes  
              Edit > Charges > Add Kollman Charges > Ok  
              Grid > Macromolecule > Choose > FAB > Select Molecule > Ok > Save file<

Next, we will prepare the ligand with these commands:


              Right-click FAB > Delete
              Drag Ligand PDB file into AutoDock Vina
              Ligand > Input > Choose > PFOA
              Ligand > Output > save > Ok

Next, delete PFOA from AutoDock Vina via:


              Right-click PFOA > Delete

Performing the docking

We must select the domain to which AutoDock Vina will attempt to dock PFOA.

Drag the .pdbqt file for the FAB and PFOA into a new AutoDock Vina Window. It is essential not to overwrite charges; click “ok” and keep the charges when importing.


              Grid > Macromolecule > choose > FAB  
              Ligand > Input > Choose > PFOA  
              Grid > Gridbox  
              X-size: 60  
              Y-size: 60  
              Z-size: 60  
              X-center: 3.660  
              Y-center: -11.488  
              Z-center: 16.008

This should show a red, blue, and green box that occupies the binding pocket for FAB

Next, we will complete the docking procedure, open the command prompt, and type the following commands:


              cd path/to/files/  
              touch config.txt  
              nano config.txt  
              receptor = fab.pdbqt  
              ligand = pfoa.pdbqt  
              center_x = 3.660  
              center_y = -11.488  
              center_z = 16.008  
              size_x = 60
              size_y = 60
              size_z = 60
              energy_range = 20
              exhaustiveness = 500
              {ctrl + x}
              Y
              {enter}

Docking is an exhaustive study, so AutoDock Vina will try however many positions you tell it to and pick the best 10. The energy range represents the difference in binding affinity (Kcal/mol) between the best and the worst poses.

Finally, we can run AutoDock Vina. Find the path to Vina in your file manager and enter the following command into Command Prompt.


              “c:path/to/vina” –receptor protein.pdbqt –ligand ligand.pdbqt –config config.txt –log log.txt –out output.pdbqt

To extract the best pose, open the output.pdbqt file in Chimera X and select the first model at the bottom right of the window. Then click save, saving only the selected model. You should now have an output. PDB file with your complex.

Confirmation of the docking of Fab-Gfp to PFOA

Input / Output files are located in the Software Tools Repository
https://gitlab.igem.org/2024/software-tools/gcm-ky

Now that we have an approximate dock of PFOA and FAB, we need to simulate it to ensure the bind is stable. To do this, we used Amber. Amber can be installed by following their guide on the website and their PDF document here:

https://ambermd.org/doc12/Amber24.pdf

Since we are just determining if the bind is stable, we simulated the complex in an implicit solvent system, as this system is less accurate but very efficient with resources.

Preparing the Simulation

Generating Force Fields with Antechamber

Since we are working with the PFOA ligand (perfluorooctanoic acid), we must generate its charges and force field parameters. PFOA is deprotonated and thus should have a net charge of -1. Using the following antechamber commands, we will create a .mol2 file for PFOA and assign AM1-BCC charges suitable for small organic molecules in AMBER simulations. parmchk2 is then used to generate missing force field parameters.


              antechamber -i pfoa.pdb -fi pdb -o pfoa.mol2 -fo mol2 -c bcc -nc -1 -s 2
              parmchk2 -i pfoa.mol2 -f mol2 -o pfoa.frcmod

This creates two important files:

pfoa.mol2: contains the ligand structure with BCC charges.
pfoa.frcmod: additional force field parameters not covered by the standard GAFF.

Generating Topology and Coordinate Files with LEaP

To generate the necessary topology (prmtop) and coordinate (rst7) files, we will use LEaP, specifying the appropriate force fields and water model.

LEaP Input (leap_s.in)


              source leaprc.protein.ff19SB
              source leaprc.gaff2
              source leaprc.water.opc
              UNL = loadmol2 pfoa.mol2
              check UNL
              loadamberparams pfoa.frcmod
              saveoff UNL unl.lib
              saveamberparm UNL pfoa.prmtop pfoa.rst7
              quit

Force fields:

ff19SB: For the protein.

gaff2: General force field for organic molecules, like PFOA.

water.opc: For the water model used in implicit solvation.

This script prepares the prmtop and rst7 files for PFOA, which we’ll later combine with the protein for docking.

To run the script, use the following command:


              tleap -s -f tleap_s.in

Creating the Complex

Now that we have the ligand’s force field parameters, we’ll combine it with the protein to form the PFOA-FAB complex.

LEaP Input (tleap_f.in)


              source leaprc.protein.ff19SB
              source leaprc.gaff2
              source leaprc.water.opc

              # Load the ligand
              UNL = loadmol2 pfoa.mol2

              loadamberparams pfoa.frcmod
              loadoff unl.lib

              # Load the protein
              protein = loadpdb justFABnoHYDRO.pdb

              # Combine the protein and ligand
              complex = combine {protein UNL}

              # Save the combined complex
              saveamberparm complex pfoa_dock.prmtop pfoa_dock.rst7
              savepdb complex pfoa_dock.pdb
              quit

This script loads the ligand (pfoa.mol2) and the protein (justFABnoHYDRO.pdb), combines them into a single complex, and generates the topology and coordinate files needed for molecular dynamics (pfoa_dock.prmtop and pfoa_dock.rst7).

Use this command to run the script:


              tleap -s -f tleap_f.in

Running the Simulation

Minimization (min.in)

The first step in the MD simulation process is minimization. We’ll perform an initial minimization to relieve any steric clashes or bad contacts between atoms in the system. Here, we use the igb=1 parameter to specify that we’re running a simulation of an implicit solvent environment.

Minimization Input (min.in)


              &cntrl
              imin=1, maxcyc=200, ncyc=50,
              cut=16, ntb=0, igb=1,
              &end

maxcyc=200: The maximum number of minimization cycles.

ncyc=50: The switch point for using steepest descent and conjugate gradient methods.

igb=1: Generalized Born implicit solvent model.

cut=16: Non-bonded cutoff distance (16 Å).

To run this simulation, use this command:


              pmemd.cuda -O -i min.in -o min.out -p protein_solvated.prmtop -c protein_solvated.inpcrd -r min.rst -ref protein_solvated.inpcrd

Production (prod.in)

After minimization, we will perform the production molecular dynamics (MD) run. This is just a longer equilibration. In this step, we will keep the system at 300 K and allow it to equilibrate. We will use a timestep of 2 fs, Langevin dynamics (ntt=3), and the igb=1 implicit solvation model.

Production Input (prod.in)

Production MD run


              &cntrl
              imin=0, irest=0,
              nstlim=50000, dt=0.002, ntc=2, ntf=2,
              ntpr=500, ntwx=500,
              cut=16, ntb=0, igb=1,
              ntt=3, gamma_ln=1.0,
              tempi=0.0, temp0=300.0,
              &end

nstlim=50000: Number of steps (corresponds to 100 ps for a 2 fs timestep).

ntpr=500, ntwx=500: Write output and trajectory files every 500 steps.

igb=1: Implicit solvent (Generalized Born model).

ntt=3, gamma_ln=1.0: Langevin thermostat with a collision frequency of 1.0 ps⁻¹./p>

temp0=300.0: Target temperature (300 K).

cut=16: Non-bonded cutoff distance (16 Å).

To run this step, use the following pmemd.cuda command:


              -O -i prod.in -o prod.out -p pfoa_dock.prmtop -c pfoa_dock.rst7 -r prod.rst -x prod.nc

This will generate the MD trajectory (prod.nc) and output the final restart file (prod.rst).

Extracting Structures

After running the MD simulation, we can use the following input files to extract the final structures of the protein, ligand, and the entire complex.

Extract Full Complex (extract.in)


              parm pfoa_dock.prmtop
              trajin prod.nc
              trajout final_structure.pdb pdb onlyframes 100
              run

The following command extracts the entire complex at the 100th frame from the trajectory file.


              cpptraj -i extract.in

Extract Ligand (extractLig.in)


              parm pfoa_dock.prmtop
              trajin prod.nc
              strip !:UNL
              trajout final_ligand.pdb pdb onlyframes 100
              run

This command strips out everything except the ligand (UNL) and writes it to final_ligand.pdb.


              cpptraj -i extractLig.in

Extract Protein (extractProt.in)


              parm pfoa_dock.prmtop
              trajin prod.nc
              strip :UNL
              trajout final_protein.pdb pdb onlyframes 100
              run

This strips out ligand and writes the protein-only structure to final_protein.pdb.


              cpptraj -i extractProt.in

Examination of structures

The resulting PDBs from the extraction were viewed visually in ChimeraX, and it was confirmed that PFOA was still in the binding pocket.

Conclusion

It is important to note that this simulation was performed to confirm that the docking of PFOA to FAB was correct. However, not all proper procedures were followed, such as separating the production and equilibration, using implicit instead of explicit simulations, and shortened simulation times.

Documentation for MD MMPBSA Pipeline

Input / Output files are in the Software Tools Repository (including the pipeline!).
https://gitlab.igem.org/2024/software-tools/gcm-ky

This document explains how to set up, run, and understand the MMPBSA pipeline, which simulates and analyzes the interactions between proteins and ligands, such as the FAB-PFOA complex, using the Molecular Mechanics Poisson-Boltzmann Surface Area method. It also covers performing mutagenesis with ChimeraX using rotamers and how to source and run the pipeline.

Pipeline Overview

The script Simulation.sh automates a full molecular dynamics (MD) simulation pipeline, which uses an MPBSA analysis to calculate the binding affinity and give you a per-residue breakdown of the charge composition. The result from this pipeline can be pasted into an Excel sheet, and a pie graph will show which residues play the most significant role in the binding. This pipeline assumes you have already done the previous steps and obtained a pfoa.mol2 and a pfoa.frcmod.

Prerequisites

Before running the pipeline, ensure the following are installed:

AmberTools/AMBER (with CUDA support for pmemd.cuda).
MPI (for MMPBSA parallelization).
Python 3 (for data analysis scripts).
ChimeraX (for mutagenesis).

Making the Script Executable

To run the pipeline, follow these steps:

Make the Simulation.sh executable:


              chmod +x Simulation.sh

Run the script with the required arguments:


              -r receptor.pdb: Path to the receptor PDB file.
              -l ligand.pdb: Path to the ligand PDB file.
              -c complex.pdb: Path to the complex PDB file (protein-ligand complex).
              -n output_directory: Directory where output files will be stored.
              -j num_cores: Number of cores to use for MMPBSA calculations.

Example Usage:


              ./Simulation.sh -r protein.pdb -l ligand.pdb -c complex.pdb -n fab_pfoa_simulation -j 8

This creates an output directory, prepares input files, runs the MD simulation, and performs binding energy analysis.

Explanation of Input Files

The pipeline uses several key .in files for different stages of the simulation. These files control the simulation parameters, so understanding what each parameter does is critical.

min.in (Minimization)


              &cntrl
              imin=1, maxcyc=1000, ncyc=500,
              cut=10.0, ntb=1,
              ntc=2, ntf=2,
              ntpr=100,
              ntr=1, restraintmask=':1-376', restraint_wt=2.0

imin=1: Runs minimization (0 would run MD).

maxcyc=1000: Maximum number of minimization cycles.

ncyc=500: The first 500 cycles are steepest descent; the remaining are conjugate gradient.

cut=10.0: Cutoff distance for nonbonded interactions (10 Å).

ntb=1: Constant volume periodic boundary conditions (no pressure coupling).

ntc=2, ntf=2: Bonds involving hydrogen atoms are constrained with SHAKE.

npr=100: Print the output every 100 steps.

ntr=1: Apply harmonic restraints.

restraintmask=':1-376': Apply restraints to residues 1 to 376.

restraint_wt=2.0: Restraint weight of 2 kcal/mol/Å².

heat.in (Heating)


              &cntrl
              imin=0, irest=0, ntx=1,
              nstlim=25000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=1,
              ntpr=500, ntwx=500,
              ntt=3, gamma_ln=2.0,
              tempi=0.0, temp0=300.0, ig=-1,
              ntr=1, restraintmask=':1-376', restraint_wt=2.0,
              nmropt=1
              /
              &wt TYPE='TEMP0', istep1=0, istep2=25000,
              value1=0.1, value2=300.0, 
              /
              &wt TYPE='END'
              /

imin=0: No minimization; we are running MD.

irest=0, ntx=1: No restart; start from initial coordinates.

nstlim=25000, dt=0.002: 25,000 steps with a 2 fs timestep.

ntt=3, gamma_ln=2.0: Langevin thermostat with a collision frequency of 2 ps⁻¹.

temp0=300.0: Target temperature is 300 K.

ntr=1, restraintmask=':1-376': Restraints on residues 1 to 376.

Temperature ramping: Temperature is ramped from 0.1 K to 300 K over 25,000 steps.

density.in (Density Equilibration)


              &cntrl
              imin=0, irest=1, ntx=5,
              nstlim=25000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=2, ntp=1, taup=1.0,
              ntpr=500, ntwx=500,
              ntt=3, gamma_ln=2.0,
              temp0=300.0, ig=-1,ntr=1, restraintmask=':1-376', restraint_wt=2.0,
              /

ntb=2: Constant pressure periodic boundary conditions.

ntp=1: Isotropic pressure scaling.

taup=1.0: Pressure relaxation time of 1 ps.

Other conditions similar to Heating.

equil.in (Equilibration)


              &cntrl
              imin=0, irest=1, ntx=5,
              nstlim=250000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=2, ntp=1, taup=2.0,
              ntpr=1000, ntwx=1000,
              ntt=3, gamma_ln=2.0,
              temp0=300.0, ig=-1,
              /

This is a longer equilibration step, with 250,000 steps (500 ps).

prod.in (Production)


              &cntrl
              imin=0, irest=1, ntx=5,
              nstlim=10000000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=2, ntp=1, taup=2.0,
              ntpr=5000, ntwx=5000,
              ntt=3, gamma_ln=2.0,
              temp0=300.0, ig=-1,
              /

nstlim=10000000, dt=0.002: Production run with 10 million steps, totaling 20 ns of MD simulation.

mmpbsa.in (MMPBSA Calculation)


              &general
              startframe=601, endframe=2000, keep_files=2,
              /
              &pb
              istrng=0.100,
              /
              &decomp
              idecomp=1,
              /

startframe=601, endframe=2000: Start MMPBSA calculations from frame 601 to 2000.

istrng=0.100: Ionic strength is 0.1 M.

idecomp=1: Perform per-residue decomposition analysis.

Mutagenesis Using ChimeraX with Rotamers

Mutating residues in ChimeraX can be done as follows:

Open ChimeraX and load the structure:


              open structure.pdb

Select the residue to mutate: In ChimeraX, click on the residue you want to mutate or use a selection command:


              select :<residue_number>

Perform the mutation using rotamers:


              swapaa <new_residue> :<residue_number> rotamer

For example, to mutate residue 45 to Alanine:


              swapaa ala :45 rotamer

Save the mutated structure:


              save mutated_structure.pdb

After mutating, the new structure can be run through the same pipeline using the Simulation.sh script.

Conclusion

Multiple simulations can be run on this with a few minor changes, with the main change required being the restraint mask. For example, I have used this pipeline to simulate FAB-GFP and PFOA, mutations of FAB-GFP and PFOA: ILE 308 → ARG, SER 349 → THR, and even for Palmitic Acid (the natural ligand to FAB) and FAB-GFP. However, this step required me to run palmitic acid through antechamber to get a forcefield for it. After each pipeline use, I pasted the resulting data into Excel and made a pie graph. This helps me see which residues contribute most to the ligand binding to the protein.

Eberhardt, J., Santos-Martins, D., Tillack, A.F., Forli, S. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling.
Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2), 455-461.
D.A. Case, H.M. Aktulga, K. Belfon, I.Y. Ben-Shalom, J.T. Berryman, S.R. Brozell, D.S. Cerutti, T.E. Cheatham, III, G.A. Cisneros, V.W.D. Cruzeiro, T.A. Darden, N. Forouzesh, M. Ghazimirsaeed, G. Giambaşu, T. Giese, M.K. Gilson, H. Gohlke, A.W. Goetz, J. Harris, Z. Huang, S. Izadi, S.A. Izmailov, K. Kasavajhala, M.C. Kaymak, A. Kovalenko, T. Kurtzman, T.S. Lee, P. Li, Z. Li, C. Lin, J. Liu, T. Luchko, R. Luo, M. Machado, M. Manathunga, K.M. Merz, Y. Miao, O. Mikhailovskii, G. Monard, H. Nguyen, K.A. O'Hearn, A. Onufriev, F. Pan, S. Pantano, A. Rahnamoun, D.R. Roe, A. Roitberg, C. Sagui, S. Schott-Verdugo, A. Shajan, J. Shen, C.L. Simmerling, N.R. Skrynnikov, J. Smith, J. Swails, R.C. Walker, J. Wang, J. Wang, X. Wu, Y. Wu, Y. Xiong, Y. Xue, D.M. York, C. Zhao, Q. Zhu, and P.A. Kollman (2024), Amber 2024, University of California, San Francisco.
D.A. Case, H.M. Aktulga, K. Belfon, D.S. Cerutti, G.A. Cisneros, V.W.D. Cruz eiro, N. Forouzesh, T.J. Giese, A.W. Götz, H. Gohlke, S. Izadi, K. Kasavajhala, M.C. Kaymak, E. King, T. Kurtzman, T.-S. Lee, P. Li, J. Liu, T. Luchko, R. Luo, M. Manathunga, M.R. Machado, H.M. Nguyen, K.A. O’Hearn, A.V. Onufriev, F. Pan, S. Pantano, R. Qi, A. Rahnamoun, A. Risheh, S. Schott-Verdugo, A. Shajan, J. Swails, J. Wang, H. Wei, X. Wu, Y. Wu, S. Zhang, S. Zhao, Q. Zhu, T.E. Cheatham III, D.R. Roe, A. Roitberg, C. Simmerling, D.M. York, M.C. Nagan*, and K.M. Merz Jr.* AmberTools. J. Chem. Inf. Model. 63, 6183-6191 (2023).

Umbrella Sampling

Created by Edward Kim

Any files used or created by OpenMM or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.

https://gitlab.igem.org/2024/software-tools/gcm-ky

Dependencies

You need to install the Conda dependencies this script uses.

Conda
Pip
OpenMM

Conda Setup

To set up the Conda environment, type the following command. Replace PATHTOPACAKGELISTTXT with the path to the environment.yml file:

conda env create -f PATHTOPACAKGELISTTXT

To activate the environment:

conda activate testenv

Add Permissions for Wham Subprocess

MacOS

To add executable permissions, run the following commands:

              cd path_to_scoresolver 
                cd wham 
                chmod +x wham

To verify:

ls -l wham

If you see an error:

chmod: wham: No such file or directory

Compile wham.c using:

make

Program Flow

Troubleshooting

If the histograms have poor overlap, you will need to use more windows and/or reduce the spring constant.

The windows do not need to be linearly spaced, and they can have different spring constants.

Theory

Definitions

Important terms used:

Equations

Adding Biasing

It acts a spring force with strength $k $ that restrains the system simulation to state ξ_i, the center of window i. For example, if the system at ξ is a bout to get away from ξ_i, a bias force will force the system to get back to the window, resisting normal eneergy landscape activity. If the strength of k is weak, the system may be able to explore a little more; contrastly, if k too strong, it may not explore enough of each window, so a balance is required. There are multiple windows i = 1, 2, 3…n that each sample sufficient amounts of the free energy landscape. This is the main premise of umbrella sampling that separates from other MD simulations.
NOTE: each quantity discussed previously must have biased version indicated by an apostrophe superscript bias. After computing/simulating the system with biasing potentials, we subtract the biased results by the biasing potential and retrive the true energy curve.

Calculating Binding Constant of a Ligand-Receptor System from the Free Energy Curve

Selecting Parameters

However, the pivot indexes can be a little tricky to determine. But, the best way to start off is by identifying your objective and looking at the starting PDB to get a glipmse at what you’re working with.

Generally Pick the strongest parts of the Complex

1. For proteins: C-alpha atoms are picked because they're usually the backbone atoms that represent the structure very well, so stretching that wont break the system entirely (they are strong enough to stretch).
2. For nucleic acids: Phosphate atoms or specific base atoms (because again, they're the strongest backbones).
3. For small molecules or ligands: Select atoms that are involved in key interactions or are structurally significant. If you are trying to get the binding affinity of a docked complex (which is what I am trying to do in this case,) I would select the center of mass of the ligand, and the key atom in a resiude in the binding pocket of the receptor.

As stored in the

./gh_scoresolver/tests/pdb/ahl_dock_luxr_1.pdb

PDB file, we have a ligand protein ahl docked into a receptor protein luxr.

Credits

Sources used:

Credits

Sources used: