OUR METHODS OF ARTIFICIAL SIMULATION OF MOLECULES AND THEIR INTERACTIONS
Created by Vishwaa Kannan
Any files used or created by Amber or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.
https://gitlab.igem.org/2024/software-tools/gcm-ky
The error with the previous year’s simulation was how I set Matplotlib to parse the data files. This was the previous code:
step = data[:,0] potential_energy = data[:,1] temperature = data[:,2] volume = data[:,3] kinetic_energy = data[:,4] total_energy = data[:,5]
The order was corrected to parse the data. This is the corrected code:
step = data[:,0] potential_energy = data[:,1] temperature = data[:,4] volume = data[:,5] kinetic_energy = data[:,2] total_energy = data[:,3]
The graph part was re-run to obtain the correct graphs for pfoa luxr:
The purpose of docking FAB and PFOA is to confirm that FAB works. We tested FAB in the lab and did not observe any noticeable fluorescence. So, we have docked these two into a complex, which we used for further calculations down the road.
We used AutoDock Vina on Windows, installed via their MSI installer.
https://vina.scripps.edu/downloads/
We used Alpha Fold 3 to fold the FAB sequence into a Protein Data Bank (PDB). Below is the URL and sequence we used.
MSFSGKYQLQSQENFEAFMKAIGLPEELIQKGKDIKGVSE
IVQNGKHFKFTITAGSGSGSHNVYIMADKQKNGIKVNFKI
RHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSAL
SKDPNEKRDHMVLLEFVTAAGITHGMDELYKGGTGGSMRK
GEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLT
LKFICTTGKLPVPWPTLVTTFGYGVQCFARYPDHMKQHDF
FKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRI
ELKGIDFKEDGNILGHKLEYNYNGGKVIQNEFTVGEECEL
ETMTGEKVKTVVQLEGDNKLVTTFKNIKSVTELNGDIITN
TMTLGDIVFKRISKRI
We also retrieved the Structured Data File (SDF) for PFOA from PubChem by downloading the 3D conformer here:
https://pubchem.ncbi.nlm.nih.gov/compound/9554\#section=2D-Structure
In a molecule viewer application, we used Chimera X (which can be installed via their app). We opened the FAB PDB, added hydrogens, and saved the output file as a PDB.
Tools > Structure Editing > Add Hydrogens > Also consider H-bonds (slower) > Ok
Then, in a new window, we opened the SDF for PFOA, deleted the hydrogen, and saved the output file as a PDB. This is essential, as PFOA is naturally deprotonated.
Select > Chemistry > Element > H
Actions > Atoms/Bonds > Delete
We then need to save our PDB files as AutoDock Vina files (PDBQT); this step is relatively simple.
To start AutoDock Vina, type “AutoDock Tools” in the search bar and run the application. After a short period, the GUI will load. Load the receptor PDB into AutoDock Vina by dragging it into the application. Then follow these commands to edit and save the receptor (FAB) file:
Edit > Hydrogens > Add > Polar only > noBondOrder > yes
Edit > Charges > Add Kollman Charges > Ok
Grid > Macromolecule > Choose > FAB > Select Molecule > Ok > Save file<
Next, we will prepare the ligand with these commands:
Right-click FAB > Delete
Drag Ligand PDB file into AutoDock Vina
Ligand > Input > Choose > PFOA
Ligand > Output > save > Ok
Next, delete PFOA from AutoDock Vina via:
Right-click PFOA > Delete
We must select the domain to which AutoDock Vina will attempt to dock PFOA.
Drag the .pdbqt file for the FAB and PFOA into a new AutoDock Vina Window. It is essential not to overwrite charges; click “ok” and keep the charges when importing.
Grid > Macromolecule > choose > FAB
Ligand > Input > Choose > PFOA
Grid > Gridbox
X-size: 60
Y-size: 60
Z-size: 60
X-center: 3.660
Y-center: -11.488
Z-center: 16.008
This should show a red, blue, and green box that occupies the binding pocket for FAB
Next, we will complete the docking procedure, open the command prompt, and type the following commands:
cd path/to/files/
touch config.txt
nano config.txt
receptor = fab.pdbqt
ligand = pfoa.pdbqt
center_x = 3.660
center_y = -11.488
center_z = 16.008
size_x = 60
size_y = 60
size_z = 60
energy_range = 20
exhaustiveness = 500
{ctrl + x}
Y
{enter}
Docking is an exhaustive study, so AutoDock Vina will try however many positions you tell it to and pick the best 10. The energy range represents the difference in binding affinity (Kcal/mol) between the best and the worst poses.
Finally, we can run AutoDock Vina. Find the path to Vina in your file manager and enter the following command into Command Prompt.
“c:path/to/vina” –receptor protein.pdbqt –ligand ligand.pdbqt –config config.txt –log log.txt –out output.pdbqt
To extract the best pose, open the output.pdbqt file in Chimera X and select the first model at the bottom right of the window. Then click save, saving only the selected model. You should now have an output. PDB file with your complex.
Input / Output files are located in the Software Tools Repository
https://gitlab.igem.org/2024/software-tools/gcm-ky
Now that we have an approximate dock of PFOA and FAB, we need to simulate it to ensure the bind is stable. To do this, we used Amber. Amber can be installed by following their guide on the website and their PDF document here:
https://ambermd.org/doc12/Amber24.pdf
Since we are just determining if the bind is stable, we simulated the complex in an implicit solvent system, as this system is less accurate but very efficient with resources.
Since we are working with the PFOA ligand (perfluorooctanoic acid), we must generate its charges and force field parameters. PFOA is deprotonated and thus should have a net charge of -1. Using the following antechamber commands, we will create a .mol2 file for PFOA and assign AM1-BCC charges suitable for small organic molecules in AMBER simulations. parmchk2 is then used to generate missing force field parameters.
antechamber -i pfoa.pdb -fi pdb -o pfoa.mol2 -fo mol2 -c bcc -nc -1 -s 2
parmchk2 -i pfoa.mol2 -f mol2 -o pfoa.frcmod
This creates two important files:
To generate the necessary topology (prmtop) and coordinate (rst7) files, we will use LEaP, specifying the appropriate force fields and water model.
LEaP Input (leap_s.in)
source leaprc.protein.ff19SB
source leaprc.gaff2
source leaprc.water.opc
UNL = loadmol2 pfoa.mol2
check UNL
loadamberparams pfoa.frcmod
saveoff UNL unl.lib
saveamberparm UNL pfoa.prmtop pfoa.rst7
quit
Force fields:
ff19SB: For the protein.
gaff2: General force field for organic molecules, like PFOA.
water.opc: For the water model used in implicit solvation.
This script prepares the prmtop and rst7 files for PFOA, which we’ll later combine with the protein for docking.
To run the script, use the following command:
tleap -s -f tleap_s.in
Now that we have the ligand’s force field parameters, we’ll combine it with the protein to form the PFOA-FAB complex.
LEaP Input (tleap_f.in)
source leaprc.protein.ff19SB
source leaprc.gaff2
source leaprc.water.opc
# Load the ligand
UNL = loadmol2 pfoa.mol2
loadamberparams pfoa.frcmod
loadoff unl.lib
# Load the protein
protein = loadpdb justFABnoHYDRO.pdb
# Combine the protein and ligand
complex = combine {protein UNL}
# Save the combined complex
saveamberparm complex pfoa_dock.prmtop pfoa_dock.rst7
savepdb complex pfoa_dock.pdb
quit
This script loads the ligand (pfoa.mol2) and the protein (justFABnoHYDRO.pdb), combines them into a single complex, and generates the topology and coordinate files needed for molecular dynamics (pfoa_dock.prmtop and pfoa_dock.rst7).
Use this command to run the script:
tleap -s -f tleap_f.in
The first step in the MD simulation process is minimization. We’ll perform an initial minimization to relieve any steric clashes or bad contacts between atoms in the system. Here, we use the igb=1 parameter to specify that we’re running a simulation of an implicit solvent environment.
Minimization Input (min.in)
&cntrl
imin=1, maxcyc=200, ncyc=50,
cut=16, ntb=0, igb=1,
&end
maxcyc=200: The maximum number of minimization cycles.
ncyc=50: The switch point for using steepest descent and conjugate gradient methods.
igb=1: Generalized Born implicit solvent model.
cut=16: Non-bonded cutoff distance (16 Å).
To run this simulation, use this command:
pmemd.cuda -O -i min.in -o min.out -p protein_solvated.prmtop -c protein_solvated.inpcrd -r min.rst -ref protein_solvated.inpcrd
After minimization, we will perform the production molecular dynamics (MD) run. This is just a longer equilibration. In this step, we will keep the system at 300 K and allow it to equilibrate. We will use a timestep of 2 fs, Langevin dynamics (ntt=3), and the igb=1 implicit solvation model.
Production Input (prod.in)
Production MD run
&cntrl
imin=0, irest=0,
nstlim=50000, dt=0.002, ntc=2, ntf=2,
ntpr=500, ntwx=500,
cut=16, ntb=0, igb=1,
ntt=3, gamma_ln=1.0,
tempi=0.0, temp0=300.0,
&end
nstlim=50000: Number of steps (corresponds to 100 ps for a 2 fs timestep).
ntpr=500, ntwx=500: Write output and trajectory files every 500 steps.
igb=1: Implicit solvent (Generalized Born model).
ntt=3, gamma_ln=1.0: Langevin thermostat with a collision frequency of 1.0 ps⁻¹./p>
temp0=300.0: Target temperature (300 K).
cut=16: Non-bonded cutoff distance (16 Å).
To run this step, use the following pmemd.cuda command:
-O -i prod.in -o prod.out -p pfoa_dock.prmtop -c pfoa_dock.rst7 -r prod.rst -x prod.nc
This will generate the MD trajectory (prod.nc) and output the final restart file (prod.rst).
After running the MD simulation, we can use the following input files to extract the final structures of the protein, ligand, and the entire complex.
Extract Full Complex (extract.in)
parm pfoa_dock.prmtop
trajin prod.nc
trajout final_structure.pdb pdb onlyframes 100
run
The following command extracts the entire complex at the 100th frame from the trajectory file.
cpptraj -i extract.in
Extract Ligand (extractLig.in)
parm pfoa_dock.prmtop
trajin prod.nc
strip !:UNL
trajout final_ligand.pdb pdb onlyframes 100
run
This command strips out everything except the ligand (UNL) and writes it to final_ligand.pdb.
cpptraj -i extractLig.in
Extract Protein (extractProt.in)
parm pfoa_dock.prmtop
trajin prod.nc
strip :UNL
trajout final_protein.pdb pdb onlyframes 100
run
This strips out ligand and writes the protein-only structure to final_protein.pdb.
cpptraj -i extractProt.in
The resulting PDBs from the extraction were viewed visually in ChimeraX, and it was confirmed that PFOA was still in the binding pocket.
It is important to note that this simulation was performed to confirm that the docking of PFOA to FAB was correct. However, not all proper procedures were followed, such as separating the production and equilibration, using implicit instead of explicit simulations, and shortened simulation times.
Input / Output files are in the Software Tools Repository (including the pipeline!).
https://gitlab.igem.org/2024/software-tools/gcm-ky
This document explains how to set up, run, and understand the MMPBSA pipeline, which simulates and analyzes the interactions between proteins and ligands, such as the FAB-PFOA complex, using the Molecular Mechanics Poisson-Boltzmann Surface Area method. It also covers performing mutagenesis with ChimeraX using rotamers and how to source and run the pipeline.
The script Simulation.sh automates a full molecular dynamics (MD) simulation pipeline, which uses an MPBSA analysis to calculate the binding affinity and give you a per-residue breakdown of the charge composition. The result from this pipeline can be pasted into an Excel sheet, and a pie graph will show which residues play the most significant role in the binding. This pipeline assumes you have already done the previous steps and obtained a pfoa.mol2 and a pfoa.frcmod.
Before running the pipeline, ensure the following are installed:
To run the pipeline, follow these steps:
Make the Simulation.sh executable:
chmod +x Simulation.sh
Run the script with the required arguments:
-r receptor.pdb: Path to the receptor PDB file.
-l ligand.pdb: Path to the ligand PDB file.
-c complex.pdb: Path to the complex PDB file (protein-ligand complex).
-n output_directory: Directory where output files will be stored.
-j num_cores: Number of cores to use for MMPBSA calculations.
./Simulation.sh -r protein.pdb -l ligand.pdb -c complex.pdb -n fab_pfoa_simulation -j 8
This creates an output directory, prepares input files, runs the MD simulation, and performs binding energy analysis.
The pipeline uses several key .in files for different stages of the simulation. These files control the simulation parameters, so understanding what each parameter does is critical.
min.in (Minimization)
&cntrl
imin=1, maxcyc=1000, ncyc=500,
cut=10.0, ntb=1,
ntc=2, ntf=2,
ntpr=100,
ntr=1, restraintmask=':1-376', restraint_wt=2.0
imin=1: Runs minimization (0 would run MD).
maxcyc=1000: Maximum number of minimization cycles.
ncyc=500: The first 500 cycles are steepest descent; the remaining are conjugate gradient.
cut=10.0: Cutoff distance for nonbonded interactions (10 Å).
ntb=1: Constant volume periodic boundary conditions (no pressure coupling).
ntc=2, ntf=2: Bonds involving hydrogen atoms are constrained with SHAKE.
npr=100: Print the output every 100 steps.
ntr=1: Apply harmonic restraints.
restraintmask=':1-376': Apply restraints to residues 1 to 376.
restraint_wt=2.0: Restraint weight of 2 kcal/mol/Ų.
heat.in (Heating)
&cntrl
imin=0, irest=0, ntx=1,
nstlim=25000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=1,
ntpr=500, ntwx=500,
ntt=3, gamma_ln=2.0,
tempi=0.0, temp0=300.0, ig=-1,
ntr=1, restraintmask=':1-376', restraint_wt=2.0,
nmropt=1
/
&wt TYPE='TEMP0', istep1=0, istep2=25000,
value1=0.1, value2=300.0,
/
&wt TYPE='END'
/
imin=0: No minimization; we are running MD.
irest=0, ntx=1: No restart; start from initial coordinates.
nstlim=25000, dt=0.002: 25,000 steps with a 2 fs timestep.
ntt=3, gamma_ln=2.0: Langevin thermostat with a collision frequency of 2 ps⁻¹.
temp0=300.0: Target temperature is 300 K.
ntr=1, restraintmask=':1-376': Restraints on residues 1 to 376.
Temperature ramping: Temperature is ramped from 0.1 K to 300 K over 25,000 steps.
density.in (Density Equilibration)
&cntrl
imin=0, irest=1, ntx=5,
nstlim=25000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=1.0,
ntpr=500, ntwx=500,
ntt=3, gamma_ln=2.0,
temp0=300.0, ig=-1,ntr=1, restraintmask=':1-376', restraint_wt=2.0,
/
ntb=2: Constant pressure periodic boundary conditions.
ntp=1: Isotropic pressure scaling.
taup=1.0: Pressure relaxation time of 1 ps.
Other conditions similar to Heating.
equil.in (Equilibration)
&cntrl
imin=0, irest=1, ntx=5,
nstlim=250000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=1000, ntwx=1000,
ntt=3, gamma_ln=2.0,
temp0=300.0, ig=-1,
/
This is a longer equilibration step, with 250,000 steps (500 ps).
prod.in (Production)
&cntrl
imin=0, irest=1, ntx=5,
nstlim=10000000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=5000, ntwx=5000,
ntt=3, gamma_ln=2.0,
temp0=300.0, ig=-1,
/
nstlim=10000000, dt=0.002: Production run with 10 million steps, totaling 20 ns of MD simulation.
mmpbsa.in (MMPBSA Calculation)
&general
startframe=601, endframe=2000, keep_files=2,
/
&pb
istrng=0.100,
/
&decomp
idecomp=1,
/
startframe=601, endframe=2000: Start MMPBSA calculations from frame 601 to 2000.
istrng=0.100: Ionic strength is 0.1 M.
idecomp=1: Perform per-residue decomposition analysis.
Mutating residues in ChimeraX can be done as follows:
Open ChimeraX and load the structure:
open structure.pdb
Select the residue to mutate: In ChimeraX, click on the residue you want to mutate or use a selection command:
select :<residue_number>
Perform the mutation using rotamers:
swapaa <new_residue> :<residue_number> rotamer
For example, to mutate residue 45 to Alanine:
swapaa ala :45 rotamer
Save the mutated structure:
save mutated_structure.pdb
After mutating, the new structure can be run through the same pipeline using the Simulation.sh script.
Multiple simulations can be run on this with a few minor changes, with the main change required being the restraint mask. For example, I have used this pipeline to simulate FAB-GFP and PFOA, mutations of FAB-GFP and PFOA: ILE 308 → ARG, SER 349 → THR, and even for Palmitic Acid (the natural ligand to FAB) and FAB-GFP. However, this step required me to run palmitic acid through antechamber to get a forcefield for it. After each pipeline use, I pasted the resulting data into Excel and made a pie graph. This helps me see which residues contribute most to the ligand binding to the protein.
Created by Edward Kim
Any files used or created by OpenMM or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.
https://gitlab.igem.org/2024/software-tools/gcm-ky
You need to install the Conda dependencies this script uses.
To set up the Conda environment, type the following command. Replace PATHTOPACAKGELISTTXT
with
the path to the environment.yml file:
conda env create -f PATHTOPACAKGELISTTXT
To activate the environment:
conda activate testenv
To add executable permissions, run the following commands:
cd path_to_scoresolver
cd wham
chmod +x wham
To verify:
ls -l wham
If you see an error:
chmod: wham: No such file or directory
Compile wham.c
using:
make
If the histograms have poor overlap, you will need to use more windows and/or reduce the spring constant.
The windows do not need to be linearly spaced, and they can have different spring constants.
Important terms used:
It acts a spring force with strength $k $ that restrains the system
simulation to state ξi, the center
of window i. For example, if
the system at ξ is a bout to
get away from ξi, a bias force
will force the system to get back to the window, resisting normal
eneergy landscape activity. If the strength of k is weak, the
system may be able to
explore a little more; contrastly, if k too strong, it may not
explore
enough of each window, so a balance is required. There are multiple
windows i = 1, 2, 3…n
that each sample sufficient amounts of the free energy landscape. This
is the main premise of umbrella sampling that separates from other MD
simulations.
NOTE: each quantity discussed previously must have biased version
indicated by an apostrophe superscript bias. After computing/simulating
the system
with biasing potentials, we subtract the biased results by the biasing
potential and retrive the true energy curve.
However, the pivot indexes can be a little tricky to determine. But, the best way to start off is by identifying your objective and looking at the starting PDB to get a glipmse at what you’re working with.
Generally Pick the strongest parts of the Complex
1. For proteins: C-alpha atoms are picked because they're usually the backbone atoms that represent the structure very well, so stretching that wont break the system entirely (they are strong enough to stretch).As stored in the
./gh_scoresolver/tests/pdb/ahl_dock_luxr_1.pdb
PDB file, we have a ligand protein ahl docked into a receptor protein luxr.