Model

Overview

Modeling is a pivotal tool for understanding complex biological systems, it serves not only as a means to predict or interpret experimental data but also as a bridge for designing experiments and comprehending biological processes. In our project, we have deepened our understanding of the v8 protease mechanism and optimized our experimental design by constructing and analyzing models. Modeling also enhances the discovery of effective therapeutic agents and accelerates our experimental approach.

Model 1: Ordinary Differential Equations (ODE)

Our first model employs Ordinary Differential Equations (ODEs) to simulate the expression levels of the v8 protease and key components within the flipGFP system, specifically focusing on β1-9, β10-E5-β11-K5. This model is crucial for understanding the dynamics of protein expression and its impact on receptor cleavage. By using ODEs, we can predict the behavior of these proteins in E.coli , which aids in the design of our therapeutic interventions.

Model 2: Molecular Simulation

For our second model, we utilized Homology Modeling and Molecular Dynamics techniques. Initially, we performed homology modeling to obtain the structure of β10-E5-β11-K5, and then used AlphaFold 2 to predict the structure of β1-β9. This step was followed by protein preparation (using Schrodinger) and molecular dynamics simulation (using GROMACS), which provided insights into the protein’s behavior at the atomic level. This model is instrumental in understanding the flip-GFP system.

Model 3: Relative Fluorescence Model

The third model, which is the Relative Fluorescence Model, was developed to quantify the relationship between relative fluorescence and time, for relative fluorescence intensity is a function of the concentration of flipCherrysf while the concentration of flipCherrysf is also a function of time. By implying this model, we are able to predict the change of fluorescence intensity with time and examine if the self-packing ODE model is correct.

Model 4: Virtual Screening

Our final model involves Virtual Screening, where we searched the ChemDiv library for small molecules that can inhibit the v8 protease. This in silico approach allows us to narrow down potential candidates without the need for laborious and costly experimental trials. The model’s output guides our experimental design by predicting which compounds are most likely to be effective inhibitors.

ODE

1. Description of the system

In our experiment, we express three proteins in E.coli, namely V8 protease, beta1-9, and beta10-11. When the bacteria started protein expression, V8 protease cleaved beta10-11 into bata10-11cleavaged, and then bata1-9 and beta10-11cleavaged self-assembled into Flipcherry. Among them, the analysis of metabolism of these three proteins and their interaction is of great importance. Therefore, we use ordinary differential equation model to simulate and analyze the metabolism and interaction of these three proteins. The metabolic equations of these three proteins is a function of the concentration of the protein itself and the mRNA, which is a little bit complicated.1

An alternative model can be obtained by assuming that the mRNA dynamics are extremely fast when compared to the protein dynamics and hence reach their equilibrium instantly. Therefore, the dynamics can be described in just 1 variables, namely the concentration of protein.

The interactions among these proteins can be described as the following chemical formulas.

betaX1011VX8 proteasebetaX1011cleavaged\ce{beta_{10-11} ->[V8 protease] }{\ce{beta_{10-11cleavaged}} }
betaX1011cleavaged+betaX19Flipcherry\ce{beta_{10-11cleavaged} +beta_{1-9} -> }{\ce{Flipcherry} }

After V8 protease can cut beta1-9 into beta1-9cut, the structure of beta1-9cut would alter. This process can be described by Hill function mathematically. After this, beta1-9cut combines with beta10-11 to form Flipcherry by self-assembling, which can be considered as a reversible chemical reaction.

We also make some assumptions to build ODE model.

  1. The dynamics of mRNA is extremely fast.

  2. The degradation of b10-11-tev-cleavaged is slow

  3. The self-packing of Flipcherry can be considered as a standard reversible reaction.

  4. Neglect special effect and the concentration of each component is equal everywhere.

Therefore, we have the ODE models as following.

dcv8proteasedt=kcat1km1cv8proteasecv8protease+km1ka1cv8proteasekcat3cv8proteasecβ1011cβ1011+km3dcβ19dt=kcat4km4cβ19cβ19+km4ka4cβ19kcat5cβ19cβ1011cleavaged+kcat6cFlipcherrydcβ1011dt=kcat2km2cβ1011cβ1011+km2ka2cβ1011kcat3cv8proteasecβ1011cβ1011+km3dcβ1011cleavageddt=kcat3cv8proteasecβ1011cβ1011kcat5cβ19cβ1011cleavaged+kcat6cFlipcherrydcFlipcherrydt=kcat5cβ19cβ1011cleavagedkcat6cFlipcherry\begin{align*} \frac{\mathrm{d} c_{v8 \, \text{protease}} }{\mathrm{d} t} &= \frac{\frac{k_{\text{cat1}} }{k_{\text{m1}} }c_{v8 \, \text{protease}}}{c_{v8 \, \text{protease}}+k_{\text{m1}}} - k_{\text{a1}}c_{v8 \, \text{protease}} - \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}+k_{\text{m3}} } \\ \frac{\mathrm{d} c_{\beta1-9} }{\mathrm{d} t} &= \frac{\frac{k_{\text{cat4}} }{k_{\text{m4}} }c_{\beta1-9}}{c_{\beta1-9}+k_{\text{m4}}} - k_{\text{a4}}c_{\beta1-9} - k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } + k_{\text{cat6}}c_{\text{Flipcherry}} \\ \frac{\mathrm{d} c_{\beta10-11} }{\mathrm{d} t} &= \frac{\frac{k_{\text{cat2}} }{k_{\text{m2}} }c_{\beta10-11}}{c_{\beta10-11}+k_{\text{m2}}} - k_{\text{a2}}c_{\beta10-11} - \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}+k_{\text{m3}} } \\ \frac{\mathrm{d} c_{\beta10-11\, \text{cleavaged}} }{\mathrm{d} t} &= \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}} - k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } + k_{\text{cat6}}c_{\text{Flipcherry}} \\ \frac{\mathrm{d} c_{\text{Flipcherry}} }{\mathrm{d} t} &= k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } - k_{\text{cat6}}c_{\text{Flipcherry}} \end{align*}

2. Conclusion

If we only concern about the concentration of components at steady state, then from the perspective of physics and mathematics, we have the following results.

kcat1km1cv8proteasecv8protease+km1ka1cv8proteasekcat3cv8proteasecβ1011cβ1011+km3=dcv8proteasedt=0kcat4km4cβ19cβ19+km4ka4cβ19kcat5cβ19cβ1011cleavaged+kcat6cFlipcherry=dcβ19dt=0kcat2km2cβ1011cβ1011+km2ka2cβ1011kcat3cv8proteasecβ1011cβ1011+km3=dcβ1011dt=0kcat3cv8proteasecβ1011cβ1011kcat5cβ19cβ1011cleavaged+kcat6cFlipcherry=dcβ1011cleavageddt=0kcat5cβ19cβ1011cleavagedkcat6cFlipcherry=dcFlipcherrydt=0\begin{align*} \frac{\frac{k_{\text{cat1}} }{k_{\text{m1}} }c_{v8 \, \text{protease}}}{c_{v8 \, \text{protease}}+k_{\text{m1}}} - k_{\text{a1}}c_{v8 \, \text{protease}} - \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}+k_{\text{m3}} } &= \frac{\mathrm{d} c_{v8 \, \text{protease}} }{\mathrm{d} t} = 0 \\ \frac{\frac{k_{\text{cat4}} }{k_{\text{m4}} }c_{\beta1-9}}{c_{\beta1-9}+k_{\text{m4}}} - k_{\text{a4}}c_{\beta1-9} - k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } + k_{\text{cat6}}c_{\text{Flipcherry}} &= \frac{\mathrm{d} c_{\beta1-9} }{\mathrm{d} t} = 0 \\ \frac{\frac{k_{\text{cat2}} }{k_{\text{m2}} }c_{\beta10-11}}{c_{\beta10-11}+k_{\text{m2}}} - k_{\text{a2}}c_{\beta10-11} - \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}+k_{\text{m3}} } &= \frac{\mathrm{d} c_{\beta10-11} }{\mathrm{d} t} = 0 \\ \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}} - k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } + k_{\text{cat6}}c_{\text{Flipcherry}} &= \frac{\mathrm{d} c_{\beta10-11\, \text{cleavaged}} }{\mathrm{d} t} = 0 \\ k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } - k_{\text{cat6}}c_{\text{Flipcherry}} &= \frac{\mathrm{d} c_{\text{Flipcherry}} }{\mathrm{d} t} = 0 \end{align*}

Therefore, we notice that cv8proteasec_{v8 protease}, cbeta19c_{beta1-9} and cbeta1011c_{beta10-11 } at steady state do not dependent on the concentration of other components and are only relevant to the parameters of the ODEs. Moreover, cbeta1011cleavagedc_{beta10-11cleavaged } at steady state is relevant to the exact process of their interactions, and cFlipcherryc_{Flipcherry} at steady state is kcat5cbeta19kcat6\frac{ k_{cat5}c_{beta1-9}}{ k_{cat6}} times bigger than cbeta1011cleavagedc_{beta10-11cleavaged }.

To take a deeper insight into the change of each component with time, we import the above 5 ODEs into Matlab and utilize the ODE45 numerical method to simulate the entire process.

Then to examine the impact of V8 protease competitive inhibitor, we modify the following ODEs, for _competitive inhibitor can increase the kmk_{m} of enzymatic reactions_.2

dcv8proteasedt=kcat1km1cv8proteasecv8protease+km1ka1cv8proteasekcat3cv8proteasecβ1011cβ1011+km3dcβ1011dt=kcat2km2cβ1011cβ1011+km2ka2cβ1011kcat3cv8proteasecβ1011cβ1011+km3\begin{align*} \frac{\mathrm{d} c_{v8 \, \text{protease}} }{\mathrm{d} t} &= \frac{\frac{k_{\text{cat1}} }{k_{\text{m1}} }c_{v8 \, \text{protease}}}{c_{v8 \, \text{protease}}+k_{\text{m1}}} - k_{\text{a1}}c_{v8 \, \text{protease}} - \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}+k_{\text{m3}} } \\ \frac{\mathrm{d} c_{\beta10-11} }{\mathrm{d} t} &= \frac{\frac{k_{\text{cat2}} }{k_{\text{m2}} }c_{\beta10-11}}{c_{\beta10-11}+k_{\text{m2}}} - k_{\text{a2}}c_{\beta10-11} - \frac{k_{\text{cat3}}c_{v8 \, \text{protease}}c_{\beta10-11} }{c_{\beta10-11}+k_{\text{m3}} } \end{align*}

As the figure revealed, inhibitor can decrease the concentration of Flipcherry at steady state, demonstrating that the purpose of our project is practical.

3. Model limitation

Although this model have verified the feasibility of our project, we did not conduct experiments to compare our theoretical prediction with the exact expression process.

Moreover, if the degradation of mRNA is sufficiently faster (say, at least 10 times) than the degradation of the corresponding protein, then the stationarity approximation is biologically justified. Note that, under certain conditions, it can be assumed that the stationarity approximation is only valid for some genes of the network. In that case, only those genes for which the assumption is valid can be modelled by a single equation for the protein concentration, while the rest will be associated to two equations, one for the mRNA and one for protein concentration.

Molecular Simulation

Molecular simulation aims to help us better understand the mechanism of flipGFP system. As the inventor of flipGFP has done similar jobs before, we adopt some parameters directly from the essay3 and change some of the tools they use. For example, using Schrodinger together with GROMACS instead of NAMD, which are more convenient.

Figure 2.1 The Description From The Essay (Too General)
Figure 2.1 The Description From The Essay (Too General)

This model is described in six parts:

1. Getting β10-E5-β11-K5 part: Homology Modeling

2. Getting β1-β9 part: Using AlphaFold2

3. Protein Preparation: Using Schrodinger

4. Molecular Dynamics: Using GROMACS

a. Protein Preparation

b. EM (Energy Minimization) stage

c. NVT (Canonical Ensemble) stage

d. NPT (Isothermal-Isobaric Ensemble) stage

e. MD (Molecular Dynamics) stage

5. Results

6. Discussion

1. Getting β10-E5-β11-K5 part: Homology Modeling

Homology modeling is a technique to predict the three-dimensional structure of a protein based on its amino acid sequence and a known structure of a homologous protein. The structures of β10-E5-β11-K5 (both uncleaved and cleaved) were modeled by the Modeller4 with a heterodimeric coiled coil structure (PDB code: 1KD8)

  • Sequence of β10-E5-β11-K5:
    MDLPDDHYLSTQTILSKDLNMEVSALEKEVSALEKEVSALEKEVSALEKEVSALEKEKRDHMVLLEYVTAAGITDASKVSALKEKVSALKEKVSALKEKVSALKEKVSALKE
  • Relevant files

After downloading Modeller, we put the aforementioned files together into a folder and run the following commands:

cd path/to/the/folder
python align2d.py
python model-single.py

to obtain the PDB file of β10-E5-β11-K5.

Protein 2.1 a heterodimeric coiled coil structure (PDB code: 1KD8)
Protein 2.2 β10-E5-β11-K5

2. Getting β1-β9 part: Using AlphaFold2

We use AlphaFold2 to get β1-β9 instead of deleting the β10-β11 part in the whole protein, as the msa mode ”mmseqs2_uniref_env” not only ensures a similar architecture to flipGFP but also takes into account more dynamic factors compared to directly deleting the β part, which may more accurately reflect the real situation.

  • Sequence of β1-β9:
    RKGEELFTGVVPILIELDGDVNGHKFFVRGEGEGDATIGKLSLKFIATTGKLPVPWPTLVTTLXXXVQAFSRYPDHMKRHDFFKSAMPEGYVQERTIYFKDDGTYKTRAEVKFHGDHLVNRIELKGIDFKEDGNILGHKLEYNFNSHKVYITADKQNNGIKANFTIRHNVEDGSVQLADHYQQNTPIGDGPV
    By inputing the sequence into the platform, we acquire the following results.
Figure 2.2 Sequence Coverage Graph From AlphaFold2
Figure 2.2 Sequence Coverage Graph From AlphaFold2
Figure 2.3 Confidence Figure From AlphaFold2
Figure 2.3 Confidence Figure From AlphaFold2

Which shows that the generated structure is countable.

Protein 2.3 β1-β9 Structure From AlphaFold2

3. Protein Preparation: Using Schrodinger

The structures of β10-E5-β11-K5 were inserted into β1-9 based on the GFP structure (PDB code: 4W6T, shown below) 5. The modeled systems were solvated in water with 150 mM NaCl.

Protein 2.4 4W6T

The following are the prepared proteins for both uncleaved and Cleaved system.

We simply split the β10-E5-β11-K5 into β10-E5-β11 and K5 to obtain the cleaved ones to imitate the cleavage.

Protein 2.5 Uncleaved flip-GFP
Protein 2.6 Cleaved flip-GFP

4. Molecular Dynamics: Using GROMACS

a. Protein Preparation

First, we need to convert the files obtained from Schrodinger into a format recognizable by GROMACS.

  • Relevant files:

Having the .sh file above, we run:

sh 1_parse_pro_ter.sh

Then, we run:

gmx pdb2gmx -f 1_protein_nonacl_nonma.pdb -o protein_cutin.pdb -p protein.top -ter -ignh

where 1_protein_nonacl_nonma.pdb and 2_protein_nonacl_nonma.pdb are the pdb file of the uncleaved/cleaved flipGFP. We choose the CHARMM36 force field, TIP3P water model, None for the start terminus type for ACE-0 and COO- for end terminus type.

Eventually, we execute the following commands to finish the construction of the systems.

python3 tools/trans.py water s
grep '^HETATM'  na.pdb  > NA_cutin.pdb
grep '^HETATM' cl.pdb > CL_cutin.pdb
cat protein_cutin.pdb water_processed.pdb NA_cutin.pdb CL_cutin.pdb > system.pdb
find -name 'system.pdb' | xargs perl -pi -e 's|MDL||g'
cp protein.top topol.top

The modeled systems with solution are shown below. (The files are decimated due to the limitation of file size. Therefore, the structures of the prepared ones serve merely as illustrative examples.)

Protein 2.7 Prepared uncleaved flip-GFP system (Decimated)
Protein 2.8 Prepared cleaved flip-GFP system (Decimated)

b. EM (Energy Minimization) stage

We performed a 6000-step energy using the following .mdp file and commands:

  • File
  • Commands
gmx grompp -f mdp_final_step/em.mdp -c system.pdb -p topol.top -o em.tpr -maxwarn 100
gmx mdrun -v -deffnm em -gpu_id 0 -nt 36 -pin on -pinstride 1 -pinoffset 0 

This potential plot demonstrates the nice, steady convergence of EpotE_{pot}.

c. NVT (Canonical Ensemble) stage

Then the system was heated from 0 to 310 K in 20 ps, applying 2 kcal/(mol A2) harmonic constraints to the non-hydrogen atoms of β1-9 by changing the .itp file manually.

  • File
  • Commands
gmx grompp -f mdp_final_step/nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr -maxwarn 100
gmx mdrun -deffnm nvt -gpu_id 0 -nb gpu -pme gpu -pmefft gpu -bonded gpu  -update gpu  -nt 36 -pin on -pinstride 1 -pinoffset 0

From the plot, it is clear that the temperature of the system quickly reaches the target value (310 K), and remains stable over the remainder of the equilibration.

d. NPT (Isothermal-Isobaric Ensemble) stage

After the nvt stage, 250 ns simulation in the NPT ensemble was performed at 1 bar.

  • File
  • Commands
gmx grompp -f mdp_final_step/npt.mdp -c nvt.gro -r nvt.gro -p topol.top -o npt.tpr -maxwarn 100
gmx mdrun -deffnm npt -gpu_id 0 -nb gpu -pme gpu -pmefft gpu -bonded gpu  -update gpu  -nt 36 -pin on -pinstride 1 -pinoffset 0

e. MD (Molecular Dynamics) stage

Finally, molecular dynamics simulation was performed for 100 ns with an integration timestep of 2 fs.

  • File
  • Commands
gmx grompp -f mdp_final_step/md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr -maxwarn 100
gmx mdrun -deffnm md -gpu_id 0 -nb gpu -pme gpu -pmefft gpu -bonded gpu -update gpu -nt 36 -pin on -pinstride 1 -pinoffset 0

5. Results

In the figures above, the 1st and the 2nd figure are from the uncleaved situation, while the 3rd and the 4th are from the cleaved situation. As for the meaning of different colors, pink represents the initial state of the uncleaved md stage, green represents the final state of the uncleaved md stage; blue represents the initial state of the cleaved md stage, and brown represents the final state of the cleaved md stage.

In both scenarios, β10-E5-β11 is not embedded within β1-9, which is the expected outcome in the uncleaved condition.

Comparing the structures of the uncleaved form in the before-and-after contrast (figure 1 and figure 2 in the gallery above), it can be observed that in the uncleaved state, both the β1-9 and β10-E5-β11-K5 segments maintain their relative positions stably, indicating that in the uncleaved condition, GFP is unlikely to revert to its fluorescent state.

The self-assembly process was not fully completed in the cleaved state, which can be seen from the rmsd plots below that if the self-assembly were complete, the rmsd plot should stabilize at last. However, compared to the uncleaved state, it is evident that there is a significant increase in displacement in the cleaved state, indicating a less stable system, which is what we expect. Particularly in the β10-E5-β11 section after cleavage (comparing the 2nd and the 4th figure above), there is a noticeable interaction and morphological recovery between β10 and β11, with E5 being significantly folded by this interaction, which is precisely the function we hope the flipGFP system will achieve. It is possible that the complete self-assembly process still requires a longer simulation time and with more stringent restrictions.

It is worth noting that observing the blue and brown alone in Figure 4 may not be as striking in terms of morphological recovery, as the blue represents the initial state of the md stage, not the initial state of the β10-E5-β11 segment. Its actual initial state can be seen in Protein 2.6. However, recall that the initial state of the β10-E5-β11 segment is the same in both the uncleaved and cleaved conditions. Therefore, by comparing the green (results of the uncleaved state) and brown (results of the cleaved state), it is evident that the folding of β10-E5-β11 is quite pronounced.

6. Discussion

  • The model simulates and helps vitualize the recovery of the shape of β10 and β11 after the flipGFP has been cleaved.

  • Due to the limitations of time and computational resources, we have found it challenging to conduct more refined and longer simulations. However, by reviewing the literature and consulting with relevant personnel, we have identified the following areas that can still be optimized:

It was mentioned in the aforementioned reference3 that,

Due to the many hydrogen bonding interactions between β10-11 (in β10-E5-β11-K5) and their neighboring β sheets in β1-9, the separation of β10-11 from β1-9 could take millisecond or longer timescale.

Whereas, in our simulation, we used the mode pbc = xyz and may not be the restrictions that could satisfy the need to simulate the system in 100ns.

In the future, we could apply stronger restrictions and run longer time to help us acquire the expected results

Relative Fluorescence Model

1. Description of the model

In physics, the fluorescence intensity can be described by the following equation.6

I=φI0(1eεlc)d2I=\frac{\varphi I_{0} (1-e^{\varepsilon lc} ) }{d^{2} }

Where φ\varphi is the quantum fluorescence intensity of the material, I0I_{0} is the received light intensity l is the thickness of the material and cc is the concentration of the material.

In practical applications, what usually matters is relative fluorescence intensity, namely

dcβ19dt=kcat5cβ19cβ1011cleavaged+kcat6cFlipcherrydcβ1011cleavageddt=kcat5cβ19cβ1011cleavaged+kcat6cFlipcherrydcFlipcherrydt=kcat5cβ19cβ1011cleavagedkcat6cFlipcherry\begin{align*} \frac{\mathrm{d} c_{\beta1-9} }{\mathrm{d} t} &= -k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } + k_{\text{cat6}}c_{\text{Flipcherry}} \\ \frac{\mathrm{d} c_{\beta10-11\, \text{cleavaged}} }{\mathrm{d} t} &= -k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } + k_{\text{cat6}}c_{\text{Flipcherry}} \\ \frac{\mathrm{d} c_{\text{Flipcherry}} }{\mathrm{d} t} &= k_{\text{cat5}}c_{\beta1-9}c_{\beta10-11\, \text{cleavaged} } - k_{\text{cat6}}c_{\text{Flipcherry}} \end{align*}

to predict the change of relative fluorescence intensity with time.

2. Conclusion

we import the above 2 ODEs and the physical equation into Matlab and utilize the ODE45 numerical method to simulate the change of relative fluorescence intensity with time.

This figure illustrates that our relative fluorescence intensity model matches the experimental results to some extent.

Moreover, inhibitor can decrease the amount of beta10-11cleavaged. By applying this model, we can predict the change of relative fluorescence intensity with time in proper accuracy range without experiment. This approach can facilitate to find efficient V8 protease inhibitor in our project.

Virtual Screening

In wetlab part, we have chosen several compound libraries for drug screening Natural Compound Library, Bioactive Compound Library, Clinical Compound Library, and Approved Drug Library applying high-throughput screening technology. The experiment screened approximately 3,600 small molecules, and the process of mixing enzymes and substrates took about six hours, not to mention the preliminary preparation such as protein purification. In contrast, virtual screening can accomplish the screening of millions of compounds within just a few days, significantly expanding the scope of screening, increasing the likelihood of identifying inhibitors with high inhibition rates, and uncovering novel patterns. Therefore, we engaged in virtual screening modeling.

This model is described in three parts:

1. Description of the model

a. Regular part: This section provides a detailed description of our virtual screening process.

b. Rating method used when screening through visual inspection: This part focuses on sharing the strategies we observed during the screening.

2. Results: This section mainly covers the visualization and interpretation of our results.

3. Discussion

1. Description of the model

a. Regular part

The following flow chart shows the general steps we take to do virtual screening.

Figure 4.1-1 Flow Chart Of Virtual Screening Part 1
Figure 4.1-1 Flow Chart Of Virtual Screening Part 1
Figure 4.1-2 Flow Chart Of Virtual Screening Part 2
Figure 4.1-2 Flow Chart Of Virtual Screening Part 2

First, we downloaded the PDB file of the v8 protease enzyme (PDB code: 2O8L) from the RCSB, and prepared the protein using Schrodinger(removing water molecules, etc.).

Protein 4.1 Prepared V8 Protease

Then, based on the functional region displayed in UniProt (UniProt Code: Q99V45), which is a catalytic triad composed of a histidine (HIS), an aspartic acid (ASP), and a serine (SER), we designed the grid box, with the ligand centroid’s radius of action being 10 Å and the ligand’s overall radius of action being 12 Å.

In the PDF, the positions of the active sites are 119, 161 and 237. Whereas, this is under the condition that 68-69 is the cleavage site. So, we shall shift 68 units to obtain the sites in the protein obtained in RCSB, i.e. subtracting 68 respectively, which means that the active sites are 51, 93 and 169. This result is shown below.

Figure 4.2 Functional Region (HIS 51, ASP 93, SER 169)
Figure 4.2 Functional Region (HIS 51, ASP 93, SER 169)

The prepared protein pocket file and the 3D small molecule library ChemDiv (1,000,000 + molecules) were used as inputs to perform preliminary virtual screening on a Linux cluster using the GVSrun7 command (which internally calls Glide’s molecular docking method and its files are open-source on Wang Lin’s Github). The GVS command is as follows:

GVSrun -i "*.zip" -D ChemDiv -m Normal

Here is a brief Description of the meaning of mode Normal:

Normal mode:

First, a simple set of physicochemical rules is applied to filter the compound database (molecular weight ≤ 600, rotatable bonds ≤ 10, number of aromatic rings ≤ 6), followed by HTVS docking.

Then, according to the proportion or quantity specified by the -a parameter, the top-ranked molecules are extracted and subjected to physicochemical property calculations using the QIKPROP program and a simple 5-rule filter (molecular weight ≤ 650, lipophilicity ≤ 7, hydrogen bond donors ≤ 6, hydrogen bond acceptors ≤ 20). `

Finally, an optimized SP docking is performed (rewarding internal hydrogen bonds of the ligand molecule, allowing halogens to act as hydrogen bond acceptors), and ultimately, the number of molecules specified by -b parameter is output.

The -a parameter and the -b parameter means The number of Output compounds per Screening Task and The number of Output compounds after Standard docking Task respectively, and we use the default ones.

After several days of screening (less than one week), we obtained the promising molecules and started to screen by observing the structures and calculating relevant metrics.

Subsequently, we utilized Schrodinger software to observe bond formation and assess the degree of electrostatic negativity matching for the top 1000 ranked small molecules. (The following figures show the view in Schrodinger and he correlation between different colors and different bonds.)

Figure 4.3-1 One Of The Results As Illustrative Examples
Figure 4.3-1 One Of The Results As Illustrative Examples
Figure 4.3-2 Different Bonds With Corresponding Colors
Figure 4.3-2 Different Bonds With Corresponding Colors

We scored them (see method in part b)and calculated their MMGBSA ΔG bind to further eliminate false positive small molecules, resulting in 200 small molecules with favorable properties. Finally, based on the new protein and small molecule conformations obtained from the calculation of mmgbsa ΔG bind, we calculated the ligand strain energy for these 200 small molecules. Based on the results, and in conjunction with the previous indicators, we arrived at the final 60 small molecules. (All steps were executed within the Schrodinger software.)

Here are basic explanation of MMGBSA ΔG Bind and Ligand Strain Energy. Click here to skip.

MMGBSA ΔG Bind

MMGBSA (Molecular Mechanics-Generalized Born Surface Area) is a method used to estimate the binding free energy (ΔG bind) of a ligand to a receptor, such as a protein in our context.

In brief, the equation of ΔG bind can be represented as:

ΔGbind=(ΔEMM+ΔGsolvation)complex[(ΔEMM+ΔGsolvation)ligand+(ΔEMM+ΔGsolvation)receptor ΔG_{\text{bind}} = (ΔE_{\text{MM}} + ΔG_{\text{solvation}})_{\text{complex}} - [(ΔE_{\text{MM}} + ΔG_{\text{solvation}})_{\text{ligand}} + (ΔE_{\text{MM}} + ΔG_{\text{solvation}})_{\text{receptor}}

Where:

  • (ΔEMM)(ΔE_{\text{MM}}) is the change in molecular mechanics energy.
  • (ΔGsolvation)(ΔG_{\text{solvation}}) is the change in solvation free energy, which includes both Generalized Born and Surface Area terms.

Normally, MMGBSA dG Bind values are generally more favorable when they are lower. It's better be under -50

b. Rating method used when screening through visual inspection

In the current virtual screening process, visual inspection remains a crucial part. When done well, it can effectively eliminate false positive results and refine useful molecules and structures.

However, for beginners, the first encounter with visual inspection can be daunting, and standards vary, making it difficult to unify the results obtained by two or more people with different criteria. This is also a problem we encountered. To address this issue, we consulted professors in the field of computational biology and asked them and doctoral students to give us suggestions on our visual inspection process.

During visual inspection, there are three important evaluation points:

  • the geometric fit between the protein and ligand

  • the degree of electrostatic complementarity between the protein and ligand

  • the formation of good bonds (including hydrogen bonds, halogen bonds, salt bridges, aromatic H-bonds, pi-pi stacking, and pi-cation interactions), and the formation of bad bonds (mainly ugly crushes).

    • In particular, the denser the internal hydrogen bonds in the pocket, the better. Therefore, we will give extra points for hydrogen bonds formed internally within the binding pocket.

Based on the factors above, we established the following evaluation table:

Scoring AspectsDescriptionScoring Criteria
1. Number of good bondsIncluding the aforementioned
hydrogen bonds, halogen bonds,
pi-pi stacking, ect.
0-2 bonds: 0 point
3-4 bonds: 1 point
5-6 bonds: 2 points
7-8 bonds: 3 points
and so on
2. Hydrogen bonds within the pocketThe deeper and denser, the better0-2 points,
default is 0 points, each instance adds one point
3. Concentrated orange/red bondsThese are bad/ugly bonds often result in clashes that cannot be resolved even after optimization by the MMGBSA method, representing unreasonable structures.0-2 points,
default is 2 points,
one point deducted for each additional concentrated area of excessive contact
4. Morphological fiti.e. conformational matching0-1 point,
0 points are given only if there are significant gaps
5. Electronegativity Compatibility/0-1 point,
1 point is given if there are no obvious conflicts overall

In practice, the highest possible score is 11 points, and the lowest is 2 points.

2. Results

Ultimately, we identified the 63 most promising small molecules from ChemDiv. The following table displays detailed information about these small molecules, sequentially showing their identification numbers from the observational screening, their SMILES properties, their MM-GBSA ΔG bind, and ligand strain energy. To view the entire table, please click the “Expand: Rest of the Table” button.

No.TitleSMILEMMGBSA dG BindLig Strain Energy
1y-4593
O=C(c1nnc(Cc(cc2)cc3c2OCO3)o1)NCCCN(CCC1)C1=O
-52.972.532537
2y-4594
CC(N(CC1)CCC1c1noc(CC(C2)CN2C(C2CC=CC2)=O)n1)=O
-51.4-0.602846
3y-4598
Cc1cccc(CNC(CC(CCC2)CN2C(C=NN2CCOC)=CC2=O)=O)c1
-67.072.758263
4z-4602
O=C(c1cn(CCN(CCOc2ccccc2)C2c3ccccc3)c2n1)NCCc1cnccc1
-73.432.714755
5z-4604
Cc(cccc1)c1C(SC1=NC(CN(CC2)CCN2C(Nc2c(C)ccc(Cl)c2)=O)=C2)=NN1C2=O
-71.274.020383
6z-4605
O=C(c1ccccc1)N(Cc1ccco1)Cc(cc1)ccc1Cl
-50.874.478632
7z-4606
O=C(c(cc1)cc(NC(N2Cc3cccc(Br)c3)=S)c1C2=O)Nc(cc1)ccc1Oc1ccccc1
-69.872.245384
8z-4611
[O-][N+] (c1nn(Cc(cc2)ccc2C(Nc2cn(Cc3cccc(Cl)c3)nc2)=O)cc1Cl)=O
-67.652.819607
9z-4617
O=C(c(cc1)ccc1N(CCCN1Cc(cc2)ccc2F)C1=O)N(CC1)CCN1c(cccc1)c1F
-64.612.955748
10z-4619
CC(CN(CC1)C(c2ccc(CNS(c3ccccc3)(=O)=O)cc2)=O)N1c1cc(C)ccc1
-64.240.942398

Right behind the table, the structures of these small molecules are presented. Due to space limitations, only one 3D structure and its corresponding docking results are shown directly. Click the “Expand: Final 60 Molecules” button to view the structures of all small molecules, which are of various sizes.

Regrettably, hydrogen bonds and other interactions (as shown in Figure 4.4-1 and Figure 4.4-2) cannot be displayed in the wiki. Relevant results will be included in our files, and interested parties can download them for viewing with software such as Schrodinger.

Figure 4.4-1 z-4650 Docking Result With Surface Shown As Example
Figure 4.4-1 z-4650 Docking Result With Surface Shown As Example
Figure 4.4-2 z-4650 Docking Result Without Surface Shown As Example
Figure 4.4-2 z-4650 Docking Result Without Surface Shown As Example

Spinning 180° and zoom in to see the Morphological Fit of the ligand and the protein.

Protein 4.2-0-1 z-4650 docking result shown as example (Decimated)
Protein 4.2-0-2 z-4650 3D structure shown as example

Final 63 Molecules:


Molecule 1-10

3. Discussion

  1. Although the lengthy drug procurement cycle prevented us from experimentally validating the virtual screening outcomes before the deadline of this year’s iGEM competition, based on the quantitative results,(Click to see the ΔG bind and ligand strain energy, given that below -50 and 0-5 are the desirable results respectively.) these small molecules hold considerable promise for achieving stronger inhibitory effects than those obtained from laboratory screenings.

  2. However, it is worth noting that without experimental validation, the small molecules may be subject to cleavage by the V8 protease, thereby failing to achieve the desired inhibitory effect. We will conduct experiments in the future to verify the results of the virtual screening.

  3. Suggestions for beginners of virtual screening:

  • In HTVS, the protein conformation remains unchanged, so potential hydrogen bonds and other interactions may not be displayed. Identifying such situations requires experience, which is why we recommend, if computational resources permit, to use the MM-GBSA to obtain all optimized conformations and bonding information, thereby also accumulating experience for oneself during the visual inspection process.

  • When a protein has multiple binding pockets, ensure that the selected molecules include those that occupy multiple pockets simultaneously as well as those that occupy different individual pockets, to ensure comprehensive experimental validation.





Visit our models on iGEM gitlab for more details.

Footnotes

  1. Polynikis A, Hogan SJ, di Bernardo M. Comparing different ODE modelling approaches for gene regulatory networks. J Theor Biol. 2009 Dec 21;261(4):511-30. doi: 10.1016/j.jtbi.2009.07.040. Epub 2009 Aug 6. PMID: 19665034.

  2. Robin T, Reuveni S, Urbakh M. Single-molecule theory of enzymatic inhibition. Nat Commun. 2018 Feb 22;9(1):779. doi: 10.1038/s41467-018-02995-6. PMID: 29472579; PMCID: PMC5823943.

  3. Zhang, Schepis, A., Huang, H., Yang, J., Ma, W., Torra, J., Zhang, S.-Q., Yang, L., Wu, H., Nonell, S., Dong, Z., Kornberg, T. B., Coughlin, S. R., & Shu, X. (2019). Designing a Green Fluorogenic Protease Reporter by Flipping a Beta Strand of GFP for Imaging Apoptosis in Animals. Journal of the American Chemical Society, 141(11), 4526–4530. https://doi.org/10.1021/jacs.8b13042 2

  4. Kuhlman B and Baker D. (2000) Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA 97(19):10383-8.

  5. Huang PS, Ban Y, Richter F, Andre I, Vernon R, Schief WR, Baker D. (2011) RosettaRemodel: A Generalized Framework for Flexible Backbone Protein Design. PLoS One 6(8):e24109.

  6. Itagaki, H. Chapter 3 - Fluorescence Spectroscopy. In Experimental Methods in Polymer Science; Tanaka, T., Ed.; Academic Press: Boston, 2000; pp 155–260. https://doi.org/10.1016/B978-0-08-050612-8.50009-X.

  7. “Schrodinger Script Library: Easily Run Computational Biology Tasks on Linux Clusters.” (2021, July). Zhihu Column. Retrieved from https://zhuanlan.zhihu.com/p/370850885