Molecular Simulation | HZAU-China

Overview Structure Prediction of TbrA Substrate Affinity Analysis Complex Structure Prediction Protein Design Reference

Overview

In molecular model, we focused on the structural prediction and functional analysis of key proteins involved in the gene circuits of engineered bacteria. The first task involved predicting the 3D structure of aconitate isomerase (TbrA) using AlphaFold 2.0, followed by molecular dynamics simulations at various temperatures to assess substrate affinity. Molecular docking was performed using Rosetta and MOE to evaluate interactions between cis-aconitate and the enzyme.

Additionally, we predicted the HrpRS-Phrpl complex structure using AlphaFold 3 and molecular docking strategies, followed by binding free energy calculations to estimate the dissociation constant (K_d). This work aimed to understand the interaction between the HrpRS protein complex and the Phrpl promoter.

Lastly, to improve the sensitivity of the NahR protein to salicylic acid (SA), we performed saturation mutagenesis around the binding site, generating 5000 mutant sequences. After several rounds of screening based on energy calculations, we identified 22 mutant sequences with enhanced binding affinity to SA.

Model 1: Structure Prediction of TbrA

Why we built the model?

Since the crystal structure of the chosen TbrA has not been resolved, we aim to predict its structure using bioinformatics methods to guide subsequent modeling efforts.

How will this model be implemented?

First, we obtained the tbrA gene sequence encoding TbrA from Bacillus thuringiensis YBT-1520 and translated its coding sequence(CDS) into its corresponding amino acid sequence. We initially attempted to reconstruct the 3D structure of TbrA through homology modeling based on this amino acid sequence. However, after performing a BLAST search in the PDB database, the highest sequence similarity we found was only 41.8%. Therefore, we concluded that there is a lack of suitable templates among the resolved protein structures. Ultimately, we chose to predict the 3D structure of TbrA using AlphaFold 2.0 [1] and downloaded the best-predicted conformation.

Figure 1. 3D structure of TbrA predicted by AlphaFold 2.0.

Figure 2. Ramachandran plot of AlphaFold 2.0 predicted TbrA structure.
96.9% (344/355) of all residues were in favored (98%) regions.98.9% (351/355) of all residues were in allowed (>99.8%) regions.
There were 4 outliers (phi, psi):306 Lys (-55.6, 52.7), 308 Lys (-53.3, 79.0), 309 Lys (88.7, 95.7), 310 Leu (-35.1, -19.5)

Conclusion

From the Ramachandran plot, it can be observed that the majority of amino acid residues in the TbrA structure predicted by AlphaFold are within the allowed regions. Only four residues fall outside of these regions. However, since molecular dynamics simulations will be conducted later, we still consider this to be a reliable prediction.

Model 2: Substrate Affinity Analysis

Why we built the model?

In natural environments, the temperature of shallow soils exhibits significant variation due to diurnal cycles and seasonal fluctuations. These temperature differences may influence enzyme activity in practical applications, thereby affecting the overall effectiveness of their use.

What question we want to answer?

We wanted to verify whether the structure of TbrA changes at different temperatures.

We wanted to verify whether TbrA could bind tightly to the substrate cis-Aconitic acid (CAA) at different temperatures.

How will this model be implemented?

Ⅰ.Molecular dynamics simulations

First we used AmberTools[2] to conduct molecular dynamics simulations at 273K, 288K, and 301K (RT) to obtain the structures closest to their natural states. Besides, we employed MOE for molecular docking and analyzed the affinity between the substrate and enzyme.

Before running the molecular dynamics simulations, we completed the following steps:

1. Selected the AMBER14SB force field.
2. Chose the TIP3P model.
3. Added an 8Å water box for protein solvation.
4. Minimized the energy of the dynamical system.

We then performed molecular dynamics simulations at temperatures(1500ps) of 273K, 288K, and 301K (RT).

The MD trajectories were run on the bioinformatics server provided by Professor Wang Xiaocong, and we generated root mean square deviation (RMSD) graphs for the trajectories at different temperatures.

Figure 3. Plot for TbrA variants at different temperatures.

Figure 4. Ramachandran map of molecular dynamics optimization at different temperatures.

A.273K: 92.5% (322/348) of all residues were in favored (98%) regions. 99.4% (346/348) of all residues were in allowed (>99.8%) regions. There were 2 outliers (phi, psi):54 Pro (-38.5, -24.5), 259 Ser (-175.0, 0.3). B.288K: 94.5% (329/348) of all residues were in favored (98%) regions. 99.4% (346/348) of all residues were in allowed (>99.8%) regions. There were 2 outliers (phi, psi):194 Ser (46.1, -6.2), 269 Pro (-51.2, -1.0). C.301K(RT): 93.4% (325/348) of all residues were in favored (98%) regions. 98.3% (342/348) of all residues were in allowed (>99.8%) regions. There were 6 outliers (phi, psi):20 Asp (-36.7, -23.7), 84 Arg (60.3, -26.9), 194 Ser (47.6, -11.7), 246 Val (-68.5, 39.9), 269 Pro (-52.1, 7.0), 309 Lys (33.9, 84.8)

Ⅱ.Molecular docking

As the mechanism of aconitase isomerase catalyzing the isomerization of CAA to trans-Aconitic acid(TAA) has not been resolved, we proposed two docking strategies:

1.Blind Docking: Aiming for optimal binding sites by increasing the number of structures screened.
2.Site Prediction: Using the Site Finder module in MOE[3] to predict potential binding sites on TbrA, selecting the site with the highest PLB score as the docking site.

Based on the above two strategies, we performed molecular docking using MOE, downloading the ionized form of CAA from PubChem for the docking process.

Figure 5. Results of the molecular docking between TbrA and CAA.

A. Docking results of TbrA and CAA at 273K. B. Docking results of TbrA and CAA at 288K. C. Docking results of TbrA and CAA at 301K(RT). D. Comparison of TbrA and CAA docking results at different temperatures

Table 1. Docking score of MOE

Interactions

Ligand	S	RMSD_refine	E_conf	E_place	E_refine
273_rank1	-4.4164	1.8100	-227.8124	-61.0975	-11.7496
273_rank2	-3.9612	2.0821	-229.0727	-52.7169	-17.2199
273_rank3	-3.9342	3.1524	-230.7773	-51.9211	-20.9030
273_rank4	-3.6839	1.1420	-230.3506	-64.3159	-19.6059
273_rank5	-3.6433	1.8679	-228.3595	-61.2554	-20.0703
288_rank1	-4.6168	0.9003	-232.3806	-55.4807	-16.2197
288_rank2	-4.5758	0.7806	-231.7147	-70.8167	-21.3109
288_rank3	-4.0866	2.1987	-231.0153	-54.9606	-21.6501
288_rank4	-4.0549	1.5879	-230.3193	-71.9990	-17.1423
288_rank5	-4.0251	1.3626	-232.0007	-54.3167	-21.8283
301_rank1	-4.8968	1.7925	-232.3603	-62.0534	-26.4293
301_rank2	-4.7133	1.6083	-233.3967	-78.9352	-27.4523
301_rank3	-4.6520	1.3701	-234.7277	-82.1306	-27.9046
301_rank4	-4.6269	2.1093	-230.1262	-72.3201	-24.7022
301_rank5	-4.5862	1.0748	-277.5286	-61.3496	-19.5964

Conclusion

During the molecular docking process, we observed that whether through blind docking or docking after predicting the binding site, the ligand consistently bound to the same cavity. Since the overall performance of docking after predicting the binding site was better, we chose the results from this approach.

After docking, we evaluated the top five structures based on the docking results and found that all ligands were generally located within the same cavity. We believe this cavity to be a potential active site of the isomerase.

Through structural alignment, we discovered that temperature had little impact on the structure around the binding site. However, the structural changes still affected the specific binding sites of CAA and TbrA. After analyzing the interactions between the ligand and receptor in the docking results, we found that at the temperatures we selected, TbrA could form relatively stable interactions with CAA. However, we were unable to identify specific key residues involved in the process of CAA isomerization to TAA.

Model 3: Prediction of HrpR,HrpS, and P_hrpl Complex Structures

Why we built the model?

In our circuit, signals from engineered bacteria responding to SA upregulation in plants are transmitted through a cascade involving the two-component system HrpR, HrpS, and the P_hrpl. We performed detailed modeling analyses of this process in the bacteria lysis model. However, despite an extensive literature review, we found no relevant data on the binding interaction between the HrpRS protein complex and the P_hrpl.

What question we want to answer?

We wanted to obtain the structure of the complex of HrpR,HrpS and phrpl by structural prediction or molecular docking.

We wanted to calculate the dissociation constant Kd by calculating the binding free energy between the HprRS complex and the hrpl promoter.

How will this model be implemented?

Ⅰ.Preparation of HrpRS-P_hrpl complex structures

We used the online server of AlphaFol 3[4] to predict the structure of the HrpR, HrpS, and P_hrpl complex.

Figure 6. Structure of the complex predicted by AlphaFold 3.

Ⅱ.Calculation of binding free energy

We use the Interface Analyzer module in Rosetta[5] to calculate several metrics, including the binding free energy (ΔG), for the HrpRS complex with P_hrpl based on the structure predicted by AlphaFold 3.

For the reaction:

$$\text R + L \stackrel{k_{\text {on}}}{\underset{k_{\text {off}}}{\rightleftharpoons}} RL $$

when the reaction reaches a steady state, we can derive the fyzaerlowing kinetic equation:

$$k_{on}[R][L]=k_{off}[RL]$$

The binding constant(K_d) is given by:

$$K_{d}=\frac{k_{off}}{k_{on}}$$

There is a quantitative relationship between the dissociation constant and the molar Gibbs free energy (ΔG, binding free energy):

$$ΔG=RTlnK_{d}$$

Conclusion

Through calculations using the Interface Analyzer, we obtained the binding free energy (ΔG) between HrpRS and P_hrpl, which is -36.602 kJ/mol. Based on this value, the dissociation constant (K_d) was calculated to be 3.78 × 10^-7M , providing a reasonable parameter for use in the bacteria lysis model.(Click here learn more about this model)

Model 4: Protein Design Based on Ligand Binding Sites

Why we built the model?

In expert interviews conducted by the HP group, Professor Shunping Yan from the College of Life Sciences and Technology at Huazhong Agricultural University mentioned that "the concentration of SA varies across different plant tissues". Most available information on SA concentration is related to leaves, as the synthesis pathway of SA in plants is closely linked to chloroplasts.

Therefore, we hypothesize that the concentration of SA in plant roots is relatively low. To enhance NahR's sensitivity to SA in order to address potential variations in SA concentration, we performed saturation mutagenesis on residues near the docking site based on NahR's affinity for SA.

What question we want to answer?

We wanted to screen out mutants with higher affinity for SA through virtual mutation, hoping to improve the sensitivity of NahR to SA through this method.

How will this model be implemented?

Ⅰ.Generation of NahR-SA Complex

Before performing saturation mutagenesis, we first need to identify the ligand binding site and prepare a conformational library of SA isomers,the steps are as follows.

1.We initially used MOE to conduct preliminary docking of SA to identify its binding site with NahR.

Figure 7. Molecular docking results of SA and NahR.

2.We utilized the BCL::Conf web server[6] to generate a conformational library of SA isomers, obtaining all possible structures of SA for molecular docking.

3.The ligands were placed into the binding pocket of the protein.

Ⅱ.Rosetta Design

1.Generation of mutant structures

After obtaining the complex structure, we selected residues near the docking site as mutation sites for saturation mutagenesis. The mutant structures were generated using RosettaLigand[7][8] based on RosettaScript [9], in order to obtain 5000 mutants along with their energy and other parameter information.

2.Preliminary screening of mutants

Using the total scores from the energy information generated by RosettaLigand, we performed a preliminary screening based on the van der Waals interactions between the protein and the ligand, as well as among amino acid side chains. This process yielded 3681 sequences.

3.Refinement of mutant energy calculations and selection

In this step, we employed the Interface Analyzer to conduct a more detailed calculation of interactions between the mutants and SA from the preliminary screening results. Based on the calculated results, we filtered the mutants using metrics such as packstat and delta unsatHbonds, progressively identifying those with improved binding affinity for SA. After nine rounds of selection, we ultimately obtained 22 mutant sequences with enhanced binding capabilities for SA.

Figure 8. Mutant interaction heat map.

The figiure visualizes interaction data from the initial screening of 3064 mutants, normalized using non-parametric methods. Mutants were eliminated in successive rounds of refined screening: 1–1473 (1st round), 1473–2205 (2nd round), 2205–2584 (3rd round), 2584–2790 (4th round), 2790–2911 (5th round), 2911–2980 (6th round), 2980–3016 (7th round), 3016–3042 (8th round). The final set consists of mutants 3042–3064. Selection was based on packstat, hbonds_ints, delta_unsatHbonds, and dG_separated/dSASAx100. Click here to download data in Figurte 8.

Table 2. Interactions of 22 mutant

Interactions

Sequence Name	Mutations	Interaction Detail	Interaction	Distance (Å)	E (kcal/mol)
nahR_SA_0028	M103T,Y110V,V151G,R152G,N169E,I271T,I273T	O3:OE1(E169)	H-donor	2.84	-3.4
nahR_SA_0028	M103T,Y110V,V151G,R152G,N169E,I271T,I273T	O1:N(V153)	H-acceptor	2.8	-2.9
nahR_SA_0124	M103T,T104E,I106T,V151G,R152T,V153T,N169A,H206T,R248Y,I271L,I273V	O2:OG1(T153)	H-acceptor	2.86	-2.6
nahR_SA_0449	T104G,I106V,Y110V,V151G,R152T,N169L,R248W,I271T	O3:NE2(H206)	H-donor	2.71	-1.7
		O1:OG1(T152)	H-acceptor	2.67	-1.3
		O1:N(V153)	H-acceptor	3.4	-1.1
nahR_SA_0460	M103T,F111M,V151G,R152T,N169E,I271T,I273V	O2:NE2(H206)	H-acceptor	2.83	-7.6
nahR_SA_0592	M103T,I106V,V151G,R152T,N169A,R248Y,I271T	O3:NE2(H206)	H-acceptor	2.88	-1.6
		O1:OG1(T104)	H-acceptor	2.85	-1.9
		O1:OG1(T152)	H-acceptor	2.86	-2
		O2:NE2(H206)	H-acceptor	2.73	-2.4
nahR_SA_1617	F111M,V151G,R152M,N169E,R248F,I271T	O3:OE1(E169)	H-donor	3.14	-0.6
		O3:OE2(E169)	H-donor	2.82	-4.5
		O1:NE2(H206)	H-acceptor	2.6	-1.3
nahR_SA_1955	M103T,I106A,V151G,R152M,N169E,R248K,I271T	O3:NE2(H206)	H-donor	2.89	-3.2
nahR_SA_1955	M103T,I106A,V151G,R152M,N169E,R248K,I271T	O2:OG1(T104)	H-acceptor	2.87	-1.3
nahR_SA_2204	M103T,I106A,V151G,R152G,N169L,I271T	NA	NA	NA	NA
nahR_SA_2254	M103T,I106V,V151G,R152G,N169L,I271T	O3:NE2(H206)	H-donor	2.94	-0.8
nahR_SA_2357	M103T,I106V,V151G,R152G,N169L,I271T	O2:CA(G152)	H-acceptor	3.18	-0.5
nahR_SA_2526	M103T,T104S,V151G,R152G,N169E,I271T,I273V	O1:N(V153)	H-acceptor	2.88	-3.4
nahR_SA_2526	M103T,T104S,V151G,R152G,N169E,I271T,I273V	O2:CB(T204)	H-acceptor	3.41	-0.6
nahR_SA_2615	M103T,I106V,V151G,R152G,N169L,I271T	O2:OG1(T104)	H-acceptor	2.84	-1.3
nahR_SA_2652	M103T,I106V,Y110V,V151G,R152G,N169E,R248L,I271T	O3:NE2(H206)	H-donor	2.82	-3.8
nahR_SA_2652	M103T,I106V,Y110V,V151G,R152G,N169E,R248L,I271T	O2:OG1(T104)	H-acceptor	2.85	-1.7
nahR_SA_3102	M103T,T104S,V151G,R152G,N169L,I271T	O1:N(V153)	H-acceptor	2.82	-2.7
nahR_SA_3102	M103T,T104S,V151G,R152G,N169L,I271T	O2:CB(S104)	H-acceptor	3.22	-0.6
nahR_SA_3385	M103T,Y110V,V151G,R152G,V153T,I271T,I273T	O1:N(T153)	H-acceptor	2.79	-2.8
nahR_SA_3385	M103T,Y110V,V151G,R152G,V153T,I271T,I273T	O2:OG1(T153)	H-acceptor	2.82	-1.3
nahR_SA_3491	V151G,R152M,N169A,R248Y,I271T,I273V	O2:NE2(H206)	H-acceptor	2.86	-7.4
nahR_SA_3773	M103T,I106A,F111M,V151G,R152T,N169E,R248K,I271T	O3:OE1(E169)	H-donor	2.77	-3.6
		O3:NE2(H206)	H-acceptor	2.87	-1.5
		O2:OG1(T104)	H-acceptor	2.85	-1.9
nahR_SA_3827	M103T,I106V,V151G,R152M,N169E,R248Y,I271T	O3:OE1(E169)	H-donor	2.79	-3.6
nahR_SA_3907	M103T,Y110V,V151G,R152G,N169V,P246G,R248L,I271T	O3:NE2(H206)	H-donor	2.9	-0.5
nahR_SA_3914	M103T,I106V,V151G,R152G,N169E,R248K,I271T	O3:NE2(H206)	H-donor	2.85	-2.9
nahR_SA_3914	M103T,I106V,V151G,R152G,N169E,R248K,I271T	O2:OG1(T104)	H-acceptor	2.85	-1.5
nahR_SA_4085	M103T,V151G,R152M,N169A,R248Y,I271T	O3:NE2(H206)	H-donor	2.8	-4.1
nahR_SA_4085	M103T,V151G,R152M,N169A,R248Y,I271T	O2:OG1(T104)	H-acceptor	2.85	-1.8
nahR_SA_4464	M103T,I106V,Y110V,V151G,R152G,N169E,R248K,I271T	O3:OE1(E169)	H-donor	2.86	-4
nahR_SA_4464	M103T,I106V,Y110V,V151G,R152G,N169E,R248K,I271T	O1:N(V153)	H-acceptor	2.83	-3.2

Conclusion

Based on the interactions between the ligand and the protein, we screened 22 NahR mutants with enhanced binding affinity to SA from the 5000 mutants generated by RosettaLigand.

Future work & cooperation

Initially, we planned to measure the binding ability of NahR mutants to SA through Surface Plasmon Resonance (SPR) and verify its function as a biosensor. However, unfortunately, due to some problems in the functional characterization of wild-type SA biosensors, we did not have enough time to verify it through experiments.

Click here to download the mutant sequences.

Reference

[1] Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583-589.

[2] Case DA, Aktulga HM, Belfon K, Cerutti DS, Cisneros GA, Cruzeiro VWD, Forouzesh N, Giese TJ, Götz AW, Gohlke H, Izadi S, Kasavajhala K, Kaymak MC, King E, Kurtzman T, Lee TS, Li P, Liu J, Luchko T, Luo R, Manathunga M, Machado MR, Nguyen HM, O'Hearn KA, Onufriev AV, Pan F, Pantano S, Qi R, Rahnamoun A, Risheh A, Schott-Verdugo S, Shajan A, Swails J, Wang J, Wei H, Wu X, Wu Y, Zhang S, Zhao S, Zhu Q, Cheatham TE 3rd, Roe DR, Roitberg A, Simmerling C, York DM, Nagan MC, Merz KM Jr. AmberTools. J Chem Inf Model. 2023 Oct 23;63(20):6183-6191.

[3] Chemical Computing Group ULC. Molecular Operating Environment (MOE), 2022.02; Chemical Computing Group, Inc.: Montreal, QC, Canada, 2023.

[4] Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024 Jun;630(8016):493-500.

[5] Stranges PB, Kuhlman B. A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds. Protein Sci. 2013 Jan;22(1):74-82.

[6] Mendenhall J, Brown BP, Kothiwale S, Meiler J. BCL::Conf: Improved Open-Source Knowledge-Based Conformation Sampling Using the Crystallography Open Database. J Chem Inf Model. 2021 Jan 25;61(1):189-201.

[7] DeLuca S, Khar K, and Meiler J. (2015). Fully Flexible Docking of Medium Sized Ligand Libraries with RosettaLigand. PLoS One 10(7):e0132508.

[8] Lemmon G, and Meiler J. (2012). Rosetta Ligand docking with flexible XML protocols. Methods Mol Biol 819:143-55.

[9] Fleishman SJ, Leaver-Fay A, Corn JE, Strauch E-M, Khare SD, Koga N, Ashworth J, Murphy P, Richter F, Lemmon G, Meiler J, and Baker D. (2011). RosettaScripts: A Scripting Language Interface to the Rosetta Macromolecular Modeling Suite. PLoS ONE 6(6):e20161.

Go back to Model

Why we built the model?

How will this model be implemented?

Conclusion

Why we built the model?

What question we want to answer?

How will this model be implemented?

Ⅰ.Molecular dynamics simulations

Ⅱ.Molecular docking

Conclusion

Why we built the model?

What question we want to answer?

How will this model be implemented?

Ⅰ.Preparation of HrpRS-Phrpl complex structures

Ⅱ.Calculation of binding free energy

Conclusion

Why we built the model?

What question we want to answer?

How will this model be implemented?

Ⅰ.Generation of NahR-SA Complex

Ⅱ.Rosetta Design

Conclusion

Future work & cooperation

Ⅰ.Preparation of HrpRS-P_hrpl complex structures