Part 1: Machine Learning Assisted Enzyme Retrofitting

Overview

This project aims to modify specific enzymes in a targeted way, using machine learning to improve their thermal stability and catalytic efficiency. Based on a literature survey, we selected the MutCompute platform to predict mutation sites and then used its predictions to guide experimental validation.

Step 1: Background check

In the first stage, we read widely in the current literature on machine learning for the directed modification of enzymes and identified the main models in use today, learning how each one works and how it is applied. With that understanding, we chose the model and method best suited to the purpose of our experiment.

Step 2: Finding the best approach

The main goal of our experiment is to modify enzymes through machine learning to improve their stability or catalytic efficiency. In the literature we found a platform called MutCompute, which was trained in previous research and has a mature record of modifying enzymes to improve their stability.

The website uses a 3D self-supervised convolutional neural network to predict unstable sites in proteins. The network was trained on 19,000 sequence-balanced proteins and learns the local chemical microenvironment of each amino acid, which it uses to identify the sites in a protein that should be optimized. By predicting the probability of each amino acid substitution, the website helps scientists design protein variants with desired properties.

Step 3: Derive targeted results

The PDB IDs of several target enzymes are known, so we can obtain the corresponding predictions from the website. Using the visualization of the predictions and the authors' description of the output in the literature, we identify the sites that can be mutated, and then combine molecular docking with experiments to see whether thermal stability improves.

The mutable sites predicted by the website, ranked by predicted probability, are listed in the following table:

Ranking table of machine learning locus prediction probabilities.pdf
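The ranking step can be sketched in a few lines of Python. This is a minimal sketch that assumes each predicted site comes with the model's probability for the wild-type residue and for the suggested substitution, and that sites are ranked by the log2 fold change between the two. The field names and the numbers are illustrative placeholders, not output from MutCompute.

```python
import math
from typing import NamedTuple


class SitePrediction(NamedTuple):
    """One predicted site; field names are hypothetical, not MutCompute's."""
    position: int
    wild_type: str
    suggested: str
    p_wild_type: float   # predicted probability of the wild-type residue
    p_suggested: float   # predicted probability of the suggested residue


def log2_fold_change(site: SitePrediction) -> float:
    """log2 of the suggested-residue probability over the wild-type one."""
    return math.log2(site.p_suggested / site.p_wild_type)


def rank_sites(sites: list[SitePrediction]) -> list[SitePrediction]:
    """Sort sites from the largest to the smallest fold change."""
    return sorted(sites, key=log2_fold_change, reverse=True)


# Placeholder values for illustration only.
example_sites = [
    SitePrediction(10, "S", "A", 0.20, 0.40),
    SitePrediction(25, "T", "P", 0.05, 0.80),
    SitePrediction(40, "N", "D", 0.30, 0.30),
]

for site in rank_sites(example_sites):
    label = f"{site.wild_type}{site.position}{site.suggested}"
    print(f"{label}: log2 fold change = {log2_fold_change(site):.2f}")
```

A site whose suggested residue is no more probable than the wild type gets a fold change of zero and falls to the bottom of the list, which is why such sites are poor mutation candidates under this criterion.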

Part 2: Molecular docking-assisted validation

Molecular docking

In a molecular docking calculation, the ligand molecule is placed at the active site of the receptor, the ligand-receptor interaction is evaluated in real time according to the principles of geometric, energetic and chemical-environment complementarity, and the best binding mode between the two molecules is found¹. Because docking considers ligand-receptor binding as a whole, it is better at avoiding a problem common in other methods: a good local fit combined with poor overall binding. Molecular docking is therefore of great importance in enzyme design. When a plastic-degrading enzyme degrades plastic, the plastic molecule and the target enzyme must bind to each other. The two molecules first approach each other closely and adopt a suitable orientation so that the necessary parts of each can interact, and then, through appropriate conformational adjustment, they form a stable complex.

In our experimental design, we used molecular docking to verify whether the mutations obtained from machine learning and from the predicted O- and N-glycosylation sites meet the requirements for improving the heat resistance, acid-base tolerance and catalytic activity of both IsPETase^PA and FAST-PETase-212/277. We docked each enzyme with polyethylene terephthalate (PET) and 1-(2-hydroxyethyl) 4-methyl terephthalate (HEMT) and compared the binding free energies, the RMSD values, and the number and length of hydrogen bonds before and after mutation. On this basis, the final mutations in IsPETase^PA (S29A, T59S, T122P, N183A) and FAST-PETase-212/277 (N212D, S223T, S92E, S169A, N190D) are predicted to play a role in improving the heat resistance, acid-base tolerance and catalytic activity of the two enzymes, and docking was used to predict and verify the quality of that effect.

Figure 1. Plot of the mutation sites of IsPETase^PA .

Figure 2. Plot of the mutation sites of FAST-PETase-212/277.

Materials

Prepare receptor

We entered the PDB IDs of IsPETase^PA and FAST-PETase-212/277 into MutCompute (https://mutcompute.com) and identified stabilizing mutations. The predicted mutation sites are ranked by predicted probability (fold change in fit). We also learned that glycosylation can affect the active site of an enzyme, so we submitted the amino acid sequences to two DTU Health Tech bioinformatic services, YinOYang 1.2 and NetNGlyc 1.0, to predict O-glycosylation and N-glycosylation sites, respectively, together with the glycosylation site-mutation potential values. The resulting mutations were IsPETase^PA (S29A, T59S, T122P, N183A) and FAST-PETase-212/277 (N212D, S223T, S92E, S169A, N190D). Each site was mutated separately in PyMOL, polar hydrogen atoms were added to the mutated proteins with AutoDockTools, and the proteins were selected as receptors. Interactive visualizations from MutCompute can be viewed at https://mutcompute.com/view/8J17 and https://mutcompute.com/view/7SH6.
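The mutation codes listed here use the one-letter form (S29A), while the figure captions use the three-letter form (SER-29-ALA). The sketch below shows one way to parse a point-mutation code and convert between the two notations. The function names are illustrative and not part of PyMOL or AutoDockTools.

```python
import re

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a code such as 'S29A' into (wild type, position, mutant)."""
    match = MUTATION_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in THREE_LETTER or mutant not in THREE_LETTER:
        raise ValueError(f"unknown amino acid in {code!r}")
    return wild_type, int(position), mutant


def to_three_letter(code: str) -> str:
    """Convert 'S29A' to the 'SER-29-ALA' form."""
    wild_type, position, mutant = parse_mutation(code)
    return f"{THREE_LETTER[wild_type]}-{position}-{THREE_LETTER[mutant]}"


# Mutation lists as given in the text.
ispetase_pa = ["S29A", "T59S", "T122P", "N183A"]
fast_petase = ["N212D", "S223T", "S92E", "S169A", "N190D"]

for code in ispetase_pa + fast_petase:
    print(code, "->", to_three_letter(code))
```

Validating each code before building a mutant catches typos such as a missing position number early, before any structure is edited.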

Prepare ligand

We found the PET small molecule in PubChem (https://pubchem.ncbi.nlm.nih.gov/). From the literature, we found that HEMT can be regarded as a short analogue of PET, which simplifies the model and reduces computational complexity and cost. At the same time, HEMT retains the same reactive group as PET and can therefore be used to mimic the critical step of the enzyme-catalyzed reaction². We therefore chose PET and HEMT as the ligands.

Method

We docked ligand and receptor with AutoDock using a semiflexible docking method, with the ligands PET and HEMT set as flexible molecules and the receptor treated as a rigid molecule. After docking, the conformation with the lowest binding energy was selected as the basis for assessing the affinity between ligand and receptor, but not as the sole criterion: the score, the RMSD value, and the number and length of hydrogen bonds were all used as reference terms.
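The selection step can be illustrated with a short Python sketch. It assumes the docked conformations have already been exported into simple records; the `Pose` fields and the numbers are hypothetical placeholders, not AutoDock's output format or our results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One docked conformation; field names are illustrative."""
    rank: int
    binding_energy: float             # kcal/mol, more negative = stronger binding
    rmsd: float                       # angstroms
    hbond_lengths: tuple[float, ...]  # angstroms, one entry per hydrogen bond


def best_pose(poses: list[Pose]) -> Pose:
    """Return the pose with the lowest binding energy."""
    if not poses:
        raise ValueError("no poses to choose from")
    return min(poses, key=lambda pose: pose.binding_energy)


def summarize(pose: Pose) -> dict:
    """Collect all reference terms so the score is not judged alone."""
    count = len(pose.hbond_lengths)
    mean_length = sum(pose.hbond_lengths) / count if count else None
    return {
        "binding_energy": pose.binding_energy,
        "rmsd": pose.rmsd,
        "hbond_count": count,
        "mean_hbond_length": mean_length,
    }


# Placeholder values for illustration only.
example_poses = [
    Pose(1, -5.2, 1.8, (3.1,)),
    Pose(2, -6.4, 0.9, (2.8, 3.0)),
    Pose(3, -4.7, 2.5, ()),
]

print(summarize(best_pose(example_poses)))
```

Keeping the RMSD and hydrogen bond data next to the energy makes it easy to see when the lowest-energy pose is poorly supported by the other criteria.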

1. Score

Compared with the docking result before mutation, a larger absolute score means a lower binding energy, so the enzyme and ligand bind more readily. However, because AutoDock scores carry an error of about 2 kcal/mol, evaluating docking results on the score alone is not comprehensive.
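One way to respect that error margin when comparing scores before and after a mutation is sketched below. The function name and the three labels are illustrative choices, and the 2 kcal/mol threshold is the approximate error mentioned above.

```python
AUTODOCK_ERROR_KCAL_PER_MOL = 2.0  # approximate scoring error quoted in the text


def compare_scores(before: float, after: float,
                   error: float = AUTODOCK_ERROR_KCAL_PER_MOL) -> str:
    """Classify the change in binding energy caused by a mutation.

    Energies are in kcal/mol; a more negative value means stronger binding.
    Differences no larger than the error margin are not treated as meaningful.
    """
    delta = after - before
    if abs(delta) <= error:
        return "within error"
    return "improved" if delta < 0 else "worsened"


# Placeholder energies for illustration only.
print(compare_scores(-5.0, -5.8))  # small change
print(compare_scores(-5.0, -7.5))  # clearly lower energy after mutation
```

A change labeled "within error" should be judged on the RMSD and hydrogen bond criteria rather than on the score.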

2. Root mean square deviation (RMSD)

The smaller the calculated root mean square deviation of a binding mode, the more likely that mode is to be found in the molecular docking calculation. A dominant binding mode is much more likely to be found than a non-dominant one.
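For reference, the RMSD between two conformations of the same ligand is the square root of the mean squared distance between matching atoms. The sketch below computes it for two coordinate lists, assuming the atoms are listed in the same order and no superposition is needed; the coordinates are placeholders.

```python
import math

Coordinate = tuple[float, float, float]


def rmsd(reference: list[Coordinate], pose: list[Coordinate]) -> float:
    """Root mean square deviation between two matched sets of atoms."""
    if not reference or len(reference) != len(pose):
        raise ValueError("both conformations need the same, non-zero atom count")
    squared_sum = sum(
        (a - b) ** 2
        for atom_ref, atom_pose in zip(reference, pose)
        for a, b in zip(atom_ref, atom_pose)
    )
    return math.sqrt(squared_sum / len(reference))


# Placeholder coordinates in angstroms: every atom shifted by 1 along x.
reference_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted_atoms = [(x + 1.0, y, z) for x, y, z in reference_atoms]

print(rmsd(reference_atoms, shifted_atoms))
```

Shifting every atom by the same distance gives an RMSD equal to that distance, which is a quick sanity check for any RMSD implementation.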

3. Number and length of hydrogen bonds

We used PyMOL to visualize the final docking results, showing the number and length of the hydrogen bonds formed between enzyme and ligand. Analyzing the number and length of hydrogen bonds indicates whether a binding mode is chemically reasonable: the more hydrogen bonds there are and the shorter they are, the stronger the interaction and the more likely the pose is close to the correct binding mode.
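Counting hydrogen bonds and measuring their lengths can be approximated outside a viewer with a simple distance check. The sketch below is a simplification: it uses a distance-only criterion between donor and acceptor heavy atoms with an assumed cutoff of 3.5 Å and ignores bond angles, so it will not necessarily match what PyMOL displays. The atom labels and coordinates are hypothetical.

```python
import math

Atom = tuple[str, tuple[float, float, float]]  # (label, (x, y, z)) in angstroms


def find_hydrogen_bonds(donors: list[Atom], acceptors: list[Atom],
                        max_length: float = 3.5) -> list[tuple[str, str, float]]:
    """List donor-acceptor pairs closer than max_length, shortest first.

    The 3.5 angstrom cutoff is an assumed, commonly used value.
    """
    bonds = []
    for donor_label, donor_xyz in donors:
        for acceptor_label, acceptor_xyz in acceptors:
            length = math.dist(donor_xyz, acceptor_xyz)
            if length <= max_length:
                bonds.append((donor_label, acceptor_label, length))
    return sorted(bonds, key=lambda bond: bond[2])


# Placeholder atoms for illustration only.
enzyme_donors = [("enzyme N", (0.0, 0.0, 0.0))]
ligand_acceptors = [("ligand O1", (2.9, 0.0, 0.0)), ("ligand O2", (5.0, 0.0, 0.0))]

bonds = find_hydrogen_bonds(enzyme_donors, ligand_acceptors)
print(len(bonds), "hydrogen bond(s):", bonds)
```

Sorting by length puts the shortest contacts first, which matches the way the bonds are weighed in the analysis.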

Results

Based on the docking results, combining the binding free energy, root mean square deviation (RMSD), hydrogen bond number and length from Autodock, we found that mutations at positions 122 and 183 for IsPETase^PA and for mutations.

Figure 3. Molecular docking of IsPETase^PA to PET before and after the mutation.
a. Molecular docking of unmutated IsPETase^PA with PET.
b. Molecular docking of S29A with PET.
c. Molecular docking of T59S with PET.
d. Molecular docking of T122P with PET.
e. Molecular docking of N183A with PET. The score can be regarded as the binding free energy.

Figure 4. Results of IsPETase^PA docking with PET

Figure 5. Docking of IsPETase^PA with HEMT before and after the mutation.
a. Molecular docking of unmutated IsPETase^PA with HEMT.
b. Molecular docking of SER-29-ALA with HEMT.
c. Molecular docking of THR-59-SER with HEMT.
d. Molecular docking of THR-122-PRO with HEMT.
e. Molecular docking of ASN-183-ALA with HEMT. The score can be regarded as the binding free energy.

Figure 6. Results of IsPETase^PA docking with HEMT

Figure 7. Molecular docking of FAST-PETase-212/277 with PET.
a. Molecular docking of unmutated FAST-PETase-212/277 with PET.
b. Molecular docking of SER-169-ALA with PET.
c. Molecular docking of ASN-190-ASP with PET.
d. Molecular docking of SER-223-THR with PET. The score can be regarded as the binding free energy.

Figure 8. Results of FAST-PETase-212/277 docking with PET.

Figure 9. Molecular docking of FAST-PETase-212/277 with HEMT.
a. Molecular docking of unmutated FAST-PETase-212/277 with HEMT.
b. Molecular docking of SER-169-ALA with HEMT.
c. Molecular docking of ASN-190-ASP with HEMT.
d. Molecular docking of SER-223-THR with HEMT. The score can be regarded as the binding free energy.

Figure 10. Results of FAST-PETase-212/277 docking with HEMT

Limits

During mutation, the N212D and S92E sites predicted for FAST-PETase-212/277 could not be analyzed in PyMOL because of their rigid nature, so the summary analysis for this part of the dry-lab work may not be fully accurate or comprehensive. The choice of PET and HEMT as ligands may also be controversial, because researchers hold different views on the conformation of the PET chain in the enzyme. HEMT can be studied as a substrate for PETase, but it should be used with caution in docking studies, for two reasons. First, HEMT may fully occupy the binding site during docking, limiting the exploration of other binding modes. Second, the docking conformation is uncertain: although computational docking provides multiple binding modes, these modes may not be completely accurate, especially for rigid PET polymer chains².

¹ Yue Junjie, Feng Hua, Liang Long. (2010). Experimental guide for protein structure prediction. The Chemical Industry Press. ISBN: 9787122077554
² Shrimpton-Phoenix, E., Mitchell, J. B. O., & Bühl, M. (2022). Computational Insights into the Catalytic Mechanism of Is-PETase: An Enzyme Capable of Degrading Poly(ethylene) Terephthalate. Chemistry - A European Journal, 28(7), e202201728. https://doi.org/10.1002/chem.202201728
