Overview

In this project, the dry lab developed a pipeline for PETase protein design and validation. The first (design) stage uses an AI-based model to predict mutations in protein sequences and analyze their effects on enzyme performance. Using a transformer-based machine learning model, we identified potential mutations that could enhance the enzyme's activity.

Additionally, as the second (validation) stage of the pipeline, we used the Michaelis-Menten equation to model enzyme activity, which allowed us to calculate important parameters such as \(V_{max}\) (maximum reaction rate) and \(K_m\) (Michaelis constant). This approach enabled us to simulate the enzyme’s behavior and assess the impact of mutations on its activity under different substrate concentrations.

The purpose of this pipeline is to streamline enzyme design and optimization. The AI model helps us predict which mutations might improve the enzyme’s catalytic efficiency, while kinetic modeling with the Michaelis-Menten equation allows us to quantify how these changes affect the enzyme’s reaction rates. By predicting enzyme kinetics and activity in silico, we can identify the most promising mutations for experimental validation. This makes the process more efficient for the wet lab group, who can focus on testing only a few top-performing variants.

Our modeling efforts have provided valuable insights for the wet lab group. By narrowing down the possible mutations through AI prediction, we provide them with just a few enzyme variants that are likely to exhibit improved enzyme activity.

The Michaelis-Menten equation has allowed us to quantify those improvements, providing a solid theoretical foundation for our experimental designs. As a result, our dry lab work contributes to the overall integrity of the project and helps us design new enzymes for practical applications more efficiently.

AI-based Mutation Generation for PETase Protein Design

Introduction:

Protein design is a rapidly advancing field with various methods to create novel proteins or improve existing ones. Traditional approaches, such as directed evolution, generate new protein variants in the lab by introducing random mutations and selecting for desired traits. This method has been instrumental in developing enzymes with new functions or improved performance [1]. Rational design, as discussed by Hilvert, involves altering protein sequences based on detailed knowledge of their structures and catalytic mechanisms [2]. Computational approaches, such as those using the Rosetta software suite, enable protein design by predicting structures and evaluating stability and functionality based on energy calculations [3].

Our pipeline leverages the Transformer architecture, a deep learning model, to predict potential mutation points in the PETase enzyme. The Transformer model has shown success in natural language processing tasks and is now being adapted for biological applications, providing a powerful tool for understanding and manipulating protein sequences [4]. We then predict the substituted amino acid at each mutation point using Meta’s Evolutionary Scale Modeling (ESM) model ESM-1b [5].

Figure 1: The overall pipeline of our method

Method and Experiment:

Our pipeline involves several key steps:

1. Model Training:

We trained a Transformer model on 1007 homologous PETase protein sequences obtained from the UniProt database using the masked language modeling (MLM) objective. This approach allows the model to learn contextual information about amino acid sequences and predict masked residues accurately [4].
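The masked language modeling objective can be illustrated with a small sketch of how one training example might be built. The 15% masking rate and the `<mask>` token name are assumptions borrowed from BERT-style training, not parameters reported for our model, and the sketch simplifies by replacing every selected residue with the mask token. The model is then trained to recover the original residue at each masked position.

```python
import random

MASK = "<mask>"  # hypothetical mask token name


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Build one simplified MLM training example.

    Returns (tokens, targets):
      tokens  - list of residues with a random subset replaced by MASK
      targets - dict mapping each masked 0-based position to its true residue
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(seq)))  # mask at least one residue
    positions = rng.sample(range(len(seq)), n_mask)
    tokens = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # remember the residue the model must predict
        tokens[pos] = MASK
    return tokens, targets


# Toy sequence made of the 20 standard amino acids (not a PETase fragment)
tokens, targets = mask_sequence("ACDEFGHIKLMNPQRSTVWY", seed=0)
print(tokens)
print(targets)
```

During training, the loss would be computed only at the positions stored in `targets`.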

2. Prediction of Mutation Points:

The trained model predicts the top 10 potential points of mutation within the PETase enzyme. These points are likely to have significant impacts on the enzyme's structure and function [5].
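The criterion the trained model uses to rank positions is not detailed above. One plausible criterion, stated here purely as an assumption, is to rank positions by how improbable the model finds the wild-type residue when that position is masked. The sketch takes such per-position probabilities (hypothetical inputs) and returns the k highest-ranked positions.

```python
def top_k_positions(wt_probs, k=10):
    """Rank positions by how unlikely the wild-type residue is.

    wt_probs: list where wt_probs[i] is the model's probability of the
              wild-type residue at 0-based position i when i is masked
              (a hypothetical scoring scheme).
    Returns up to k positions, most "surprising" (lowest probability) first.
    """
    order = sorted(range(len(wt_probs)), key=lambda i: wt_probs[i])
    return order[:k]


# Hypothetical probabilities for a 6-residue toy sequence
probs = [0.90, 0.12, 0.75, 0.05, 0.60, 0.30]
print(top_k_positions(probs, k=3))  # → [3, 1, 5]
```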

3. Mutation Analysis:

Each predicted mutation point is individually masked and analyzed with Meta's ESM-1b model to predict the specific substitution. By drawing on the knowledge ESM-1b acquired during pre-training, this step favors mutations that are realistic and likely to be beneficial. We discard any mutation point where the predicted residue is the same as in the original sequence.

4. Structural Visualization:

Finally, we used AlphaFold 3, developed by Google DeepMind, to generate the predicted 3D structures of our new mutants and labeled the mutation site on each model.

Figure 2: The general architecture of the Transformer model [4] and the validation loss curve during training

As the figure above shows, the validation loss decreases steadily during training. We then use the trained model parameters to predict the top 10 candidate mutation positions in a given PETase.

We stopped training at around 100 epochs, where the loss of the Transformer model converged.
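One common way to decide that the loss has converged is a patience rule: stop once the validation loss has failed to improve by more than a small tolerance for several consecutive epochs. The sketch below is an illustration of that rule only; the `patience` and `min_delta` values are assumptions, not the settings we used.

```python
def converged_epoch(val_losses, patience=5, min_delta=1e-3):
    """Return the epoch index at which training would stop, or None.

    Training stops when the validation loss has not improved on the best
    value seen so far by more than min_delta for `patience` epochs in a row.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # meaningful improvement: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # never plateaued within the given epochs


# Made-up loss curve that flattens out
losses = [2.0, 1.5, 1.2, 1.1, 1.0995, 1.0993, 1.0994, 1.0992, 1.0991]
print(converged_epoch(losses, patience=3))  # → 6
```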

Figure 3: The predicted scores for the possible mutations of the top 10 positions

As the figure above shows, ESM-1b scores every possible substitution at each of the 10 positions, and we choose the one with the highest score; redder cells indicate higher scores. Some top-scoring substitutions, such as C241C, leave the residue unchanged and are therefore meaningless, so we discard those positions and keep only the ones circled in purple. We then compare the remaining mutations with previously reported mutations to determine which are genuinely new point mutations.

To obtain these scores, we feed each location, together with the original sequence, into the pre-trained ESM-1b model one at a time to predict the substituted residue.
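The selection rule described above can be sketched in a few lines: take the highest-scoring residue at each position and drop positions where it equals the wild type. The score table is a hypothetical stand-in for ESM-1b output (for example, scores at a masked position); no model is called here. Positions are 0-based internally and converted to the usual 1-based notation such as N205G.

```python
def select_mutations(sequence, scores):
    """Pick the best-scoring residue at each candidate position.

    sequence: wild-type amino acid sequence (string)
    scores:   dict {position (0-based): {amino_acid: score}}, where a
              higher score means a more favoured residue
    Returns mutation names like "N205G" (1-based numbering). Positions whose
    best residue equals the wild type (e.g. "C241C") are dropped.
    """
    mutations = []
    for pos in sorted(scores):
        best_aa = max(scores[pos], key=scores[pos].get)
        wt_aa = sequence[pos]
        if best_aa != wt_aa:  # skip synonymous "mutations"
            mutations.append(f"{wt_aa}{pos + 1}{best_aa}")
    return mutations


# Toy example with a made-up sequence and made-up scores
seq = "MCNAW"
toy_scores = {
    1: {"C": 2.1, "S": 0.4},  # best residue equals wild type -> dropped
    2: {"N": 0.3, "G": 1.7},  # kept as N3G
    4: {"W": 0.2, "F": 0.9},  # kept as W5F
}
print(select_mutations(seq, toy_scores))  # → ['N3G', 'W5F']
```

The comparison with previously reported mutations is a separate lookup step and is not shown.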

Figure 4: One of our new PETases where the point of mutation is labeled in red (BhrPETase, N205G)

Results:

We then applied our pipeline to two types of PETase: BhrPETase and IsPETase. The model generated several potential mutations for each enzyme. These mutations were then evaluated for their impact on enzyme activity and stability.

BhrPETase: The model identified mutations (W229F, N205G, N191S, M57L) that could enhance its ability to degrade polyethylene terephthalate (PET). These mutations and their predicted effects will be further tested in the wet lab.

IsPETase: Similar predictions were made for IsPETase, with potential improvements in PET degradation efficiency identified (H104W, K95S, M10L, M128L, M154L).

Figure 5: One of our new PETases where the point of mutation is labeled in red (BhrPETase, N205G)

In addition, as the figure above shows, we performed molecular docking between the predicted variants and the microplastic molecules to further support the predicted effects of the mutations.

Figure 6: The affinity of the top 15 positions of the BhrPETase enzyme to the microplastic molecules

… points of mutations. In addition, we will apply this approach to other variants of PETase.

Conclusion:

Our AI-based pipeline offers a novel approach to protein design by predicting and analyzing potential mutations in enzymes. By leveraging deep learning models such as the Transformer architecture, we can efficiently identify candidate mutations that may enhance enzyme functionality. The next step is to validate these predictions experimentally in the wet lab, which will provide insights into the practical applications of our designed mutations.

Enzyme Activity

Process Details: Model Construction

Data Collection and Preliminary Analysis

We obtained enzymatic activity data from our experiments for several PETase mutants (W159H/F229Y and S121E/R224Q) and reference enzymes (WT IsPETase and LCC). The measured indicator is the absorbance at 514 nm (A514), which reflects the amount of small-molecule products released during PET degradation. The experiment was conducted both with and without IPTG induction to observe the degradation effects of these enzymes under the two conditions.

Figure 7: The experimental data for enzymatic activity of the wild type BhrPETase enzyme and its variants

Modeling Framework

To quantify the change in enzymatic activity over time, we chose a model based on the Michaelis-Menten equation to simulate the rate of the enzyme-catalyzed reaction.

The main parameters of the model are:

  • \(V_{max}\): The maximum reaction rate, which measures the maximum catalytic capacity of the enzyme when the substrate is saturating.
  • \(K_m\): The Michaelis constant, which is inversely related to the enzyme's apparent affinity for the substrate (a lower \(K_m\) means a higher affinity).
  • \([S]\): Substrate concentration, which in this case is the concentration of PET.
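These parameters are tied together by the rate law \(v = V_{max}[S]/(K_m + [S])\), derived in the appendix, which takes only a few lines in Python. The parameter values in the example are arbitrary placeholders, not fitted values for any PETase.

```python
import numpy as np


def michaelis_menten(S, Vmax, Km):
    """Initial reaction rate v = Vmax * [S] / (Km + [S]).

    S may be a scalar or a NumPy array of substrate concentrations.
    Km must be in the same concentration unit as S; v has the units of Vmax.
    """
    S = np.asarray(S, dtype=float)
    return Vmax * S / (Km + S)


# Placeholder parameters: Vmax = 1.0 (rate units), Km = 2.0 (conc. units)
S = np.array([0.0, 2.0, 20.0])
v = michaelis_menten(S, Vmax=1.0, Km=2.0)
print(v)  # the rate is 0.5 * Vmax where [S] == Km
```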

Data Sources

Experimental Data: Collected in our own laboratory on August 17, 2024.

References and Known Parameters: We reviewed numerous literature sources to obtain known kinetic parameters of PETases, such as the \(V_{max}\) and \(K_m\) for WT IsPETase. These references provided a basis for setting our initial model parameters.

Software and Tools: We used Python as the primary programming language for modeling, with the SciPy and NumPy libraries for numerical calculations and data processing, and Matplotlib for data visualization to make the results easier to interpret.

Model Results

Simulation of Enzyme Activity Changes

Fitting

Our enzyme kinetic modeling characterized the degradation efficiency of BhrPETase and its N205G variant on microplastics by estimating the \(V_{max}\) and \(K_m\) of both enzymes.

Figure 8: The fitted Michaelis-Menten equation parameters for the wild-type BhrPETase enzyme and the N205G variant

In particular, we can use the Michaelis-Menten parameters fitted to the experimental data above to extrapolate the reaction kinetics of our enzyme kinetics experiment to times beyond 30 minutes, which further supports the validity of our approach.

Figure 9: The result of enzymatic activity prediction using the Michaelis-Menten equation on the N205G variant and the wild type BhrPETase.

As the figure above shows, after 30 minutes the predicted enzymatic activity of the N205G variant begins to exceed that of wild-type BhrPETase, and the difference in absorbance keeps growing thereafter. According to this Michaelis-Menten extrapolation, the N205G variant has a higher catalytic efficiency than the wild type. This result provides quantitative support for the improved performance of our new variant in PET degradation.
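One way to perform such an extrapolation is to integrate the rate law while the substrate is consumed, \(d[P]/dt = V_{max}([S]_0 - [P])/(K_m + [S]_0 - [P])\). The sketch below does this with SciPy's `solve_ivp`. It assumes 1:1 conversion of substrate to product and that a readout such as A514 is proportional to \([P]\) (which would need a calibration); the parameter values are placeholders, not the fitted BhrPETase or N205G values. As a check, the solution should satisfy the integrated Michaelis-Menten form \(K_m \ln([S]_0/[S]) + ([S]_0 - [S]) = V_{max} t\).

```python
import numpy as np
from scipy.integrate import solve_ivp


def product_time_course(t, Vmax, Km, S0):
    """Integrate d[P]/dt = Vmax*[S]/(Km+[S]) with [S] = S0 - [P] and [P](0) = 0.

    t must be increasing and start at 0; returns [P] at each time in t.
    """
    def rate(_, P):
        S = S0 - P[0]  # substrate remaining (assumes 1:1 conversion)
        return [Vmax * S / (Km + S)]

    sol = solve_ivp(rate, (t[0], t[-1]), [0.0], t_eval=t,
                    rtol=1e-8, atol=1e-10)
    return sol.y[0]


# Placeholder parameters (not fitted values for any real enzyme)
t = np.linspace(0.0, 60.0, 61)  # e.g. minutes
P = product_time_course(t, Vmax=0.05, Km=1.0, S0=2.0)
print(P[[0, 30, 60]])
```

To compare two enzymes, the function would be called once per enzyme with that enzyme's own fitted \(V_{max}\) and \(K_m\).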

Application

1. The Role of the Michaelis-Menten Equation in Quantifying Enzymatic Reactions

The Michaelis-Menten equation assists us in quantitatively characterizing the kinetic performance of the enzyme PETase as it catalyzes the degradation of PET plastics, particularly the relationship between enzyme activity and substrate concentration. Through the Michaelis-Menten equation, we can specifically calculate two crucial kinetic parameters:

Maximum Reaction Rate (\(V_{max}\)): This parameter reflects the enzyme's highest catalytic capacity at saturating substrate concentrations and is vital for assessing the ultimate catalytic performance of different enzyme variants.

Michaelis Constant (\(K_m\)): This parameter is the substrate concentration at which the reaction rate is half of \(V_{max}\). It serves as an inverse measure of the enzyme's apparent affinity for the substrate: the lower \(K_m\), the higher the affinity.

These two parameters directly shape our understanding of the enzyme's efficiency and its performance under varying conditions. By analyzing the \(V_{max}\) and \(K_m\) of different PETase variants, we can quantitatively evaluate their catalytic efficiency for PET degradation, providing a scientific basis for selecting the most efficient variant.
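A standard way to combine the two parameters into a single figure of merit is the specificity constant \(k_{cat}/K_m\), with \(k_{cat} = V_{max}/[E]_T\) as defined in the appendix. The sketch assumes the total enzyme concentration \([E]_T\) in each assay is known; the variant names and numbers are hypothetical.

```python
def catalytic_efficiency(Vmax, Km, E_total):
    """Return (kcat, kcat/Km) from Vmax, Km and total enzyme concentration.

    Units example: Vmax in uM/min, E_total in uM, Km in uM gives
    kcat in 1/min and kcat/Km in 1/(uM*min).
    """
    kcat = Vmax / E_total
    return kcat, kcat / Km


# Hypothetical variants: (Vmax [uM/min], Km [uM]) measured at E_total = 0.1 uM
variants = {"variant_A": (2.0, 40.0), "variant_B": (1.5, 15.0)}
ranked = sorted(variants,
                key=lambda name: catalytic_efficiency(*variants[name], 0.1)[1],
                reverse=True)
print(ranked)  # → ['variant_B', 'variant_A']
```

Here variant_B ranks first despite its lower \(V_{max}\), because its much lower \(K_m\) gives it the larger \(k_{cat}/K_m\).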

2. Guiding Experimental Design and Optimizing Enzymatic Reactions

The Michaelis-Menten equation also provides theoretical support for our experimental design. When designing experiments for PETase-catalyzed PET degradation, we can use it to predict reaction rates at various substrate concentrations and thereby determine a suitable substrate concentration range. This allows us to systematically screen different concentrations, optimize experimental conditions, and make full use of the enzyme's activity.

Specifically, the Michaelis-Menten equation helps us avoid using either too high or too low substrate concentrations in our experimental design, thereby enhancing the precision of our data and the efficiency of our experiments. Given the wide variation in PET concentrations in actual environments, the guiding role of the Michaelis-Menten equation enables us to find the conditions that best exert PETase activity within a feasible substrate concentration range, which is crucial for the success of the project.
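Rearranging the Michaelis-Menten equation gives the substrate concentration needed to reach a chosen fraction \(f\) of \(V_{max}\): \([S] = fK_m/(1 - f)\). For example, 50% of \(V_{max}\) requires \([S] = K_m\), while 90% requires \([S] = 9K_m\), so raising the substrate concentration far beyond this range gains little additional rate. A small helper, with a placeholder \(K_m\):

```python
def substrate_for_fraction(f, Km):
    """Substrate concentration at which v = f * Vmax (requires 0 <= f < 1)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must satisfy 0 <= f < 1")
    return f * Km / (1.0 - f)


Km = 2.0  # placeholder; the result has the same concentration unit as Km
for f in (0.5, 0.9, 0.99):
    print(f, substrate_for_fraction(f, Km))
```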

3. Assisting in the Design and Screening of Enzyme Variants

In our project, we plan to engineer PETase to create a series of variants with an enhanced ability to degrade PET. The Michaelis-Menten equation provides a clear framework for analyzing the \(V_{max}\) and \(K_m\) of different variants, allowing us to quantitatively compare their catalytic efficiency.

In particular, modifying the enzyme's structure to lower \(K_m\) enables it to maintain high efficiency at low substrate concentrations, which is especially important for practical applications. At the same time, the Michaelis-Menten equation aids in screening for variants with higher \(V_{max}\) to improve overall degradation efficiency. The Michaelis-Menten equation therefore serves as the theoretical foundation for screening and designing efficient enzyme variants, allowing us to optimize enzyme performance in a more targeted way.
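The trade-off between a lower \(K_m\) and a higher \(V_{max}\) can be made concrete. Setting the two rate laws equal gives the substrate concentration at which two variants run equally fast, \([S]^* = (V_{max,2}K_{m,1} - V_{max,1}K_{m,2})/(V_{max,1} - V_{max,2})\). Below \([S]^*\) the variant with the higher \(V_{max}/K_m\) is faster; above it, the one with the higher \(V_{max}\) is faster. The values are hypothetical.

```python
def crossover_concentration(Vmax1, Km1, Vmax2, Km2):
    """Substrate concentration where two Michaelis-Menten rates are equal.

    Returns None if the curves never cross at a positive [S]
    (e.g. one variant has both the higher Vmax and the lower Km).
    """
    if Vmax1 == Vmax2:
        return None  # the variant with the lower Km is faster everywhere
    S_star = (Vmax2 * Km1 - Vmax1 * Km2) / (Vmax1 - Vmax2)
    return S_star if S_star > 0 else None


# Hypothetical: variant 1 has the lower Km, variant 2 the higher Vmax
print(crossover_concentration(Vmax1=1.0, Km1=1.0, Vmax2=2.0, Km2=5.0))  # → 3.0
```

At \([S] = 3.0\) both rates equal 0.75 in these units; at lower concentrations variant 1 is faster, and at higher ones variant 2 is.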

4. Prediction for Practical Applications and Optimization of Reaction Conditions

Once \(V_{max}\) and \(K_m\) have been determined under different environmental conditions (such as temperature and pH), the Michaelis-Menten equation can also help predict reaction rates across those conditions and across substrate concentrations, which is crucial for process control in future practical applications. In our project, PETase will need to work in various degradation environments where conditions may change significantly.

By applying the Michaelis-Menten equation with condition-specific parameters, we can predict the reaction kinetics under these varying conditions, allowing us to optimize application conditions in advance and avoid efficiency losses in practice. For instance, fluctuations in temperature and pH during actual operation can significantly affect the catalytic efficiency of the enzyme; modeling these changes in advance helps us choose operating parameters that achieve the highest degradation efficiency.

5. Data Interpretation and Verification of Experimental Results

In the analysis of experimental data, the Michaelis-Menten equation provides a standard mathematical model to help interpret the data and verify its consistency. When we obtain data from enzymatic reactions in experiments, we can fit the Michaelis-Menten equation to the data to obtain \(V_{max}\) and \(K_m\).
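A minimal sketch of such a fit uses SciPy's `curve_fit` for nonlinear least squares. It assumes initial rates have already been measured at several substrate concentrations. The data below are synthetic, generated from known parameters so the fit can be checked; they are not our experimental measurements, and the starting guesses are simple heuristics.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten(S, Vmax, Km):
    """v = Vmax * [S] / (Km + [S])."""
    return Vmax * S / (Km + S)


# Synthetic initial-rate data from Vmax = 1.2, Km = 3.0 with small noise
rng = np.random.default_rng(0)
S = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
v = michaelis_menten(S, 1.2, 3.0) + rng.normal(0.0, 0.01, S.size)

# Heuristic starting guesses: Vmax ~ largest observed rate, Km ~ median [S]
p0 = [v.max(), np.median(S)]
(Vmax_fit, Km_fit), cov = curve_fit(michaelis_menten, S, v, p0=p0)
Vmax_err, Km_err = np.sqrt(np.diag(cov))  # 1-sigma standard errors
print(f"Vmax = {Vmax_fit:.3f} +/- {Vmax_err:.3f}, Km = {Km_fit:.3f} +/- {Km_err:.3f}")
```

Fitting the nonlinear form directly avoids the distortion of measurement error introduced by linearizations such as the Lineweaver-Burk plot.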

Specifically, the Michaelis-Menten equation allows us to transform experimental data into kinetic parameters, which helps us check whether the experimental results match theoretical expectations and assess their reliability and repeatability. When analyzing data collected under different conditions, it also helps us identify potential experimental errors, since outliers stand out clearly from the fitted curve, providing solid data support for subsequent experiments.
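Flagging points that deviate from the fitted curve can be as simple as comparing each residual with the spread of all residuals. The sketch assumes \(V_{max}\) and \(K_m\) have already been fitted and uses the robust modified z-score (based on the median absolute deviation) with a cutoff of 3.5, a common but arbitrary choice. The data are made up.

```python
from statistics import median


def flag_outliers(S, v, Vmax, Km, cutoff=3.5):
    """Return indices of points whose residual from the fitted curve is unusual.

    Uses the modified z-score 0.6745 * (r - median(r)) / MAD, where r are the
    residuals v - Vmax*S/(Km+S) and MAD is their median absolute deviation.
    """
    residuals = [vi - Vmax * si / (Km + si) for si, vi in zip(S, v)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # residuals too uniform to judge
    return [i for i, r in enumerate(residuals)
            if abs(0.6745 * (r - med) / mad) > cutoff]


# Made-up data following Vmax = 1.0, Km = 2.0, with one corrupted point
S = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
v = [0.21, 0.33, 0.49, 0.68, 0.95, 0.89]
print(flag_outliers(S, v, Vmax=1.0, Km=2.0))  # → [4]
```

A flagged point is a prompt to re-check that measurement, not an automatic reason to delete it.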

The Michaelis-Menten equation plays an essential role in our project, providing a scientific framework to quantify enzymatic catalytic efficiency and offering specific guidance for our experimental design, enzyme variant screening, and reaction condition optimization. Moreover, it helps us interpret and verify experimental data and check that our results are reasonable.

References

[1] Arnold, F. H. (2018). Directed evolution: bringing new chemistry to life. Angewandte Chemie International Edition, 57(16), 4143-4148. https://www.nobelprize.org/uploads/2018/10/arnold-lecture.pdf

[2] Hilvert, D. (2000). Critical analysis of antibody catalysis. Annual Review of Biochemistry, 69(1), 751-793. https://www.annualreviews.org/content/journals/10.1146/annurev.biochem.69.1.751

[3] Huang, P. S., Ban, Y. E. A., Richter, F., Andre, I., Vernon, R., Schief, W. R., & Baker, D. (2011). RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE, 6(8), e24109. https://pubmed.ncbi.nlm.nih.gov/21909381/

[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://papers.nips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

[5] Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., ... & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://www.pnas.org/doi/10.1073/pnas.2016239118

Appendix: Michaelis-Menten Equation

1. Reaction Mechanism

Consider the basic process of PETase-catalyzed degradation of PET plastic:

\[ E + S \rightleftharpoons ES \rightarrow E + P \]

where:

  • \(E\) is the free PETase enzyme.
  • \(S\) is the PET substrate.
  • \(ES\) is the PETase-substrate complex.
  • \(P\) is the small molecule product generated after degradation.

2. Rate Constants

Define the following rate constants:

  • \(k_{on}\): The rate constant for the binding of PETase to PET substrate to form the complex.
  • \(k_{off}\): The rate constant for the dissociation of the complex back into free PETase and PET substrate.
  • \(k_{cat}\): The rate constant for the conversion of the complex into products and free PETase.

3. Reaction Rate

Define the reaction rate as the rate of formation of product \(P\):

\[ v = \frac{d[P]}{dt} = k_{cat}[ES] \]

In our project, this represents the rate of PET plastic degradation.


4. Steady-State Assumption

The steady-state assumption states that, early in the reaction, the concentration of ES quickly reaches a steady state and then remains approximately constant for a period of time. Therefore, at steady state, the rate of formation of ES equals its rate of breakdown:

\[ \frac{d[ES]}{dt} = k_{on}[E][S] - k_{off}[ES] - k_{cat}[ES] = 0 \]


5. Solve for \([ES]\)

From the steady-state equation, solve for \([ES]\):

\[ [ES] = \frac{k_{on}[E][S]}{k_{off} + k_{cat}} = \frac{[E][S]}{\frac{k_{off} + k_{cat}}{k_{on}}} \]

Derivation of the Michaelis-Menten Equation

1. Define the Michaelis Constant \(K_M\)

The Michaelis constant \(K_M\) is defined as:

\[ K_M = \frac{k_{off} + k_{cat}}{k_{on}} \]


2. Substitute the Definition of \(K_M\)

Substitute the definition of \(K_M\) into the original expression for \([ES]\):

\[ [ES] = \frac{k_{on}[E][S]}{k_{off} + k_{cat}} \]

\[ [ES] = \frac{k_{on}[E][S]}{\frac{k_{off} + k_{cat}}{k_{on}} \cdot k_{on}} \]

\[ [ES] = \frac{[E][S]}{\frac{k_{off} + k_{cat}}{k_{on}}} \]

\[ [ES] = \frac{[E][S]}{K_M} \]

3. Introduce Enzyme Conservation

The total enzyme concentration \([E]_T\) is the sum of the free enzyme and the enzyme bound in the complex, so the free enzyme concentration is:

\[ [E]_T = [E] + [ES] \quad \Rightarrow \quad [E] = [E]_T - [ES] \]


4. Substitute the Free Enzyme Concentration

Substitute \([E] = [E]_T - [ES]\) into \([ES] = \frac{[E][S]}{K_M}\) and multiply both sides by \(K_M\):

\[ K_M[ES] = ([E]_T - [ES])[S] \]

5. Collect the \([ES]\) Terms

Move all terms containing \([ES]\) to the left-hand side:

\[ K_M[ES] + [ES][S] = [E]_T[S] \]

\[ [ES](K_M + [S]) = [E]_T[S] \]


6. Simplify the Expression

Divide both sides by \(K_M + [S]\) to isolate \([ES]\).

Conclusion

The final expression:

\[ [ES] = \frac{[E]_T [S]}{K_M + [S]} \]

This expression shows how \([ES]\) changes with substrate concentration, taking into account \(K_M\) and the total enzyme concentration. It is the basis of the Michaelis-Menten equation, which describes the rate of enzyme-catalyzed reactions at various substrate concentrations.

6. Introduce Michaelis Constant \( K_M \)

Define:

\[ K_M = \frac{k_{off} + k_{cat}}{k_{on}} \]

This is the substrate concentration at which the enzyme shows half of its maximum velocity.


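7. Substitute \([ES]\) Expression

Combining the steady-state condition with enzyme conservation, \([E] = [E]_T - [ES]\), and substituting \(K_M\) gives:

\[ [ES] = \frac{[E]_T[S]}{K_M + [S]} \]

8. Substrate Excess Assumption

Assume the substrate concentration is much greater than \([E]_T\), so that only a negligible fraction of the substrate is bound in \(ES\). During the initial phase, when little product has formed, \([S]\) can then be taken as the initial substrate concentration \([S]_0\):

\[ [S]_0 = [S] + [ES] \approx [S] \]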

9. Final Expression for Reaction Rate \(v\)

Substitute the expression for \([ES]\) into the reaction rate \(v\):

\[ v = k_{cat}[ES] \]

\[ v = k_{cat} \left( \frac{[E]_T[S]}{K_M + [S]} \right) \]

Define \( V_{max} \) as \( k_{cat}[E]_T \):

\[ V_{max} = k_{cat}[E]_T \]

10. Michaelis-Menten Equation

Finally, obtain the Michaelis-Menten equation:

\[ v = \frac{V_{max}[S]}{K_M + [S]} \]
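The steady-state result can be checked numerically by integrating the full mass-action scheme \(E + S \rightleftharpoons ES \rightarrow E + P\) and comparing the simulated rate \(k_{cat}[ES]\) with the Michaelis-Menten prediction once the brief pre-steady-state transient has passed. The rate constants and concentrations are arbitrary illustrative values chosen so that \([S] \gg [E]_T\).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative constants (arbitrary units), chosen so that [S]0 >> [E]_T
k_on, k_off, k_cat = 1.0, 0.5, 0.5
E_T, S0 = 0.01, 10.0
K_M = (k_off + k_cat) / k_on  # = 1.0


def mass_action(_, y):
    """Full scheme E + S <-> ES -> E + P, using [E] = [E]_T - [ES]."""
    S, ES, P = y
    E = E_T - ES
    binding = k_on * E * S - k_off * ES  # net complex formation
    return [-binding, binding - k_cat * ES, k_cat * ES]


sol = solve_ivp(mass_action, (0.0, 20.0), [S0, 0.0, 0.0],
                t_eval=[20.0], rtol=1e-10, atol=1e-13)
S_end, ES_end, P_end = sol.y[:, -1]

v_full = k_cat * ES_end                     # rate from the full model
v_mm = k_cat * E_T * S_end / (K_M + S_end)  # Michaelis-Menten prediction
print(v_full, v_mm, abs(v_full - v_mm) / v_mm)
```

Re-running with \([E]_T\) comparable to \([S]_0\) makes the two rates diverge, which illustrates why the substrate-excess assumption matters.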

Application to Enzyme Kinetics

In our project, we use this equation to simulate and optimize the kinetics of PETase. We apply this equation through the following steps:

  1. Data Collection: Measure the initial reaction rates at various substrate concentrations.
  2. Parameter Estimation: Use nonlinear regression analysis to fit experimental data to the Michaelis-Menten equation, extracting \( V_{\text{max}} \) and \( K_M \).
  3. Model Validation: Validate the model using independent datasets to ensure its predictive accuracy.
  4. Analysis and Interpretation: Analyze the kinetic parameters of different PETase variants to determine which variants have higher catalytic potential and substrate affinity.
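Step 3 (model validation) can be sketched by evaluating the fitted curve on an independent dataset that was not used for fitting, and reporting the root-mean-square error (RMSE) and the coefficient of determination \(R^2\). The fitted parameters and held-out data below are hypothetical.

```python
from math import sqrt


def validation_metrics(S, v_obs, Vmax, Km):
    """RMSE and R^2 of the Michaelis-Menten curve on held-out data."""
    v_pred = [Vmax * s / (Km + s) for s in S]
    residuals = [o - p for o, p in zip(v_obs, v_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(v_obs) / len(v_obs)
    ss_tot = sum((o - mean_obs) ** 2 for o in v_obs)
    rmse = sqrt(ss_res / len(v_obs))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Hypothetical fitted parameters and an independent (held-out) dataset
rmse, r2 = validation_metrics(S=[1.0, 3.0, 6.0, 12.0],
                              v_obs=[0.26, 0.49, 0.68, 0.79],
                              Vmax=1.0, Km=3.0)
print(f"RMSE = {rmse:.4f}, R^2 = {r2:.4f}")
```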