In this project, the dry lab developed a pipeline for PETase protein design and validation. The first (design)
part of it contains an AI-based model to predict mutations in protein sequences and analyze their effects on
enzyme performance. Using a transformer-based machine learning model, we identified potential mutations that
could enhance the enzyme's activity.
Additionally, as the second (validation) part of the pipeline, we used the Michaelis-Menten equation to model
enzyme activity, which allowed us to estimate key parameters such as \(V_{max}\) (maximum reaction rate) and
\(K_m\) (Michaelis constant). This approach enabled us to simulate the enzyme’s behavior and assess the impact
of mutations on its activity under different substrate concentrations.
The purpose of this pipeline is to streamline enzyme design and optimization. The AI model helps us predict
which mutations might improve the enzyme’s catalytic efficiency, while the kinetic modeling using the
Michaelis-Menten equation allows us to quantify how these changes affect the enzyme’s reaction rates. By
predicting enzyme kinetics and activity in silico, we can identify the most promising mutations for further
experimental validation. This makes the process more efficient for the wet lab group, as they can focus on
testing only a few top-performing variants.
Our modeling efforts have provided valuable insights for the wet lab group. By narrowing down the possible
mutations through AI prediction, we provide them with just a few enzyme variants that are likely to exhibit
improved activity. The Michaelis-Menten equation has allowed us to quantify those improvements, providing a
solid theoretical foundation for our experimental designs. As a result, our dry lab work contributes to the
overall integrity of the project and streamlines our path toward designing new, efficient enzymes for
practical applications.
Introduction:
Protein design is a rapidly advancing field with various methods to create novel proteins or improve existing
ones. Traditional approaches, such as directed evolution, generate new variants of proteins in the lab by
introducing random mutations and selecting desired traits. This method has been instrumental in developing
enzymes with new functions or improved performance [1]. As discussed by Hilvert,
rational design involves altering protein sequences based on detailed knowledge of their structures and
catalytic mechanisms [2]. Computational approaches, such as those using the
Rosetta software suite, enable the design of proteins by predicting structures and evaluating stability and
functionality based on energy calculations [3].
Our pipeline leverages the Transformer architecture, a deep learning model, to predict potential mutation
points in the PETase enzyme. The Transformer model has shown success in natural language processing tasks
and is now being adapted for biological applications, providing a powerful tool for understanding and
manipulating protein sequences [4]. Furthermore, we predict which amino acid substitution to make at each
mutation point using Meta’s Evolutionary Scale Modeling (ESM) 1b model [5].
Figure 1: The overall pipeline of our method
Method and Experiment:
Our pipeline involves several key steps:
1. Model Training:
We trained a Transformer model on 1007 homologous PETase protein sequences obtained from the UniProt database
using the masked language model (MLM) training method. This approach allows the model to learn contextual
information about amino acid sequences and predict masked residues accurately [4].
2. Prediction of Mutation Points:
The trained model predicts the top 10 potential points of mutation within the PETase enzyme. These points are
likely to have significant impacts on the enzyme's structure and function [5].
3. Mutation Analysis:
Each predicted mutation point is individually masked and analyzed using Meta's ESM-1b model to predict the
specific mutation. By drawing on the knowledge ESM-1b acquired during pre-training, this step helps keep the
predicted mutations realistic and likely to be beneficial. We discard any mutation point where the predicted
residue is the same as the one in the original sequence.
4. Structural Visualization:
Finally, using AlphaFold 3 developed by Google DeepMind, we generated the predicted 3D structures of our new
mutants. We then labeled the point of mutation on the 3D model.
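The masked language model objective used in step 1 can be illustrated with a minimal sketch. It is not our
training code: the function name `mask_sequence`, the `<mask>` placeholder string, the 15% masking rate, and
the toy sequence are all assumptions made for illustration. The sketch hides a random subset of residues and
records the hidden residues as the targets a model would be trained to recover.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder token name, assumed for illustration


def mask_sequence(sequence, mask_rate=0.15, seed=0):
    """Hide a random subset of residues and return (tokens, targets).

    tokens  : list of residues with some entries replaced by MASK_TOKEN
    targets : dict mapping each masked 0-based position to the hidden residue
    The 15% default rate is an assumption, not a value taken from our training setup.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(sequence) * mask_rate))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    tokens = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK_TOKEN
    return tokens, targets


# Toy sequence containing each of the 20 standard amino acids once (not a PETase sequence).
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
tokens, targets = mask_sequence(toy_sequence)
```

During training, the model sees `tokens` and is scored on how well it predicts the residues stored in
`targets`, which is what forces it to learn sequence context.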
Figure 2: The general architecture of the Transformer Model [4].
The validation loss curve during the training process
As the figure above shows, the model performs well on the validation set. The resulting parameters are the
ones we then use to predict the top 10 locations on a given PETase where a mutation is likely to occur.
We stopped training at around 100 epochs, where the loss of the Transformer model converges.
Figure 3: The predicted scores for the possible mutations of the top 10 positions
As the figure above shows, ESM-1b scores every possible mutation at each of the 10 positions, and we choose
the one with the highest score. The redder the cell, the higher the score. However, some predictions, such as
C241C, leave the residue unchanged and are therefore meaningless, so we discard those positions and keep only
the ones circled in purple above. We then compare the remaining mutations with existing mutations to see which
of them are genuinely new point mutations.
To obtain the actual substitution at each position, we feed the locations and the original sequence into a
pre-trained ESM-1b model one by one.
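The selection rule described above (take the highest-scoring residue at each position and discard positions
where it matches the original residue) can be sketched as follows. The function name `pick_mutations`, the
score dictionary layout, the toy sequence, and the scores are hypothetical; in our pipeline the scores come
from ESM-1b, which is not called here.

```python
def pick_mutations(sequence, scores_by_position):
    """Return mutation labels such as 'N205G' for the best-scoring substitutions.

    sequence           : wild-type amino acid sequence (string)
    scores_by_position : {0-based position: {amino acid: score}} (assumed layout)
    Positions whose best-scoring residue equals the wild type are dropped,
    mirroring the removal of entries like C241C.
    """
    mutations = []
    for pos in sorted(scores_by_position):
        scores = scores_by_position[pos]
        best = max(scores, key=scores.get)
        wild_type = sequence[pos]
        if best == wild_type:
            continue  # identity "mutation": nothing changes, so skip it
        mutations.append(f"{wild_type}{pos + 1}{best}")  # labels use 1-based positions
    return mutations


# Toy input: made-up scores for two positions of a four-residue sequence.
toy_sequence = "MNCG"
toy_scores = {
    1: {"N": 0.2, "G": 0.7, "S": 0.1},  # best is G, differs from N -> kept
    2: {"C": 0.9, "A": 0.1},            # best is C, same as wild type -> dropped
}
selected = pick_mutations(toy_sequence, toy_scores)
```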
Figure 4: One of our new PETases where the point of mutation is labeled in red
(BhrPETase,
N205G)
Results:
We then applied our pipeline to two types of PETase: BhrPETase and IsPETase. The model generated several
potential mutations for each enzyme. These mutations were then evaluated for their impact on enzyme activity
and
stability.
BhrPETase: The model identified mutations that could enhance its ability to degrade polyethylene
terephthalate (PET): W229F, N205G, N191S, and M57L. Their predicted effects will be further tested in the
wet lab.
IsPETase: Similar predictions were made for IsPETase, with potential improvements in PET degradation
efficiency identified (H104W, K95S, M10L, M128L, M154L).
Figure 5: One of our new PETases where the point of mutation is labeled in red
(BhrPETase,
N205G)
In addition, as the figure above shows, we performed docking between the predicted variants and the
microplastic molecules to further support the predicted effect of the mutations.
Figure 6: The affinity of the top 15 positions of the BhrPETase enzyme to the
microplastic molecules
points of mutations. In addition, we will apply that to other variations of PETase.
Conclusion:
Our AI-based pipeline offers a novel approach to protein design by predicting and analyzing potential
mutations
in enzymes. By leveraging deep learning models like the Transformer architecture, we can efficiently
identify
impactful mutations and enhance enzyme functionality. The next steps involve validating these predictions
through experimental testing in the wet lab, which will provide insights into the practical applications of
our
designed mutations.
Process Details: Model Construction
Data Collection and Preliminary Analysis
We have obtained enzymatic activity data for several PETase mutants (W159H/F229Y, S121E/R224Q, WT IsPETase,
and
LCC) from experiments. The measurement indicator is the absorbance at 514 nm (A514), which reflects the
change
in the amount of small molecules produced during the PET degradation process. The experiment was conducted
under
two conditions: with IPTG induction and without IPTG induction, to observe the degradation effects of these
mutants under different conditions.
Figure 7: The experimental data for enzymatic activity of the wild type BhrPETase enzyme and its variants
Modeling Framework
To quantify the change in enzymatic activity over time, we chose a model based on the Michaelis-Menten
Equation to simulate the rate changes in enzyme-catalyzed reactions.
Main Parameters Involved in the Model Include:
- \(V_{max}\): The maximum reaction rate, which measures the maximum catalytic capacity of the enzyme
under sufficient substrate conditions.
- \(K_m\): The Michaelis constant, an inverse indicator of the enzyme's apparent affinity for the substrate
(a lower \(K_m\) corresponds to a higher affinity).
- \([S]\): Substrate concentration, which in this case is the concentration of PET.
Data Sources
Experimental Data: From our own laboratory, dated August 17, 2024.
References and Known Parameters: We reviewed numerous literature sources to obtain known kinetic
parameters of PETases, such as the \(V_{max}\) and \(K_m\) for WT IsPETase. These references provided a basis for setting
our initial model parameters.
Software and Tools: We used Python as the primary programming language for modeling, with the SciPy
and NumPy libraries for numerical calculations and data processing. We also used Matplotlib for data
visualization to make the results more interpretable.
Model Results
Simulation of Enzyme Activity Changes
Fitting
Our enzyme kinetic modeling characterized the efficiency of wild type BhrPETase and its N205G variant in degrading microplastics by estimating the \(V_{max}\) and \(K_m\) of both enzymes.
Figure 8: The fitted Michaelis-Menten equation parameters for the wild type BhrPETase enzyme and the N205G variant
In particular, we can use the Michaelis-Menten parameters fitted to the experimental data above to extrapolate
our enzyme kinetics results beyond 30 minutes, thereby further supporting the validity of our approach.
Figure 9: The result of enzymatic activity prediction using the Michaelis-Menten equation on the N205G variant and the wild type BhrPETase.
As the figure above shows, after 30 minutes the enzymatic activity of the N205G variant starts to exceed
that of wild type BhrPETase, and the difference in absorbance continues to grow. According to the
extrapolation of the Michaelis-Menten equation, this indicates that the N205G variant has a higher catalytic
efficiency than the wild type. This result provides quantitative support for the efficiency of the N205G
variant and further demonstrates the optimized performance of our new variant in PET degradation.
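To make the extrapolation idea concrete, the sketch below integrates the Michaelis-Menten rate law over time
to produce a product-versus-time curve. It is an illustration under stated assumptions, not the code behind
Figure 9: the parameter values are hypothetical placeholders rather than our fitted values, units are
arbitrary, the substrate is assumed to be depleted as \([S] = [S]_0 - [P]\), and the conversion from product
to absorbance is not modeled.

```python
def progress_curve(v_max, k_m, s0, t_end, dt=0.01):
    """Integrate dP/dt = v_max * S / (k_m + S) with S = s0 - P using Euler steps.

    Returns (times, product) as lists of equal length.
    """
    times, product = [0.0], [0.0]
    p = 0.0
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        s = s0 - p                       # remaining substrate
        p += dt * v_max * s / (k_m + s)  # Michaelis-Menten rate times step size
        times.append(i * dt)
        product.append(p)
    return times, product


# Hypothetical parameters (arbitrary units); NOT the fitted values from our experiments.
t_wt, p_wt = progress_curve(v_max=0.020, k_m=0.5, s0=1.0, t_end=60.0)
t_var, p_var = progress_curve(v_max=0.030, k_m=0.5, s0=1.0, t_end=60.0)
```

Plotting `p_wt` and `p_var` against time (for example with Matplotlib) gives curves that can be compared with
measurements taken after the fitting window.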
1. The Role of the Michaelis-Menten Equation in Quantifying Enzymatic Reactions
The Michaelis-Menten equation assists us in quantitatively characterizing the kinetic performance of the enzyme
PETase as it catalyzes the degradation of PET plastics, particularly the relationship between enzyme activity
and substrate concentration. Through the Michaelis-Menten equation, we can specifically calculate two crucial
kinetic parameters:
Maximum Reaction Rate (\(V_{max}\)): This parameter reflects the enzyme's highest catalytic capacity at
saturated substrate concentrations and is vital for assessing the ultimate catalytic performance of different
enzyme variants.
Michaelis Constant (\(K_m\)): This parameter represents the substrate concentration at which the
reaction
rate is half of \(V_{max}\), thereby measuring the enzyme's affinity for the substrate.
These two parameters directly influence our understanding of the enzyme's efficiency and its performance under
varying conditions. Specifically, by analyzing the \(V_{max}\) and \(K_m\) of different PETase variants, we can
quantitatively evaluate their catalytic efficiency for PET degradation, providing a scientific basis for
selecting the most efficient enzyme variant.
2. Guiding Experimental Design and Optimizing Enzymatic Reactions
The Michaelis-Menten equation also provides specific theoretical support for our experimental design. When
designing experiments for PETase-catalyzed PET degradation, we can use the Michaelis-Menten equation to predict
the reaction rates at various substrate concentrations, effectively determining the optimal substrate
concentration range. This allows us to systematically screen different concentrations, optimize experimental
conditions, and ensure the maximum efficacy of the enzyme is fully realized.
Specifically, the Michaelis-Menten equation helps us avoid using either too high or too low substrate
concentrations in our experimental design, thereby enhancing the precision of our data and the efficiency of our
experiments. Given the wide variation in PET concentrations in actual environments, the guiding role of the
Michaelis-Menten equation enables us to find the conditions that best exert PETase activity within a feasible
substrate concentration range, which is crucial for the success of the project.
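One simple way to use the equation when planning substrate concentrations is to tabulate the predicted
fraction of \(V_{max}\) at concentrations spaced around \(K_m\). The sketch below assumes a hypothetical
\(K_m\) value and an arbitrary set of multiples; neither is taken from our experiments.

```python
def fraction_of_vmax(s, k_m):
    """Predicted v / V_max at substrate concentration s from the Michaelis-Menten equation."""
    return s / (k_m + s)


k_m = 2.0                               # hypothetical Michaelis constant (arbitrary units)
multiples = [0.25, 0.5, 1.0, 2.0, 4.0]  # assumed spacing of concentrations around K_m

# Each entry is (substrate concentration, predicted fraction of V_max).
design = [(m * k_m, fraction_of_vmax(m * k_m, k_m)) for m in multiples]

for s, fraction in design:
    print(f"[S] = {s:5.2f}  ->  v / V_max = {fraction:.2f}")
```

Concentrations far below \(K_m\) give rates that are hard to measure, while concentrations far above it all
give nearly the same rate, so a range that brackets \(K_m\) is the most informative for estimating both
parameters.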
3. Assisting in the Design and Screening of Enzyme Variants
In our project, we plan to engineer PETase to create a series of variants that enhance its ability to degrade
PET. The Michaelis-Menten equation provides us with a clear criterion for analyzing the \(V_{max}\) and
\(K_m\) of
different variants, allowing us to quantitatively compare their differences in catalytic efficiency.
In particular, modifying the enzyme's structure to lower \(K_m\) enables it to maintain high efficiency at low
substrate concentrations, which is especially important for practical applications. At the same time, the
Michaelis-Menten equation aids in screening for variants with higher \(V_{max}\) to improve overall degradation
efficiency. Thus, the Michaelis-Menten equation serves as the theoretical foundation for our screening and
design of efficient enzyme variants, allowing us to optimize enzyme performance in a more targeted way.
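The two limiting cases behind this argument can be written out explicitly. Starting from the rate law
\(v = V_{max}[S] / (K_m + [S])\), the following is a general worked limit rather than a result specific to our
data:
\[ [S] \ll K_m: \quad v \approx \frac{V_{max}}{K_m}[S] \qquad\qquad [S] \gg K_m: \quad v \approx V_{max} \]
At low substrate concentrations the rate scales with the ratio \(V_{max}/K_m\), so lowering \(K_m\) raises the
rate, while at saturating concentrations only \(V_{max}\) matters.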
4. Prediction for Practical Applications and Optimization of Reaction Conditions
The Michaelis-Menten equation can also help predict reaction rates under different environmental conditions
(such as temperature, pH value, and substrate concentration), which is crucial for process control in future
practical applications. Substrate concentration enters the equation directly, whereas temperature and pH act
through the values of \(V_{max}\) and \(K_m\), which must be determined for each condition. In our project, we
need to apply the PETase enzyme in various degradation environments that may experience significant changes in
conditions.
By applying the Michaelis-Menten equation, we can predict the reaction kinetics under these varying conditions,
allowing us to optimize the enzyme application conditions in advance and avoid efficiency losses in practical
applications. For instance, fluctuations in temperature and pH during actual operations can significantly affect
the catalytic efficiency of the enzyme. The application of the Michaelis-Menten equation helps us simulate these
changes in advance, thereby optimizing operational parameters to achieve the highest degradation efficiency.
5. Data Interpretation and Verification of Experimental Results
In the analysis of experimental data, the Michaelis-Menten equation provides a standard mathematical model to
help interpret the data and verify its consistency. When we obtain data from enzymatic reactions in experiments,
we can fit the Michaelis-Menten equation to the data to obtain \(V_{max}\) and \(K_m\).
Specifically, the Michaelis-Menten equation allows us to transform experimental data into specific kinetic
parameters, which helps us verify whether the experimental results match theoretical expectations, ensuring the
reliability and repeatability of the results. When analyzing experimental data under different conditions, the
Michaelis-Menten equation also helps us identify potential experimental errors, since outliers stand out
clearly against the fitted curve, providing strong data support for subsequent experiments.
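A minimal sketch of this outlier check is shown below. The function name `flag_outliers`, the fixed absolute
tolerance, and all numbers are assumptions made for illustration; a tolerance derived from replicate
measurements would be more appropriate in practice.

```python
def flag_outliers(substrate, rates, v_max, k_m, tolerance):
    """Return the indices of points whose measured rate deviates from the
    fitted Michaelis-Menten curve by more than `tolerance` (same units as the rates)."""
    flagged = []
    for index, (s, v_observed) in enumerate(zip(substrate, rates)):
        v_fitted = v_max * s / (k_m + s)
        if abs(v_observed - v_fitted) > tolerance:
            flagged.append(index)
    return flagged


# Made-up measurements; the third point sits well below the fitted curve.
substrate = [1.0, 2.0, 4.0, 8.0]
observed_rates = [0.34, 0.49, 0.40, 0.81]
outliers = flag_outliers(substrate, observed_rates, v_max=1.0, k_m=2.0, tolerance=0.05)
```

Flagged points are candidates for re-measurement rather than automatic removal.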
The Michaelis-Menten equation plays an essential role in our project, providing a scientific framework to
quantify enzymatic catalytic efficiency and offering specific guidance for our experimental design, enzyme
variant screening, and reaction condition optimization. Moreover, it helps us interpret and verify
experimental data, ensuring that our results are reasonable.
[1] Arnold, F. H. (2018). Directed evolution: bringing new chemistry to life.
Angewandte Chemie (International Ed. in English), 57(16), 4143. https://www.nobelprize.org/uploads/2018/10/arnold-lecture.pdf
[2] Hilvert, D. (2000). Critical analysis of antibody catalysis. Annual review of
biochemistry, 69(1), 751-793. https://www.annualreviews.org/content/journals/10.1146/annurev.biochem.69.1.751
[3] Huang, P. S., Ban, Y. E. A., Richter, F., Andre, I., Vernon, R., Schief, W.
R., & Baker, D. (2011). RosettaRemodel: a generalized framework for flexible backbone protein design. PloS
one, 6(8), e24109. https://pubmed.ncbi.nlm.nih.gov/21909381/
[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.
(2017). Attention is all you need. Advances in Neural Information
Processing Systems. https://papers.nips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[5] Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., ... & Fergus, R.
(2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein
sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://www.pnas.org/doi/10.1073/pnas.2016239118
1. Reaction Mechanism
Consider the basic process of PETase-catalyzed degradation of PET plastic:
\[ E + S \rightleftharpoons ES \rightarrow E + P \]
where:
- \(E\) is the free PETase enzyme.
- \(S\) is the PET substrate.
- \(ES\) is the PETase-substrate complex.
- \(P\) is the small molecule product generated after degradation.
2. Rate Constants
Define the following rate constants:
- \(k_{on}\): The rate constant for the binding of PETase to PET substrate to form the complex.
- \(k_{off}\): The rate constant for the dissociation of the complex back into free PETase and PET
substrate.
- \(k_{cat}\): The rate constant for the conversion of the complex into products and free PETase.
3. Reaction Rate
Define the reaction rate \(v\) as the rate of formation of product \(P\):
\[ v = \frac{d[P]}{dt} = k_{cat}[ES] \]
In our project, this represents the rate of PET plastic degradation.
4. Steady-State Assumption
The steady-state assumption states that early in the reaction, the concentration of \(ES\) quickly reaches a
level at which it remains approximately constant for a period of time.
Therefore, at steady state, the rate of formation of \(ES\) equals its rate of decomposition:
\[ \frac{d[ES]}{dt} = k_{on}[E][S] - k_{off}[ES] - k_{cat}[ES] = 0 \]
5. Solve for \([ES]\)
From the steady-state equation, solve for \([ES]\):
\[ [ES] = \frac{k_{on}[E][S]}{k_{off} + k_{cat}} = \frac{[E][S]}{\frac{k_{off} + k_{cat}}{k_{on}}} \]
Derivation of the Michaelis-Menten Equation
1. Define the Michaelis Constant \(K_M\)
The Michaelis constant \(K_M\) is defined as:
\[ K_M = \frac{k_{off} + k_{cat}}{k_{on}} \]
2. Substitute the Definition of \(K_M\)
Substitute the definition of \(K_M\) into the original expression for \([ES]\):
\[ [ES] = \frac{k_{on}[E][S]}{k_{off} + k_{cat}} \]
\[ [ES] = \frac{k_{on}[E][S]}{\frac{k_{off} + k_{cat}}{k_{on}} \cdot k_{on}} \]
\[ [ES] = \frac{[E][S]}{\frac{k_{off} + k_{cat}}{k_{on}}} \]
\[ [ES] = \frac{[E][S]}{K_M} \]
3. Introduce Enzyme Conservation
Every enzyme molecule is either free or bound in the complex, so the total enzyme concentration \([E]_T\) is
conserved:
\[ [E]_T = [E] + [ES] \]
The concentration of free enzyme is therefore:
\[ [E] = [E]_T - [ES] \]
4. Substitute the Total Enzyme Concentration
Substitute \([E] = [E]_T - [ES]\) into the expression for \([ES]\):
\[ [ES] = \frac{([E]_T - [ES])[S]}{K_M} \]
5. Collect the \([ES]\) Terms
Multiply both sides by \(K_M\) and move every term containing \([ES]\) to the left-hand side:
\[ K_M[ES] + [ES][S] = [E]_T[S] \]
\[ [ES](K_M + [S]) = [E]_T[S] \]
6. Simplify the Expression
Divide both sides by \(K_M + [S]\) to isolate \([ES]\).
Conclusion
The final expression:
\[ [ES] = \frac{[E]_T [S]}{K_M + [S]} \]
shows how \([ES]\) changes with substrate concentration, taking into account \(K_M\) and the total
enzyme concentration. This expression is the basis of the Michaelis-Menten equation, which describes the
rate of enzyme-catalyzed reactions at various substrate concentrations.
6. Introduce Michaelis Constant \( K_M \)
Define:
\[ K_M = \frac{k_{off} + k_{cat}}{k_{on}} \]
This is the substrate concentration at which the enzyme shows half of its maximum velocity.
7. Substitute \([ES]\) Expression
Substitute \(K_M\) into the steady-state expression for \([ES]\):
\[ [ES] = \frac{[E][S]}{K_M} \]
8. Enzyme Conservation
The total enzyme concentration \([E]_T\) is the sum of free and bound enzyme, so \([E] = [E]_T - [ES]\).
Substituting this and solving for \([ES]\) gives:
\[ [ES] = \frac{[E]_T[S]}{K_M + [S]} \]
9. Final Expression for Reaction Rate \(v\)
Substitute the expression for \([ES]\) into the reaction rate \(v\):
\[ v = k_{cat}[ES] \]
\[ v = k_{cat} \left( \frac{[E]_T[S]}{K_M + [S]} \right) \]
Define \( V_{max} \) as \( k_{cat}[E]_T \):
\[ V_{max} = k_{cat}[E]_T \]
10. Michaelis-Menten Equation
Finally, obtain the Michaelis-Menten equation:
\[ v = \frac{V_{max}[S]}{K_M + [S]} \]
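The derivation can be checked numerically by integrating the full mass-action scheme
\(E + S \rightleftharpoons ES \rightarrow E + P\) and comparing the simulated rate \(k_{cat}[ES]\) with the
Michaelis-Menten prediction. This is a sketch with hypothetical rate constants and concentrations (arbitrary
units, chosen so that \([E]_T \ll [S]\)), using simple Euler steps rather than a production ODE solver.

```python
def simulate_mass_action(k_on, k_off, k_cat, e_total, s0, t_end, dt=1e-3):
    """Euler integration of E + S <-> ES -> E + P; returns the final (S, ES, P)."""
    s, es, p = s0, 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        e = e_total - es        # enzyme conservation: [E] = [E]_T - [ES]
        bind = k_on * e * s     # E + S -> ES
        unbind = k_off * es     # ES -> E + S
        convert = k_cat * es    # ES -> E + P
        s += dt * (unbind - bind)
        es += dt * (bind - unbind - convert)
        p += dt * convert
    return s, es, p


def michaelis_menten_rate(s, v_max, k_m):
    """Rate predicted by v = V_max [S] / (K_M + [S])."""
    return v_max * s / (k_m + s)


# Hypothetical values, not measured PETase constants.
k_on, k_off, k_cat = 10.0, 1.0, 1.0
e_total, s0 = 0.01, 1.0

k_m = (k_off + k_cat) / k_on   # K_M = (k_off + k_cat) / k_on
v_max = k_cat * e_total        # V_max = k_cat [E]_T

s, es, p = simulate_mass_action(k_on, k_off, k_cat, e_total, s0, t_end=1.0)
rate_simulated = k_cat * es
rate_predicted = michaelis_menten_rate(s, v_max, k_m)
```

Once the initial transient has passed, `rate_simulated` and `rate_predicted` should agree closely; the
agreement degrades if the enzyme concentration is made comparable to the substrate concentration, where the
steady-state assumption no longer holds well.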
Application to Enzyme Kinetics
In our project, we use this equation to simulate and optimize the kinetics of PETase. We apply this equation
through the following steps:
- Data Collection: Measure the initial reaction rates at various substrate concentrations.
- Parameter Estimation: Use nonlinear regression analysis to fit experimental data to the Michaelis-Menten
equation, extracting \(V_{max}\) and \(K_M\).
- Model Validation: Validate the model using independent datasets to ensure its predictive accuracy.
- Analysis and Interpretation: Analyze the kinetic parameters of different PETase variants to determine which
variants have higher catalytic potential and substrate affinity.
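The parameter estimation step can be sketched with SciPy's `curve_fit`, which performs nonlinear least-squares
regression. The data below are synthetic: they are generated from hypothetical "true" parameters with added
noise so that the example is self-contained, and the initial guess and non-negativity bounds are illustrative
choices rather than the settings used for our reported fits.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten(s, v_max, k_m):
    """Initial reaction rate as a function of substrate concentration."""
    return v_max * s / (k_m + s)


# Synthetic data from hypothetical parameters (arbitrary units), with small Gaussian noise.
true_v_max, true_k_m = 1.2, 0.8
substrate = np.array([0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
rng = np.random.default_rng(0)
rates = michaelis_menten(substrate, true_v_max, true_k_m) + rng.normal(0.0, 0.01, substrate.size)

# Initial guess: largest observed rate for V_max, a mid-range concentration for K_M.
initial_guess = [rates.max(), np.median(substrate)]

# Constrain both parameters to be non-negative.
params, covariance = curve_fit(
    michaelis_menten, substrate, rates, p0=initial_guess, bounds=(0.0, np.inf)
)
v_max_fit, k_m_fit = params
std_errors = np.sqrt(np.diag(covariance))  # one-standard-deviation uncertainties

print(f"V_max = {v_max_fit:.3f} +/- {std_errors[0]:.3f}")
print(f"K_M   = {k_m_fit:.3f} +/- {std_errors[1]:.3f}")
```

Fitting the untransformed rates directly avoids the error distortion introduced by linearized forms such as
the double-reciprocal plot, and the covariance matrix gives a first estimate of how well each parameter is
constrained by the data.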