Part1: Overview

Our project is titled "The PET Degradation STORM: Unleashing PETase's Power with PET-binding peptides". The design covers four stages: screen, test, optimization, and re-modification, abbreviated as "STORM".

Part2: Screen - Toward a Deep Learning Model for PET-binding Peptide Classification Prediction

We used two models in sequence to predict short peptides with high PET affinity. The initial screen, based on amino acid sequence alone, was performed with a Long Short-Term Memory (LSTM) model. A traditional fully connected layer (FCL) cannot effectively capture the order-dependent relationships between residues in an amino acid sequence, and conventional neural networks also have significant limitations with sequential data, which makes it difficult for them to capture long-term dependencies in the sequences. The LSTM model, with its gated structure that includes input, forget, and output gates (Fig. 1), can effectively capture and memorize long-term dependencies in sequences and is therefore suitable for short peptide prediction tasks [1].

Fig. 1   LSTM model gate architecture
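
To make the gate structure concrete, the following is a minimal sketch of one time step of a standard LSTM cell, reduced to a scalar input and a scalar hidden state so that it runs with the Python standard library alone. It is an illustration, not the model used in this project: the function names and the `params` layout are hypothetical, and a real model would use vector states, learned weights, and a deep learning framework.

```python
import math


def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and scalar state.

    params maps each of "forget", "input", "output", "candidate"
    to a tuple (w_x, w_h, b). This layout is an assumption made
    for the sketch.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))      # how much old memory to keep
    i = sigmoid(pre_activation("input"))       # how much new content to write
    o = sigmoid(pre_activation("output"))      # how much memory to expose
    g = math.tanh(pre_activation("candidate")) # proposed new content

    c = f * c_prev + i * g   # updated cell state (long-term memory)
    h = o * math.tanh(c)     # updated hidden state (output)
    return h, c


def run_lstm(inputs, params):
    """Feed a sequence of scalars through the cell; return the final hidden state."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return h
```

The cell state `c` is carried forward additively, which is what allows information from early residues to persist across many steps.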

The LSTM model excels at processing sequential data; applied to protein sequences, it takes into account the context before and after each amino acid. By analyzing features of the amino acid sequences as a whole, we performed an initial screen for short peptides that are likely to be hydrophobic. However, the LSTM model takes only the one-dimensional sequence of the protein (the amino acid text) as its main feature input and ignores the three-dimensional structure, which is essential for understanding protein function and properties. To compensate for this shortcoming, we used a graph convolutional network (GCN) that takes protein 3D structural information as input as the second screening model.
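
As an illustration of how amino acid text can be turned into numeric input for a sequence model, the sketch below one-hot encodes a peptide over the 20 standard amino acids and zero-pads it to a fixed length. The encoding scheme, the residue ordering, and the function name are assumptions made for this example; the source does not state which encoding the LSTM model actually used.

```python
# The 20 standard amino acids in one-letter code (alphabetical order,
# chosen here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide, max_len):
    """Return max_len rows of 20 values each.

    Each residue becomes a row with a single 1 at its amino acid index.
    Positions beyond the end of the peptide are all-zero padding rows.
    """
    peptide = peptide.upper()
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")

    rows = []
    for aa in peptide:
        if aa not in AA_INDEX:
            raise ValueError(f"non-standard residue: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1
        rows.append(row)

    # Pad so every peptide yields a matrix of the same shape.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(peptide)))
    return rows
```

Each row would then be fed to the recurrent model as one time step.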


We first used AlphaFold3 to predict three-dimensional structures for the amino acid sequences that passed the initial screen. The resulting structure files were converted into adjacency matrices and used as feature inputs to the graph convolutional network (GCN) (Fig. 2). In addition, we used an attention mechanism to analyze the relationships between the nodes and edges of the amino acid graph and extracted key features such as amino acid type and secondary structure, in order to better capture the three-dimensional structural information. Using this GCN model, we re-screened the results of the initial screen and excluded short peptides with low PET affinity on the basis of their 3D structural information.


Fig. 2   GCN model architecture
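
The conversion from a 3D structure to a graph input can be sketched as follows, assuming one coordinate per residue (for example the C-alpha atom) and a distance cutoff of 8 angstroms for calling two residues connected. Both choices, and the function names, are assumptions for illustration; the source does not specify how its adjacency matrices were built. The second function shows one graph-convolution propagation step with symmetric normalization, leaving out the learned weights, the nonlinearity, and the attention mechanism.

```python
import math


def contact_adjacency(coords, cutoff=8.0):
    """Binary adjacency matrix: residues i and j are connected when
    their coordinates lie within `cutoff` (assumed to be angstroms)."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


def gcn_propagate(adj, features):
    """Compute D^-1/2 (A + I) D^-1/2 X for node feature rows X.

    Self-loops are added so each residue keeps its own features,
    and D is the degree matrix of A + I.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]

    out = []
    for i in range(n):
        row = [0.0] * len(features[0])
        for j in range(n):
            if a_hat[i][j]:
                weight = a_hat[i][j] / math.sqrt(degree[i] * degree[j])
                for k, value in enumerate(features[j]):
                    row[k] += weight * value
        out.append(row)
    return out
```

In a full model, the propagated features would be multiplied by a learned weight matrix and passed through a nonlinearity before the next layer.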

Part3: Test - Validated Characterization and Fermentation Optimization of High-Scoring PET-binding Peptides

A fusion protein consists of at least two structural domains encoded by separate genes, usually joined by a linker sequence; once the genes are linked, they are transcribed and translated as a single unit to produce one polypeptide. In this project, we fused PETase to a high-scoring PET-binding peptide (one with a positive prediction result from the models). We used one-step cloning to construct a recombinant expression plasmid and expressed the fusion protein in Escherichia coli. Because attaching the PET-binding peptide at the N-terminus (amino terminus) or the C-terminus (carboxyl terminus) changes its position in the polypeptide chain and may affect the overall properties of the protein, we constructed both N-terminal and C-terminal fusions. We then purified each fusion protein and characterized its PET degradation ability.
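
The two fusion orientations can be written down at the protein-sequence level as simple concatenations. The sketch below is only an illustration: the function name is hypothetical, and any sequences passed in are placeholders, since the actual PETase, linker, and peptide sequences are not given here.

```python
def build_fusion(enzyme, peptide, linker, peptide_terminus):
    """Assemble a fusion protein sequence (N-terminus to C-terminus).

    peptide_terminus="N" places the binding peptide before the enzyme;
    peptide_terminus="C" places it after the enzyme.
    """
    if peptide_terminus == "N":
        return peptide + linker + enzyme
    if peptide_terminus == "C":
        return enzyme + linker + peptide
    raise ValueError("peptide_terminus must be 'N' or 'C'")
```

For example, with placeholder strings, `build_fusion("AAAA", "WW", "GGGGS", "N")` gives the peptide-first arrangement and the same call with `"C"` gives the enzyme-first arrangement.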


As one of the most popular heterologous expression systems, E. coli grows rapidly, readily reaches high cell density in culture, has a large set of genetic and genome engineering tools, and is quickly and easily transformed with exogenous DNA; for these reasons we chose it as the chassis strain [2, 3]. E. coli BL21 (DE3) is suited to T7 promoter-driven recombinant protein expression. Exogenous addition of IPTG induces rapid expression of T7 RNA polymerase in the BL21 (DE3) strain, which in turn recognizes the T7 promoter on the expression plasmid and drives efficient expression of the fusion protein [4]. The plasmid construct is shown in Fig. 3.


Fig. 3   Plasmid constructs expressing the fusion protein

Green fluorescent protein (GFP) emits green fluorescence when irradiated with excitation light, which makes it a powerful reporter molecule and fluorescent marker. We used it to further characterize and validate the binding of PET-binding peptides to PET microplastics. We fused the coding sequence of the target PET-binding peptide to that of GFP, inserted the resulting fusion gene into the pET-20b plasmid, and transformed it into E. coli BL21 (DE3) for expression (Fig. 4). The expressed fusion protein was combined with PET microplastics and then eluted, and binding was assessed by fluorescence microscopy, comparing the intensity of the fluorescence signal with that of blank control groups.


Fig. 4   Schematic diagram of GFP fusion protein
A. PET-binding peptide fused with eGFP at the N-terminus. B. PET-binding peptide fused with eGFP at the C-terminus.

Part4: Optimization - Optimization of Fermentation Conditions in Shake Flasks and Replacement of Linker

The mined PET-binding peptides originate from hosts other than E. coli and are extremely hydrophobic, both of which strongly reduce expression of the fusion proteins, so expression needed further optimization.


We therefore optimized the fermentation conditions, exploring the fermentation temperature, inducer concentration, medium, and fermentation time, with a view to improving expression of the fusion protein.

Fig. 5   Optimization of fermentation conditions
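
One way to organize such an optimization is to enumerate every combination of the four factors as a run list. The sketch below does this with a full factorial grid, which is an assumption: the source does not say whether factors were varied together or one at a time, and the factor names and levels shown in the usage note are hypothetical placeholders, not the conditions tested in this project.

```python
from itertools import product


def condition_grid(factors):
    """Return one dict per combination of factor levels.

    factors maps a factor name to the list of levels to test.
    """
    names = list(factors)
    levels = (factors[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*levels)]
```

For instance, `condition_grid({"temperature_c": [16, 25, 37], "iptg_mm": [0.1, 0.5, 1.0], "medium": ["LB", "TB"], "hours": [12, 24]})` would list 36 shake-flask runs for these illustrative levels.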

Linkers are short peptide fragments that connect the components of a fusion protein, and they are categorized into flexible linkers, rigid linkers, and in vivo cleavable linkers (Fig. 6) [5]. The selection and design of the linker can affect the binding affinity between the enzyme and the short peptide, and may also provide other advantages for the generation of fusion proteins, such as improved bioactivity and increased expression [5]. We therefore replaced the linker to optimize the spatial conformation between PETase and the PET-binding peptide and to reduce the mutual influence between the two components, thereby improving the catalytic efficiency of the fusion protein toward its substrate.

Fig. 6   Classification of linkers [5]
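
Candidate linkers of different lengths can be generated from repeat units. The sketch below uses the commonly cited (GGGGS)n motif for flexible linkers and the (EAAAK)n motif for rigid linkers as examples; whether these particular motifs or repeat counts were used in this project is not stated in the source, and the function name is hypothetical. In vivo cleavable linkers are left out because they depend on a specific cleavage site.

```python
# Example repeat units; assumed for illustration only.
LINKER_UNITS = {
    "flexible": "GGGGS",
    "rigid": "EAAAK",
}


def make_linker(kind, repeats):
    """Return the repeat unit for `kind` concatenated `repeats` times."""
    if kind not in LINKER_UNITS:
        raise ValueError(f"unknown linker kind: {kind!r}")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return LINKER_UNITS[kind] * repeats
```

Varying `repeats` changes the spacing between the two domains, which is one of the parameters a linker replacement experiment can explore.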

Part5: Re-modification - Molecular Modification of High-Scoring PET-binding Peptides

We obtained PET-binding peptides with high PET binding ability, but the expression level of the fusion proteins and their binding to PET microplastics still left room for improvement. Therefore, drawing on our understanding of the structure and function of the proteins, we used Mutcompute-super [6] to predict beneficial mutations: the extracted bioinformatics features served as input to the 3D deconvolutional neural network of Mutcompute-super, and the predicted substitutions were then introduced at specific sites by site-directed mutagenesis (Fig. 7). In this way, we combined a deep learning classifier that predicts beneficial substitutions with molecular modification to optimize the protein's folding conformation, thereby further improving the expression level of the fusion protein and its ability to degrade the PET microplastic substrate.

Fig. 7   Schematic diagram of site-directed mutagenesis [7]
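
At the sequence level, a predicted substitution is usually written in the form wild-type residue, 1-based position, new residue (for example "T3S"). The sketch below applies such a substitution to a protein sequence and checks that the stated wild-type residue is actually present at that position. The function name and the notation handling are assumptions for illustration and are not part of Mutcompute-super.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a substitution such as 'T3S' to a one-letter protein sequence.

    Positions are 1-based. Raises ValueError when the notation is invalid,
    the position is out of range, or the wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation!r}")

    wild_type, position, substitute = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )

    # Rebuild the sequence with the single residue replaced.
    return sequence[:position - 1] + substitute + sequence[position:]
```

The wild-type check guards against numbering mismatches between the prediction output and the construct being mutated.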

References

[1] Hochreiter S, Schmidhuber J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.

[2] Rosano G L, Ceccarelli E A. Recombinant protein expression in Escherichia coli: advances and challenges [J]. Frontiers in Microbiology, 2014, 5: 172.

[3] Ki M R, Pack S P. Fusion tags to enhance heterologous protein expression [J]. Applied Microbiology and Biotechnology, 2020, 104(6): 2411-2425.

[4] Du F, Liu Y Q, Xu Y S, et al. Regulating the T7 RNA polymerase expression in E. coli BL21 (DE3) to provide more host options for recombinant protein production [J]. Microbial Cell Factories, 2021, 20(1): 189.

[5] Chen X, Zaro J L, Shen W C. Fusion protein linkers: property, design and functionality [J]. Advanced Drug Delivery Reviews, 2013, 65(10): 1357-1369.

[6] Deng Z, Cai C, Wang S, et al. A protein design method based on amino acid microenvironment and EMO neural network, CN118136092A [P/OL].

[7] Ball D W, Hill J W, Scott R J. The Basics of General, Organic, and Biological Chemistry [M]. 2011.
