Literature Search

To identify genes encoding polyethylene-degrading enzymes and to review the relevant theory and research progress, we carried out a crawler-based literature search, which established the theoretical basis of the study and broadened its depth and scope.

We searched the Hundred Chain Database by title, keywords and abstract, in both English and Chinese, using the following search terms: polyethylene biodegradation, polyethylene degrading enzyme, biodegradability, transcriptomics, laccase, cutinase, peroxidase, alkane monooxygenase, lignin peroxidase and hydrolase, among others.

Figure 1. Excerpt of the literature crawler code (Python 3.11)

Figure 2. Crawling results (partial)

The literature crawler retrieved a total of 13,949 documents, mainly from journals such as Applied Microbiology & Biotechnology, International Biodeterioration & Biodegradation and Environmental Pollution. By keyword, these comprised 3,855 articles on polyethylene biodegradation, 2,493 on polyethylene degrading enzyme, 2,065 on biodegradability, 806 on transcriptomics, 868 on laccase, 428 on cutinase, 1,502 on peroxidase, 517 on alkane monooxygenase, 564 on lignin peroxidase and 851 on hydrolase.

The retrieved literature was read and organised to find the nucleic acid and protein sequence numbers of the 14 known polyethylene-degrading enzymes in public databases, as listed in Table 1.
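The per-keyword counts reported above can be cross-checked against the reported total. The following is a minimal Python sketch of that arithmetic, assuming the total is a plain sum of the per-keyword counts with no de-duplication across keywords; the names `keyword_counts` and `check_total` are illustrative and do not come from the original crawler.

```python
# Per-keyword hit counts as reported in the text.
keyword_counts = {
    "polyethylene biodegradation": 3855,
    "polyethylene degrading enzyme": 2493,
    "biodegradability": 2065,
    "transcriptomics": 806,
    "laccase": 868,
    "cutinase": 428,
    "peroxidase": 1502,
    "alkane monooxygenase": 517,
    "lignin peroxidase": 564,
    "hydrolase": 851,
}

REPORTED_TOTAL = 13949  # total number of documents reported in the text


def check_total(counts, reported_total):
    """Return (computed_total, matches) for a dict of per-keyword counts."""
    computed = sum(counts.values())
    return computed, computed == reported_total


computed, ok = check_total(keyword_counts, REPORTED_TOTAL)
print(f"computed total = {computed}, matches reported total: {ok}")
```

Under that assumption the ten counts add up to the reported total of 13,949.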

Table 1. Nucleic acid sequence numbers and protein sequence numbers of polyethylene degradation enzymes

Mining and preliminary screening of polyethylene-degrading enzymes

Reconstruction of metagenomic sequencing data

We used the sequences of the 14 enzymes with known polyethylene-degrading function, identified in the literature search, as probe sequences to mine the metagenome for target sequences, and then analysed the target sequences with several bioinformatics methods to assess their quality.

We used fastp to remove low-quality data, applying sliding-window quality trimming and base correction to the paired-end data to obtain high-quality reads. The reads were then assembled with MEGAHIT into long contigs to reconstruct the metagenomic data.
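The read-cleaning and assembly steps can be sketched as two command-line calls driven from Python. This is only an illustration under stated assumptions: the file names, thread count, window size and quality threshold are hypothetical placeholders, and the helper functions `fastp_command`, `megahit_command` and `run_pipeline` are not taken from the original 22-line script.

```python
import shlex
import subprocess


def fastp_command(r1, r2, out1, out2, threads=8):
    """Build a fastp call for paired-end reads (parameter values are assumed)."""
    return [
        "fastp",
        "-i", r1, "-I", r2,          # paired-end input files
        "-o", out1, "-O", out2,      # cleaned paired-end output files
        "--cut_right",               # sliding-window quality trimming
        "--cut_window_size", "4",    # assumed window size
        "--cut_mean_quality", "20",  # assumed mean-quality threshold
        "--correction",              # overlap-based base correction (paired-end only)
        "--thread", str(threads),
    ]


def megahit_command(clean1, clean2, out_dir, threads=8):
    """Build a MEGAHIT call that assembles the cleaned paired-end reads."""
    return ["megahit", "-1", clean1, "-2", clean2, "-o", out_dir, "-t", str(threads)]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = [
    fastp_command("raw_1.fq.gz", "raw_2.fq.gz", "clean_1.fq.gz", "clean_2.fq.gz"),
    megahit_command("clean_1.fq.gz", "clean_2.fq.gz", "assembly"),
]
run_pipeline(commands)  # dry run: only prints the two commands
```

Keeping the commands as lists rather than shell strings avoids quoting problems with file names, and the dry-run default makes it easy to inspect the calls before running them on real data.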

Figure 3. Source code for running the workflow (22 lines of code; Linux terminal runtime environment)

Metagenomic sequencing data annotation and assembly comparison

QUAST was used to evaluate the MEGAHIT assembly to ensure data quality for downstream analysis. Prokka was used for preliminary annotation of the MEGAHIT-assembled sequences, and MMseqs2 was used to search the assembled sequences with the probe sequences to obtain the target sequences.
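One of the standard assembly statistics that QUAST reports is N50: the contig length at which the contigs of that length or longer together cover at least half of the total assembly length. The following minimal Python sketch illustrates that definition on a list of contig lengths; it is not QUAST's own implementation, and the function names are illustrative.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Toy example: total length 300, so the threshold is 150.
# 100 alone is below 150; 100 + 80 = 180 reaches it, so N50 is 80.
print(n50([100, 80, 50, 30, 20, 20]))
```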

(1) Metagenome annotation with Prokka

Figure 4. prokka_annote.smk file

Figure 5. config.yaml file
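The Snakemake rule and its configuration appear only as figures. As a rough Python sketch of the kind of Prokka call such a rule might wrap, the example below builds the command and lists the main output files. The input path, output directory, prefix and CPU count are hypothetical, and the use of Prokka's `--metagenome` option is an assumption rather than something stated in the text.

```python
from pathlib import Path


def prokka_command(contigs, out_dir, prefix, cpus=8):
    """Build a Prokka call for assembled contigs (all values are assumed)."""
    return [
        "prokka",
        "--outdir", str(out_dir),
        "--prefix", prefix,
        "--metagenome",        # assumed: gene prediction mode for fragmented assemblies
        "--cpus", str(cpus),
        str(contigs),
    ]


def prokka_outputs(out_dir, prefix):
    """Return the paths of the Prokka outputs used in later steps."""
    base = Path(out_dir)
    return {
        "proteins": base / f"{prefix}.faa",    # predicted protein sequences
        "genes": base / f"{prefix}.ffn",       # nucleotide sequences of predicted genes
        "annotation": base / f"{prefix}.gff",  # annotation in GFF3 format
    }


cmd = prokka_command("assembly/final.contigs.fa", "prokka_out", "sample1")
print(" ".join(cmd))
```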

(2) Searching the query sequences with MMseqs2 from a Bash script

Figure 6. Source code for the query sequence search (12 lines of code; Linux terminal runtime environment)
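The original Bash script is shown only as a figure. As an illustration of the same step in Python, the sketch below builds an `mmseqs easy-search` call and parses its default tab-separated output to keep the best hit per probe sequence. The file names, the E-value cutoff and the choice of probes as query and Prokka proteins as target are assumptions; the column layout assumed here is the default 12-column format (query, target, identity, alignment length, mismatches, gap openings, query start and end, target start and end, E-value, bit score).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    target: str
    identity: float
    aln_len: int
    evalue: float
    bits: float


def easy_search_command(query_fasta, target_fasta, result_file, tmp_dir):
    """Build an 'mmseqs easy-search' call (paths are hypothetical)."""
    return ["mmseqs", "easy-search", query_fasta, target_fasta, result_file, tmp_dir]


def parse_m8(lines):
    """Parse tab-separated search results into Hit records."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def best_hits(hits, max_evalue=1e-5):
    """Keep the highest-scoring hit per query below an assumed E-value cutoff."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bits > current.bits:
            best[hit.query] = hit
    return best


# Made-up result lines, for illustration only.
example = [
    "probe1\tgeneA\t0.62\t300\t110\t2\t1\t300\t5\t304\t1e-80\t250",
    "probe1\tgeneB\t0.41\t280\t160\t4\t3\t282\t9\t288\t1e-30\t120",
    "probe2\tgeneC\t0.25\t90\t66\t3\t1\t90\t1\t90\t0.5\t30",
]
for query, hit in best_hits(parse_m8(example)).items():
    print(query, hit.target, hit.evalue)
```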

The directory of assembly and annotation results for the metagenomic sequencing data is shown in Figure 7.

Figure 7. Directory of results for the metagenomic sequencing data

Search results for target genes in the metagenomic sequencing data

After searching the probe sequences against the assembled and annotated gene fragments in the metagenomic data, we retrieved the following directory files (‘Nucl’ contains the nucleic acid sequences, ‘Prot’ contains the amino acid sequences) together with part of the gene sequences.
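Collecting the matched genes into separate nucleic acid and amino acid files can be sketched as a FASTA subsetting step. The example below is a minimal illustration, assuming the hit identifiers are shared between the nucleotide and protein FASTA files (as with the locus tags in Prokka's output); the identifiers and sequences shown are made up.

```python
def read_fasta(lines):
    """Read FASTA-formatted lines into a dict of {record id: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # id is the first word of the header
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def select_records(records, wanted_ids):
    """Keep only the records whose ids were hit in the search."""
    return {rid: records[rid] for rid in wanted_ids if rid in records}


def to_fasta(records, width=60):
    """Format records as FASTA text with fixed-width sequence lines."""
    out = []
    for rid, seq in records.items():
        out.append(f">{rid}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Made-up protein records, for illustration only.
proteins = read_fasta([">geneA hypothetical protein", "MKTLLA", "VV", ">geneB", "MSS"])
prot_hits = select_records(proteins, ["geneA", "geneX"])
print(to_fasta(prot_hits), end="")
```

The same three functions can be applied to the nucleotide file to produce the matching ‘Nucl’ output.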

Figure 8. Directory of metagenomic sequencing data.

Primary sequence comparison of polyethylene-degrading enzymes

The sequences of the 14 polyethylene-degrading enzymes compiled from the 13,949 papers retrieved by the crawler were compared at the primary sequence level with the amino acid sequences of the target genes obtained by metagenome mining, using SnapGene 6.1.1. The results were then combined with the subsequent protein tertiary structure comparison and molecular docking data to screen the target genes.
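To make the primary sequence comparison concrete, the sketch below computes percent identity for a pair of sequences that have already been aligned. It is an illustration only: identity is defined here as matching residues divided by alignment columns (columns with a gap in one sequence count as mismatches), which is an assumption and not necessarily the definition SnapGene reports.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 7 columns, 5 identical residues.
print(round(percent_identity("MKT-LLA", "MKTALIA"), 1))
```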

Tertiary structure construction of polyethylene degrading enzyme

The tertiary structures of the sequences screened in the second step were predicted using SWISS-MODEL (SWISS-MODEL Interactive Workspace, expasy.org) and AlphaFold2.

Figure 9. Predicted PDB file displayed in PyMOL 2.6

Molecular docking of polyethylene-degrading enzymes with the small molecule hexadecane

DeepSite (https://www.playmolecule.com/deepsite/) was used to determine the size and spatial coordinates of the docking box, from which a docking box file was created. Automated docking was then run with qvinaw from the terminal command window according to the information in the docking box file. After docking finished, a Python 2.5 script was used to extract the binding energy information from the docking result files.
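The two file-handling steps described above, writing a docking box file and extracting binding energies, can be sketched in Python as follows. This is a hedged illustration, not the original docking code: it is written for Python 3, it assumes qvinaw reads an AutoDock Vina style configuration file and writes Vina style `REMARK VINA RESULT:` lines to its output, and the file names, box centre and box size are made-up placeholders.

```python
def docking_config(receptor, ligand, center, size, out, exhaustiveness=8):
    """Return the text of a Vina-style docking box file (values are assumed)."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


def binding_energies(result_lines):
    """Extract binding energies (kcal/mol) from Vina-style result lines."""
    energies = []
    for line in result_lines:
        if line.startswith("REMARK VINA RESULT:"):
            energies.append(float(line.split()[3]))
    return energies


# Hypothetical box taken from a binding-site prediction.
config_text = docking_config(
    "enzyme.pdbqt", "hexadecane.pdbqt",
    center=(12.5, -3.0, 8.25), size=(20, 20, 20), out="docked.pdbqt",
)
print(config_text, end="")

# Made-up result lines, for illustration only.
example_result = [
    "MODEL 1",
    "REMARK VINA RESULT:      -5.4      0.000      0.000",
    "MODEL 2",
    "REMARK VINA RESULT:      -5.1      1.832      2.947",
]
energies = binding_energies(example_result)
print(min(energies))  # the most negative value is the best-scoring pose
```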

Docking code: