RECAPITULATING LAST YEAR’S PROJECT

Structural Modeling

First of all, to understand the advances in this year’s modeling, its extremely important to bring back all the data developed in the 2023 part. Starting from the beginning, in our first steps, even before our project was written, we began our research to understand how the CBD synthesis pathway functioned in the Cannabis sativa plant.

A fruitful research effort provided us with a solid foundation of the metabolic pathway required for CBD production, along with a detailed understanding of each stage and the mechanisms of the enzymes involved, as outlined below.

As a chassis, among the various possibilities of fungi and plants considered for the project, the yeast Saccharomyces cerevisiae proved to be a highly suitable choice. Not only does it possess the necessary cellular machinery for the synthesis and modification of the needed enzymes, but it also naturally carries out the mevalonate pathway, one of the two essential metabolic routes for CBD production (Zirpel et al. 2017).

The subsequent step involved an in-depth review of the current bibliography on each stage and enzyme within the metabolic pathway, with the objective of elucidating the structure and functionality of each enzyme.

In the next step, structural modeling of enzymes was carried out in order to review the function of enzymes and their properties. Starting with a bibliographical review of the enzyme and its function, moving on to its kinetic properties, FASTA sequence, evaluation of signal peptide and structural parameters. We then generated our first native protein models via homology models, checking possibilities, improved stability and angles of amino acids that would appear to be less stable. Finally, it was verified whether the models could be used correctly.

After the gathering of the FASTA sequences, the existence of signal peptides was evaluated for protein expression using the SignalP 6.0 website (Teufel et al., 2022). The sequences were then submitted to the Swiss Model server (Waterhouse et al., 2018), and the models were chosen based on the highest identity and query possible, and we prioritized the more accurate templates, ideally obtained from previously crystallographic studies. Then, the Ramachandran plots (Hollingsworth; Karplus, 2010) and the MolProbity (Williams et al., 2018) parameters were also evaluated for the first model validation.

_{Figure 1: CsHCS1 identity based model obtained from Swiss-Model.}

CsHCS1 identity based model obtained from Swiss-Model

_{Source: 2023, Authors.}

_{Figure 2: OS identity based model obtained from Swiss-Model.}

none

_{Source: 2023, Authors.}

_{Figure 3: OAC model obtained from Swiss-Model.}

none

_{Source: 2023, Authors.}
_{Figure 4: CsaPT4 model obtained from Swiss-Model. In A shows the structure with the transit peptide and in B we
built a new model by removing the transit peptide.}

none

_{Source: 2023, Authors.}

_{Figure 5: CBDAS model obtained from Swiss-Model.}

none

_{Source: 2023, Authors.}

Based on the files generated by SwissModel, evaluations were carried out using other softwares, such as ProCheck from UCLA-DOE LAB SAVESv6.0 (Laskowski et al, 1993), to visualize the results in summarized and graphical form. Energy refinement was also tested using the Charmm-Gui server (Jo et al., 2008; Park et al., 2023), using ProCheck to verify QMEAN (Benkert, Biasini, Schwede; 2011) and evaluating the generated Ramachandran plots.

In 2023, to ensure the synthesis and proper functionality of the five enzymes within the biological circuit, we identified a potential solution: the use of 2A sequences. These short, high-efficiency sequences enable the production of multiple individual proteins from a single mRNA molecule. The technique leverages the eukaryotic ribosome's inability to form a peptide bond between the proline and glycine residues, this failure leads to a jump between amino acid residues without stopping protein synthesis proteins. T2A and P2A sequences are particularly efficient in this process and can be easily sourced from bioinformatics databases (Mansouri, 2014).

Thus, we needed to evaluate three different positions in each enzyme to implement the T2A sequence technique.

The protein acquires the (EGRGSLLTCGDVEENPG) sequence at its C-terminal;
It gets the initial (P) on its N-terminal and the final (EGRGSLLTCGDVEENPG) on its C-terminal
Or it retains only the single proline at the N-terminal for each expressed protein.

To address these possibilities, we constructed new models, based on previously evaluated structures, that could represent the altered size and conformation of the enzymes. After generating and evaluating all 15 models with the proposed modifications, we focused on key factors: the quality of the native protein model and the impact of size changes, both of which directly influence the protein's functionality, to ensure the most accurate protein models possible. Finally, we could obtain the new structures, using the extension MODELLER in the PyMOL Molecular Graphics System software and building a comparative modeling, based on our native-protein model as a template.

_{Figure 6: T2A CsHCS1 obtained from PyMOL.}

none

_{Source: 2023, Authors.}
_{Figure 7: T2A OS obtained from PyMOL.}

T2A OS obtained from PyMOL

_{Source: 2023, Authors.}
_{Figure 8: T2A OAC obtained from PyMOL.}

T2A OAC obtained from PyMOL

_{Source: 2023, Authors.}
_{Figure 9: T2A CsaPT4 obtained from PyMOL.}

T2A CsaPT4 obtained from PyMOL

_{Source: 2023, Authors.}
_{Figure 10: T2A CBDAS obtained from Pymol.}

T2A CBDAS obtained from Pymol

Source: 2023, Authors.

Docking

To optimize substrate positioning within the enzymes and enhance system efficiency, we employed molecular docking using multiple software tools, including ClusPro (Kozakov et al., 2017), HADDOCK (“HADDOCK Web Server,” n.d., para. 2), and HDOCK (Yan, Zhang, Zhou, Li, & Huang, 2017). This analysis was conducted both with the enzymes in their natural form and after the introduction of the T2A gene sequence, allowing for comparative evaluation and identification of optimal binding sites.

Initially, molecular docking was performed between the enzymes and the metabolic pathway substrates, using the natural forms of the enzymes. Subsequently, the docking process was repeated for the enzymes containing the T2A sequence, revealing similar optimal binding positions as those found in the natural forms.

_{Figure 11. Hexanoyl-CoA Synthase With Hexanoic Acid Surface.}

none

Source: 2023, Authors.

Figure 12. Olivetol Synthase with Hexanoyl-CoA Surface

Olivetol Synthase with Hexanoyl-CoA Surface

_{Source: 2023, Authors.}
_{Figure 13. Olivetolic Acid Cyclase with Olivetol-CoA Surface}

Olivetolic Acid Cyclase with Olivetol-CoA Surface

_{Source: 2023, Authors.} _{Figure 14. CsPT4 with Olivetolic Acid Surface.}

none

_{Source: 2023, Authors.}

Figure 15. CsPT4 with Geranyl Diphosphate Surface none

_{Source: 2023, Authors.}
_{Figure 16. Cannabidiolic Acid Synthase with Cannabigerolic Acid Surface.}

none

_{Source: 2023, Authors.}

Further analysis involved the use of CavityPlus software, which detects cavities and potential binding sites based on 3D protein models (“CavityPlus,” n.d.), and the molecular docking plugin for PyMOL, which facilitated in-depth exploration of these binding sites (Seeliger & De Groot, 2010). For each enzyme, we generated approximately 10 ligand models and identified between 6 to 15 cavities. Additional results from HDOCK and CavityPlus supported and validated our analysis.

Phylogenetic Analysis

In parallel, our team did a protein blast using NCBI in order to find the FASTA sequences for similar proteins for each enzyme, which were then compared using the T-COFFEE MUSCLE alignment site in order to evaluate the alignment of these proteins, and, in order to compare and get a landscape of the essential parts of these alignments, we used the software Jalview. Having our sequences alignment visualization on Jalview, we gathered as much information as we could find in the literature about the binding sites, active sites and catalytic pockets of each enzyme.

Docking Validation

Subsequently, we compared the results of this phylogenetic analysis with the outputs from the molecular docking and catalytic pocket prediction servers. We systematically evaluated and cross-referenced the data from these tools and the literature, ultimately filtering out the best results.

Biological Circuits

The biological circuits for the genes of interest used can be assembled in any expression vector that has a replication origin for E. coli and yeast, as well as a strong inducible promoter for yeast. Using Saccharomyces cerevisiae as a chassis, the olivetolic acid and CBDA synthesis pathways require the insertion of its synthesis pathway by means of biological circuits compatible with the expression system. The genes for CBDA synthesis were cloned into two separate biological circuits, so that the resulting inserts and expression vectors are of a reasonable size and, when translated, are more efficient at converting substrates and producing CBDA. Using the fusion proteins, pairs of constructs containing two or three genes, in different combinations, were evaluated for the biosynthesis of two fusion proteins. The genes were fused into an insert, spaced with flexible linkers of 10 amino acids to maintain the activity and structure of the individual proteins. Each insert was cloned into pRS vectors.

For the use of 2A sequences, two inserts were constructed, of two and three genes, spaced by the T2a sequence (gagggcaggcagtctgctgacatgcggtgacgtggaagagaatcccggccct). The genes were combined until they formed and generated two inserts of approximately the same size. Structural and kinetic modeling of the proteins separated using 2A sequences or fusion proteins allowed the construction of expression cassettes already suitable for the chassis, which will be cloned into the pRS425 and pRS426 vectors using the Gibson Assembly method. The biological circuits were constructed using the cassettes developed previously, with Gal1 promoters, efficient ADH1 terminators for the chassis, the insert developed with genes intercalated by 2A sequences, and selection markers.

The assembled plasmids were visualized using the SnapGene Viewer software, so we could analyze their sequence, features and genetic constructs. After we obtained these plasmids, it was possible to build a total of 72 different circuits with different combinations between the enzymes’ coding sequences. With this, we could also evaluate all of them, optimize and choose two biological circuits, using the pRS425 and the pRS426 plasmid, shown in the following figures.

_{Figure 17: https://static.igem.wiki/teams/5452/mod16.png.}

none

_{Source: 2023, Authors.} _{Figure 18: Developed pRS426 plasmid for the expression of the enzymes CsHCS (Hexanoyl-CoA Synthetase), OS
(Olivetol synthase) and OAC (Olivetolic acid synthase).} none

_{Source: 2023, Authors.}

Gene Expression Modeling

To represent gene expression and metabolic pathway activity in our biological circuits, we turned to mixed-effects (ME) models, which describe cell behavior in response to stimuli. We simplified gene expression using dynamic equations, where mRNA and protein production and decay rates are modeled. Additionally, to help us understand and optimize CBDA production in S. cerevisiae, Boolean logic was applied to describe the logic gates of the metabolic pathway, particularly the activation of enzymes by substrates like galactose and hexanoic acid.