Introduction
Every engineering process consists of multiple milestones that together build the final system. Engineering a biological system is rarely straightforward and often requires many rounds of customization and fine-tuning before it performs well. Figure 1 shows the engineering cycle behind the main stages of each of our engineering processes. We used the Design-Build-Test-Learn (DBTL) loop to continuously improve our experimental design and methods. We document that process here so that future iGEM teams can follow it and build on our achievements.
Figure 1. Engineering success cycle
Cycle 1: Preliminary Screening
In 1995, while conducting diabetes research at the Karolinska Institutet in Sweden, Dr. Zhengwang Chen isolated Aglycin, a 37-amino-acid peptide that acts as an incretin, from pig small intestine. Between 2002 and 2016, advances in sequencing technology allowed Chen's team to discover that the genes encoding Aglycin family peptides originate from leguminous plants rather than mammals. The team identified four distinct groups of Aglycin family peptides in soybeans. This finding suggests that these peptides could be developed further into antidiabetic drugs. Polypeptides with similar structures are also present in various other plants, including licorice, a traditional Chinese medicine, and castor, an important oil crop. After discussions with Dr. Chenguang Yao, a member of Chen's group, we reasoned that cross-species research and synthetic biology could artificially concentrate advantageous traits that evolved separately in reproductively isolated species. On that basis, we consider Vg, a member of the Aglycin peptide family, a promising candidate for drug development (Figure 2).
Figure 2. Evolution and homology analysis of Vg based on NCBI data. Homologous sequences were retrieved with BLAST against the NCBI database. A phylogenetic tree was then built in MEGA X using the maximum likelihood method with the Jones-Taylor-Thornton (JTT) model and beautified with cnsknowall (https://cnsknowall.com/). The red star marks Vg, species in green branches belong to Fabaceae, and species in blue branches belong to Euphorbiaceae. No protein sequences with high overall similarity were found in animals or microorganisms.
Design
Using JPred4, we found a distinct secondary structure element, a β-sheet, spanning amino acids 21 to 31 in the P37 family. A search of the NCBI database showed that this β-sheet forms an Albumin_I superfamily functional domain. This domain can bind specifically to the 43 kDa protein VDAC-1 (PDB ID 2JK4). We therefore selected the Albumin_I superfamily domain formed by amino acids 21 to 31 as the candidate region for mutation (Figure 3).
Figure 3. Prediction of the conservation of the P37 family. A CD-search against the NCBI Conserved Domain Database (CDD) identified a highly reliable Albumin_I superfamily functional domain in Vg. According to the CDD annotation, this domain can bind the 43 kDa receptor protein VDAC-1.
By comparing homologous sequences over amino acids 21 to 31, we identified the conserved domain. We then assessed the phosphorylation sites of the hypoglycemic peptide P37 with NetPhos 3.1. Many phosphorylation predictors are built on artificial neural networks (ANNs). We chose NetPhos 3.1 among them because it has been trained extensively and systematically to capture the relationship between protein structure and function, an aspect often overlooked by traditional sequence-based prediction tools[1]. Integrating this information gives a more comprehensive view of likely phosphorylation sites. An evaluation of disulfide bond sites and steric hindrance further indicated that the structure of P37 is stable.

From an evolutionary perspective, we used the branch-site model with likelihood ratio tests to detect positive selection in specific genes[2]. Such selection appears as an accelerated rate of non-synonymous substitutions (dN) relative to synonymous substitutions (dS). Studying positive selection provides an important theoretical basis for protein engineering and can help in designing proteins with specific functions[3]. We chose PAML for this analysis because, compared with alternatives such as HyPhy, it processes large datasets more quickly when analyzing selection pressure[4]. PAML also uses model-based inference, which makes site selection more rigorous and reduces the risk of selection bias[5].
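The branch-site test comes down to comparing two log-likelihoods. The sketch below assumes the standard setup described in the PAML documentation. Model A (codeml `model = 2`, `NSsites = 2`, `fix_omega = 0`) is the alternative hypothesis. The null hypothesis is the same model with ω fixed at 1 (`fix_omega = 1`, `omega = 1`). The `lnL` inputs stand for the values codeml reports for each run. The function name and example numbers are hypothetical. The code returns the test statistic 2Δℓ together with two p-values. One is the conservative χ² (1 df) p-value. The other is the p-value from the 50:50 mixture of a point mass at 0 and χ² (1 df).

```python
import math


def branch_site_lrt(lnl_alt: float, lnl_null: float) -> tuple[float, float, float]:
    """Hypothetical helper: likelihood ratio test for branch-site model A
    (omega2 estimated) against its null (omega2 fixed at 1).

    Returns (2*delta_lnL, p_chi2_df1, p_mixture).
    """
    # Test statistic: twice the log-likelihood difference, floored at 0
    # because the alternative model nests the null.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # A 50:50 mixture of a point mass at 0 and chi-square(1 df) halves the tail.
    p_mix = 0.5 * p_chi2
    return stat, p_chi2, p_mix


# Illustrative lnL values only, not results from this project.
stat, p, p_mix = branch_site_lrt(lnl_alt=-1000.0, lnl_null=-1002.5)
print(f"2dlnL = {stat:.2f}, p(chi2_1) = {p:.4f}, p(mixture) = {p_mix:.4f}")
```

Using χ² (1 df) gives the more conservative p-value. The mixture distribution is the asymptotically correct null when ω is constrained to be at least 1 under the alternative.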
Based on the combined results of these analyses, we established the mutation sites and schemes (Figure 4). Specifically, we integrated the amino acid substitutions observed at positions 25, 27, and 28 across 32 similar peptides (Figure 5).
Figure 4. The conserved regions of Vg evolution. Sequences were aligned with the ClustalW function of MEGA X. Yellow regions represent conserved domains and white regions indicate mutation sites.
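Once the homologs are aligned, reading off conserved columns and the residues seen at candidate sites is simple bookkeeping. The sketch below assumes an alignment already exported as equal-length strings. It shows how per-column conservation and the residue options at positions such as 25, 27, and 28 could be tabulated. The toy alignment and function names are hypothetical. Gap characters such as "-" are treated like any other symbol.

```python
from collections import Counter


def residues_at(alignment: list[str], positions: list[int]) -> dict[int, Counter]:
    """Count the residues observed at each 1-based alignment column in `positions`."""
    return {pos: Counter(seq[pos - 1] for seq in alignment) for pos in positions}


def conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column."""
    n = len(alignment)
    scores = []
    for i in range(len(alignment[0])):
        column = Counter(seq[i] for seq in alignment)
        scores.append(column.most_common(1)[0][1] / n)
    return scores


# Hypothetical toy alignment (not real Vg homologs).
aln = ["ACRVV", "ACAFV", "ACGLA"]
print(residues_at(aln, [3, 4, 5]))
print(conservation(aln))  # 1.0 marks a fully conserved column
```

Columns scoring 1.0 behave like the conserved (yellow) regions in the alignment figure. Variable columns list the substitutions that nature has already tolerated, and these become the candidate mutations.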
Figure 5. Prediction of phosphorylation sites in Vg. NetPhos 3.1 exports its predicted phosphorylation sites. Sites marked "yes", or lying above the purple threshold line, were predicted with higher confidence by machine learning. This analysis yields predicted phosphorylation sites at positions 18 and 36.
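Reading the NetPhos output amounts to thresholding scores. In this sketch, each prediction is represented as a hypothetical record of position, residue, score, and kinase label. For every position, the code keeps the best score and reports the positions above a cutoff. We assume a cutoff of 0.5 here; adjust it to match the threshold line in the actual output. The example scores are made up and are not the values behind the figure.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    position: int  # 1-based residue position
    residue: str   # S, T or Y
    score: float   # predictor output in [0, 1]
    kinase: str    # kinase label (illustrative)


def confident_sites(preds: list[Prediction], threshold: float = 0.5) -> dict[int, float]:
    """Best score per position, keeping positions whose best score exceeds `threshold`."""
    best: dict[int, float] = {}
    for p in preds:
        best[p.position] = max(best.get(p.position, 0.0), p.score)
    return {pos: s for pos, s in sorted(best.items()) if s > threshold}


# Hypothetical predictions for illustration only.
preds = [
    Prediction(5, "S", 0.91, "kinaseA"),
    Prediction(5, "S", 0.62, "kinaseB"),
    Prediction(12, "T", 0.31, "kinaseA"),
    Prediction(20, "Y", 0.55, "kinaseC"),
]
print(confident_sites(preds))
```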
Build
Based on the design above, we combined the amino acid substitutions observed in 33 similar polypeptides at positions 25, 27, and 28. This gave 26 mutation products: R25A, R25A_V27F, R25A_V27F_V28A, R25A_V27F_V28I, R25A_V27L, R25A_V27L_V28A, R25A_V27L_V28I, R25A_V28A, R25A_V28I, R25G, R25G_V27F, R25G_V27F_V28A, R25G_V27F_V28I, R25G_V27L, R25G_V27L_V28A, R25G_V27L_V28I, R25G_V28A, R25G_V28I, V27F, V27F_V28A, V27F_V28I, V27L, V27L_V28A, V27L_V28I, V28A, and V28I (Figure 6).
Figure 6. The prediction of the mutation sites in Vg. Panels (A), (B), and (C) display mutation sites 25, 27, and 28 of Vg, respectively. These sites are shown as sticks and the rest of the peptide is shown in cartoon form.
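The count of 26 follows from simple combinatorics. Each of the three sites keeps its wild-type residue or takes one of two substitutions, giving 3 × 3 × 3 = 27 combinations. Excluding the all-wild-type combination, which is Vg itself, leaves 26. The sketch below enumerates them with `itertools.product`. The substitution options per site are taken from the list above, and the helper name is hypothetical.

```python
from itertools import product

# Wild-type residue at each mutated position, plus the substitutions used above.
SITES = {25: ("R", ["A", "G"]), 27: ("V", ["F", "L"]), 28: ("V", ["A", "I"])}


def enumerate_variants(sites: dict) -> list[str]:
    """Name every combination of substitutions, skipping the unmutated wild type."""
    positions = sorted(sites)
    # None means "keep the wild-type residue at this position".
    choices = [[None] + sites[p][1] for p in positions]
    names = []
    for combo in product(*choices):
        parts = [f"{sites[p][0]}{p}{aa}" for p, aa in zip(positions, combo) if aa is not None]
        if parts:  # the all-None combination is Vg itself
            names.append("_".join(parts))
    return sorted(names)


variants = enumerate_variants(SITES)
print(len(variants), variants[:3])
```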
Test
We next used ChemScore to evaluate binding potential. ChemScore is an empirical scoring function fitted to experimental data that helps distinguish active from inactive compounds. We used it to eliminate mutation schemes that arose only from mechanically combining substitutions and are likely to be inactive (Figure 7). We then used the Hex scoring function as a first-pass screen, similar to the earliest stage of virtual drug screening. This step saves time and computing power, but Hex cannot accurately predict every type of protein-ligand interaction. Finally, we performed molecular docking with Ligand Docking and collected the exported Glide Score data. We combined the three sets of scores and plotted them as a three-dimensional graph in Origin.
Figure 7. Binding free energy of Vg and its variants to the VDAC-1 receptor, calculated with three algorithms. The x-axis shows ChemScore results from Maestro, the y-axis shows the Hex scoring function from Discovery Studio, and the z-axis shows Glide Score results from Maestro. The binding site of VDAC-1 for Vg was determined from the binding positions found by AutoDock blind docking. In the structure panel, VDAC-1 is shown as sticks, the binding site as a white surface, and the binding pocket as a purple cube. The red sphere indicates Vg and the blue spheres represent the other 32 evolutionary strategies. ChemScore values lie in the range 0-1, whereas the Hex scoring function and Glide Score are binding free energies. For easier visualization, we took the negative of all binding free energy values so that all points fall in the first quadrant.
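The sign flip is easy to reproduce, and it also makes it natural to ask which variants are not beaten on every axis at once. The sketch below negates the two binding free energies so that higher means better binding on all three axes. It then extracts the Pareto front of non-dominated variants. Two points are assumptions: that a higher ChemScore value indicates a better candidate, as the plot layout implies, and that a Pareto filter is a useful way to read the plot. The Pareto filter is our illustration, not necessarily the procedure used here. The numbers and names are hypothetical.

```python
def to_plot_coords(chemscore: float, hex_energy: float, glide_score: float) -> tuple[float, float, float]:
    """Keep ChemScore as-is; negate the two binding free energies so that
    larger values mean stronger binding on every axis."""
    return (chemscore, -hex_energy, -glide_score)


def pareto_front(points: dict[str, tuple[float, float, float]]) -> list[str]:
    """Names of variants that no other variant beats or ties on all three axes."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


# Hypothetical scores in arbitrary units.
points = {
    "Vg": to_plot_coords(0.8, -300.0, -7.0),
    "X": to_plot_coords(0.6, -250.0, -6.0),
    "Y": to_plot_coords(0.9, -200.0, -5.0),
}
print(pareto_front(points))  # X is beaten by Vg on all three axes
```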
Learn
These three indicators gave a preliminary estimate of Vg activity, together with binding free energies from two preliminary scoring functions. The results suggest that Vg binds VDAC-1 (2JK4) considerably better than the other evolutionary strategies do. However, these algorithms simplify or ignore some contributions to save computing power. We therefore used more complex algorithms for more precise predictions in the next step.
Cycle 2: pPIC9K-His-DDDDK-Vg
In the first cycle, preliminary screening showed that some mutation schemes performed very well for particular binding positions and scoring functions.
Design
Throughout the project, our goal was to produce Vg cost-effectively and at high yield. To select a suitable expression host, we gathered information on various organisms (including Escherichia coli, Saccharomyces cerevisiae, and mammalian cells) and ultimately chose Pichia pastoris. Pichia pastoris offers significant advantages over other expression systems in protein processing, secretion, post-translational modification, and glycosylation[6]. It is one of the most widely used protein expression systems and a well-established host for producing recombinant proteins. One of its advantages is that proteins fold properly within the endoplasmic reticulum[7]. Initially, we intended to use the most common heterologous expression system, Escherichia coli. Although Vg has three disulfide bonds, E. coli can form some disulfide bonds. However, we learned that previous attempts to express Vg in E. coli had yielded a product without biological activity. We therefore turned to Pichia pastoris.
We then optimized the Vg sequence to enhance heterologous expression in Pichia pastoris. When designing the circuit, we wanted a simple way to purify Vg, so we first conducted an extensive literature review. We added a 6x-His tag for subsequent protein separation and purification. This tag allows affinity purification on a Ni column and does not significantly interfere with the folding of small peptides. Many projects using the Pichia pastoris expression system have purified their target protein through an added His-tag, which shows that this approach is feasible and gives an easily purified product[8].

Next, we needed an endopeptidase to release the target protein from the other fusion components. Commonly used protease cleavage sites include those of SUMO protease (Ulp1), HRV 3C protease, enterokinase, TEV protease, caspase-3-like proteases, and trypsin. Vg is ultimately intended as a hypoglycemic drug, so cleavage must leave no extra residues on the end product, preserving the activity and structural integrity of the protein. We also needed to avoid non-specific cleavage to keep Vg functional. For these reasons we chose enterokinase (EK) and inserted its recognition sequence, DDDDK, between Vg and the other components. The recombinant plasmid was verified by DNA sequencing (Figure 8).
Figure 8. Visualization of pPIC9K-His-DDDDK-Vg. Expression of the P37 gene in the recombinant plasmid is tightly controlled by the methanol-inducible AOX promoter. The α-factor secretion signal peptide directs secretion of Vg into the culture medium. Enterokinase (EK) recognizes the DDDDK sequence efficiently and specifically, separating P37 from the other, non-target parts of the fusion protein to yield high-purity P37[9].
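The "no residue left behind" requirement follows from where enterokinase cuts: immediately after the Lys of (Asp)4-Lys. Everything downstream of DDDDK is therefore released without extra residues. The sketch below simulates this on a string. The toy sequence and the helper name are hypothetical and do not represent the real His-DDDDK-Vg construct.

```python
EK_SITE = "DDDDK"


def enterokinase_cleave(protein: str) -> tuple[str, str]:
    """Split a fusion protein at its first DDDDK site.

    Enterokinase cuts after the Lys, so the upstream part keeps the whole
    DDDDK tag and the downstream part starts with its own first residue.
    """
    idx = protein.find(EK_SITE)
    if idx == -1:
        raise ValueError("no DDDDK site found")
    cut = idx + len(EK_SITE)
    return protein[:cut], protein[cut:]


# Hypothetical toy fusion: His6 tag, EK site, then a placeholder target.
tag_part, target = enterokinase_cleave("HHHHHHDDDDKACEFG")
print(tag_part, target)
```

A protease whose recognition site overhangs the cut would instead leave residues on the target. This is why the position of the cut relative to the recognition sequence mattered in choosing the protease.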
Build
We sent the plasmid map to Beijing Tsingke Biotech Co., Ltd. for synthesis.
Test
After inducing expression of the target protein with methanol, we analyzed the product by Coomassie Brilliant Blue staining and western blotting (WB). A His-Vg band was detected, but its molecular weight did not match the theoretical value of 16.9 kDa. Coomassie staining showed no clear difference at around 25 kDa, whereas western blotting detected a positive signal at about 66 kDa (Figure 9). Having obtained the recombinant protein, we performed nickel column affinity chromatography on the culture supernatant to test whether the His-tag could bind the column. The recombinant protein did not bind to the nickel column.
Figure 9. Detection of Vg expression by Coomassie Brilliant Blue staining and western blotting. Differences were observed between GS115 and the culture supernatants of strains 2, 3, 5, 6, and 7, but the band size did not match the theoretical value (16.9 kDa). G: negative control, a Pichia pastoris strain not transformed with the P37 plasmid.
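A theoretical value such as 16.9 kDa is computed from the amino acid sequence alone. The calculation sums standard average residue masses, adds one water for the free termini, and subtracts about 2 Da per disulfide bond. The sketch below shows this calculation. It cannot account for glycosylation, which is why a glycosylated secreted protein can run far above its theoretical size. The function name and example sequences are hypothetical.

```python
# Standard average residue masses in Da (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini
H2 = 2.016       # two hydrogens lost per disulfide bond


def theoretical_mw_kda(seq: str, disulfides: int = 0) -> float:
    """Average molecular weight in kDa; raises KeyError on non-standard letters."""
    mass = sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER
    mass -= disulfides * H2
    return mass / 1000.0


print(round(theoretical_mw_kda("GG"), 4))  # a glycine dipeptide
```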
Learn
Although we detected expression of the target protein, we could not enrich it with Ni-NTA. We hypothesized that Pichia pastoris glycosylates the target protein, a post-translational modification that can significantly increase its molecular weight. Glycosylation is especially common for secreted proteins and can make the apparent molecular weight larger than the theoretical value[10]. Our analysis also suggested that the His-tag was likely masked by the N-terminal part of the hypoglycemic peptide, which would prevent it from interacting with the nickel ions on the column.
Cycle 3: pPIC9K-3Vg-His
Design
To prevent the His-tag from being masked, we moved it to the C-terminus and repeated the experiment. The literature showed that expression of Vg could be increased by raising the copy number of the target gene[11], so we changed the original single copy of Vg to three copies. We also learned of an alternative to the enterokinase site DDDDK: an acid-cleavable Asp-Pro linker, which is more economical and efficient to use[12,13]. The resulting plasmid design is shown in Figure 10.
Figure 10. Visualization of the designed plasmid pPIC9K-3Vg-His.
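An Asp-Pro linker is cut chemically under acidic conditions at the peptide bond between Asp and Pro. Each released copy therefore keeps an extra C-terminal Asp and/or an extra N-terminal Pro from the linker. Any DP pair inside the target sequence would also be cut. The sketch below simulates splitting a tandem construct at every DP bond. The toy sequence and function name are hypothetical and do not represent the real 3Vg design.

```python
def acid_cleave_asp_pro(protein: str) -> list[str]:
    """Split a sequence at every Asp-Pro (DP) peptide bond.

    The Asp stays at the C-terminus of the upstream fragment and the Pro
    becomes the N-terminus of the downstream fragment.
    """
    fragments = []
    start = 0
    for i in range(len(protein) - 1):
        if protein[i] == "D" and protein[i + 1] == "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    return fragments


# Hypothetical three-unit tandem joined by DP linkers.
print(acid_cleave_asp_pro("AAADPBBBDPCCC"))
```

Before choosing this linker for a drug peptide, it is worth checking the target sequence for internal DP pairs and confirming that the leftover Asp or Pro does not affect activity.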
Build
The new plasmid was successfully constructed.
Test
After methanol induction, we analyzed the product by Coomassie Brilliant Blue staining and western blotting. SDS-PAGE with Coomassie staining showed no obvious target band.
Learn
We suspected that the expressed target protein was still too small. Such small polypeptides diffuse easily during protein separation, which makes them hard to detect by western blotting.
Cycle 4: pPIC9K-NK-3Vg-His
Design
To make the recombinant protein easier to detect, we fused the nattokinase (NK) gene upstream of Vg to increase the molecular weight of the product. NK is an alkaline serine endopeptidase secreted by Bacillus subtilis (natto), with a molecular weight of 27.7 kDa. It offers a long half-life, high specificity, few side effects, and suitability for direct oral administration[14]. Adding the NK gene to the original plasmid compensates for the low molecular weight of our target protein. In addition, Vg and NK could be collected separately as expression products during downstream purification. The design of the plasmid elements is shown in Figure 11.
Build
We sent the plasmid map to Nanjing Genco Biotechnology Co., Ltd. for synthesis.
Figure 11. Visualization of the open reading frame of pPIC9K-NK-3Vg-His.
Test
After methanol induction, we analyzed the product by Coomassie Brilliant Blue staining and western blotting. Western blotting detected clear target bands at about 55 kDa, indicating that the recombinant protein was successfully expressed (Figure 12).
Figure 12. Detection of P37 expression by western blotting. After SDS-PAGE and incubation with an anti-His antibody, a weak band at about 50 kDa distinguished the supernatants of strains 1, 3, 4, 5, and 7 from GS115. His: negative control. 1-6: recombinant P37 expressed by Pichia pastoris strains. G: negative control, a Pichia pastoris strain not transformed with the P37 plasmid.
Learn
We suspect that the weak signal was caused by loss of the target plasmid during continuous passage of the strain, which would have altered our experimental results.
Cycle 5: Mutation model
Design
To get the most out of the mutation scheme, we compared three more accurate calculation schemes. The first used the Vina force field with the AutoDock docking algorithm. The other two were MM-GBSA calculations with the OPLS-AA and AMBER force fields, using the TIP4P water model and the GBSA solvent model.
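For reference, the MM-GBSA schemes estimate binding free energy from the standard thermodynamic decomposition shown below. Each species X (complex, receptor, or ligand) has a free energy built from a molecular mechanics energy, a polar solvation term from the generalized Born (GB) model, and a nonpolar surface-area (SA) term. Many practical protocols omit the entropy term $-TS$. Whether the calculations here included it is not stated, so treat that term as optional.

```latex
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{receptor}} + G_{\mathrm{ligand}} \right),
\qquad
G_{X} = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}} - T S
```

The force field (OPLS-AA or AMBER) enters through $E_{\mathrm{MM}}$. This is why the two MM-GBSA axes can rank the same variants differently.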
Build
We again evaluated the 26 mutation products from the first cycle together with Vg, giving 27 candidates in total. We plotted the results as a three-dimensional map in Origin.
Test
After calculating all 27 candidates, we found that the top four were R25A_V28A, Vg, R25G_V27F_V28A, and R25G_V28A (Figure 13).
Figure 13. Evaluation of the binding free energy between Vg variants and VDAC-1. The x-axis shows results from the Vina force field combined with the AutoDock algorithm. The y-axis shows results from the MM-GBSA scheme with the OPLS-AA force field, and the z-axis shows results from the MM-GBSA scheme with the AMBER force field. The red sphere indicates Vg. The orange spheres represent the three mutation strategies, excluding Vg, that bind VDAC-1 most effectively, and the blue spheres represent the other 29 mutation strategies.
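The text does not state how the three methods were combined into a single top-four list. One simple, hypothetical way is a consensus ranking. Each variant is ranked separately under each method (lower binding free energy = better), and the variants are then sorted by mean rank. The sketch below implements that idea. Ties receive arbitrary ordinal ranks, and all names and numbers are illustrative only.

```python
def consensus_ranking(scores: dict[str, tuple[float, ...]]) -> list[tuple[str, float]]:
    """Rank variants under each method (lower energy = better), then sort by mean rank."""
    names = list(scores)
    n_methods = len(next(iter(scores.values())))
    mean_rank = {name: 0.0 for name in names}
    for m in range(n_methods):
        ordered = sorted(names, key=lambda nm: scores[nm][m])
        for rank, nm in enumerate(ordered, start=1):
            mean_rank[nm] += rank / n_methods
    # Sort by mean rank, breaking ties by name for a stable result.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical (Vina, MM-GBSA OPLS-AA, MM-GBSA AMBER) energies.
scores = {
    "Vg": (-8.0, -45.0, -40.0),
    "A": (-7.5, -50.0, -42.0),
    "B": (-6.0, -30.0, -25.0),
}
for name, r in consensus_ranking(scores):
    print(name, round(r, 2))
```

Mean rank avoids mixing the different units and scales of docking scores and MM-GBSA energies. This is also its main limitation: it ignores how large the gaps between candidates are.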
Learn
This round of calculations showed that mutating position 28 to alanine looks promising: V28A appears in three of the four top-ranked candidates. In follow-up work, we plan to study position 28 further, which may lead to a better solution. However, unmodified Vg also ranked among the top candidates in these virtual predictions. We will therefore use wild-type Vg without mutation while we develop the next wet-lab synthesis method.
Conclusions & Future Prospects
We have established an expression system for the heterologous protein Vg in Pichia pastoris GS115. Next, we will run small-scale fermentation experiments in the laboratory, continuously optimizing the medium formulation and cultivation protocol to improve the economics of the culture. We will also improve strain stability to prepare for future scale-up. Later, we will perform activity assays in mice, comparing the recombinant protein with naturally extracted protein to assess the biological activity of Vg.
In future experiments, we plan to add molecular chaperones and enhancers to the plasmid to further increase the yield and biological activity of Vg. Molecular chaperones help maintain the disulfide bonds of Vg and thus its activity. We intend to include hsp70, because the literature reports successful use of hsp70 in Pichia pastoris to maintain the disulfide bonds of the product[15]. That situation closely resembles ours, which supports the feasibility of this design. The enhancer we plan to add is the Exin21/Qα booster. It has been shown to significantly increase protein expression and promote secretion of recombinant proteins, which would make our product easier to collect and purify[16].
References
[1] Ismail, H.D., Jones, A., Kim, J.H., Newman, R.H., Kc, D.B. RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest. BioMed Research International. 2016;2016:3281590. doi: 10.1155/2016/3281590. Epub 2016 Mar 15.
[2] Yang Z. “Statistical properties of the branch-site test of positive selection.” Molecular Biology and Evolution 28, no. 3 (2011): 1217-28. doi: 10.1093/molbev/msq303. Epub 2010 Nov 18. PMID: 21087944; PMCID: PMC3030880
[3] Anderson, G., Hare, K.J., Jenkinson, E.J. “Positive selection of thymocytes: the long and winding road.” Immunology Today 20, no. 10 (1999): 463-468. PMID: 10500294.
[4] Ramasamy, S., et al. “paPAML: An Improved Computational Tool to Explore Selection Pressure on Protein-Coding Sequences.” Genes 13, no. 6 (2022): 1090. doi: 10.3390/genes13061090. Epub 2022 Jun 18. PMID: 35741852.
[5] Álvarez-Carretero, S., Kapli, P., & Yang, Z. “Beginner’s Guide on the Use of PAML to Detect Positive Selection.” Molecular Biology and Evolution 40, no. 4 (2023): msad041. doi: 10.1093/molbev/msad041. Epub 2023 Apr 4.
[6] Wen-Jing Zhou, Jiang-Ke Yang, Lin Mao, Li-Hong Miao. Codon optimization, promoter and expression system selection that achieved high-level production of Yarrowia lipolytica lipase in Pichia pastoris. Enzyme and Microbial Technology. Volume 71, April 2015, Pages 66-72
[7] Chronopoulou S, Tsochantaridis I, Tokamani M, Kokkinopliti KD, Tsomakidis P, Giannakakis A, Galanis A, Pappa A, Sandaltzopoulos R. Expression and purification of human interferon alpha 2a (IFNα2a) in the methylotrophic yeast Pichia pastoris. Protein Expr Purif. 2023 Nov;211:106339. doi: 10.1016/j.pep.2023.106339. Epub 2023 Jul 17. PMID: 37467825.
[8] Chronopoulou S, Tsochantaridis I, Tokamani M, Kokkinopliti KD, Tsomakidis P, Giannakakis A, Galanis A, Pappa A, Sandaltzopoulos R. Expression and purification of human interferon alpha 2a (IFNα2a) in the methylotrophic yeast Pichia pastoris. Protein Expr Purif. 2023 Nov;211:106339. doi: 10.1016/j.pep.2023.106339. Epub 2023 Jul 17. PMID: 37467825.
[9] Prachayasittikul V, Isarankura Na Ayudhya C, Piacham T, Kiatfuengfoo R. One-step purification of chimeric green fluorescent protein providing metal-binding avidity and protease recognition sequence. Asian Pac J Allergy Immunol. 2003 Dec;21(4):259-67. PMID: 15198344.
[10] ZHANG Xinran, LING Yan, YANG Ying. Molecular level strategy for high expression of foreign protein in Pichia pastoris [J]. Food and Fermentation Industries, 2022, 48(17): 321-328.
[11] Xinran Zhang, Yan Ling, Yin Yang. Molecular level strategy for high expression of foreign protein in Pichia pastoris [J]. Food and Fermentation Industries ,2022,48(17):321-328. DOI:10.13995/j.cnki.11-1802/ts.029940.
[12] Mollaev M, Zabolotskii A, Gorokhovets N, Nikolskaya E, Sokol M, Tsedilin A, Mollaeva M, Chirkina M, Kuvaev T, Pshenichnikova A, Yabbarov N. Expression of acid cleavable Asp-Pro linked multimeric AFP peptide in E. coli. J Genet Eng Biotechnol. 2021 Oct 14;19(1):155. doi: 10.1186/s43141-021-00265-5. PMID: 34648110; PMCID: PMC8517049.
[13] Thambi T, Jung JM, Lee DS. Recent strategies to develop pH-sensitive injectable hydrogels. Biomater Sci. 2023 Mar 14;11(6):1948-1961.
[14] Chen H, McGowan EM, Ren N, Lal S, Nassif N, Shad-Kaneez F, Qu X, Lin Y. Nattokinase: A Promising Alternative in Prevention and Treatment of Cardiovascular Diseases. Biomark Insights. 2018 Jul 5;13:1177271918785130. doi: 10.1177/1177271918785130. PMID: 30013308; PMCID: PMC6043915.
[15] Cai Yian, Zhang Yiqun, Yang Zixuan, Liu Yexue, Liu Wenlong, Lu Fuping, Li Yu. Enhanced Expression of Protease K in Pichia pastoris through Molecular Chaperones and Analysis of Its Effect on Wool Scale Layer. Biotechnology Bulletin. 2024, 40(7):307-313.
[16] Yuanjun Zhu, A. Sami Saribas, Jinbiao Liu, Yuan Lin, Brittany Bodnar, Ruotong Zhao, Qian Guo, Julia Ting, Zhengyu Wei, Aidan Ellis, Fang Li, Xu Wang, Xiaofeng Yang, Hong Wang, Wen-Zhe Ho, Ling Yang, Wenhui Hu. Protein expression/secretion boost by a novel unique 21-mer cis-regulatory motif (Exin21) via mRNA stabilization. Molecular Therapy. 2023, Volume 31, Issue 4, p1136-1158