This section outlines the engineering process for producing a multipeptide protein made up of bioactive peptides sourced from various foods, targeting therapeutic effects for diabetes and its related conditions. The initial phase focused on peptide selection and on the design work needed to obtain a stable protein. We also investigated how pancreatic proteases interact with the protein and release these peptides during digestion. The final phase aimed to produce the multipeptide protein in both E. coli and Bacillus subtilis natto, using strategies for efficient expression, purification, and activity validation. Ultimately, this work seeks to develop a dietary supplement encapsulating the multipeptide protein to improve metabolic health.
We studied various short amino acid fragments known as functional peptides. These peptides are derived from the foods we consume, and their effects on various diseases have been extensively researched. We carried out a comprehensive review to identify peptides with proven effects that also met the following criteria: active properties related to the metabolic processes of diabetes and its comorbidities; in vivo studies demonstrating their effect; reported IC50 values; and effective inhibitory activity (antihypertensive, antithrombotic, antidiabetic, or hypocholesterolemic). The identified peptides ranged from 3 to 23 amino acids in length.
The IC50 value (half maximal inhibitory concentration) (Table 1) is a measure that indicates the concentration of a compound required to inhibit a biological function or response by 50% compared to a control without inhibition [21].
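As a minimal sketch of how an IC50 can be read from dose-response data, the Python snippet below interpolates linearly in log10(concentration) between the two measurements that bracket 50% inhibition. The data points and the function name are hypothetical, and published IC50 values are usually obtained by fitting a four-parameter logistic curve rather than by simple interpolation.

```python
import math


def estimate_ic50(concs_um, inhibition_pct):
    """Estimate the IC50 by log-linear interpolation.

    Assumes concentrations are sorted in ascending order and that
    percent inhibition (vs. an uninhibited control) rises with dose.
    """
    points = list(zip(concs_um, inhibition_pct))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50 <= i_hi:
            if i_hi == i_lo:  # flat segment exactly at 50%
                return c_lo
            frac = (50 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")


# Hypothetical dose-response data, for illustration only
concs = [1, 10, 100, 1000]     # µM
inhibition = [10, 30, 70, 90]  # % inhibition
print(f"IC50 ≈ {estimate_ic50(concs, inhibition):.1f} µM")
```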
Based on the peptides investigated, we concluded that the length of a peptide's amino acid chain is not directly related to its IC50. For example, the short peptide VPP from bovine β-casein has an IC50 of 9 μM, whereas LPYP, a peptide of similar length, has an IC50 of 480 μM. Peptides from cumin seed protein hydrolysates reach their IC50 at extremely low concentrations, such as DPAQPNYPWTAVLVFRH with an IC50 of 0.15 μM; however, we found no conclusive evidence of inhibitory effects that would qualify them as anti-diabetic. The foods in which the most peptides with the properties of interest were identified include milk, goat milk, Calpis, quinoa, egg, and soybean.
The molecular mechanisms underlying the selected properties are shown below (Table 2):
Our proposal aimed to achieve the simultaneous expression of multiple peptides by combining their sequences within a single gene. This approach would allow for the transcription and translation of a multipeptide protein, enabling the release of the individual peptides through the action of proteases.
We proposed several sequences, each combining five bioactive peptides chosen for their similar IC50 values to simplify future dose determination. Before the peptide sequences were joined into a multipeptide protein, a 10-amino-acid sequence was added to each side of every peptide to improve stability and prevent degradation. These flanking sequences were designed to avoid overlap with protease recognition sites.
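The assembly rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 10-residue flank `EPTQPEPTQP` is hypothetical (not one of the project's sequences), protease specificities are simplified to single residues, and VPP and LPYP are used only because they are mentioned in this section.

```python
# Simplified protease specificities (assumption): residues after which
# trypsin (K, R), chymotrypsin (F, W, Y) and pancreatic elastase (A, G, S, V) cut
CLEAVAGE_RESIDUES = set("KR") | set("FWY") | set("AGSV")

HYPOTHETICAL_FLANK = "EPTQPEPTQP"  # illustrative 10-residue flank


def build_multipeptide(peptides, flank=HYPOTHETICAL_FLANK):
    """Join peptides into one chain, adding the flank on both sides of each peptide."""
    if len(flank) != 10:
        raise ValueError("flank must be 10 residues long")
    bad = sorted(set(flank) & CLEAVAGE_RESIDUES)
    if bad:
        raise ValueError(f"flank contains protease-recognized residues: {bad}")
    return "".join(flank + peptide + flank for peptide in peptides)


construct = build_multipeptide(["VPP", "LPYP"])
print(construct, len(construct))
```

In this sketch neighbouring peptides end up separated by two flanks (20 residues); it is a simplification and does not claim that the real designs used this exact layout.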
As a next step, after generating the different multipeptide proposals, we modeled their structures and analyzed several aspects of their stability. AlphaFold was used to predict the 3D structures of the designed multipeptide proteins. This AI-based tool uses a deep neural network to analyze the amino acid sequence and predict how it folds into a stable 3D structure, based on patterns learned from experimentally determined structures in the Protein Data Bank [25].
The predicted structures were then refined with ModRefiner, which optimizes protein models at the atomic level to improve their structural accuracy. More reliable models support subsequent studies, including functional analyses and interactions with other biomolecules. ModRefiner optimizes hydrogen bonding, backbone topology, and side-chain positioning, all of which are vital for protein stability and function. The models were uploaded in PDB format, and the software adjusted the main-chain and side-chain conformations to improve the resolution and reliability of the 3D structures [26].
What do their structures mean?
Sequence A (Sequence 1): This sequence shows a relatively compact, well-folded structure with some beta-sheets. The well-defined fold indicates that this protein may have good structural stability and is likely to be functionally efficient.
Sequence B (Sequence 2): This sequence appears less compact, with a looser structure. Its random coil elements suggest potential instability, which could affect its function or folding efficiency.
Sequence C (Sequence 3): This sequence has some structured regions, such as several beta-sheets and an alpha-helix, but also extended unstructured regions; this mixture suggests moderate stability.
Sequence D (Sequence 4): This sequence appears more tangled, with fewer discernible structural elements. The high degree of randomness and overlap suggests poor folding, which could lead to instability or functional inefficiency.
Sequence E (Sequence 5): The dominance of coils, with only a few beta-sheets, points to a lack of organized secondary structure. This could mean lower stability and potentially lower functional efficiency than proteins with more defined structural elements, such as Sequences 1 and 3.
After predicting the structures of our multipeptide proteins, we looked for a compact, well-folded structure, as this is crucial for stability and function. Because these are synthetic proteins, a compact, stable fold is essential for successful synthesis and performance. As shown in Figures 3B, 3D, and 3E, three proposals (Sequences 2, 4, and 5) exhibit several random coils, which could lead to instability. In contrast, Sequence 1 (Figure 3A) and Sequence 3 (Figure 3C) display more compact structures with proper folding, indicative of greater structural integrity and functionality [27].
The structural analysis based on visual inspection of folding patterns was complemented by validation with SAVES v6.0 and Ramachandran plots. A Ramachandran plot assesses the backbone torsion angles (φ, ψ) of each residue and shows whether these angles fall within the allowed regions that favor protein folding. This matters because even visually compact proteins can have torsion angles in disallowed regions, which would result in poor folding and instability when expressed in a biological system. A good-quality model is expected to have over 90% of its residues in the most favored regions [28].
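To show what the plotted angles are, the Python sketch below computes a signed backbone dihedral from four atom coordinates using the common atan2 formulation: φ uses C(i-1), N(i), CA(i), C(i) and ψ uses N(i), CA(i), C(i), N(i+1). The coordinates in the example are toy points, not atoms from our models; tools such as SAVES perform this calculation, and the region assignment, internally.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (between -180 and 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: fourth point placed at 0°, ±90° and 180° about the central bond
for p3 in [(1, 1, 0), (0, 1, 1), (-1, 1, 0)]:
    print(p3, round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), p3), 1))
```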
As shown in the analysis, Sequences 4 and 5 (Figures 4D and 4E), which appeared disorganized and random on visual inspection, were confirmed to have the least stable structures. Sequence 4 had 52.7% of its residues in the most favored regions and Sequence 5 had 52.9%, while 6.2% and 1.9% of their residues, respectively, fell in disallowed regions. This is evident from the several residues (red and black points in the upper right corner of Figures 4D and 4E) that fall within the disallowed and non-favored regions of the Ramachandran plot, indicating torsion angles that are likely unfavorable for proper folding. In contrast, Sequence 2, despite its loose appearance, had 85.7% of its residues in the most favored regions, suggesting better stereochemical feasibility than initially expected.
Sequences 1 and 3 showed the best results, with 94.2% and 91.2% of their residues, respectively, in the most favored regions and no residues in disallowed regions. This strongly suggests that these proteins are more likely to fold correctly and remain functionally stable in biological systems, making them promising candidates for synthesis.
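The acceptance criterion applied above can be written out as a short Python check over the percentages reported for the five proposals. Only the >90% most-favored threshold is encoded; the dictionary layout and function name are illustrative, not part of the SAVES output.

```python
# Percentage of residues in the most favored Ramachandran regions (values reported above)
MOST_FAVORED = {1: 94.2, 2: 85.7, 3: 91.2, 4: 52.7, 5: 52.9}
# Percentage in disallowed regions (not reported for Sequence 2)
DISALLOWED = {1: 0.0, 3: 0.0, 4: 6.2, 5: 1.9}

THRESHOLD = 90.0  # a good-quality model is expected to exceed this


def good_quality(seq_id):
    """True if the model meets the >90% most-favored criterion."""
    return MOST_FAVORED[seq_id] > THRESHOLD


ranking = sorted(MOST_FAVORED, key=MOST_FAVORED.get, reverse=True)
for s in ranking:
    disallowed = DISALLOWED.get(s)
    disallowed_text = "not reported" if disallowed is None else f"{disallowed}%"
    status = "meets" if good_quality(s) else "below"
    print(f"Sequence {s}: {MOST_FAVORED[s]}% favored, {disallowed_text} disallowed, {status} threshold")
```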
Shorter amino acid sequences tend to adopt looser structures, with fewer folds and reduced stability. Adding 10 amino acids to each side of every peptide when assembling the multipeptide was beneficial because it avoided overlap with protease recognition sites, helping the peptides remain intact during downstream applications. When creating peptide combinations, it was essential that their IC50 values fall within the same range, so that a future dosing strategy can ensure the bioavailability of all the peptides in the multipeptide protein. However, before applying these criteria, it is crucial to predict the 3D model, as it determines how feasible the structure is and how stable it may be. The structure can then be optimized through refinement, and Ramachandran analysis helps confirm that the residues' backbone torsion angles fall in favorable regions.
Sequences 4 and 5 were discarded, pending potential redesign. Besides having an unstable structure, Sequence 4 was composed of peptides whose characterization and bioactive properties are poorly documented in the literature. Sequence 5 was rejected because its peptide composition was very similar to that of Sequence 1 and its structure was also unstable. Sequences 1, 2, and 3 were selected for simulations evaluating their behavior when digested by proteases of the digestive tract.
To simulate digestion of the multipeptide in silico, chymotrypsin, trypsin, and pancreatic elastase were selected because of their role in protein breakdown in the human digestive system. These enzymes are components of pancreatic juice and cleave peptide bonds at specific amino acids. Trypsin cleaves on the carboxyl side of basic amino acids such as lysine and arginine, while chymotrypsin cleaves on the carboxyl side of aromatic amino acids such as phenylalanine, tryptophan, and tyrosine. Pancreatic elastase acts on bonds next to small neutral amino acids, which is crucial for digesting proteins with diverse sequences [29]. The pH of pancreatic juice was also taken into account: it is an alkaline fluid whose pH normally ranges from 8.3 to 8.6 and can reach 9.0 because of its bicarbonate content [30].
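A rule-based digestion of this kind can be sketched in Python. The cleavage rules are deliberately simplified assumptions: trypsin after K/R, chymotrypsin after F/W/Y, elastase after A/G/S/V, and no cut before proline for any enzyme. Real specificities, including those used by dedicated prediction tools, are broader and context-dependent, and the test sequence is hypothetical.

```python
# Simplified cleavage rules (assumption): residues after which each enzyme cuts
RULES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FWY"),
    "elastase": set("AGSV"),
}


def cleavage_sites(seq, enzymes):
    """0-based positions after which a bond is cut (no cut before proline)."""
    targets = set().union(*(RULES[e] for e in enzymes))
    return [i for i in range(len(seq) - 1)
            if seq[i] in targets and seq[i + 1] != "P"]


def digest(seq, enzymes=("trypsin", "chymotrypsin", "elastase")):
    """Split a sequence into fragments at every predicted cleavage site."""
    fragments, start = [], 0
    for i in cleavage_sites(seq, enzymes):
        fragments.append(seq[start:i + 1])
        start = i + 1
    fragments.append(seq[start:])
    return fragments


print(digest("VPPKLPYPEF"))
```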
The objective of this analysis was to observe how specific bioactive peptides of interest are released during digestion, so that they can exert their properties. Because these three enzymes cover a broad range of cleavage specificities, they allow us to evaluate how efficiently the peptides are released. We also sought to determine whether the protein remains stable at intestinal pH, so that cleavage occurs without compromising the integrity of the released peptides [29].
To ensure that the peptides retain their sequences after protease digestion, the cleavage sites of trypsin, chymotrypsin C, and pancreatic elastase II were predicted with the EHP tool of DFBP (Database of Food-derived Bioactive Peptides) [30]. An advantage of this tool is that, in addition to predicting protein digestion, it analyzes the resulting peptides and compares them against its database to identify those with potential functional activity. These predictions helped us adjust the sequences to minimize undesired cleavage, with the aim of keeping the peptides functional in vivo.
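Checking whether a peptide keeps its sequence after digestion amounts to asking whether any cleavage site falls strictly inside it. The minimal Python sketch below uses simplified, assumed cleavage rules as a stand-in for the EHP tool's predictions (it is not that tool's algorithm) and applies them to three peptides named in this section.

```python
# Simplified cleavage rules (assumption): cut after these residues unless the next residue is P
TARGETS = set("KR") | set("FWY") | set("AGSV")  # trypsin | chymotrypsin | elastase


def internal_sites(peptide):
    """0-based positions inside the peptide after which a modeled protease would cut.

    The bond after the last residue is excluded: whether it is cut depends
    on the neighbouring residue in the multipeptide, not on the peptide itself.
    """
    return [i for i in range(len(peptide) - 1)
            if peptide[i] in TARGETS and peptide[i + 1] != "P"]


for pep in ["VPP", "LPYP", "DPAQPNYPWTAVLVFRH"]:
    sites = internal_sites(pep)
    print(pep, "intact" if not sites else f"cut after positions {sites}")
```

Under these toy rules the two proline-rich peptides survive intact while the 17-residue peptide has several internal sites, which illustrates why longer peptides are more exposed to internal cleavage.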
Some peptides may be cleaved into fragments shorter than their original sequences (between 26 and 28 fragments were predicted). Although this departs from the intended release of the complete peptides, these shorter fragments may still retain, or even enhance, their bioactive properties. Further work is needed to determine the exact benefits of these shorter fragments.
Using the Protein-sol heatmap tool [31], we generated an energy heatmap showing how stable each multipeptide is predicted to be across different pH values and ionic strengths (salt concentrations). This serves as a prediction of whether the multipeptides will remain stable in the digestive tract. The broader and more intense the green zone, the more stable the protein, and the zone indicates the limits of ionic strength and pH that the protein can tolerate. Sequence 3 was the most stable (Figure 7C), and Sequence 1 also showed a good stability range (Figure 7A), making it likely to remain stable in pancreatic juice. Sequence 2, however, was generally unstable under these conditions (Figure 7B).
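Protein-sol computes its heatmap with its own electrostatic model, which is not reproduced here. As a much simpler, related illustration of pH dependence, the Python sketch below estimates a peptide's net charge with the Henderson-Hasselbalch equation across the pancreatic juice pH range (8.3 to 9.0). The pKa values are approximate, assumed textbook-style numbers (published sets differ), and the sequence is hypothetical.

```python
# Approximate pKa values (assumed; different sources give slightly different sets)
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 9.0, 2.0


def net_charge(seq, ph):
    """Estimated net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))  # protonated N-terminus
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))  # deprotonated C-terminus
    for aa in seq:
        if aa in PKA_BASIC:
            positive += 1 / (1 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            negative += 1 / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


for ph in (8.3, 8.6, 9.0):
    print(ph, round(net_charge("VPPKLPYPEF", ph), 2))
```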
Molecular docking was performed to predict the binding interactions between the designed multipeptides and their target proteins. This step is important for assessing biological activity, as it indicates whether molecules can interact with their targets in a stable and functional manner [32]. The docking results indicate an interaction between the digestive enzymes and the protein, suggesting that each enzyme recognizes one of its cleavage sites in the protein. This supports the prediction that peptide fragments and bioactive peptides will be released during digestion, since enzyme-protein binding is the step that leads to hydrolysis of the protein and liberation of the peptides of interest.
Results
To see more results, go to Parts.
The molecular docking analysis provided valuable insights into the interactions between the proposed multipeptide proteins and the selected digestive proteases. It served as a general tool to observe how these proteins might engage with enzymes such as trypsin, chymotrypsin, and pancreatic elastase, offering a preliminary picture of their likely digestion patterns. Although the activity of digestive proteases varies with physiological conditions, these predictions indicate a high likelihood that the peptides, and active fragments derived from them, will be released.
Reverse translation refers to the process of converting an amino acid sequence back into a corresponding DNA sequence. Because each amino acid is encoded by one or more codons, the result depends on which codon is chosen for each amino acid [33].
As multiple codons can code for the same amino acid, it is necessary to choose the most appropriate codons for a specific organism for efficient expression. Optimizing a sequence for a particular microorganism involves adjusting the DNA sequence to match the preferred codon usage of the target organism. This ensures that the multipeptide proteins can be synthesized at high levels by the host bacteria without stressing the system or reducing protein yield [34].
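A minimal Python sketch of reverse translation with one preferred codon per amino acid, together with the forward translation used to check that the DNA still encodes the original sequence. The `PREFERRED` table is an illustrative assumption loosely resembling common E. coli choices, not a measured codon usage table; real optimizers weigh full codon frequencies and further constraints.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Illustrative preferred codon per amino acid (assumption, not a real usage table)
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein):
    """DNA sequence using the single preferred codon for each residue."""
    return "".join(PREFERRED[aa] for aa in protein)


def translate(dna):
    """Translate complete codons in frame 1 with the standard code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = "MVPPLPYP*"
dna = back_translate(protein)
print(dna)
assert translate(dna) == protein  # round trip recovers the amino acid sequence
```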
In this project, Bacillus subtilis natto is the target microorganism for producing the multipeptides, but all in silico simulations were carried out for both B. subtilis and E. coli. E. coli is a well-characterized bacterium frequently used in experimental setups because of its fast growth, ease of manipulation, and extensive knowledge base. Using both organisms helps validate the experimental steps of the protein production process in the main biofactory, Bacillus subtilis natto.
All in silico work was carried out in Benchling, and the designs were made compatible with type IIS restriction enzymes for Golden Gate Assembly and with BioBrick-compatible parts. Type IIS restriction enzymes cut outside their recognition sites, allowing seamless assembly of multiple DNA fragments in a single reaction, which is the basis of Golden Gate Assembly. BioBrick compatibility refers to the standardized system for assembling genetic parts (such as promoters, ribosome binding sites, and genes) using specific prefix and suffix sequences. These sequences contain restriction enzyme recognition sites (EcoRI, NotI, and XbaI in the prefix; SpeI, NotI, and PstI in the suffix) that enable modular, predictable assembly of parts into larger genetic constructs [35][36][37].
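Compatibility with both standards implies that the coding sequence itself must not contain the enzymes' recognition sites. A minimal Python scan for internal sites on both strands is sketched below; the enzyme list is an assumption (the RFC10 BioBrick enzymes plus BsaI and BsmBI, two Type IIS enzymes commonly used for Golden Gate), and the example sequence is hypothetical.

```python
SITES = {
    # BioBrick RFC10 prefix/suffix enzymes
    "EcoRI": "GAATTC", "NotI": "GCGGCCGC", "XbaI": "TCTAGA",
    "SpeI": "ACTAGT", "PstI": "CTGCAG",
    # Type IIS enzymes commonly used for Golden Gate (assumed selection)
    "BsaI": "GGTCTC", "BsmBI": "CGTCTC",
}


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq):
    """Map enzyme name -> 0-based start positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in SITES.items():
        targets = {site, revcomp(site)}
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] in targets]
        if positions:
            hits[name] = positions
    return hits


print(find_sites("ATGGAATTCAAAGAGACCTAA"))
```

An empty result means the sequence carries no internal sites for the listed enzymes, so only the intended flanking sites would be cut.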
The results after the reverse translation and the optimization are shown below:
After optimization, rare codons were minimized to help achieve high expression levels. The GC content was kept between 30% and 70% to support DNA stability and efficient replication and transcription. The uridine content was low, which is expected to improve mRNA stability and translation efficiency. Finally, none of the sequences contained hairpins, suggesting that mRNA structure is unlikely to impede protein synthesis. Overall, the optimized sequences are expected to be stable and efficiently translated in the host organism [38].
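The GC criterion can be checked with a few lines of Python, both for the whole sequence and over a sliding window, since local GC extremes can matter even when the overall value is within range. The 50 bp default window is an assumption, not a setting reported for our optimization; the 30% to 70% bounds are those stated above, and the example sequence is hypothetical.

```python
def gc_content(seq):
    """GC percentage of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_out_of_range(seq, window=50, low=30.0, high=70.0):
    """Start positions of windows whose GC% falls outside [low, high]."""
    return [i for i in range(len(seq) - window + 1)
            if not low <= gc_content(seq[i:i + window]) <= high]


example = "ATGGTGCCGCCGCTGCCGTATCCGTAA"  # hypothetical ORF
print(f"{gc_content(example):.1f}% GC", gc_out_of_range(example, window=12))
```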
We also wanted a way to demonstrate expression of our multipeptides, facilitate their purification by affinity chromatography, and then remove the reporter or tags. We therefore combined a reporter (a fluorescent protein), a histidine tag for purification, and a TEV protease recognition site for removing the tags or reporter after purification. We generated different genetic constructs for expression in both E. coli and Bacillus. The versions we designed, and the in silico cloning design for each microorganism, are described in the following sections.
In silico cloning processes in E. coli (K12)
After reverse translation and codon optimization for E. coli, the in silico cloning of the three proposals into an expression plasmid was designed. pET-24b(+) was selected as the main plasmid, with pET-29a(+) as an alternative. Both share key features: a T7 promoter for high-level, IPTG-inducible expression; a C-terminal His-tag that facilitates purification by affinity chromatography; and kanamycin resistance (Figure 11). A 6xHis-tag was identified downstream of the MCS, along with a T7 tag. Because the goal was to fuse the insert to the His-tag but not to the T7 tag, we looked for suitable sites and found that both plasmids have unique recognition sites for XhoI and NdeI. XhoI is the site closest to the 6xHis-tag, while NdeI allows the insert to be ligated into the plasmid without fusing to the T7 tag.
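Whether an insert cloned between NdeI and XhoI ends up fused to the C-terminal His-tag can be checked with simple codon arithmetic. The Python sketch below assumes, as in pET vectors designed for C-terminal His fusions, that the vector's XhoI site (CTCGAG) is already in frame with the tag, so the insert, counted from the ATG inside the NdeI site (CATATG) up to the XhoI site, must be a multiple of three with no stop codon. The example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fused_in_frame(cds):
    """cds: from the ATG of the NdeI site up to (not including) the XhoI site.

    Returns True if translation would read through into the XhoI site
    (and, by assumption, into the His-tag) without hitting a stop codon.
    """
    if not cds.upper().startswith("ATG") or len(cds) % 3:
        return False
    codons = (cds[i:i + 3].upper() for i in range(0, len(cds), 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(fused_in_frame("ATGGTGCCGCCG"))     # in frame, no stop codon
print(fused_in_frame("ATGGTGCCGCCGTAA"))  # stop codon would block the His-tag fusion
```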
Once the restriction enzymes were identified, the gene of interest and the recipient plasmids were digested in silico. Using the Assembly Wizard tool “Digest and Ligation” from Benchling, the digested gene fragments of our multipeptide sequences were joined to pET-24 (Figure 12). The cloned fragment was then translated to confirm that it matched the original amino acid sequence used for reverse translation and codon optimization for E. coli. For all three proposals, cloning into both plasmids was correct, as the translated fragments matched the original sequences.
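The digest-and-ligate step can be mimicked on plain strings, which also makes clear why unique sites matter. In this minimal Python sketch (toy sequences; not Benchling's implementation), each enzyme is described by its recognition site and top-strand cut offset (NdeI CA^TATG, XhoI C^TCGAG). Both sites are palindromic, so searching one strand is enough, and because the sticky ends match, the ligated top strand is simply the concatenation of the kept pieces. The vector is treated as a linear string whose sites do not span the origin.

```python
# Recognition site and top-strand cut offset for the two enzymes
ENZYMES = {"NdeI": ("CATATG", 2), "XhoI": ("CTCGAG", 1)}  # CA^TATG, C^TCGAG


def cut_position(seq, enzyme):
    """Top-strand cut index; the site must occur exactly once."""
    site, offset = ENZYMES[enzyme]
    if seq.count(site) != 1:
        raise ValueError(f"{enzyme} site must occur exactly once")
    return seq.index(site) + offset


def digest_and_ligate(vector, insert, left="NdeI", right="XhoI"):
    """Top-strand model: keep the vector outside the two sites, the insert between them."""
    v_left, v_right = cut_position(vector, left), cut_position(vector, right)
    i_left, i_right = cut_position(insert, left), cut_position(insert, right)
    if not (v_left < v_right and i_left < i_right):
        raise ValueError("sites must appear in the order left, right")
    return vector[:v_left] + insert[i_left:i_right] + vector[v_right:]


# Toy sequences (hypothetical); a real run would use the pET-24b(+) map
vector = "AAAA" + "CATATG" + "GGGGGG" + "CTCGAG" + "TTTT"
insert = "CC" + "CATATG" + "AAACCC" + "CTCGAG" + "GG"
print(digest_and_ligate(vector, insert))
```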
After the simple sequence was successfully cloned into the selected expression plasmid (Figure 12), several improvements were made to the constructs (Figure 11.1B). A TEV protease cleavage site was added to the original sequence, along with a His-tag. The His-tag facilitates purification of the expressed multipeptide by affinity chromatography, while the TEV site allows specific removal of the His-tag after purification, so that the final multipeptide is tag-free.
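TEV protease recognizes the sequence ENLYFQ followed by G or S and cuts between Q and G/S. The Python sketch below models that rule to show which products a purified fusion would yield; the fusion sequence and its domain order are hypothetical, and real TEV efficiency also depends on site accessibility.

```python
import re

# TEV protease consensus: ENLYFQ followed by G or S; the cut falls between Q and G/S
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(protein):
    """Return the products of cutting a fusion protein at every TEV site."""
    products, start = [], 0
    for match in TEV_SITE.finditer(protein):
        products.append(protein[start:match.end()])
        start = match.end()
    products.append(protein[start:])
    return products


# Hypothetical fusion: His-tag, TEV site, then a short peptide cargo
print(tev_cleave("MHHHHHHENLYFQGVPPLPYP"))
```

Note that the released product keeps the G (or S) of the site as an extra N-terminal residue.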
Additionally, Yukon OFP (Orange Fluorescent Protein), a synthetic fluorescent protein derived from marine invertebrates (Ceriantharia), was introduced. This fluorescent protein serves two purposes: it acts as a reporter that fluoresces orange upon IPTG induction, enabling visualization of protein expression in the bacterial strain, and it also functions as an indicator during purification, allowing monitoring of the multi-peptide as it moves through the chromatographic system (Figure 13) [39].
The TEV site, the Yukon orange fluorescent protein, and the His-tag were added to each sequence (Figure 11.1B), and the resulting constructs were optimized again. The same enzymes used for the initial cloning were used for these alternative cloning strategies.
Cloning in B. subtilis
Bacillus subtilis is a highly regarded expression system because of several key advantages. It produces proteins free of endotoxins and has an efficient secretion system that releases large quantities of protein directly into the culture medium, simplifying purification and helping preserve structural authenticity. Given these characteristics, we designed constructs to compare expression of our multipeptides with and without fusion to a well-characterized signal peptide for Bacillus proteins. As with E. coli, we also created constructs incorporating the Yukon reporter, a histidine tag for purification, and a TEV recognition site for tag removal.
The pHT01 plasmid, an E. coli-B. subtilis shuttle vector [40], was selected for cloning the multipeptide constructs into Bacillus subtilis. Four cloning alternatives were designed (Figure 15), all using the BamHI and XbaI restriction sites. A signal peptide was added to alternatives 2 and 4. A signal peptide at the start of a construct directs the synthesized protein out of the cell, so the expressed multipeptide is secreted and can be harvested and purified more easily once the bacteria are induced with IPTG.
The process of adding the necessary tags for each alternative proposal, optimizing the sequence, and cloning by digestion and ligation was repeated for each proposal using the BamHI and XbaI enzymes. Some of the final cloning results are shown below:
Once all E. coli and B. subtilis constructs were confirmed to clone correctly into the selected plasmids, their sequences were compiled in a chart so they could be easily identified for synthesis.
More details about the constructs are available in Drylab -> Parts.
Because of limited resources, we decided to synthesize constructs for only two of the three sequences for E. coli. For Sequence 1, both the simple construct and the construct with Yukon, the TEV site, and the His-tag were synthesized, as this sequence had consistently shown the highest stability since the initial in silico tests. Both constructs of Sequence 2 were also chosen for synthesis despite its unstable results in several tests. Although Sequence 3 showed a significantly better structure, its peptide composition was very similar to that of Sequence 1, so synthesizing Sequence 2 was expected to allow us to study differences in the development of two multipeptide proteins composed of a greater variety of peptides.
Transformation in E. coli
Cloning sequences for expressing the multipeptide
After the in silico analysis, the peptide sequences were initially to be cloned directly into the pET-24 vector, but this approach was unsuccessful. As an alternative, the pJET plasmid was used as an intermediate cloning vector: the sequences were successfully cloned into pJET, and the plasmid was then digested to release each sequence for ligation into pET-24, with the aim of transforming an expression strain. Transformation of E. coli NEB α with the pJET constructs by heat shock was successful (Figure 17).
Liquid cultures were prepared, and an aliquot from each culture was used for overnight digestion with NdeI and XhoI to release the constructs. Gel electrophoresis showed bands corresponding to the constructs, confirming that they had been released. However, because of time constraints, purification and ligation into pET-24 or pET-29 could not be completed. The next objective is to replicate this cloning process with a shuttle vector for Bacillus subtilis natto; transformation conditions are currently being explored.
We attribute the faintness of the construct bands to insufficient digestion time; however, the bands match the expected sizes (bp), which indicates that E. coli NEB α cells were successfully transformed with the sequences.
To see more results, go to Wetlab Results.
Throughout this process, several learnings were gained by experimenting with different strategies for cloning genes. Using an intermediary vector like pJET1.2 for cloning offers significant advantages. Directly working with a synthesized sequence is limited by the amount of gene available, while cloning into pJET allows for easier propagation of the plasmid containing the target sequence, eliminating the risk of running out. This is especially beneficial for larger-scale experiments.
Additionally, when digesting the pJET plasmid for further cloning steps, you can easily confirm that your gene of interest has been successfully cut by the restriction enzymes. After digestion, running an electrophoresis gel shows both the band for the plasmid and the gene insert, providing clear visual confirmation of the process. In contrast, if you were working directly with a synthesized gene, the small difference between a digested and undigested gene is often too subtle to detect in an electrophoresis gel.
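The band pattern expected from digesting a circular plasmid at two sites follows from simple arithmetic on the cut positions. A Python sketch with hypothetical numbers (a 3,500 bp plasmid cut at positions 100 and 700, releasing a 600 bp insert); actual positions depend on the map of the pJET construct.

```python
def circular_fragments(length, cut_positions):
    """Fragment sizes (bp) after cutting a circular plasmid at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        raise ValueError("no cuts: the plasmid stays circular")
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(length - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(sizes, reverse=True)


print(circular_fragments(3500, [100, 700]))  # backbone band and insert band
```

By contrast, cutting the sites off the ends of a linear synthetic gene removes only a few base pairs from a fragment of several hundred, a difference that is hard to resolve on an agarose gel.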
When working with expression plasmids extracted from bacteria and purified, one troubleshooting step that can improve success is dephosphorylating the digested vector. This prevents the vector from re-ligating without the insert, reducing the number of negative clones that lack the gene of interest. Standardizing protocols that have already worked is also useful for minimizing the risk of inconsistent results.
[21] Swinney, D. (2011). Chapter 18 - Molecular Mechanism of Action (MMoA) in Drug Discovery. Elsevier. https://www.sciencedirect.com/science/article/abs/pii/B9780123860095000096
[22] V, Valeriy. (2023). Modeling of hydrophobic tetrapeptides as a competitive inhibitor for HMG-CoA reductase. Elsevier. https://doi.org/10.1016/j.molstruc.2023.136248
[23] Rutherfurd KJ, Gill HS. (2000). Peptides affecting coagulation. British Journal of Nutrition, 84(S1), 99-102. doi:10.1017/S0007114500002312 https://www.cambridge.org/core/journals/british-journal-of-nutrition/article/peptides-affecting-coagulation/01C055C1F588CF09B658DFF62C19152B
[24] Chen K, Pittman RN, Popel AS. (2008). Nitric oxide in the vasculature: where does it come from and where does it go? A quantitative perspective. Antioxid Redox Signal. https://ncbi.nlm.nih.gov/pmc/articles/PMC2932548/
[25] AlphaFold. Protein Structure Database. (2024). Background. https://alphafold.ebi.ac.uk/
[26] Dong Xu and Yang Zhang. (2011). Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-step Atomic-level Energy Minimization. Biophysical Journal, vol 101, 2525-2534. https://zhanggroup.org/ModRefiner/
[27] SITN. (2010). Protein Folding: The Good, the Bad, and the Ugly. Harvard Graduate School of the Arts and Sciences. https://sitn.hms.harvard.edu/flash/2010/issue65/
[28] Sheik SS, Sundararajan P, Hussain ASZ, Sekar K. (2002). Ramachandran plot on the web. Bioinformatics, 18(11), 1548–1549. https://doi.org/10.1093/bioinformatics/18.11.1548
[29] Pandol, S. (2010). DIGESTIVE ENZYME SYNTHESIS AND TRANSPORT. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/books/NBK54127/
[30] EHP tool from DFBP (Database of Food-derived Bioactive Peptides). https://www.cqudfbp.net/commonPages/pretools/pretoolPage.jsp
[31] Hebditch and Warwicker. (2019). Web-based display of protein surface and pH-dependent properties for assessing the developability of biotherapeutics. Scientific Reports. https://protein-sol.manchester.ac.uk/heatmap
[32] Morris GM, Lim-Wilby M. (2008). Molecular docking. Methods in Molecular Biology. https://pubmed.ncbi.nlm.nih.gov/18446297/
[33] Aarthi. (2022). Back translate an AA sequence. Benchling. https://help.benchling.com/hc/en-us/articles/9684246809357-Back-translate-an-AA-sequence
[34] Prem, M. (2019). Benchling’s Codon Optimization Tool for Improved Protein Expression. Benchling. https://www.benchling.com/blog/introducing-benchlings-new-codon-optimization-tool-for-improved-protein-expression
[35] Bhakta, S. (2023). Golden Gate Assembly. Bennett Lab Wiki. https://wiki.rice.edu/confluence/display/BIODESIGN/Golden+Gate+Assembly
[36] iGEM. (2024). Golden Gate Assembly. https://technology.igem.org/assembly/golden-gate
[37] iGEM. (2024). Help: Assembly Compatibility. https://parts.igem.org/Help:Assembly_Compatibility
[38] Puigbo, P. (2007). OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1933141/
[39] Fernández, E. (2014). Orange Fluorescent Protein (OFP) "Yukon" coding region with RBS, intellectual property-free. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1933141/
[40] Vector Data Base. Addgene. Plasmid: pHT01. https://www.addgene.org/vector-database/5885/