To predict and confirm the processes and results of wet lab experiments, SERENE constructed a series of models:
5-HTP production induction: We developed a constant tryptophan concentration model in E. coli, a response model for BH4 cofactors, and a tryptophan conversion model to simulate the induction of 5-HTP production by our engineered bacteria.
Enzyme efficiency enhancement: To accelerate the reaction, we used a DNA scaffold to bring the Zif268-m-hTPH1, PBSII-hPCBD, and ZFa-hQDPR proteins closer together. We then constructed a collision model to simulate the catalytic process of the enzymes and predict the efficiency gained from this proximity.
The intracellular concentration of tryptophan is involved in the action of various enzymes and activators, so the feedback inhibition effect of tryptophan is explained by the Hill equation [2] [3].
First, the inactive repressor binds to two tryptophan molecules to yield the holorepressor, RT2.
The holorepressor binds with the free operator, O, and forms operon-holorepressor complex, ORT2, thus, repressing tryptophan synthesis.
From the above three equations, it can be derived:
The total operator concentration (Ot) is the sum of the free operator (O) and the operator bound to the holorepressor (ORT2).
The total tryptophan concentration (Tt) is the sum of free tryptophan (T) and the tryptophan bound in the repressor complexes RT and RT2.
The total repressor concentration (Rt) is the sum of the free repressor (R), the repressor bound to one tryptophan molecule (RT), and the repressor bound to two tryptophan molecules (RT2).
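The binding steps and balances described above can be written compactly as follows. This is a reconstruction from the wording of the text only, not the original typeset equations; the balances are written exactly as the text states them, although a strict mole balance would weight RT2 by two in the tryptophan sum because each holorepressor carries two tryptophan molecules.

```latex
\begin{align}
R + 2T &\rightleftharpoons RT_2 \\
O + RT_2 &\rightleftharpoons ORT_2 \\
O_t &= O + ORT_2 \\
T_t &= T + RT + RT_2 \\
R_t &= R + RT + RT_2
\end{align}
```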
The fraction of the total free operator concentration, p, is defined as follows:
Hill equation:
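For orientation, a generic repressive Hill form for the free-operator fraction is sketched below. The symbols K_i (half-repression constant) and n (Hill coefficient) are placeholders assumed here for illustration; the specific expression and parameter values used in the model are those of the cited references [2] [3].

```latex
p = \frac{O}{O_t} = \frac{K_i^{\,n}}{K_i^{\,n} + T^{\,n}}
```

In this form p approaches 1 when T is far below K_i (operator mostly free) and approaches 0 when T is far above K_i (operator mostly repressed).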
Substituting the equation
Anthranilate synthase synthesis (E) is governed by (a) the rate of enzyme generation by expression of structural genes determined by free operator concentration
The rate of tryptophan usage in protein production
Enzyme-catalyzed synthesis of tryptophan (Ts) is
Santillan & Mackey [3] assumed fast transport of exogenous tryptophan. They modeled extracellular tryptophan (To) by the following static equation:
The activity of tryptophan within a cell comprises two parts: the synthesized tryptophan, denoted as Ts, and the exogenous tryptophan transported into the cell denoted as To. Therefore, the total intracellular activity, represented as Tt, can be expressed as the sum of these two components.
BH4 is oxidized to BH3OH in the catalytic reaction and then reduced back to BH4 via the regeneration pathway, which requires pterin-4a-carbinolamine dehydratase (PCBD) and quinoid dihydropteridine reductase (QDPR).
We model this regeneration system with Michaelis-Menten kinetics:
The ODE was solved with MATLAB to obtain the concentration of BH2:
We then simulated BH4 regeneration:
The ODE was solved to obtain the BH4 concentration, and the simulations were connected into a cycle:
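The cycle described above (BH4 to BH3OH, BH3OH to BH2, BH2 back to BH4) can be sketched as three coupled Michaelis-Menten rate equations. The sketch below is a pure-Python illustration with a fixed-step fourth-order Runge-Kutta integrator, not the MATLAB code used in the project; every Vmax and Km value in `PARAMS` is a hypothetical placeholder, and only the initial concentration (0.5 µM) and the 120 min horizon are taken from the results reported later.

```python
def mm_rate(vmax, km, s):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def derivatives(state, p):
    """Time derivatives of (BH4, BH3OH, BH2) for the closed cycle."""
    bh4, bh3oh, bh2 = state
    v_tph = mm_rate(p["vmax_tph"], p["km_tph"], bh4)        # BH4 -> BH3OH
    v_pcbd = mm_rate(p["vmax_pcbd"], p["km_pcbd"], bh3oh)   # BH3OH -> BH2
    v_qdpr = mm_rate(p["vmax_qdpr"], p["km_qdpr"], bh2)     # BH2 -> BH4
    return (v_qdpr - v_tph, v_tph - v_pcbd, v_pcbd - v_qdpr)


def rk4_step(state, dt, p):
    """Advance the state by one classical Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = derivatives(state, p)
    k2 = derivatives(shifted(k1, dt / 2), p)
    k3 = derivatives(shifted(k2, dt / 2), p)
    k4 = derivatives(shifted(k3, dt), p)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, p, t_end, dt=0.01):
    """Return a list of (time, (BH4, BH3OH, BH2)) samples."""
    state = tuple(initial)
    trajectory = [(0.0, state)]
    for i in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(state, dt, p)
        trajectory.append((i * dt, state))
    return trajectory


# Hypothetical kinetic constants (µM/min and µM), for illustration only.
PARAMS = {
    "vmax_tph": 0.1, "km_tph": 1.0,
    "vmax_pcbd": 0.5, "km_pcbd": 1.0,
    "vmax_qdpr": 0.5, "km_qdpr": 1.0,
}

trajectory = simulate((0.5, 0.0, 0.0), PARAMS, t_end=120.0)
t_final, (bh4, bh3oh, bh2) = trajectory[-1]
print(f"t = {t_final:.0f} min: BH4={bh4:.4f}, BH3OH={bh3oh:.4f}, BH2={bh2:.4f} µM")
```

Because the three derivatives sum to zero, the total pterin pool stays constant throughout the run, which is a convenient sanity check on any implementation of the cycle.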
TPH1 hydroxylates tryptophan to 5-HTP.
The production rate of 5-HTP was modeled using the Michaelis-Menten equation [5]:
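In its standard single-substrate form, with tryptophan as the substrate, the Michaelis-Menten rate law reads as below. The symbols V_max and K_m are the usual generic constants; their values for this model are not reproduced here.

```latex
v_{\text{5-HTP}} = \frac{V_{\max}\,[\mathrm{Trp}]}{K_m + [\mathrm{Trp}]},
\qquad
v_{\text{5-HTP}} \approx \frac{V_{\max}}{K_m}\,[\mathrm{Trp}] \ \text{ for } [\mathrm{Trp}] \ll K_m,
\qquad
v_{\text{5-HTP}} \to V_{\max} \ \text{ for } [\mathrm{Trp}] \gg K_m
```

The rate is therefore close to proportional to the tryptophan concentration only well below K_m and saturates at V_max at high concentrations.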
The scaffold and the first enzyme are at the center of the sphere, and the other enzymes are on its surface. The reaction substrate starts from the center, at the first enzyme, moves randomly inside the sphere, and then contacts the second and third enzymes.
Consider a sphere with a diameter of 100 nm and the scaffold positioned at its center. PCBD is located at (0, 0, 0), QDPR is at (1 nm, 0, 0), and TPH1 is at (1 nm, 0, 0). When BH4 is catalyzed into BH3OH, it starts from PCBD and is converted into BH2. BH2 then encounters QDPR and is reduced back to BH4. Finally, BH4 acts as a cofactor and collides with TPH1. We simulated 1000 points starting from the center of the sphere, moving randomly and colliding with the two enzymes. When a point hits the sphere shell, there is a 1/3 chance of encountering the required enzyme [7].
There are four types of collisions:
Type 1: Move to Target Point 1 first, then continue to Target Point 2.
Type 2: Move to Target Point 1 first, then continue to the boundary.
Type 3: Move to the boundary first, then return to the starting point and move to Target Point 2.
Type 4: Move to the boundary first, then return to the starting point and move to the boundary again.
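The four collision types can be illustrated with a small random-walk sketch. This is a simplified stand-in rather than the project's simulation: the step length, the capture radius, and the position of the second target are hypothetical values chosen for illustration, and the 1/3 encounter probability at the shell is not modeled. Each walker performs two legs; the second leg starts from Target Point 1 if it was reached, or from the starting point otherwise.

```python
import math
import random


def random_direction(rng):
    """Uniform random unit vector from normalized Gaussian components."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0:
            return x / norm, y / norm, z / norm


def walk(start, target, radius, step, capture, rng, max_steps=200_000):
    """Walk from `start` until the target or the shell is reached.

    Returns ("target" | "boundary" | "timeout", number_of_steps).
    """
    x, y, z = start
    for n in range(1, max_steps + 1):
        dx, dy, dz = random_direction(rng)
        x, y, z = x + step * dx, y + step * dy, z + step * dz
        if math.dist((x, y, z), target) <= capture:
            return "target", n
        if math.sqrt(x * x + y * y + z * z) >= radius:
            return "boundary", n
    return "timeout", max_steps


# Outcome of (first leg, second leg) mapped to the collision type.
TYPES = {
    ("target", "target"): 1,
    ("target", "boundary"): 2,
    ("boundary", "target"): 3,
    ("boundary", "boundary"): 4,
}


def simulate(n_walkers, target1, target2, radius=50.0, step=1.0,
             capture=0.5, seed=0):
    """Count collision types 1-4 and the total number of steps taken."""
    rng = random.Random(seed)
    origin = (0.0, 0.0, 0.0)
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    total_steps = 0
    for _ in range(n_walkers):
        first, n1 = walk(origin, target1, radius, step, capture, rng)
        second_start = target1 if first == "target" else origin
        second, n2 = walk(second_start, target2, radius, step, capture, rng)
        kind = TYPES.get((first, second))
        if kind is not None:  # walkers that time out are left uncounted
            counts[kind] += 1
        total_steps += n1 + n2
    return counts, total_steps


# Hypothetical target positions in nm; a small run keeps the demo fast.
counts, steps = simulate(100, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), radius=10.0)
print(counts, steps)
```

Comparing `total_steps` between runs with targets near the center and runs with targets near the shell gives a rough feel for how proximity changes the time needed to complete the collisions.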
The concentration of tryptophan synthesized by the enzyme exhibits underdamped behavior, eventually stabilizing at a constant level. Consequently, the intracellular tryptophan concentration also remains nearly constant.
▲ Figure: Changes in tryptophan concentration over time.
By adjusting the concentration of BH4 in the simulation, it was observed that after 120 mins, the concentration of BH4 becomes nearly constant.
▲ Figure: Changes in intracellular concentrations of BH4, BH2, and BH3OH over time.
Three different initial concentrations of BH4 were used:
(A) [BH4]0=0.25 µM; (B) [BH4]0=0.5 µM; (C) [BH4]0=1 µM
Using the simulation data, we determined that a BH4 concentration of at least 100 µM is needed to reach the fastest 5-HTP production velocity [8].
▲ Figure: Relationship between BH4 concentration and the velocity of 5-HTP production.
In the simulation of 5-HTP production, the Michaelis-Menten equation indicates that the production rate of 5-HTP increases with the intracellular tryptophan concentration (approximately proportionally when the concentration is well below Km). The simulated terminal velocity is approximately 0.135 µM/min.
▲ Figure: Changes in intracellular 5-HTP production velocity over time.
The simulation shows that the reaction rate is faster with a scaffold than without.
▲ Figure: Time to complete 1000 collisions.
1. The production rate of 5-HTP was successfully simulated and then compared with that of the fastest-producing engineered bacteria available [6].
Fastest production speed | Results of our simulations |
---|---|
12.6 μM/hr | 8.1 μM/hr |
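The simulated value in the comparison follows directly from the terminal velocity of approximately 0.135 µM/min reported in the results; the conversion and the resulting ratio to the published rate are:

```latex
0.135\ \mu\mathrm{M/min} \times 60\ \mathrm{min/hr} = 8.1\ \mu\mathrm{M/hr},
\qquad
\frac{8.1\ \mu\mathrm{M/hr}}{12.6\ \mu\mathrm{M/hr}} \approx 0.64
```

The simulated rate is thus roughly 64% of the fastest reported rate.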
Since the production rate is positively correlated with the concentration of anthranilate synthase, we could speed up production further by changing the free operon's response to anthranilate synthase synthesis.
2. It is hypothesized that using a scaffold to accelerate the reaction rate is feasible.
In the SERENE project, we utilize the chemical reaction catalyzed by monomeric human tryptophan hydroxylase 1 (m-hTPH1) to convert L-tryptophan into 5-Hydroxytryptophan (5-HTP), which is the precursor of the “happy hormone” serotonin [1]. Human pterin-4a-carbinolamine dehydratase (hPCBD1) and human dihydropteridine reductase (hQDPR) are co-expressed with monomeric hTPH1 to promote the regeneration of coenzyme tetrahydrobiopterin (BH4) [2] (see Design for details).
To regulate and enhance 5-HTP production, we decided to cluster m-hTPH1, hPCBD1, and hQDPR using a protocatechuic acid (PCA)-regulated DNA scaffold. The DNA scaffolds harboring the zinc finger domain binding motifs are generated by the rolling circle replication (RCR) mechanism. At the same time, m-hTPH1, hPCBD1, and hQDPR are fused with zinc finger binding domains (Zif268, PBSII, and ZFa) to recognize the corresponding motifs.
Since the folding of the fusion protein could be affected by the order of domains and types of linkers, we conducted protein modeling to predict the fusion protein structure using AlphaFold2 [4] and iTASSER [5] [6] [7]. Since 2018, AlphaFold has emerged as a trusted tool for predicting protein folding from sequences. On the other hand, iTASSER is a widely recognized protein structure prediction tool known for its strength in modeling the full-length structure of proteins.
The first step of modeling is collecting the protein sequences. In our project, the protein sequences of the enzymes, coenzymes, and zinc finger binding domains are listed below:
Name | Protein sequence |
---|---|
hTPH1 | MIEDNKENKDHSLERGRASLIFSLKNEVGGLIKALKIFQEKHVNLLHIESRKSKRRNSEFEIFVDCDINREQLNDIFHLLKSHTNVLSVNLPDNFTLKEDGMETVPWFPKKISDLDHCANRVLMYGSELDADHPGFKDNVYRKRRKYFADLAMNYKHGDPIPKVEFTEEEIKTWGTVFQELNKLYPTHACREYLKNLPLLSKYCGYREDNIPQLEDVSNFLKERTGFSIRPVAGYLSPRDFLSGLAFRVFHCTQYVRHSSDPFYTPEPDTCHELLGHVPLLAEPSFAQFSQEIGLASLGASEEAVQKLATCYFFTVEFGLCKQDGQLRVFGAGLLSSISELKHALSGHAKVKPFDPKITCKQECLITTFQDVYFVSESFEDAKEKMREFTKTIKRPFGVKYNPYTRSIQILKDTKSITSAMNELQHDLDVVSDALAKVSRKPSI |
monomeric hTPH1 (m-hTPH1) | DGMETVPWFPKKISDLDHCANRVLMYGSELDADHPGFKDNVYRKRRKYFADLAMNYKHGDPIPKVEFTEEEIKTWGTVFQELNKLYPTHACREYLKNLPLLSKYCGYREDNIPQLEDVSNFLKERTGFSIRPVAGYLSPRDFLSGLAFRVFHCTQYVRHSSDPFYTPEPDTCHELLGHVPLLAEPSFAQFSQEIGLASLGASEEAVQKLATCYFFTVEFGLCKQDGQLRVFGAGLLSSISELKHALSGHAKVKPFDPKITCKQECLITTFQDVYFVSESFEDAKEKMREFTKTIKRPFGVKYNPYTRSIQILKDTKSITSA |
hPCBD1 | MAGKAHRLSAEERDQLLPNLRAVGWNELEGRDAIFKQFHFKDFNRAFGFMTRVALQAEKLDHHPEWFNVYNKVHITLSTHECAGLSERDINLASFIEQVAVSMT |
hQDPR | MAAAAAAGEARRVLVYGGRGALGSRCVQAFRARNWWVASVDVVENEEASASIIVKMTDSFTEQADQVTAEVGKLLGEEKVDAILCVAGGWAGGNAKSKSLFKNCDLMWKQSIWTSTISSHLATKHLKEGGLLTLAGAKAALDGTPGMIGYGMAKGAVHQLCQSLAGKNSGMPPGAAAIAVLPVTLDTPMNRKSMPEADFSSWTPLEFLVETFHDWITGKNRPSSGSLIQVVTTEGRTELTPAYF |
Zif268 | MPGEKPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHT |
PBSII | MPGEKPYACPECGKSFSQRANLRAHQRTHTGEKPYKCPECGKSFSRSDHLTTHQRTHTGEKPYKCPECGKSFSRSDVLVRHQRTHT |
ZFa | MPGERPFQCRICMRNFSDSPTLRRHTRTHTGEKPFQCRICMRNFSVRHNLTRHLRTHTGEKPFQCRICMRNFSDRTSLARHLKTH |
Linkers are short peptide sequences used to connect different domains within fusion proteins. Their design is crucial because they provide flexibility and prevent interference between the fused domains. In our project, we selected three types of linkers for modeling. The first type uses six alanines as a spacer, keeping a fixed distance of six residues between the protein domains. The second is the commonly used flexible linker (Gly-Gly-Gly-Gly-Ser)n, applied when the connected domains require a certain degree of movement or interaction. The third is the rigid linker (EAAAK)n, which can separate functional domains more effectively than flexible linkers [3].
Name | Sequence |
---|---|
flexible linker (fLinker) | GGGGSGGGGS |
rigid linker (rLinker) | AEAAAKEAAAKA |
six Alanines (6A) | AAAAAA |
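The candidate fusion proteins differ only in domain order and linker choice, so the set of sequences to submit for structure prediction can be enumerated programmatically. The sketch below is an illustration, not the project's actual workflow: the linker sequences are taken from the table above, while the helper names and the short demo sequences are hypothetical placeholders.

```python
# Linker sequences from the table above; "direct" means no linker.
LINKERS = {
    "direct": "",
    "6A": "AAAAAA",
    "fLinker": "GGGGSGGGGS",
    "rLinker": "AEAAAKEAAAKA",
}


def build_fusion(n_term: str, linker: str, c_term: str) -> str:
    """Concatenate N-terminal part, linker, and C-terminal part."""
    return n_term + linker + c_term


def enumerate_constructs(enzyme_name, enzyme_seq, zf_name, zf_seq):
    """Return every orientation/linker combination as name -> sequence."""
    constructs = {}
    for linker_name, linker_seq in LINKERS.items():
        # Zinc finger domain at the N-terminus, enzyme at the C-terminus.
        name = f"{zf_name}-{linker_name}-{enzyme_name}"
        constructs[name] = build_fusion(zf_seq, linker_seq, enzyme_seq)
        # Enzyme at the N-terminus, zinc finger domain at the C-terminus.
        name = f"{enzyme_name}-{linker_name}-{zf_name}"
        constructs[name] = build_fusion(enzyme_seq, linker_seq, zf_seq)
    return constructs


# Toy placeholder sequences; substitute the full sequences from the tables.
demo = enumerate_constructs("ENZ", "MKTAYIAK", "ZF", "MPGEKPYA")
for name, seq in demo.items():
    print(f"{name}: {seq}")
```

With two orientations and four linker options (including direct fusion), each enzyme and zinc finger pair yields eight candidate sequences.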
We first modeled the fusion protein structure using AlphaFold2 with the following parameter settings:
Each parameter directly influences the accuracy and quality of the protein model:
We used the predicted Local Distance Difference Test (pLDDT) value to assess the quality of the predicted model. The pLDDT score provides a per-residue confidence measure for the predicted structure. A pLDDT score above 90 indicates very high confidence, while scores between 70 and 90 suggest high confidence. Scores below 50 reflect very low confidence. By analyzing the pLDDT values across the model, we could identify regions with high structural reliability and check that the model is suitable for further analysis.
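The confidence bands can be applied to a list of per-residue scores as sketched below. The thresholds for the "very high", "high", and "very low" bands follow the text; the label "low" for scores from 50 to 70 follows the usual AlphaFold convention and is an addition here, as are the function names.

```python
def plddt_band(score: float) -> str:
    """Map one pLDDT score (0-100) to a confidence band."""
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "low"
    return "very low"


def summarize(scores):
    """Return the fraction of residues falling into each band."""
    counts = {"very high": 0, "high": 0, "low": 0, "very low": 0}
    for score in scores:
        counts[plddt_band(score)] += 1
    total = len(scores)
    return {band: count / total for band, count in counts.items()}


# Toy per-residue scores for illustration only.
print(summarize([95.2, 88.0, 81.5, 64.3, 42.0]))
```

A summary of this kind makes it easy to compare constructs by the share of residues in the "high" and "very high" bands rather than by visual inspection of the coloring alone.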
We first built the hTPH1, hPCBD1, and hQDPR structures by AlphaFold2. The pLDDT scores were mainly above 80 (blue colored), suggesting high confidence in the predicted structure.
We then built the structures of the selected zinc finger binding domains, Zif268, PBSII, and ZFa. For all three domains, the pLDDT scores were mostly above 80, suggesting reliable prediction of the domain structures.
Compared to hTPH1, 99 amino acids at the N-terminus and 24 at the C-terminus are deleted in m-hTPH1. To examine whether the structure of m-hTPH1 is affected by the deletion, we modeled m-hTPH1 with AlphaFold2. The pLDDT scores were mainly above 80 (blue colored), suggesting high confidence in the predicted m-hTPH1 structure. The structure overlap indicates that the core structure of m-hTPH1 remains essentially the same as that of hTPH1.
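The size of the N- and C-terminal deletions can be checked directly from the sequences. The helper below is a small sketch with a hypothetical function name; it assumes the truncated construct is an exact contiguous segment of the full-length sequence.

```python
def locate_truncation(full: str, truncated: str):
    """Return (residues removed at N-terminus, residues removed at C-terminus)."""
    start = full.find(truncated)
    if start < 0:
        raise ValueError("truncated sequence is not a contiguous part of the full sequence")
    return start, len(full) - start - len(truncated)


# Toy sequences for illustration; replace with the hTPH1 and m-hTPH1 sequences.
print(locate_truncation("AAAXYZBB", "XYZ"))
```

Applied to the hTPH1 and m-hTPH1 sequences listed in the table, the result should match the 99 and 24 residues stated in the text, provided the sequences are copied without errors.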
The direct fusion of m-hTPH1 and Zif268
We first built the structures of the m-hTPH1 protein directly fused with Zif268. The pLDDT scores were mainly above 80 (blue colored). The structure prediction indicates that the m-hTPH1 structure was not affected by the fusion, but the Zif268 structure was disrupted whether it was placed at the N-terminus or the C-terminus.
The fusion of m-hTPH1 and Zif268 by different linkers
We then tried to improve the Zif268 structure by adding a linker peptide between m-hTPH1 and Zif268. The 6A linker did not significantly improve the folding of Zif268, and the flexible linker (fLinker) made it worse. Finally, we applied the rigid linker (rLinker): the construct with Zif268 at the N-terminus, followed by rLinker and m-hTPH1 at the C-terminus, gave the best structure prediction.
The direct fusion of hPCBD1 and PBSII
We initially constructed the structures of the hPCBD1 protein directly fused to PBSII. The pLDDT scores were predominantly above 80 (blue-colored). The structural predictions showed that the hPCBD1 structure remained unaffected by the fusion, while the PBSII structure experienced changes at both the N-terminus and C-terminus.
The fusion of hPCBD1 and PBSII by different linkers
We then tried to improve the PBSII structure by adding a linker peptide between hPCBD1 and PBSII. The 6A linker did not significantly improve the folding of PBSII, and fLinker made it worse. Finally, we applied the rLinker: the construct with PBSII at the N-terminus, followed by rLinker and hPCBD1 at the C-terminus, gave the best structure prediction.
The direct fusion of hQDPR and ZFa
We first built the structures of hQDPR protein directly fused with ZFa. The pLDDT scores were mainly above 80 (blue colored). The prediction indicated that protein fusion did not affect the hQDPR structure, but the ZFa structure was affected if located at the C-terminal of the fusion protein.
The fusion of hQDPR and ZFa by different linkers
We then aimed to improve the ZFa structure by adding a linker peptide between hQDPR and ZFa. The 6A linker did not significantly improve ZFa folding, and fLinker or rLinker made it worse. Overall, none of the linkers effectively improved the predicted ZFa structure. In conclusion, the best structural predictions were observed with ZFa at the N-terminus of hQDPR.
The AlphaFold2 model offers protein structure prediction and insight into reliability, allowing us to efficiently decrease the candidate number of fusion proteins. However, protein structure predictions may sometimes lack precision. Accordingly, we proceeded with iTASSER to re-confirm the protein structure predictions.
iTASSER (Iterative Threading ASSEmbly Refinement) is a widely recognized protein structure prediction tool known for its strength in modeling the full-length structure of proteins by combining threading, ab initio modeling, and structural refinement techniques. It has consistently performed exceptionally well in CASP competitions, achieving the top rankings across multiple years. We use iTASSER to further refine these models and ensure the accurate assembly of the entire protein structure.
We first searched the RCSB Protein Data Bank and found that the protein sequences of 3HF6 and 1F93 show 100% identity to m-hTPH1 and hPCBD1, respectively, while the protein sequence of 1DHR shows 95% identity and 98% positives to hQDPR. Since the structures in the RCSB Protein Data Bank were experimentally determined, we used 3HF6, 1F93, and 1DHR as standards to evaluate whether the m-hTPH1, hPCBD1, and hQDPR structures built by iTASSER were reliable. The results indicated that the protein structures built by iTASSER were similar to those in the RCSB Protein Data Bank.
In the subsequent analysis, we examined the structures of the selected zinc finger binding domains. In the prediction results using iTASSER, the models consistently exhibited high-performance scores.
The direct fusion of m-hTPH1 and Zif268
We first constructed the structures of the m-hTPH1 protein directly fused with Zif268. The results indicated that the m-hTPH1 part of the fusion protein largely retained its original structure, but the helix structure of Zif268 was affected whether it was placed at the N-terminus or the C-terminus. (Pale blue: m-hTPH1; Pale orange: Zif268).
The fusion of m-hTPH1 and Zif268 by different linkers
We then tried to improve the structure by adding a linker peptide between m-hTPH1 and Zif268. The 6A linker did not significantly improve the folding of Zif268, and fLinker or rLinker made it worse. In conclusion, iTASSER did not identify a clearly best Zif268-m-hTPH1 structure. (Pale blue: m-hTPH1; Pale orange: Zif268; Yellow: linkers).
The direct fusion of hPCBD1 and PBSII
We first constructed the structures of the hPCBD1 protein directly fused with PBSII. The results showed that the α-helix of PBSII largely retained its original structure, but hPCBD1 was affected whether it was placed at the N-terminus or the C-terminus. (Pale blue: hPCBD1; Pale orange: PBSII).
The fusion of hPCBD1 and PBSII by different linkers
We then aimed to improve the hPCBD1 structure by adding a linker peptide between hPCBD1 and PBSII. The fLinker showed no improvement in fusion protein folding, while the 6A linker and rLinker promoted hPCBD1 folding at the C-terminus of the fusion protein. The results indicated that PBSII should be positioned at the N-terminus, followed by the rLinker or 6A linker, with hPCBD1 at the C-terminus. (Pale blue: hPCBD1; Pale orange: PBSII; Yellow: linkers).
The direct fusion of hQDPR and ZFa
We initially constructed the structures of the hQDPR protein directly fused with ZFa. The results showed that ZFa retained its original structure while at the N-terminal of the fusion protein but not at the C-terminal (Pale blue: hQDPR; Pale orange: ZFa).
The fusion of hQDPR and ZFa by different linkers
We then sought to improve the ZFa structure by adding a linker peptide between hQDPR and ZFa.
Applying the 6A linker, fLinker, or rLinker abolished the helix structure of ZFa. In conclusion, the results indicated that linkers may not be applicable in this fusion protein (Pale blue: hQDPR; Pale orange: ZFa; Yellow: linker).
Compared to AlphaFold2, the fusion protein structures are often impacted by different linkers in iTASSER. This may be due to the different parameters used in AlphaFold2 and iTASSER.
Nevertheless, based on the results, we decided to use the following:
Through modeling, we have successfully predicted the fusion protein structure. The results still require further experimental validation and analysis.