Wet lab engineering

Regulation and synthesis module

The engineering cycle is divided into four phases: design, build, test, and learn. We design experiments according to our experimental purpose, build the system we designed using biological or other experimental methods, test the constructs using reasonable approaches, and finally analyze and learn from the test results, generating new questions and goals so that we can move on to the next iteration.

We followed this principle throughout our experiments, and the engineering cycle helped us greatly in overcoming difficulties. This page describes the five iterations we performed during the construction of the plasmids of the Synthesis circuit and the validation of the regulation module.

1st Iteration

DESIGN

The Synthesis module is divided into the regulation module and the pfa biosynthetic gene cluster. We planned to use the genome and plasmids as templates to amplify the four elements of the regulation module by PCR. At the same time, we added homologous sequences to the 5' ends of the primers, so that the four elements could finally be assembled into the regulation module by overlap PCR.
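The primer tails described above can be sketched in a few lines. This is a minimal illustration, not the primers we actually ordered: the sequences, the function names and the default lengths (20 nt annealing region, 20 nt tail) are all assumptions made for the example.

```python
# Sketch of junction-primer design for overlap PCR.
# All sequences and default lengths are hypothetical illustration values.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_primers(upstream: str, downstream: str,
                    anneal_len: int = 20, tail_len: int = 20):
    """Design the two primers that sit at the junction of two fragments.

    Returns (reverse primer for `upstream`, forward primer for `downstream`),
    both written 5'->3'.
    """
    # Forward primer of the downstream fragment: the 5' tail copies the end
    # of the upstream fragment, the 3' part anneals to the downstream start.
    fwd_down = upstream[-tail_len:] + downstream[:anneal_len]
    # Reverse primer of the upstream fragment: the 5' tail is the reverse
    # complement of the downstream start, the 3' part anneals to the
    # upstream end.
    rev_up = (reverse_complement(downstream[:tail_len])
              + reverse_complement(upstream[-anneal_len:]))
    return rev_up, fwd_down


if __name__ == "__main__":
    rev_up, fwd_down = overlap_primers("AAACCCGGG", "TTTACGTAC",
                                       anneal_len=4, tail_len=3)
    print(rev_up, fwd_down)
```

After the first round of PCR, both products carry the same junction sequence, which is what allows them to anneal to each other and be extended in the overlap PCR step.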

On the other hand, we divided the extensive pfa biosynthetic gene cluster into five fragments (PUFA1-5) to be synthesized by a biotech company. We incorporated the RBS sequences of NifD, NifK, NifE and NifN from Sinorhizobium fredii CCBAU45436 in front of the start codons to ensure compatibility with the nifH promoter. Furthermore, we optimized the original sequences according to the codon preference of Sinorhizobium fredii CCBAU45436 to enhance gene expression.
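The idea behind codon optimization can be illustrated with a toy example. This sketch simply replaces every codon with one "preferred" synonymous codon; the preference table below is hypothetical (it is not the real codon usage of Sinorhizobium fredii CCBAU45436), the genetic code is truncated to the codons used in the example, and real optimization tools weigh additional factors.

```python
# Toy codon optimization: swap each codon for a preferred synonymous codon.
# PREFERRED is a hypothetical table, NOT measured S. fredii codon usage.
PREFERRED = {"M": "ATG", "K": "AAG", "L": "CTG", "*": "TGA"}

# Minimal genetic code covering only the codons used in this example.
CODON_TO_AA = {
    "ATG": "M", "AAA": "K", "AAG": "K",
    "TTA": "L", "CTG": "L", "TAA": "*", "TGA": "*",
}


def codons(cds: str):
    """Split a coding sequence into codons, checking the reading frame."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[c] for c in codons(cds))


def optimize(cds: str) -> str:
    """Replace every codon with the preferred codon for the same amino acid."""
    return "".join(PREFERRED[CODON_TO_AA[c]] for c in codons(cds))


if __name__ == "__main__":
    original = "ATGAAATTATAA"
    optimized = optimize(original)
    # The protein sequence must be unchanged by the optimization.
    print(optimized, translate(original) == translate(optimized))
```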

We also intended to initially fuse the regulation module with PUFA1, PUFA2 with PUFA3, and PUFA4 with PUFA5 using overlap PCR, followed by Gibson assembly to clone the fragments into pBBR1MCS-2 plasmid.

BUILD

We amplified the glnK promoter and the nifH promoter from the genome of Sinorhizobium fredii CCBAU45436 using PCR, and obtained gfp and tracrRNA from the corresponding plasmid templates. We attached the sgRNA targeting the gfp sequence to the 5' end of the tracrRNA during PCR. By overlap PCR, we connected the glnK promoter with sgRNA-GFP and the nifH promoter with gfp, and subsequently linked the two products through a further overlap PCR to yield the regulation module glnK-sgRNA-GFP-nifH-gfp. We used the plasmids carrying the pfa biosynthetic gene cluster synthesized by the biotech company as templates to amplify the PUFA1-5 fragments through PCR, incorporating homologous sequences of the adjacent fragments. We expected to achieve pairwise connections between the regulation module and PUFA1, PUFA2 and PUFA3, and PUFA4 and PUFA5 through overlap PCR.

TEST

Our overlap PCR successfully yielded the regulation module glnK-sgRNA-GFP-nifH-gfp (Figure 1), with Sanger sequencing confirming the correct product.

Subsequently, we conducted overlap PCR on the three fragment pairs simultaneously and found that none produced the desired fusion product band. We employed touchdown PCR to adjust the annealing temperature and attempted to modify the extension time and primers, but the fusion PCR lanes consistently exhibited severe smearing and strong non-specific amplification.

Figure1 Agarose gel electrophoresis for detection and purification of the regulatory sequence DNA fragment amplified by overlap PCR. The size of the regulatory sequence glnK-sgRNA-GFP-nifH-gfp should be 2211 bp, and we purified the DNA indicated in the white box using a gel recovery kit.

LEARN

We suspected that the failure might be due to the homologous arms being too short relative to the genes' coding sequences for effective complementary annealing. Additionally, our PCR kit might not be suitable for amplifying complex templates, which could explain the low efficiency of overlap PCR.

2nd Iteration

DESIGN

We abandoned overlap PCR and opted for seamless cloning to ligate two fragments into a plasmid vector, and then used the recombinant plasmids as templates for PCR amplification to obtain the pairwise connected fragments. To achieve this, we designed three pairs of primers with the corresponding fragment junctions on the MCS of the pUC19 plasmid. We linearized pUC19 via PCR, with the aim of first joining the regulation module and the fragments of the pfa biosynthetic gene cluster pairwise by seamless cloning.

Subsequently, we planned to use the recombinant plasmid as a template for PCR amplification to obtain regulation module-PUFA1, PUFA2-PUFA3, PUFA4-PUFA5, and then conduct seamless cloning with the linearized pBBR1MCS-2 plasmid, which was double digested with EcoRⅠ and XmaⅠ, to construct the final plasmid.

BUILD

We used Gibson assembly to connect the six fragments pairwise to the pUC19 plasmid vector and transformed E. coli (DH5α). After adding X-gal and IPTG to the culture, we plated it on LB solid medium containing ampicillin and incubated at 37°C overnight. We picked white colonies for overnight shaking in LB liquid medium with ampicillin. Plasmid extraction followed by EcoRⅠ single enzyme digestion and agarose gel electrophoresis (Figure 2) allowed us to select the correct plasmids: pUC19-regulation module-PUFA1, pUC19-PUFA2-PUFA3, as well as pUC19-PUFA4-PUFA5.

Figure2 Agarose gel electrophoresis for detecting correct recombinant plasmids. The extracted plasmids were digested by EcoRⅠ (TaKaRa) at 37°C for 10 minutes. The expected size of pUC19-regulation module-PUFA1 is 8512 bp, while pUC19-PUFA2-PUFA3 and pUC19-PUFA4-PUFA5 should be 9799 bp and 9145 bp, respectively. The bands indicated by boxes were of the correct sizes.

We selected plasmids 2, 5, and 8 for PCR amplification of the fused fragments. Meanwhile, we linearized the pBBR1MCS-2 plasmid using EcoR Ⅰ and Xma Ⅰ. Next, we used Gibson assembly to connect the linearized vector with the three fused DNA fragments.

TEST

Our PCR amplification successfully yielded correctly sized fusion products (Figure 3), and we also obtained correctly sized bands from the double digestion (Figure 4). We transformed the Gibson assembly products of the four fragments into E. coli (DH5α), plated them on LB solid medium containing kanamycin after adding X-gal and IPTG, and incubated at 37°C overnight. Each of the two replicates yielded only a single blue colony (Figure 5).

Figure3 Agarose gel electrophoresis for detecting and purifying the fusion DNA fragments obtained from PCR. Using plasmid 2 as template, we amplified the regulatory sequence-PUFA1 (5836 bp). Using plasmid 5, we amplified PUFA2-PUFA3 (7118 bp). Using plasmid 8, we amplified PUFA4-PUFA5 (6482 bp).

Figure4 Agarose gel electrophoresis for detecting and purifying the linearized pBBR1MCS-2 fragment obtained from double digestion. The pBBR1MCS-2 plasmid was digested with EcoR Ⅰ (NEB) and Xma Ⅰ (NEB) at 37°C for 15 minutes.

Figure5 LB-agar plates. Only one blue colony grew on each of the two plates.

LEARN

We initially speculated that the low efficiency of multi-fragment Gibson assembly or the low transformation efficiency of large plasmid might be the cause. We adjusted the conditions for seamless cloning as well as temperature and time during transformation, but the results did not change; we also ruled out the possibility of problems with primer design.

Meanwhile, the unusual results of another plasmid constructed by the experimental group drew our attention. In this recombinant plasmid, the glnK-gfp fusion product was inserted into the pBBR1MCS-2 plasmid vector to detect the glnK promoter's response to nitrogen. This plasmid vector was also linearized using EcoR Ⅰ and Xma Ⅰ, and colony PCR of the transformants indicated that most of them contained empty plasmid. Given the high efficiency of the kit we used for single-fragment seamless cloning, such a situation was quite unusual.

Therefore, we speculated that one of the two enzymes might have had low efficiency or even been inactive during the double digestion, resulting in extensive self-ligation of the vector and very few seamless cloning products. We suspected this could also explain the failure of the second iteration.

3rd Iteration

DESIGN

To investigate whether the restriction enzymes used for double digestion were functioning properly, we designed a validation experiment using single enzyme digestion. If a linearized plasmid band produced by a specific enzyme digestion exhibited low brightness while the corresponding band for the supercoiled plasmid showed high brightness, this would support our hypothesis.

BUILD

To verify the activity of the restriction enzymes, we employed the pBBR1MCS-2 plasmid as a substrate and performed single enzyme digestions with EcoR Ⅰ and Xma Ⅰ under the same conditions as in the second iteration.

TEST

The expected size for the linearized vector was 5132 bp, while the supercoiled plasmid typically exhibited a band around 3500 bp. The EcoR Ⅰ single enzyme digestion product showed a bright band at approximately 5000 bp with no band around 3500 bp. Conversely, the Xma Ⅰ single enzyme digestion product displayed only a faint band at the same position, and the band of supercoiled plasmid was bright.

Figure6 Agarose gel electrophoresis used to detect band brightness from the pBBR1MCS-2 plasmid after single enzyme digestions with EcoR Ⅰ or Xma Ⅰ. The pBBR1MCS-2 plasmid (5132 bp) was digested with EcoR Ⅰ (NEB) or Xma Ⅰ (NEB) at 37°C for 15 minutes.

LEARN

The results of agarose gel electrophoresis indicated that the EcoR Ⅰ restriction enzyme exhibited normal activity, whereas the activity of the Xma Ⅰ restriction enzyme was markedly low. This discrepancy resulted in very few recombinant products from our seamless cloning attempts. We speculated that repeated freeze-thaw cycles may have diminished the enzyme activity. Because the two restriction sites in the MCS were in close proximity, it was difficult to distinguish between single and double digestion products on the gel, which led us to misinterpret the enzyme digestion in the second iteration as successful.
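Why the single and double digests look the same on a gel can be shown with a short calculation. The sketch below computes fragment sizes for a circular plasmid; the 5132 bp length comes from the text, while the cut positions (3000 and 3030, i.e. sites 30 bp apart) are hypothetical values chosen only to represent two neighboring sites in an MCS.

```python
def digest_circular(plasmid_len: int, cut_sites):
    """Return fragment sizes (bp, largest first) after cutting a circular plasmid.

    `cut_sites` are positions on the plasmid; an empty list means uncut.
    """
    sites = sorted(set(s % plasmid_len for s in cut_sites))
    if not sites:
        return [plasmid_len]  # uncut plasmid stays circular
    # Each fragment runs from one site to the next one...
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    # ...and the last fragment wraps around the origin back to the first site.
    fragments.append(plasmid_len - sites[-1] + sites[0])
    return sorted(fragments, reverse=True)


if __name__ == "__main__":
    PLASMID_LEN = 5132             # pBBR1MCS-2 size stated in the text
    SITE_A, SITE_B = 3000, 3030    # hypothetical positions, 30 bp apart
    print(digest_circular(PLASMID_LEN, [SITE_A]))          # single digest
    print(digest_circular(PLASMID_LEN, [SITE_A, SITE_B]))  # double digest
```

Under these assumed positions, the single digest gives one 5132 bp band and the double digest gives a 5102 bp band plus a 30 bp fragment. A 30 bp size difference cannot be resolved on a standard agarose gel, and the small fragment is too short to be seen, so a failed second enzyme goes unnoticed.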

4th Iteration

DESIGN

Consequently, we abandoned the use of double digestion to linearize the plasmid and designed primers to linearize pBBR1MCS-2 by PCR. The linearized pBBR1MCS-2 was purified through gel extraction, and we employed Gibson assembly to combine it with the three fused DNA fragments purified in the second iteration, in order to construct the PUFA synthesis circuit plasmid.

BUILD

We obtained the correct bands through PCR and purified the linearized vector fragment using a gel extraction kit. Following the same protocol, we performed Gibson assembly and transformed the product into E. coli (DH5α). Then we selected white colonies for plasmid extraction.

Figure7 Agarose gel electrophoresis used to detect and purify the PCR-linearized pBBR1MCS-2 plasmid. The agarose gel (0.8%) was run at 150 V for 15 minutes in 1x TAE buffer and stained with GelRed DNA gel stain, and gel analysis was performed using a blue light illuminator. The correct size for the linearized vector was expected to be 5148 bp, and we purified the band indicated by the box.

TEST

We successfully screened a correct clone on LB solid medium containing kanamycin, and the extracted plasmid was notably large, with an electrophoretic mobility slower than the 10000 bp band of the 1 kb marker. Sanger sequencing revealed a nonsense mutation in the open reading frame of pfa1, which caused the termination codon to appear 113 amino acids early. Additionally, a frameshift mutation was detected in the pfa2 gene. Further sequencing of the corresponding mutation sites in pUC19-regulation module-PUFA1 and pUC19-PUFA2-PUFA3 revealed that the former was correct while the latter carried the same mutation.

Figure8 Agarose gel electrophoresis used to detect the recombinant pBBR1MCS-2 plasmid. The agarose gel (0.7%) was run at 150 V for 15 minutes in 1x TAE buffer and stained with Safegreen DNA gel stain, and gel analysis was performed using a blue light illuminator. The size of the PUFA synthesis circuit recombinant plasmid is 24479 bp.
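As a sanity check on the plasmid size, the fragment sizes reported in the figure captions can be summed and compared with the 24479 bp stated for the final plasmid. The sketch below does only this arithmetic. Reading the difference as homologous sequence shared between neighboring fragments (which Gibson assembly merges into a single copy) is an assumption; the overlap lengths themselves are not given on this page.

```python
def assembled_size(fragment_sizes, total_overlap: int) -> int:
    """Size of a circular assembly: fragment lengths minus sequence that is
    present twice (once at the end of each neighbor) before assembly."""
    return sum(fragment_sizes) - total_overlap


# Fragment sizes (bp) as reported in the Figure 3 and Figure 7 captions.
FRAGMENTS = {
    "regulatory sequence-PUFA1": 5836,
    "PUFA2-PUFA3": 7118,
    "PUFA4-PUFA5": 6482,
    "PCR-linearized pBBR1MCS-2": 5148,
}
REPORTED_PLASMID_SIZE = 24479  # bp, from the Figure 8 caption

raw_total = sum(FRAGMENTS.values())
# Assumption: the whole difference is overlap shared at the four junctions.
implied_overlap = raw_total - REPORTED_PLASMID_SIZE

if __name__ == "__main__":
    print(raw_total, implied_overlap)
    print(assembled_size(FRAGMENTS.values(), implied_overlap))
```

The four fragments sum to 24584 bp, 105 bp more than the reported plasmid size, which would be consistent with short homologous ends at the four junctions.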

Figure9 Sequencing results. The purified PUFA synthesis circuit recombinant plasmid was subjected to commercial Sanger sequencing, with the original sequence indicated in bold. The bottom row represents the sequencing data, with red boxes denoting mutations.

LEARN

The mutation in pfa1 led to a partial deletion of the substrate-binding domain of the enoyl-CoA reductase (ER), while the mutation in pfa2 disrupted the expression of hydroxyacyl-CoA dehydrase (DH). These critical mutations necessitated the abandonment of this plasmid for validation experiments.

5th Iteration

DESIGN

Due to delays in gene synthesis, sequencing, and plasmid reconstruction, we lacked sufficient time to introduce the target plasmid into Sinorhizobium fredii CCBAU45436 carrying the Cas12k gene and verify it on JiDou 17. Thus, we decided to separate the regulation module and the pfa biosynthetic gene cluster and conduct in vitro validation. For the regulation module, we planned to construct PnuoA-Cas12k-Terminator-regulation module in the pBBR1MCS-2 plasmid. We intended to use triparental mating to introduce the plasmid into Sinorhizobium fredii CCBAU45436, and to cultivate the resulting strains under high nitrogen-high oxygen, high nitrogen-low oxygen, low nitrogen-high oxygen, and low nitrogen-low oxygen conditions. With the nifH promoter regulating gfp expression, we would measure the ratio of fluorescence intensity to OD600 to quantitatively assess gfp expression, thereby validating the effectiveness of the regulation module.

As for the PUFA synthesis module, we aimed to construct PnuoA-pfa biosynthetic gene cluster in the pBBR1MCS-2 plasmid and introduce the plasmid into Sinorhizobium fredii CCBAU45436. The strains would be cultured in nutrient-rich TY liquid medium, and the cultures would be used for RT-qPCR to quantify the expression of the related genes.

BUILD

We reconstructed the aforementioned plasmids using existing parts and confirmed their accuracy by Sanger sequencing. Employing triparental mating, we introduced the two recombinant plasmids separately into Sinorhizobium fredii CCBAU45436, spreading the mixed bacterial culture onto TY solid medium and incubating at 28°C for 48 hours. The resulting mixed bacterial lawn was diluted and plated onto TY-TP-Kan solid medium, followed by incubation at 28°C for 4 days. We picked single colonies and streaked them onto TY-TP-Kan solid medium. Using colony PCR, we selected three clones containing the recombinant plasmid carrying PnuoA-Cas12k-Terminator and the regulation module.

To prepare media with nitrogen concentration gradients, we substituted the TY medium with M9 medium supplemented with \(NH_{4}Cl\) as the nitrogen source, allowing for quantitative adjustment of nitrogen levels. Based on the results of a preliminary nitrogen response experiment on the glnK promoter, we formulated a low-nitrogen medium with 0 g/L \(NH_{4}Cl\) and a high-nitrogen medium with 0.02 g/L \(NH_{4}Cl\). Additionally, to create hypoxic conditions in vitro, we boiled the M9 medium, added melted petroleum jelly, and then cooled it on ice before injecting the bacterial culture into the medium. This mixture was incubated with shaking at 28°C for 3 days. We designed four experimental groups based on oxygen and nitrogen levels, with three replicates for each group.

As for the PUFA synthesis module, we identified one clone containing the recombinant plasmid. After 2 days of shaking incubation, we used RNA extraction and reverse transcription kits to obtain cDNA from the +pfa strain and the wild-type strain. Meanwhile, we sent the bacterial cultures for gas chromatographic analysis.

TEST

Regulation module

We measured the relative fluorescence intensity of each culture under 488 nm excitation light using a fluorescence spectrophotometer, and measured the OD600 of each culture using a spectrophotometer. We then divided the relative fluorescence intensity by the OD600 to represent gfp expression per bacterium; the results are shown in Figure 10. Regardless of nitrogen concentration, the nifH promoter enhanced gfp expression in response to low oxygen concentration (Figure 10a, b). Regardless of oxygen concentration, gfp expression at high nitrogen was significantly higher than that at low nitrogen (Figure 10c, d). This suggests that our designed deterrent system can indeed increase sgRNA expression in response to elevated nitrogen, thereby suppressing the downstream gene's expression. Overall, gfp expression at high nitrogen and low oxygen was significantly higher than under the three other conditions, which is in line with expectations (Figure 11).

Figure10 Ratio of relative fluorescence intensity to OD600 value of the bacterial cultures. (a) Different oxygen conditions at low nitrogen (0 g/L \(NH_{4}Cl\)). (b) Different oxygen conditions at high nitrogen (0.02 g/L \(NH_{4}Cl\)). (c) Different conditions of nitrogen concentration during hypoxia. (d) Different conditions of nitrogen concentration during hyperoxia. Student’s t-test, ns: no significant difference; *, p-value < 0.05; **, p-value < 0.01; ***, p-value < 0.001.

Figure11 Overall ratio of relative fluorescence intensity to OD600 value of the bacterial cultures. Student’s t-test, ns: no significant difference; *, p-value < 0.05; **, p-value < 0.01; ***, p-value < 0.001.
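The normalization and comparison behind Figures 10 and 11 can be sketched as follows. The replicate values in the example are invented for illustration and are not our measurements; the two-sample Student's t statistic assumes equal variances, which is an assumption about how the test was applied. Only the t statistic and degrees of freedom are computed here; converting them to a p-value requires a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def normalized_fluorescence(fluorescence, od600):
    """Divide each fluorescence reading by the OD600 of the same culture."""
    return [f / od for f, od in zip(fluorescence, od600)]


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    # Pooled sample variance of the two groups.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical triplicates for two conditions (NOT measured data).
    low_oxygen = normalized_fluorescence([500, 600, 700], [0.5, 0.5, 0.5])
    high_oxygen = normalized_fluorescence([200, 300, 400], [0.5, 0.5, 0.5])
    t, df = student_t(low_oxygen, high_oxygen)
    print(f"t = {t:.3f}, df = {df}")
```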

Synthetic module

As for the PUFA synthesis module, using the endogenous nuoA gene as the control and 16S rRNA as the internal reference, we measured the transcription levels of the pfa genes by RT-qPCR. The results show that the transcription levels of the pfa genes were significantly higher than that of nuoA, and that the closer a gene was to the promoter, the higher its detected transcription level (Figure 12).
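A common way to turn RT-qPCR Ct values into relative transcription levels is the 2^-ΔΔCt calculation, sketched below. This page does not state which calculation we used, so treat this as an assumed method; it also assumes roughly 100% amplification efficiency for every primer pair. The Ct values in the example are invented.

```python
def relative_transcription(ct_target: float, ct_ref_target: float,
                           ct_control: float, ct_ref_control: float) -> float:
    """Relative level of a target gene versus a control gene (2^-ddCt).

    ct_target / ct_control: Ct of the gene of interest (e.g. a pfa gene)
        and of the control gene (e.g. nuoA).
    ct_ref_target / ct_ref_control: Ct of the internal reference
        (e.g. 16S rRNA) measured alongside each of them.
    """
    delta_target = ct_target - ct_ref_target      # normalize the target
    delta_control = ct_control - ct_ref_control   # normalize the control
    return 2 ** -(delta_target - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses the threshold two cycles
    # earlier than the control, i.e. about four-fold higher transcription.
    print(relative_transcription(20.0, 10.0, 22.0, 10.0))
```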

According to the gas chromatography results, the levels of most fatty acids showed no significant difference between WT and +pfa, but a new fatty acid, C14:0, was detected that never appeared in the wild type (Figure 13).

Figure12 The relative transcription level between nuoA and pfa genes measured by RT-qPCR. Student’s t-test, ns: no significant difference; *, p-value < 0.05; **, p-value<0.01; ***, p-value < 0.001.

Figure13 The relative content of fatty acids measured by gas chromatography.

LEARN

Regulation module

Although the results matched the model's expectations to some extent, we still detected a certain amount of gfp expression under the conditions other than high nitrogen and low oxygen. The difference in gfp expression between the high-nitrogen, low-oxygen condition and the others was not as large as desired.

This might be because our Cas12k is not efficient enough at transcriptional repression, and because there is some leaky expression from the glnK promoter. Although our initial idea was for Cas12k to act as a terminator, the experimental results indicate that it is still necessary to introduce a terminator after the sgRNA, which would restrict the leaky expression from the glnK promoter.

In addition, the design of the sgRNA can likewise be improved. We could insert several different sgRNAs after the glnK promoter, targeting different sequences of the downstream gene. Whether this would improve the efficiency of the Cas12k deterrent system is worth verifying.

Furthermore, laboratory conditions may not sufficiently imitate the natural environment of the nodule. For example, legumes produce leghemoglobin (soybean hemoglobin) for rhizobia, which binds oxygen to maintain a microaerobic environment for nitrogen fixation, whereas our boiling method may not achieve as low an oxygen level as in the nodule. Oxygen may also still be introduced when inoculating the bacteria and under shaking culture conditions, which might be one of the reasons why gfp expression in the high-nitrogen, low-oxygen group was not as high as we expected.

Synthetic module

The RT-qPCR results indicated that our pfa biosynthetic gene cluster was successfully transcribed. However, the differing transcription levels of pfa1, pfa2 and pfa3 demonstrate a significant decrease in transcription along the large gene cluster, which might limit the efficiency of PUFA synthesis. We hypothesized that the incompatibility between the weak nuoA promoter and the large PUFA biosynthetic gene cluster might be the cause, and a similar experiment supported our hypothesis (Figure 14). Thus, the nifH promoter from our original design might perform better in the same experiment, because it also initiates the transcription of a nitrogenase cluster.

Figure14 The relative transcription level between nuoA promoter and nifH promoter measured by RT-qPCR. Student’s t-test, ns: no significant difference; *, p-value < 0.05; **, p-value < 0.01; ***, p-value < 0.001.

As the gas chromatography results show, we unfortunately did not detect DHA or EPA. We propose several possible reasons. First, the reduction in transcription is likely the main cause, creating a bottleneck at pfa3, which encodes many important enzymes in the LC-PUFA synthetic pathway.

Second, insufficient incubation time might also be a cause. Because the experiment was conducted in vitro, we could not extend the incubation time indefinitely due to nutrient limitation. Thus, the rhizobia had only a short time to translate the PUFA synthase and synthesize LC-PUFA.

Overall, the experiments on the regulation module yielded results that fit our expectations relatively well, indicating that the design of the regulation module was relatively successful. Although we did not obtain all the ideal results for the synthesis module, some results show the effectiveness of our design and give a clear direction for iteration, improvement, and further validation. However, due to time constraints, we were unable to conduct more in-depth experiments.

PHB deletion

The composite part composed of pJQ200SK, Up-Arm phaC2 and Down-Arm phaC2 was engineered to delete the key gene phaC2, which is responsible for the synthesis of PHB in Sinorhizobium fredii CCBAU45436. The plasmid pJQ200SK can replicate in E. coli and can be conjugally transferred into rhizobia. The function of Up-Arm phaC2 and Down-Arm phaC2 is to facilitate homologous recombination with the regions upstream and downstream of the phaC2 gene. We first constructed pJQ200SK_phaC2_Deletion and used electron microscopy to observe bacterial cells to assess its deletion efficiency. Ultimately, after implantation, we obtained the expected results.

1st iteration

AIM: The objective of the experiment is to delete phaC2 to block the synthesis of PHB.

DESIGN: Through literature research, it was discovered that pJQ200SK can replicate in E. coli and can be conjugally transferred into rhizobia. Besides, it cannot replicate in rhizobia and is a suicide plasmid. The upstream and downstream DNA fragments of the target gene can be sequentially cloned into pJQ200SK to construct a homologous double-crossover suicide plasmid. This plasmid can be introduced into rhizobia by triparental mating, then DNA recombination between the plasmid and the genome is achieved under the action of the homologous recombination repair system. In this way, the target gene can be deleted. As a result, we attempted to use this method to delete phaC2 in Sinorhizobium fredii CCBAU45436. Figure 15 shows the circuit we developed.

Figure15 Deletion_phaC2_plasmid
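The outcome of the double crossover described in the DESIGN can be illustrated on a toy sequence. In this sketch the genome string, coordinates and arm length are hypothetical, and each arm is allowed to reach `overlap` bases into the gene, mirroring arms that include partial phaC2 sequence. The code models only the net result of the two recombination events, not the recombination mechanism.

```python
def homology_arms(genome: str, gene_start: int, gene_end: int,
                  arm_len: int, overlap: int = 0):
    """Pick an upstream and a downstream arm around genome[gene_start:gene_end].

    Each arm extends `overlap` bases into the gene.
    """
    up_end = gene_start + overlap
    down_start = gene_end - overlap
    if up_end - arm_len < 0 or down_start + arm_len > len(genome):
        raise ValueError("arm would run off the end of the sequence")
    return genome[up_end - arm_len:up_end], genome[down_start:down_start + arm_len]


def delete_between_arms(genome: str, up_arm: str, down_arm: str) -> str:
    """Net result of a double crossover: everything between the two arms is lost."""
    i = genome.index(up_arm) + len(up_arm)   # end of the upstream arm
    j = genome.index(down_arm, i)            # start of the downstream arm
    return genome[:i] + genome[j:]


if __name__ == "__main__":
    # Toy "genome": 8 bp flank + 12 bp gene + 8 bp flank (hypothetical).
    genome = "ACGTTGCA" + "GGGGCCCCGGGG" + "TACGATCG"
    up, down = homology_arms(genome, gene_start=8, gene_end=20,
                             arm_len=6, overlap=2)
    print(up, down, delete_between_arms(genome, up, down))
```

With arms that overlap the gene by 2 bases on each side, 8 of the 12 gene bases are removed in this toy case, so the deletion is internal to the gene rather than exact.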

BUILD: Using PCR, we successfully obtained the upstream fragment (700 bp) and downstream fragment (700 bp) of phaC2 from the genome of Sinorhizobium fredii CCBAU45436. We used Gibson assembly to insert these two fragments into the multiple cloning site of pJQ200SK. The recombinant vector was transformed into E. coli (DH5α). The construct was successfully built, and its accuracy was confirmed by colony PCR and sequencing.

LEARN: After literature research and analysis on the results, we suspect that the reason for the unsuccessful homologous recombination is the lack of sequence specificity in the upstream and downstream fragments.

2nd iteration

DESIGN: To increase sequence specificity, we re-selected the upstream(700bp) and downstream(700bp) fragments, which consist of genomic fragments upstream and downstream of phaC2 and partial sequences of phaC2.

Figure16 Single recombinants on solid medium.

Then we used the sucrose-killing gene sacB to screen for double recombinants. sacB encodes levansucrase, which catalyzes the hydrolysis of sucrose into glucose and fructose and also polymerizes fructose into high-molecular-weight levan. The accumulation of high-molecular-weight levan is toxic to cells and leads to cell death, and this counter-selection is applicable to various Gram-negative bacteria. Unfortunately, we did not obtain any double recombinants.

LEARN: Although longer homologous regions can provide more sequence similarity, which helps the recognition of recombinase and promotes the occurrence of recombination events, overly long homologous regions may have some negative effects. For example, excessively long homologous regions might affect the stability of the vector. We suspect that the reason for the lack of double recombination is the unsuitable length of the upstream and downstream fragments.

3rd iteration

DESIGN: We adjusted the upstream(500bp) and downstream(500bp) fragments, which consist of genomic fragments upstream and downstream of phaC2 and partial sequences of phaC2.

BUILD: Using PCR, we successfully obtained the upstream fragment (500 bp) and downstream fragment (500 bp) of phaC2 from the genome of Sinorhizobium fredii CCBAU45436. We used Gibson assembly to insert these two fragments into the multiple cloning site of pJQ200SK. The recombinant vector was transformed into E. coli (DH5α). The construct was successfully built, and its accuracy was confirmed by colony PCR and sequencing.

TEST: Using triparental mating, we introduced the plasmid into Sinorhizobium fredii CCBAU45436. Using the Gm resistance carried on pJQ200SK, we screened for and successfully obtained single recombinants. We then used the sucrose-killing gene to screen for double recombinants. This time, we successfully obtained double recombinants (Figure 17).

Figure17 Double recombinants on solid medium.

The deletion was then confirmed by colony PCR (Figure 18) and sequencing (Figure 19).

Figure18 Results of colony PCR.

Figure19 Sequencing result

LEARN 3.0: The pJQ200SK_phaC2_Deletion we designed definitely works. We successfully deleted the phaC2 gene and explored factors that affect the effectiveness of the homologous recombination. Next, we conducted phenotypic experiments to compare the differences between the mutant and wild-type strains, testing the effect of inhibiting the PHB synthesis, which will be presented in the “Results” section.

Suicide circuit

1st iteration

DESIGN: Through literature research, it was discovered that pJQ200SK can be introduced into rhizobia using triparental mating, and DNA recombination between the plasmid and the genome is achieved under the action of the homologous recombination repair system. In this way, the suicide circuit can be integrated into the genome of the rhizobia.

BUILD: Using triparental mating, we introduced the plasmid into Sinorhizobium fredii CCBAU45436. Using the Gm resistance carried on pJQ200SK, we screened for and successfully obtained single recombinants.

TEST: We used a sucrose-killing gene to screen for double recombinants. We selected some positive colonies and cultured them overnight on a shaker at 28°C. We then spread the culture on solid media containing sucrose and NA and cultured them overnight at 28°C. Unfortunately, we encountered microbial contamination (Figure 20).

Figure20 Contaminated solid media

LEARN: After literature research and reflection on the experimental procedure, we suspect that the contamination arose because the rhizobia degraded one of the antibiotics in the culture media, allowing contaminating bacteria without resistance to the antibiotics we used to survive. Moreover, because a suicide pathway had been introduced into the rhizobia, the contaminating bacteria had a competitive advantage over them, which led to the contamination.

2nd iteration

AIM: We suspect that the positive colonies on the plates used for single recombinants screening are contaminated with miscellaneous bacteria, so we want to further purify these positive colonies.

DESIGN: The streak plate method is a commonly used technique for the isolation and purification of microorganisms. It separates single colonies at low cost and is easy to perform. The growth of colonies is easy to observe, and the isolated single colonies are convenient to store and less prone to contamination. For these reasons, we decided to use the streak plate method for further purification of the positive colonies.

TEST: We picked positive colonies from the solid media used for screening single recombinants and streaked them successively on the solid media used for screening double recombinants. After incubating at 28°C for 5 days, we observed transparent colonies (Figure 21). We then identified these colonies as double recombinants using colony PCR (Figure 22).

Figure21 Transparent colonies

Figure22 Electropherogram. Marker; double recombinants (lanes 4 and 5); WT (lane 10).

LEARN 2.0: Our purification method definitely works. We have successfully purified single recombinants and screened for double recombinants. Next, we conducted stage verification to test the function of our suicide circuit, which will be presented in the “Results” section.

Dry lab engineering

Software

1st iteration: Software Construction

DESIGN: At first, we wanted to design simple software that retrieves pathways when a substrate is entered. We therefore intended to create a database in which we could store and fetch data.

BUILD: We chose MATLAB App Designer to create the software interface and MySQL to store and fetch data. We stored the selected data in MySQL and then wrote code in MATLAB App Designer that connects to MySQL to read the data.

Figure23 Picture of using MySQL to store data

TEST: After constant debugging, we finally made it. When we entered a substrate, it would pop up all the relevant pathways we had stored in our MySQL database.
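The substrate-to-pathway lookup can be illustrated with a small, self-contained sketch. It is not our MATLAB/MySQL code: Python's built-in `sqlite3` module stands in for MySQL so the example runs without a database server, and the table name, column names and rows are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical pathways table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pathways (substrate TEXT, pathway TEXT)")
    conn.executemany(
        "INSERT INTO pathways VALUES (?, ?)",
        [
            ("glucose", "glycolysis"),
            ("glucose", "pentose phosphate pathway"),
            ("acetyl-CoA", "fatty acid synthesis"),
        ],
    )
    return conn


def find_pathways(conn: sqlite3.Connection, substrate: str):
    """Return all stored pathways for a substrate (empty list if none)."""
    rows = conn.execute(
        "SELECT pathway FROM pathways WHERE substrate = ? ORDER BY pathway",
        (substrate,),  # parameterized query: user input is never pasted into SQL
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_pathways(conn, "glucose"))
    print(find_pathways(conn, "unknown substrate"))
```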

Figure24 Picture of the initial software interface

LEARN: With the software, entering a substrate directly retrieves the relevant pathways, which makes calling up the data very convenient.

2nd iteration: Changing Databases

AIM: We realized that many users do not have a MySQL database configured on their computers, and that connecting to the database might be very inconvenient for them. We wanted our software to be friendly to non-professional users.

DESIGN: We finally selected EXCEL as our database to store data, because EXCEL is a widely used tool and it is very convenient to fetch data from it.

BUILD: We changed the database to EXCEL by adjusting the code.
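Moving from a database server to a spreadsheet mainly changes how the table is loaded; the lookup stays the same. The sketch below shows the file-based variant under stated assumptions: a CSV export stands in for the EXCEL workbook (reading `.xlsx` directly would need a spreadsheet library), and the column names and rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sheet contents, as they might look after export to CSV.
SHEET = """substrate,pathway
glucose,glycolysis
glucose,pentose phosphate pathway
acetyl-CoA,fatty acid synthesis
"""


def load_pathways(handle):
    """Read substrate/pathway rows into a dict: substrate -> list of pathways."""
    table = defaultdict(list)
    for row in csv.DictReader(handle):
        # Lower-case the key so lookups are not case-sensitive.
        table[row["substrate"].strip().lower()].append(row["pathway"].strip())
    return dict(table)


def find_pathways(table, substrate: str):
    """Return the pathways stored for a substrate (empty list if none)."""
    return table.get(substrate.strip().lower(), [])


if __name__ == "__main__":
    table = load_pathways(io.StringIO(SHEET))
    print(find_pathways(table, "Glucose"))
```

Because the whole table is read from a file that ships with the software, users need no database installation or connection settings.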

TEST: We made the polished software available to other members of the team and found that it could be used by a much wider range of users than the MySQL version.

LEARN: If we want to build software that can be used widely, it's better to use configurations that most people are using.

3rd iteration: Configuration Files of Pathways

AIM: In addition to displaying the relevant pathways retrieved, we wanted to display the concentrations of each product in the pathway in real time, to investigate the changes of the products we focused on.

DESIGN: We chose SSH keys as a linker between Computer terminal and files of pathways hosted on GitLab.

BUILD: We configured the SSH public key, posted the relevant files of pathways on GitLab, and wrote the code to use SSH to clone the files into the main folder of the computer for calling and running.
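Cloning the pathway files over SSH can be sketched as below. This is an illustration rather than our actual code: the repository URL is a placeholder, the function names are hypothetical, and the example assumes that `git` is installed and that the user's SSH public key has already been registered on GitLab.

```python
import subprocess
from pathlib import Path

# Placeholder SSH URL; the real repository address is not given on this page.
EXAMPLE_REPO = "git@gitlab.example.org:team/pathway-files.git"


def clone_command(ssh_url: str, dest) -> list:
    """Build the git command that clones the pathway files (latest commit only)."""
    return ["git", "clone", "--depth", "1", ssh_url, str(dest)]


def fetch_pathway_files(ssh_url: str, dest) -> Path:
    """Clone the repository into `dest` unless that folder already exists."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already present: reuse the local copy
    # check=True raises CalledProcessError if git fails (e.g. SSH key rejected).
    subprocess.run(clone_command(ssh_url, dest), check=True)
    return dest


if __name__ == "__main__":
    # Only prints the command; call fetch_pathway_files() to actually clone.
    print(" ".join(clone_command(EXAMPLE_REPO, "pathway-files")))
```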

TEST: After continuous debugging, we successfully realized the process of inputting substrates - retrieving relevant pathways - selecting pathways - running pathway files.

LEARN: With real-time visualization of product concentrations, we hope to make pathway investigation easier for other groups.

4th iteration: Modification of Concentration

AIM: As we looked more closely at usability, we realized that quantitative analysis was limited because users could not modify substrate concentrations. We therefore put significant effort into letting users adjust substrate concentrations, so that concentration curves can be generated for the various products.

DESIGN: We planned to read data from Excel when a substrate is entered, in order to display the relevant pathways. The software then matches the pathway name to the files cloned via SSH. Within the matched file, it searches for the name of the entered substrate so that the user can set the desired concentration value. Once the change is made, the software runs the modified file. After the run completes, the substrate values are restored to their initial state.
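The modify, run, and restore steps in this design can be sketched in Python as follows. This is only an illustration under stated assumptions, not our MATLAB implementation: the pathway file format (`name = value` lines), the file name, and the function names `temporary_concentration` and `demo` are all hypothetical.

```python
import re
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_concentration(path, substrate, value):
    """Temporarily overwrite one substrate value in a pathway file, then restore it."""
    path = Path(path)
    original = path.read_text()
    # Assumed (hypothetical) line format: "<substrate> = <number>" at the start of a line.
    pattern = re.compile(rf"^({re.escape(substrate)}\s*=\s*)\S+", re.MULTILINE)
    modified, count = pattern.subn(lambda m: m.group(1) + str(value), original)
    if count == 0:
        raise KeyError(f"substrate {substrate!r} not found in {path.name}")
    path.write_text(modified)
    try:
        yield path
    finally:
        # Restore the initial values even if running the file fails.
        path.write_text(original)


def demo():
    """Apply the override to a throwaway file; return its text during and after."""
    with tempfile.TemporaryDirectory() as tmp:
        pathway = Path(tmp) / "pathway_example.txt"  # hypothetical file name
        pathway.write_text("glucose = 1.0\nATP = 0.5\n")
        with temporary_concentration(pathway, "glucose", 2.5) as p:
            during = p.read_text()  # the real tool would run the pathway file here
        after = pathway.read_text()
    return during, after
```

Wrapping the change in a context manager is one way to guarantee the restore step: the original text is written back whether or not the run inside the `with` block succeeds.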

BUILD: We implemented this workflow in code.

TEST: The result was satisfactory. For more information on our final results, please visit our wiki page.

LEARN: Please see the discussion on our wiki page.

5th iteration: Software beautification and error tips

AIM: We wanted to give users a great experience.

DESIGN: We decided to beautify the interface and add some feedback services.

BUILD: We designed the software logo and interface images ourselves, then polished the curve graphs for the files we collected and added to each graph the legend that was missing from the original version. Finally, we added the error-reporting program. We made sure that the design is user-friendly as well as visually appealing.

TEST: We tested the error-reporting program to make sure it worked.

Figure25 Picture of the beautified software

Figure26 Picture of error tip

LEARN: Our software appears to have great potential for application. We are continually expanding our database beyond our foundational metabolic pathways to include pathways related to immune regulation, gene expression, and cell differentiation. This expansion aims to create a more comprehensive and powerful resource for our users.

Nodule Formation Model

1st iteration: 2D cellular automata

DESIGN: Since our project focuses on nodules, we wanted to predict nodule formation and provide guidance for the project. The growth of Sinorhizobium and the formation of nodules follow a definite pattern with strong periodicity. We started with 2D cellular automata to simulate changes in a nodule cross-section. Through literature research, we gained a basic understanding of the mechanism.

BUILD: Based on the literature, we made some assumptions about the nodulation mechanism to simplify the model. We then defined the four states of the cellular automata: empty, root, bacteria, and sheath. Finally, we set up the Moore neighborhoods and the rules for state transitions.
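A minimal Python sketch of such an automaton is shown below. The four state names come from the description above, but the integer encoding and both transition rules are illustrative assumptions, not the rules used in our model.

```python
# Four cell states; the integer encoding is an assumption for this sketch.
EMPTY, ROOT, BACTERIA, SHEATH = range(4)


def moore_neighbors(grid, r, c):
    """Return the states of the up-to-8 Moore neighbors of cell (r, c)."""
    rows, cols = len(grid), len(grid[0])
    states = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                states.append(grid[rr][cc])
    return states


def step(grid):
    """Apply one synchronous update; every cell reads only the old grid."""
    new = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, state in enumerate(row):
            nb = moore_neighbors(grid, r, c)
            # Hypothetical rule 1: bacteria spread into empty cells next to root.
            if state == EMPTY and BACTERIA in nb and ROOT in nb:
                new[r][c] = BACTERIA
            # Hypothetical rule 2: crowded bacteria become sheath.
            elif state == BACTERIA and nb.count(BACTERIA) >= 3:
                new[r][c] = SHEATH
    return new
```

Copying the grid before updating keeps the update synchronous, so the result does not depend on the order in which cells are visited.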

TEST: We wrote the code and visualized the results in Python. The code establishes a 2D coordinate system and implements the logic of the cellular automata. The results provide a good picture of the dynamics of the 2D cross-section during nodule formation.

LEARN: From the simulation results, we found that 2D cellular automata cannot adequately simulate nodules, which have 3D structures. For example, interactions between Sinorhizobium in the longitudinal direction are difficult to represent, and the model tells us nothing about the formation mechanism along the z-axis. Also, since we want to calculate DHA production, we need to quantify the amount of Sinorhizobium in the nodules, which requires further improvement.

2nd iteration: 3D cellular automata 1.0

AIM: We need to simulate nodule formation along the z-axis and quantify the changes in the number of Sinorhizobium in the nodules, which is the basis for calculating the DHA yield.

DESIGN: To model in 3D, we switched from 2D to 3D cellular automata and moved the whole system to a 3D coordinate system.

BUILD: We kept the four states of the cellular automata and changed the neighborhoods to 3D Moore neighborhoods. In addition, we adapted the rules to 3D interactions.
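For reference, a 3D Moore neighborhood contains the 26 cells that surround a given cell. The sketch below enumerates it in Python; the names `MOORE_3D` and `neighbors_3d` and the choice to drop out-of-bounds neighbors at the grid edge are assumptions made for this illustration.

```python
from itertools import product

# All offsets (dx, dy, dz) in {-1, 0, 1}^3 except the cell itself: 26 in total.
MOORE_3D = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def neighbors_3d(shape, x, y, z):
    """Return the in-bounds Moore neighbors of (x, y, z) in a grid of the given shape."""
    nx, ny, nz = shape
    return [
        (x + dx, y + dy, z + dz)
        for dx, dy, dz in MOORE_3D
        if 0 <= x + dx < nx and 0 <= y + dy < ny and 0 <= z + dz < nz
    ]
```

An interior cell therefore has 26 neighbors, compared with 8 in the 2D case, while a corner cell of the grid has only 7.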

TEST: We used Python to code and visualize the program. The code establishes a 3D coordinate system and implements the logic of the cellular automata. The result is a good representation of the dynamics of nodule formation in three dimensions.

LEARN: 3D cellular automata can represent nodule formation in three dimensions. However, when we showed the simulation results to our instructor Prof. Jiao, he pointed out that the existing rules for nodule formation were too simple to reproduce actual growth, and he recommended literature for us to read so that we could revise the rules. In addition, we found that the visual output was not appealing and needed to be improved.

Figure27 Result of 3D cellular automata 1.0

3rd iteration: 3D cellular automata 2.0

AIM: Set up more detailed 3D cellular automata rules based on the latest literature on nodule formation. Improve the appearance of the results. Plot the number of cells in each state against the iteration number for quantitative calculations.

DESIGN: We expanded the set of states in the 3D cellular automata, strictly following the mechanism of nodule formation described in the literature, and formulated new state transition rules.

BUILD: We set up seven states of the cellular automata: empty, root, bacteria, attached bacteria, infection thread, nodule, and \(N_{2}\)-fixing nodule. As before, we used 3D Moore neighborhoods. We then developed new, more reliable rules and expressed them as 3D interactions.

TEST: We wrote and visualized the program in Python. The code establishes a 3D coordinate system and implements the logic of the cellular automata. We improved the appearance of the results by using a different color for each state.

LEARN: 3D cellular automata 2.0 simulates the actual course of nodule formation very well and provides a valuable reference for the whole project. We obtained the number of cells in each state as a function of the iteration number, which facilitates subsequent quantitative calculations such as the prediction of DHA yield.
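Counting states per iteration can be sketched as below. The state names are those listed in the BUILD step; the integer encoding, the nested-list grid layout, and the function names are assumptions for this illustration rather than a description of our actual code.

```python
from collections import Counter

# State names from the BUILD step; mapping them to indices 0..6 is an assumption.
STATE_NAMES = [
    "empty", "root", "bacteria", "attached bacteria",
    "infection thread", "nodule", "N2-fixing nodule",
]


def count_states(grid):
    """Count cells per state in a 3D grid stored as nested lists of state indices."""
    counts = Counter(cell for plane in grid for row in plane for cell in row)
    return {name: counts.get(i, 0) for i, name in enumerate(STATE_NAMES)}


def record_history(snapshots):
    """Turn a sequence of grids (one per iteration) into one time series per state."""
    history = {name: [] for name in STATE_NAMES}
    for grid in snapshots:
        for name, n in count_states(grid).items():
            history[name].append(n)
    return history
```

Each list in the returned dictionary can be plotted directly against the iteration index, and a series such as the bacteria count can feed later quantitative estimates.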

Figure28 Result of 3D cellular automata 2.0

Nitrogen Regulatory Protein Docking Model

1st iteration: AlphaFold3 model

DESIGN: The production of DHA is regulated by nitrogen. Although the literature shows that regulation by the NtrC protein works in Rhizobium etli, we need to know whether the parts we selected are suitable for S. fredii CCBAU45436. Furthermore, this lets us test whether nitrogen regulation is broadly conserved among rhizobia. In short, the essence of the problem is to simulate protein-DNA docking, where the DNA sequence is known but the structure of the NtrC protein needs to be predicted.

BUILD: We chose AlphaFold3 to predict the structure of the NtrC protein in both species, and then used AlphaFold3 again to predict the protein-DNA docking. Afterwards, we used PyMOL to visualize and analyze the interactions between the bases and amino acid residues involved.

TEST: AlphaFold3 predicted the protein structure with high confidence. However, its prediction of protein-DNA docking was unsatisfactory: the predicted docking of NtrC with the Rhizobium etli glnK/amtB operon is the opposite of what has been demonstrated in the literature. We visualized the docking results and analyzed the parameters in PyMOL.

LEARN: We concluded that AlphaFold3 has limitations in predicting protein-DNA docking, since it could not reproduce conclusions already established in the literature. We therefore decided to replace AlphaFold3 with a model specialized in protein-DNA docking.

Figure29 The AlphaFold3 docking result of NtrC protein and glnK/amtB operon in Rhizobium etli

2nd iteration: HDOCK model

AIM: Once the model reproduces the literature findings for Rhizobium etli, we will use it to predict whether the same regulatory mechanism can be used in S. fredii CCBAU45436.

DESIGN: After reviewing a large amount of literature, we decided to predict protein-DNA docking with HDOCK, which combines template-based and free docking and allows efficient prediction of protein-DNA binding.

BUILD: We imported the .pdb file of the NtrC protein structure predicted by AlphaFold3, together with the glnK/amtB operon sequence, into HDOCK for prediction. We then visualized the docking results and analyzed the parameters in PyMOL.

TEST: The docking confidence score given by HDOCK was consistent with the findings in the literature and was higher for the S. fredii CCBAU45436 prediction. In addition, we used PyMOL to determine the bases and residues at the binding site, the hydrogen bond lengths, the length of the protein-DNA binding, and the lengths of the interactions.

LEARN: HDOCK predicted protein-DNA docking well in our case: it reproduced the literature result and supported our conjecture. In our chassis S. fredii CCBAU45436, the NtrC protein is homologous to that of Rhizobium etli and is predicted to perform the same function. This greatly increases our confidence in regulating DHA synthesis with nitrogen.

Figure30 The HDOCK docking result of NtrC protein and glnK/amtB operon in S. fredii CCBAU45436

DHA Yield Prediction Model

1st iteration: Flux Balance Analysis (FBA) Model

DESIGN: Initially, we focused on constructing a model that captured a static state of the metabolic system, not accounting for time-dependent changes. To ensure data accuracy and analyze the system’s steady-state, we chose Flux Balance Analysis (FBA) as our modeling approach. FBA allows us to simulate metabolic fluxes based on mass balance constraints without considering dynamics over time.
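For readers unfamiliar with FBA, its standard formulation is a linear program. The symbols below are generic textbook notation, not taken from our model files: \(S\) is the stoichiometric matrix (metabolites by reactions), \(v\) the vector of reaction fluxes, \(c\) the objective weights (for example, selecting a flux of interest), and \(v^{\min}, v^{\max}\) the flux bounds.

\[
\begin{aligned}
\max_{v}\quad & c^{\mathsf T} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
\]

The constraint \(S\,v = 0\) is the steady-state mass balance: every internal metabolite is produced and consumed at equal rates, which is why time does not appear in the problem.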

BUILD: The FBA model was developed and implemented to simulate the metabolic pathways under static conditions. This provided a foundation for understanding how metabolic fluxes were distributed across various pathways in the system.

TEST: We ran simulations with the FBA model to predict the metabolic fluxes under given conditions. These results were useful for identifying potential bottlenecks and optimal flux distributions within the metabolic network.

LEARN: While the FBA model provided insights into the static distribution of metabolic fluxes, it became clear that real biological systems exhibit dynamic behavior over time, which could not be captured by this model. As a result, we concluded that to better simulate the system, we needed to account for time-dependent changes, leading us to the next iteration.

2nd iteration: Ordinary Differential Equations (ODE) Model

AIM: To incorporate time-dependent dynamics and better simulate biological processes, we aimed to build a model that could reflect changes in the system over time.

DESIGN: We designed an Ordinary Differential Equations (ODE) model that would enable us to track the time-course behavior of key metabolites and enzymes. The ODE model provided a framework for understanding how the concentrations of these components evolve over time based on reaction rates and other dynamic factors.

BUILD: The ODE model was constructed and implemented, with time-dependent equations representing the interactions between metabolites and enzymes. This allowed us to simulate the temporal changes in the system under various initial conditions.
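As a minimal illustration of this kind of time-course simulation, the Python sketch below integrates a toy one-reaction system with a fixed-step fourth-order Runge-Kutta scheme. The single Michaelis-Menten reaction, the parameter values, and all function names are assumptions chosen for brevity; they are not our actual equations.

```python
def rk4_step(f, t, y, h):
    """Advance the state y by one classical Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def michaelis_menten(vmax, km):
    """Toy system: substrate s is converted to product p at rate vmax*s/(km+s)."""
    def rhs(t, y):
        s, p = y
        rate = vmax * s / (km + s)
        return [-rate, rate]
    return rhs


def simulate(f, y0, t_end, h=0.01):
    """Integrate from t = 0 to t_end and return the list of (t, state) points."""
    t, y = 0.0, list(y0)
    trajectory = [(t, y)]
    for _ in range(round(t_end / h)):
        y = rk4_step(f, t, y, h)
        t += h
        trajectory.append((t, y))
    return trajectory
```

For example, `simulate(michaelis_menten(1.0, 0.5), [1.0, 0.0], 5.0)` returns the time course of substrate and product from the chosen initial condition; changing `y0` corresponds to simulating under different initial conditions.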

TEST: We ran simulations using the ODE model and performed sensitivity analysis to evaluate how changes in various parameters affected the system’s behavior over time. These tests revealed key insights into which parameters had the most significant impact on the model's outputs.
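One common way to carry out such an analysis is a normalized local sensitivity coefficient, estimated by perturbing one parameter at a time. The Python sketch below assumes this approach; the source does not state which method we used, and `relative_sensitivity` and `toy_rate` are hypothetical names with a toy rate law standing in for the full model.

```python
def relative_sensitivity(model, params, name, delta=0.01):
    """Central-difference estimate of d ln(output) / d ln(params[name])."""
    up = dict(params, **{name: params[name] * (1 + delta)})
    down = dict(params, **{name: params[name] * (1 - delta)})
    base = model(**params)
    return (model(**up) - model(**down)) / (2 * delta * base)


def toy_rate(vmax, km, s):
    """Stand-in model output: a Michaelis-Menten rate (an assumption for this sketch)."""
    return vmax * s / (km + s)
```

A coefficient near 1 means the output scales proportionally with the parameter, a value near 0 means the parameter barely matters, and a negative value means increasing the parameter lowers the output. Ranking parameters by the magnitude of this coefficient identifies those with the largest impact.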

Figure31 Initial results of the ODE model

LEARN: Sensitivity analysis showed that the activity of PUFA synthase had a major effect on the model’s results. It was also observed that PUFA synthase activity is regulated by nitrogen availability, indicating that nitrogen dynamics play a critical role in the system. This prompted the need to incorporate nitrogen absorption into the model.

3rd iteration: Nitrogen Absorption Model

AIM: The next goal was to incorporate the influence of nitrogen into the model to better capture the regulation of PUFA synthase and its effects on the metabolic pathway.

DESIGN: A Nitrogen Absorption Model was designed to account for the effects of nitrogen availability on PUFA synthase activity. This model would enable us to simulate how fluctuations in nitrogen levels impact the system’s metabolic processes.

BUILD: The Nitrogen Absorption Model was integrated with the existing ODE framework. We included equations that represented nitrogen uptake and its regulatory effect on PUFA synthase, ensuring that the interactions between nitrogen availability and enzyme activity were accurately reflected.
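To make the structure of such a coupling concrete, one possible form is sketched below. It is an assumption for illustration only, not the equations of our model: \(N\) is the available nitrogen, \(X\) the bacterial biomass, \(V_{\max}\) and \(K_m\) the uptake parameters, \(D\) the DHA level, \(S\) a precursor pool, \(k\) and \(\delta\) rate constants, and \(g(N)\) a regulatory factor between 0 and 1 whose shape and direction are deliberately left unspecified.

\[
\begin{aligned}
\frac{dN}{dt} &= -\frac{V_{\max}\,N}{K_m + N}\,X, \\
\frac{dD}{dt} &= k\,g(N)\,S - \delta\,D.
\end{aligned}
\]

The first equation describes saturable nitrogen uptake, and the second shows where nitrogen enters the synthesis step: only through the factor \(g(N)\) that scales PUFA synthase activity.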

TEST: Simulations were conducted to assess how varying levels of nitrogen affected the system over time. These tests provided a deeper understanding of the relationship between nitrogen absorption and DHA yield.

LEARN: The Nitrogen Absorption Model successfully demonstrated the critical role of nitrogen in DHA yield prediction. The integrated model now offers a more accurate depiction of the biological system, showing how nitrogen availability influences metabolic fluxes and fatty acid synthesis over time.

Suicide Circuit Model

1st iteration: Toxin Expression Model

DESIGN: In the initial stages, we referenced Part: BBa_K3512042. We wrote the toxin expression equations for our suicide circuit on the basis of the differential equations given for that part. We did not take into account the influence of newly generated proteins on protein production.

BUILD: We wrote basic differential equations according to the interactions in the circuit and ran the model in MATLAB.

TEST: At first, the model always produced errors when we ran it, and sometimes no solution could be found. Obtaining a solution took a long time, and the resulting curve looked abnormal.

LEARN: By consulting the literature and repeatedly analyzing the interactions between the components, we found logical flaws in our equations. The equations were not accurate enough because we had not correctly understood how the substances interact.

2nd iteration: Suicide Circuit Model

AIM: We wanted to rework the interactions and improve the differential equations in order to obtain a more accurate result.

DESIGN: We reformulated the differential equations with a deeper understanding of the circuit and made the equations more realistic by adding a term through which newly generated proteins slow down further protein production.

BUILD: We incorporated a mechanism in which the accumulation of newly generated proteins negatively regulates their own production, modeled the two response-regulated promoters, and included the interaction between Cas12K and sg-mCherry as well as the mutual binding dynamics of VapC and VapB.
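The effect of such a self-limiting production term can be illustrated with a single-protein toy model in Python. The Hill-type repression form, the parameter values, and the function names are assumptions for this sketch; our MATLAB model contains more species and different parameters.

```python
def production(p, alpha, k, n):
    """Hill-type self-repression: production falls as the protein level p rises."""
    return alpha / (1.0 + (p / k) ** n)


def simulate_autoregulation(alpha=1.0, k=1.0, n=1.0, delta=1.0,
                            p0=0.0, t_end=50.0, h=0.01):
    """Forward-Euler integration of dp/dt = production(p) - delta * p."""
    p = p0
    for _ in range(round(t_end / h)):
        p += h * (production(p, alpha, k, n) - delta * p)
    return p
```

Without the feedback, the protein level would settle at `alpha / delta`; with it, the steady state is lower because production is throttled as the protein accumulates.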

TEST: We ran the model in MATLAB, and the result was as expected: only in a high-nitrogen, high-oxygen environment does the net expression of toxins increase significantly, leading to the suicide of Sinorhizobium.

Figure32 Expression content plot of toxins in different cases

LEARN: We concluded that the suicide pathway is safe and feasible in a single invariant environment.

3rd iteration: Whole Process of Plant Growth Model

AIM: The growth environment of Sinorhizobium is variable, so, building on the single invariant environment, we wanted to know whether the suicide circuit is safe throughout the whole growth process. Besides that, we wanted our results to be more intuitive and visually appealing.

DESIGN: We set up different differential equations according to the environmental changes during Sinorhizobium growth. We adjusted the colors of the curves and refined the legends to improve the visual presentation.

TEST: The result was satisfactory. In the early stage, the amount of toxin was too low to kill the bacteria. Once Sinorhizobium escaped from the nodules, the toxin content increased quickly, leading to the death of Sinorhizobium and thus preventing gene pollution and other issues. The presentation of the results was also visually clear.

Figure33 Expression content plot of toxins in different cases (plus)

Figure34 Expression content plot of toxins in whole process

LEARN: We concluded that our suicide circuit is safe and feasible. Through iterative modifications and refinements of these equations, we developed more complex and rigorous parameterized differential equations. This comprehensive modeling effectively simulates the full process of the suicide circuit's operational mechanisms.