In our project, we optimized promoters, signal peptides, strains, and vectors over multiple engineering cycles, ultimately constructing a highly efficient system for ICCG expression. Our main contributions fall under Parts and Models, detailed as follows:
We conducted three rounds of "Design-Build-Test-Learn" cycles for promoter optimization. First, we generated an Mtac promoter library by random mutagenesis and used the screening data to build a CNN-LSTM model that predicts promoter strength. Then, we used a genetic algorithm (GA) and a binomial statistical model to design new DMtac promoters. We validated the expression efficiency of these mutants experimentally and used the results to further optimize the model. Finally, we obtained highly efficient promoter sequences that provide strong drivers for ICCG expression. We uploaded the optimized DMtac series promoters for community use.
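The genetic-algorithm step of this cycle can be pictured with a minimal Python sketch. It is not our actual pipeline: the function names, the population size, the mutation rate, and the toy scorer `toy_score` (which simply counts matches to an illustrative target string) are all assumptions made for illustration. In practice the scorer would be the trained strength predictor.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Resample each base with probability `rate` (may redraw the same base)."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)

def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(population, score, generations=30, keep=0.5, rate=0.05, seed=0):
    """Evolve sequences toward higher `score`; the kept parents act as an elite."""
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: max(2, int(len(pop) * keep))]
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = parents + children
    return max(pop, key=score)

# Hypothetical stand-in for a trained predictor: count matches to a target string.
TARGET = "TTGACA"

def toy_score(seq):
    return sum(x == y for x, y in zip(seq, TARGET))

init_rng = random.Random(1)
start = ["".join(init_rng.choice(BASES) for _ in TARGET) for _ in range(8)]
best = evolve(start, toy_score)
```

Because the top-scoring parents are carried over unchanged in every generation, the best score in the population never decreases, which is the main design choice of this sketch.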
Signal peptide optimization also went through three rounds. In the first round, we screened 200 signal peptides from the Escherichia coli signal peptide library, validated them in wet lab experiments, and selected the best-performing one, the nfaA signal peptide. In the following two rounds, we used a Hidden Markov Model to generate a series of MnfaA signal peptide mutants and experimentally validated their enhancement of ICCG secretion. We uploaded the optimized signal peptides, further enriching the signal peptide library available to synthetic biology.
We built a CNN-LSTM model from our experimental data to predict the strength of promoter mutants. The model showed high prediction accuracy and served as a valuable reference for promoter design and optimization. It could also be applied to promoter design in other gene expression systems.
We developed a statistical model based on the binomial distribution, which is more interpretable than the genetic algorithm. Using confidence intervals and hypothesis tests, we identified the site-base combinations that occur at high frequency in strong promoters, and these combinations guided the design and generation of new strong promoters.
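As a rough illustration of the hypothesis-testing idea, the Python sketch below flags site-base pairs that are over-represented among strong promoters using a one-sided binomial test. The null probability (the frequency of the base at that site in the whole library), the significance level, and the toy sequences are assumptions for illustration only, not our data or our exact procedure, and the confidence-interval step is omitted.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def enriched_site_bases(strong, background, alpha=0.05):
    """Return (position, base, p_value) for bases over-represented in `strong`."""
    n = len(strong)
    hits = []
    for pos in range(len(strong[0])):
        for base in "ACGT":
            # Null probability: frequency of this base at this site in the library.
            p0 = sum(seq[pos] == base for seq in background) / len(background)
            if not 0 < p0 < 1:
                continue  # site is invariant for this base; nothing to test
            k = sum(seq[pos] == base for seq in strong)
            p_val = binom_upper_tail(k, n, p0)
            if p_val < alpha:
                hits.append((pos, base, p_val))
    return hits

# Toy, hypothetical data: a 10-sequence library and its 5 strongest members.
background = ["ACG", "ACG", "ACG", "ACA", "ACA", "TCG", "TCG", "TCA", "TCA", "TCA"]
strong = ["ACG", "ACG", "ACG", "ACA", "ACA"]
hits = enriched_site_bases(strong, background)
print(hits)  # [(0, 'A', 0.03125)]
```

Each reported pair comes with its own p-value, which is what makes this approach easier to interpret than a black-box search. With many sites tested at once, a multiple-testing correction would be a sensible addition.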
We trained a Random Forest model on a large dataset to predict the secretion efficiency of signal peptides in Gram-negative bacteria. After analyzing and refining the experimental data, we obtained a model that accurately predicts signal peptide secretion efficiency, particularly for signal peptide design in Escherichia coli systems. This model provides new insights into signal peptide design and can support future synthetic biology projects.
We built a comprehensive scoring matrix and, based on a Markov transition frequency matrix, constructed a Hidden Markov Model-based generator that modifies the H-region of signal peptides to produce artificially designed, optimized mutant signal peptides.
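The role of the transition frequency matrix can be sketched in Python as follows. This is a simplified first-order Markov chain over residues, not a full Hidden Markov Model, and it leaves out the scoring matrix. The example H-region strings, the function names, and the fixed start residue are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def transition_frequencies(h_regions):
    """Estimate first-order residue-to-residue transition frequencies."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in h_regions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }

def sample_h_region(trans, start, length, rng):
    """Walk the chain from `start`, drawing each next residue from its row."""
    seq = [start]
    while len(seq) < length:
        row = trans.get(seq[-1])
        if not row:  # residue never observed with a successor
            break
        residues, probs = zip(*row.items())
        seq.append(rng.choices(residues, weights=probs, k=1)[0])
    return "".join(seq)

# Hypothetical hydrophobic stretches, used only to make the sketch runnable.
examples = ["LLALLA", "ALLVLL", "LAVLLL"]
trans = transition_frequencies(examples)
candidate = sample_h_region(trans, start="L", length=8, rng=random.Random(0))
```

Candidates sampled this way would still need to be ranked, for instance with a scoring matrix or a secretion-efficiency predictor, before any are chosen for experimental testing.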
Our team's contributions extend beyond experimental part optimization to model construction and prediction. Together, they provide the community with multiple efficient genetic parts and practical predictive tools that can support further synthetic biology research.